Pythonic way to import data from multiple files into an array

26,476

Solution 1

"But the problem with this code, is that I can only process data when it's in the for loop."

Assuming your code works, append each file's array to a list so it survives the loop:

import glob
import numpy as np

# Collect the paths of all text files in source_dir (defined as in the question)
file_list = glob.glob(source_dir + '/*.TXT')
data = []
for file_path in file_list:
    data.append(
        np.genfromtxt(file_path, delimiter=',', skip_header=3, skip_footer=18))
# Now you can access the data outside the for loop
for d in data:
    print(d)
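If every file yields an array of the same shape, the list built above can be turned into the single 3D array the question asks for. The following is a minimal sketch, assuming a NumPy version that provides np.stack; the synthetic arrays are hypothetical stand-ins for the genfromtxt results and are not from the original post.

```python
import numpy as np

# Stand-ins for the per-file arrays: 10 "files", each 50 rows x 2 columns.
# (Hypothetical data; in practice this is the `data` list built from genfromtxt.)
data = [np.full((50, 2), float(n)) for n in range(10)]

# Stack along a new last axis so that stacked[:, :, n] is the n-th file.
stacked = np.stack(data, axis=-1)

print(stacked.shape)        # (50, 2, 10)
print(stacked[:, :, 3][0])  # first row of the fourth file
```

Stacking on the last axis matches the data[:, :, n] indexing described in the question; axis=0 would give (10, 50, 2) instead.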

Solution 2

Are you looking for a 2D array whose columns are [txt column1, txt column2, file index]?

import glob
import numpy as np

file_list = glob.glob(source_dir + '/*.TXT')  # paths of all text files in source_dir

for num, file_path in enumerate(file_list):
    data = np.genfromtxt(file_path, delimiter=',', skip_header=3, skip_footer=18)
    # Append the file index as a third column
    data = np.vstack((data.T, np.ones(data.shape[0]) * num)).T
    if num == 0:
        Output = data
    else:
        Output = np.vstack((Output, data))

An alternative if you don't want to transpose twice (note hstack rather than vstack, since the index is added as a column):

    data = np.hstack((data, (np.ones(data.shape[0]) * num).reshape(-1, 1)))
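Once the file index sits in the third column, the rows belonging to one file can be pulled back out with a boolean mask. This is a rough sketch with made-up values standing in for the genfromtxt results; it also collects the per-file blocks in a list and stacks once at the end, which is a swapped-in variation on the loop above, not the answer's own method.

```python
import numpy as np

# Hypothetical stand-in for Output: columns are [col1, col2, file index].
blocks = []
for num in range(3):
    values = np.arange(8, dtype=float).reshape(4, 2) + 100 * num  # 4 rows x 2 cols
    index_col = np.full((values.shape[0], 1), num, dtype=float)
    blocks.append(np.hstack((values, index_col)))
Output = np.vstack(blocks)  # a single stack instead of one per iteration

# A boolean mask on the last column picks out the rows of one file.
second_file = Output[Output[:, 2] == 1, :2]
print(second_file.shape)
```

Stacking once at the end avoids copying the growing Output array on every iteration.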

Solution 3

Crude but quick:

listFiles=["1.txt","2.txt", ... ,"xxx.txt"]
allData = []
for file_name in listFiles:
    # Read all lines and close the file promptly
    with open(file_name, 'r') as f:
        lines = f.readlines()

    filedata = {}
    filedata['name'] = file_name
    filedata['rawLines'] = lines
    col1Vals = []
    col2Vals = []
    mapValues = {}

    for line in lines:
        values = [v.strip() for v in line.split(',')]
        if len(values) < 2:
            continue  # skip lines without two columns, e.g. "END OF FILE"
        col1Vals.append(values[0])
        col2Vals.append(values[1])
        mapValues[values[0]] = values[1]
    filedata['col1'] = col1Vals
    filedata['col2'] = col2Vals
    filedata['map'] = mapValues
    allData.append(filedata)


If you want to get a list of files from a specific directory, take a look at os.walk.

Since it's not clear how you want the data organised, I've shown several ways to store it.

allData is a list of dictionaries, one per file. The column values are stored as strings.

To get the 2nd column of data from the 3rd file, use allData[2]['col2'].

For the name of the third file, use allData[2]['name'].
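As an illustration of the os.walk suggestion, here is a small sketch that builds the file list automatically. The helper name find_text_files and the throwaway demo directory are assumptions made for this example, not part of the original answer.

```python
import os
import tempfile

def find_text_files(root):
    """Return sorted paths of all .txt files under root (case-insensitive)."""
    found = []
    for dirpath, dirnames, filenames in os.walk(root):
        for name in filenames:
            if name.lower().endswith('.txt'):
                found.append(os.path.join(dirpath, name))
    return sorted(found)

# Small demonstration in a throwaway directory.
demo_dir = tempfile.mkdtemp()
for name in ('1.txt', '2.TXT', 'notes.csv'):
    with open(os.path.join(demo_dir, name), 'w') as handle:
        handle.write('1, 10\n')

listFiles = find_text_files(demo_dir)
print([os.path.basename(p) for p in listFiles])
```

os.walk also descends into subdirectories; if only one folder is needed, glob.glob as used in the question is the simpler choice.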

Solution 4

If all the data has the same shape, just append each array to a list:

all_data = []

and in your loop:

all_data.append(data)

Finally, convert the list with

np.asarray(all_data)

which gives an array of shape (10, 50, 2); transpose it if you want a different axis order. If the shapes don't match, this does not work, because NumPy cannot build a regular array from rows of different shapes. In that case you might need another loop that creates an array of the largest shape and copies your data into it.
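The "largest shape" fallback could look roughly like the sketch below. It assumes the files differ only in their number of rows (same column count) and uses NaN as the fill value; both choices, and the sample arrays, are assumptions for illustration.

```python
import numpy as np

# Hypothetical per-file arrays with differing row counts.
all_data = [np.ones((50, 2)), np.ones((48, 2)) * 2, np.ones((50, 2)) * 3]

max_rows = max(a.shape[0] for a in all_data)
n_cols = all_data[0].shape[1]

# Pre-fill with NaN so that missing rows are easy to spot or mask later.
padded = np.full((len(all_data), max_rows, n_cols), np.nan)
for i, a in enumerate(all_data):
    padded[i, :a.shape[0], :] = a

# Reorder axes to (rows, columns, file) so padded_t[:, :, n] is the n-th file.
padded_t = padded.transpose(1, 2, 0)
print(padded_t.shape)  # (50, 2, 3)
```

Functions such as np.nanmean can then skip the padded entries when summarising across files.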

Author by

IanRoberts

Updated on July 09, 2022

Comments

  • IanRoberts
    IanRoberts almost 2 years

    I'm relatively new to Python and wondering how best to import data from multiple files into a single array. I have quite a few text files, each containing 50 rows of two columns of data (comma delimited), such as:

    Length=10.txt:     
    1, 10    
    2, 30    
    3, 50   
    #etc
    END OF FILE
    

    -

    Length=20.txt
    1, 50.7
    2, 90.9
    3, 10.3
    #etc
    END OF FILE
    

    Let's say I have 10 text files to import into a variable called data.

    I'd like to create a single 3D array containing all data. That way, I can easily plot and manipulate the data by referring to the data by data[:,:,n] where n refers to the index of the text file.

    I think the way I'd do this is to have an array of shape (50, 2, 10), but I don't know how best to use Python to create it. I've thought about using a loop to import each text file as a 2D array and then stacking them to create a 3D array, but I couldn't find the appropriate commands to do this (I looked at vstack and column_stack in numpy, but these don't seem to add an extra dimension).

    So far, I've written the import code:

        file_list = glob.glob(source_dir + '/*.TXT') #Get folder path containing text files
    
        for file_path in file_list:
          data = np.genfromtxt(file_path, delimiter=',', skip_header=3, skip_footer=18)
    

    But the problem with this code, is that I can only process data when it's in the for loop.

    What I really want is an array of all data imported from the text files.

    Any help would be greatly appreciated thanks!

  • IanRoberts
    IanRoberts over 11 years
    Thanks, this is simple and works well - I did try something like this but missed off the 'asarray' command. I didn't realise it wouldn't be an array without it.
  • IanRoberts
    IanRoberts over 11 years
    Thanks, this improves upon cronos's answer.