How to assign columns of data to variables

31,655

Solution 1

Pandas is in fact the right solution here. To robustly handle a file whose underlying structure you aren't certain of, there are a lot of edge cases you have to watch out for, and trying to shoehorn that into the csv module is a recipe for headaches (though it can be done).

As for why you can't import pandas: it doesn't come with Python by default. One of the most important things to consider when picking up a language is the ecosystem of packages it gives you access to. Python happens to be one of the best in that respect, so to ignore everything that's not part of standard Python is to ignore the best part of the language.

If you're on a Windows environment you should start by getting conda set up. This will let you explore many of the packages available to Python users, pandas included, with little overhead. See this link for more info on installing conda: http://conda.pydata.org/docs/install/quick.html

Once you've got pandas installed it's as easy as this:

import pandas
# read_csv assumes commas by default; pass sep=... for other delimiters
test = pandas.read_csv('<your_file>')
your_variable = test['<column_header>']

Easy as that.

If you really, really don't want to use things that aren't in core Python then you can do this with something like what follows, but you haven't given enough detail for an actual solution:

def col_var(input_file, delimiter):
    # get each line into a variable (the with block closes the file)
    with open(input_file) as f:
        rows = f.read().splitlines()

    # split each row into entries
    split_rows = [row.split(delimiter) for row in rows]

    # Re-orient your list: rows become columns
    columns = list(zip(*split_rows))
    return columns

The least intuitive piece of this is the zip(*split_rows) call, so here's a little example showing you how it works:

>>> test = [[1,2], [3,4]]
>>> list(zip(*test))
[(1, 3), (2, 4)]
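
The same transpose can be applied to a file like the one described in the question. What follows is only a minimal sketch. It assumes the fields are separated by runs of whitespace and that the 8 uninteresting columns come first (the question doesn't say where they sit). The name `relevant_columns` and the `skip` parameter are made up for illustration.

```python
def relevant_columns(input_file, skip=8):
    """Return every column after the first `skip` ones, one tuple per column."""
    with open(input_file) as f:
        # str.split() with no argument splits on any run of whitespace;
        # blank lines are ignored
        split_rows = [line.split() for line in f if line.strip()]

    # Re-orient rows into columns, then drop the leading ones
    # (assumption: the unwanted columns are the first `skip`)
    columns = list(zip(*split_rows))
    return columns[skip:]
```

Each entry of the result is one column as a tuple of strings, so convert with `float` before plotting. Note that `zip` stops at the shortest row, so a ragged file silently loses trailing fields.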

Solution 2

Well, you can use the csv module, provided there is some kind of delimiter within the rows that sets the columns apart.

import csv

file_to_read_from = 'myFile.txt'

#initializing as many lists as the columns you want (not all)
col1, col2, col3 = [], [], []
with open(file_to_read_from, 'r') as file_in:
    reader = csv.reader(file_in, delimiter=';') #might as well be ',', '\t' etc
    for row in reader:
        col1.append(row[0]) # assuming col 1 in the file is one of the 3 you want
        col2.append(row[3]) # assuming col 4 in the file is one of the 3 you want
        col3.append(row[5]) # assuming col 6 in the file is one of the 3 you want
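
When the number of columns differs from file to file, hard-coding `col1, col2, col3` gets awkward. One possible variation, sketched here on the assumption that the wanted columns can be named by index, is to collect them in a dict keyed by column index. `read_columns` and its parameters are illustrative names, not part of the csv module.

```python
import csv

def read_columns(path, wanted, delimiter=';'):
    """Collect the columns whose indices are listed in `wanted` into lists."""
    columns = {index: [] for index in wanted}
    with open(path, 'r', newline='') as file_in:
        reader = csv.reader(file_in, delimiter=delimiter)
        for row in reader:
            if not row:
                continue  # csv.reader yields [] for blank lines; skip them
            for index in wanted:
                columns[index].append(row[index])
    return columns
```

For example, `cols = read_columns('myFile.txt', wanted=[0, 3, 5])` gathers the same three columns as above, and `cols[3]` is then the list of values from the file's fourth column.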
Author by

evtoh

I write code for astrophysics. Mostly matplotlib.

Updated on June 11, 2020

Comments

  • evtoh
    evtoh almost 4 years

    I'm writing a general program to read and plot large amounts of data from .txt files. Each file has a different number of columns. I do know that each file has 8 columns that I'm not interested in, so I can figure out the number of relevant columns that way. How can I read the data and sort each relevant column's data into a separate variable?

    This is what I have so far:

    datafile = 'plotspecies.txt'
    with open(datafile) as file:
        reader = csv.reader(file, delimiter=' ', skipinitialspace=True)
        first_row = next(reader)
        num_cols = len(first_row)
        rows = csv.reader(file, delimiter = ' ', quotechar = '"')
        data = [data for data in rows]
    
    num_species = num_cols - 8
    

    I've seen people say that pandas is good for this sort of thing, but I can't seem to import it. I'd prefer a solution without it.