Trying to understand python csv .next()
Solution 1
The header row is "skipped" as a result of calling next(). That's how iterators work.
When you loop over an iterator, its next() method is called each time. Each call advances the iterator. When the for loop starts, the iterator is already at the second row, and it goes from there on.
Here's the documentation on the next() method (here's another piece).
What's important is that csv.reader objects are iterators, just like the file object returned by open(). You can iterate over them, but they don't contain all of the lines (or any of the lines) at any given moment.
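A minimal sketch of the behavior described above, written for Python 3. It assumes an in-memory string in place of train.csv (the sample rows are made up for illustration) and uses the built-in next() function rather than the Python 2-only .next() method:

```python
import csv
import io

# Hypothetical stand-in for train.csv: one header row and two data rows.
sample = io.StringIO("id,name\n1,alice\n2,bob\n")

reader = csv.reader(sample)

# The first call consumes the header row and advances the reader.
header = next(reader)

# The loop picks up where the reader currently is: the second row.
data = [row for row in reader]

print(header)
print(data)
```

The header variable is never used again here either. The rows are skipped because next() was called, not because its result was stored.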
Solution 2
The csv.reader object is an iterator. An iterator is an object with a next() method that returns the next available value, or raises StopIteration if no value is available. The csv.reader returns its values one row at a time.
Iterators are how Python implements the for loop. At the beginning of the loop, the __iter__ method of the object being looped over is called. It must return an iterator. Then the next method of that iterator is called repeatedly, with each value stored in the loop variable, until the next method raises a StopIteration exception.
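The steps above can be sketched as a hand-written loop. This is only an approximation of what the interpreter does, and manual_for is a hypothetical helper name introduced for illustration. It uses the built-in iter() and next() functions so that it runs on Python 3 as well:

```python
def manual_for(iterable, body):
    """Roughly what `for item in iterable: body(item)` does."""
    # Ask the object for an iterator (this calls its __iter__ method).
    iterator = iter(iterable)
    while True:
        try:
            # Fetch the next value (.next() in Python 2, __next__() in Python 3).
            item = next(iterator)
        except StopIteration:
            # The iterator is exhausted, so the loop ends.
            break
        body(item)


collected = []
manual_for([10, 20, 30], collected.append)
print(collected)
```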
In your example, by calling next before using the variable in the for loop, you are removing the first value from the stream of values returned by the iterator.
You can see the same effect with simpler iterators:
iterator = [0, 1, 2, 3, 4, 5].__iter__()
value = iterator.next()
for v in iterator:
    print v,
1 2 3 4 5
print value
0
Solution 3
The csv.reader is an iterator. Calling .next() obtains the next value as it iterates through the file.
In the code below, the for loop calls .next() on the iterator each time and assigns the result to the variable row.
for row in csv_file_object:
    data.append(row)
Solution 4
csv.reader is an iterator. It reads a line from the CSV every time .next() is called. Here's the documentation: http://docs.python.org/2/library/csv.html. An iterator object can actually return values from a source that is too big to read all at once. Using a for loop with an iterator effectively calls .next() on each pass through the loop.
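To illustrate the point about sources too big to read all at once, here is a small Python 3 sketch. The generated_lines generator is a hypothetical, never-ending source of CSV text invented for this example. Because csv.reader pulls lines only when asked, taking the first three rows terminates even though the source never does:

```python
import csv
import itertools


def generated_lines():
    """Hypothetical endless source of CSV lines, produced one at a time."""
    n = 0
    while True:
        yield "{0},{1}\n".format(n, n * n)
        n += 1


# csv.reader accepts any iterable that yields strings, not only files.
reader = csv.reader(generated_lines())

# islice calls next() on the reader three times and then stops.
first_three = list(itertools.islice(reader, 3))
print(first_three)
```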
davidheller
Software product manager, Python hobbyist. Really enjoy data visualization and web scraping. Usually not in that order.
Updated on July 09, 2022
Comments
-
davidheller almost 2 years
I have the following code that is part of a tutorial
import csv as csv
import numpy as np

csv_file_object = csv.reader(open("train.csv", 'rb'))
header = csv_file_object.next()
data = []
for row in csv_file_object:
    data.append(row)
data = np.array(data)
the code works as it is supposed to but it is not clear to me why calling .next() on the file with the variable header works. Isn't csv_file_object still the entire file? How does the program know to skip the header row when for row in csv_file_object is called since it doesn't appear the variable header is ever referenced once defined?
-
davidheller over 11 years
so by declaring header I effectively call .next(). I just tested it and if I don't make 'csv_file_object.next()' a variable it still works. For some reason because it was written as a variable I couldn't see it. I think I got it now. Thanks!
-
Lev Levitsky over 11 years
@muchosalsa Yes, what matters is that the next() method is called.
-
davidheller over 11 years
thanks - @lev was just a little faster with the response. I had looked at the docs but it just wasn't clear to me until I removed the variable.
-
Peter Wooster over 11 years
you should give the answer checkmark to the fastest response, but you should upvote all helpful responses.