Avoid writing '\n' at the end of the last line of a file in Python
Solution 1
This should be a simple solution:
for item in record[:-1]:
    output_pass.write("%s\n" % item)
output_pass.write("%s" % record[-1])
Using join is not recommended here: you said the file is large, and join would build the entire file content as a single string in memory.
Solution 2
This requires constant additional memory:
for i, item in enumerate(record):
    if i > 0:
        output_pass.write('\n')
    output_pass.write('%s' % item)
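As a minimal, self-contained sketch of the same separator-first idea: the helper name write_lines and the io.StringIO stand-in for the output file are assumptions made for illustration, not part of the original answer. Because it only iterates, it also works when the records come from a generator rather than a list.

```python
import io

def write_lines(out, items):
    """Write items separated by newlines, with no newline after the last one."""
    for i, item in enumerate(items):
        if i > 0:
            out.write('\n')  # separator goes before every item except the first
        out.write('%s' % item)

# io.StringIO stands in for the real output file in this sketch.
buf = io.StringIO()
write_lines(buf, (n * n for n in range(4)))  # generator: never held in a list
result = buf.getvalue()
print(repr(result))
```

Only one item is held at a time, so the extra memory stays constant regardless of how many records are written.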
Solution 3
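Have you tried using a counter? For example:
import sys

record = [str(x) for x in range(10)]
print(record)
output_pass = sys.stdout
counter = 0
# write every item except the last, each followed by a newline
while counter != len(record) - 1:
    output_pass.write("%s\n" % record[counter])
    counter += 1
# write the last item without a trailing newline
output_pass.write("%s" % record[counter])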
Solution 4
You can join them first and then write, as in:
item = '\n'.join(record)
output_pass.write('%s' %item)
Note
If your list, i.e. record, doesn't contain strings, then, as martineau has mentioned, you will have to map its items to str, that is, '\n'.join(map(str, record)), before you write to the file. (join requires string items in both Python 2 and Python 3.)
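A small sketch of what the note describes, using a made-up record list with mixed types (the sample values are an assumption for illustration):

```python
# A hypothetical record whose items are not all strings.
record = [1, 2.5, 'three']

try:
    '\n'.join(record)       # join only accepts string items
    join_failed = False
except TypeError:
    join_failed = True

# Converting each item with str first makes the join work.
text = '\n'.join(map(str, record))
print(join_failed, repr(text))
```

Note that this still builds the whole output string in memory, so it only suits data that comfortably fits there.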
Solution 5
The following writes all but the last item in record with newlines very quickly, and then the final one without it. It does so without requiring much additional memory.
(For Python 3 use range instead of xrange.)
item = iter(record)
for _ in xrange(len(record)-1):
    output_pass.write('%s\n' % next(item))
output_pass.write('%s' % next(item))
Zewei Song
Updated on June 09, 2022
Comments
-
Zewei Song almost 2 years
I'm writing multiple lines to a new file (could be up to several GB), like this:
for item in record: output_pass.write('%s\n' %item)
However, I got a blank line due to the '\n' of my last record, such as:
Start of the file
record111111
record222222
record333333
---a blank line---
End of the file
Since my file is large, I would not want to read the file again. So, is there an easy way to prevent this, or easy way to remove the last '\n' from the file?
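On the second part of the question (removing the last '\n' afterwards without reading the file again), one possible sketch is to look at only the final byte and truncate it. The helper name strip_final_newline and the temporary demo file are assumptions for illustration; the sketch handles a single '\n' only and does not treat '\r\n' specially.

```python
import os
import tempfile

def strip_final_newline(path):
    """Drop one trailing newline byte from the file, if there is one."""
    with open(path, 'rb+') as f:
        f.seek(0, os.SEEK_END)
        size = f.tell()
        if size == 0:
            return False          # empty file: nothing to remove
        f.seek(size - 1)          # jump straight to the last byte
        if f.read(1) == b'\n':
            f.truncate(size - 1)  # cut the file one byte shorter
            return True
    return False

# Demo on a small temporary file.
fd, path = tempfile.mkstemp()
os.close(fd)
with open(path, 'wb') as f:
    f.write(b'record1\nrecord2\n')
removed = strip_final_newline(path)
with open(path, 'rb') as f:
    content = f.read()
os.remove(path)
print(removed, content)
```

Only the last byte is read, so the cost does not grow with the size of the file.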
My solution:
Thanks for all the help!
I don't think I will load the entire file into memory, since it may get very large.
I actually solved this by writing the first record first, then writing the remaining lines in a loop. I put the '\n' at the front so it won't appear after the last line.
But Jonathan is right. I actually have no problem with the '\n' on the last line; it is mostly my OCD.
Here is my code:
rec_first = parser_fastq.next() #This is just an iterator of my file
output.write('%s' %('>'+rec_first[0].strip('@')))
output.write('\n%s' %(rec_first[1])) #I put '\n' in the front
count = 1
#Write the rest of lines
for rec_fastq in parser_fastq:
    output.write('\n%s' %('>'+rec_fastq[0].strip('@')))
    output.write('\n%s' %(rec_fastq[1]))
    count += 1
    print 'Extracting %ith record in %s ...' %(count, fastq_name) + '\b'*100,
output.close()
print '\n%i records were wrote to %s' % (count, fasta_name)
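The same first-record-then-prefix pattern can be sketched in Python 3 as a small reusable function. The name write_without_trailing_newline, the sample lines, and the io.StringIO output are assumptions for illustration and are not the code above; the sketch works on any iterator, including ones that cannot be indexed or measured with len().

```python
import io

def write_without_trailing_newline(out, items):
    """Write each item on its own line; return how many were written."""
    it = iter(items)
    try:
        first = next(it)          # the first item gets no leading newline
    except StopIteration:
        return 0                  # nothing to write
    out.write('%s' % first)
    count = 1
    for item in it:
        out.write('\n%s' % item)  # newline goes in front of every later item
        count += 1
    return count

# io.StringIO stands in for the real output file.
buf = io.StringIO()
lines = iter(['>id1', 'ACGT', '>id2', 'TTGA'])  # an iterator, not a list
count = write_without_trailing_newline(buf, lines)
result = buf.getvalue()
print(count, repr(result))
```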
-
Matteo Italia over 9 years
Are you sure that it's really a problem? Actually, most text-based tools (e.g. most Unix utils) expect to have a newline at the end of the file (i.e. the newline is intended as a line terminator, not as a separator).
-
martineau over 9 years
Do you really want all those other blank lines between items in your output file? It looks like each is ending up with two '\n' characters.
-
martineau over 9 years
Is the file huge because a single record has that much data in it, or are you processing many records that could total up to a size that big? The answer to that will likely affect what answer is truly the best for your needs.
-
-
Matteo Italia over 9 years
OP is talking about a multi-gigabyte file; in that case this is definitely a bad idea (it creates the whole string in memory first).
-
Bhargav Rao over 9 years
@MatteoItalia Thanks. Do inform me if it is completely wrong, so that I can delete it.
-
Matteo Italia over 9 years
It's not wrong per se, but in this case it would become a performance nightmare (not only are you creating the whole string in memory, but the useless '%s' % item is going to create yet another copy of it).
-
Matteo Italia over 9 years
How is this supposed to work? Even assuming that record is a list (it may not be; from the code it just looks like an enumerable object), your code just prints it backwards, skipping the last element and always leaving the newline anyway. ideone.com/UvNCsJ
-
Bhargav Rao over 9 years
This prints the contents of the file in reverse order. So you will need to write more code to do the parse in the other direction ;)
-
Bhargav Rao over 9 years
Upvoted for explaining why join is not recommended.
-
yhoyo over 9 years
@MatteoItalia Yes, sorry... I do not know what I was thinking; I took your code and rearranged it.
-
yhoyo over 9 years
@BhargavRao Now the code works fine without the last line :P ideone.com/h8mfa5
-
myaut over 9 years
For lists, the slice expression [:] creates a copy of that list, so you waste memory too.
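One way to sidestep the copy, sketched under the assumption that record is a list (so len() and record[-1] are available): itertools.islice iterates over the leading items lazily instead of building a second list. The sample data and the io.StringIO output are made up for illustration.

```python
import io
from itertools import islice

record = ['a', 'b', 'c']   # hypothetical sample data
out = io.StringIO()        # stands in for the real output file

if record:
    # islice yields the first len(record) - 1 items without copying the list
    for item in islice(record, len(record) - 1):
        out.write('%s\n' % item)
    out.write('%s' % record[-1])  # last item, no trailing newline

result = out.getvalue()
print(repr(result))
```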
-
martineau over 9 years
@Bhargav: Whether avoiding join is necessary or not currently isn't clear from the question, since we don't really know how many items there might be in a record nor how big the string representation of each might be. Like so many questions here, the parameters of the problem are unclear.
-
Bhargav Rao over 9 years
@martineau Sir, thus using join is also not wrong?
-
martineau over 9 years
@Bhargav: My point was that it might work OK, but we don't have any way of knowing whether it would or not without additional information from the OP.
-
Bhargav Rao over 9 years
Thanks for your help Sir, I undeleted my answer hoping some stray programmer might find it useful. Thanks again.
-
martineau over 9 years
Bhargav (& @Matteo): This probably won't work, not because record might be huge (although it could be and that might be an issue), but since it's probably not a sequence of strings (but again we don't know that for sure), which is what the join() method requires its first argument to be -- so, as written, the most likely result would be TypeError: sequence item 0: expected string, xxx found. That could be easily fixed by using map(str, record), assuming record isn't prohibitively large (and converting each of its items to strings isn't either).
-
Bhargav Rao over 9 years
@martineau Thank you Sir. I have edited the answer to include your thoughts. Thanks again.
-
martineau over 9 years
Er, I was thinking more along the lines of '\n'.join(map(str, record)) because, among other reasons, you would be permanently clobbering the value of record with something only needed temporarily. Actually output_pass.write('\n'.join(map(str, record))) might be even better. Also, please stop calling me "Sir", such formalities aren't needed here. ;-)
-
martineau over 9 years
P.S. r'\n' would be wrong, and whether to do this has nothing to do with py2.
-
Bhargav Rao over 9 years
@martineau Thank you ... (I call you Sir as a mark of respect and not formality ;) ... We have been taught to address everyone with a title since our young age. Apart from that, I am too young to talk directly.)
-
Bhargav Rao over 9 years
That r was a remnant from the previous record
-
loretoparisi about 5 years
This does not work for io.read or io.write: TypeError: '_io.TextIOWrapper' object has no attribute '__getitem__'