Use Multiple Character Delimiter in Python Pandas read_csv
Solution 1
The solution would be to use read_table instead of read_csv:
1*|*2*|*3*|*4*|*5
12*|*12*|*13*|*14*|*15
21*|*22*|*23*|*24*|*25
So, we could read this with:
pd.read_table('file.csv', header=None, sep='\*\|\*')
Solution 2
Pandas does now support multi character delimiters
import panda as pd
pd.read_csv(csv_file, sep="\*\|\*")
Solution 3
As Padraic Cunningham writes in the comment above, it's unclear why you want this. The Wiki entry for the CSV Spec states about delimiters:
... separated by delimiters (typically a single reserved character such as comma, semicolon, or tab; sometimes the delimiter may include optional spaces),
It's unsurprising, that both the csv
module and pandas
don't support what you're asking.
However, if you really want to do so, you're pretty much down to using Python's string manipulations. The following example shows how to turn the dataframe to a "csv" with $$
separating lines, and %%
separating columns.
'$$'.join('%%'.join(str(r) for r in rec) for rec in df.to_records())
Of course, you don't have to turn it into a string like this prior to writing it into a file.
Related videos on Youtube
Comments
-
slaw over 3 years
It appears that the pandas read_csv function only allows single character delimiters/separators. Is there some way to allow for a string of characters to be used like, "*|*" or "%%" instead?
-
Padraic Cunningham almost 9 yearsWhy do you want more than one?
-
slaw almost 9 yearsBecause I have several columns with unformatted text that can contain characters such as "|", "\t", ",", etc. The likelihood of somebody typing "%%" is much lower...
-
deddu over 3 yearsFound this in datafiles in the wild because
\t
was replaced into 4 spaces by some linter.
-
-
Austin A almost 6 yearsIt should be noted that if you specify a multi-char delimiter, the parsing engine will look for your separator in all fields, even if they've been quoted as a text. When the engine finds a delimiter in a quoted field, it will detect a delimiter and you will end up with more fields in that row compared to other rows, breaking the reading process.
-
Trutane over 5 yearsNote that while read_csv() supports multi-char delimiters to_csv does not support multi-character delimiters as of as of Pandas 0.23.4. The original post actually asks about to_csv(). (Side note: including "()" in a link is not supported by Markdown, apparently)
-
Faheem Mitha over 2 yearsIt would be helpful if the poster mentioned which version this functionality was added.
-
Paul Rougieux about 2 yearsYou should specify
engine="python"
to avoid theParserWarning: Falling back to the 'python' engine because the 'c' engine does not support regex separators
.