Use Multiple Character Delimiter in Python Pandas read_csv

23,728

Solution 1

The solution would be to use read_table instead of read_csv:

1*|*2*|*3*|*4*|*5
12*|*12*|*13*|*14*|*15
21*|*22*|*23*|*24*|*25

So, we could read this with:

pd.read_table('file.csv', header=None, sep='\*\|\*')

Solution 2

Pandas does now support multi character delimiters

import panda as pd
pd.read_csv(csv_file, sep="\*\|\*")

Solution 3

As Padraic Cunningham writes in the comment above, it's unclear why you want this. The Wiki entry for the CSV Spec states about delimiters:

... separated by delimiters (typically a single reserved character such as comma, semicolon, or tab; sometimes the delimiter may include optional spaces),

It's unsurprising, that both the csv module and pandas don't support what you're asking.

However, if you really want to do so, you're pretty much down to using Python's string manipulations. The following example shows how to turn the dataframe to a "csv" with $$ separating lines, and %% separating columns.

'$$'.join('%%'.join(str(r) for r in rec) for rec in df.to_records())

Of course, you don't have to turn it into a string like this prior to writing it into a file.

Share:
23,728

Related videos on Youtube

slaw
Author by

slaw

Scientist, Data Wrangler, and Python Lover

Updated on February 22, 2021

Comments

  • slaw
    slaw over 3 years

    It appears that the pandas read_csv function only allows single character delimiters/separators. Is there some way to allow for a string of characters to be used like, "*|*" or "%%" instead?

    • Padraic Cunningham
      Padraic Cunningham almost 9 years
      Why do you want more than one?
    • slaw
      slaw almost 9 years
      Because I have several columns with unformatted text that can contain characters such as "|", "\t", ",", etc. The likelihood of somebody typing "%%" is much lower...
    • deddu
      deddu over 3 years
      Found this in datafiles in the wild because \t was replaced into 4 spaces by some linter.
  • Austin A
    Austin A almost 6 years
    It should be noted that if you specify a multi-char delimiter, the parsing engine will look for your separator in all fields, even if they've been quoted as a text. When the engine finds a delimiter in a quoted field, it will detect a delimiter and you will end up with more fields in that row compared to other rows, breaking the reading process.
  • Trutane
    Trutane over 5 years
    Note that while read_csv() supports multi-char delimiters to_csv does not support multi-character delimiters as of as of Pandas 0.23.4. The original post actually asks about to_csv(). (Side note: including "()" in a link is not supported by Markdown, apparently)
  • Faheem Mitha
    Faheem Mitha over 2 years
    It would be helpful if the poster mentioned which version this functionality was added.
  • Paul Rougieux
    Paul Rougieux about 2 years
    You should specify engine="python" to avoid the ParserWarning: Falling back to the 'python' engine because the 'c' engine does not support regex separators.