Programmatically convert pandas dataframe to markdown table

44,687

Solution 1

Right, so I've taken a leaf from a question suggested by Rohit (Python - Encoding string - Swedish Letters), extended his answer, and came up with the following:

# Enforce UTF-8 encoding
import sys
stdin, stdout = sys.stdin, sys.stdout
reload(sys)
sys.stdin, sys.stdout = stdin, stdout
sys.setdefaultencoding('UTF-8')

# SQLite3 database
import sqlite3
# Pandas: Data structures and data analysis tools
import pandas as pd

# Read database, attach as Pandas dataframe
db = sqlite3.connect("Applications.db")
df = pd.read_sql_query("SELECT path, language, date, shortest_sentence, longest_sentence, number_words, readability_consensus FROM applications ORDER BY date(date) DESC", db)
db.close()
df.columns = ['Path', 'Language', 'Date', 'Shortest Sentence', 'Longest Sentence', 'Words', 'Readability Consensus']

# Parse Dataframe and apply Markdown, then save as 'table.md'
cols = df.columns
df2 = pd.DataFrame([['---','---','---','---','---','---','---']], columns=cols)
df3 = pd.concat([df2, df])
df3.to_csv("table.md", sep="|", index=False)

An important precursor to this is that the shortest_sentence and longest_sentence columns do not contain unnecessary line breaks, as removed by applying .replace('\n', ' ').replace('\r', '') to them before submitting into the SQLite database. It appears that the solution is not to enforce the language-specific encoding (ISO-8859-1 for Norwegian), but rather that UTF-8 is used instead of the default ASCII.

I ran this through my IPython notebook (Python 2.7.10) and got a table like the following (fixed spacing for appearance here):

| Path                    | Language | Date       | Shortest Sentence                                                                            | Longest Sentence                                                                                                                                                                                                                                         | Words | Readability Consensus |
|-------------------------|----------|------------|----------------------------------------------------------------------------------------------|----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------|-------|-----------------------|
| data/Eng/Something1.txt | Eng      | 2015-09-17 | I am able to relocate to London on short notice.                                             | With my administrative experience in the preparation of the structure and content of seminars in various courses, and critiquing academic papers on various levels, I am confident that I can execute the work required as an editorial assistant.       | 306   | 11th and 12th grade   |
| data/Nor/NoeNorrønt.txt | Nor      | 2015-09-17 | Jeg har grundig kjennskap til Microsoft Office og Adobe.                                     | I løpet av studiene har jeg vært salgsmedarbeider for et større konsern, hvor jeg solgte forsikring til studentene og de faglige ansatte ved universitetet i Trønderlag, samt renholdsarbeider i et annet, hvor jeg i en periode var avdelingsansvarlig. | 205   | 18th and 19th grade   |
| data/Nor/Ørret.txt.txt  | Nor      | 2015-09-17 | Jeg håper på positiv tilbakemelding, og møter naturligvis til intervju hvis det er ønskelig. | I løpet av studiene har jeg vært salgsmedarbeider for et større konsern, hvor jeg solgte forsikring til studentene og de faglige ansatte ved universitetet i Trønderlag, samt renholdsarbeider i et annet, hvor jeg i en periode var avdelingsansvarlig. | 160   | 18th and 19th grade   |

Thus, a Markdown table without problems with encoding.

Solution 2

Improving the answer further, for use in IPython Notebook:

def pandas_df_to_markdown_table(df):
    from IPython.display import Markdown, display
    fmt = ['---' for i in range(len(df.columns))]
    df_fmt = pd.DataFrame([fmt], columns=df.columns)
    df_formatted = pd.concat([df_fmt, df])
    display(Markdown(df_formatted.to_csv(sep="|", index=False)))

pandas_df_to_markdown_table(infodf)

Or use tabulate:

pip install tabulate

Examples of use are in the documentation.

Update

As of pandas 1.0 DataFrame to markdown is available. Please see answer from @timvink (docs)

Solution 3

I recommend python-tabulate library for generating ascii-tables. The library supports pandas.DataFrame as well.

Here is how to use it:

from pandas import DataFrame
from tabulate import tabulate

df = DataFrame({
    "weekday": ["monday", "thursday", "wednesday"],
    "temperature": [20, 30, 25],
    "precipitation": [100, 200, 150],
}).set_index("weekday")

print(tabulate(df, tablefmt="pipe", headers="keys"))

Output:

| weekday   |   temperature |   precipitation |
|:----------|--------------:|----------------:|
| monday    |            20 |             100 |
| thursday  |            30 |             200 |
| wednesday |            25 |             150 |

Solution 4

Pandas 1.0 was released 29 january 2020 and supports markdown conversion, so you can now do this directly!

Example taken from the docs:

df = pd.DataFrame({"A": [1, 2, 3], "B": [1, 2, 3]}, index=['a', 'a', 'b'])
print(df.to_markdown())
|    |   A |   B |
|:---|----:|----:|
| a  |   1 |   1 |
| a  |   2 |   2 |
| b  |   3 |   3 |

Or without the index:

print(df.to_markdown(index=False)) # use 'showindex' for pandas < 1.1
|   A |   B |
|----:|----:|
|   1 |   1 |
|   2 |   2 |
|   3 |   3 |

Solution 5

Try this out. I got it to work.

See the screenshot of my markdown file converted to HTML at the end of this answer.

import pandas as pd

# You don't need these two lines
# as you already have your DataFrame in memory
df = pd.read_csv("nor.txt", sep="|")
df.drop(df.columns[-1], axis=1)

# Get column names
cols = df.columns

# Create a new DataFrame with just the markdown
# strings
df2 = pd.DataFrame([['---',]*len(cols)], columns=cols)

#Create a new concatenated DataFrame
df3 = pd.concat([df2, df])

#Save as markdown
df3.to_csv("nor.md", sep="|", index=False)

My output in HTML format by converting HTML to Markdown

Share:
44,687
OleVik
Author by

OleVik

Updated on August 15, 2020

Comments

  • OleVik
    OleVik almost 4 years

    I have a Pandas Dataframe generated from a database, which has data with mixed encodings. For example:

    +----+-------------------------+----------+------------+------------------------------------------------+--------------------------------------------------------+--------------+-----------------------+
    | ID | path                    | language | date       | longest_sentence                               | shortest_sentence                                      | number_words | readability_consensus |
    +----+-------------------------+----------+------------+------------------------------------------------+--------------------------------------------------------+--------------+-----------------------+
    | 0  | data/Eng/Sagitarius.txt | Eng      | 2015-09-17 | With administrative experience in the prepa... | I am able to relocate internationally on short not...  | 306          | 11th and 12th grade   |
    +----+-------------------------+----------+------------+------------------------------------------------+--------------------------------------------------------+--------------+-----------------------+
    | 31 | data/Nor/Høylandet.txt  | Nor      | 2015-07-22 | Høgskolen i Østfold er et eksempel...          | Som skuespiller har jeg både...                        | 253          | 15th and 16th grade   |
    +----+-------------------------+----------+------------+------------------------------------------------+--------------------------------------------------------+--------------+-----------------------+
    

    As seen there is a mix of English and Norwegian (encoded as ISO-8859-1 in the database I think). I need to get the contents of this Dataframe output as a Markdown table, but without getting problems with encoding. I followed this answer (from the question Generate Markdown tables?) and got the following:

    import sys, sqlite3
    
    db = sqlite3.connect("Applications.db")
    df = pd.read_sql_query("SELECT path, language, date, longest_sentence, shortest_sentence, number_words, readability_consensus FROM applications ORDER BY date(date) DESC", db)
    db.close()
    
    rows = []
    for index, row in df.iterrows():
        items = (row['date'], 
                 row['path'], 
                 row['language'], 
                 row['shortest_sentence'],
                 row['longest_sentence'], 
                 row['number_words'], 
                 row['readability_consensus'])
        rows.append(items)
    
    headings = ['Date', 
                'Path', 
                'Language',
                'Shortest Sentence', 
                'Longest Sentence since', 
                'Words',
                'Grade level']
    
    fields = [0, 1, 2, 3, 4, 5, 6]
    align = [('^', '<'), ('^', '^'), ('^', '<'), ('^', '^'), ('^', '>'),
             ('^','^'), ('^','^')]
    
    table(sys.stdout, rows, fields, headings, align)
    

    However, this yields an UnicodeEncodeError: 'ascii' codec can't encode character u'\xe5' in position 72: ordinal not in range(128) error. How can I output the Dataframe as a Markdown table? That is, for the purpose of storing this code in a file for use in writing a Markdown document. I need the output to look like this:

    | ID | path                    | language | date       | longest_sentence                               | shortest_sentence                                      | number_words | readability_consensus |
    |----|-------------------------|----------|------------|------------------------------------------------|--------------------------------------------------------|--------------|-----------------------|
    | 0  | data/Eng/Sagitarius.txt | Eng      | 2015-09-17 | With administrative experience in the prepa... | I am able to relocate internationally on short not...  | 306          | 11th and 12th grade   |
    | 31 | data/Nor/Høylandet.txt  | Nor      | 2015-07-22 | Høgskolen i Østfold er et eksempel...          | Som skuespiller har jeg både...                        | 253          | 15th and 16th grade   |
    
  • OleVik
    OleVik over 8 years
    Relative to my dataframe there were a couple extra separators, but running this: cols = df.columns df2 = pd.DataFrame([['---','---','---','---','---','---','---']], columns=cols) df3 = pd.concat([df2, df]) df3.to_csv("nor.md", sep="|", index=False) Still gives a encoding error: UnicodeEncodeError: 'ascii' codec can't encode character u'\xe5' in position 72: ordinal not in range(128). Also, out of curiosity, would it be problematic that shortest_sentence and longest_sentence may contain line breaks?
  • Rohit
    Rohit over 8 years
    I do not think the line breaks should matter as long as they show up fine in pandas DataFrame. Pandas should be able to take in unicode characters just fine so I am not sure why this does not work. Check out this question: (stackoverflow.com/questions/7315629/…). I do not know how close Swedish is to Norwegian but you may find something helpful.
  • Rohit
    Rohit over 8 years
    If you find an answer to your problem, do answer your own question for posterity.
  • OleVik
    OleVik over 8 years
    Through some further testing I was able to, and using the elegant solution you proposed. Thanks for your help!
  • Alastair McCormack
    Alastair McCormack over 8 years
    Calling sys.setdefaultencoding('UTF-8') is an ultra bad idea. It masks all kind of issues and it looks like your new code doesn't need it as you're not calling table(), which is full of implied encodings.
  • Mithril
    Mithril about 7 years
    Excellent, elegantly render in both editor and preview !
  • Tomasz Sętkowski
    Tomasz Sętkowski over 6 years
    You can do this with tablefmt="pipe". That PR was declined and there is no tablefmt="markdown".
  • Nolan Conaway
    Nolan Conaway over 6 years
    Opening another process seems vulnerable to unforeseen problems but for the simple case this is great.
  • Josiah Yoder
    Josiah Yoder almost 6 years
    To use tabulate, I used print(tabulate.tabulate(df.values,df.columns, tablefmt="pipe"))
  • scottlittle
    scottlittle about 5 years
    I changed last line of function from to_csv to to_clipboard so that I could paste the results easily.
  • abeboparebop
    abeboparebop over 4 years
    I needed to modify this function to add | to the beginning and end of each line in the DataFrame to get proper markdown output.
  • CGFoX
    CGFoX over 4 years
    Any way to remove the index when converting to markdown? Passing index=False didn't work.
  • timvink
    timvink over 4 years
    Yes: df.to_markdown(showindex=False). Pandas uses tabulate, so you can pass tabulate parameters via to_markdown().
  • Carson
    Carson almost 4 years
    FutureWarning: showindex is deprecated. Only 'index' will be used in a future version.
  • timvink
    timvink almost 4 years
    Thanks, indeed futurewarning shows up with pandas>=1.1. Updated answer.
  • yardstick17
    yardstick17 over 3 years
    Useful when usecase is to copy-paste a dataframe in markdown format.
  • Chingiz K.
    Chingiz K. over 2 years
    df.to_markdown() is available now on Pandas which is using tabulate.