Programmatically convert pandas dataframe to markdown table
Solution 1
Right, so I've taken a leaf from a question suggested by Rohit (Python - Encoding string - Swedish Letters), extended his answer, and came up with the following:
# Enforce UTF-8 encoding
import sys
stdin, stdout = sys.stdin, sys.stdout
reload(sys)
sys.stdin, sys.stdout = stdin, stdout
sys.setdefaultencoding('UTF-8')
# SQLite3 database
import sqlite3
# Pandas: Data structures and data analysis tools
import pandas as pd
# Read database, attach as Pandas dataframe
db = sqlite3.connect("Applications.db")
df = pd.read_sql_query("SELECT path, language, date, shortest_sentence, longest_sentence, number_words, readability_consensus FROM applications ORDER BY date(date) DESC", db)
db.close()
df.columns = ['Path', 'Language', 'Date', 'Shortest Sentence', 'Longest Sentence', 'Words', 'Readability Consensus']
# Parse Dataframe and apply Markdown, then save as 'table.md'
cols = df.columns
df2 = pd.DataFrame([['---','---','---','---','---','---','---']], columns=cols)
df3 = pd.concat([df2, df])
df3.to_csv("table.md", sep="|", index=False)
An important precursor to this is that the shortest_sentence
and longest_sentence
columns do not contain unnecessary line breaks, as removed by applying .replace('\n', ' ').replace('\r', '')
to them before submitting into the SQLite database. It appears that the solution is not to enforce the language-specific encoding (ISO-8859-1
for Norwegian), but rather that UTF-8
is used instead of the default ASCII
.
I ran this through my IPython notebook (Python 2.7.10) and got a table like the following (fixed spacing for appearance here):
| Path | Language | Date | Shortest Sentence | Longest Sentence | Words | Readability Consensus |
|-------------------------|----------|------------|----------------------------------------------------------------------------------------------|----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------|-------|-----------------------|
| data/Eng/Something1.txt | Eng | 2015-09-17 | I am able to relocate to London on short notice. | With my administrative experience in the preparation of the structure and content of seminars in various courses, and critiquing academic papers on various levels, I am confident that I can execute the work required as an editorial assistant. | 306 | 11th and 12th grade |
| data/Nor/NoeNorrønt.txt | Nor | 2015-09-17 | Jeg har grundig kjennskap til Microsoft Office og Adobe. | I løpet av studiene har jeg vært salgsmedarbeider for et større konsern, hvor jeg solgte forsikring til studentene og de faglige ansatte ved universitetet i Trønderlag, samt renholdsarbeider i et annet, hvor jeg i en periode var avdelingsansvarlig. | 205 | 18th and 19th grade |
| data/Nor/Ørret.txt.txt | Nor | 2015-09-17 | Jeg håper på positiv tilbakemelding, og møter naturligvis til intervju hvis det er ønskelig. | I løpet av studiene har jeg vært salgsmedarbeider for et større konsern, hvor jeg solgte forsikring til studentene og de faglige ansatte ved universitetet i Trønderlag, samt renholdsarbeider i et annet, hvor jeg i en periode var avdelingsansvarlig. | 160 | 18th and 19th grade |
Thus, a Markdown table without problems with encoding.
Solution 2
Improving the answer further, for use in IPython Notebook:
def pandas_df_to_markdown_table(df):
from IPython.display import Markdown, display
fmt = ['---' for i in range(len(df.columns))]
df_fmt = pd.DataFrame([fmt], columns=df.columns)
df_formatted = pd.concat([df_fmt, df])
display(Markdown(df_formatted.to_csv(sep="|", index=False)))
pandas_df_to_markdown_table(infodf)
Or use tabulate:
pip install tabulate
Examples of use are in the documentation.
Update
As of pandas 1.0 DataFrame to markdown is available. Please see answer from @timvink (docs)
Solution 3
I recommend python-tabulate library for generating ascii-tables. The library supports pandas.DataFrame
as well.
Here is how to use it:
from pandas import DataFrame
from tabulate import tabulate
df = DataFrame({
"weekday": ["monday", "thursday", "wednesday"],
"temperature": [20, 30, 25],
"precipitation": [100, 200, 150],
}).set_index("weekday")
print(tabulate(df, tablefmt="pipe", headers="keys"))
Output:
| weekday | temperature | precipitation |
|:----------|--------------:|----------------:|
| monday | 20 | 100 |
| thursday | 30 | 200 |
| wednesday | 25 | 150 |
Solution 4
Pandas 1.0 was released 29 january 2020 and supports markdown conversion, so you can now do this directly!
Example taken from the docs:
df = pd.DataFrame({"A": [1, 2, 3], "B": [1, 2, 3]}, index=['a', 'a', 'b'])
print(df.to_markdown())
| | A | B |
|:---|----:|----:|
| a | 1 | 1 |
| a | 2 | 2 |
| b | 3 | 3 |
Or without the index:
print(df.to_markdown(index=False)) # use 'showindex' for pandas < 1.1
| A | B |
|----:|----:|
| 1 | 1 |
| 2 | 2 |
| 3 | 3 |
Solution 5
Try this out. I got it to work.
See the screenshot of my markdown file converted to HTML at the end of this answer.
import pandas as pd
# You don't need these two lines
# as you already have your DataFrame in memory
df = pd.read_csv("nor.txt", sep="|")
df.drop(df.columns[-1], axis=1)
# Get column names
cols = df.columns
# Create a new DataFrame with just the markdown
# strings
df2 = pd.DataFrame([['---',]*len(cols)], columns=cols)
#Create a new concatenated DataFrame
df3 = pd.concat([df2, df])
#Save as markdown
df3.to_csv("nor.md", sep="|", index=False)
![OleVik](https://i.stack.imgur.com/cGIgR.jpg?s=256&g=1)
OleVik
Updated on August 15, 2020Comments
-
OleVik almost 4 years
I have a Pandas Dataframe generated from a database, which has data with mixed encodings. For example:
+----+-------------------------+----------+------------+------------------------------------------------+--------------------------------------------------------+--------------+-----------------------+ | ID | path | language | date | longest_sentence | shortest_sentence | number_words | readability_consensus | +----+-------------------------+----------+------------+------------------------------------------------+--------------------------------------------------------+--------------+-----------------------+ | 0 | data/Eng/Sagitarius.txt | Eng | 2015-09-17 | With administrative experience in the prepa... | I am able to relocate internationally on short not... | 306 | 11th and 12th grade | +----+-------------------------+----------+------------+------------------------------------------------+--------------------------------------------------------+--------------+-----------------------+ | 31 | data/Nor/Høylandet.txt | Nor | 2015-07-22 | Høgskolen i Østfold er et eksempel... | Som skuespiller har jeg både... | 253 | 15th and 16th grade | +----+-------------------------+----------+------------+------------------------------------------------+--------------------------------------------------------+--------------+-----------------------+
As seen there is a mix of English and Norwegian (encoded as ISO-8859-1 in the database I think). I need to get the contents of this Dataframe output as a Markdown table, but without getting problems with encoding. I followed this answer (from the question Generate Markdown tables?) and got the following:
import sys, sqlite3 db = sqlite3.connect("Applications.db") df = pd.read_sql_query("SELECT path, language, date, longest_sentence, shortest_sentence, number_words, readability_consensus FROM applications ORDER BY date(date) DESC", db) db.close() rows = [] for index, row in df.iterrows(): items = (row['date'], row['path'], row['language'], row['shortest_sentence'], row['longest_sentence'], row['number_words'], row['readability_consensus']) rows.append(items) headings = ['Date', 'Path', 'Language', 'Shortest Sentence', 'Longest Sentence since', 'Words', 'Grade level'] fields = [0, 1, 2, 3, 4, 5, 6] align = [('^', '<'), ('^', '^'), ('^', '<'), ('^', '^'), ('^', '>'), ('^','^'), ('^','^')] table(sys.stdout, rows, fields, headings, align)
However, this yields an
UnicodeEncodeError: 'ascii' codec can't encode character u'\xe5' in position 72: ordinal not in range(128)
error. How can I output the Dataframe as a Markdown table? That is, for the purpose of storing this code in a file for use in writing a Markdown document. I need the output to look like this:| ID | path | language | date | longest_sentence | shortest_sentence | number_words | readability_consensus | |----|-------------------------|----------|------------|------------------------------------------------|--------------------------------------------------------|--------------|-----------------------| | 0 | data/Eng/Sagitarius.txt | Eng | 2015-09-17 | With administrative experience in the prepa... | I am able to relocate internationally on short not... | 306 | 11th and 12th grade | | 31 | data/Nor/Høylandet.txt | Nor | 2015-07-22 | Høgskolen i Østfold er et eksempel... | Som skuespiller har jeg både... | 253 | 15th and 16th grade |
-
OleVik over 8 yearsRelative to my dataframe there were a couple extra separators, but running this:
cols = df.columns df2 = pd.DataFrame([['---','---','---','---','---','---','---']], columns=cols) df3 = pd.concat([df2, df]) df3.to_csv("nor.md", sep="|", index=False)
Still gives a encoding error:UnicodeEncodeError: 'ascii' codec can't encode character u'\xe5' in position 72: ordinal not in range(128)
. Also, out of curiosity, would it be problematic thatshortest_sentence
andlongest_sentence
may contain line breaks? -
Rohit over 8 yearsI do not think the line breaks should matter as long as they show up fine in pandas DataFrame. Pandas should be able to take in unicode characters just fine so I am not sure why this does not work. Check out this question: (stackoverflow.com/questions/7315629/…). I do not know how close Swedish is to Norwegian but you may find something helpful.
-
Rohit over 8 yearsIf you find an answer to your problem, do answer your own question for posterity.
-
OleVik over 8 yearsThrough some further testing I was able to, and using the elegant solution you proposed. Thanks for your help!
-
Alastair McCormack over 8 yearsCalling
sys.setdefaultencoding('UTF-8')
is an ultra bad idea. It masks all kind of issues and it looks like your new code doesn't need it as you're not callingtable()
, which is full of implied encodings. -
Mithril about 7 yearsExcellent, elegantly render in both editor and preview !
-
Tomasz Sętkowski over 6 yearsYou can do this with
tablefmt="pipe"
. That PR was declined and there is notablefmt="markdown"
. -
Nolan Conaway over 6 yearsOpening another process seems vulnerable to unforeseen problems but for the simple case this is great.
-
Josiah Yoder almost 6 yearsTo use tabulate, I used
print(tabulate.tabulate(df.values,df.columns, tablefmt="pipe"))
-
scottlittle about 5 yearsI changed last line of function from
to_csv
toto_clipboard
so that I could paste the results easily. -
abeboparebop over 4 yearsI needed to modify this function to add
|
to the beginning and end of each line in the DataFrame to get proper markdown output. -
CGFoX over 4 yearsAny way to remove the index when converting to markdown? Passing
index=False
didn't work. -
timvink over 4 yearsYes:
df.to_markdown(showindex=False)
. Pandas uses tabulate, so you can pass tabulate parameters via to_markdown(). -
Carson almost 4 yearsFutureWarning:
showindex
is deprecated. Only 'index' will be used in a future version. -
timvink almost 4 yearsThanks, indeed futurewarning shows up with
pandas>=1.1
. Updated answer. -
yardstick17 over 3 yearsUseful when usecase is to copy-paste a dataframe in markdown format.
-
Chingiz K. over 2 years
df.to_markdown()
is available now on Pandas which is usingtabulate
.