Way to convert dbf to csv in python?

Solution 1

Looking online, there are a few options:

With simpledbf:

from simpledbf import Dbf5

dbf = Dbf5('fake_file_name.dbf')
df = dbf.to_dataframe()
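
Since the goal here is a csv, you can then write the DataFrame out with pandas, or skip the DataFrame entirely with simpledbf's own to_csv (the same method Solution 5 below relies on):

df.to_csv('fake_file_name.csv', index=False)  # via pandas
# or convert directly, skipping the DataFrame:
dbf.to_csv('fake_file_name.csv')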

Tweaked from the gist:

import pandas as pd
import pysal as ps

def dbf2DF(dbfile, upper=True):
    "Read dbf file and return pandas DataFrame"
    # Note: some pysal versions do not support the with statement here
    # (see the comments below); if so, use db = ps.open(dbfile) and dedent.
    with ps.open(dbfile) as db:
        df = pd.DataFrame({col: db.by_col(col) for col in db.header})
        if upper:
            df.columns = [col.upper() for col in db.header]
        return df
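
To complete the conversion to csv, feed the resulting DataFrame to pandas' to_csv; a minimal sketch, assuming a hypothetical file data.dbf:

df = dbf2DF('data.dbf')
df.to_csv('data.csv', index=False)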

Solution 2

Here is the solution I've been using for years: one version for Python 2.7 and one for Python 3.5 (which probably also works on 3.6).

Python 2.7:

import csv
from dbfpy import dbf

def dbf_to_csv(out_table):  # input a dbf, output a csv
    csv_fn = out_table[:-4] + ".csv"  # name the csv after the dbf
    with open(csv_fn, 'wb') as csvfile:  # create a csv file and write contents from the dbf
        in_db = dbf.Dbf(out_table)
        out_csv = csv.writer(csvfile)
        names = []
        for field in in_db.header.fields:  # write headers
            names.append(field.name)
        out_csv.writerow(names)
        for rec in in_db:  # write records
            out_csv.writerow(rec.fieldData)
        in_db.close()
    return csv_fn
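
Called on a single file, for example (the file name here is hypothetical):

csv_path = dbf_to_csv('some_table.dbf')  # writes some_table.csv next to the dbf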

Python 3.5:

import csv
from dbfread import DBF

def dbf_to_csv(dbf_table_pth):  # input a dbf, output a csv with the same name and path, except the extension
    csv_fn = dbf_table_pth[:-4] + ".csv"  # set the csv file name
    table = DBF(dbf_table_pth)  # table variable is a DBF object
    with open(csv_fn, 'w', newline='') as f:  # create a csv file, fill it with dbf content
        writer = csv.writer(f)
        writer.writerow(table.field_names)  # write the column names
        for record in table:  # write the rows
            writer.writerow(list(record.values()))
    return csv_fn  # return the csv name
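
Since the original question involves a whole folder of dbf files, here is a minimal sketch batch-converting them with glob (the folder path is hypothetical):

import glob
import os

for dbf_path in glob.glob(os.path.join('/path/to/folder', '*.dbf')):
    dbf_to_csv(dbf_path)  # uses the Python 3.5 function above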

Both dbfpy and dbfread can be installed with pip.

Solution 3

Using my dbf library you could do something like:

import sys
import dbf
for arg in sys.argv[1:]:
    dbf.export(arg)

which will create a .csv file with the same name as each dbf file. If you put that code into a script named dbf2csv.py, you can then call it as

python dbf2csv.py dbfname dbf2name dbf3name ...

Solution 4

It's possible to read a dbf file line by line, without converting it to csv first, using dbfread (simply install it with pip install dbfread):

>>> from dbfread import DBF
>>> for row in DBF('southamerica_adm0.dbf'):
...     print row
... 
OrderedDict([(u'COUNTRY', u'ARGENTINA')])
OrderedDict([(u'COUNTRY', u'BOLIVIA')])
OrderedDict([(u'COUNTRY', u'BRASIL')])
OrderedDict([(u'COUNTRY', u'CHILE')])
OrderedDict([(u'COUNTRY', u'COLOMBIA')])
OrderedDict([(u'COUNTRY', u'ECUADOR')])
OrderedDict([(u'COUNTRY', u'GUYANA')])
OrderedDict([(u'COUNTRY', u'GUYANE')])
OrderedDict([(u'COUNTRY', u'PARAGUAY')])
OrderedDict([(u'COUNTRY', u'PERU')])
OrderedDict([(u'COUNTRY', u'SURINAME')])
OrderedDict([(u'COUNTRY', u'U.K.')])
OrderedDict([(u'COUNTRY', u'URUGUAY')])
OrderedDict([(u'COUNTRY', u'VENEZUELA')])
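
To turn those rows into a csv, here is a minimal sketch using csv.DictWriter, reusing the example file name above; dbfread records behave like ordered dicts, so they can be written directly:

import csv
from dbfread import DBF

table = DBF('southamerica_adm0.dbf')
with open('southamerica_adm0.csv', 'w', newline='') as f:
    writer = csv.DictWriter(f, fieldnames=table.field_names)
    writer.writeheader()
    for record in table:
        writer.writerow(record)  # each record maps field names to values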

My updated references:

  • pandas project site: http://pandas.pydata.org
  • pandas documentation: http://pandas-docs.github.io/pandas-docs-travis/
  • dbfread: https://pypi.python.org/pypi/dbfread/2.0.6
  • geopandas: http://geopandas.org/
  • reading shp and dbf with geopandas: https://gis.stackexchange.com/questions/129414/only-read-specific-attribute-columns-of-a-shapefile-with-geopandas-fiona

Solution 5

First you should know which version of dbf file you have, so read the first byte of the file:

path = "/path/to/dbf/file.dbf"
with open(path, "rb") as f:
    byte = f.read(1)
    print(f"You have a DBF {byte[0]} file.")

Example:

> You have a DBF 3 file.

If you have a Dbf 5 file, everything will work out of the box; but if, as is usually my case, you have a Dbf 3 file, you have to tweak @andy-hayden's solution (Solution 1) using simpledbf:

Following this issue, you basically create a class Dbf3 that inherits from Dbf5 and adds one new condition to the _get_recs method.

import struct

from simpledbf import Dbf5

class Dbf3(Dbf5):
    def __init__(self, dbf, codec='utf-8'):
        super().__init__(dbf, codec)

    def _get_recs(self, chunk=None):
        # [...copy the code from the original class up until line 664...]
                elif typ == 'M':
                    value = self._na
        # [...copy the code from the original class after line 664...]

Original Dbf code for reference

Then your new class Dbf3 will be able to read and convert Dbf3 files with ease:

dbf = Dbf3(filename, codec="iso-8859-1")  # codec specific to this dataset
dbf.to_csv("converted_dbf.csv")

Comments

  • Stefano Potter
    Stefano Potter almost 2 years

I have a folder with a bunch of dbf files I would like to convert to csv. I have tried using code to simply change the extension from .dbf to .csv, and these files open fine in Excel, but when I open them in pandas they look like this:

                                                    s\t�
    0                                                NaN
    1            1       176 1.58400000000e+005-3.385...
    

    This is not what I want, and those characters don't appear in the real file.
    How should I read in the dbf file correctly?

  • Alessandro Trinca Tornidor
    Alessandro Trinca Tornidor over 8 years
    Yes, I added the documentation link under the word "documentation", now I have reported it explicitly.
  • Andy Hayden
    Andy Hayden over 8 years
    Note that's not actually the official pandas documentation site, I think PANDA is something else entirely (but I'm not clear what)
  • Alessandro Trinca Tornidor
    Alessandro Trinca Tornidor over 8 years
    I notice now that my solution was not optimal. It's better dbfread.
  • Dobedani
    Dobedani almost 6 years
I invoked your function dbf2DF from a script of only a few other lines. The call to open caused the following error: AttributeError: __exit__
  • Andy Hayden
    Andy Hayden almost 6 years
    Strange. __exit__ is needed for the with block, perhaps for some reason they deprecated that? Try db = ps.open(dbfile) and dedent.
  • Dobedani
    Dobedani almost 6 years
    It's true that without the "with" keyword, the code works all right. Thanks!
  • N4v
    N4v about 5 years
    Ethan, is there any documentation for your library?
  • Ethan Furman
    Ethan Furman about 5 years
    @N4v: Not really. Lots of neat stuff here on Stackoverflow, though.