Python, PyDot and DecisionTree


Solution 1

I had the exact same problem and just spent a couple of hours trying to figure it out. I can't guarantee that what I share here will work for others, but it may be worth a shot.

  1. I tried installing the official pydot package, but I have Python 3 and it simply did not work. After finding a note in a thread on one of the many websites I scoured, I ended up installing this forked repository of pydot.
  2. I went to graphviz.org and installed their software on my Windows 7 machine. If you're not on Windows, look in their Download section for your system.
  3. After a successful install, I opened Environment Variables (Control Panel\All Control Panel Items\System\Advanced system settings > Environment Variables button), selected the Path variable under System variables, clicked Edit..., and added ;C:\Program Files (x86)\Graphviz2.38\bin to the end of the Variable value: field.
  4. To confirm that I could now use dot commands in the Command Line (Windows Command Processor), I typed dot -V, which returned dot - graphviz version 2.38.0 (20140413.2041).
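If dot -V works in a new Command Prompt but not from inside Python or Jupyter (for example, because the notebook server was started before Path was edited, and running processes keep their old environment), a Python-side check can help. This is a minimal sketch: it looks up dot with shutil.which, reads the version banner, and can append the Graphviz bin folder to PATH for the current process only. GRAPHVIZ_BIN is an assumption copied from step 3, and add_to_path and graphviz_dot_version are hypothetical helper names.

```python
import os
import shutil
import subprocess
from typing import Optional

# Assumption: the Graphviz 2.38 folder from step 3; adjust it to your own install.
GRAPHVIZ_BIN = r"C:\Program Files (x86)\Graphviz2.38\bin"


def add_to_path(directory: str) -> None:
    """Append a directory to PATH for the current Python process only."""
    parts = [p for p in os.environ.get("PATH", "").split(os.pathsep) if p]
    if directory not in parts:
        os.environ["PATH"] = os.pathsep.join(parts + [directory])


def graphviz_dot_version() -> Optional[str]:
    """Return the `dot -V` banner, or None if `dot` cannot be found on PATH."""
    dot_path = shutil.which("dot")  # same lookup the shell performs for "dot"
    if dot_path is None:
        return None
    # Graphviz writes its version banner to stderr rather than stdout.
    result = subprocess.run([dot_path, "-V"], capture_output=True, text=True)
    return (result.stderr or result.stdout).strip()


if __name__ == "__main__":
    # Only patch PATH on Windows, and only if the assumed folder actually exists.
    if os.name == "nt" and os.path.isdir(GRAPHVIZ_BIN):
        add_to_path(GRAPHVIZ_BIN)
    print(graphviz_dot_version() or "dot not found; check the Graphviz bin folder on PATH")
```

Changing os.environ this way does not touch the system-wide Path setting, so the Control Panel edit in step 3 is still the permanent fix.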

In the code below, keep in mind that I'm reading a DataFrame from my clipboard. You might be reading it from a file or some other source.

In IPython Notebook:

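import pandas as pd
import numpy as np
from io import StringIO  # sklearn.externals.six is no longer shipped with newer scikit-learn
from sklearn import tree
import pydot
from IPython.display import Image

df = pd.read_clipboard()
X = df[df.columns[:-1]]  # every column except the last is a feature
y = df[df.columns[-1]]   # the last column is the target

dtr = tree.DecisionTreeRegressor(max_depth=3)
dtr.fit(X, y)

dot_data = StringIO()
tree.export_graphviz(dtr, out_file=dot_data, feature_names=list(X.columns))
graphs = pydot.graph_from_dot_data(dot_data.getvalue())
# pydot 1.2+ returns a list of graphs; older releases returned a single graph
graph = graphs[0] if isinstance(graphs, list) else graphs
Image(graph.create_png())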

[Image: Decision Tree Visualization]

Alternatively, if you're not using IPython, you can generate the image yourself from the command line as long as you have Graphviz installed (step 2 above). Using the same example code, add this line after fitting the model:

tree.export_graphviz(dtr, out_file='treepic.dot', feature_names=list(X.columns))

Then open a command prompt in the folder that contains treepic.dot and run:

dot -T png treepic.dot -o treepic.png

A .png file should be created with your decision tree.
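If you'd rather stay in Python, the same dot call can be made with subprocess. This is a sketch that assumes dot is on PATH (steps 2 and 3); build_dot_command and render_dot_file are hypothetical helpers, and the command uses the common -Tpng spelling of the output-format flag.

```python
import subprocess
from pathlib import Path
from typing import List


def build_dot_command(dot_file: str, image_file: str, fmt: str = "png") -> List[str]:
    """Argument list for: dot -T<fmt> <dot_file> -o <image_file>."""
    return ["dot", f"-T{fmt}", dot_file, "-o", image_file]


def render_dot_file(dot_file: str, image_file: str, fmt: str = "png") -> Path:
    """Run Graphviz on a .dot file and return the path of the image it wrote."""
    # Raises FileNotFoundError if `dot` is not on PATH,
    # and CalledProcessError if Graphviz exits with an error.
    subprocess.run(build_dot_command(dot_file, image_file, fmt), check=True)
    return Path(image_file)


if __name__ == "__main__":
    # Same file names as the command-line example; treepic.dot comes from export_graphviz.
    if Path("treepic.dot").exists():
        print(render_dot_file("treepic.dot", "treepic.png"))
    else:
        print("treepic.dot not found; run export_graphviz first")
```

Passing the arguments as a list avoids shell quoting problems when the file path contains spaces.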

Solution 2

If you're using Python 3, just use pydotplus instead of pydot. It also installs cleanly with pip (pip install pydotplus).

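from io import StringIO

import pydotplus
from sklearn import tree
from sklearn.datasets import load_iris

# <your code>: here, an example classifier fitted on the iris data; use your own model
iris = load_iris()
clf = tree.DecisionTreeClassifier().fit(iris.data, iris.target)

dot_data = StringIO()
tree.export_graphviz(clf, out_file=dot_data)
graph = pydotplus.graph_from_dot_data(dot_data.getvalue())
graph.write_pdf("iris.pdf")  # needs Graphviz's dot on PATH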
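The traceback in the question shows the failing check, if data.startswith(codecs.BOM_UTF8):. That check only works when data is bytes (as str was in Python 2); in Python 3, codecs.BOM_UTF8 is a bytes object while the DOT text is a str, and str.startswith refuses a bytes argument. The sketch below reproduces the TypeError and shows one type-aware way to strip a byte-order mark. strip_utf8_bom is a hypothetical helper for illustration, not part of pydot or pydotplus.

```python
import codecs
from typing import Union


def strip_utf8_bom(data: Union[str, bytes]) -> Union[str, bytes]:
    """Remove a leading UTF-8 byte-order mark from text or raw bytes."""
    if isinstance(data, bytes):
        # Raw bytes: the BOM is the 3-byte sequence EF BB BF.
        if data.startswith(codecs.BOM_UTF8):
            return data[len(codecs.BOM_UTF8):]
        return data
    # Decoded text: the BOM is the single code point U+FEFF.
    return data[1:] if data.startswith("\ufeff") else data


if __name__ == "__main__":
    dot_text = "digraph Tree { 0 -> 1 }"
    try:
        dot_text.startswith(codecs.BOM_UTF8)  # the check from the traceback
    except TypeError as exc:
        print("TypeError:", exc)
    print(strip_utf8_bom("\ufeff" + dot_text) == dot_text)  # True
```

Checking the type first keeps the comparison str-to-str or bytes-to-bytes, which is what the original check assumed implicitly.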
Author: Polly

Updated on April 07, 2020

Comments

  • Polly (about 4 years)

    I'm trying to visualize my DecisionTree, but I'm getting an error. The code is:

    X = [i[1:] for i in dataset]  # attributes
    y = [i[0] for i in dataset]   # labels
    clf = tree.DecisionTreeClassifier()
    
    dot_data = StringIO()
    tree.export_graphviz(clf.fit(train_X, train_y), out_file=dot_data)
    graph = pydot.graph_from_dot_data(dot_data.getvalue())
    graph.write_pdf("tree.pdf")
    

    And the error is:

    Traceback (most recent call last):
    if data.startswith(codecs.BOM_UTF8):
    TypeError: startswith first arg must be str or a tuple of str, not bytes
    

    Can anyone explain what the problem is? Thanks a lot!

  • Rick (almost 9 years)
    One should note that those two lines aren't quite equivalent: if data needs to start with the BOM, the second one might not work.
  • Sina Khelil (almost 9 years)
    He is passing a byte string (codecs.BOM_UTF8) to a str method, which is not likely to work. Although the two lines may not be equivalent, the BOM is usually at the beginning of a file and not used anywhere else (unless you really mussed up your file); see en.wikipedia.org/wiki/Byte_order_mark
  • Polly (almost 9 years)
    I guess the problem is in my data file. Does anybody know what it should look like? I have a CSV file where the first row contains the attribute name for each column and the remaining rows contain numeric data. So my X and y are the numeric data from the file; I got them by passing "skiprows=1" when opening the file.
  • Sina Khelil (almost 9 years)
    @Polly without seeing the file, we are all going to be guessing. You need to provide more details if you want more constructive answers. My answer above will likely deal with your original issue but within a certain context.
  • user-asterix (over 6 years)
    This is the best advice, thank you (+1). I used it with Image(graph.create_png()) in Jupyter instead of writing it to a PDF, and it worked like a charm.
  • mgcdanny (over 6 years)
    You can also do dot_data = tree.export_graphviz(clf, out_file=None), which returns the DOT source as a string, so no StringIO is needed.
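For the CSV layout Polly describes (a header row of attribute names, then numeric rows), the standard library alone can build the same X and y lists as her list comprehensions. This is a sketch under assumptions: every value is numeric and the label sits in the first column, as in y = [i[0] for i in dataset]. parse_dataset, load_dataset and the file name data.csv are hypothetical. Note that the answers above attribute the TypeError to pydot on Python 3, not to the data file.

```python
import csv
from typing import Iterable, List, Tuple

Dataset = Tuple[List[str], List[List[float]], List[float]]


def parse_dataset(lines: Iterable[str]) -> Dataset:
    """Split CSV text into (header, X, y), with the label in the first column."""
    reader = csv.reader(lines)
    header = next(reader)  # first row holds attribute names, like skiprows=1
    X: List[List[float]] = []
    y: List[float] = []
    for row in reader:
        if not row:
            continue  # ignore blank lines
        values = [float(v) for v in row]
        y.append(values[0])   # same as y = [i[0] for i in dataset]
        X.append(values[1:])  # same as X = [i[1:] for i in dataset]
    return header, X, y


def load_dataset(path: str) -> Dataset:
    # "utf-8-sig" drops a leading byte-order mark if the file has one.
    with open(path, newline="", encoding="utf-8-sig") as f:
        return parse_dataset(f)
```

Usage would look like header, X, y = load_dataset("data.csv"), after which X and y can go straight into clf.fit(X, y).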