Python, PyDot and DecisionTree
Solution 1
I had exactly the same problem and spent a couple of hours trying to figure it out. I can't guarantee that what I share here will work for others, but it may be worth a shot.
- I tried installing the official pydot packages, but I have Python 3 and they simply did not work. After finding a note in a thread on one of the many websites I scoured, I ended up installing this forked repository of pydot.
- I went to graphviz.org and installed their software on my Windows 7 machine. If you don't have Windows, look under their Download section for your system.
- After a successful install, in Environment Variables (Control Panel\All Control Panel Items\System\Advanced system settings > click the Environment Variables button > under System variables I found the variable path > click Edit...) I added ;C:\Program Files (x86)\Graphviz2.38\bin to the end of the Variable value: field.
- To confirm I can now use dot commands in the Command Line (Windows Command Processor), I typed dot -V, which returned dot - graphviz version 2.38.0 (20140413.2041).
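The PATH check in the last step can also be done from Python. The following is a minimal sketch using only the standard library; the helper names find_graphviz_dot and dot_version are hypothetical and are not part of pydot or graphviz.

```python
import shutil
import subprocess


def find_graphviz_dot(executable="dot"):
    """Return the full path of the Graphviz executable, or None if it is not on PATH."""
    return shutil.which(executable)


def dot_version(dot_path):
    """Run `dot -V` and return the version banner as text."""
    # Graphviz prints its version banner to stderr, so capture both streams.
    result = subprocess.run([dot_path, "-V"], capture_output=True, text=True)
    return (result.stderr or result.stdout).strip()


if __name__ == "__main__":
    path = find_graphviz_dot()
    if path is None:
        print("dot was not found on PATH; re-check the environment variable")
    else:
        print(path)
        print(dot_version(path))
```

If the lookup returns None in a session that was already open when you edited the variable, try a fresh command prompt or a restarted notebook kernel, since running processes keep the old PATH.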
In the code below, keep in mind that I'm reading a dataframe from my clipboard. You might be reading it from a file or some other source.
In IPython Notebook:
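import pandas as pd
import numpy as np
from sklearn import tree
import pydot
from IPython.display import Image
from io import StringIO  # sklearn.externals.six is gone from recent scikit-learn releases

# Read the data; the last column is the target, the rest are features.
df = pd.read_clipboard()
X = df[df.columns[:-1]]
y = df[df.columns[-1]]

# Fit a shallow regression tree.
dtr = tree.DecisionTreeRegressor(max_depth=3)
dtr.fit(X, y)

# Export the tree as DOT source into an in-memory buffer, then render it.
dot_data = StringIO()
tree.export_graphviz(dtr, out_file=dot_data, feature_names=X.columns)
# Newer pydot releases return a list of graphs here; if so, use the first element.
graph = pydot.graph_from_dot_data(dot_data.getvalue())
Image(graph.create_png())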
Alternatively, if you're not using IPython, you can generate your own image from the command line as long as you have graphviz installed (step 2 above). Using the same example code, add this line after fitting the model:
tree.export_graphviz(dtr, out_file='treepic.dot', feature_names=X.columns)
Then open a command prompt in the folder containing the treepic.dot file and enter this command:
dot -T png treepic.dot -o treepic.png
A .png file should be created with your decision tree.
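If you would rather stay in Python than switch to the command prompt, the same dot invocation can be launched with subprocess. This is only a sketch, assuming dot is on your PATH and treepic.dot already exists; build_dot_command and render_png are hypothetical helper names.

```python
import subprocess


def build_dot_command(dot_file, png_file):
    """Build the same command line as above: dot -T png <dot_file> -o <png_file>."""
    return ["dot", "-T", "png", dot_file, "-o", png_file]


def render_png(dot_file, png_file):
    """Run Graphviz on dot_file.

    Raises CalledProcessError if dot exits with an error, and
    FileNotFoundError if the dot executable is not on PATH.
    """
    subprocess.run(build_dot_command(dot_file, png_file), check=True)


if __name__ == "__main__":
    print(" ".join(build_dot_command("treepic.dot", "treepic.png")))
    # render_png("treepic.dot", "treepic.png")  # uncomment once treepic.dot exists
```

Passing the command as a list avoids shell quoting problems when the file path contains spaces.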
Solution 2
If you are using Python 3, just use pydotplus instead of pydot. It also installs smoothly with pip.
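from io import StringIO
from sklearn import tree
import pydotplus
<your code>
# Export the fitted classifier clf as DOT source, then write it out as a PDF.
dot_data = StringIO()
tree.export_graphviz(clf, out_file=dot_data)
graph = pydotplus.graph_from_dot_data(dot_data.getvalue())
graph.write_pdf("iris.pdf")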
Polly
Updated on April 07, 2020
Comments
-
Polly about 4 years
I'm trying to visualize my DecisionTree, but I'm getting an error. The code is:
X = [i[1:] for i in dataset]  # attribute
y = [i[0] for i in dataset]
clf = tree.DecisionTreeClassifier()
dot_data = StringIO()
tree.export_graphviz(clf.fit(train_X, train_y), out_file=dot_data)
graph = pydot.graph_from_dot_data(dot_data.getvalue())
graph.write_pdf("tree.pdf")
And the error is:
Traceback (most recent call last):
    if data.startswith(codecs.BOM_UTF8):
TypeError: startswith first arg must be str or a tuple of str, not bytes
Can anyone explain to me what the problem is? Thanks a lot!
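The TypeError above is what Python 3 raises whenever str.startswith is given a bytes argument, and codecs.BOM_UTF8 is a bytes constant. A minimal reproduction using only the standard library follows; the variable data is a stand-in for the DOT source and this is not pydot's actual code.

```python
import codecs

data = "digraph Tree { }"  # a str, like the value returned by StringIO.getvalue()

try:
    data.startswith(codecs.BOM_UTF8)  # str method called with a bytes argument
except TypeError as exc:
    message = str(exc)
    print(message)

# Comparing like with like works: bytes against bytes, or str against str.
has_bom_bytes = data.encode("utf-8").startswith(codecs.BOM_UTF8)
has_bom_text = data.startswith(codecs.BOM_UTF8.decode("utf-8"))
print(has_bom_bytes, has_bom_text)
```

This suggests the failing pydot build was written with Python 2 in mind, where str and bytes were the same type. That would be consistent with the answers recommending a Python 3 compatible fork or pydotplus.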
-
Rick almost 9 years
One should note that those two lines aren't quite equivalent; if data needs to start with it, then the second one might not work.
-
Sina Khelil almost 9 years
He is looking for a unicode in a string method. Not likely to work. Although they may not be equivalent, the BOM is usually at the beginning of a file and not used anywhere else (unless you really mussed up your file). See en.wikipedia.org/wiki/Byte_order_mark
-
Polly almost 9 years
I guess the problem is in my data file. Does anybody know what it should look like? I have a csv file where the first line contains the names of the attributes in each column, and the following lines contain numeric data. So my X and Y are the numeric data from the file; I got them by using "skiprows=1" when opening the file.
-
Sina Khelil almost 9 years
@Polly without seeing the file, we are all going to be guessing. You need to provide more details if you want more constructive answers. My answer above will likely deal with your original issue but within a certain context.
-
user-asterix over 6 years
This is the best advice, thank you. +1. I used it with
Image(graph.create_png())
in Jupyter instead of writing it to a pdf, and it worked like a charm.
-
mgcdanny over 6 years
You can also do
dot_data = tree.export_graphviz(clf, out_file=None)