How do I run GraphX with Python / pyspark?
Solution 1
It looks like the Python bindings to GraphX have been pushed back repeatedly: first to Spark 1.4, then to 1.5, and now indefinitely. They are waiting behind the Java API.
You can track the status in the ASF JIRA issue SPARK-3789, "[GRAPHX] Python bindings for GraphX".
Solution 2
You should look at GraphFrames (https://github.com/graphframes/graphframes), which wraps GraphX algorithms under the DataFrame API and provides a Python interface.
Here is a quick example from https://graphframes.github.io/graphframes/docs/_site/quick-start.html, slightly modified so that it works.
First, start pyspark with the graphframes package loaded:
pyspark --packages graphframes:graphframes:0.1.0-spark1.6
Python code:
from graphframes import *

# sqlContext is created automatically by the pyspark shell in Spark 1.x
# Create a Vertex DataFrame with unique ID column "id"
v = sqlContext.createDataFrame([
    ("a", "Alice", 34),
    ("b", "Bob", 36),
    ("c", "Charlie", 30),
], ["id", "name", "age"])
# Create an Edge DataFrame with "src" and "dst" columns
e = sqlContext.createDataFrame([
    ("a", "b", "friend"),
    ("b", "c", "follow"),
    ("c", "b", "follow"),
], ["src", "dst", "relationship"])
# Create a GraphFrame
g = GraphFrame(v, e)
# Query: Get in-degree of each vertex.
g.inDegrees.show()
# Query: Count the number of "follow" connections in the graph.
g.edges.filter("relationship = 'follow'").count()
# Run PageRank algorithm, and show results.
results = g.pageRank(resetProbability=0.01, maxIter=20)
results.vertices.select("id", "pagerank").show()
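As a quick cross-check of what the first two queries compute, the same numbers can be derived in plain Python from the edge list, with no Spark involved. This is only an illustration on the toy data above. Because in-degrees are computed from the edge table (grouping on "dst"), vertex "a", which has no incoming edge, is not expected to appear in the output of g.inDegrees at all.

```python
from collections import Counter

# Same edges as the GraphFrames example: (src, dst, relationship)
edges = [
    ("a", "b", "friend"),
    ("b", "c", "follow"),
    ("c", "b", "follow"),
]

# In-degree = number of edges whose dst is the vertex
# (analogous to grouping the edge table by "dst" and counting rows)
in_degrees = Counter(dst for _, dst, _ in edges)

# Number of edges whose relationship is "follow"
follow_count = sum(1 for _, _, rel in edges if rel == "follow")

print(sorted(in_degrees.items()))  # [('b', 2), ('c', 1)]
print(follow_count)                # 2
```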
Solution 3
GraphX 0.9.0 doesn't have a Python API yet. It's expected in upcoming releases.
Glenn Strycker
Ph.D. Physics 2010 Univ of Michigan. Currently works at ValueClick/Dotomi as a Decision Sciences Analyst.
Updated on May 14, 2020
Comments
-
Glenn Strycker almost 4 years
I am attempting to run Spark GraphX with Python using pyspark. My installation appears correct, as I am able to run the pyspark tutorials and the (Java) GraphX tutorials just fine. Presumably, since GraphX is part of Spark, pyspark should be able to interface with it, correct?
Here are the tutorials for pyspark: http://spark.apache.org/docs/0.9.0/quick-start.html http://spark.apache.org/docs/0.9.0/python-programming-guide.html
Here are the ones for GraphX: http://spark.apache.org/docs/0.9.0/graphx-programming-guide.html http://ampcamp.berkeley.edu/big-data-mini-course/graph-analytics-with-graphx.html
Can anyone convert the GraphX tutorial to Python?
-
Matthew Cornell over 9 years
So basically GraphX is a Scala-only system, since it does not have a Java API either?
-
Wildfire over 9 years
AFAIK it's still Scala-only.
-
A T about 9 years
Actually, I think they do have one. See here: github.com/amplab/graphx/tree/master/python/examples
-
Javier de la Rosa about 9 years
The original implementation by AMPLab included a couple of examples, transitive closure and PageRank, but they don't use the actual GraphX API, just the regular PySpark API. GraphX includes a lot of handy functions and classes that are not yet exposed to Python.
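For readers who only need a couple of graph algorithms from Python, the approach described above (plain PySpark instead of the GraphX API) comes down to simple iterative loops. Below is a plain-Python sketch, not the AMPLab code, of the two algorithms named: PageRank with the un-normalized update rank = 0.15 + 0.85 * (sum of incoming contributions), and transitive closure computed by joining known paths with edges until no new pairs appear. The comments map each step to the RDD operation (flatMap, reduceByKey, join) that a PySpark version would typically use. The function names and the adjacency-dict input format are illustrative assumptions, and the sample graph is the three-edge graph from the GraphFrames answer.

```python
from collections import defaultdict


def pagerank(links, num_iters=20, reset_prob=0.15):
    """Un-normalized PageRank over an adjacency dict {vertex: [out-neighbors]}.

    Assumes every vertex has at least one out-link; rank held by a vertex
    with no out-links is simply lost in this sketch.
    """
    # Every vertex seen as a source or a destination starts at rank 1.0
    vertices = set(links) | {n for nbrs in links.values() for n in nbrs}
    ranks = {v: 1.0 for v in vertices}
    for _ in range(num_iters):
        # Like a flatMap: each vertex splits its rank evenly over its out-links
        contribs = defaultdict(float)
        for src, nbrs in links.items():
            for dst in nbrs:
                contribs[dst] += ranks[src] / len(nbrs)
        # Like reduceByKey + mapValues: sum contributions, add the reset term
        ranks = {v: reset_prob + (1 - reset_prob) * contribs[v] for v in vertices}
    return ranks


def transitive_closure(edges):
    """All (x, z) pairs connected by a path of one or more edges."""
    closure = set(edges)
    # Index edges by source so a path (x, y) can be joined with edges (y, z)
    out = defaultdict(set)
    for y, z in edges:
        out[y].add(z)
    while True:
        # Like a join on the middle vertex y: extend every known path by one edge
        new_paths = {(x, z) for x, y in closure for z in out[y]}
        if new_paths <= closure:  # fixed point: no new pairs were discovered
            return closure
        closure |= new_paths


links = {"a": ["b"], "b": ["c"], "c": ["b"]}
ranks = pagerank(links, num_iters=100)
print({v: round(r, 4) for v, r in sorted(ranks.items())})

edges = [("a", "b"), ("b", "c"), ("c", "b")]
print(sorted(transitive_closure(edges)))
# [('a', 'b'), ('a', 'c'), ('b', 'b'), ('b', 'c'), ('c', 'b'), ('c', 'c')]
```

With this formulation and no dangling vertices, the ranks always sum to the number of vertices. GraphFrames' own pageRank output may be scaled differently depending on the version, so treat this only as a way to understand the iteration, not as a reference for its exact numbers.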
-
Javier de la Rosa about 9 years
I just found that it's done now: github.com/kdatta/spark/tree/SPARK-3789/python/pyspark/graphx Maybe it'll be included in the next release.
-
sonus21 over 8 years
Hi Misty, do you have any idea when it will be released? I checked, and it's still not available, even in 1.5.1.
-
graffe about 8 years
This is a terrible shame. It seems igraph-python is also partly dead. Is there any other option for handling large graphs in Python?
-
Ian almost 8 years
You could add more explanation beyond the links.
-
Evan Zamir over 7 years
@JavierdelaRosa That looks like a fork.
-
Javier de la Rosa over 7 years
@EvanZamir I think GraphFrames is now the way to go.