Fastest way to perform bulk add/insert in Neo4j with Python?
Solution 1
There are several ways to do a bulk create with py2neo, each making only a single call to the server.
- Use the create method to build a number of nodes and relationships in a single batch.
- Use a Cypher CREATE statement.
- Use the new WriteBatch class (just released this week) to manually make a batch of nodes and relationships (this is really just a manual version of the first option).
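A minimal sketch of the second option (a single Cypher CREATE statement), assuming Neo4j 3.x or later, where Cypher supports UNWIND and `$param` parameters. This does not use py2neo's create or WriteBatch API. The label `Node`, the relationship type `LINKS_TO`, and the helper `build_pair_batch` are hypothetical. The function only builds the statement and its parameters, so hundreds of node pairs can go to the server in one request instead of one request each. Sending it (for example with the official driver's `session.run(query, params)`) is left out.

```python
from typing import Any, Dict, List, Tuple

# Hypothetical label and relationship type; adjust to your own schema.
PAIR_QUERY = (
    "UNWIND $pairs AS pair "
    "CREATE (a:Node {name: pair.source}) "
    "CREATE (b:Node {name: pair.target}) "
    "CREATE (a)-[:LINKS_TO]->(b)"
)


def build_pair_batch(pairs: List[Tuple[str, str]]) -> Tuple[str, Dict[str, Any]]:
    """Return one Cypher statement plus parameters that creates every pair.

    The data travels as a parameter instead of being formatted into the
    query text, so the server can reuse one cached query plan.
    """
    rows = [{"source": source, "target": target} for source, target in pairs]
    return PAIR_QUERY, {"pairs": rows}


if __name__ == "__main__":
    query, params = build_pair_batch([("alice", "bob"), ("carol", "dave")])
    print(query)
    print(len(params["pairs"]))  # both pairs go out in a single request
```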
If you have some code, I'm happy to look at it and make suggestions on performance tweaks. There are also quite a few tests you may be able to get inspiration from.
Cheers, Nige
Solution 2
Neo4j's write performance is slow unless you are doing a batch insert.
The Neo4j batch importer (https://github.com/jexp/batch-import) is the fastest way to load data into Neo4j. It's a Java utility, but you don't need to know any Java because you're just running the executable. It handles typed data and indexes, and it imports from a CSV file.
To use it with Bulbs (http://bulbflow.com/) Models, use the model's get_bundle() method to get the data, index name, and index keys, already prepared for insert, and then output the data to a CSV file. Or, if you don't want to model your data, just output your data from Python to the CSV file.
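The "just output your data from Python" route can be sketched with the standard csv module. The tab delimiter and the example columns are assumptions, not taken from the batch-import documentation; check its README for the exact header conventions it expects. `write_import_file` is a hypothetical helper.

```python
import csv
from typing import Iterable, Mapping, Sequence


def write_import_file(
    path: str,
    columns: Sequence[str],
    rows: Iterable[Mapping[str, object]],
    delimiter: str = "\t",  # assumption: tab-separated; check the importer's docs
) -> int:
    """Write rows under a header line to a delimited file; return the row count."""
    count = 0
    with open(path, "w", newline="", encoding="utf-8") as handle:
        writer = csv.DictWriter(handle, fieldnames=list(columns), delimiter=delimiter)
        writer.writeheader()
        for row in rows:
            writer.writerow(row)  # raises ValueError on keys not in columns
            count += 1
    return count
```

For example, `write_import_file("nodes.csv", ["name", "age"], [{"name": "Amy", "age": 30}])` writes a header line and one data line.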
Will that work for you?
Solution 3
There are so many old answers to this question online that it took me forever to realize there's an import tool that comes with Neo4j. It's very fast and the best tool I was able to find.
Here's a simple example if we want to import student nodes:
bin/neo4j-import --into [path-to-your-neo4j-directory]/data/graph.db --nodes students
The students file contains data that looks like this, for example:
studentID:Id(Student),name,year:int,:LABEL
1111,Amy,2000,Student
2222,Jane,2012,Student
3333,John,2013,Student
Explanation:
- The header explains how the data below it should be interpreted.
- studentID is the unique ID of each node within the Student ID space, given by the type Id(Student).
- name is of type string which is the default.
- year is an integer
- :LABEL is the label you want for these nodes, in this case it is "Student"
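If the student data already lives in Python, the students file can be generated with the standard csv module. The header row below is copied from the example above; `write_students` and the sample records are hypothetical.

```python
import csv
from typing import Iterable, Tuple

# Header copied from the example: ID column, string name, int year, label column.
HEADER = ["studentID:Id(Student)", "name", "year:int", ":LABEL"]


def write_students(path: str, students: Iterable[Tuple[int, str, int]]) -> None:
    """Write (id, name, year) tuples in the layout shown above."""
    with open(path, "w", newline="", encoding="utf-8") as handle:
        writer = csv.writer(handle, lineterminator="\n")
        writer.writerow(HEADER)
        for student_id, name, year in students:
            # Every row gets the same label, as in the example file.
            writer.writerow([student_id, name, year, "Student"])


if __name__ == "__main__":
    write_students("students", [(1111, "Amy", 2000), (2222, "Jane", 2012), (3333, "John", 2013)])
```

The resulting file can then be passed to `--nodes students` as in the command above.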
Here's the documentation for it: http://neo4j.com/docs/stable/import-tool-usage.html
Note: I realize the question specifically asks about Python, but another useful answer here also suggests a non-Python solution.
Solution 4
Well, I myself needed massive performance from Neo4j. I ended up doing the following things to improve graph performance.
- Ditched py2neo, since there were a lot of issues with it. Besides, it is very convenient to use the REST endpoint provided by Neo4j; just make sure to use requests sessions.
- Use raw Cypher queries for bulk insert instead of any OGM (Object-Graph Mapper). That is crucial if you need a high-performance system.
- Performance was still not enough for my needs, so I ended up writing a custom system that merges 6-10 queries together using WITH * and UNION clauses. That improved performance by a factor of 3 to 5.
- Use larger transactions, with at least 1000 queries each.
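Three of the points above (raw Cypher over the REST endpoint, a reused requests session, and large transactions) can be sketched as follows. The JSON body, a "statements" list whose entries carry "statement" and "parameters", is the shape Neo4j's transactional Cypher HTTP endpoint accepts. The commit URL depends on the Neo4j version, so `COMMIT_URL` is an assumption. `chunk_statements`, `send_all`, and the batch size of 1000 are illustrative; `session` is expected to be a `requests.Session`, reused so connections stay open between calls.

```python
from typing import Any, Dict, Iterable, Iterator, List, Tuple

# Assumption: Neo4j 3.x path; 4.x and later use /db/<database>/tx/commit.
COMMIT_URL = "http://localhost:7474/db/data/transaction/commit"

Statement = Tuple[str, Dict[str, Any]]


def chunk_statements(
    statements: Iterable[Statement], batch_size: int = 1000
) -> Iterator[Dict[str, List[Dict[str, Any]]]]:
    """Group (cypher, parameters) pairs into request bodies of up to batch_size.

    Each body posted to the commit endpoint runs as one transaction.
    """
    batch: List[Dict[str, Any]] = []
    for cypher, params in statements:
        batch.append({"statement": cypher, "parameters": params})
        if len(batch) == batch_size:
            yield {"statements": batch}
            batch = []
    if batch:
        yield {"statements": batch}


def send_all(session: Any, statements: Iterable[Statement], batch_size: int = 1000) -> int:
    """POST each batch through one shared session; return the number of batches."""
    sent = 0
    for body in chunk_statements(statements, batch_size):
        response = session.post(COMMIT_URL, json=body)
        response.raise_for_status()
        # Cypher errors are reported in the response body, not the HTTP status.
        errors = response.json().get("errors")
        if errors:
            raise RuntimeError(errors)
        sent += 1
    return sent
```

For example, `send_all(requests.Session(), statements)`; if authentication is enabled, set `session.auth = (user, password)` on the session first.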
Comments
- wodow (almost 2 years ago):
I am finding Neo4j slow to add nodes and relationships/arcs/edges when using the REST API via py2neo for Python. I understand that this is due to each REST API call executing as a single self-contained transaction.
Specifically, adding a few hundred pairs of nodes with relationships between them takes a number of seconds, running on localhost.
What is the best approach to significantly improve performance whilst staying with Python?
Would using bulbflow and Gremlin be a way of constructing a bulk insert transaction?
Thanks!
- wodow (over 11 years ago): Good answer with options to try. Thank you for the offer of your time too - I will get in touch if I come unstuck.
- Will (over 11 years ago): I still find it takes hours to create 600k simple relationships between a category node and a data node with get_or_create_relationships(). Any ideas?
- songololo (about 10 years ago): Is the Neo4j batch importer still the best way to go?
- Nigel Small (about 9 years ago): I'd be interested to hear what issues you had with py2neo.
- confused00 (almost 8 years ago): Are these still the fastest ways to write to Neo4j? What about creating elements within a transaction, and committing when everything is done?
- zelusp (over 7 years ago): @NigelSmall, is this preferred over the two-step process where you create a GEOFF file and then batch import using Load2Neo?
- zelusp (over 7 years ago): @NigelSmall, any pointers for this py2neo SO post?