How can I get the number of nodes of a Neo4j graph database from Python?


Solution 1

Update:

Since I first wrote this, the answer has changed. The database now keeps exact counts of total nodes, as well as counts by label. Unlike in most databases, these counts are not heuristics: the counters are kept transactionally in sync with the rest of the datastore.

This means you can get exact node counts in O(1) time from Neo4j. You get access to them by asking Cypher:

MATCH (n) RETURN count(*)
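To run that query from Python, something like the following should work. This is only a sketch: it assumes the official neo4j Python driver (the `neo4j` package) is installed, and the URI, user name and password are placeholders, not values from this answer. The helper names `count_nodes` and `count_nodes_at` are made up for illustration.

```python
# Cypher from the answer above, with an alias so the value can be read by name.
COUNT_QUERY = "MATCH (n) RETURN count(*) AS node_count"


def count_nodes(session):
    """Return the total node count via any driver-style session object."""
    record = session.run(COUNT_QUERY).single()
    return record["node_count"]


def count_nodes_at(uri, user, password):
    """Open a connection, count the nodes, and close the connection again.

    Assumption: the official `neo4j` driver package is installed.
    """
    from neo4j import GraphDatabase  # imported lazily; only needed here

    driver = GraphDatabase.driver(uri, auth=(user, password))
    try:
        with driver.session() as session:
            return count_nodes(session)
    finally:
        driver.close()
```

For a local server, the call might look like `count_nodes_at("bolt://localhost:7687", "neo4j", "password")`, with all three arguments replaced by your own connection details.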

Original reply:

There are two ways to get the number of nodes in a neo4j database. The first is to actually iterate through all the nodes and count them.

The second is to use the "number of node ids in use" statistic provided by the db kernel. It is not guaranteed to be exact, but it will be at least the number of nodes in use. In a high-load db it will be higher, since it also includes ids of deleted nodes that have not been reclaimed yet.

Alt one is reasonably exact (depending on how many nodes are created or deleted while you iterate), but can be super slow, since it visits every node. Alt two is potentially way off, but is an O(1) operation.

You currently don't have much choice, because alt one is the only one that works. It isn't officially supported, so doing it today looks a bit dirty:

from neo4j import GraphDatabase
db = GraphDatabase('..')
# Alt one: walk every node and count it (exact, but O(n))
node_count = sum(1 for _ in db.getAllNodes().iterator())

I've added two issues for this: one to add support for accessing management info (e.g. to support the alt two method), and one to add support for these use cases:

node_count = sum(1 for _ in db.nodes)
node_count = len(db.nodes)

Follow these issues here:

https://github.com/neo4j/python-embedded/issues/7

https://github.com/neo4j/python-embedded/issues/6
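Until something like `len(db.nodes)` exists, a small wrapper can give the same feel. This is a minimal sketch, not part of neo4j-embedded: `NodeView` is a hypothetical name, and it relies only on the `db.getAllNodes().iterator()` call used earlier in this reply, so it is still the slow, iterate-everything approach.

```python
class NodeView:
    """Hypothetical helper that makes len() work on a database's nodes."""

    def __init__(self, db):
        # db is assumed to expose getAllNodes().iterator(), as in the
        # snippet earlier in this reply.
        self._db = db

    def __len__(self):
        # Alt one: visit every node once and count it, so this is O(n).
        return sum(1 for _ in self._db.getAllNodes().iterator())
```

With a database object `db` opened as above, `len(NodeView(db))` would then return the node count.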

Please let us know if you run into any other trouble with neo4j-embedded, and add a ticket to the GitHub issues if you discover any bugs or think of any other enhancements!

Solution 2

Alternatively, you can count the total number of nodes by executing a Cypher query in the default neo4j browser interface at http://localhost:7474/browser/ (it should also be possible to execute this query from Python somehow). The precise command follows:

MATCH (n) RETURN count(*) + " nodes" AS total;

Hope this helps.

Solution 3

If you're willing to make a REST API query, this answer will get you the rough "number of node ids in use" value.

Author: Marc Pou


Updated on June 20, 2022

Comments

  • Marc Pou, about 2 years ago

    I'm trying to get the number of nodes of a Neo4j graph database using Python, but I can't find any method or property to do that.

    Does anybody know how I can get this information?

    Other Python packages, like NetworkX, have a method to get this information.

    >>> import networkx as nx
    >>> G = nx.Graph()   # or DiGraph, MultiGraph, MultiDiGraph, etc
    >>> nx.add_path(G, [0, 1, 2])   # G.add_path([0,1,2]) in older NetworkX releases
    >>> len(G)
    3