Neo4j: Generic/Vague relationship names

Mark NeedhamOctober 2nd, 2014Last Updated: October 2nd, 2014

0 87 4 minutes read

An approach to modelling that I often see while working with Neo4j users is creating very generic relationships (e.g. HAS, CONTAINS, IS) and filtering on a relationship property or on a property/label at the end node.

Intuitively this doesn’t seem to make best use of the graph model as it means that you have to evaluate many relationships and nodes that you’re not interested in.

However, I’ve never actually tested the performance differences between the approaches so I thought I’d try it out.

I created 4 different databases which had one node with 60,000 outgoing relationships – 10,000 which we wanted to retrieve and 50,000 that were irrelevant.

I modelled the ‘relationship’ in 4 different ways…

Using a specific relationship type
(node)-[:HAS_ADDRESS]->(address)
Using a generic relationship type and then filtering by end node label
(node)-[:HAS]->(address:Address)
Using a generic relationship type and then filtering by relationship property
(node)-[:HAS {type: “address”}]->(address)
Using a generic relationship type and then filtering by end node property
(node)-[:HAS]->(address {type: “address”})

…and then measured how long it took to retrieve the ‘has address’ relationships.

The code is on github if you want to take a look.

Although it’s obviously not as precise as a JMH micro benchmark I think it’s good enough to get a feel for the difference between the approaches.

I ran a query against each database 100 times and then took the 50th, 75th and 99th percentiles (times are in ms):

Using a generic relationship type and then filtering by end node label
50%ile: 6.0    75%ile: 6.0    99%ile: 402.60999999999825
 
Using a generic relationship type and then filtering by relationship property
50%ile: 21.0   75%ile: 22.0   99%ile: 504.85999999999785
 
Using a generic relationship type and then filtering by end node label
50%ile: 4.0    75%ile: 4.0    99%ile: 145.65999999999931
 
Using a specific relationship type
50%ile: 0.0    75%ile: 1.0    99%ile: 25.749999999999872

We can drill further into why there’s a difference in the times for each of the approaches by profiling the equivalent cypher query. We’ll start with the one which uses a specific relationship name:

Using a specific relationship type

neo4j-sh (?)$ profile match (n) where id(n) = 0 match (n)-[:HAS_ADDRESS]->() return count(n);
+----------+
| count(n) |
+----------+
| 10000    |
+----------+
1 row
 
ColumnFilter
  |
  +EagerAggregation
    |
    +SimplePatternMatcher
      |
      +NodeByIdOrEmpty
 
+----------------------+-------+--------+-----------------------------+-----------------------+
|             Operator |  Rows | DbHits |                 Identifiers |                 Other |
+----------------------+-------+--------+-----------------------------+-----------------------+
|         ColumnFilter |     1 |      0 |                             | keep columns count(n) |
|     EagerAggregation |     1 |      0 |                             |                       |
| SimplePatternMatcher | 10000 |  10000 | n,   UNNAMED53,   UNNAMED35 |                       |
|      NodeByIdOrEmpty |     1 |      1 |                        n, n |          {  AUTOINT0} |
+----------------------+-------+--------+-----------------------------+-----------------------+
 
Total database accesses: 10001

Here we can see that there were 10,002 database accesses in order to get a count of our 10,000 HAS_ADDRESS relationships. We get a database access each time we load a node, relationship or property.

By contrast the other approaches have to load in a lot more data only to then filter it out:

Using a generic relationship type and then filtering by end node label

neo4j-sh (?)$ profile match (n) where id(n) = 0 match (n)-[:HAS]->(:Address) return count(n);
+----------+
| count(n) |
+----------+
| 10000    |
+----------+
1 row
 
ColumnFilter
  |
  +EagerAggregation
    |
    +Filter
      |
      +SimplePatternMatcher
        |
        +NodeByIdOrEmpty
 
+----------------------+-------+--------+-----------------------------+----------------------------------+
|             Operator |  Rows | DbHits |                 Identifiers |                            Other |
+----------------------+-------+--------+-----------------------------+----------------------------------+
|         ColumnFilter |     1 |      0 |                             |            keep columns count(n) |
|     EagerAggregation |     1 |      0 |                             |                                  |
|               Filter | 10000 |  10000 |                             | hasLabel(  UNNAMED45:Address(0)) |
| SimplePatternMatcher | 10000 |  60000 | n,   UNNAMED45,   UNNAMED35 |                                  |
|      NodeByIdOrEmpty |     1 |      1 |                        n, n |                     {  AUTOINT0} |
+----------------------+-------+--------+-----------------------------+----------------------------------+
 
Total database accesses: 70001

Using a generic relationship type and then filtering by relationship property

neo4j-sh (?)$ profile match (n) where id(n) = 0 match (n)-[:HAS {type: "address"}]->() return count(n);
+----------+
| count(n) |
+----------+
| 10000    |
+----------+
1 row
 
ColumnFilter
  |
  +EagerAggregation
    |
    +Filter
      |
      +SimplePatternMatcher
        |
        +NodeByIdOrEmpty
 
+----------------------+-------+--------+-----------------------------+--------------------------------------------------+
|             Operator |  Rows | DbHits |                 Identifiers |                                            Other |
+----------------------+-------+--------+-----------------------------+--------------------------------------------------+
|         ColumnFilter |     1 |      0 |                             |                            keep columns count(n) |
|     EagerAggregation |     1 |      0 |                             |                                                  |
|               Filter | 10000 |  20000 |                             | Property(  UNNAMED35,type(0)) == {  AUTOSTRING1} |
| SimplePatternMatcher | 10000 | 120000 | n,   UNNAMED63,   UNNAMED35 |                                                  |
|      NodeByIdOrEmpty |     1 |      1 |                        n, n |                                     {  AUTOINT0} |
+----------------------+-------+--------+-----------------------------+--------------------------------------------------+
 
Total database accesses: 140001

Using a generic relationship type and then filtering by end node property

neo4j-sh (?)$ profile match (n) where id(n) = 0 match (n)-[:HAS]->({type: "address"}) return count(n);
+----------+
| count(n) |
+----------+
| 10000    |
+----------+
1 row
 
ColumnFilter
  |
  +EagerAggregation
    |
    +Filter
      |
      +SimplePatternMatcher
        |
        +NodeByIdOrEmpty
 
+----------------------+-------+--------+-----------------------------+--------------------------------------------------+
|             Operator |  Rows | DbHits |                 Identifiers |                                            Other |
+----------------------+-------+--------+-----------------------------+--------------------------------------------------+
|         ColumnFilter |     1 |      0 |                             |                            keep columns count(n) |
|     EagerAggregation |     1 |      0 |                             |                                                  |
|               Filter | 10000 |  20000 |                             | Property(  UNNAMED45,type(0)) == {  AUTOSTRING1} |
| SimplePatternMatcher | 10000 | 120000 | n,   UNNAMED45,   UNNAMED35 |                                                  |
|      NodeByIdOrEmpty |     1 |      1 |                        n, n |                                     {  AUTOINT0} |
+----------------------+-------+--------+-----------------------------+--------------------------------------------------+
 
Total database accesses: 140001

So in summary…specific relationships #ftw!

Reference:

Neo4j: Generic/Vague relationship names from our JCG partner Mark Needham at the Mark Needham Blog blog.