
Tag Archives: Neo4j

Neo4j: Detecting rogue spaces in CSV headers with LOAD CSV

Last week I was helping someone load the data from a CSV file into Neo4j and we were having trouble filtering out rows which contained a null value in one of the columns. This is what the data looked like:

load csv with headers from "file:///foo.csv" as row
RETURN row

╒══════════════════════════════════╕
│row │
╞══════════════════════════════════╡
│{key1: a, key2: (null), key3: c}│ ...


Neo4j: Cypher – Detecting duplicates using relationships

I’ve been building a graph of computer science papers on and off for a couple of months, and now that I’ve got a few thousand loaded in I’ve realised that there are quite a few duplicates. They’re not duplicates in the sense of multiple entries sharing the same identifier; rather, they have different identifiers but seem to be ...


Neo4j vs Relational: Refactoring – Extracting node/table

In my previous blog post I showed how to add a new property/field to a node with a label/record in a table for a football transfers dataset that I’ve been playing with. After introducing this ‘nationality’ property I realised that I now had some duplication in the model: players.nationality and clubs.country are referring ...


Neo4j: A procedure for the SLM clustering algorithm

In the middle of last year I blogged about the Smart Local Moving algorithm, which is used for community detection in networks, and with the upcoming introduction of procedures in Neo4j I thought it’d be fun to make that code accessible as one. If you want to grab the code and follow along, it’s sitting on the SLM repository on ...


Neo4j: Specific relationship vs Generic relationship + property

For optimal traversal speed in Neo4j queries we should make our relationship types as specific as possible. Let’s take a look at an example from the ‘modelling a recommendations engine’ talk I presented at Skillsmatter a couple of weeks ago. I needed to decide how to model the ‘RSVP’ relationship between a Member and an Event. A person can RSVP ...


NoSQL vs. SQL: Choosing a Data Management Solution

Table Of Contents
1. Introduction
2. Distributed systems: the CAP theorem
3. Relational data stores
3.1. MySQL / MariaDB
3.2. PostgreSQL
3.3. Others
4. Why NoSQL?
5. Key/Value data stores
5.1. DynamoDB
5.2. Memcached
5.3. Redis
5.4. Riak
5.5. Aerospike
5.6. FoundationDB
6. Columnar data stores
6.1. Accumulo
6.2. Cassandra
6.3. HBase
7. Graph data stores
7.1. Neo4J
7.2. Titan ...


Neo4j: The football transfers graph

Given we’re still in pre-season transfer madness as far as European football is concerned, I thought it’d be interesting to put together a football transfers graph to see whether there are any interesting insights to be had. It took me a while to find an appropriate source, but I eventually came across transfermarkt.co.uk which contains transfers going back at ...


R: Scraping Neo4j release dates with rvest

As part of my log analysis I wanted to get the Neo4j release dates, which are accessible from the release notes, and decided to try out Hadley Wickham’s rvest scraping library, which he released at the end of 2014. rvest is inspired by Python’s Beautiful Soup, which has become my scraping library of choice, so I didn’t find it too difficult ...
