Author Archives: Mark Needham

R: Speeding up the Wimbledon scraping job

Over the past few days I’ve written a few blog posts about a Wimbledon data set I’ve been building, and after running the scripts a few times I noticed that they were taking much longer to run than I expected. To recap, I started out with the following function which takes in a URI and returns a data frame containing ...

R: Scraping the release dates of GitHub projects

Continuing on from my blog post about scraping Neo4j’s release dates, I thought it’d be even more interesting to chart the release dates of some GitHub projects. In theory the release dates should be accessible through the GitHub API, but the few projects that I looked at weren’t returning any data, so I scraped the data instead. We’ll be using rvest ...

R: Scraping Neo4j release dates with rvest

As part of my log analysis I wanted to get the Neo4j release dates, which are accessible from the release notes, and decided to try out Hadley Wickham’s rvest scraping library, which he released at the end of 2014. rvest is inspired by Python’s Beautiful Soup, which has become my scraping library of choice, so I didn’t find it too difficult ...

Netty: Testing encoders/decoders

I’ve been working with Netty a bit recently and, having built a pipeline of encoders/decoders as described in this excellent tutorial, I wanted to test that the encoders and decoders were working without having to send real messages around. Luckily there is an EmbeddedChannel which makes our life very easy indeed. Let’s say we’ve got a message ‘Foo’ that we want ...

Neo4j: The BBC Champions League graph

A couple of weekends ago I started scraping the BBC live text feed of the Bayern Munich/Barcelona match, initially starting out with just the fouls and building the foul graph. I’ve spent a bit more time on it since then and have managed to model several other events as well, including attempts, goals, cards and free kicks. I started doing ...

Neo4j: The foul revenge graph

Last week I was showing the foul graph to my colleague Alistair, who came up with the idea of running a ‘foul revenge’ query to find out which players gained revenge for a foul with one of their own later in the match. Queries like this are very path-centric and therefore work well in a graph. To recap, this ...

Neo4j: Finding all shortest paths

One of the Cypher language features we show in Neo4j training courses is the shortest path function, which allows you to find the shortest path, measured by number of relationships, between two nodes. Using the movie graph, which you can import via the ‘:play movies’ command in the browser, we’ll first create a ‘KNOWS’ relationship between any people that ...

Coding: Visualising a bitmap

Over the last month or so I’ve spent some time each day reading a new part of the Neo4j code base to get more familiar with it, and one of my favourite classes is the Bits class, which does all things low-level on the wire and to disk. In particular I like its toString method which returns a binary ...

R: Replacing for loops with data frames

In my last blog post I showed how to derive posterior probabilities for the Think Bayes dice problem: Suppose I have a box of dice that contains a 4-sided die, a 6-sided die, an 8-sided die, a 12-sided die, and a 20-sided die. If you have ever played Dungeons & Dragons, you know what I am talking about. Suppose I ...