Author Archives: Mark Needham

## Coding: Visualising a bitmap

Over the last month or so I’ve spent some time each day reading a new part of the Neo4j codebase to get more familiar with it, and one of my favourite classes is the Bits class, which handles all the low-level work on the wire and to disk. In particular I like its toString method which returns a binary ...
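
The excerpt stops before showing what that output looks like. As a rough illustration of the idea, and not Neo4j's actual `Bits` implementation, here is a minimal Python sketch that renders bytes as binary digits. The helper names `bits_to_string` and `int_to_bitmap`, and the choice to separate bytes with spaces, are assumptions made for this example.

```python
def bits_to_string(data: bytes) -> str:
    """Render each byte as 8 binary digits, most significant bit first,
    with a space between bytes so their boundaries are easy to see.
    (Hypothetical helper, not part of Neo4j.)"""
    return " ".join(format(b, "08b") for b in data)


def int_to_bitmap(value: int, width: int = 64) -> str:
    """Render a non-negative integer as a fixed-width binary string,
    e.g. to eyeball which bits of a 64-bit field are set."""
    if value < 0 or value >= 1 << width:
        raise ValueError(f"{value} does not fit in {width} bits")
    return format(value, f"0{width}b")
```

For example, `bits_to_string(bytes([1, 255, 16]))` returns `"00000001 11111111 00010000"`, which makes it easy to spot which flags in a packed record are set.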

## R: Replacing for loops with data frames

In my last blog post I showed how to derive posterior probabilities for the Think Bayes dice problem: Suppose I have a box of dice that contains a 4-sided die, a 6-sided die, an 8-sided die, a 12-sided die, and a 20-sided die. If you have ever played Dungeons & Dragons, you know what I am talking about. Suppose I ...
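
The excerpt cuts off before the update itself. Below is a minimal Python sketch of the standard Bayesian update for this problem, assuming a uniform prior over the five dice and fair dice, so a roll of k on an n-sided die has likelihood 1/n when k ≤ n and 0 otherwise. The post itself works in R with data frames; the names `likelihood` and `posterior` here are illustrative only.

```python
from fractions import Fraction
from math import prod

DICE = [4, 6, 8, 12, 20]


def likelihood(sides: int, roll: int) -> Fraction:
    """Chance of rolling `roll` on a fair die with `sides` faces."""
    return Fraction(1, sides) if 1 <= roll <= sides else Fraction(0)


def posterior(dice, rolls):
    """Posterior over dice after observing `rolls`, assuming a uniform prior.

    With a uniform prior the prior term cancels when normalising, so each
    die's unnormalised score is just the product of its likelihoods.
    """
    scores = {
        d: prod((likelihood(d, r) for r in rolls), start=Fraction(1))
        for d in dice
    }
    total = sum(scores.values())
    if total == 0:
        raise ValueError("no die is consistent with these rolls")
    return {d: s / total for d, s in scores.items()}
```

After a single roll of 6, `posterior(DICE, [6])` rules out the 4-sided die and gives the 6-sided die 20/51, the 8-sided die 5/17, the 12-sided die 10/51 and the 20-sided die 2/17. Exact fractions keep the arithmetic easy to check by hand.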

## R: Removing for loops

In my last blog post I showed the translation of a likelihood function from Think Bayes into R, and in my first attempt at this function I used a couple of nested for loops: `likelihoods = function(names, mixes, observations) { scores = rep(1, length(names)); names(scores) = names; for(name in names) { for(observation in observations) { scores[name] = scores[name] * ...`
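
Assuming the truncated line multiplies each hypothesis's score by the probability of each observation under that hypothesis's mix, the calculation can be sketched in Python as below. Python's standard library has no direct analogue of R's vectorised indexing, so a single product expression per hypothesis stands in for the nested loops. The bowl names and flavour proportions are made-up illustrative values, not data from the post.

```python
from fractions import Fraction
from math import prod

# Illustrative mixes (assumed): proportion of each flavour in two bowls.
MIXES = {
    "Bowl 1": {"vanilla": Fraction(3, 4), "chocolate": Fraction(1, 4)},
    "Bowl 2": {"vanilla": Fraction(1, 2), "chocolate": Fraction(1, 2)},
}


def likelihoods_loops(names, mixes, observations):
    """Nested-loop version, mirroring the structure of the R excerpt."""
    scores = {name: Fraction(1) for name in names}
    for name in names:
        for observation in observations:
            scores[name] = scores[name] * mixes[name][observation]
    return scores


def likelihoods(names, mixes, observations):
    """Loop-free version: one product expression per hypothesis."""
    return {
        name: prod((mixes[name][obs] for obs in observations), start=Fraction(1))
        for name in names
    }
```

Both versions return the same scores; for the observations vanilla, chocolate, vanilla, Bowl 1 scores 9/64 and Bowl 2 scores 1/8.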

## Spark: Generating CSV files to import into Neo4j

About a year ago Ian pointed me at a Chicago Crime data set, which seemed like a good fit for Neo4j, and after much procrastination I’ve finally got around to importing it. The data set covers crimes committed from 2001 to the present. It contains around 4 million crimes and metadata about those crimes such as the location, type of ...
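
The post generates its CSV files with Spark; the sketch below swaps in Python's standard `csv` module so the shape of the output can be tried on a small sample without a cluster. It splits flat crime rows into a node file for crimes, a de-duplicated node file for crime types, and a relationship file linking them. The input column names (`ID`, `Primary Type`, `Description`), the output file names and the `OF_TYPE` relationship type are assumptions; the headers follow the `:ID(...)`, `:START_ID(...)`, `:END_ID(...)`, `:TYPE` and `:LABEL` conventions of Neo4j's CSV bulk importer.

```python
import csv
from pathlib import Path


def split_rows(rows):
    """Split flat crime rows into crime nodes, crime-type nodes and relationships.

    Assumed input: dicts with "ID", "Primary Type" and "Description" keys.
    """
    crimes, types, rels = [], set(), []
    for row in rows:
        crimes.append((row["ID"], row["Description"], "Crime"))
        types.add(row["Primary Type"])  # de-duplicate type nodes
        rels.append((row["ID"], row["Primary Type"], "OF_TYPE"))  # hypothetical rel type
    type_rows = [(t, "CrimeType") for t in sorted(types)]
    return crimes, type_rows, rels


def write_csv(path, header, rows):
    with open(path, "w", newline="") as f:
        writer = csv.writer(f)
        writer.writerow(header)
        writer.writerows(rows)


def export(rows, out_dir):
    """Write the three CSV files into `out_dir` (file names are illustrative)."""
    out = Path(out_dir)
    out.mkdir(parents=True, exist_ok=True)
    crimes, types, rels = split_rows(rows)
    write_csv(out / "crimes.csv", ["id:ID(Crime)", "description", ":LABEL"], crimes)
    write_csv(out / "crime_types.csv", ["name:ID(CrimeType)", ":LABEL"], types)
    write_csv(out / "crimes_types.csv",
              [":START_ID(Crime)", ":END_ID(CrimeType)", ":TYPE"], rels)
```

Separate ID spaces (`Crime` and `CrimeType`) keep a crime ID from colliding with a type name. Spark's version of the same idea would de-duplicate types across partitions rather than in a single in-memory set.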

## R: Snakes and ladders markov chain

A few days ago I read a really cool blog post explaining how Markov chains can be used to model the possible state transitions in a game of snakes and ladders, a use of Markov chains I hadn’t even thought of! While the example is very helpful for understanding the concept, my understanding of the code is that it works ...
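
The linked post's board, rules and code aren't reproduced here. As a small self-contained sketch of the technique, the code below builds a transition matrix for a board and pushes a probability distribution through it turn by turn. The rules are assumptions: a fair die, any roll that reaches or passes the last square finishes the game, and `jumps` maps the foot of a ladder or the head of a snake to where the player ends up.

```python
from fractions import Fraction


def transition_matrix(size, jumps, sides=6):
    """Row i holds the probability of moving from square i to each square.

    Square 0 is the start (off the board) and square `size` is the finish.
    """
    n = size + 1
    matrix = [[Fraction(0)] * n for _ in range(n)]
    for square in range(n):
        if square == size:
            matrix[square][square] = Fraction(1)  # finish is absorbing
            continue
        for roll in range(1, sides + 1):
            target = min(square + roll, size)  # overshooting still finishes
            target = jumps.get(target, target)  # follow a snake or ladder
            matrix[square][target] += Fraction(1, sides)
    return matrix


def step(dist, matrix):
    """Advance a probability distribution over squares by one turn."""
    n = len(matrix)
    return [sum(dist[i] * matrix[i][j] for i in range(n)) for j in range(n)]


def finish_probability(size, jumps, turns, sides=6):
    """Probability of having finished within `turns` turns from the start."""
    matrix = transition_matrix(size, jumps, sides)
    dist = [Fraction(0)] * (size + 1)
    dist[0] = Fraction(1)
    for _ in range(turns):
        dist = step(dist, matrix)
    return dist[size]
```

Making the finish square absorbing means `dist[size]` accumulates the chance of having finished by each turn, so the same loop answers "how likely is the game to be over after n turns?".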

## Neo4j: The learning to cycle dependency graph

Over the past couple of weeks I’ve been reading about skill building and the breakdown of skills into more manageable chunks, and recently had a chance to break down the skills required to learn to cycle. I initially sketched out the skill progression but quickly realised I had drawn a dependency graph and thought that putting it into Neo4j ...
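
One thing a dependency graph gives you for free is a learning order: a topological sort lists every skill after all of its prerequisites. The sketch below uses Python's `graphlib.TopologicalSorter` (Python 3.9+) rather than Neo4j, and the skills and their dependencies are hypothetical, not the graph from the post.

```python
from graphlib import TopologicalSorter

# Hypothetical skill graph: each skill maps to the skills it depends on.
SKILLS = {
    "balance": set(),
    "steer": {"balance"},
    "pedal": {"balance"},
    "brake": {"pedal"},
    "ride in a straight line": {"steer", "pedal"},
    "turn corners": {"ride in a straight line", "brake"},
}


def learning_order(dependencies):
    """Return one valid order in which to learn the skills.

    Raises graphlib.CycleError if the dependencies contain a cycle.
    """
    return list(TopologicalSorter(dependencies).static_order())
```

Several orders are usually valid; the sorter returns one of them, and a cycle in the graph would signal that the breakdown of skills needs another look.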

## Neo4j: Cypher – Building the query for a movie’s profile page

Yesterday I spent the day in Berlin delivering a workshop as part of the Data Science Retreat, and one of the exercises we did was to write a query that would pull back all the information you’d need to create the IMDb page for a movie. Scanning the page we can see that we need to get some basic metadata including ...

## Neo4j: Generating real time recommendations with Cypher

One of the most common uses of Neo4j is building real-time recommendation engines, and a common theme is that they use lots of different bits of data to come up with an interesting recommendation. For example, in this video Amanda shows how dating websites build real-time recommendation engines by starting with social connections and then ...
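
A common starting point for recommendations from social connections is the friend-of-a-friend pattern: suggest people you aren't connected to, ranked by how many connections you share. The sketch below does this over an in-memory adjacency map in plain Python rather than in Cypher, and the example graph is hypothetical; it is not the approach from the video.

```python
from collections import Counter

# Hypothetical, undirected social graph as an adjacency map.
GRAPH = {
    "alice": {"bob", "carol"},
    "bob": {"alice", "dave", "erin"},
    "carol": {"alice", "dave"},
    "dave": {"bob", "carol"},
    "erin": {"bob"},
}


def recommend(graph, person, limit=3):
    """Suggest people `person` is not yet connected to, ranked by the
    number of mutual connections (a friend-of-a-friend heuristic)."""
    friends = graph.get(person, set())
    counts = Counter()
    for friend in friends:
        for candidate in graph.get(friend, set()):
            if candidate != person and candidate not in friends:
                counts[candidate] += 1
    # Most mutual connections first; ties broken by name for stable output.
    ranked = sorted(counts.items(), key=lambda kv: (-kv[1], kv[0]))
    return [name for name, _ in ranked[:limit]]
```

Here `recommend(GRAPH, "alice")` returns `["dave", "erin"]`: dave shares two connections with alice and erin shares one. Other signals (interests, location, activity) would be combined with this score in a real engine.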

Since I upgraded to Yosemite I’ve noticed that attempts to resolve localhost on my home network have been taking ages (sometimes over a minute), so I thought I’d try to work out why. This is what my initial /etc/hosts file looked like, based on the assumption that my machine’s hostname was teetotal: $ cat /etc/hosts ## # Host Database # ...
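
A quick way to check which names a hosts file actually maps is to parse it. The sketch below follows the usual hosts-file layout (an address followed by one or more names, with `#` starting a comment). It doesn't reproduce the post's diagnosis, and the sample text is an illustrative default-style file, not the author's.

```python
# Illustrative hosts-file contents (assumed, in the style of a default macOS file).
SAMPLE = """\
##
# Host Database
#
##
127.0.0.1        localhost
255.255.255.255  broadcasthost
::1              localhost
"""


def parse_hosts(text):
    """Map each hostname to the list of addresses given for it."""
    mapping = {}
    for line in text.splitlines():
        line = line.split("#", 1)[0].strip()  # drop comments and whitespace
        if not line:
            continue
        address, *names = line.split()
        for name in names:
            mapping.setdefault(name, []).append(address)
    return mapping
```

With this sample, `parse_hosts(SAMPLE)["localhost"]` is `["127.0.0.1", "::1"]` and `"teetotal"` has no entry at all. To inspect a real machine you would pass in the contents of `/etc/hosts` instead.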

## Topic Modelling: Working out the optimal number of topics

In my continued exploration of topic modelling I came across The Programming Historian blog and a post showing how to derive topics from a corpus using the Java library MALLET. The instructions on the blog make it very easy to get up and running, but as with other libraries I’ve used, you have to specify how many topics the corpus ...
