Author Archives: Mark Needham

R: Snakes and ladders markov chain

A few days ago I read a really cool blog post explaining how Markov chains can be used to model the possible state transitions in a game of snakes and ladders, a use of Markov chains I hadn’t even thought of! While the example is very helpful for understanding the concept, my understanding of the code is that it works ...
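To make the idea of modelling the game as state transitions concrete, here is a minimal sketch in Python (not the R code from the post). The 12-square board, the `JUMPS` mapping and the rule that an overshooting roll leaves the player in place are all assumptions made up for illustration. Each row of the matrix holds the probabilities of moving from one square to every other square on a single roll.

```python
from fractions import Fraction


def transition_matrix(n_squares, jumps, die_sides=6):
    """Build a transition matrix for a small snakes-and-ladders board.

    Squares are numbered 0..n_squares (0 is the start, n_squares the finish).
    `jumps` maps the foot of a ladder or the head of a snake to its destination.
    Assumption: a roll that would overshoot the finish leaves the player in place.
    """
    size = n_squares + 1
    matrix = [[Fraction(0)] * size for _ in range(size)]
    p = Fraction(1, die_sides)
    for square in range(n_squares):
        for roll in range(1, die_sides + 1):
            target = square + roll
            if target > n_squares:
                target = square  # overshoot: stay where we are
            else:
                target = jumps.get(target, target)  # follow a snake or ladder
            matrix[square][target] += p
    matrix[n_squares][n_squares] = Fraction(1)  # the finish is absorbing
    return matrix


def step(distribution, matrix):
    """Advance a probability distribution over squares by one roll."""
    size = len(matrix)
    return [
        sum(distribution[i] * matrix[i][j] for i in range(size))
        for j in range(size)
    ]


# Hypothetical board: one ladder (3 -> 8) and one snake (10 -> 2).
BOARD_SIZE = 12
JUMPS = {3: 8, 10: 2}

matrix = transition_matrix(BOARD_SIZE, JUMPS)
start = [Fraction(1)] + [Fraction(0)] * BOARD_SIZE
after_one_roll = step(start, matrix)
```

Calling `step` repeatedly gives the distribution over squares after any number of rolls, which is the kind of question the Markov chain model is suited to answering.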

Neo4j: The learning to cycle dependency graph

Over the past couple of weeks I’ve been reading about skill building and the breakdown of skills into more manageable chunks, and recently had a chance to break down the skills required to learn to cycle. I initially sketched out the skill progression but quickly realised I had drawn a dependency graph and thought that putting it into Neo4j ...

Neo4j: Generating real time recommendations with Cypher

One of the most common uses of Neo4j is for building real time recommendation engines, and a common theme is that they make use of lots of different bits of data to come up with an interesting recommendation. For example, in this video Amanda shows how dating websites build real time recommendation engines by starting with social connections and then ...

InetAddressImpl#lookupAllHostAddr slow/hangs

Since I upgraded to Yosemite I’ve noticed that attempts to resolve localhost on my home network have been taking ages (sometimes over a minute) so I thought I’d try and work out why. This is what my initial /etc/hosts file looked like based on the assumption that my machine’s hostname was teetotal:

$ cat /etc/hosts
##
# Host Database
# ...

Topic Modelling: Working out the optimal number of topics

In my continued exploration of topic modelling I came across The Programming Historian blog and a post showing how to derive topics from a corpus using the Java library mallet. The instructions on the blog make it very easy to get up and running but as with other libraries I’ve used, you have to specify how many topics the corpus ...

Neo4j: TF/IDF (and variants) with cypher

A few weeks ago I wrote a blog post on running TF/IDF over HIMYM transcripts using scikit-learn to find the most important phrases by episode, and afterwards I was curious how difficult it’d be to do in Neo4j. I started by translating one of Wikipedia’s TF/IDF examples to Cypher to see what the algorithm would look like: ...
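For reference, the calculation itself can be sketched in a few lines of plain Python. This is not the Cypher translation from the post, just an illustration of one common TF/IDF variant: term frequency as a share of the document's words, and inverse document frequency as a base-10 logarithm. The two toy documents and the function names are assumptions made for this example.

```python
import math
from collections import Counter


def tf(term, doc_tokens):
    """Term frequency: occurrences of the term divided by document length."""
    return Counter(doc_tokens)[term] / len(doc_tokens)


def idf(term, corpus):
    """Inverse document frequency: log10(total docs / docs containing the term)."""
    containing = sum(1 for doc in corpus if term in doc)
    if containing == 0:
        return 0.0  # term never appears; treat its weight as zero
    return math.log10(len(corpus) / containing)


def tf_idf(term, doc_tokens, corpus):
    """Weight of a term in one document relative to the whole corpus."""
    return tf(term, doc_tokens) * idf(term, corpus)


# Toy corpus, assumed for illustration only.
doc1 = "this is a a sample".split()
doc2 = "this is another another example example example".split()
corpus = [doc1, doc2]

score_example = tf_idf("example", doc2, corpus)  # appears in one document only
score_this = tf_idf("this", doc1, corpus)        # appears in every document
```

A term that appears in every document, such as "this", ends up with a score of zero because its inverse document frequency is log10(1), while "example" scores higher because it is frequent in one document and absent from the other.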

R: Weather vs attendance at NoSQL meetups

A few weeks ago I came across a tweet by Sean Taylor asking for a weather data set with a few years’ worth of recording and I was surprised to learn that R already has such a thing – the weatherData package.

Winner is: @UTVilla!
library(weatherData)
df <- getWeatherForYear("SFO", 2013)
ggplot(df, aes(x=Date, y = Mean_TemperatureF)) + geom_line()
- Sean J. ...

R: Featuring engineering for a linear model

I previously wrote about a linear model I created to predict how many people would RSVP ‘yes’ to a meetup event and, having not found much correlation between any of my independent variables and RSVPs, I was a bit stuck. As luck would have it I bumped into Antonios at a meetup a month ago and he offered to take a ...
