Author Archives: Mark Needham

InetAddressImpl#lookupAllHostAddr slow/hangs

Since I upgraded to Yosemite, I’ve noticed that attempts to resolve localhost on my home network have been taking ages (sometimes over a minute), so I thought I’d try to work out why. This is what my initial /etc/hosts file looked like, based on the assumption that my machine’s hostname was teetotal: $ cat /etc/hosts ## # Host Database # ...

Topic Modelling: Working out the optimal number of topics

In my continued exploration of topic modelling, I came across The Programming Historian blog and a post showing how to derive topics from a corpus using the Java library MALLET. The instructions on the blog make it very easy to get up and running but, as with other libraries I’ve used, you have to specify how many topics the corpus ...

Neo4j: TF/IDF (and variants) with cypher

A few weeks ago I wrote a blog post on running TF/IDF over HIMYM transcripts using scikit-learn to find the most important phrases by episode, and afterwards I was curious how difficult it’d be to do in Neo4j. I started by translating one of Wikipedia’s TF/IDF examples to Cypher to see what the algorithm would look like: ...

R: Weather vs attendance at NoSQL meetups

A few weeks ago I came across a tweet by Sean Taylor asking for a weather data set with a few years’ worth of recordings, and I was surprised to learn that R already has such a thing: the weatherData package.

Winner is: @UTVilla! library(weatherData) df <- getWeatherForYear("SFO", 2013) ggplot(df, aes(x=Date, y = Mean_TemperatureF)) + geom_line()
- Sean J. ...

R: Feature engineering for a linear model

I previously wrote about a linear model I created to predict how many people would RSVP ‘yes’ to a meetup event and, having not found much correlation between any of my independent variables and RSVPs, was a bit stuck. As luck would have it, I bumped into Antonios at a meetup a month ago and he offered to take a ...

R: Vectorising all the things

After my last post about finding the distance a date/time is from the weekend, Hadley Wickham suggested I could improve the function by vectorising it…

@markhneedham vectorise with pmin(pmax(dateToLookup - before, 0), pmax(after - dateToLookup, 0)) / dhours(1)
- Hadley Wickham (@hadleywickham) December 14, 2014

…so I thought I’d try and vectorise ...

R: Time to/from the weekend

In my last post I showed some examples using R’s lubridate package, and another problem it made really easy to solve was working out how close a particular date time was to the weekend. I wanted to write a function which would return the previous Sunday or upcoming Saturday, depending on which was closer. lubridate’s floor_date and ceiling_date functions make ...

R: Cleaning up and plotting Google Trends data

I recently came across an excellent article written by Stian Haklev in which he describes things he wishes he’d been told before starting out with R, one being to do all data clean-up in code, which I thought I’d give a try. My goal is to leave the raw data completely ...

R: Applying a function to every row of a data frame

In my continued exploration of London’s meetups I wanted to calculate the distance from meetup venues to a centre point in London. I’ve created a gist containing the coordinates of some of the venues that host NoSQL meetups in London town if you want to follow along:

library(dplyr)
# https://gist.github.com/mneedham/7e926a213bf76febf5ed
venues = read.csv("/tmp/venues.csv")
...

Spark: Write to CSV file

A couple of weeks ago I wrote about how I’d been using Spark to explore a City of Chicago Crime data set and, having worked out how many of each crime had been committed, I wanted to write that to a CSV file. Spark provides a saveAsTextFile function which allows us to save RDDs, so I refactored my code into the ...
