Author Archives: Mark Needham

R: Feature engineering for a linear model

I previously wrote about a linear model I created to predict how many people would RSVP ‘yes’ to a meetup event. Having not found much correlation between any of my independent variables and RSVPs, I was a bit stuck. As luck would have it, I bumped into Antonios at a meetup a month ago and he offered to take a ...

R: Vectorising all the things

After my last post about finding the distance a date/time is from the weekend, Hadley Wickham suggested I could improve the function by vectorising it:

    @markhneedham vectorise with pmin(pmax(dateToLookup - before, 0), pmax(after - dateToLookup, 0)) / dhours(1)
    (Hadley Wickham, @hadleywickham, December 14, 2014)

…so I thought I’d try and vectorise ...

R: Time to/from the weekend

In my last post I showed some examples using R’s lubridate package. Another problem it made really easy to solve was working out how close a particular date time was to the weekend. I wanted to write a function which would return the previous Sunday or upcoming Saturday, depending on which was closer. lubridate’s floor_date and ceiling_date functions make ...

R: Cleaning up and plotting Google Trends data

I recently came across an excellent article by Stian Haklev in which he describes things he wishes he’d been told before starting out with R. One of them is to do all data clean-up in code, which I thought I’d give a try.

My goal is to leave the raw data completely ...

R: Applying a function to every row of a data frame

In my continued exploration of London’s meetups I wanted to calculate the distance from meetup venues to a centre point in London. I’ve created a gist containing the coordinates of some of the venues that host NoSQL meetups in London town if you want to follow along:

    library(dplyr)

    # https://gist.github.com/mneedham/7e926a213bf76febf5ed
    venues = read.csv("/tmp/venues.csv")
    ...

Spark: Write to CSV file

A couple of weeks ago I wrote about how I’d been using Spark to explore a City of Chicago Crime data set. Having worked out how many of each crime had been committed, I wanted to write that to a CSV file. Spark provides a saveAsTextFile function which allows us to save RDDs, so I refactored my code into the ...

Spark: Write to CSV file with header using saveAsFile

In my last blog post I showed how to write to a single CSV file using Spark and Hadoop. The next thing I wanted to do was add a header row to the resulting file. Hadoop’s FileUtil#copyMerge function does take a String parameter, but it adds this text to the end of each partition file, which isn’t quite what ...

Spark: Parse CSV file and group by column value

I’ve found myself working with large CSV files quite frequently. Realising that my existing toolset didn’t let me explore them quickly, I thought I’d spend a bit of time looking at Spark to see if it could help. I’m working with a crime data set released by the City of Chicago: it’s 1GB in size and contains details of ...

Neo4j: Cypher – Avoiding the Eager

Although I love how easy Cypher’s LOAD CSV command makes it to get data into Neo4j, it currently breaks the rule of least surprise by eagerly loading all rows for some queries, even those using periodic commit. This is something that my colleague Michael noted in the second of his blog posts explaining how to ...

Conceptual Model vs Graph Model

We’ve started running some sessions on graph modelling in London, and during the first session it was pointed out that the process I’d described was very similar to the one used when modelling for a relational database. I thought I’d better do some reading on the way relational models are derived, and I came across an excellent video by Joe Maguire titled ...
