Author Archives: Mark Needham

Neo4j: A procedure for the SLM clustering algorithm

In the middle of last year I blogged about the Smart Local Moving algorithm, which is used for community detection in networks, and with the upcoming introduction of procedures in Neo4j I thought it’d be fun to make that code accessible as one. If you want to grab the code and follow along, it’s sitting on the SLM repository on ...

Clojure: First steps with reducers

I’ve been playing around with Clojure a bit today in preparation for a talk I’m giving next week and found myself writing the following code to apply the same function to three different scores: (defn log2 [n] (/ (Math/log n) (Math/log 2)))   (defn score-item [n] (if (= n 0) 0 (log2 n)))   (+ (score-item 12) (score-item 13) (score-item ...

Neo4j: Specific relationship vs Generic relationship + property

For optimal traversal speed in Neo4j queries we should make our relationship types as specific as possible. Let’s take a look at an example from the ‘modelling a recommendations engine’ talk I presented at Skillsmatter a couple of weeks ago. I needed to decide how to model the ‘RSVP’ relationship between a Member and an Event. A person can RSVP ...

Hadoop: HDFS – java.lang.NoSuchMethodError: org.apache.hadoop.fs.FSOutputSummer.<init>(Ljava/util/zip/Checksum;II)V

I wanted to write a little program to check that one machine could communicate with an HDFS server running on the other and adapted some code from the Hadoop wiki as follows: package org.playground;   import org.apache.hadoop.conf.Configuration; import org.apache.hadoop.fs.FSDataInputStream; import org.apache.hadoop.fs.FSDataOutputStream; import org.apache.hadoop.fs.FileSystem; import org.apache.hadoop.fs.Path;   import java.io.IOException;   public class HadoopDFSFileReadWrite {   static void printAndExit(String str) { System.err.println( str ...

SparkR: Add new column to data frame by concatenating other columns

Continuing with my exploration of the Land Registry open data set using SparkR I wanted to see which road in the UK has had the most property sales over the last 20 years. To recap, this is what the data frame looks like: ./spark-1.5.0-bin-hadoop2.6/bin/sparkR --packages com.databricks:spark-csv_2.11:1.2.0   > sales <- read.df(sqlContext, "pp-complete.csv", "com.databricks.spark.csv", header="false")   > head(sales) C0 C1 C2 ...

Unix: Redirecting stderr to stdout

I’ve been trying to optimise some Neo4j import queries over the last couple of days and as part of the script I’ve been executing I wanted to redirect the output of a couple of commands into a file to parse afterwards. I started with the following script which doesn’t do any explicit redirection of the output: #!/bin/sh   ./neo4j-community-2.2.3/bin/neo4j start ...
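
As a rough illustration of the technique named in the title, here is a minimal sketch of sending both stdout and stderr to one file. The `noisy` function and the temporary log file are hypothetical stand-ins and are not taken from the post's own script.

```shell
#!/bin/sh
# Hypothetical stand-in for a command that writes to both streams.
noisy() {
  echo "to stdout"
  echo "to stderr" >&2
}

# Hypothetical log file, created in the system temp directory.
log_file="$(mktemp)"

# Order matters: point stdout at the file first, then send stderr
# (file descriptor 2) to wherever stdout (file descriptor 1) now points.
noisy > "$log_file" 2>&1

cat "$log_file"
```

Writing `2>&1` before the `>` redirection would instead leave stderr attached to the terminal, because it would copy stdout's target before stdout had been redirected.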

Sed: Using environment variables

I’ve been playing around with the BBC football data set that I wrote about a couple of months ago and I wanted to write some code that would take the import script and replace all instances of remote URIs with a file system path. For example the import file contains several lines similar to this: LOAD CSV WITH HEADERS FROM ...
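
The general idea of substituting environment variables into a sed expression can be sketched as follows. This is only an illustration under assumptions: the URI, the local path, the file name and the sample line are all hypothetical, since the excerpt does not show the real ones.

```shell
#!/bin/sh
# Hypothetical values: neither the URI nor the path comes from the original post.
REMOTE_URI="https://example.com/data"
LOCAL_PATH="file:///tmp/data"

# Hypothetical import line in the style the excerpt describes.
line='LOAD CSV WITH HEADERS FROM "https://example.com/data/matches.csv" AS row'

# Double quotes let the shell expand the variables before sed sees the expression.
# "|" is used as the delimiter so the slashes in the values need no escaping.
rewritten="$(printf '%s\n' "$line" | sed "s|${REMOTE_URI}|${LOCAL_PATH}|g")"

printf '%s\n' "$rewritten"
```

With single quotes around the sed expression the variables would not be expanded, and sed would search for the literal text `${REMOTE_URI}`. Values that contain `|` or `&` would need escaping before being used this way.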

Record Linkage: Playing around with Duke

I’ve become quite interested in record linkage recently and came across the Duke project which provides some tools to help solve this problem. I thought I’d give it a try. The typical problem when doing record linkage is that we have two records from different data sets which represent the same entity but don’t have a common key that we ...

R: Bootstrap confidence intervals

I recently came across an interesting post on Julia Evans’ blog showing how to generate a bigger set of data points by sampling the small set of data points that we actually have using bootstrapping. Julia’s examples are all in Python so I thought it’d be a fun exercise to translate them into R. We’re doing the bootstrapping to simulate ...
