Author Archives: Mark Needham

Hadoop: DataNode not starting

In my continued playing with Mahout I eventually decided to give up on my local file system and use a local Hadoop installation instead, since that seems to involve much less friction when following the examples. Unfortunately, all my attempts to upload files from my local file system to HDFS were met with the following exception: java.io.IOException: File /user/markneedham/book2.txt ...

Neo4j: Cypher – Detecting duplicates using relationships

I’ve been building a graph of computer science papers on and off for a couple of months, and now that I’ve got a few thousand loaded I realised that there are quite a few duplicates. They aren’t duplicates in the sense of multiple entries sharing the same identifier; rather, they have different identifiers but seem to be ...
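
The excerpt stops before the post's Cypher, so this is only a minimal sketch of the general idea, assuming "duplicate" means two papers that are related to exactly the same set of other nodes (for example, the same authors). The in-memory adjacency map stands in for relationships read from the graph, and the class and method names (`DuplicateFinder`, `candidateDuplicates`) are hypothetical:

```java
import java.util.*;

class DuplicateFinder {
    // Hypothetical helper: groups paper IDs by the exact set of node IDs they are related to.
    // Papers that share an identical, non-empty neighbour set are reported as candidate duplicates.
    static List<Set<String>> candidateDuplicates(Map<String, Set<String>> neighboursByPaper) {
        Map<Set<String>, Set<String>> papersByNeighbours = new HashMap<>();
        for (Map.Entry<String, Set<String>> e : neighboursByPaper.entrySet()) {
            if (e.getValue().isEmpty()) continue; // no relationships, nothing to compare on
            // Copy the neighbour set so later changes to the caller's set can't corrupt the key.
            Set<String> key = new HashSet<>(e.getValue());
            papersByNeighbours.computeIfAbsent(key, k -> new TreeSet<>()).add(e.getKey());
        }
        List<Set<String>> groups = new ArrayList<>();
        for (Set<String> papers : papersByNeighbours.values()) {
            if (papers.size() > 1) groups.add(papers);
        }
        return groups;
    }

    public static void main(String[] args) {
        Map<String, Set<String>> graph = new HashMap<>();
        graph.put("paper-1", Set.of("author-a", "author-b"));
        graph.put("paper-2", Set.of("author-b", "author-a"));
        graph.put("paper-3", Set.of("author-c"));
        System.out.println(candidateDuplicates(graph)); // prints [[paper-1, paper-2]]
    }
}
```

Matching on identical neighbour sets is deliberately strict; a real dataset would probably need a looser similarity measure, which the excerpt doesn't describe.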

Neo4j vs Relational: Refactoring – Extracting node/table

In my previous blog post I showed how to add a new property/field to a node with a label/record in a table for a football transfers dataset that I’ve been playing with. After introducing this ‘nationality’ property, I realised that I now had some duplication in the model: players.nationality and clubs.country are referring ...
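
As a minimal sketch of what "extracting a table" means here, assuming rows that carry only a name plus the free-text country column (the `Player`/`Club` classes and the `extractCountries` helper are hypothetical, not from the post), both columns can be collapsed into one shared countries table that players and clubs then reference by id:

```java
import java.util.*;

class ExtractCountry {
    // Hypothetical minimal rows; the real dataset has many more columns.
    static final class Player {
        final String name;
        final String nationality;
        Player(String name, String nationality) { this.name = name; this.nationality = nationality; }
    }

    static final class Club {
        final String name;
        final String country;
        Club(String name, String country) { this.name = name; this.country = country; }
    }

    // Builds a single countries table (name -> id) from both columns, so that
    // players.nationality and clubs.country become references to the same row.
    // Ids are assigned in order of first appearance.
    static Map<String, Integer> extractCountries(List<Player> players, List<Club> clubs) {
        Map<String, Integer> ids = new LinkedHashMap<>();
        for (Player p : players) ids.putIfAbsent(p.nationality, ids.size() + 1);
        for (Club c : clubs) ids.putIfAbsent(c.country, ids.size() + 1);
        return ids;
    }

    public static void main(String[] args) {
        List<Player> players = List.of(new Player("Player A", "Spain"), new Player("Player B", "England"));
        List<Club> clubs = List.of(new Club("Club X", "England"), new Club("Club Y", "Italy"));
        Map<String, Integer> countries = extractCountries(players, clubs);
        System.out.println(countries); // prints {Spain=1, England=2, Italy=3}
        for (Player p : players) {
            // After the refactoring a player row stores a country id instead of a free-text name.
            System.out.println(p.name + " -> country " + countries.get(p.nationality));
        }
    }
}
```

In the graph version the same step would produce one Country node per distinct value, with players and clubs pointing at it through relationships.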

Neo4j: A procedure for the SLM clustering algorithm

In the middle of last year I blogged about the Smart Local Moving algorithm, which is used for community detection in networks, and with the upcoming introduction of procedures in Neo4j I thought it’d be fun to make that code accessible as one. If you want to grab the code and follow along, it’s sitting in the SLM repository on ...
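
SLM is a modularity-based community detection algorithm: its local moving steps reassign nodes between communities to increase modularity. As a small illustration of the quantity being optimised (not the SLM code from the repository, and not a Neo4j procedure), here is a sketch that computes the modularity of a given partition of an undirected, unweighted graph. The `Modularity` class and its method are hypothetical names:

```java
import java.util.*;

class Modularity {
    // Modularity Q of a partition of an undirected, unweighted graph:
    //   Q = sum over communities c of [ L_c / m - (d_c / (2m))^2 ]
    // where m is the number of edges, L_c the number of edges inside c,
    // and d_c the total degree of the nodes in c.
    static double modularity(int[][] edges, int[] community) {
        int m = edges.length;
        if (m == 0) throw new IllegalArgumentException("graph has no edges");
        Map<Integer, Integer> internal = new HashMap<>();
        Map<Integer, Integer> degreeSum = new HashMap<>();
        for (int[] e : edges) {
            int cu = community[e[0]];
            int cv = community[e[1]];
            // Each edge contributes one degree to each endpoint's community.
            degreeSum.merge(cu, 1, Integer::sum);
            degreeSum.merge(cv, 1, Integer::sum);
            if (cu == cv) internal.merge(cu, 1, Integer::sum);
        }
        double q = 0.0;
        for (Map.Entry<Integer, Integer> e : degreeSum.entrySet()) {
            double lc = internal.getOrDefault(e.getKey(), 0);
            double dc = e.getValue();
            q += lc / m - Math.pow(dc / (2.0 * m), 2);
        }
        return q;
    }

    public static void main(String[] args) {
        // Two disconnected triangles, each placed in its own community.
        int[][] edges = {{0, 1}, {1, 2}, {0, 2}, {3, 4}, {4, 5}, {3, 5}};
        int[] split = {0, 0, 0, 1, 1, 1};
        System.out.println(modularity(edges, split)); // prints 0.5
    }
}
```

A local moving step would repeatedly try moving single nodes to a neighbouring community and keep the move whenever this value goes up; SLM adds further refinement on top of that.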

Clojure: First steps with reducers

I’ve been playing around with Clojure a bit today in preparation for a talk I’m giving next week and found myself writing the following code to apply the same function to three different scores:

(defn log2 [n]
  (/ (Math/log n) (Math/log 2)))

(defn score-item [n]
  (if (= n 0) 0 (log2 n)))

(+ (score-item 12) (score-item 13) (score-item ...
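
For comparison, here is a minimal Java sketch of the same computation, mapping `score-item` over the scores and then reducing with `+`, which is the map-then-reduce shape the post's title suggests it goes on to express with reducers. The class and method names are hypothetical translations of the Clojure functions:

```java
import java.util.stream.IntStream;

class ScoreItems {
    // Same definition as the Clojure log2: log base 2 via natural logarithms.
    static double log2(double n) {
        return Math.log(n) / Math.log(2);
    }

    // score-item: 0 for a zero score, otherwise log2 of the score.
    static double scoreItem(int n) {
        return n == 0 ? 0.0 : log2(n);
    }

    // Map every score through scoreItem, then reduce the results with +.
    static double totalScore(int... scores) {
        return IntStream.of(scores).mapToDouble(ScoreItems::scoreItem).sum();
    }

    public static void main(String[] args) {
        // 12 and 13 come from the excerpt; the third score is truncated there, so it is left out.
        System.out.println(totalScore(12, 13));
    }
}
```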

Neo4j: Specific relationship vs Generic relationship + property

For optimal traversal speed in Neo4j queries we should make our relationship types as specific as possible. Let’s take a look at an example from the ‘modelling a recommendations engine’ talk I presented at Skillsmatter a couple of weeks ago. I needed to decide how to model the ‘RSVP’ relationship between a Member and an Event. A person can RSVP ...
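
The excerpt stops before the model itself, so the relationship names used here (`RSVP_YES`, `RSVP_NO`, and a generic `RSVP` with a `response` property) are hypothetical. As a toy in-memory analogy, not a description of how Neo4j stores relationships, this sketch contrasts answering "which members said yes?" when the answer is encoded in the relationship type versus in a property:

```java
import java.util.*;

class RsvpModel {
    // Toy analogy only: this is not how Neo4j stores relationships.
    static final class Rsvp {
        final String member;
        final String response;
        Rsvp(String member, String response) { this.member = member; this.response = response; }
    }

    // Generic model: every RSVP under one hypothetical "RSVP" type with a "response" property.
    private final List<Rsvp> generic = new ArrayList<>();
    // Specific model: hypothetical relationship type name -> members, e.g. "RSVP_YES" -> [...].
    private final Map<String, List<String>> specific = new HashMap<>();

    void add(String member, String response) {
        generic.add(new Rsvp(member, response));
        String type = "RSVP_" + response.toUpperCase(Locale.ROOT);
        specific.computeIfAbsent(type, k -> new ArrayList<>()).add(member);
    }

    // Generic: every RSVP has to be visited and its property checked.
    List<String> yesGeneric() {
        List<String> out = new ArrayList<>();
        for (Rsvp r : generic) {
            if (r.response.equals("yes")) out.add(r.member);
        }
        return out;
    }

    // Specific: go straight to the bucket for the one type we care about.
    List<String> yesSpecific() {
        return specific.getOrDefault("RSVP_YES", List.of());
    }

    public static void main(String[] args) {
        RsvpModel model = new RsvpModel();
        model.add("alice", "yes");
        model.add("bob", "no");
        model.add("carol", "yes");
        System.out.println(model.yesGeneric());  // prints [alice, carol]
        System.out.println(model.yesSpecific()); // prints [alice, carol]
    }
}
```

Both queries return the same members; the difference is how much data has to be looked at to get there.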

Hadoop: HDFS – java.lang.NoSuchMethodError: org.apache.hadoop.fs.FSOutputSummer.<init>(Ljava/util/zip/Checksum;II)V

I wanted to write a little program to check that one machine could communicate with an HDFS server running on the other, and adapted some code from the Hadoop wiki as follows:

package org.playground;

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.FSDataInputStream;
import org.apache.hadoop.fs.FSDataOutputStream;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;

import java.io.IOException;

public class HadoopDFSFileReadWrite {

  static void printAndExit(String str) {
    System.err.println( str ...
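
The excerpt is cut off before the cause is explained. In general, a NoSuchMethodError means the class found at runtime differs from the one the code was compiled against, often because two versions of the same library end up on the classpath. As a minimal diagnostic sketch (a general JVM technique, not the fix from the post; `WhichJar` is a hypothetical name), this prints the jar or directory each named class was actually loaded from:

```java
import java.net.URL;
import java.security.CodeSource;

class WhichJar {
    // Reports where a class was loaded from, which helps spot an older or duplicate
    // jar shadowing the one the program was compiled against.
    static String whereLoadedFrom(Class<?> cls) {
        CodeSource source = cls.getProtectionDomain().getCodeSource();
        if (source == null || source.getLocation() == null) {
            return "<bootstrap or unknown>"; // e.g. core JDK classes have no code source
        }
        URL location = source.getLocation();
        return location.toString();
    }

    public static void main(String[] args) throws ClassNotFoundException {
        // Pass fully qualified class names (for example org.apache.hadoop.fs.FSOutputSummer)
        // and run with the same classpath as the failing program.
        for (String name : args) {
            // initialize = false: load the class without running its static initialisers.
            Class<?> cls = Class.forName(name, false, WhichJar.class.getClassLoader());
            System.out.println(name + " -> " + whereLoadedFrom(cls));
        }
    }
}
```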

SparkR: Add new column to data frame by concatenating other columns

Continuing with my exploration of the Land Registry open data set using SparkR, I wanted to see which road in the UK has had the most property sales over the last 20 years. To recap, this is what the data frame looks like:

./spark-1.5.0-bin-hadoop2.6/bin/sparkR --packages com.databricks:spark-csv_2.11:1.2.0

> sales <- read.df(sqlContext, "pp-complete.csv", "com.databricks.spark.csv", header="false")

> head(sales)
C0 C1 C2 ...
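
As a plain-Java sketch of the same idea (not SparkR), assuming the relevant fields are a street and a town, since the excerpt doesn't say which of the C0, C1, ... columns hold them: concatenating the two fields into one derived key presumably keeps identically named roads in different towns apart, and counting per key then gives the busiest road. The `Sale` class and helper names are hypothetical:

```java
import java.util.*;
import java.util.stream.Collectors;

class MostSalesRoad {
    // Hypothetical minimal sale row; the real CSV has unnamed columns C0, C1, ...
    static final class Sale {
        final String street;
        final String town;
        Sale(String street, String town) { this.street = street; this.town = town; }
    }

    // The "new column": street and town concatenated into a single key.
    static String roadKey(Sale s) {
        return s.street + ", " + s.town;
    }

    // Count sales per concatenated key and return the key with the highest count (if any).
    static Optional<Map.Entry<String, Long>> busiestRoad(List<Sale> sales) {
        Map<String, Long> counts = sales.stream()
                .collect(Collectors.groupingBy(MostSalesRoad::roadKey, Collectors.counting()));
        return counts.entrySet().stream().max(Map.Entry.comparingByValue());
    }

    public static void main(String[] args) {
        List<Sale> sales = List.of(
                new Sale("High Street", "Town A"),
                new Sale("High Street", "Town B"),
                new Sale("High Street", "Town A"));
        busiestRoad(sales).ifPresent(e -> System.out.println(e.getKey() + ": " + e.getValue()));
        // prints High Street, Town A: 2
    }
}
```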

Unix: Redirecting stderr to stdout

I’ve been trying to optimise some Neo4j import queries over the last couple of days, and as part of the script I’ve been executing I wanted to redirect the output of a couple of commands into a file to parse afterwards. I started with the following script, which doesn’t do any explicit redirection of the output:

#!/bin/sh

./neo4j-community-2.2.3/bin/neo4j start ...
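
In a POSIX shell, sending both streams to one file is usually written `command > file 2>&1`, and the order of the two redirections matters. If the same commands were driven from Java rather than a shell script, a minimal sketch using the standard `ProcessBuilder` API would look like this (the class name and log file name are hypothetical):

```java
import java.io.File;
import java.io.IOException;

class MergeStderr {
    // Java counterpart of the shell idiom `command > file 2>&1`:
    // stderr is merged into stdout and the combined stream is written to a file.
    static int runToFile(File log, String... command) throws IOException, InterruptedException {
        ProcessBuilder pb = new ProcessBuilder(command);
        pb.redirectErrorStream(true); // 2>&1
        pb.redirectOutput(log);       // > file (truncates an existing file)
        return pb.start().waitFor();  // wait for the process and return its exit code
    }

    public static void main(String[] args) throws Exception {
        File log = new File("neo4j-start.log"); // hypothetical log file name
        int exit = runToFile(log, "sh", "-c", "echo to-stdout; echo to-stderr 1>&2");
        System.out.println("exit=" + exit + ", output in " + log.getAbsolutePath());
    }
}
```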
