Author Archives: Brian ONeill

Charting PagerDuty Incidents over Time (using pandas)

We produce charts for board meetings that show the health of our system (uptime, etc.). Historically, we built them manually, once per quarter. Recently, I set out to create a live dashboard for the same information, starting with production incidents over time. We use PagerDuty to alert on-call staff. Each incident is stored in PagerDuty, which is queryable via the PagerDuty ...

Integrating Syslog with Kinesis: Anticipating Use of the Firehose

On the heels of the Kinesis Firehose announcement, more people are going to be looking to integrate Kinesis with logging systems to expedite and simplify the ingestion of logs into S3 and Redshift. Here is one take on solving that problem: integrating syslog-ng with Kinesis. First, let’s have a look at the syslog-ng configuration. In the syslog-ng configuration, you wire sources ...

Streaming data into HPCC using Java

High Performance Computing Cluster (HPCC) is a distributed processing framework akin to Hadoop, except that it runs programs written in its own Domain Specific Language (DSL) called Enterprise Control Language (ECL). ECL is great, but occasionally you will want to call out to other languages to do the heavy lifting. For example, you may want to leverage an NLP library ...

Tuning Hadoop & Cassandra: Beware of vNodes, Splits and Pages

When running Hadoop jobs against Cassandra, you will want to be careful with a few parameters; specifically, pay attention to vNodes, splits, and page sizes. vNodes, introduced in Cassandra 1.2, allow a host to own multiple portions of the token range. This distributes data more evenly, which means nodes can share the burden of a ...
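
The effect of vNodes on data distribution can be illustrated with a small simulation. This is a minimal sketch, not Cassandra code: it assumes random token assignment, a simplified 0 to 2**64 token space, and hypothetical helpers `ownership` and `spread`. It compares how evenly four hosts share the ring with one token each versus 256 tokens each.

```python
import random

# Simplified token space: 0 .. 2**64 - 1. (Cassandra's Murmur3Partitioner
# uses -2**63 .. 2**63 - 1; shifting the space does not change range sizes.)
RING_SIZE = 2 ** 64


def ownership(hosts, tokens_per_host, seed=0):
    """Hypothetical helper: fraction of the token ring owned by each host.

    Assumes every host picks its tokens uniformly at random.
    """
    rng = random.Random(seed)
    # Place `tokens_per_host` random tokens on the ring for every host.
    tokens = sorted(
        (rng.randrange(RING_SIZE), host)
        for host in hosts
        for _ in range(tokens_per_host)
    )
    owned = {host: 0 for host in hosts}
    # A token owns the range from the previous token (exclusive) up to itself
    # (inclusive); for the first token, "previous" wraps around to the last one.
    for i, (token, host) in enumerate(tokens):
        previous = tokens[i - 1][0]
        owned[host] += (token - previous) % RING_SIZE
    return {host: owned[host] / RING_SIZE for host in hosts}


def spread(shares):
    """Difference between the largest and smallest ownership fraction."""
    return max(shares.values()) - min(shares.values())


if __name__ == "__main__":
    hosts = ["node1", "node2", "node3", "node4"]
    for n in (1, 256):
        shares = ownership(hosts, n)
        print(f"tokens per host={n}: spread={spread(shares):.3f}")
```

With many tokens per host, each host's share is the sum of many small ranges, so the spread between the most and least loaded host tends to shrink; the exact figures depend on the random seed.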

High-Performance Computing Clusters (HPCC) and Cassandra on OS X

Our new parent company, LexisNexis, has one of the world’s largest public records databases: “…our comprehensive collection of more than 46 billion records from more than 10,000 diverse sources—including public, private, regulated, and derived data. You get comprehensive information on approximately 269 million individuals and 277 million unique businesses.” (http://www.lexisnexis.com/en-us/products/public-records.page) And they’ve been managing, analyzing and searching this database for ...

Delta Architectures: Unifying the Lambda Architecture and leveraging Storm from Hadoop/REST

Recently, a number of people have asked me to go into more detail on the Druid/Storm integration that I wrote for our book, Storm Blueprints for Distributed Real-time Computation. Druid is great. Storm is great. And together, the two appear to solve the real-time dimensional query/aggregation problem. In fact, it looks like people are taking it mainstream, calling it ...

Diction in Software Development (i.e. Don’t be a d1ck!)

Over the years, I’ve come to realize how important diction is in software development (and life in general). It may mean the difference between a 15-minute meeting where everyone nods their heads and a day-long battle of egos (especially when you have a room full of passionate people). Here are a couple of key words and phrases I’ve incorporated into ...