Home » Tag Archives: Apache Hadoop

Tag Archives: Apache Hadoop

Delta Architectures: Unifying the Lambda Architecture and leveraging Storm from Hadoop/REST

apache-hadoop-logo

Recently, I’ve been asked by a bunch of people to go into more detail on the Druid/Storm integration that I wrote for our book: Storm Blueprints for Distributed Real-time Computation.  Druid is great. Storm is great. And the two together appear to solve the real-time dimensional query/aggregations problem. In fact, it looks like people are taking it mainstream, calling it ...

Read More »

Running PageRank Hadoop job on AWS Elastic MapReduce

apache-hadoop-logo

In a previous post I described an example to perform a PageRank calculation which is part of the Mining Massive Dataset course with Apache Hadoop. In that post I took an existing Hadoop job in Java and modified it somewhat (added unit tests and made file paths set by a parameter). This post shows how to use this job on ...

Read More »

Calculate PageRanks with Apache Hadoop

apache-hadoop-logo

Currently I am following the Coursera training ‘Mining Massive Datasets‘. I have been interested in MapReduce and Apache Hadoop for some time and with this course I hope to get more insight in when and how MapReduce can help to fix some real world business problems (another way to do so I described here). This Coursera course is mainly focussing ...

Read More »

Hadoop and the OpenDataPlatform

apache-hadoop-logo

Pivotal, IBM and Hortonworks announced today the “Open Data Platform” (ODP) – an attempt to standardize Hadoop. This move seems to be backed up by IBM, Teradata and others that appear as sponsors on the initiative site. This move has a lot of potential and a few possible downsides. ODP promises standardization – Cloudera’s Mike Olson downplays the importance of this ...

Read More »

Lambda Architecture for Big Data

apache-hadoop-logo

An increasing number of systems are being built to handle the Volume, Velocity and Variety of Big Data, and hopefully help gain new insights and make better business decisions. Here, we will look at ways to deal with Big Data’s Volume and Velocity simultaneously, within a single architecture solution. Volume + Velocity Apache Hadoop provides both reliable storage (HDFS) and a processing system (MapReduce) for large data ...

Read More »

Open Source Cloud Formation with Minotaur for Mesos, Kafka and Hadoop

apache-hadoop-logo

Today I am happy to announce “Minotaur” which is our Open Source AWS based infrastructure for managing big data open source projects including (but not limited too): Apache Kafka, Apache Mesos and Cloudera’s Distribution of Hadoop. Minotaur is based on AWS Cloud Formation. The following labs are currently supported:           Apache Mesos Apache Kafka Apache Zookeeper Cloudera Hadoop ...

Read More »

Hazelcast member discovery using Curator and ZooKeeper

apache-hadoop-logo

At one project I was setting up Hazelcast cluster in a private cloud. Within cluster all nodes must see each other, so during bootstrapping Hazelcast will try to locate other cluster members. There is no server and all nodes are made equal. There are couple techniques of discovering members implemented in Hazelcast; unfortunately it wasn’t AWS so we couldn’t use ...

Read More »

Big Data… Is Hadoop the good way to start?

apache-hadoop-logo

In the past 2 years, I have met many developers, architects that are working on “big data” projects. This sounds amazing, but quite often the truth is not that amazing. TL;TR You believe that you have a big data project?           Do not start with the installation of an Hadoop Cluster — the “how“ Start to ...

Read More »

Data as a Service: JBoss Data Virtualization and Hadoop powering your Big Data solutions

apache-hadoop-logo

Red Hat and Cloudera, announce the formation of a strategic alliance. From JBoss perspective, the key objective of the alliance is to leverage big data enterprise-wide and not let Hadoop become another data silo. Cloudera combined with Red Hat JBoss Data Virtualization integrates Hadoop with existing information sources including data warehouses, SQL and NoSQL databases, enterprise and cloud applications, and ...

Read More »

Introducing Hadoop Development Tools

apache-hadoop-logo

Few days back Apache Hadoop Development Tools a.k.a. HDT was released.  The projects aims at bringing plugins in eclipse to simplify development on Hadoop platform. This blog aims to provide an overview of few great features of HDT. Single Endpoint The project can act as a single endpoint for your HDFS, Zookeeper and MR Cluster. You can connect to your HDFS/Zookeeper instance ...

Read More »
Do you want to know how to develop your skillset and become a ...

Subscribe to our newsletter to start Rocking right now!

To get you started we give you our best selling eBooks for FREE!
Get ready to Rock!
To download the books, please verify your email address by following the instructions found on the email we just sent you.

THANK YOU!

Close