Tag Archives: Apache Hadoop

Hadoop: DataNode not starting

In my continued playing with Mahout I eventually decided to give up using my local file system and use a local Hadoop instead since that seems to have much less friction when following any examples. Unfortunately all my attempts to upload any files from my local file system to HDFS were being met with the following exception: java.io.IOException: File /user/markneedham/book2.txt ...

Apache Hadoop HDFS Data Node Apache Mesos Framework

Intro: This project allows running HDFS on Mesos. You should be familiar with HDFS and Mesos basics: http://mesos.apache.org/documentation/latest/ and https://hadoop.apache.org/docs/r2.7.2/hdfs_design.html. Project requires: Mesos 0.23.0+, JDK 1.7.x, Hadoop 1.2.x or 2.7.x. Mesos in Vagrant: The project includes a Vagrant environment that lets you run a Mesos cluster locally. If you are going to use an external Mesos cluster, you can skip this section. 1. ...

Solving Problems with the Right Technology: Hadoop and RDBMS

In some circles today there is a sort of ‘Hadoop vs. RDBMS’ debate ongoing. Often the discussion casts Hadoop as the obvious heir apparent in the data processing world, with RDBMS cast as your father’s Oldsmobile. This debate is somewhat misdirected and the discussion could lead organizations away from the strategy they really should be following, namely a strategy of ...

Key Tips for Managing Passwords in Sqoop

Sqoop is a popular data transfer tool for Hadoop. Sqoop allows easy import and export of data from structured data stores like relational databases, enterprise data warehouses, and NoSQL datastores. Sqoop also integrates with Hadoop-based systems such as Hive, HBase, and Oozie. In this blog post, I will cover the different options available for managing passwords in Sqoop. Sqoop is ...

Apache Hadoop Tutorial – The ULTIMATE Guide (PDF Download)

EDITORIAL NOTE: Apache Hadoop is an open-source software framework written in Java for distributed storage and distributed processing of very large data sets on computer clusters built from commodity hardware. All the modules in Hadoop are designed with the fundamental assumption that hardware failures are common and should be handled automatically by the framework. Hadoop has become the de facto tool ...

The Lord of the Things: Spark or Hadoop?

Are people in your data analytics organization contemplating the impending data avalanche from the Internet of Things and asking, “Spark or Hadoop?” That’s the wrong question! The Internet of Things (IoT) will generate massive quantities of data. In most cases, these will be streaming data from ubiquitous sensors and devices. Often, we will need to make real-time ...

Mesos and YARN: A tale of two clusters

This is a tale of two siloed clusters. The first cluster is an Apache Hadoop cluster. This is an island whose resources are completely isolated to Hadoop and its processes. The second cluster is the description I give to all resources that are not a part of the Hadoop cluster. I break them up this way because Hadoop manages its ...

What Are The Advanced Apache Hadoop MapReduce Features?

Overview: Basic MapReduce programming explains the workflow, but it does not cover how the MapReduce framework actually works internally. This article will explain the data movement through the MapReduce architecture and the API calls used to do the actual processing. We will also discuss customization techniques and function overriding for application-specific needs. Introduction ...

Tuning Hadoop & Cassandra: Beware of vNodes, Splits and Pages

When running Hadoop jobs against Cassandra, you will want to be careful about a few parameters. Specifically, pay special attention to vNodes, splits and page sizes. vNodes were introduced in Cassandra 1.2. vNodes allow a host to own multiple portions of the token range. This allows for more evenly distributed data, which means nodes can share the burden of a ...
