Tag Archives: Apache Hadoop

Run Scala implemented Hadoop Jobs on HDInsight

Previously we set up a Scala application to execute a simple word count on Hadoop. What comes next is uploading our application to HDInsight, so we shall proceed to create a Hadoop cluster on HDInsight. Then we will create the Hadoop cluster. As you can see, we specify the admin console ...

WordCount on Hadoop with Scala

Hadoop is a great technology built with Java. Today we will use Scala to implement a simple MapReduce job and then run it using HDInsight. We shall add the assembly plugin to our assembly.sbt: addSbtPlugin("com.eed3si9n" % "sbt-assembly" % "0.14.3"). Then we will add the Hadoop core dependency to our build.sbt file. We will also apply some configuration in the ...
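To make the word-count idea concrete, here is a minimal plain-Java sketch that simulates the map, shuffle, and reduce phases in memory. It is only an illustration under stated assumptions: it does not use Hadoop at all (the article itself works in Scala against the Hadoop dependency), the class `WordCountSketch` and its methods are hypothetical names, and the tokenization rule (lowercase, split on non-word characters) is an assumption rather than something taken from the article.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

// Hypothetical class: an in-memory simulation of a MapReduce word count,
// not the Hadoop Mapper/Reducer API.
public class WordCountSketch {

    // Map phase: emit a (word, 1) pair for every token in one input line.
    static List<Map.Entry<String, Integer>> map(String line) {
        List<Map.Entry<String, Integer>> pairs = new ArrayList<>();
        // Assumed tokenization: lowercase, split on runs of non-word characters.
        for (String token : line.toLowerCase().split("\\W+")) {
            if (!token.isEmpty()) {
                pairs.add(Map.entry(token, 1));
            }
        }
        return pairs;
    }

    // Shuffle phase: group emitted values by key; TreeMap keeps keys sorted.
    static Map<String, List<Integer>> shuffle(List<Map.Entry<String, Integer>> pairs) {
        Map<String, List<Integer>> grouped = new TreeMap<>();
        for (Map.Entry<String, Integer> pair : pairs) {
            grouped.computeIfAbsent(pair.getKey(), k -> new ArrayList<>()).add(pair.getValue());
        }
        return grouped;
    }

    // Reduce phase: sum all the counts collected for one word.
    static int reduce(String word, List<Integer> counts) {
        int sum = 0;
        for (int count : counts) {
            sum += count;
        }
        return sum;
    }

    // Runs map over every line, then shuffle, then reduce per word.
    public static Map<String, Integer> wordCount(List<String> lines) {
        List<Map.Entry<String, Integer>> pairs = new ArrayList<>();
        for (String line : lines) {
            pairs.addAll(map(line));
        }
        Map<String, Integer> result = new TreeMap<>();
        for (Map.Entry<String, List<Integer>> group : shuffle(pairs).entrySet()) {
            result.put(group.getKey(), reduce(group.getKey(), group.getValue()));
        }
        return result;
    }

    public static void main(String[] args) {
        Map<String, Integer> counts = wordCount(List.of("Hello Hadoop", "hello Scala"));
        System.out.println(counts); // prints {hadoop=1, hello=2, scala=1}
    }
}
```

In a real Hadoop job the map and reduce steps would live in `Mapper` and `Reducer` subclasses, and the framework itself would perform the shuffle and sort between them across the cluster.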

Hadoop: DataNode not starting

In my continued playing with Mahout, I eventually decided to give up using my local file system and use a local Hadoop instead, since that seems to have much less friction when following any examples. Unfortunately, all my attempts to upload any files from my local file system to HDFS were being met with the following exception: java.io.IOException: File /user/markneedham/book2.txt ...

Apache Hadoop HDFS Data Node Apache Mesos Framework

Intro: This project allows running HDFS on Mesos. You should be familiar with HDFS and Mesos basics: http://mesos.apache.org/documentation/latest/ https://hadoop.apache.org/docs/r2.7.2/hdfs_design.html Project requires: Mesos 0.23.0+, JDK 1.7.x, Hadoop 1.2.x or 2.7.x. Mesos in Vagrant: The project includes a Vagrant environment that allows you to run a Mesos cluster locally. If you are going to use an external Mesos cluster, you can skip this section. 1. ...

Solving Problems with the Right Technology: Hadoop and RDBMS

In some circles today there is an ongoing ‘Hadoop vs. RDBMS’ debate of sorts. Often the discussion casts Hadoop as the obvious heir apparent in the data processing world, with RDBMS cast as your father’s Oldsmobile. This debate is somewhat misdirected, and the discussion could lead organizations away from the strategy they really should be following, namely a strategy of ...

Key Tips for Managing Passwords in Sqoop

Sqoop is a popular data transfer tool for Hadoop. Sqoop allows easy import and export of data to and from structured data stores like relational databases, enterprise data warehouses, and NoSQL datastores. Sqoop also integrates with Hadoop-based systems such as Hive, HBase, and Oozie. In this blog post, I will cover the different options available for managing passwords in Sqoop. Sqoop is ...

Apache Hadoop Tutorial – The ULTIMATE Guide (PDF Download)

EDITORIAL NOTE: Apache Hadoop is an open-source software framework written in Java for distributed storage and distributed processing of very large data sets on computer clusters built from commodity hardware. All the modules in Hadoop are designed with a fundamental assumption that hardware failures are common and should be automatically handled by the framework. Hadoop has become the de facto tool ...

The Lord of the Things: Spark or Hadoop?

Are people in your data analytics organization contemplating the impending data avalanche from the Internet of Things and thus asking this question: “Spark or Hadoop?” That’s the wrong question! The Internet of Things (IoT) will generate massive quantities of data. In most cases, these will be streaming data from ubiquitous sensors and devices. Often, we will need to make real-time ...
