Apache Hadoop Tutorial

About the author

Martin is a software engineer with more than 10 years of experience in software development. He has worked in various roles on a wide range of software projects, from reusable software components and mobile applications to fat-client GUI projects and large-scale, clustered enterprise applications with real-time requirements.

Martin is a Java EE enthusiast and works for an internationally operating company. He is interested in clean code and the software craftsmanship approach. He also strongly believes in automated testing and continuous integration. His current interests include Java EE, web applications with a focus on HTML5, and performance optimization. When time permits, he works on open source projects.

Apache Hadoop is an open-source software framework written in Java for distributed storage and distributed processing of very large data sets on computer clusters built from commodity hardware. All the modules in Hadoop are designed with a fundamental assumption that hardware failures are common and should be automatically handled by the framework.
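
To give a first impression of what a Hadoop application looks like, the following sketch shows the classic WordCount job, which counts how often each word occurs in a set of input files. It uses the org.apache.hadoop.mapreduce API that the MapReduce section of this tutorial covers in detail; the input and output paths are simply taken from the command line, and the class names here are illustrative.

import java.io.IOException;
import java.util.StringTokenizer;

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.io.IntWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Job;
import org.apache.hadoop.mapreduce.Mapper;
import org.apache.hadoop.mapreduce.Reducer;
import org.apache.hadoop.mapreduce.lib.input.FileInputFormat;
import org.apache.hadoop.mapreduce.lib.output.FileOutputFormat;

public class WordCount {

    // The mapper emits a (word, 1) pair for every token in its input split.
    public static class TokenizerMapper
            extends Mapper<Object, Text, Text, IntWritable> {

        private static final IntWritable ONE = new IntWritable(1);
        private final Text word = new Text();

        @Override
        public void map(Object key, Text value, Context context)
                throws IOException, InterruptedException {
            StringTokenizer itr = new StringTokenizer(value.toString());
            while (itr.hasMoreTokens()) {
                word.set(itr.nextToken());
                context.write(word, ONE);
            }
        }
    }

    // The reducer sums up the counts emitted for each word.
    public static class IntSumReducer
            extends Reducer<Text, IntWritable, Text, IntWritable> {

        private final IntWritable result = new IntWritable();

        @Override
        public void reduce(Text key, Iterable<IntWritable> values, Context context)
                throws IOException, InterruptedException {
            int sum = 0;
            for (IntWritable val : values) {
                sum += val.get();
            }
            result.set(sum);
            context.write(key, result);
        }
    }

    public static void main(String[] args) throws Exception {
        Job job = Job.getInstance(new Configuration(), "word count");
        job.setJarByClass(WordCount.class);
        job.setMapperClass(TokenizerMapper.class);
        // Using the reducer as a combiner pre-aggregates counts on the map side.
        job.setCombinerClass(IntSumReducer.class);
        job.setReducerClass(IntSumReducer.class);
        job.setOutputKeyClass(Text.class);
        job.setOutputValueClass(IntWritable.class);
        // Input and output paths are taken from the command line arguments.
        FileInputFormat.addInputPath(job, new Path(args[0]));
        FileOutputFormat.setOutputPath(job, new Path(args[1]));
        System.exit(job.waitForCompletion(true) ? 0 : 1);
    }
}

A job like this is typically packaged as a JAR and submitted to the cluster with the hadoop jar command, passing the input and output paths as arguments.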

Hadoop has become the de facto tool for distributed computing, and for this reason we have published an abundance of tutorials on it here at Java Code Geeks.

Now, we wanted to create a standalone reference post that shows how to work with Hadoop and helps you quickly kick-start your own applications. Enjoy!

Apache Hadoop Tutorial includes:

  1. Introduction
  2. Setup
  3. HDFS
  4. MapReduce
  5. YARN
  6. Download