Tag Archives: Apache Hadoop

Open Source Cloud Formation with Minotaur for Mesos, Kafka and Hadoop

Today I am happy to announce “Minotaur”, our open source, AWS-based infrastructure for managing big data open source projects including (but not limited to) Apache Kafka, Apache Mesos and Cloudera’s Distribution of Hadoop. Minotaur is based on AWS CloudFormation. The following labs are currently supported: Apache Mesos, Apache Kafka, Apache ZooKeeper, Cloudera Hadoop ...

Hazelcast member discovery using Curator and ZooKeeper

On one project I was setting up a Hazelcast cluster in a private cloud. Within a cluster all nodes must see each other, so during bootstrapping Hazelcast tries to locate other cluster members. There is no central server and all nodes are equal. Hazelcast implements a couple of techniques for discovering members; unfortunately it wasn’t AWS so we couldn’t use ...

Big Data… Is Hadoop the good way to start?

In the past 2 years, I have met many developers and architects who are working on “big data” projects. This sounds amazing, but quite often the truth is not that amazing. TL;DR: You believe that you have a big data project? Do not start with the installation of a Hadoop cluster (the “how”). Start to ...

Data as a Service: JBoss Data Virtualization and Hadoop powering your Big Data solutions

Red Hat and Cloudera announce the formation of a strategic alliance. From the JBoss perspective, the key objective of the alliance is to leverage big data enterprise-wide and not let Hadoop become another data silo. Cloudera combined with Red Hat JBoss Data Virtualization integrates Hadoop with existing information sources including data warehouses, SQL and NoSQL databases, enterprise and cloud applications, and ...

Introducing Hadoop Development Tools

A few days back, Apache Hadoop Development Tools (a.k.a. HDT) was released. The project aims to bring plugins to Eclipse that simplify development on the Hadoop platform. This blog aims to provide an overview of a few great features of HDT. Single Endpoint: The project can act as a single endpoint for your HDFS, ZooKeeper and MR cluster. You can connect to your HDFS/ZooKeeper instance ...

Graph Degree Distributions using R over Hadoop

There are two common types of graph engines. One type is focused on providing real-time, traversal-based algorithms over linked-list graphs represented on a single server. Such engines are typically called graph databases, and vendors include Neo4j, OrientDB, DEX, and InfiniteGraph. The other type of graph engine is focused on batch processing using vertex-centric message passing within a graph represented ...

Understanding the World using Tables and Graphs

Organizations make use of data to drive their decision making, enhance their product features, and increase the efficiency of their everyday operations. Data by itself is not useful. However, with data analysis, patterns such as trends, clusters, and predictions can be distilled. The way in which data is analyzed is predicated on the way in which data is structured. ...

ElasticSearch-Hadoop: Indexing product views count and customer top search query from Hadoop to ElasticSearch

This post covers how to use ElasticSearch-Hadoop to read data from a Hadoop system and index it in ElasticSearch. The functionality covered is indexing the product view count and the top search query per customer over the last n days. The analyzed data can then be used on a website to display a customer's recently viewed products, product view counts and top search query string. ...

Apache Hadoop 2.4.0

The Apache community has voted to release Apache Hadoop 2.4.0, so the new release is now available and includes important improvements. The improvements relate not only to HDFS but also to MapReduce. The important improvement in HDFS concerns NameNodes. Multiple independent NameNodes and namespaces are now used that do not require coordination with each other. DataNodes are ...
