Tag Archives: Big Data

Even Doctors Will Be Data Scientists

We all know how it works. You walk into a doctor’s office complaining about a pain in your leg or some other ailment. They take your temperature, get you on the scale, check your blood pressure, and perhaps even get out the rubber hammer. These measurements are simply snapshots at one particular instant in time and may be subject to error. This ...

How to: Refine Hive ZooKeeper Lock Manager Implementation

Hive has been using ZooKeeper as a distributed lock manager to support concurrency in HiveServer2. The ZooKeeper-based lock manager works fine in a small-scale environment. However, as more and more users move from HiveServer to HiveServer2 and start to create a large number of concurrent sessions, problems can arise. The major problem is that the number of open connections between ...

How to Analyze Highly Dynamic Datasets with Apache Drill

Today’s data is dynamic and application-driven. A new era of business applications, driven by industry trends such as web, social, mobile and IoT, is generating datasets with new data types and new data models. These applications are iterative, and the associated data models are typically semi-structured, schema-less and constantly evolving. Semi-structured, where an element can be complex/nested, and schema-less, with its ...

Hadoop and the OpenDataPlatform

Pivotal, IBM and Hortonworks announced today the “Open Data Platform” (ODP) – an attempt to standardize Hadoop. The move seems to be backed by IBM, Teradata and others that appear as sponsors on the initiative site. It has a lot of potential and a few possible downsides. ODP promises standardization – Cloudera’s Mike Olson downplays the importance of this ...

Streaming Big Data: Storm, Spark and Samza

There are a number of distributed computation systems that can process Big Data in real time or near-real time. This article will start with a short description of three Apache frameworks, and attempt to provide a quick, high-level overview of some of their similarities and differences. Apache Storm: In Storm, you design a graph of real-time computation called a topology, and feed it to the ...

Lambda Architecture for Big Data

An increasing number of systems are being built to handle the Volume, Velocity and Variety of Big Data, and hopefully help gain new insights and make better business decisions. Here, we will look at ways to deal with Big Data’s Volume and Velocity simultaneously, within a single architecture solution. Volume + Velocity: Apache Hadoop provides both reliable storage (HDFS) and a processing system (MapReduce) for large data ...

Open Source Cloud Formation with Minotaur for Mesos, Kafka and Hadoop

Today I am happy to announce “Minotaur”, our open source, AWS-based infrastructure for managing big data open source projects including (but not limited to) Apache Kafka, Apache Mesos and Cloudera’s Distribution of Hadoop. Minotaur is based on AWS CloudFormation. The following labs are currently supported: Apache Mesos, Apache Kafka, Apache ZooKeeper, Cloudera Hadoop ...

Hazelcast member discovery using Curator and ZooKeeper

On one project I was setting up a Hazelcast cluster in a private cloud. Within the cluster all nodes must see each other, so during bootstrapping Hazelcast will try to locate other cluster members. There is no server and all nodes are equal. There are a couple of techniques for discovering members implemented in Hazelcast; unfortunately it wasn’t AWS so we couldn’t use ...

Big Data… Is Hadoop the good way to start?

In the past 2 years, I have met many developers and architects who are working on “big data” projects. This sounds amazing, but quite often the truth is not that amazing. TL;DR: You believe that you have a big data project? Do not start with the installation of a Hadoop cluster (the “how”). Start to ...

ZooKeeper on Kubernetes

For the last couple of weeks I’ve been playing around with Docker and Kubernetes. If you are not familiar with Kubernetes, let’s just say for now that it’s an open source container cluster management implementation, which I find really, really awesome. One of the first things I wanted to try out was running an Apache ZooKeeper ensemble inside Kubernetes and I ...
