Home » Tag Archives: Big Data

Tag Archives: Big Data

Big Data Skills Spectrum

software-development-2-logo

Big Data has been a hype for many years. I’ve seen a few “Big Data” projects start in the past with a lot of fanfare and promise.The promise has always been that “we will start getting a holistic picture of our departmental silos and gain numerous insights from our data that will help us get ahead of our competitors”. The ...

Read More »

What are the 5 Trends for Testing in the Era of Big Data?

software-development-2-logo

In today’s world of data explosion, big data applications and their implementations are growing dramatically. As data is at the heart of any big data application, it is important to understand the characteristics of big data. The three most unique characteristics of big data are ‘Volume’, ‘Velocity’ and ‘Variety’. And these data comes in different format from multiple channels. All ...

Read More »

Running PageRank Hadoop job on AWS Elastic MapReduce

apache-hadoop-logo

In a previous post I described an example to perform a PageRank calculation which is part of the Mining Massive Dataset course with Apache Hadoop. In that post I took an existing Hadoop job in Java and modified it somewhat (added unit tests and made file paths set by a parameter). This post shows how to use this job on ...

Read More »

Calculate PageRanks with Apache Hadoop

apache-hadoop-logo

Currently I am following the Coursera training ‘Mining Massive Datasets‘. I have been interested in MapReduce and Apache Hadoop for some time and with this course I hope to get more insight in when and how MapReduce can help to fix some real world business problems (another way to do so I described here). This Coursera course is mainly focussing ...

Read More »

Even Doctors Will Be Data Scientists

software-development-2-logo

We all know how it works. You walk into a doctor’s office complaining about some pain in your leg or otherwise. They take your temperature, get you on the scale, check your blood pressure, and perhaps even get out the rubber hammer. These measurements are simply snapshots at one particular instant in time and may be subject to error. This ...

Read More »

How to: Refine Hive ZooKeeper Lock Manager Implementation

apache-zookeeper-logo

Hive has been using ZooKeeper as distributed lock manager to support concurrency in HiveServer2. The ZooKeeper-based lock manager works fine in a small scale environment. However, as more and more users move to HiveServer2 from HiveServer and start to create a large number of concurrent sessions, problems can arise. The major problem is that the number of open connections between ...

Read More »

How to Analyze Highly Dynamic Datasets with Apache Drill

java-interview-questions-answers

Today’s data is dynamic and application-driven. The growth of a new era of business applications driven by industry trends such as web/social/mobile/IOT are generating datasets with new data types and new data models. These applications are iterative, and the associated data models typically are semi-structured, schema-less and constantly evolving. Semi-structured where an element can be complex/nested, and schema-less with its ...

Read More »

Hadoop and the OpenDataPlatform

apache-hadoop-logo

Pivotal, IBM and Hortonworks announced today the “Open Data Platform” (ODP) – an attempt to standardize Hadoop. This move seems to be backed up by IBM, Teradata and others that appear as sponsors on the initiative site. This move has a lot of potential and a few possible downsides. ODP promises standardization – Cloudera’s Mike Olson downplays the importance of this ...

Read More »

Streaming Big Data: Storm, Spark and Samza

apache-spark-logo

There are a number of distributed computation systems that can process Big Data in real time or near-real time. This article will start with a short description of three Apache frameworks, and attempt to provide a quick, high-level overview of some of their similarities and differences. Apache Storm In Storm, you design a graph of real-time computation called a topology, and feed it to the ...

Read More »

Lambda Architecture for Big Data

apache-hadoop-logo

An increasing number of systems are being built to handle the Volume, Velocity and Variety of Big Data, and hopefully help gain new insights and make better business decisions. Here, we will look at ways to deal with Big Data’s Volume and Velocity simultaneously, within a single architecture solution. Volume + Velocity Apache Hadoop provides both reliable storage (HDFS) and a processing system (MapReduce) for large data ...

Read More »
Want to take your Java Skills to the next level?
Grab our programming books for FREE!
  • Save time by leveraging our field-tested solutions to common problems.
  • The books cover a wide range of topics, from JPA and JUnit, to JMeter and Android.
  • Each book comes as a standalone guide (with source code provided), so that you use it as reference.
Last Step ...

Where should we send the free eBooks?

Good Work!
To download the books, please verify your email address by following the instructions found on the email we just sent you.