Tag Archives: Big Data

Distributed Stream and Graph Processing with Apache Flink

Apache Flink is a top-level Apache project that unifies distributed stream and batch processing. At the core of Apache Flink is a streaming dataflow engine that provides data distribution, communication, and fault tolerance for distributed computations over data streams. On August 27, the Bay Area Apache Flink Meetup held another event, hosted by MapR. This time, the main topics ...

Architecting Data Intensive Applications – Part 1

Every software application can, in essence, be classified as one of two types: compute-intensive or data-intensive. Some applications fall somewhere between these two extremes. Today I will talk about how to define the high-level architecture for applications that focus on leveraging the data of the enterprise in order to ...

In-memory Data Model and Persistence for Big Data

ORM frameworks help developers interact with relational databases. There are many excellent ORM frameworks for relational databases, such as Hibernate and Apache OpenJPA. Nowadays, big data is emerging, and more and more people develop applications that run on big data. Different kinds of NoSQL databases have been developed to store such ...

Big Data Skills Spectrum

Big Data has been hyped for many years. I’ve seen a few “Big Data” projects start in the past with a lot of fanfare and promise. The promise has always been that “we will start getting a holistic picture of our departmental silos and gain numerous insights from our data that will help us get ahead of our competitors”. The ...

What are the 5 Trends for Testing in the Era of Big Data?

In today’s world of data explosion, big data applications and their implementations are growing dramatically. As data is at the heart of any big data application, it is important to understand the characteristics of big data. The three defining characteristics of big data are ‘Volume’, ‘Velocity’ and ‘Variety’. This data comes in different formats from multiple channels. All ...

Running PageRank Hadoop job on AWS Elastic MapReduce

In a previous post I described an example of performing a PageRank calculation with Apache Hadoop, which is part of the Mining Massive Datasets course. In that post I took an existing Hadoop job in Java and modified it somewhat (I added unit tests and made the file paths configurable through a parameter). This post shows how to use this job on ...

Calculate PageRanks with Apache Hadoop

Currently I am following the Coursera training ‘Mining Massive Datasets’. I have been interested in MapReduce and Apache Hadoop for some time, and with this course I hope to get more insight into when and how MapReduce can help to solve some real-world business problems (I described another way to do so here). This Coursera course is mainly focussing ...

Even Doctors Will Be Data Scientists

We all know how it works. You walk into a doctor’s office complaining about some pain in your leg or otherwise. They take your temperature, get you on the scale, check your blood pressure, and perhaps even get out the rubber hammer. These measurements are simply snapshots at one particular instant in time and may be subject to error. This ...

How to: Refine Hive ZooKeeper Lock Manager Implementation

Hive has been using ZooKeeper as a distributed lock manager to support concurrency in HiveServer2. The ZooKeeper-based lock manager works fine in a small-scale environment. However, as more and more users move to HiveServer2 from HiveServer and start to create a large number of concurrent sessions, problems can arise. The major problem is that the number of open connections between ...

How to Analyze Highly Dynamic Datasets with Apache Drill

Today’s data is dynamic and application-driven. A new era of business applications, driven by industry trends such as web, social, mobile, and IoT, is generating datasets with new data types and new data models. These applications are iterative, and the associated data models are typically semi-structured, schema-less, and constantly evolving. They are semi-structured in that an element can be complex or nested, and schema-less with its ...
