
Tag Archives: Apache Kafka

Monitoring Real-Time Uber Data Using Spark Machine Learning, Streaming, and the Kafka API (Part 2)

This post is the second part in a series where we will build a real-time example for analysis and monitoring of Uber car GPS trip data. If you have not already read the first part of this series, you should read that first. The first post discussed creating a machine learning model using Apache Spark’s K-means algorithm to cluster Uber data based ...


Kafka Connect on MapR


In this week’s Whiteboard Walkthrough, Ankur Desai, Senior Product Marketing Manager at MapR, describes how Apache Kafka Connect and a REST API simplify and improve agility in working with streaming data from a variety of data sources, including legacy databases and data warehouses. He also explains the differences in this architecture when you use MapR Streams versus Kafka for data ...


Kafka Connect and Kafka REST API on MapR: Streaming Just Became a Whole Lot Easier!

In my previous blog post, I explained the three major components of a streaming architecture: producers, a streaming system, and consumers. Producers (such as Apache Flume) publish event data into a streaming system after collecting it from the data source, transforming it into the desired format, and optionally filtering, aggregating, and enriching it. ...


Using Kafka with JUnit

One of the neat features that the excellent Spring Kafka project provides, apart from an easier-to-use abstraction over the raw Kafka Producer and Consumer, is a way to use Kafka in tests. It does this by providing an embedded version of Kafka that can be set up and torn down very easily. All that a project needs to include this support is ...


Spring Kafka Producer/Consumer sample


My objective here is to show how Spring Kafka provides an abstraction over the raw Kafka Producer and Consumer APIs that is easy to use and familiar to someone with a Spring background. Sample scenario: the scenario is a simple one. I have one system that produces a message and another that processes it. Implementation using ...


How to Get Started with Spark Streaming and MapR Streams Using the Kafka API

This post will help you get started using Apache Spark Streaming for consuming and publishing messages with MapR Streams and the Kafka API. Spark Streaming is an extension of the core Spark API that enables continuous data stream processing. MapR Streams is a distributed messaging system for streaming event data at scale. MapR Streams enables producers and consumers to exchange events in real time via ...


Achieving Order Guarantee in Kafka with Partitioning


One of Kafka's most important features is that it load-balances messages while guaranteeing ordering in a distributed cluster, which would not otherwise be possible with a traditional queue. Let's first try to understand the problem statement. Let us assume we have a topic where messages are sent and there is a consumer who is consuming these ...


How Apache Kafka and MapR Streams Handle Topic Partitions

Streaming data can be used as a long-term auditable history when you choose a messaging system with persistence, but is this approach practical in terms of the cost of storing years of data at scale? The answer is “yes”, particularly because of the way topic partitions are handled in MapR Streams. Here’s how it works. Streaming Data as a Long ...
