Tag Archives: MapR

Perfecting Lambda Architecture with Oracle Data Integrator (and Kafka / MapR Streams)

“Lambda architecture is a data-processing architecture designed to handle massive quantities of data by taking advantage of both batch- and stream-processing methods. This approach to architecture attempts to balance latency, throughput, and fault-tolerance by using batch processing to provide comprehensive and accurate views of batch data, while simultaneously using real-time stream processing to provide views of online data. The two view outputs may be joined before presentation. The rise ...

Getting Started with Kafka REST Proxy for MapR Streams

MapR Ecosystem Package 2.0 (MEP) is coming with some new features related to MapR Streams: Kafka REST Proxy for MapR Streams provides a RESTful interface to MapR Streams and Kafka clusters, making it easy to consume and produce messages as well as perform administrative operations. Kafka Connect for MapR Streams is a utility for streaming data between MapR Streams and Apache Kafka ...

Getting Started With Kafka REST Proxy for MapR Streams

Introduction: MapR Ecosystem Package 2.0 (MEP) is coming with some new features related to MapR Streams: Kafka REST Proxy for MapR Streams provides a RESTful interface to MapR Streams and Kafka clusters to consume and produce messages and to perform administrative operations. Kafka Connect for MapR Streams is a utility for streaming data between MapR Streams and Apache Kafka and ...

Processing Image Documents on MapR at Scale

There has been a lot of research in document image processing over the past 20 years, but not much research has been done in terms of parallel processing. Some of the solutions proposed for parallel processing have been to create threads of execution for each image, or to use GNU Parallel. In this blog post, you will learn how to ...

Connecting Pentaho Data Integration to MapR Using Apache Drill

Pentaho Data Integration (PDI) provides the ETL capabilities that facilitate the process of capturing, cleansing, and storing data. Its uniform and consistent format makes it accessible and relevant to end-users and IoT technologies. Apache Drill is a schema-free SQL-on-Hadoop engine that lets you run SQL queries against different data sets with various formats, e.g. JSON, CSV, Parquet, HBase, etc. By ...

How to Get Started with Spark Streaming and MapR Streams Using the Kafka API

This post will help you get started using Apache Spark Streaming for consuming and publishing messages with MapR Streams and the Kafka API. Spark Streaming is an extension of the core Spark API that enables continuous data stream processing. MapR Streams is a distributed messaging system for streaming event data at scale. MapR Streams enables producers and consumers to exchange events in real time via ...
