“Lambda architecture is a data-processing architecture designed to handle massive quantities of data by taking advantage of both batch- and stream-processing methods. This approach to architecture attempts to balance latency, throughput, and fault-tolerance by using batch processing to provide comprehensive and accurate views of batch data, while simultaneously using real-time stream processing to provide views of online data. The two view outputs may be joined before presentation. The rise ...
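As a rough illustration of the "two view outputs may be joined before presentation" idea in the excerpt above, here is a minimal Python sketch. It is not taken from the linked post: the event records, the `page` field, and the function names `build_view` and `merge_views` are all hypothetical, and plain in-memory lists stand in for the batch store and the stream.

```python
from collections import Counter

# Hypothetical event records; the "page" field is an assumption for illustration.
historical_events = [{"page": "home"}, {"page": "docs"}, {"page": "home"}]
recent_events = [{"page": "home"}, {"page": "blog"}]


def build_view(events):
    """Count events per page; stands in for either the batch job or the stream job."""
    return Counter(event["page"] for event in events)


def merge_views(batch_view, realtime_view):
    """Join the two views before presentation by summing their per-page counts."""
    return batch_view + realtime_view


# Batch layer: comprehensive view over the historical data.
batch_view = build_view(historical_events)
# Speed layer: view over data that arrived after the last batch run.
realtime_view = build_view(recent_events)
# Serving layer: the joined view that a query would read.
serving_view = merge_views(batch_view, realtime_view)
print(dict(serving_view))
```

In a real deployment the batch view would be recomputed periodically and the real-time view reset to cover only the events that arrived since that run; the sketch leaves that bookkeeping out.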
Read More »
Getting Started with Kafka REST Proxy for MapR Streams
MapR Ecosystem Package 2.0 (MEP) comes with some new features related to MapR Streams: Kafka REST Proxy for MapR Streams provides a RESTful interface to MapR Streams and Kafka clusters, making it easy to consume and produce messages as well as perform administrative operations. Kafka Connect for MapR Streams is a utility for streaming data between MapR Streams and Apache Kafka ...
Read More »
Getting Started With Kafka REST Proxy for MapR Streams
Introduction MapR Ecosystem Package 2.0 (MEP) comes with some new features related to MapR Streams: Kafka REST Proxy for MapR Streams provides a RESTful interface to MapR Streams and Kafka clusters to consume and produce messages and to perform administrative operations. Kafka Connect for MapR Streams is a utility for streaming data between MapR Streams and Apache Kafka and ...
Read More »
Deploying a Secure Mini MapR Cluster with Docker on a Single AWS Instance
Introduction If you want to try out the MapR Converged Data Platform to see its unique big data capabilities but don’t have a cluster of hardware immediately available, you still have a few other options. For example, you can spin up a MapR cluster in the cloud using multiple node instances on one of our IaaS partners (Amazon, Azure, etc.). ...
Read More »
Processing Image Documents on MapR at Scale
There has been a lot of research in document image processing over the past 20 years, but not much research has been done in terms of parallel processing. Some of the solutions proposed for parallel processing have been to create threads of execution for each image, or to use GNU Parallel. In this blog post, you will learn how to ...
Read More »
Connecting Pentaho Data Integration to MapR Using Apache Drill
Pentaho Data Integration (PDI) provides the ETL capabilities that facilitate the process of capturing, cleansing, and storing data. Its uniform and consistent format makes it accessible and relevant to end-users and IoT technologies. Apache Drill is a schema-free SQL-on-Hadoop engine that lets you run SQL queries against different data sets with various formats, e.g. JSON, CSV, Parquet, HBase, etc. By ...
Read More »
Connecting a Drill-enabled MapR Cluster to Azure Resources (Part 2)
In my last post, I deployed a MapR cluster to the Azure cloud using the template available through the Azure Marketplace. My goal in doing this was to get a Drill-enabled cluster up and going in Azure as quickly as possible. My emphasis on Azure indicates that I am probably making use of the Microsoft cloud for a broader range ...
Read More »
How to Get Started with Spark Streaming and MapR Streams Using the Kafka API
This post will help you get started using Apache Spark Streaming for consuming and publishing messages with MapR Streams and the Kafka API. Spark Streaming is an extension of the core Spark API that enables continuous data stream processing. MapR Streams is a distributed messaging system for streaming event data at scale. MapR Streams enables producers and consumers to exchange events in real time via ...
Read More »
Integrating MapR With Ruby: Getting started with MapR-DB and MapR Streams on JRuby
MapR Streams and MapR-DB are both very exciting developments in the MapR Converged Data Platform. In this blog post, I’m going to show you how to get Ruby code to natively interact with MapR-DB and MapR Streams. I am a Ruby developer, and existing Ruby clients/libraries for HBase and Kafka just weren’t working properly with the MapR equivalents. So I ...
Read More »