Author Archives: Chase Hooley

Handling the Extremes: Scaling and Streaming in Finance

Editor’s Note: At Strata+Hadoop World 2016 in New York, MapR Director of Enterprise Strategy & Architecture Jim Scott gave a presentation on “Handling the Extremes: Scaling and Streaming in Finance.” As Jim explains, agility is king in the world of finance, and a message-driven architecture is a mechanism for building and managing discrete business functionality to enable agility. In order to ...

How to Secure Elasticsearch and Kibana

Elasticsearch (ES) is a search engine based on Lucene. It provides a distributed, multitenant-capable, full-text search engine with an HTTP web interface and schema-free JSON documents. Kibana is an open source data visualization plugin for Elasticsearch. It provides visualization capabilities on top of the content indexed on an Elasticsearch cluster. Users can create bar, line, and scatter plots, or pie charts and maps on top of large volumes of data. ...

The Rationale for Securing Big Data

This blog post is the first in a series based on the ebook The Six Elements of Securing Big Data by security expert and thought leader Davi Ottenheimer. In his book, Davi outlines the rationale and key challenges of securing big data systems and applications. He does so using some great anecdotes and with good humor, making the book a good ...

Apache Spark Packages, from XML to JSON

The Apache Spark community has put a lot of effort into extending Spark. Recently, we wanted to transform an XML dataset into something that was easier to query. We were mainly interested in doing data exploration on top of the billions of transactions that we get every day. XML is a well-known format, but sometimes it can be complicated to ...
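
The post works with a Spark package for XML. As a minimal, non-Spark illustration of the same XML-to-JSON idea, the sketch below flattens a single record using only Python's standard library. The record and its element names (`transaction`, `amount`, `currency`) are hypothetical and chosen for the example.

```python
import json
import xml.etree.ElementTree as ET

# Hypothetical transaction record; element and attribute names are illustrative only.
SAMPLE = """
<transaction id="42">
  <amount>19.99</amount>
  <currency>USD</currency>
</transaction>
"""

def xml_to_dict(elem):
    """Convert an element into a dict: its attributes, then one field per child."""
    record = dict(elem.attrib)
    for child in elem:
        # Leaf children become string fields; nested elements are converted recursively.
        # Note: in this sketch, repeated child tags overwrite each other.
        record[child.tag] = xml_to_dict(child) if len(child) else (child.text or "").strip()
    return record

def xml_to_json(xml_text):
    """Parse one XML document and return it as a JSON string keyed by the root tag."""
    root = ET.fromstring(xml_text.strip())
    return json.dumps({root.tag: xml_to_dict(root)}, sort_keys=True)

if __name__ == "__main__":
    print(xml_to_json(SAMPLE))
```

A Spark-based pipeline would apply the same kind of mapping across billions of records in parallel rather than one document at a time.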

Real-Time Event Streaming: What Are Your Options?

With the Internet of Things expected to bring at least 21 billion devices online by 2020 (according to Gartner), a lot of people are excited about the potential value of event streaming, that is, ingesting and analyzing lots of real-time data for immediate decision-making. But streaming also introduces new concepts and components that need a closer look. This blog post is ...

Counting in Streams: A Hierarchy of Needs

This post is based on the talk I gave at the Strata/Hadoop World conference in San Jose on March 31, 2016. You can find the slide set here, and you can also find this on the dataArtisans blog here. In this post, we focus on a seemingly simple, extremely widespread, but surprisingly difficult (in fact, unsolved) problem in ...
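
To make "continuous counting" concrete, here is a minimal sketch (not the approach from the talk) that counts events per key in fixed, non-overlapping event-time windows. The `(timestamp, key)` event shape and the window size are assumptions. It deliberately ignores out-of-order and late data, which is where real streaming counts become hard.

```python
from collections import defaultdict

def tumbling_counts(events, window_size):
    """Count events per (window_start, key) using fixed, non-overlapping windows.

    events: iterable of (timestamp, key) pairs with numeric timestamps.
    Every event is bucketed purely by its timestamp; a real stream processor
    must also decide when a window is complete and how to handle late events.
    """
    counts = defaultdict(int)
    for ts, key in events:
        window_start = ts - (ts % window_size)  # align to the window boundary
        counts[(window_start, key)] += 1
    return dict(counts)

if __name__ == "__main__":
    stream = [(1, "a"), (3, "b"), (4, "a"), (11, "a"), (12, "a")]
    print(tumbling_counts(stream, window_size=10))
```

Because the whole input is available up front here, the counts are exact. In an unbounded stream, the same logic has to emit results while data is still arriving.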

Can Event Streaming Make My Business More Productive?

Editor’s Note: Download the free O’Reilly ebook, “Streaming Architecture: New Designs Using Apache Kafka and MapR Streams” to learn how event streaming can make your business more productive. Can we agree at the outset that modern businesses rely heavily on data to make critical decisions, and the ability to make decisions in real time is very valuable? Good. So what keeps us from ...

Key Steps for Removing the Hive Metastore Password from the Hive Configuration

In a typical Hive installation that keeps its metastore in MySQL, the metastore database password is stored in clear text in a configuration file. This presents a few risks: 1) Unauthorized access could destroy or modify Hive metadata and disrupt workflows; a malicious user could alter Hive permissions or damage metadata. 2) This password permits hiveserver2-thrift-MySQL communication. To avoid this problem, you should use ...

Spark Data Source API: Extending Our Spark SQL Query Engine

In my last post, Apache Spark as a Distributed SQL Engine, we explained how we could use SQL to query our data stored within Hadoop. Our engine is capable of reading CSV files from a distributed file system, auto-discovering the schema from the files, and exposing them as tables through the Hive metastore. All this was done to ...
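
The excerpt mentions auto-discovering a schema from CSV files. As a rough, standalone illustration (not the Spark Data Source API and not the post's code), the sketch below infers a type per column by trying int, then float, then falling back to string, and widens a column's type when rows disagree. The type names and the widening order are assumptions made for the example.

```python
import csv
import io

# Widening order used when rows disagree about a column's type (assumed for this sketch).
RANK = {"int": 0, "float": 1, "string": 2}

def infer_type(value):
    """Return the narrowest of 'int', 'float', 'string' that can parse value."""
    for name, cast in (("int", int), ("float", float)):
        try:
            cast(value)
            return name
        except (TypeError, ValueError):
            pass
    return "string"

def infer_schema(csv_text):
    """Infer a {column: type} mapping from CSV text that has a header row."""
    schema = {}
    for row in csv.DictReader(io.StringIO(csv_text)):
        for col, value in row.items():
            t = infer_type(value)
            # Keep the widest type seen so far for this column.
            if col not in schema or RANK[t] > RANK[schema[col]]:
                schema[col] = t
    return schema

if __name__ == "__main__":
    data = "id,price,name\n1,9.5,apple\n2,10,pear\n"
    print(infer_schema(data))
```

Scanning every row is fine for a small file. A distributed engine would typically infer the schema from a sample or merge per-partition results.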

Key Tips for Managing Passwords in Sqoop

Sqoop is a popular data transfer tool for Hadoop. Sqoop allows easy import and export of data to and from structured data stores like relational databases, enterprise data warehouses, and NoSQL datastores. Sqoop also integrates with Hadoop-based systems such as Hive, HBase, and Oozie. In this blog post, I will cover the different options available for managing passwords in Sqoop. Sqoop is ...
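
One alternative Sqoop offers to a clear-text `--password` on the command line is `--password-file`, which points at a file holding the password (typically on HDFS, with restricted permissions). The excerpt does not say which options the full post recommends. The minimal Python sketch below only assembles such an import command as an argument list. The JDBC URL, user name, table, and file path are placeholders, not values from the post.

```python
import shlex

def sqoop_import_args(jdbc_url, username, table, password_file):
    """Build a `sqoop import` argument list that reads the password from a file.

    Passing --password-file instead of --password <secret> keeps the secret
    out of the command line, and therefore out of process listings and shell history.
    """
    return [
        "sqoop", "import",
        "--connect", jdbc_url,
        "--username", username,
        "--password-file", password_file,
        "--table", table,
    ]

if __name__ == "__main__":
    args = sqoop_import_args(
        "jdbc:mysql://db.example.com/sales",  # placeholder connection string
        "etl_user",                           # placeholder user name
        "orders",                             # placeholder table
        "/user/etl_user/.sqoop.password",     # placeholder HDFS path to the password file
    )
    print(shlex.join(args))
    # To actually run it: subprocess.run(args, check=True)
```

Returning a list rather than a single string also avoids shell quoting issues when the command is later passed to `subprocess.run`.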
