1. Introduction This series of articles shows how one can process real-time stream data using a number of technologies. The input data stream is flight data arriving in real time from a sensor (either one you can buy from Amazon or a more advanced one, such as a civilian or military radar). To avoid the hassle of buying a sensor or connecting to one ...
Open-source collaboration, or how we finally added merge-on-refresh to Apache Lucene
The open-source software movement is clearly a powerful phenomenon. A diverse (in time, geography, interests, gender (hmm not really, not yet, hrmph), race, skills, use-cases, age, corporate employer, motivation, IDEs (or, Emacs (with all of its recursive parens)), operating system, …) group of passionate developers work together, using surprisingly primitive digital tooling and asynchronous communication channels, devoid of emotion and ...
Processing real-time data with Storm, Kafka and ElasticSearch – Part 4
1. Introduction In the third part of this series of articles about real-time stream processing we learned how to import the .json flight data files to ElasticSearch using its bulk API as well as its low-level and high-level REST APIs. In this article we will introduce yet another way, Logstash. 2. What is Logstash Logstash is an open-source data collection ...
Processing real-time data with Storm, Kafka and ElasticSearch – Part 3
This is the third part of the article series: Processing real-time data with Storm, Kafka, and ElasticSearch. 1. Introduction In the second part, we learned how to perform searches in ElasticSearch. However, we failed to import the .json flight data files to ElasticSearch using its bulk API. In this article, we will do some programming, and learn some ways on ...
Processing real-time data with Storm, Kafka and ElasticSearch – Part 2
This is the second part of the article series: Processing real-time data with Storm, Kafka, and ElasticSearch. 1. Introduction In the first part we described the problem and how we are going to solve it. To refresh your memory, the plan is to create a Data Reduction System of historic flight data (which you can freely download from here). We ...
Processing real-time data with Storm, Kafka and ElasticSearch – Part 1
This is an article about processing real-time data with Storm, Kafka and ElasticSearch. 1. Introduction How would you process a stream of real-time or near-real-time data? In the era of Big Data, there are a number of technologies available that can help you with this task. In this series of articles we shall see a real example scenario and ...
ElasticSearch Multitenancy With Routing
Elasticsearch is great, but optimizing it for high load is always tricky. This won’t be yet another “Tips and tricks for optimizing Elasticsearch” article – there are many great ones out there. I’m going to focus on one narrow use-case – multitenant systems, i.e. those that support multiple customers/users (tenants). You can build a multitenant search engine in three different ...
An AWS Elasticsearch Post-Mortem
We had a production issue on the SaaS version of LogSentinel: our Elasticsearch stopped indexing new data. There was no data loss, as Elasticsearch is just secondary storage, but it caused some issues for our customers (they could not see the real-time data on their dashboards). Below is a post-mortem analysis – what happened, ...
Elasticsearch SQL
The Elasticsearch engine Elasticsearch is one of the most widely used search engines, running in a number of production deployments today. It is based on the Lucene search library, and one of its key features is a JSON-based query DSL on top of Lucene that provides an easier-to-use mechanism for interacting with the search engine. However ...