Tag Archives: Apache Drill

Connecting Pentaho Data Integration to MapR Using Apache Drill

Pentaho Data Integration (PDI) provides the ETL capabilities that facilitate the process of capturing, cleansing, and storing data in a uniform and consistent format that is accessible and relevant to end users and IoT technologies. Apache Drill is a schema-free SQL-on-Hadoop engine that lets you run SQL queries against data sets in a variety of formats and stores, e.g. JSON, CSV, Parquet, and HBase. By ...

Connecting to Apache Drill with Power BI (Part 3)

In my first post, I showed how you might quickly deploy a Drill-enabled cluster to the Azure cloud using the MapR template available in the Azure Marketplace. In my second post, I showed how you might get that Drill-enabled cluster to query an Azure Storage account as well as an Azure SQL Database. In this post, I want to ...

Deploying Drill on MapR in the Azure Cloud

Earlier this year, I published a series of posts on the deployment of Apache Drill to Azure. While the steps covered in those posts work, I’d like to speed up the process significantly. With the MapR Converged Data Platform available in the Azure Marketplace, I can have a Drill-enabled MapR cluster up and running much faster and with much less ...

Apache Drill SQL Queries on Parquet Data

In this week’s Whiteboard Walkthrough, Parth Chandra, Chair of the PMC for the Apache Drill project and member of the MapR engineering team, describes how the Apache Drill SQL query engine reads data in Parquet format and shares some best practices for getting maximum performance from Parquet. Additional Apache Drill resources: “Overview Apache Drill’s Query Execution Capabilities” Whiteboard Walkthrough video “SQL Query ...

How to Guide: Getting Started with Apache Drill

Apache Drill is an engine that can connect to many different data sources and provide a SQL interface to them. It’s not just a wannabe SQL interface that trips over anything complex; it’s a hugely functional one, with support for many built-in functions as well as windowing functions. Whilst it can connect to standard data sources that ...

Big Data SQL: Overview of Apache Drill Query Execution Capabilities – Whiteboard Walkthrough

In this week’s Whiteboard Walkthrough, Neeraja Rentachintala, Senior Director of Product Management at MapR Technologies, gives an overview of how open source Apache Drill achieves low latency for interactive SQL queries carried out on large datasets. With Drill, you can use familiar ANSI SQL BI tools, such as Tableau or MicroStrategy, plus do exploration directly on big data. For additional ...

SQL Query on Mixed Schema Data Using Apache Drill

You may have heard this statement before: “Apache Drill does schema discovery on the fly.” What does that mean, and why should it matter to you? The power of SQL for business analytics is a given, but the challenge in big data settings is that SQL is normally a static language that assumes a pre-defined, fixed, and well-known schema. SQL also needs flat ...

Resolving JSON Schema Changes with Drill and Python

Drill is a fantastic tool for querying JSON data. But Drill isn’t magical, and sometimes it runs into data that it can’t quite handle (yet). This post walks through an example of such a scenario and how you might work through the issue using a little bit of Python code. Scenario: You have data where the schema changes. In ...
