Apache Drill – Interactive Query and Analysis at Scale (video)
Michael Hausenblas introduces Apache Drill, a distributed system for interactive analysis of large-scale datasets, including its architecture and typical use cases.
Apache Drill is an open-source software framework that supports data-intensive distributed applications for interactive analysis of large-scale datasets. Drill is an open-source system inspired by Google’s Dremel, which Google offers externally as a hosted service called Google BigQuery.
Drilling into Big Data with Apache Drill
Apache Drill’s goal is nothing less than answering queries over petabytes of data and trillions of records in less than a second.
Dremel: Interactive Analysis of Web-Scale Datasets
Dremel is a scalable, interactive ad-hoc query system for analysis of read-only nested data. By combining multi-level execution trees and columnar data layout, it is capable of running aggregation queries over trillion-row tables in seconds. The system scales to thousands of CPUs and petabytes of data, and has thousands of users at Google. In this paper, we describe the architecture and implementation of Dremel, and explain how it complements MapReduce-based computing. We present a novel columnar storage representation for nested records and discuss experiments on few-thousand node instances of the system.
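The two ideas named in the abstract, columnar layout and a multi-level execution tree, can be illustrated with a toy sketch. This is not Dremel's or Drill's implementation: the shard data, the function names (leaf_partial_sum, merge_partials, run_tree) and the fanout parameter are all hypothetical, and everything runs in a single process. Each shard stores one list per column, leaf nodes read only the two columns the query needs, and intermediate levels merge partial aggregates on the way up to the root.

```python
from typing import Dict, List

# Hypothetical columnar shards: one list per column, so a query that
# groups by "country" and sums "clicks" never reads the "url" column.
SHARDS: List[Dict[str, list]] = [
    {"country": ["US", "DE", "US"], "clicks": [3, 1, 2], "url": ["a", "b", "c"]},
    {"country": ["DE", "FR"], "clicks": [4, 5], "url": ["d", "e"]},
    {"country": ["US"], "clicks": [10], "url": ["f"]},
]


def leaf_partial_sum(shard: Dict[str, list], key_col: str, value_col: str) -> Dict[str, int]:
    """Leaf node: scan only the two needed columns and build a partial sum."""
    partial: Dict[str, int] = {}
    for key, value in zip(shard[key_col], shard[value_col]):
        partial[key] = partial.get(key, 0) + value
    return partial


def merge_partials(partials: List[Dict[str, int]]) -> Dict[str, int]:
    """Intermediate or root node: combine partial sums from child nodes."""
    merged: Dict[str, int] = {}
    for partial in partials:
        for key, value in partial.items():
            merged[key] = merged.get(key, 0) + value
    return merged


def run_tree(shards: List[Dict[str, list]], key_col: str, value_col: str,
             fanout: int = 2) -> Dict[str, int]:
    """Aggregate bottom-up: leaves first, then merge `fanout` children per level."""
    if fanout < 2:
        raise ValueError("fanout must be at least 2")
    level = [leaf_partial_sum(shard, key_col, value_col) for shard in shards]
    while len(level) > 1:
        level = [merge_partials(level[i:i + fanout])
                 for i in range(0, len(level), fanout)]
    return level[0] if level else {}


if __name__ == "__main__":
    # Roughly: SELECT country, SUM(clicks) ... GROUP BY country
    print(run_tree(SHARDS, "country", "clicks"))
```

In a real deployment the leaves and intermediate nodes would run on separate machines; the sketch only shows why partial aggregation lets each level pass a small result upward instead of raw rows.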
Google I/O 2012 – Crunching Big Data with BigQuery
Google BigQuery is a data analysis tool born from Google internal technologies. It enables developers to analyze terabyte data sets in seconds using a RESTful API. This session will dive into best practices for getting fast answers to business questions. We’ll provide insight into how we process queries under the hood and how to construct SQL queries for complex analysis.