Apache Hadoop Framework Spotlight: Apache Hive

Query and Manage Large Data Sets with Apache Hive*

Many people who are familiar with the Apache Hadoop* framework think of Hive* as a SQL engine: a way to automatically compile a SQL query into a set of MapReduce jobs and then run them on a Hadoop* cluster. That description is accurate, but what makes Hive* really novel is its facilities for data and metadata management.

MapReduce is a very flexible programming paradigm, but most users find it too low-level for everyday data analysis tasks. Almost from the day Hadoop* was introduced, people began looking for ways to express their data analysis tasks using higher-level abstractions built on top of MapReduce. The engineers at Facebook*, who built the first version of Hive*, chose SQL as their higher-level language because it was widely adopted and because most of their analysts already knew how to use it.
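To make the contrast concrete, consider a query such as `SELECT page, COUNT(*) FROM page_views GROUP BY page`. The sketch below shows, in plain Python, roughly how that one statement breaks down into map, shuffle, and reduce phases. It is an illustration only: the `page_views` table, its columns, and every function name here are hypothetical, it uses no Hadoop* or Hive* API, and it does not claim to reproduce the plan Hive* actually generates.

```python
from collections import defaultdict

# Hypothetical rows of a page_views table: (user_id, page).
ROWS = [
    ("u1", "/home"),
    ("u2", "/home"),
    ("u1", "/cart"),
    ("u3", "/home"),
    ("u2", "/cart"),
    ("u3", "/help"),
]


def map_phase(rows):
    """Emit a (page, 1) pair for every input row."""
    for _user_id, page in rows:
        yield page, 1


def shuffle_phase(pairs):
    """Group emitted values by key, as a MapReduce framework does between map and reduce."""
    groups = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return groups


def reduce_phase(groups):
    """Sum the values for each key, which corresponds to COUNT(*) per page."""
    return {key: sum(values) for key, values in groups.items()}


def count_views_by_page(rows):
    """Run the three phases in order on an in-memory list of rows."""
    return reduce_phase(shuffle_phase(map_phase(rows)))


if __name__ == "__main__":
    for page, views in sorted(count_views_by_page(ROWS).items()):
        print(page, views)
```

Even for this simple aggregation, the hand-written version needs three functions and explicit key grouping, which is the kind of low-level detail a single SQL statement hides from the analyst.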

Anyone with some prior SQL experience can come up to speed quickly with Hive*. Hive* supports large portions of the SQL-92* standard, as well as several extensions designed to make it easier to interact with the underlying Hadoop* platform.

Read the full Query and Manage Large Data Sets with Apache Hive* Spotlight.
