Опубликован: 11 октября 2010
Hadoop provides a distributed storage and a framework for the analysis and transformation of very large data sets using the MapReduce paradigm. Hadoop is partitioning data and computation across thousands of hosts, and executes application computations in parallel close to their data. A Hadoop cluster scales computation capacity, storage capacity and IO bandwidth by simply adding commodity servers. Hadoop is an Apache Software Foundation project; it unites hundreds of developers, and hundreds of organizations worldwide report using Hadoop. This presentation will give an overview of the Hadoop family projects with a focus on its distributed storage solutions.