HadoopT - breaking the scalability limits of Hadoop

Show full item record

Redirect: RIT Scholars content from RIT Digital Media Library has moved from http://ritdml.rit.edu/handle/1850/13321 to RIT Scholar Works http://scholarworks.rit.edu/theses/467, please update your feeds & links!
Title: HadoopT - breaking the scalability limits of Hadoop
Author: Talwalkar, Anup
Abstract: The increasing use of computing resources in our daily lives leads to data generation at an astonishing rate. The computing industry is being repeatedly questioned for its ability to accommodate the unpredictable growth rate of data. It has encouraged the development of cluster based storage systems. Hadoop is a popular open source framework known for its massive cluster based storage. Hadoop is widely used in the computer industry because of its scalability, reliability and low cost of implementation. The data storage of the Hadoop cluster is managed by a user level distributed file system. To provide a scalable storage on the cluster, the file system metadata is decoupled and is managed by a centralized namespace server known as NameNode. Compute Nodes are primarily responsible for the data storage and processing. In this work, we analyze the limitations of Hadoop such as single point of access of the file system and fault tolerance of the cluster. The entire namespace of the Hadoop cluster is stored on a single centralized server which restricts the growth and data storage capacity. The efficiency and scalability of the cluster depends heavily on the performance of the single NameNode. Based on thorough investigation of Hadoop limitations, this thesis proposes a new architecture based on distributed metadata storage. The solution involves three layered architecture of Hadoop, first two layers for the metadata storage and a third layer storing actual data. The solution allows the Hadoop cluster to scale up further with the use of multiple NameNodes. The evaluation demonstrates effectiveness of the design by comparing its performance with the default Hadoop implementation.
Record URI: http://hdl.handle.net/1850/13321
Date: 2011

Files in this item

Files Size Format View
ATalwalkarThesis1-2011.pdf 1.049Mb PDF View/Open

The following license files are associated with this item:

This item appears in the following Collection(s)

Show full item record

Search RIT DML

Advanced Search