HadoopT - breaking the scalability limits of Hadoop

dc.contributor.advisor Kwon, Minseok en_US
dc.contributor.author Talwalkar, Anup
dc.date.accessioned 2011-02-24T16:55:39Z
dc.date.available 2011-02-24T16:55:39Z
dc.date.issued 2011
dc.identifier.uri http://hdl.handle.net/1850/13321
dc.description.abstract The increasing use of computing resources in our daily lives leads to data generation at an astonishing rate. The computing industry is repeatedly questioned about its ability to accommodate the unpredictable growth rate of data, which has encouraged the development of cluster-based storage systems. Hadoop is a popular open source framework known for its massive cluster-based storage. It is widely used in the computing industry because of its scalability, reliability and low cost of implementation. The data storage of a Hadoop cluster is managed by a user-level distributed file system. To provide scalable storage on the cluster, the file system metadata is decoupled from the data and managed by a centralized namespace server known as the NameNode. Compute Nodes are primarily responsible for data storage and processing. In this work, we analyze limitations of Hadoop such as the single point of access to the file system and the fault tolerance of the cluster. The entire namespace of a Hadoop cluster is stored on a single centralized server, which restricts the growth and data storage capacity of the cluster. The efficiency and scalability of the cluster therefore depend heavily on the performance of the single NameNode. Based on a thorough investigation of these limitations, this thesis proposes a new architecture based on distributed metadata storage. The solution is a three-layered architecture for Hadoop: the first two layers store the metadata and the third layer stores the actual data. This allows the Hadoop cluster to scale further through the use of multiple NameNodes. The evaluation demonstrates the effectiveness of the design by comparing its performance with the default Hadoop implementation.
dc.language.iso en_US
dc.subject Hadoop en_US
dc.subject.lcc QA76.9.D5 T34 2011
dc.subject.lcsh Electronic data processing--Distributed processing
dc.subject.lcsh Open source software
dc.title HadoopT - breaking the scalability limits of Hadoop
dc.type Thesis
dc.description.college Thomas Golisano College of Computing and Information Sciences
dc.description.department Department of Computer Science

Files in this item

ATalwalkarThesis1-2011.pdf (PDF, 1.049 MB)
