Identifying the largest complete data set from ALFRED

dc.contributor.advisor Osier, Michael - Chair en_US
dc.contributor.advisor Reynolds, Carl en_US
dc.contributor.advisor Halavin, James en_US
dc.contributor.author Uduman, Mohamed en_US
dc.date.accessioned 2006-05-24T15:03:01Z en_US
dc.date.available 2006-05-24T15:03:01Z en_US
dc.date.issued 2006-05-24T15:03:01Z en_US
dc.identifier.uri http://hdl.handle.net/1850/1876 en_US
dc.description.abstract ALFRED is a central, curated repository of allele frequency data for anthropologically defined human populations. To study and estimate the relationships and similarities between populations, researchers require a large and complete data set. However, the data set within ALFRED is not complete: not all populations in the database have been typed for all polymorphisms. Mining ALFRED for the largest complete data set is equivalent to the 'Maximal Biclique' problem in graph theory, which has been proven NP-complete, so no algorithm is known that finds an optimal solution in polynomial time. This project describes a heuristic, the Largest Maximal Biclique Heuristic (LMBH), which finds the largest complete data set from ALFRED in real time. The program is compared with several other methods, including Wen-Chieh Chang's implementation of the 'maximal biclique' algorithm proposed by Alexe et al. The algorithm efficiently mines ALFRED to extract the largest complete data set, and the results are made available to researchers in a uniform data exchange format through a Web site. Because ALFRED is updated frequently, the LMBH program is set up to mine ALFRED on a regular basis and provide researchers with the most up-to-date, largest complete data set from ALFRED. en_US
dc.format.extent 254592 bytes en_US
dc.format.extent 670155 bytes en_US
dc.format.mimetype application/pdf en_US
dc.format.mimetype application/pdf en_US
dc.language.iso en_US en_US
dc.subject ALFRED en_US
dc.subject Allele frequency en_US
dc.subject Biclique en_US
dc.subject Maximal biclique en_US
dc.subject Polymorphisms en_US
dc.subject Population genetics en_US
dc.subject SNP en_US
dc.subject.lcc QH455 .U48 2006 en_US
dc.subject.lcsh Population genetics--Data processing en_US
dc.subject.lcsh Genetic algorithms en_US
dc.subject.lcsh Data mining en_US
dc.subject.lcsh Allelomorphism en_US
dc.subject.lcsh Graph theory en_US
dc.subject.lcsh Bipartite graphs en_US
dc.title Identifying the largest complete data set from ALFRED en_US
dc.type Thesis en_US
dc.description.college College of Science en_US
dc.description.department Bioinformatics en_US
dc.description.approval 2006-05 en_US
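
The abstract frames the task as finding a biclique: a set of populations and a set of polymorphisms such that every chosen population has been typed for every chosen polymorphism. The sketch below illustrates that framing with a simple greedy search. It is not the LMBH program described in the thesis, whose details are not given in this record. The function name, the population and site labels, and the choice to score a block by the number of cells it covers are all assumptions made for illustration.

```python
from typing import Dict, FrozenSet, Set, Tuple


def greedy_complete_subset(
    typed: Dict[str, Set[str]],
) -> Tuple[FrozenSet[str], FrozenSet[str]]:
    """Return (populations, sites) where every population is typed for every site.

    `typed` maps a population name to the set of sites typed for it.
    The search is greedy, so the result is complete but may not be the largest.
    """
    best_pops: FrozenSet[str] = frozenset()
    best_sites: FrozenSet[str] = frozenset()
    for seed in sorted(typed):
        pops = {seed}
        sites = set(typed[seed])
        candidates = set(typed) - pops
        while sites:
            # Keep the current block if it covers more cells than the best so far.
            if len(pops) * len(sites) > len(best_pops) * len(best_sites):
                best_pops, best_sites = frozenset(pops), frozenset(sites)
            if not candidates:
                break
            # Add the population that preserves the most shared sites.
            nxt = max(sorted(candidates), key=lambda p: len(sites & typed[p]))
            candidates.remove(nxt)
            pops.add(nxt)
            sites &= typed[nxt]
    return best_pops, best_sites


# Hypothetical toy data: which sites have been typed in which populations.
example = {
    "PopA": {"s1", "s2", "s3", "s4"},
    "PopB": {"s1", "s2", "s3"},
    "PopC": {"s1", "s2"},
    "PopD": {"s4"},
}
pops, sites = greedy_complete_subset(example)
print(sorted(pops), sorted(sites))
```

Each seed population starts a candidate block, and populations are added one at a time while the shared set of sites shrinks. Trying every population as a seed keeps the sketch short, but it gives no guarantee of optimality, which is consistent with the hardness noted in the abstract.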

Files in this item

File                       Size      Format
MUdumanThesis052006.pdf    670.1Kb   PDF
