Connected in a small world: Rapid integration of heterogenous biology resources


Title: Connected in a small world: Rapid integration of heterogenous biology resources
Author: Park, Sang P.; Song, Carol X.; Topkara, Umut; Woo, Jungha
Abstract: Timely access to the most up-to-date versions of resources, such as data and software, is of paramount importance for researchers in an active field like Biology. We introduce a grid-enabled biological data and software collection portal architecture, SALSA (a Scalable Simple Architecture), that is tailored to the fast integration of new computational resources made available by rapidly advancing and diversifying research in this area. We identify two models that guide the design of SALSA: a heterogeneous database model and a network growth model with preferential attachment. SALSA recognizes the challenges, noted by previous research on the heterogeneous database model, that are inherent in biological database resources: these resources are autonomously managed and lack a common database schema. SALSA is also guided by a model for the growth of the portal’s collection (of data and the associated software to process this data), drawn from previous research on related collections (e.g., citation networks and software package dependencies). This model suggests that in the presence of components that have a higher likelihood of gaining new connections (e.g., popular resources such as BLAST or FASTA sequences), the relationships between components tend to organize into a small-world scale-free network. The growth model helps the portal developers identify important hub components, which emerge by taking part in an increasing number of tasks as the portal grows. To improve the overall user experience effectively, developers can direct expensive development efforts (e.g., query optimization, user interface, documentation) to hub components rather than to specialized components that are less likely to develop into hubs. In this paper we discuss a grid-enabled web portal implementation that is built to contain a growing collection of biological data and software to process this data.
The implementation that we present is a realization of the Scalable Simple Architecture (SALSA) that strives to rapidly integrate newly published components into the existing collection in a sustainable fashion. Notably, this implementation uses the flexibility of XML for component management, XSL for the web user interface, and SRB and MCAT for large data storage.
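The network growth model with preferential attachment mentioned in the abstract can be illustrated with a minimal Python sketch. This is not the SALSA implementation: the function names `grow_collection` and `top_hubs`, the parameter `links_per_component`, and the fully connected starting core are all assumptions made for illustration. Each new component links to existing components with probability proportional to their current number of links, so a few early or popular components tend to emerge as hubs.

```python
import random
from collections import Counter


def grow_collection(n_components, links_per_component=2, seed=0):
    """Grow a component graph by preferential attachment (illustrative only).

    Returns a Counter mapping component id -> number of links (degree).
    Assumes n_components >= links_per_component + 1.
    """
    rng = random.Random(seed)
    m = links_per_component
    degree = Counter()
    # 'endpoints' holds each component once per link it takes part in,
    # so a uniform draw from it is a degree-proportional draw.
    endpoints = []

    # Hypothetical starting point: a fully connected core of m + 1 components.
    for a in range(m + 1):
        for b in range(a + 1, m + 1):
            degree[a] += 1
            degree[b] += 1
            endpoints += [a, b]

    # Each new component links to m distinct existing components,
    # favouring those that already have many links.
    for new in range(m + 1, n_components):
        targets = set()
        while len(targets) < m:
            targets.add(rng.choice(endpoints))
        for t in targets:
            degree[new] += 1
            degree[t] += 1
            endpoints += [new, t]
    return degree


def top_hubs(degree, k=5):
    """Return the k most connected components as (id, degree) pairs."""
    return degree.most_common(k)


if __name__ == "__main__":
    degree = grow_collection(200)
    print(top_hubs(degree))
```

Under these assumptions, the components returned by `top_hubs` are the candidates toward which costly development effort (query optimization, user interface, documentation) would be directed first.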
Record URI:
Date: 2006

Files in this item

File                            Size     Format
UTopkaraConfProc11-12-2006.pdf  1.175Mb  PDF


This item appears in the following Collection(s)

  • International Workshop on Grid Computing Environments--Open Access (2006)
    The purpose of this Workshop is to bring together the community to discuss and foster the exchange of ideas for the development of tools, methodologies and frameworks to build Grid computing environments. Last year’s conference at SC05 focussed on tools for Grid Portals. This year’s conference will broaden the discussion to include other tools that make the development and use of Grids more easily accessible from the desktop.

