Knowledge-based algorithms (KBAs) for GE data analysis in complex disease
Maria Stepanova, Ganiraju Manyam, Zobair Younossi, Ancha Baranova
This is a collaborative project between
Molecular and Microbiology Department, College of Science, George Mason University, Fairfax, VA
Translational Research Institute, Inova Hospital, VA
In translational research, the analysis of high-throughput microarray data is a typical first step of knowledge extraction. Usually, it culminates in the identification of genes and molecular pathways that are consistently affected (namely, up- or down-regulated) across a cohort of patient samples as compared to normal tissue controls. However, the current state of the art in translational research no longer regards obtaining gene expression (GE) profiles as the major challenge; rather, it emphasizes statistically correct and biologically relevant interpretation of the results. Unfortunately, statistically driven automated filtering of biological information poses substantial problems that keep most translational researchers from using these tools.
Here we propose a pioneering project that will facilitate the introduction of knowledge-based bioinformatics into translational research. Knowledge-based algorithms (KBAs) of GE data analysis rely on previously documented knowledge of molecular interactions and treat a collection of GE values not as a set of independent parameters but as tightly connected members of multi-component objects. Thus, this approach avoids model over-fitting and the generation of biologically meaningless outputs, two major obstacles hampering the performance of traditional algorithms. The development of KBAs became possible only recently, once a sufficient amount of data systematically describing interactions between various biomolecules had accumulated in public databases.
Within the proposed project, KBAs will be adapted to the problem of automated extraction of biologically relevant data from genome-wide GE profiles of human pathological specimens. As a model, we will use the extensively studied but insufficiently understood combination of morbid obesity and insulin resistance. The recognized pathogenetic connection between these clinical conditions complicates their analysis by the methods of traditional transcriptomics. Using this model, we will examine the applicability of the knowledge-based gene set enrichment analysis (GSEA) algorithm to translational medicine by selecting diabetes-related pathogenic features from the relatively “noisy” transcriptome background of morbid obesity. We will use a cohort of 100 bariatric surgery patients whose liver and visceral fat samples have been previously collected and profiled at Inova (Baranova et al., 2004). We will validate our findings using comparable, completely annotated microarray datasets reported by other research groups and deposited in the publicly available GEO database. The meta-analysis of these datasets will allow us to modify existing KBA techniques of GE analysis in order to adapt them to the problems posed by translational research in a broad sense. Additionally, this project lies within the specific scope of our research group, as it aims to reveal the molecular mechanisms underpinning insulin resistance and other pathologies that develop in morbidly obese patients.
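To make the idea of scoring a gene set as a whole (rather than gene by gene) more concrete, the following is a minimal sketch of a GSEA-style running-sum enrichment score. It is an illustration only, not the project's actual pipeline: the function name `enrichment_score`, the `weight` parameter, and the toy gene names and scores are all assumptions introduced here, and significance testing (normally done by permutation) is omitted.

```python
def enrichment_score(ranked_genes, scores, gene_set, weight=1.0):
    """Return a GSEA-style running-sum enrichment score (illustrative sketch).

    ranked_genes: gene identifiers, already sorted by their ranking metric
                  (e.g. most up-regulated first).
    scores:       ranking metric for each gene, in the same order.
    gene_set:     set of gene identifiers from a knowledge base (e.g. a pathway).
    weight:       exponent applied to |score| for genes in the set
                  (0 gives an unweighted, Kolmogorov-Smirnov-like statistic).
    """
    in_set = [gene in gene_set for gene in ranked_genes]
    n_hits = sum(in_set)
    n_miss = len(ranked_genes) - n_hits
    if n_hits == 0 or n_miss == 0:
        raise ValueError("gene set must contain some, but not all, ranked genes")

    # Each gene in the set contributes a step up, proportional to |score|**weight.
    hit_weights = [abs(s) ** weight if hit else 0.0
                   for s, hit in zip(scores, in_set)]
    total_hit = sum(hit_weights)
    if total_hit == 0:
        raise ValueError("all genes in the set have a zero ranking score")

    running = 0.0
    best = 0.0  # running-sum value with the largest absolute deviation from zero
    for w, hit in zip(hit_weights, in_set):
        if hit:
            running += w / total_hit
        else:
            # Each gene outside the set contributes an equal step down.
            running -= 1.0 / n_miss
        if abs(running) > abs(best):
            best = running
    return best


# Toy example with hypothetical gene names and scores (not real data).
toy_genes = ["G1", "G2", "G3", "G4"]
toy_scores = [2.0, 1.0, -1.0, -2.0]
es_top = enrichment_score(toy_genes, toy_scores, {"G1", "G2"})
es_bottom = enrichment_score(toy_genes, toy_scores, {"G3", "G4"})
```

In this sketch, a set whose members cluster at the top of the ranked list yields a positive score, and a set whose members cluster at the bottom yields a negative one; the individual genes need not pass any per-gene significance threshold.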