Progress beyond the state-of-the-art

Complex diseases such as diabetes, allergy and cancer depend on altered interactions between large numbers of genes, many of which do not belong to known disease mechanisms. Genome-wide association studies performed by one of the applicants have, for example, described such genes in diabetes (Sladek et al. Nature 2007). In allergic disease, DNA microarrays have shown changes in the expression of hundreds of genes (Benson et al. J Allergy Clin Immunol 2004, 2006). Emerging high-throughput technologies indicate disease-associated changes in other layers and regulatory elements, for example copy number variants, DNA methylation and microRNAs (Hardiman et al. Pharmacogenomics 2006). On top of this complexity there is considerable individual heterogeneity. One clinical consequence is variable response to treatment, which increases both suffering and cost. Personalized medication has therefore been highlighted as a priority. At present, however, only a few examples have reached the clinic (Fox JL. Nat Biotechnol 2007).

One approach to functionally understand gene expression changes in complex diseases may be to change the scale from individual genes to groups of functionally related genes. Such genes may be identified with bioinformatics methods, such as cluster analysis, that group genes whose expression levels correlate. These clusters can be used for classification, for example of different lymphomas (Alizadeh et al. Nature 2000). However, the analysis in itself does not give any functional understanding of disease mechanisms. One way to obtain such understanding is to search the gene expression data for genes known to belong to specific pathways (Benson et al. Cytokine 2002). A problem with this is that complex diseases often involve multiple interacting pathways, which may be difficult to separate from each other. Rather than acting separately, these pathways form sub-networks or modules. Such modules have been identified in studies of cancer and functionally annotated (Segal et al. Nat Genet 2005). Networks provide a compelling framework to organize and functionally understand complex systems (Barabasi et al. Nat Rev Genet 2004, Mustacchi et al. Yeast 2006). In the context of gene expression data in human cells, network-based analysis has been applied to form networks of interacting genes and to dissect those networks to find modules and pathways (Jenssen et al. Nat Genet 2001, Calvano et al. Nature 2005). The same analytical methods have also been applied to human disease and used to go from modules to individual genes (figure 1). The corresponding proteins have been tested as diagnostic markers in human disease (Benson et al. J Allergy Clin Immunol 2006). The latter study was also based on computational methods developed by one of the applicants in studies of inbred mice, in which altered gene expression patterns were used to find the genetic variants that caused those alterations (Chesler et al. Nat Genet 2005).
Linking gene expression changes to genetic variants has also been performed in human cells (Bystrykh et al. Nat Genet 2005). These studies show how changes in a transcriptional module can be used as a template for further studies:

  • to find the corresponding changes in other layers (in the examples above, the DNA and protein layers);
  • to cross-validate findings between layers, since the layers are interdependent, and to build multi-layer modules (MLM) that include data ranging from DNA to protein;
  • to use such MLM clinically, to find diagnostic protein markers.

To our knowledge this has not previously been done. Most high-throughput analyses of complex diseases focus on one layer and rarely include clinical or experimental validation studies. Doing so would require solving several diverse problems, which are outlined below.

Problems in high-throughput studies of complex diseases

  1. Finding an optimal disease model for large-scale studies. Many complex diseases are heterogeneous or have unknown or diverse causes (e.g. cancer). The disease-causing cells or tissues may be only partially known or not readily accessible in humans (e.g. stroke).
  2. Many genes have unknown or only partially known functions.
  3. The need to develop computational and bioinformatics methods to build multi-layer modules.
  4. The need to assess goodness-of-fit while taking into account the large number of possibilities, which can lead to problems of multiple testing and over-fitting to noisy data.
  5. Experimental validation of disease mechanisms that may involve hundreds of genes, many of which have unknown or poorly defined functions.
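
The multiple-testing issue in problem 4 can be illustrated with a standard correction. The following is a minimal sketch of the Benjamini-Hochberg procedure for controlling the false discovery rate. It is shown only as a well-known baseline, not as the project's method (the project's own statistical algorithms are those described in Workpackage 4). The function name, the significance level and the p-values are illustrative assumptions.

```python
def benjamini_hochberg(p_values, alpha=0.05):
    """Return one boolean per test: True where the null hypothesis is
    rejected while controlling the false discovery rate at level alpha.

    Illustrative baseline only; names and defaults are assumptions.
    """
    m = len(p_values)
    if m == 0:
        return []
    # Indices of the tests, ordered from smallest to largest p-value.
    order = sorted(range(m), key=lambda i: p_values[i])
    # Largest 1-based rank k whose p-value satisfies p_(k) <= (k / m) * alpha.
    max_k = 0
    for rank, idx in enumerate(order, start=1):
        if p_values[idx] <= rank / m * alpha:
            max_k = rank
    # Reject every hypothesis ranked at or below max_k.
    rejected = [False] * m
    for idx in order[:max_k]:
        rejected[idx] = True
    return rejected


# Hypothetical p-values, e.g. one per gene in a differential expression test.
p_values = [0.001, 0.008, 0.039, 0.041, 0.042, 0.06, 0.074, 0.205, 0.212, 0.216]
rejected = benjamini_hochberg(p_values)
naive_hits = sum(p <= 0.05 for p in p_values)
print(naive_hits, sum(rejected))
```

With these made-up values, five tests fall below an uncorrected 0.05 threshold but only two survive the correction, which shows how many apparent findings can disappear once the number of tests is taken into account.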

In this project we address these problems as follows:

  1. We focus on seasonal allergic rhinitis (SAR) because it is common, relatively homogeneous and has a known external cause (pollen). Heterogeneity can be reduced further by studying unique materials, such as concordant and discordant monozygotic twins. The main disease-causing cell type, CD4+ cells, is known. We use two experimental models: 1) allergen-challenged CD4+ cells from patients and controls are analyzed with high-throughput methods to find modules, pathways and key regulatory genes as described in figure 1 (Benson et al. Genes Immun 2006), and are also used for experimental studies of individual genes; 2) a mouse model of allergy, in which wild-type and knockout mice are compared, is used for functional studies of individual genes (Benson et al. J Clin Invest 2007, submitted). To study human allergic inflammation in vivo and to test diagnostic markers, we examine nasal fluids and biopsies as well as skin from patients and controls before and after allergen challenge.

Figure 1. Network-based analysis of DNA microarray data. A) Genes identified by DNA microarray analysis of allergic disease (red) are mapped onto an interaction network formed by all human genes (grey). B) Modules of interacting genes that represent disease-associated biological functions are identified. C) A module is dissected to find a pathway. D) The pathway is analysed to find a putative disease-causing gene (in this case exemplified by an upstream gene).

  2. The roles of genes with unknown or partially known functions are defined using a combination of bioinformatics and high-throughput RNAi as described by the applicants (Murali et al. Nat Biotechnol 2006, Sonnichsen B et al. Nature 2005, Echeverri CJ et al. Nat Methods 2006, Nat Rev Genet 2006). In addition, text mining algorithms, obtained by customizing the technology of one of the partners (Jenssen et al. Nat Genet 2001), will provide context-specific predictions of gene functions.
  3. We develop and integrate state-of-the-art methods to build MLM: combinatorial algorithms find putatively co-regulated genes (Chesler et al. Nat Genet 2005), and these genes are organized into modules using network models of the human interactome derived from context-specific text mining (Jenssen et al. Nat Genet 2001), manual curation (Calvano et al. Nature 2005) and other bioinformatics sources.
  4. Novel statistical algorithms will be developed as described in Workpackage 4.
  5. Validation studies are performed with a combination of experimental, genomic and bioinformatics methods. Examples include blocking experiments with antibodies or RNAi.
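
The network-based analysis summarized in Figure 1 can be sketched in a few lines. The example below is a simplified illustration under stated assumptions, not the applicants' published method: a module is taken to be a connected component of the sub-network formed by the disease-associated genes, and an upstream gene is taken to be a module member that no other member regulates. The gene identifiers (G1 to G9), the edge list and both function names are hypothetical.

```python
from collections import deque


def find_modules(interactions, disease_genes):
    """Group disease genes into modules, largest first.

    A module is modelled here (an assumption) as a connected component of
    the interaction network restricted to the disease genes. Edge
    direction is ignored at this step.
    """
    disease = set(disease_genes)
    neighbours = {gene: set() for gene in disease}
    for a, b in interactions:
        # Keep only interactions where both partners are disease genes.
        if a in disease and b in disease:
            neighbours[a].add(b)
            neighbours[b].add(a)

    modules, seen = [], set()
    for start in sorted(disease):
        if start in seen:
            continue
        # Breadth-first search collects one connected component.
        seen.add(start)
        queue, module = deque([start]), set()
        while queue:
            gene = queue.popleft()
            module.add(gene)
            for nxt in neighbours[gene]:
                if nxt not in seen:
                    seen.add(nxt)
                    queue.append(nxt)
        modules.append(module)
    return sorted(modules, key=len, reverse=True)


def upstream_genes(interactions, module):
    """Return module members that are not the target of any other member.

    Each interaction is read as a (regulator, target) pair.
    """
    targets = {b for a, b in interactions if a in module and b in module}
    return set(module) - targets


# Hypothetical (regulator, target) pairs; G7 and G9 are not disease genes.
interactions = [("G1", "G2"), ("G2", "G3"), ("G3", "G9"),
                ("G4", "G5"), ("G6", "G7")]
disease_genes = ["G1", "G2", "G3", "G4", "G5", "G6"]

modules = find_modules(interactions, disease_genes)
print(modules)
print(upstream_genes(interactions, modules[0]))
```

In this toy network the largest module contains G1, G2 and G3, and G1 is its only upstream gene. A real analysis would use a curated or text-mined interactome and would need to score modules against random expectation.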

Progress

To our knowledge this is the first project that aims to define multi-layer modules (MLM) in a complex disease and to use them for a clinical goal: personalizing medication. This involves the development and integration of novel computational and bioinformatics methods within a systems biology framework. If successful, the project may serve as a model for studies of other complex diseases. The analytical methods will be made available on the Internet in a standardized format for such studies. The project may also increase understanding of the relative roles of different layers and elements, as well as of genes with presently unknown functions, in complex diseases.
