Blogs written by Savita Jayaram, Ph.D., Bioinformatics Scientist

Independent of the platform used, all high-throughput data coming from genomics, transcriptomics, microRNAomics, proteomics or metabolomics experiments give us a list of differentially expressed entities. The common challenge in analyzing such large-scale data is understanding the complex interactions taking place in the context of pathways and interpreting the underlying biological phenomenon. In addition to the technical and experimental biases arising from sample variability and handling, existing methods for pathway analysis come with their own set of biases. The conventional approach used Gene Ontologies to categorize the entities that are over- or under-represented in a given condition under study, and this approach (over-representation analysis; ORA) is still integrated into several of the currently available software tools. Its main drawback is that each change is considered independently, with no unifying framework for interpreting the changes together. That is where pathway analysis came in, to account for systems-level dependencies and interactions. More than 500 different pathway-related resources such as KEGG, BioCarta, Reactome, BioCyc and others are listed in the pathway resource list (PathGuide; http://www.pathguide.org) to enable pathway analysis.
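
To make the over-representation idea concrete, here is a minimal sketch in Python. It assumes the enrichment statistic is an upper-tail hypergeometric test, which is a common choice for ORA but not necessarily what any particular tool uses. The function name and the toy counts are hypothetical.

```python
from math import comb

def ora_p_value(universe_size, pathway_size, hits_total, hits_in_pathway):
    """Upper-tail hypergeometric p-value: the probability of seeing at least
    `hits_in_pathway` pathway members among `hits_total` differential genes
    drawn at random from a universe of `universe_size` genes."""
    max_overlap = min(pathway_size, hits_total)
    all_draws = comb(universe_size, hits_total)
    tail = sum(
        comb(pathway_size, k) * comb(universe_size - pathway_size, hits_total - k)
        for k in range(hits_in_pathway, max_overlap + 1)
    )
    return tail / all_draws

# Toy numbers (hypothetical): 20 genes measured, 5 of them in the pathway,
# 4 differential genes, 3 of which fall in the pathway.
p = ora_p_value(universe_size=20, pathway_size=5, hits_total=4, hits_in_pathway=3)
print(f"ORA p-value: {p:.4f}")
```

Note that each gene set is tested on its own here, which is exactly the limitation described above: the test knows nothing about how the genes interact.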

However, interpretation based on pathways alone may be unsatisfactory for the following reasons:

  1. The trigger for a given pathway may be a single gene product (receptor/ligand), in which case the pathway can be completely shut off if that ligand or receptor is affected. However, if the pathway is regulated by several receptors or ligands, the expression level changes in the pathway will be under multiple regulatory controls and shutting off one will not turn off the pathway.
  2. Some genes are multifaceted and involved in many pathways where they may have similar or different functions subject to different triggers. Biological interpretation in such cases becomes more complex and expression level changes downstream will need to be looked at more carefully considering these biases.
  3. Some pathways, such as TGFb signaling, behave in complex ways at different stages of disease: TGFb is anti-proliferative in the early stages of tumor growth, while in late stages it causes immunosuppression and angiogenesis, promoting invasion and disease progression. Patients in whom different components of the TGFb pathway are deleted behave differently.
  4. All pathway resources suffer from low coverage of human proteins. Currently, Reactome represents roughly 9,000 of the 20,500 human proteins in its pathways.
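
The coverage point in item 4 is easy to check for any gene list. The sketch below is a minimal illustration in Python; the function name and the placeholder identifiers are hypothetical, and the only real numbers are the Reactome counts quoted above.

```python
def pathway_coverage(gene_ids, annotated_ids):
    """Return the fraction of `gene_ids` that appear in `annotated_ids`,
    together with the sorted list of identifiers that could not be mapped."""
    genes = set(gene_ids)
    mapped = genes & set(annotated_ids)
    unmapped = genes - mapped
    fraction = len(mapped) / len(genes) if genes else 0.0
    return fraction, sorted(unmapped)

# Proteome-level coverage from the counts quoted in item 4.
reactome_fraction = 9000 / 20500
print(f"Reactome coverage of human proteins: {reactome_fraction:.0%}")

# Placeholder differential list and pathway annotation (hypothetical IDs).
differential = ["geneA", "geneB", "geneC", "geneD"]
in_pathways = ["geneA", "geneC", "geneX"]
fraction, unmapped = pathway_coverage(differential, in_pathways)
print(f"Mapped fraction: {fraction:.0%}; unmapped: {unmapped}")
```

Reporting the unmapped identifiers alongside the fraction keeps the entities with no pathway annotation visible instead of dropping them silently.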

In our experience, mapping differential data to pathways using GeneSpring or Ingenuity Pathway Analysis left half of the differential data unmapped. This is not surprising, as nearly half of the human proteome is not even represented in these pathways. In contrast, networks built using protein-protein interaction (PPI) databases such as HPRD, STRING, BioGRID, IntAct, MINT and others offer higher coverage of the human proteome. Networks can better represent the interconnectivity between different pathway entities, and they offer a way to connect entities for which pathway information is not available. A word of caution is needed here, however: a physical interaction does not always indicate a biologically functional relationship. Networks also often rely on indirect lines of evidence, including text mining, co-expression, gene neighborhoods and orthology information, which can yield many false positives. One way around this problem is to create ‘pathway-informed’ functional interaction networks that combine curated experimental information from the pathway and network databases pertaining to a functional module.
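
One possible reading of a ‘pathway-informed’ network is sketched below in Python. It rests on assumptions: PPI edges carry an evidence label, only labels in a trusted set count as experimental support, curated pathway edges are always kept, and the network is restricted to the genes of one functional module. The function name, the evidence labels and the toy edges are all hypothetical and do not reflect the schema of any specific database.

```python
def pathway_informed_network(ppi_edges, pathway_edges, module_genes,
                             trusted_evidence=frozenset({"experimental"})):
    """Combine PPI edges backed by trusted evidence with curated pathway
    edges, keeping only edges whose two endpoints are both in the module.
    Edges are undirected and returned as a set of frozensets."""
    module = set(module_genes)
    network = set()
    for a, b, evidence in ppi_edges:
        # Drop indirect evidence such as text mining or co-expression.
        if evidence in trusted_evidence and a in module and b in module:
            network.add(frozenset((a, b)))
    for a, b in pathway_edges:
        if a in module and b in module:
            network.add(frozenset((a, b)))
    return network

# Toy input with placeholder gene names (hypothetical).
ppi = [("A", "B", "experimental"), ("B", "C", "text_mining"), ("A", "D", "experimental")]
pathway = [("B", "C")]
net = pathway_informed_network(ppi, pathway, module_genes={"A", "B", "C"})
print(sorted(tuple(sorted(edge)) for edge in net))
```

In this toy input the B-C link survives only because a curated pathway edge supports it. The text-mined PPI edge on its own would have been discarded, and A-D is dropped because D lies outside the module.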

Finally, a systems biology approach that considers changes at the DNA, mRNA, miRNA, protein and metabolite levels under the same biological perturbation, while keeping other biological and technical biases to a minimum, is essential for biological interpretation, because molecular changes happening at multiple levels can impact the system as a whole.
