We are an interdisciplinary group dedicated to pioneering innovative approaches and algorithms that integrate multi-dimensional data across scales to understand, predict and manipulate complex biological systems. The key research activities at IBSE are: (i) biological network analysis, which includes the prediction of molecular function in cells and populations, and the environment- and evolution-dependent modulation of networks; and (ii) the development of algorithms using machine learning, graph mining and linear programming to study protein folding pathways, predict protein function, define novel metabolic pathways and understand pathogenesis by disease-causing mutations. This page lists a sample of the ongoing projects at IBSE.

Ongoing Projects
Single cell analysis for understanding infection in macrophages

Karthik Raman, Varadharajan Sundaramurthy (NCBS)

Single cell experiments were carried out to see what morphological features help or inhibit infection of cells. The data contains cell features along with lysosome and bacterial intensities for single cells after infection by. Further, different genes were silenced to study their effects on infection.

miRNA-34b specific regulatory network in cervical cancer

Nalini Venkatesan, Ashley Xavier, Himanshu Sinha, D Karunagaran

The miR-34 family consists of three microRNAs: miR-34a, miR-34b and miR-34c. Loss of miR-34 expression has been linked to resistance against apoptosis, and the family frequently shows aberrant expression profiles in various tumors. We use the TCGA RNA-seq and miRNA-seq data on cervical cancer to predict the more prominent targets of these miRNAs at this specific tumor site. We are also exploring links to the Epithelial-Mesenchymal Transition (EMT) pathway, which contributes to metastasis in cervical cancer. In addition, we examine differences in microRNA regulation between cancers originating from different tissues, using the TCGA data on uterine, ovarian, colorectal and oral cancers, which are epithelial in origin just like cervical cancer.

Longitudinal data analyses for prediction of onset of gestational diabetes

Sriraghav Srinivasan, Himanshu Sinha, Radha Venkatesan (MDRF)

Mining evolutionary gene expression patterns across tissues in amniotes

Gaurav Bilolikar, Janu Sahana, Himanshu Sinha, Manikandan Narayanan

A common approach to studying biological evolution is whole-genome sequence comparison. Although this approach captures the overall history of life, it is inadequate for describing the phenotypic divergence observed between closely related species. Changes in gene expression are thought to underlie many of these phenotypic differences between species and to characterize tissue-specific phenotypes. Transcriptome data from multiple tissues and species provide an opportunity to investigate genes based on their differential expression patterns in tissue evolution. In this study, we used transcriptome data of 6 tissues from 10 species, viz. human, gorilla, orangutan, chimpanzee, bonobo, macaque, mouse, opossum, platypus and chicken (the evolutionary out-group for mammals), to obtain gene expression patterns with divergence time from humans as the longitudinal constraint. We observed that genes involved in similar biological processes show similar patterns and cluster together in tissues, which might help us uncover the genetic causes that underlie the phenotypic divergence between species.

Previous studies have shown that protein essentiality can be predicted to a high degree of accuracy based merely on network data for organisms such as yeast [17, 18]. This raises the question of whether such approaches can be applied to other organisms, and whether more network properties can be leveraged to make better predictions of essentiality, as has been done routinely in other fields. The key questions we seek to answer here are: (i) Is the essential role of a protein heavily determined by its position in the network, its interacting partners and its ‘social neighbourhood’? (ii) Is it possible to reliably infer the essentiality of genes from the massive amounts of available interactome data?

The evolution of expression profiles was studied using data generated by Brawand et al. (2011). The dataset contains RNA sequencing data from 6 tissues across 10 species, constituting 132 samples in total. Expression levels of 5320 genes orthologous across all species were considered for this analysis. The aim of the analysis was to identify the set of genes with the greatest influence on the evolution of expression profiles. Analysis of the data revealed that multiple gene subsets can be identified that are able to differentiate between tissues. Since the expression levels of genes are correlated, feature selection may pick any one gene from a set of correlated genes.
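The point about correlated genes can be illustrated with a minimal sketch. It is not the analysis used in the project: the gene names, the expression values and the 0.9 threshold are made up for illustration, and `correlated_groups` is a hypothetical helper that greedily groups genes by Pearson correlation with the first gene of each group.

```python
from math import sqrt


def pearson(x, y):
    """Pearson correlation of two equal-length, non-constant profiles."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sd_x = sqrt(sum((a - mean_x) ** 2 for a in x))
    sd_y = sqrt(sum((b - mean_y) ** 2 for b in y))
    return cov / (sd_x * sd_y)


def correlated_groups(expression, threshold=0.9):
    """Greedily group genes whose |r| with a group's first gene >= threshold.

    `expression` maps gene name -> list of expression values across samples.
    Genes in the same group carry largely redundant information, so a
    feature selector could pick any one of them as the representative.
    """
    groups = []
    for gene, profile in expression.items():
        for group in groups:
            if abs(pearson(profile, expression[group[0]])) >= threshold:
                group.append(gene)
                break
        else:
            groups.append([gene])
    return groups


# Toy data (assumed, not from Brawand et al.): geneA and geneB rise together.
toy_expression = {
    "geneA": [1.0, 2.0, 3.0, 4.0, 5.0, 6.0],
    "geneB": [2.0, 4.0, 6.0, 8.0, 10.0, 12.5],
    "geneC": [5.0, 1.0, 4.0, 2.0, 6.0, 3.0],
}
groups = correlated_groups(toy_expression)
```

Here `geneA` and `geneB` fall into one group, so selecting either one gives a similar ability to separate samples, which is one way several different gene subsets can appear equally informative.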

Building biological networks for specific biological functionalities is one of the major problems of synthetic biology. Given the complexity and diversity of biology, it is challenging to establish a mapping from the space of model structures to the space of functions. Previous efforts in this area have attempted a brute-force exploration of the entire topology-parameter space. In contrast, this project aims to build a systematic procedure, inspired by LTI systems theory, to obtain the exact mapping between a topology and its functionality. The problem is set up as a design-oriented problem rather than a brute-force search. The necessary conditions for the functionality are translated into corresponding LTI system requirements. Subsequently, a structure-specific optimization problem is solved to obtain the robust parameter space.

Driver genes are defined as genes that help in the progression of cancer by giving the cell the ability to divide indefinitely. These genes can be further divided into tumour suppressor genes (TSG) and oncogenes (OG) based on function. Mutation type patterns differ between the two classes, depending on whether the driver gene, when functioning normally, suppresses or promotes tumour development. Identifying driver genes by classifying them into these two sub-classes may therefore give better results. The objective is to integrate multiple omic data types to improve classification power and to identify low-potency driver genes by classifying them into TSG and OG.

Quantitative Systems Pharmacology models are mostly grey-box mathematical models that try to capture the interaction between a drug and physiology at a systems level, using both knowledge of biology and empirical data. This project attempts to address the issues of modeling through systematic investigation, employing system identification theory as the primary tool.

Predicting Novel Metabolic Pathways through Subgraph Mining

Aravind Sankar, Sayan Ranu and Karthik Raman

In this project, we tackle the problem of retrosynthesis by employing subgraph mining. Our novel approach to representing chemical reactions results in a fully automated, scalable and accurate algorithm.

Network Analysis of Protein Folding Pathways

Soundharajan Gopi, Sayan Ranu & Athi N Naganathan

Identifying context of mutation signatures in cancer genomes

B Ravindran, Raghunathan Rengaswamy, Karthik Raman, Ashok Venkitaraman (MRC Cambridge)

Genomic instability is a defining hallmark of cancer cells. Previous studies have identified “signatures”, or patterns of mutations, in different forms of cancer that give us clues regarding the underlying mechanisms that trigger these alterations. In this project, we look at the genomic context of cancer-associated mutations. The context used in our analysis is the pair of bases in the immediate neighborhood (5’ and 3’ end) of the mutated base. A count of each context is recorded for a given mutated base, yielding a mutation matrix. This matrix is decomposed into a mutational signature matrix and a contribution matrix using non-negative matrix factorization, which gives us our signatures.
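The context counting and factorization described above can be sketched as follows. This is a minimal illustration under stated assumptions, not the project's pipeline: the function names, the `A[C>T]G` context label format and the toy matrix are hypothetical, and the factorization uses the generic Lee-Seung multiplicative updates for the Frobenius objective, which may differ from the NMF variant actually used.

```python
import random
from collections import Counter


def context_counts(mutations):
    """Count mutations by their 5' and 3' neighbouring bases.

    `mutations` is a list of (trinucleotide, alt_base) pairs; the middle
    base of the trinucleotide is the mutated reference base.
    """
    counts = Counter()
    for tri, alt in mutations:
        counts[f"{tri[0]}[{tri[1]}>{alt}]{tri[2]}"] += 1
    return counts


def matmul(a, b):
    """Multiply two matrices given as lists of rows."""
    return [[sum(x * y for x, y in zip(row, col)) for col in zip(*b)] for row in a]


def transpose(a):
    return [list(col) for col in zip(*a)]


def nmf(v, rank, iters=300, seed=0, eps=1e-9):
    """Factor a non-negative matrix v (contexts x samples) as w @ h.

    w (contexts x rank) plays the role of the signature matrix and
    h (rank x samples) the contribution matrix.
    """
    rng = random.Random(seed)
    n, m = len(v), len(v[0])
    w = [[rng.random() + eps for _ in range(rank)] for _ in range(n)]
    h = [[rng.random() + eps for _ in range(m)] for _ in range(rank)]
    for _ in range(iters):
        # Update contributions: h <- h * (w^T v) / (w^T w h)
        wt = transpose(w)
        num, den = matmul(wt, v), matmul(matmul(wt, w), h)
        h = [[h[i][j] * num[i][j] / (den[i][j] + eps) for j in range(m)]
             for i in range(rank)]
        # Update signatures: w <- w * (v h^T) / (w h h^T)
        ht = transpose(h)
        num, den = matmul(v, ht), matmul(w, matmul(h, ht))
        w = [[w[i][j] * num[i][j] / (den[i][j] + eps) for j in range(rank)]
             for i in range(n)]
    return w, h


def frobenius_error(v, w, h):
    """Frobenius norm of the reconstruction residual v - w @ h."""
    wh = matmul(w, h)
    return sum((v[i][j] - wh[i][j]) ** 2
               for i in range(len(v)) for j in range(len(v[0]))) ** 0.5


# Toy mutation matrix (assumed): 4 contexts x 3 samples, built from 2 patterns.
toy_matrix = [[4.0, 1.0, 0.0], [2.0, 0.5, 0.0], [0.0, 3.0, 6.0], [0.0, 1.0, 2.0]]
signatures, contributions = nmf(toy_matrix, rank=2)
```

In practice the rank (the number of signatures) has to be chosen, for example by comparing reconstruction error across several ranks, and each column of the signature matrix is usually normalized so that it sums to one.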

Metabolic plasticity and robustness

Gayathri S, Himanshu Sinha & Karthik Raman

Metabolic networks tend to exhibit plasticity with respect to their growth environments. We aim to understand the role of the various biochemical reactions in metabolic plasticity and to decipher the metabolic “constraints” underlying growth and survival in various environments. We investigate these questions through theoretical simulations using validated genome-scale metabolic models of organisms such as Escherichia coli and Saccharomyces cerevisiae.

Role of gut microbiota in autism

Meghana Palukuri, Swagatika Sahoo & Raghunathan Rengaswamy

The Autism project is the first of its kind to use a hybrid modeling technique that integrates steady-state modeling (i.e., COBRA) with semi-detailed kinetic modeling (i.e., a physiologically based pharmacokinetic model, PBPK). Genome-scale metabolic models of five important gut microbes [1, 2] (implicated in autism) were combined with the enterocyte model [3] to represent the human gut. This gut model was then coupled with a brain model [4] and integrated with the PBPK model. The PBPK model accounts for six tissue types that are actively involved in the metabolism and excretion of gut bacteria-derived metabolites. With this modeling system, the effect of bacteria-derived toxins (i.e., superoxide and hydrogen peroxide) on the brain and gut was analyzed, and dietary treatment options (i.e., antioxidants) that could improve autistic symptoms were predicted. The study also computationally re-confirmed that dysfunctional mitochondrial metabolism in the brain, along with disturbed neurotransmitter synthesis, is the main driver of autism. By integrating steady-state and dynamic modeling approaches, the study not only aids in identifying the precise biochemical factors contributing to a clinical condition (i.e., autism), but also quantifies host-microbe interactions in a complex system.

Identification of Disease Modules: DREAM challenge

Beethika Tripathi, Karthik Azhagesan, B Ravindran, Himanshu Sinha & Karthik Raman

Disease Module Identification was a DREAM challenge organized in late 2016. The objective of the challenge was to crowdsource community efforts to benchmark diverse community detection (graph clustering) approaches across diverse types of genomic networks, in order to compare the strengths and limitations of alternative approaches. The task was to identify disease-relevant modules either in individual networks or by sharing information across multiple networks. The problem was hard because no biological information was to be used: module identification had to be done purely on the basis of network structure. The other main difficulty relates to the properties of disease modules, which tend to be small in comparison to the size of the networks. Most community detection algorithms fail to identify small communities when the network is large. We used the notion of ‘core communities’, which are small and structurally well-defined communities that are internally well connected and well separated from the rest of the network. This approach to community detection showed significant improvement over ‘off-the-shelf’ community detection approaches.
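The two properties named above, "internally well connected" and "well separated", can be quantified in several ways. The sketch below uses internal edge density and conductance as stand-ins. These measures, the function name `module_quality` and the toy network are assumptions for illustration, not the criteria used in the challenge submission.

```python
def module_quality(adjacency, module):
    """Return (internal_density, conductance) for a candidate module.

    `adjacency` maps each node to the set of its neighbours in an
    undirected, unweighted network; `module` is an iterable of nodes.
    High density means internally well connected; low conductance means
    well separated from the rest of the network.
    """
    module = set(module)
    internal_ends = 0  # edge endpoints inside the module (each edge twice)
    boundary = 0       # edges leaving the module
    for u in module:
        for v in adjacency[u]:
            if v in module:
                internal_ends += 1
            else:
                boundary += 1
    size = len(module)
    possible = size * (size - 1) // 2
    density = (internal_ends // 2) / possible if possible else 0.0

    total_volume = sum(len(neigh) for neigh in adjacency.values())
    volume = internal_ends + boundary  # sum of degrees inside the module
    smaller_side = min(volume, total_volume - volume)
    conductance = boundary / smaller_side if smaller_side else 0.0
    return density, conductance


# Toy network (assumed): two triangles joined by a single edge 2-3.
toy_network = {
    0: {1, 2}, 1: {0, 2}, 2: {0, 1, 3},
    3: {2, 4, 5}, 4: {3, 5}, 5: {3, 4},
}
triangle_quality = module_quality(toy_network, {0, 1, 2})
bridge_quality = module_quality(toy_network, {2, 3})
```

In the toy network both candidate modules are fully connected internally, but the triangle has a much lower conductance than the bridging pair, so only the triangle would count as well separated under these measures.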

Fast-SL: efficient enumeration of synthetic lethals in metabolic networks

Aditya Pratapa, Shankar Balachandran & Karthik Raman

This project involved the identification of synthetic lethals in metabolic networks, by applying a novel reduction of search space.