Dr. T. Madhan Mohan

Invited Guest Article


Bioinformatics is an emerging area that offers a fundamental tool to the scientific community, particularly biologists, to speed up the research, application and commercialization of biotechnology. Indeed, one of the best things to have happened towards the end of the 20th century and in the 21st century is the marriage between biotechnology and information technology, which has led to the growth and development of this field. India took an early lead in the 1980s, more precisely from 1986 onwards, by establishing a strong base for Bioinformatics with the necessary infrastructure. As a result, biology has seen major breakthroughs and the growth of biotechnology has been phenomenal in the last decade, especially with the most outstanding technological breakthrough of the 20th century: the draft human genome sequence was completed in 2000, and the genome has now been completely sequenced. This has given the world scientific community information on the vast sequence and structure of genomes and on crystal structures, and it has brought an increasing dependence on computational approaches. Biotechnology has emerged as a front-line area of vital significance in unraveling the secrets of life, particularly in the studies of new biology. The genomic revolution has underscored the central role of Bioinformatics in understanding the very basis of life processes.

Let us start with the genetic resources and precious biodiversity of India and of the world. Conserving, protecting and utilizing them on a sustainable basis is the need of the hour. Genetic resources are used to improve agricultural productivity and are vital components of the global environment. The Gene Bank concept has become very important, both for ex-situ conservation and for the long-term utilization of valuable germplasm.

In fact, the world over, scientists are bringing computers and the new biology together, given the speed at which data are now collected through automated high-throughput data gathering, such as whole-genome sequencing, and the way web-based data “Warehouses” integrate information from research efforts all over the world. It is estimated that there are 3-27 million undiscovered species of plants and animals on planet Earth. However, through the endeavor of scientists, hundreds of new species are discovered and classified every day. The accelerated progress of information technology has completely transformed research in the life sciences. The focus on genome technologies has made it still more relevant to establish very close linkages between computer scientists and experimental biologists. There are many developments, for example Enterprise Bioinformatics, high system computing, genomic data and database integration, DNA amplification and sequencing, genome assembly and gene types, web-based Bioinformatics, biochips and gene expression analysis, and Proteomics and mass spectrometry.

Recent studies have shown that even the application of mathematical theory in cell biology has become very important. In fact, Leroy Hood of the Institute for Systems Biology in Seattle says, “the future will be the study of genes and proteins of organisms in the context of their informational pathways and networks”. Similarly, the areas of modular biology and synthetic biology are at the forefront of scientific excitement.

The announcement of the remarkable success of the Human Genome Project, supported in 1986, and the release of the draft sequence to the public were a landmark in the history of modern biology and science. This has generated tools to produce whole gene catalogues for many microbes, the plant Arabidopsis thaliana, the fruit fly Drosophila melanogaster, the roundworm Caenorhabditis elegans, and soon the puffer fish Fugu rubripes. The complete genome sequences serve almost as recipes for life and as the foundation of modern biology in the 21st century, which is aimed at the biological goal defined by the scientific community world over: “Achieve a fundamental, comprehensive and systematic understanding of life”.

It is important now that scientists work towards characterizing the full repertoire of molecular machines of living systems. They must also understand how these machines are orchestrated within the living systems of single cells and complex multicellular organisms.


Bioinformatics, or Life Science Informatics, is a young, fast-developing branch of science dealing with the storage, analysis, integration and simulation of molecular biological data. Bioinformatics is an applied discipline that utilizes computational tools to conceptualize biology. It has emerged as a cutting-edge technology and a knowledge revolution. Extraordinary growth of information technology and unprecedented advances in molecular biology and recombinant DNA techniques have ushered in the age of Bioinformatics.

Humankind is on the brink of another revolution. There is no doubt that the mapping of the human genome, the draft of which was completed in June 2000, is one of the greatest scientific advancements in history. This breakthrough in biological research was made possible by advancements in Bioinformatics and computational biology. Areas such as proteomics, genomics, combinatorial chemistry, statistics, nanotechnology, spectroscopy, and structural and computational biology will see increasing applications of Bioinformatics in the days to come.


The major goal of bioinformatics is to obtain the complete sequences of as many different genomes as possible. With that sequence information, companies and research organizations can start to do what is commonly referred to as "sequence-based biology": they use the sequence information to give scientists more direction on how to design experiments and how to analyze them. Bioinformatics is thus causing a fundamental shift in how scientists approach molecular biology. The "Holy Grail" for bioinformatics is to map all the genes in the human body and decipher the role that each gene and its associated base pairs play in gene expression.


Bioinformatics is becoming increasingly important due to the interest of the pharmaceutical industry in genome sequencing projects. There is a vital need to harness this information for medical diagnostic and therapeutic uses, and there are opportunities for other industrial applications. This field is evolving rapidly, which makes it challenging for biotechnology professionals to keep up with recent advancements. The area has evolved to deal with four distinct problems, viz. (i) handling and management of biological data, including its organization, control, linkages, analysis, and so forth; (ii) communication among people, projects, and institutions engaged in biological research and applications, which includes e-mail, file transfer, remote login, video conferencing, electronic bulletin boards and the establishment of web-based information resources; (iii) organization, access, search and retrieval of biological information, documents, and literature; and (iv) analysis and interpretation of biological data through computational approaches, including visualization, modeling and simulation, and development of algorithms for highly parallel processing of complex biological structures. The following are the thrust areas of Biotechnology, which are fully dependent on Bioinformatics:


A basic assumption is that similarity between two sequences, whether DNA, RNA or protein, implies functional correlation. Some of the most successful bioinformatics applications deal with this kind of analysis. Several software tools, such as BLAST, have been developed that efficiently perform sequence alignments against large databases of known sequences. It has become a routine task to compare a new sequence against several databases.
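To make the idea of comparing a new sequence against a database concrete, a minimal sketch in Python may help. It is not BLAST, which relies on fast heuristics; it is the simpler, exhaustive Smith-Waterman local alignment score. The scoring values (match 2, mismatch -1, gap -2), the function names and the toy database entries are all assumptions chosen for illustration.

```python
def smith_waterman_score(a, b, match=2, mismatch=-1, gap=-2):
    """Return the best local alignment score between sequences a and b.

    The scoring values are illustrative assumptions, not a standard matrix.
    """
    prev = [0] * (len(b) + 1)
    best = 0
    for i in range(1, len(a) + 1):
        curr = [0] * (len(b) + 1)
        for j in range(1, len(b) + 1):
            # Score for aligning a[i-1] with b[j-1]
            diag = prev[j - 1] + (match if a[i - 1] == b[j - 1] else mismatch)
            # Local alignment: a cell never drops below zero
            curr[j] = max(0, diag, prev[j] + gap, curr[j - 1] + gap)
            best = max(best, curr[j])
        prev = curr
    return best


def best_hit(query, database):
    """Return (name, score) of the database entry most similar to the query."""
    scored = ((name, smith_waterman_score(query, seq))
              for name, seq in database.items())
    return max(scored, key=lambda pair: pair[1])


# Hypothetical toy database; real databases hold millions of sequences.
toy_database = {
    "seq1": "TTTTACGTACTTTT",
    "seq2": "GGGGGGGG",
}
print(best_hit("ACGTAC", toy_database))
```

An exhaustive search of this kind takes time proportional to the product of the two sequence lengths for every database entry, which is why heuristic tools are preferred for routine searches of large databases.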


To investigate genes in their cellular context, expression analysis is carried out using microarrays and DNA chips. Comparing the expression patterns of well-defined metabolic states allows pathological phenotypes to be identified at the molecular level. Based on these identifications, reverse engineering of pathways and localization of pharmaceutical targets become possible.
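The comparison of expression patterns between two states can be sketched in a few lines of Python. The example below computes a log2 fold change for each gene and lists the genes whose expression changes at least two-fold. The gene names, the intensity values, the pseudocount and the threshold are all assumptions for illustration; a real microarray analysis would also involve normalization, replicates and statistical testing.

```python
import math


def log2_fold_changes(control, treated, pseudocount=1.0):
    """Return log2(treated / control) for every gene present in both states.

    The pseudocount (an assumption) avoids division by zero for silent genes.
    """
    return {
        gene: math.log2((treated[gene] + pseudocount) /
                        (control[gene] + pseudocount))
        for gene in control
        if gene in treated
    }


def differentially_expressed(control, treated, threshold=1.0):
    """Return sorted gene names whose absolute log2 fold change >= threshold."""
    changes = log2_fold_changes(control, treated)
    return sorted(g for g, lfc in changes.items() if abs(lfc) >= threshold)


# Hypothetical intensities for a healthy and a diseased state.
healthy = {"geneA": 99, "geneB": 49, "geneC": 9}
diseased = {"geneA": 399, "geneB": 49, "geneC": 4}
print(differentially_expressed(healthy, diseased))
```

In this toy data geneA is four times higher and geneC is halved in the diseased state, while geneB is unchanged, so only geneA and geneC are reported.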


The publication of entire genome sequences led to a shift of interest from pure DNA sequencing to protein localization and characterization within the cellular context. This became necessary because one gene can give rise to a number of products via mechanisms such as splicing or post-translational modifications. Proteomics refers to the identification and analysis of all proteins of a cell. This involves the determination of structures as well as the identification of protein interactions and functions in biological pathways.


Structural Genomics covers the calculation of three-dimensional structures based on the sequence of a macromolecule. The theoretical basis of the relationship between sequence and structure is the most fundamental problem of in silico biology. From a computational point of view, the so-called folding problem is the most demanding challenge of computational biology. Nevertheless, only knowledge of the structure of a protein can provide a deeper understanding of its function.


The development of drugs aims to maximize effect and minimize side effects. It would therefore be very convenient to personalize drugs for each patient. The genetic variation among humans is only about 0.1% of the total DNA. The differences are mostly point mutations, which can have phenotypic impact. These so-called SNPs (Single Nucleotide Polymorphisms) are good candidates for drug development and diagnosis. By examining SNPs with microarrays, the investigation of entire populations becomes feasible. Databases of SNPs are expected to reveal the patterns associated with diseases such as cancer or Alzheimer's disease.
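The notion of a point mutation can be illustrated with a short Python sketch that lists the positions at which a sample sequence differs from a reference. It assumes that the two sequences are already aligned and of equal length; the function name and the toy sequences are assumptions for illustration, and real SNP calling works from sequencing reads with quality scores.

```python
def find_point_differences(reference, sample):
    """Return (position, reference_base, sample_base) for each mismatch.

    Positions are 1-based. Both sequences are assumed to be pre-aligned
    and of equal length.
    """
    if len(reference) != len(sample):
        raise ValueError("sequences must be aligned and of equal length")
    return [
        (i + 1, ref_base, sample_base)
        for i, (ref_base, sample_base) in enumerate(zip(reference, sample))
        if ref_base != sample_base
    ]


# Hypothetical toy sequences differing at a single position.
reference = "ACGTACGT"
sample = "ACGTTCGT"
differences = find_point_differences(reference, sample)
print(differences)
print(len(differences) / len(reference))  # fraction of differing positions
```

Dividing the number of differences by the sequence length gives the fraction of varying positions, the same kind of quantity as the 0.1% figure quoted above.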


If sufficient data are available and all relevant components of life are identified, more complex interactions can be investigated. For a holistic biological understanding of the cell, simulations of cells, entire organisms and populations provide new insights. The simulation of life in silico is a future direction for bioinformatics that has already begun.


India’s achievements and strategic advantage in information technology and biotechnology have positioned the country to play a crucial role in the post-genomic era. This role would range from the analysis of genes and structures and the definition of metabolic pathways to the identification and validation of drug targets and the classification of natural resources based on genetic profiles. Biotechnological applications in agriculture, medicine, environmental protection and biodiversity conservation not only have scientific relevance but also offer opportunities for overall sustainable development, most desirable for the 21st century. The country’s rich biodiversity and bioresources stand to reap the harvest of these scientific breakthroughs.

With the objective of integrating the country’s vast scientific expertise in the different areas of biotechnology and computational sciences, the Department of Biotechnology envisioned in 1986 the establishment of a distributed information network. The network was to provide a common platform for the exchange of information among the scientific community and, in the process, speed up scientific discoveries. The network, referred to as the Biotechnology Information System Network (BTISnet), presently consists of sixty-one centres spread all over the country. Over the years, the network has created a resource base including computational infrastructure, databases and software. Human resource development through short- and long-term programmes, and research and applications in frontier areas of Bioinformatics and biotechnology, are the main objectives of the Bioinformatics programme.

The BTISnet has established a link among scientists and organizations involved in research and development activities in biotechnology. The network today offers a single-window information resource in the country covering inter-disciplinary areas of biotechnology and molecular biology. Six National Facilities on Interactive Graphics are dedicated to the promotion of molecular modeling and other related activities. More than 100 subject-specific databases have been developed through the BTISnet. Several major international databases for application to genomics and proteomics have been established in the form of mirror sites as part of the network. These databases are being linked through a high-speed, large-bandwidth network in the form of a VPN called BIOGRID INDIA to promote faster sharing of information as well as to encourage tandem research by various R&D labs and industrial units. These sites are designed to act as knowledge pathways for discoveries in biotechnology. At the micro level, the programme deals with the various issues related to biological data. It covers the development of data analysis tools, the modeling of biological macromolecules and their complexes, metabolic pathways, and the design of new molecules such as drugs, peptide vaccines and proteins.



Skilled human resources in bioinformatics are of utmost importance for the effective implementation of high-throughput programmes in biotechnology. The BTISnet is running five long-term advanced diploma courses in bioinformatics at the post-M.Sc. level through leading universities of the country: Madurai Kamaraj University, Madurai; University of Pune, Pune; University of Calcutta, Kolkata; Jawaharlal Nehru University; and Pondicherry University. The number of seats at each university has been increased based on the demand from the industrial sector. Each course now produces approximately 15-20 candidates per year, so that more than 60 trained professionals emerge from this programme every year.


Post-graduate courses such as M.Sc., M.Tech and PhD in bioinformatics have been introduced through Pune University, IIIT-Allahabad and JNU, New Delhi respectively to develop leadership-quality human resources in the area of bioinformatics.


The BTISnet is also supporting and organising several short-term training programmes and workshops in front-line areas of bioinformatics and biocomputing all over the country. A large number of scientists and research scholars from all over the country are making use of these training programmes, of which more than 70 have been conducted this year.


The Bioinformatics Centres are being used extensively for intensive research by the host and neighboring institutions. The thrust areas supported are sequence analysis and molecular modeling. Apart from supporting research, the centres are also actively engaged in research on frontier areas of bioinformatics. In-house research in the bioinformatics centres includes gene analysis, protein structure prediction and engineering, modeling of macromolecular assemblies, evolutionary biology and mechanisms of disease. Under the programme several R&D projects have been supported, and an increase in R&D proposals in Bioinformatics has been witnessed this year. The Department considers this a thrust area and intends to extend more support towards the development of Bioinformatics tools.

According to the views of the scientific community world over, there are four goals before science, namely,


Identify and characterize the molecular machines of life: the multiprotein complexes that execute cellular functions and govern cell form.


Characterize gene regulatory networks.


Characterize the functional repertoire of complex microbial communities in their natural environments at the molecular level.


Develop the computational methods and capabilities to advance understanding of complex biological systems and predict their behavior (Genes, proteins and molecular machines).

While it is fascinating and exciting to follow the new developments in biology emerging from the tools of Bioinformatics, all aspects relating to ELSI (Ethical, Legal and Social Implications) of modern research and its application are equally important. Biosafety of genetically modified products, environmental risk assessment, intellectual property rights and similar issues have become an integral part of biology today.