Network Biology: a new way of representation and analysis of biological information processing
Jong H. Bhak and Dan Bolser
Biological Information Objects (BiO), Cambridge, UK
The aim of biological science is to answer systematically the old question of philosophy, “What is life?” In this respect, modern biologists resemble the philosophers of the past. Bioinformatics, likely to be biology’s future name, employs information-processing philosophy and technology to interpret the whole life process as a complex system with many computable layers of different elements. We can encapsulate the layers (Biolayers) as classes or components of abstract objects for analysis and simulation. Eventually, the layers form a recursive and self-similar pattern of information processing, providing a commonality across all levels of life. As in fractal geometry, these patterns are present naturally and universally in a universal matrix (Biomatrix) in which our physical world exists. This universality lets us apply the computable enzyme circuits of metabolic pathways in cancer to the simulation of bacterial interactions and even of socioeconomic behaviors (such as the Internet). Remarkably, because of the tremendous challenges it poses and the volume of data it produces, biology has proved to be the richest source of information in science. Processing this gigantic body of bio-information requires us to establish an efficient way of thinking about the problems. Biosophy is a new name for the philosophical foundation of bioinformatics and biology. The core of biosophy, as in philosophy, lies in the interactions of entities (bioentities). Interactions ranging from simple pairwise ones to complex multilayer regulatory ones are best represented as a network, which is a synonym for a matrix. While the biomatrix is more physical and static, the bionetwork is a dynamic concept.
We can best represent and analyze the essence of biology’s layers as computable networks. For example, a protein is a network, or graph, of amino acids with nodes and edges. The nodes can be amino acids and the edges can be chemical forces. We can in turn represent the amino acids (about 20 of them) as networks of atoms of carbon, nitrogen, oxygen, and so on. This can go down as far as the boundary of matter and nonmatter or go up as far as (or beyond) two humans having a conversation. We can regard the conversation as information processing with a relatively precise syntax and a highly context-dependent grammar. This is essentially the same process as two proteins interacting probabilistically to produce some biological function. Humans are not much more than huge protein complexes with the same essential function of information processing. In this regard, cities such as Seoul and London are just another layer of complex biological information-processing units.
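The layered view described above can be sketched in a few lines of Python. This is only an illustrative sketch, not the authors' implementation: the `Network` class, its methods, and the three-residue toy peptide are hypothetical names introduced here. Each node may carry a nested network, so a residue-level graph can contain atom-level graphs.

```python
class Network:
    """An undirected graph whose nodes may themselves contain a Network."""

    def __init__(self, name):
        self.name = name
        self.nodes = {}   # node id -> nested Network (lower layer) or None
        self.edges = {}   # node id -> set of neighbouring node ids

    def add_node(self, node_id, inner=None):
        self.nodes[node_id] = inner
        self.edges.setdefault(node_id, set())

    def add_edge(self, a, b):
        # Edges are symmetric: an interaction involves both partners.
        for node_id in (a, b):
            if node_id not in self.nodes:
                self.add_node(node_id)
        self.edges[a].add(b)
        self.edges[b].add(a)

    def degree(self, node_id):
        return len(self.edges[node_id])

    def depth(self):
        """Number of nested layers, counting this one."""
        inner = [n.depth() for n in self.nodes.values() if n is not None]
        return 1 + max(inner, default=0)


def glycine_residue():
    """Heavy atoms of a glycine residue as an atom-level network."""
    g = Network("glycine")
    for atom in ("N", "CA", "C", "O"):
        g.add_node(atom)
    for a, b in (("N", "CA"), ("CA", "C"), ("C", "O")):
        g.add_edge(a, b)  # covalent bonds
    return g


# Hypothetical toy peptide: three glycine residues in a chain.
peptide = Network("Gly-Gly-Gly")
for position in (1, 2, 3):
    peptide.add_node(position, inner=glycine_residue())
peptide.add_edge(1, 2)  # peptide bond
peptide.add_edge(2, 3)  # peptide bond

print(peptide.depth())    # two layers: residues, then atoms
print(peptide.degree(2))  # the middle residue has two neighbours
```

The same class could, in principle, be nested upward (proteins inside complexes, complexes inside cells), which is the recursion the text describes.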
As early as the 1980s, researchers started viewing DNA or genomes as the dynamic storage of a language system with precise computable finite states (Searls, 1993). Recent complex-systems research has also suggested some far-reaching commonality in the organization of information in problems from biology, computer science, and physics, such as the Bose–Einstein condensate (a special state of matter; Bianconi and Barabási, 2001). A grand theory explaining very small and very large systems could come from computational mechanics applied to biological networks that encompass atoms, giant organisms, and even larger objects in one coherent information-processing scheme.
However, only in the last five years has bioinformatics truly shifted its focus from individual genes, proteins, structures, and search algorithms to large-scale networks, often denoted as -omes, such as the biome, interactome, genome, and proteome. Suddenly, biologists are finding links between the Internet, metabolic pathways, and the structural interactions of proteins through a shared network topology, the scale-free network (Jeong et al., 2000; see Figure 1). We are becoming more certain that biology’s future lies in networks of biological entities.
Figure 1. A protein–protein interaction network.
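A first step in checking whether an interaction network looks scale-free is to compute its degree distribution, that is, the fraction of nodes having each number of interaction partners. The sketch below assumes an edge list of unique, undirected pairs without self-interactions; the function name and the toy protein labels are hypothetical and the data are not taken from any real experiment.

```python
from collections import Counter


def degree_distribution(edges):
    """Return {degree k: fraction of nodes with exactly k partners}."""
    degree = Counter()
    for a, b in edges:
        degree[a] += 1
        degree[b] += 1
    counts = Counter(degree.values())
    n_nodes = len(degree)
    return {k: counts[k] / n_nodes for k in sorted(counts)}


# Hypothetical toy network: one hub (P1) and several sparsely linked proteins.
toy_edges = [
    ("P1", "P2"), ("P1", "P3"), ("P1", "P4"),
    ("P1", "P5"), ("P1", "P6"), ("P2", "P3"),
]

distribution = degree_distribution(toy_edges)
for k, fraction in distribution.items():
    print(k, round(fraction, 3))
```

In a scale-free network, most nodes have few partners while a few hubs have many, so the distribution is heavy-tailed. A toy example this small cannot demonstrate that property; it only shows the bookkeeping.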
So, what are the challenges and future trends for network biology? Three main challenges lie ahead: representing biological entities and making databases; mapping the networks efficiently; and modeling, simulating, analyzing, and predicting the networks. Fortunately, the critical problems of all networks boil down to physical and informational interactions among biological entities. So, we can best tackle these challenges by mapping the entities with interaction maps or networks.
In building such interaction networks, proteins, the prime molecules of life, are extremely useful. However, only recently have we seen research results from the large-scale identification of proteins and the mapping of their interactions. In its difficulty and importance, mapping protein interactions is perhaps comparable to the Human Genome Project. Until we find new technologies for more sophisticated, large-scale detection of different chemicals in cells, protein entities and their networks will occupy the core of bioinformatics research.
We can study protein interactions (commonly those of protein domains and their complexes) in many different forms, four of which are explained here.
The first and most obvious way to study interactions is through the literature of the various biological fields. Many biological articles provide some degree of protein interaction information. The main problem with this form is a poor signal-to-noise ratio, owing to the large volume of irrelevant and confusing text. So, we need to logically parse, predict, and verify the interactions. Artificial intelligence techniques, including natural-language processing, are often employed with reasonable success. In the future, once data-mining processes are included, the whole literature itself will form a giant biological entity that, in essence, is not much different from a whole genome (a textome or archiome; Tsoka and Ouzounis, 2000).
The second form comes from metabolic-pathway information. Interactions in this case are often linked by biological substrates in directed graphs or circuits of enzymes. A good example is the Kyoto Encyclopedia of Genes and Genomes (KEGG), a database that integrates metabolic and other pathways (see www.genome.ad.jp/kegg). In practice, this representation is close to an electronic circuit of switches. When seemingly distributed biological entities interact with each other and with a switching mechanism, certain emergent properties occur, and the whole circuit becomes alive and starts to control and process the information flow (Thiery and Thomas, 1998). The physical material for the processing is often associated with energy in chemical forms. Finding regulatory principles and rules is critical to correctly analyzing this form of interaction data.
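The idea of enzymes linked by substrates in a directed graph can be sketched as follows. The example uses the first three steps of glycolysis in simplified form (cofactors such as ATP are omitted). The data layout and the functions `enzyme_graph` and `reachable` are assumptions made for illustration and do not reflect the KEGG data model.

```python
def enzyme_graph(reactions):
    """Link enzyme A to enzyme B when a product of A is a substrate of B.

    `reactions` maps an enzyme name to a (substrates, products) pair of sets.
    Returns a set of directed edges (upstream, shared metabolite, downstream).
    """
    edges = set()
    for up, (_, products) in reactions.items():
        for down, (substrates, _) in reactions.items():
            if up == down:
                continue
            for metabolite in products & substrates:
                edges.add((up, metabolite, down))
    return edges


def reachable(edges, start):
    """Enzymes downstream of `start`, following the direction of the edges."""
    seen, stack = set(), [start]
    while stack:
        enzyme = stack.pop()
        for up, _, down in edges:
            if up == enzyme and down not in seen:
                seen.add(down)
                stack.append(down)
    return seen


# Simplified first steps of glycolysis (cofactors omitted).
reactions = {
    "hexokinase": ({"glucose"}, {"glucose-6-phosphate"}),
    "phosphoglucose isomerase": ({"glucose-6-phosphate"}, {"fructose-6-phosphate"}),
    "phosphofructokinase": ({"fructose-6-phosphate"}, {"fructose-1,6-bisphosphate"}),
}

pathway_edges = enzyme_graph(reactions)
for up, metabolite, down in sorted(pathway_edges):
    print(f"{up} --[{metabolite}]--> {down}")
print(sorted(reachable(pathway_edges, "hexokinase")))
```

Regulatory switches (activation and inhibition) would need signed or conditional edges on top of this graph; they are left out of this sketch.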
We find the third form of interaction in molecular-genetics methods such as the yeast two-hybrid system (Y2H; Walhout et al., 2000). This method, on a massively large scale, produces genetically predicted or verified protein interactions. Whole-genome-scale interaction experiments are now possible (Uetz et al., 2000).
The last and probably most precise form comes from the physical and structural interactions between proteins. Proteomics data from mass spectrometry can provide relatively reliable physical interactions of proteins and can identify new proteins. Another valuable source is the Protein Data Bank, which stores 3D protein structures as coordinates. Using the precise 3D structures, we can generalize protein interactions and draw them on a map that encompasses all the known protein topologies and their interactions (Park et al., 2001). This interaction map can reveal the evolutionary paths of interactions, because it lies at the protein family level rather than the individual protein level (see Figure 2).
Figure 2. The protein-structural interaction map, which is the first global map of protein interactions. It shows all known protein fold interactions in one picture. The interactions are phylogenetic; that is, they are based on evolutionarily determined family–family interactions. The map works as the basic skeleton of more specific protein–protein interactions. Each node is not a protein name but the fold type of a set of related proteins.
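One way to picture how a protein-level network collapses onto a family-level map is to relabel each protein by its fold or family and count the resulting family–family pairs. The sketch below is a generic illustration of that projection, not the procedure used by Park et al.; the protein names and fold labels are hypothetical.

```python
from collections import Counter


def family_level_map(protein_pairs, family_of):
    """Collapse protein-protein pairs into counts of family-family pairs."""
    family_edges = Counter()
    for a, b in protein_pairs:
        # Sort so that (X, Y) and (Y, X) count as the same undirected edge.
        key = tuple(sorted((family_of[a], family_of[b])))
        family_edges[key] += 1
    return family_edges


# Hypothetical assignments of proteins to fold types.
family_of = {"protA": "foldX", "protB": "foldY", "protC": "foldX", "protD": "foldY"}

# Hypothetical observed protein-level interactions.
protein_pairs = [("protA", "protB"), ("protD", "protC"), ("protA", "protC")]

fold_map = family_level_map(protein_pairs, family_of)
for (fold_1, fold_2), n_pairs in sorted(fold_map.items()):
    print(fold_1, fold_2, n_pairs)
```

Each node in the resulting map is a fold type rather than a protein name, which matches the description in the Figure 2 caption.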
Not only can these four sources of interaction networks form different layers of infrastructure in bioinformatics, but they also overlap and interact with each other, resulting in a super information network. This pattern will recurse, eventually forming a tightly yet probabilistically controlled network referred to as “life as human beings model it” (because humans can merely model it), whether it is called Gaia, Galaxia, or something else.
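The overlap among the four sources can be sketched as a merge of edge lists in which every interaction keeps a record of the layers that support it. The layer names follow the four forms discussed above; the function `merge_layers` and the toy pairs are assumptions made for illustration.

```python
from collections import defaultdict


def merge_layers(layers):
    """Merge per-source edge lists into {unordered pair: set of sources}."""
    merged = defaultdict(set)
    for source, pairs in layers.items():
        for a, b in pairs:
            merged[frozenset((a, b))].add(source)
    return merged


# Hypothetical interaction pairs reported by each of the four sources.
layers = {
    "literature": [("P1", "P2"), ("P2", "P3")],
    "pathway": [("P2", "P3")],
    "y2h": [("P2", "P1"), ("P3", "P4")],
    "structure": [("P1", "P2")],
}

super_network = merge_layers(layers)

# Rank interactions by how many independent layers support them.
ranked = sorted(super_network.items(), key=lambda item: (-len(item[1]), sorted(item[0])))
for pair, sources in ranked:
    print(sorted(pair), sorted(sources))
```

Counting supporting layers is only one simple way to combine evidence; weighting the sources by their reliability would be a natural refinement.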
Thus far, mathematicians, physicists, and computer scientists have not been able to precisely map or successfully predict biology’s seemingly fuzzy and disorganized data, with its thousands of different layers ranging from molecules to the Internet. Insights into the nature of biological entities as complex interaction networks are opening a door toward a general representation of biological entities. Two main challenges exist. The first is to data mine the networks of the domains of bioinformatics (namely, the literature, metabolic pathways, and the proteome and structures) in terms of interaction. The second is to generalize the networks so that the information is integrated into computable data regardless of the layers’ levels. Once bioinformaticians find a general principle for how components interact to form any organic interaction network, true simulation and prediction in silico will be possible.
1. D.B. Searls, “The Computational Linguistics of Biological Sequences,” Artificial Intelligence and Molecular Biology, L. Hunter, ed., MIT Press, Cambridge, Mass., 1993, pp. 47–120.
2. G. Bianconi and A.L. Barabási, “Bose-Einstein Condensation in Complex Networks,” Physical Rev. Letters, vol. 86, no. 24, June 2001, pp. 5632–5635.
3. H. Jeong et al., “The Large-Scale Organization of Metabolic Networks,” Nature, vol. 407, no. 6,804, 5 Oct. 2000, pp. 651–654.
4. T.D. Thiery and R. Thomas, “Qualitative Analysis of Gene Networks,” Proc. Pacific Symp. Biocomputing, World Scientific, Singapore, 1998, pp. 77–88.
5. S. Tsoka and C.A. Ouzounis, “Recent Developments and Future Directions in Computational Genomics,” FEBS Letters, vol. 480, no. 1, 25 Aug. 2000, pp. 42–48.
6. A.J. Walhout, S.J. Boulton, and M. Vidal, “Yeast Two-Hybrid Systems and Protein Interaction Mapping Projects for Yeast and Worm,” Yeast, vol. 17, no. 2, June 2000, pp. 88–94.
7. P. Uetz et al., “A Comprehensive Analysis of Protein-Protein Interactions in Saccharomyces cerevisiae,” Nature, vol. 403, no. 6,770, Feb. 2000, pp. 623–627.
8. J. Park, M. Lappe, and S.A. Teichmann, “Mapping Protein Family Interactions: Intramolecular and Intermolecular Protein Family Interaction Repertoires in the PDB and Yeast,” J. Molecular Biology, vol. 307, no. 3, Mar. 2001, pp. 929–938.