8. INTRODUCTION to Computer Aided Drug Design                                       

 N. Subba Rao

Recent initiatives to sequence the human genome and those of pathogenic microorganisms have provided a plethora of information, which holds considerable promise for drug discovery and development. Technological advances in the biological sciences allow for the rapid development of new diagnostic methods and drugs based on biological molecules, including proteins and nucleic acids. The key to the maximal exploitation of these data for therapeutic purposes lies in accurately identifying the structure and biological function of the protein coded by a given gene. Analysis of genomic data to characterise proteins and predict their form and function has thus become an integral part of the drug design cycle.

Advances in macromolecular structure determination, directed combinatorial chemistry and biocomputing have further extended the boundaries of the structure-based drug design technique. Chemical information technology helps us to appreciate the richness and variety of chemical structural complexity. Computer Aided Drug Design (CADD) is a combination of computational chemistry and information technology tools that helps us to discover new and useful compounds. These technologies are changing the traditional approaches to the drug discovery process.

The process of designing a new drug and bringing it to market is very complex. It takes 5-7 years and 350-500 million dollars for the average new drug to go from the research laboratory to patient use. An early step is to develop an assay technique to test drug effectiveness. An ideal assay is one in which a compound can be added to tissue samples or micro-organism colonies and a visible indication of effective treatment appears. If it is known that a drug must bind to a particular site on a particular protein or nucleotide, then a drug can be tailor-made to bind at that site.

This binding is often modeled computationally using any of several different techniques. Traditionally, the choice of compounds to be tested computationally was guided primarily by the researchers' understanding of molecular interactions. A second method is the brute force testing of large numbers of compounds from a database of available structures. Lead compounds are compounds that have some activity against a disease; they provide a starting point for refinement of the chemical structures.

Another approach is the use of combinatorial chemistry techniques, which produce large numbers of related chemical compounds and so allow a large number of compounds to be tested at once. When a useful mixture is found, a separation must be done to determine which of the related structures has drug activity. This has been one of the most promising and rapidly growing techniques in recent years. Many chemical and drug databases, as well as chemical information systems, have been developed by pharmaceutical companies and molecular modeling software vendors. A further approach is to search chemical databases for compounds similar to those found by the above means. This is the only part of the lead finding process that is considered to be a computational technique. There are many different measures of molecular similarity and ways of efficiently handling large databases, so this is not yet a trivial step.
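One widely used measure of molecular similarity is the Tanimoto coefficient between structural fingerprints. The sketch below is only an illustration of a similarity search over a small database: the fingerprints are hypothetical sets of "on" bit positions, and the compound names, the threshold and the function names are assumptions made for the example, not part of any particular database system.

```python
def tanimoto(fp_a, fp_b):
    """Tanimoto coefficient between two fingerprints.

    Each fingerprint is a set of integer positions of the bits that are set.
    """
    if not fp_a and not fp_b:
        return 0.0
    shared = len(fp_a & fp_b)
    return shared / (len(fp_a) + len(fp_b) - shared)


def similarity_search(query_fp, database, threshold=0.5):
    """Return (name, similarity) pairs at or above the threshold, best first."""
    scored = [(name, tanimoto(query_fp, fp)) for name, fp in database.items()]
    hits = [(name, s) for name, s in scored if s >= threshold]
    return sorted(hits, key=lambda hit: hit[1], reverse=True)


# Hypothetical fingerprints, invented for illustration only.
lead = {1, 4, 7, 9, 12}
database = {
    "cmpd_A": {1, 4, 7, 9, 13},
    "cmpd_B": {2, 3, 5},
    "cmpd_C": {1, 4, 7, 9, 12, 15},
}

hits = similarity_search(lead, database, threshold=0.5)
for name, score in hits:
    print(f"{name}: {score:.2f}")
```

In practice the fingerprints would be derived from the chemical structures by a cheminformatics toolkit, and the choice of similarity measure and threshold depends on the application.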

The aim of Computer Aided Drug Design is to find ligands that are predicted to interact strongly with a host (receptor). Alternatively, this procedure can be reversed to search for hosts that will interact strongly with a given ligand. Computer Aided Drug Design plays a major role in the rational drug design process and in understanding molecular recognition processes, such as protein-protein and protein-DNA interactions and the binding of substrates to proteins or DNA.

CADD can be done in two ways: ligand based or receptor based. Receptor based design starts with a known receptor, such as a protein-binding site or supramolecular host. Ligand based design uses a known set of ligands, but an unknown receptor site. Both approaches are actually very similar. Even once a structure has been determined, identifying the site where a drug/ligand must bind is not a trivial task.

The first phase is to determine the three-dimensional structure of the protein (receptor) by X-ray diffraction or NMR and to identify the binding site (drug target) using standard structural analysis. In the absence of structural information, homology of the unknown receptor sequence with known structures identified through database searches may be a good starting point. Thus the increased availability of X-ray crystal structures of receptors, and the increased reliability of homology models, are an important incentive for direct drug design.

The current emphasis in CADD is on lead development, which contrasts with early efforts that concentrated on lead optimization. Techniques for the latter are well established, while techniques for lead development are still under development and generally involve either (a) using computer technology to propose a new structure to fit a putative or known receptor (de novo design), or (b) searching a database of known structures for those with a desired activity or similarity to active compounds.

The discovery of new natural products helps us explore structures and functionality that we would never guess are important. The availability of chemical structure databases is playing an important role in enhancing the drug discovery approach. Characterizing the biological activity and properties of all the known compounds is impossible.

It is important to develop predictive tools for understanding structure-function relationships; these techniques enhance our ability to predict chemical reactivity and design useful compounds. Quantitative structure activity relationships (QSAR), quantitative structure property relationships (QSPR), and 3D-database mining play a central role in this effort. Analytical chemists have developed new chemometric techniques that allow the rapid retrieval and prediction of molecular and biological properties. Hydrophobic properties express the ability of a molecule to be transported in the environment and in an organism, to interact with biological membranes, and to be bound to a receptor by hydrophobic forces. The hydrophobic properties currently calculated are:

logP - logarithm of the octanol-water partition coefficient

MR - molar refractivity

log(1/WS) - water solubility

log(VP) - vapor pressure

Multivariate and artificial intelligence techniques are necessary to use our wealth of information efficiently. This information can then be used to suggest new chemical modifications for synthesis and testing. Ideally there is a continual exchange of information between the researchers doing QSAR studies, synthesis and testing. These techniques are frequently used and often very successful, since they do not rely on knowing the biological basis of the disease, which can be very difficult to determine.
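The simplest form of a QSAR model is a straight-line fit of measured activity against a single descriptor such as logP. The sketch below assumes that one-descriptor case and uses ordinary least squares; the logP and activity values are invented for illustration, and real studies would normally use several descriptors and a multivariate method.

```python
def fit_line(xs, ys):
    """Ordinary least-squares fit of y = slope * x + intercept."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return slope, intercept


def predict(slope, intercept, x):
    """Predicted activity for a compound with descriptor value x."""
    return slope * x + intercept


# Hypothetical training set: logP values and measured activities (log 1/C).
log_p = [1.0, 2.0, 3.0, 4.0]
activity = [2.1, 2.9, 4.2, 4.8]

slope, intercept = fit_line(log_p, activity)
# Predicted activity for a not-yet-synthesised analogue with logP = 2.5.
predicted = predict(slope, intercept, 2.5)
print(f"activity = {slope:.2f} * logP + {intercept:.2f}; prediction: {predicted:.2f}")
```

A fitted model of this kind is what allows new chemical modifications to be ranked before synthesis, and it is refined as new test results come back.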

The second phase is to generate a query for database searching. This model may be based on a pharmacophore: the functional group types (e.g. hydrogen bond donors, acceptors, hydrophobic regions) and the spatial arrangement of those groups on a molecule that interact with the receptor and are responsible for binding and biological activity. Ideally, the binding pharmacophore is separated from the activity pharmacophore in order to design a compound that binds but does not cause the biological activity (an antagonist). Thus a pharmacophore identifies a few specific interactions that are responsible for the binding.

The query is generated by building a simplified model of the receptor site.
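A pharmacophore query of this kind can be represented as a short list of typed feature points, and a candidate matches if it carries features of the same types at compatible mutual distances. The sketch below is a minimal illustration under that assumption: the feature names, coordinates and tolerance are hypothetical, only a single rigid conformer is tested, and the brute-force search is practical only for a handful of features.

```python
import math
from itertools import permutations


def matches_query(query, ligand_features, tol=0.5):
    """Check whether ligand features can be mapped onto the query features.

    Both arguments are lists of (feature_type, (x, y, z)). A mapping is valid
    when the types agree and every pairwise distance agrees within tol.
    """
    n = len(query)
    for combo in permutations(range(len(ligand_features)), n):
        if any(query[i][0] != ligand_features[j][0] for i, j in enumerate(combo)):
            continue
        consistent = True
        for a in range(n):
            for b in range(a + 1, n):
                d_query = math.dist(query[a][1], query[b][1])
                d_ligand = math.dist(ligand_features[combo[a]][1],
                                     ligand_features[combo[b]][1])
                if abs(d_query - d_ligand) > tol:
                    consistent = False
        if consistent:
            return True
    return False


# Hypothetical three-point query (distances 3, 4 and 5 Angstrom).
query = [
    ("donor", (0.0, 0.0, 0.0)),
    ("acceptor", (3.0, 0.0, 0.0)),
    ("hydrophobe", (0.0, 4.0, 0.0)),
]

# Hypothetical ligand conformers, invented for illustration.
ligand_ok = [
    ("acceptor", (10.0, 10.0, 10.0)),
    ("donor", (1.0, 1.0, 1.0)),
    ("acceptor", (4.0, 1.0, 1.0)),
    ("hydrophobe", (1.0, 5.0, 1.0)),
]
ligand_bad = [
    ("donor", (0.0, 0.0, 0.0)),
    ("acceptor", (8.0, 0.0, 0.0)),
    ("hydrophobe", (0.0, 4.0, 0.0)),
]

print(matches_query(query, ligand_ok), matches_query(query, ligand_bad))
```

Because only distances are compared, the match is independent of how the ligand is positioned or oriented; a conformationally flexible search would repeat the test over many conformers of each ligand.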

The next phase is to search databases for ligands that may bind to the chosen receptor. The 3D-pharmacophore is used in conformationally flexible searches for ligands that match the spatial distribution of the receptor. Alternatively, the receptor pocket can be used with automated docking to find ligands that avoid close contacts. The 3D-pharmacophore approach and the binding pocket approach are actually very similar, and queries can be fashioned that incorporate aspects of both approaches. Pharmacophores emphasize a few specific and varied types of interactions, while binding pockets emphasize steric interactions over the entire ligand. Some of the docking programmes described below are widely used in rational structure-based drug design.

Docking can be accomplished either by geometric matching of the ligand and its receptor or by minimising the energy of interaction. The geometric matching algorithms form the majority of approaches because of their relative speed. Geometric matching can be subdivided into methods based on descriptors and methods based on fragments. One of the oldest docking programmes, DOCK (Kuntz et al.), is based on a description of the negative image of a space-filling representation of the receptor, which should be filled by the ligand. Matching the structures in the Cambridge Structural Database against the DOCK images offered some of the early successes.
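The idea of scoring a ligand pose against a negative image of the receptor can be illustrated with a toy shape-complementarity score. This is not the DOCK algorithm itself: the scoring rule (plus one for each ligand atom inside a site sphere, minus one for each atom clashing with the receptor), the clash distance and all coordinates are assumptions chosen to keep the example short.

```python
import math


def shape_score(ligand_atoms, site_spheres, receptor_atoms, clash_dist=2.0):
    """Toy geometric score for one ligand pose.

    ligand_atoms and receptor_atoms are lists of (x, y, z) coordinates;
    site_spheres is a list of ((x, y, z), radius) describing the negative
    image of the binding pocket.
    """
    score = 0
    for atom in ligand_atoms:
        if any(math.dist(atom, r) < clash_dist for r in receptor_atoms):
            score -= 1  # steric clash with the receptor
        elif any(math.dist(atom, centre) <= radius for centre, radius in site_spheres):
            score += 1  # atom fills part of the pocket
    return score


# Hypothetical groove-shaped pocket running along the x axis.
site_spheres = [((0.0, 0.0, 0.0), 1.5), ((2.0, 0.0, 0.0), 1.5), ((4.0, 0.0, 0.0), 1.5)]
receptor_atoms = [
    (0.0, 3.0, 0.0), (2.0, 3.0, 0.0), (4.0, 3.0, 0.0),
    (0.0, -3.0, 0.0), (2.0, -3.0, 0.0), (4.0, -3.0, 0.0),
]

pose_in_groove = [(0.0, 0.0, 0.0), (2.0, 0.0, 0.0), (4.0, 0.0, 0.0)]
pose_against_wall = [(0.0, 2.0, 0.0), (2.0, 2.0, 0.0), (4.0, 2.0, 0.0)]

good = shape_score(pose_in_groove, site_spheres, receptor_atoms)
bad = shape_score(pose_against_wall, site_spheres, receptor_atoms)
print(good, bad)
```

A purely geometric score like this is fast to evaluate, which is the reason given above for the popularity of geometric matching, but it ignores electrostatics and ligand flexibility.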

CAVEAT (Bartlett et al.) suggests ligands for a particular receptor based not on matching atoms but on the desired bond vectors. The concept of electrostatic complementarity is not addressed by CAVEAT, nor in the original versions of DOCK. Recent improvements to DOCK include the addition of a force field for energy evaluation, limited conformational flexibility, and the inclusion of a hydrophobic term in the energy evaluation.

The GROW (Moon et al.) and HOOK (Eisen et al.) programmes are based on fragments instead of descriptors. By gradually adding functional groups to a peptide, GROW proposes peptidic ligands for a given macromolecule, which are easy to synthesise and evaluate pharmacologically. The HOOK programme proposes docking sites by using multiple copies of functional groups in simultaneous searches, followed by molecular mechanics or dynamics with the CHARMM programme.

LUDI proposes somewhat larger fragments to match the interaction sites of a macromolecule and scores its hits using geometric criteria taken from the CSD and the PDB and criteria based on binding data. The most recent improvement is the incorporation of hydrophobic terms and of the loss of binding energy caused by freezing the ligand's internal (torsional) and translational degrees of freedom.
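Scoring of this kind is commonly expressed as a sum of weighted contributions, with favourable terms for hydrogen bonds, ionic contacts and buried hydrophobic surface and a penalty for each rotatable bond frozen on binding. The sketch below assumes such an additive form; the function name is hypothetical and the weights are round placeholder numbers chosen for illustration, not the published LUDI parameters.

```python
def empirical_score(n_hbonds, n_ionic, lipophilic_area, n_rotatable,
                    dg_const=5.0, dg_hbond=-5.0, dg_ionic=-8.0,
                    dg_lipo=-0.2, dg_rot=1.5):
    """Estimate a binding score as a sum of weighted terms.

    All weights are illustrative placeholders (arbitrary energy units):
    dg_const models the loss of overall translational/rotational freedom,
    dg_rot the penalty per rotatable bond frozen on binding, and dg_lipo
    the gain per unit of buried lipophilic contact area.
    More negative scores mean stronger predicted binding.
    """
    return (dg_const
            + dg_hbond * n_hbonds
            + dg_ionic * n_ionic
            + dg_lipo * lipophilic_area
            + dg_rot * n_rotatable)


# Two hypothetical hits that differ only in flexibility.
rigid_hit = empirical_score(n_hbonds=2, n_ionic=1, lipophilic_area=100.0, n_rotatable=3)
floppy_hit = empirical_score(n_hbonds=2, n_ionic=1, lipophilic_area=100.0, n_rotatable=9)
print(rigid_hit, floppy_hit)
```

With identical interactions, the more flexible ligand scores worse because more torsional freedom is lost on binding, which is the effect the rotor term is meant to capture.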

The AutoDock programme docks ligands in a flexible manner to their receptor proteins using a Monte Carlo simulated annealing approach. Recently, successful use of this method was reported in the design of HIV-1 protease inhibitors. The advantage of such a flexible procedure is evident.
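The Monte Carlo simulated annealing idea can be sketched on a toy problem: random changes to the ligand pose are always accepted when they lower the energy and accepted with a temperature-dependent probability when they raise it, while the temperature is gradually reduced. The code below is not AutoDock; the energy function, the three-parameter pose (x, y, angle), the cooling schedule and the step size are all assumptions made for illustration.

```python
import math
import random


def energy(pose):
    """Toy interaction energy with its minimum (zero) at x=1, y=-2, theta=0.5."""
    x, y, theta = pose
    return (x - 1.0) ** 2 + (y + 2.0) ** 2 + (1.0 - math.cos(theta - 0.5))


def anneal(start, t_start=10.0, t_end=0.01, cooling=0.9,
           steps_per_cycle=100, step_size=0.5, seed=0):
    """Monte Carlo simulated annealing; returns the best pose and its energy."""
    rng = random.Random(seed)
    pose, e = start, energy(start)
    best_pose, best_e = pose, e
    t = t_start
    while t > t_end:
        for _ in range(steps_per_cycle):
            # Propose a random perturbation of every pose parameter.
            trial = tuple(p + rng.gauss(0.0, step_size) for p in pose)
            e_trial = energy(trial)
            # Metropolis criterion: downhill always, uphill with probability exp(-dE/T).
            if e_trial <= e or rng.random() < math.exp(-(e_trial - e) / t):
                pose, e = trial, e_trial
                if e < best_e:
                    best_pose, best_e = pose, e
        t *= cooling  # cool down before the next cycle
    return best_pose, best_e


start = (5.0, 5.0, 3.0)
start_e = energy(start)
best_pose, best_e = anneal(start)
print(f"start energy {start_e:.2f}, best energy found {best_e:.4f}")
```

Accepting occasional uphill moves at high temperature is what lets the search escape local minima; in a real docking run the pose would also include the ligand's torsion angles and the energy would come from a force field.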