THINK A new program for Virtual Screening

E. K. Davies and C. J. Davies
Treweren Consultants Ltd, Holmleigh, Evesham Road, Harvington, Evesham WR11 8LU, UK

Abstract

This paper describes virtual screening software which is used in the CAN-DDO Screen Saver Cancer Research project. The methodology used is similar to the pharmacophore approach used in the Chem-X software which systematically considers all conformations of molecules and the pharmacophores they exhibit. These pharmacophores are compared with those that complement the residues in the protein receptor site. An enhanced version of the ChemScore scoring function is used in order to rank the hits.

Introduction

In strict medicinal chemistry terms, a pharmacophore is the minimum requirements for biological activity. Prior to innovative 3D database developments, generalisations of the common substructures with geometric constraints were usually considered as pharmacophores1. While searches based on such pharmacophore queries performed well in validation studies, when used to find novel leads their performance was often disappointing. This can be attributed to the use of specific atoms and a substructure rather than a generalisation of the interactions of the substructure with a receptor. One of the first 3D database systems which had this capability was Chem-X2 which quickly established a reputation for selecting novel molecules for testing. The atoms in molecules were assigned attributes classifying the types of interactions with the receptor typically as hydrogen bond donors, acceptors, positive charge centres, acidic and basic groups. Dummy atoms were used for aromatic rings and lipophilic groups. This approach also had the capability to express a common pharmacophore in series of molecules that acted competitively at the same receptor site but did not contain common substructures.

A significant problem with a pharmacophore approach to finding new leads is that molecules can exhibit the pharmacophore but are incapable of inhibiting the receptor site (eg because they are too big). This can result in only 1-2% of selected molecules showing the desired activity. While this is obviously better than random screening, there is scope to do better. Early attempts to eliminate a significant proportion of false positives used exclusion spheres3 or volume maps4 to represent regions of space relative to the pharmacophore query that were normally occupied by the receptor site atoms. Van Drie et al in the Catalyst software5 were the first to implement what is best described as a 3D model for optimal activity. Unfortunately, they caused some confusion by choosing to describe this as a pharmacophore (which it is not as it embraces ideal characteristics rather than minimum requirements) although they avoided confusion with ComFA6 from which it is quite different by not describing it as a 3D-QSAR approach (which of course it is). Catalyst does have the capability to provide a score for selected molecules that represents how well each molecule exhibits the 3D model. This does not require the structure of the receptor site to be known unlike the analogous scoring functions which have been developed for ligand-protein docking such as used in DOCK7, LUDI8 and Prometheus9.

This papers describes an implementation similar to the Chem-X pharmacophore methodology in a new program called THINK where it is integrated with the ChemScore function which forms part of Prometheus. The combination of these approaches provides the very rapid lead generation searching based on pharmacophores with the quantitative scoring of hits. The generation of the pharmacophores is first described, then exploring conformations which might exhibit a pharmacophore of interest and finally scoring the resulting solutions or hits.

Pharmacophore Generation

When the 3D structure of the receptor is unknown, the only source of information about the pharmacophore is the structures of the molecules which show activity (those that do not show activity may also be useful). Although this case can be addressed by computational methods, this paper is concerned with Virtual Screening when the structure of the receptor is known. In these circumstances software such as Chem-X could generate possible pharmacophores from the receptor site10. A similar approach has been implemented in THINK with additional interaction types and positions.

The algorithm first identifies those atoms or groups in the protein which can have strong interactions with small molecule ligands such as hydrogen bonds, charge interactions etc. For each atom or group, one or more dummy atom is created which represents the ideal position(s) of a ligand atom with specific interaction capabilities. Chem-X introduced the term "complementary centres" to describe such dummy atoms and then constructed a list of search queries comprising 3 or 4 centres in each pharamacophore. The results of the algorithm which positions the complementary centres are shown in figure 1. For each complementary centre, a radius is used for a tolerance within which the ligand atom is required.

The software uses the neutral rather than ionised forms of the amino acids. This is necessary in order to be consistent with the neutral forms of molecules which are being searched (the use of salt forms of collections of molecules is not common practice because there may be several alternative salts). This approach was also used in Chem-X.

In THINK a subtle improvement is used which results in a significant speed increase (as well as reduced memory requirements). The permutation of complementary centres to generate a list of pharmacophores is deferred until the search stage. At this time, centres which cannot be matched by the potential ligand and pharmacophores which have inter-centre distances which are not compatible with the ligand can be eliminated. This greatly reduces the number of permutations which need further processing. Deducing whether an inter-centre distance is accessible in all possible conformations prior to generating those conformers is non-trivial and inevitably approximations must be used. In THINK, an extended chain is constructed between the atoms of interest by setting non-rigid ring torsion angles to 180 degrees. It is then trivial to deduce the maximum and minimum distances between the end-points of the chain. The pharmacophore generation stage thus provides a list of pharmacophores that are possibly exhibited by the ligand molecule and the atoms in the ligand that define those pharmacophores. These are further processed in the conformational analysis stage.

Conformational Analysis

It is now widely accepted that most small molecules exhibit many low energy conformations and the idea that a molecule might interact with a receptor only in its global energy minimum conformation or that observed in a small molecule crystal has been discredited. It is consequently surprising that many molecular modelling programs and most 3D database software5,11,12 do not adopt a comprehensive exploration of low energy conformations. It is inevitable that minimising from random conformers (as is used by UNITY and ISIS-3D) is not comprehensive and consequently may miss solutions. This is often not apparent in 3D searching when the random conformer is to be refined to a single pharmacophore. However, when processing tens or sometimes thousands of pharmacophores, such an approach would not be practical. The alternative used by Catalyst of generating a predefined (typically 100-250) representative conformers per molecule offers the benefit of very rapid searching but at the cost of time to generate the representative conformers and (for more flexible molecules) the omission of relevant conformers.

The increased speed of computers means that it is now possible to generate thousands of conformations extremely quickly. However, the vast numbers of molecules which may be processed and the torsional increment appropriate for the distance tolerances means that a brute force approach remains impractical - even if it was appropriate. The algorithmic challenge is to generate relevant conformers quickly and this necessitates an approach that can quickly identify and skip large numbers of conformations. Central to the approach used by THINK (and previously by Chem-X) is to order the list of rotational bonds to divide those which do affect distances between atoms which match points in a pharmacophore from bonds which do not. This simplifies rotating those bonds that change distances which match those in the query without rotating other bonds. Which in turn greatly increases the speed of generating solutions. It is necessary to rotate all bonds to relieve unacceptably small distances between atoms such as those in contact and to improve the score of the solutions.

THINK includes a library of ring torsion angles for a range of ring sizes, saturation patterns and heteroatom substitutions. This is used to rapidly generate alternative conformers for rings (including fused rings). Although the geometries of the resulting conformers are approximate but meet the normal distance tolerance criteria without refinement.

By default, THINK uses a distance tolerance of 0.5 Angstroms. This may be increased by defining radii for the atoms giving a tolerance equal to the sum of the radii. For most protein x-ray crystal structures, a tolerance of 0.5 Angstroms is inappropriately small but larger tolerances may give rise to visually poor hydrogen bonding geometries and lower solution scores. As a consequence it can be beneficial to refine the geometry of the protein (or at least the active site) using molecular mechanics programs. However, this would not avoid the issue of flexible amino-acid side-chains which might accommodate significant differences in conformers and scores. In practice, by using default tolerances, virtual screening will fail to select some molecules and at least for validation purposes it may be necessary to increase the default tolerance and/or refine the geometry of the receptor site.

Scoring Hits

In order to generate solutions which fit inside the receptor site it is necessary to reject conformers which would require atoms to overlap those in the protein. The scoring function used is based on the ChemScore function developed by Protherics9 with extension to include a standard Lennard-Jones VdW potential and torsional energy term. This approach is preferred to using a simple VdW contact test because it is less sensitive to rounding errors and other small changes in geometry.

The geometry of conformers which exhibit the pharmacophore and score less than a cutoff (by default 1000.) are refined using a SIMPLEX minimiser to adjust the torsion angles, x,y,z orientation and position of the molecule in the site. Molecules which are predicted to bind will have a negative score. The scores are obviously dependent on the parameters and any threshold score may result in borderline cases not being selected. Analogous to biological testing, the choice of threshold for activity is partly subjective: too lower a value will result in few (if any hits) while a value which is high may result in far too many molecules being selected for further analysis. The current implementation has the option to stop processing a molecule when the first acceptable hit is found. Consequently, it is conceivable that a molecule scores just beneath the threshold may have other conformers which score much better. As a result, selecting molecules for further consideration simply by sorting may eliminate molecules which deserve follow-up. Strategies for reviewing solutions are outside the scope of this paper.

Results and Discussion

As an example we use the estrogen receptor alpha ligand binding domain (PDB code 3ERT) and a selection of molecules similar to those used for the CAN-DDO project13. Water molecules were removed from the PDB file using a text editor and THINK was used to generate 45 complementary centres for residues that were in contact with the ligand. The default contact radii were used (1.25 x VdW radii). Of the 36 retained complementary centres which where within 6 Angstroms of a ligand atom, 13 were close to the ligand (less than 1.0+VdW radii of any ligand atoms). These include 3 sets of 3 overlapping centres (with different types) and one set of 2 overlapping centres.

Using THINK 1.14, 27 hits are found from a collection 88,655 molecules in 24 hours (on an 600 MHz PC) with refinement. The molecule predicted to bind best (lowest scoring molecule) had a score of -43.6 KJ and there were 5 other molecules which scored less than -20kJ.

The inclusion of a scoring function is a significant enhancement on the Chem-X pharmacophore approach. Although it is not apparent solely from the results presented here, THINK is also faster than Chem-X for the same query. The time taken is reduced if only the subset of 13 centres very close to ligand are used or some other subset based on mutagensis or other information is selected. In addition, a volume check option may be enabled which eliminates molecules which exhibit the pharmacophore in a volume corresponding of less than 0.25 multiplied by the number of non-H atoms. The resulting speed can increase dramatically to a few minutes per 10,000 molecules in idealised cases, although these options can also eliminates some of the less desirable hits. It should also be noted that when attempting to reproduce hits found experimentally or ligand-protein crystal structures, it may be necessary to change default settings eg reduce the numbers of centres being considered to 3 or even 2 and at least review the positions of side-chains and waters in the active site. Details of such validation studies are being published separately14.

In order to screen larger numbers of molecules very quickly, THINK can be configured to run as a screen saver application on PCs under Windows with access to a list of jobs on a shared disk. In the CAN-DDO project, United Devices Global Metaprocessor software (www.ud.com) is used to utilise the idle CPU time on over a million PCs connected via the internet.

References

(1) Sheridan, R. P.; Nilakantan, R.; Rusinko III, A.; Bauman, N.; Haraki, K. S. and Ventkataraghavan, R. 3DSEARCH: A System for Three-Dimensional Substructure Searching. J. Chem. Inf. Comput. Sci. 1989, 29, 225-260
(2) Murrall, N. W. and Davies, E. K. Conformational freedom in 3D databases. 1. Techniques. J. Chem. Inf. Comput. Sci. 1990, 30, 312-316
(3) MACCS 3D was developed at MDL Information Systems Inc, San Francisco, CA (www.mdli.com)
(4) Chem-X was developed at Chemical Design (now part of Accelrys).
(5) Catalyst developed at Accelrys (www.accelrys.com - formerly MSI)
(6) ComFA developed at TRIPOS Associates Inc, St Louis, MO (www.tripos.com)
(7) (a) DOCK developed at UCSF, San Francisco, CA ; (b)Kuntz, I.D. et al. A geometric approach to macromolecule-ligand interactions. J. Mol. Biol. 1982, 161, 269-288
(8) (a) Boehm, H-J. The computer program LUDI: A new method for the de novo design of enzyme inhibitors. J. Comput.-Aided Mol. Design 1992, 6, 61-78; (b) Boehm, H-J. The development of a simple empirical scoring function to estimate the binding constant for a protein-ligand complex of known three-dimensional structure. J. Comput.-Aided Mol. Design 1994, 8, 243-256
(9) (a) Prometheus developed at Tularik, Macclesfield, England (www.tularik.com - formerly Proteus); (b) Murray, C.W.; Auton, T. R.; and Eldridge, M.D. Empirical scoring functions. II. The testing of an empirical scoring function for the prediction of ligand-receptor binding affinities and the use of Bayesian regression to improve the quality of the model. J. Comput.-Aided Mol. Design 1998, 12, 503-519
(10) Murray, C.M. and Cato S.J. Design of Libraries to Explore Receptor Sites. J. Chem. Inf. Comput. Sci. 1999, 39, 46-50
(11) ISIS-3D developed at MDL Information Systems Inc, San Francisco, CA (www.mdli.com)
(12) UNITY developed at TRIPOS Associates Inc, St Louis, MO (www.tripos.com)
(13) Hand, L. Computing for Cancer Research. The Scientist 2001, 15, 1-5.
(14) Davies, E.K. and Cockcroft, V. Reproducing ligand-protein x-ray crystal structures complexes using THINK. In preparation