  • Chemoinformatics

      Chemoinformatics refers to the use of physicochemical properties of molecules of interest, together with computational ("in silico") techniques, to identify druggable hit compounds for a target disease. Such in silico techniques are used to aid and inform the drug discovery process, to design well-defined combinatorial libraries of synthetic compounds, or to assist in structure-based drug design.



    Encoding chemical structures


      Molecules cannot be fed into machine learning tools without first being encoded. To develop mathematical models that relate chemical structures to biological activities, the chemical structure must be transformed into a numerical description of the molecule. The mathematical disciplines of graph theory and geometry, among others, provide techniques to encode molecules. The resulting numerical representation is called a molecular descriptor. Molecular descriptors can be used for a number of predictive modeling tasks, such as virtual high-throughput screening, visualizing chemical libraries, analyzing quantitative structure-activity relationships, and predicting a molecule’s target structure.
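
      As a minimal sketch of graph-theoretic encoding, the example below treats a molecule as a graph of heavy atoms (nodes) and bonds (edges) and turns it into a short numeric vector. The function names `wiener_index` and `describe`, the adjacency-list input format, and the choice of descriptors are assumptions made for illustration; real descriptors are usually far richer. The Wiener index used here is the sum of shortest-path bond counts over all atom pairs, and the graph is assumed to be connected.

```python
from collections import deque


def wiener_index(adjacency):
    """Sum of shortest-path lengths (in bonds) over all unordered atom pairs.

    `adjacency` maps each atom index to a list of bonded atom indices.
    The molecular graph is assumed to be connected.
    """
    total = 0
    for start in adjacency:
        # Breadth-first search gives shortest path lengths from `start`.
        dist = {start: 0}
        queue = deque([start])
        while queue:
            atom = queue.popleft()
            for neighbour in adjacency[atom]:
                if neighbour not in dist:
                    dist[neighbour] = dist[atom] + 1
                    queue.append(neighbour)
        total += sum(dist.values())
    # Every pair was counted twice, once from each end.
    return total // 2


def describe(adjacency):
    """Hypothetical toy descriptor: [atom count, bond count, Wiener index]."""
    n_atoms = len(adjacency)
    n_bonds = sum(len(bonded) for bonded in adjacency.values()) // 2
    return [n_atoms, n_bonds, wiener_index(adjacency)]


# Carbon skeletons only; hydrogens are left implicit.
n_butane = {0: [1], 1: [0, 2], 2: [1, 3], 3: [2]}
isobutane = {0: [1], 1: [0, 2, 3], 2: [1], 3: [1]}

print(describe(n_butane))   # [4, 3, 10]
print(describe(isobutane))  # [4, 3, 9]
```

      The two isomers share the same atom and bond counts and differ only in the Wiener index, which illustrates why the choice of descriptor determines what a model can distinguish.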


    Development and validation of chemoinformatic models


      Most machine learning techniques can be used to develop chemoinformatic models. The choice of molecular descriptor is of utmost importance for successful predictive modeling. If the numerical description of the molecule is unsuitable for the purpose, good results are unlikely. Because molecular descriptors are mostly complex, high-dimensional descriptions of molecules, data analysis is prone to chance correlation and overfitting. Rigorous validation and assessment of the resulting models is therefore essential to exclude seemingly good models that would perform badly once used in production.
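
      The risk of overfitting with high-dimensional descriptors can be illustrated with a small experiment, sketched here under simplifying assumptions. The "descriptors" are random numbers and the activity labels are random as well, so there is nothing to learn. A 1-nearest-neighbour classifier (chosen only because it needs no external library) still looks perfect when scored on its own training data, while leave-one-out cross-validation exposes it. All function names, the data sizes, and the random seed are illustrative choices, not part of any particular chemoinformatics toolkit.

```python
import random


def squared_distance(a, b):
    """Squared Euclidean distance between two descriptor vectors."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def predict_1nn(train_X, train_y, query):
    """Return the label of the training vector closest to `query`."""
    nearest = min(range(len(train_X)),
                  key=lambda i: squared_distance(train_X[i], query))
    return train_y[nearest]


def resubstitution_accuracy(X, y):
    """Accuracy when the model is scored on the data it was built from."""
    hits = sum(predict_1nn(X, y, x) == label for x, label in zip(X, y))
    return hits / len(X)


def leave_one_out_accuracy(X, y):
    """Accuracy when each molecule is predicted from all the others."""
    hits = 0
    for i in range(len(X)):
        rest_X = X[:i] + X[i + 1:]
        rest_y = y[:i] + y[i + 1:]
        hits += predict_1nn(rest_X, rest_y, X[i]) == y[i]
    return hits / len(X)


# Assumed toy data: 40 "molecules", 200 random descriptor values each,
# and random binary activity labels with no relation to the descriptors.
rng = random.Random(0)
X = [[rng.random() for _ in range(200)] for _ in range(40)]
y = [rng.randint(0, 1) for _ in range(40)]

train_acc = resubstitution_accuracy(X, y)
loo_acc = leave_one_out_accuracy(X, y)
print(f"training accuracy:      {train_acc:.2f}")
print(f"leave-one-out accuracy: {loo_acc:.2f}")
```

      The training accuracy is perfect because every molecule is its own nearest neighbour, whereas the leave-one-out estimate should stay near chance level. A gap of this kind between the two figures is the warning sign that validation is meant to reveal.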