A scalable reference-point based algorithm to efficiently search large chemical databases
Napolitano F.;
2010-01-01
Abstract
High-Throughput Screening (HTS) is a powerful tool in drug discovery, but it is very expensive in terms of required equipment and running costs. The virtual equivalent of HTS is a molecular database that can be searched across millions of molecules by means of a similarity measure. In this work we propose a new class of bounds, algorithms and storage strategies, based on the Intersection Inequality [5] for the Tanimoto Similarity, to improve on state-of-the-art performance in querying large repositories of binary fingerprints. We focus on a special case that we call the β = B algorithm. The performance of the algorithm is assessed by simulating queries over an excerpt of ChemDB [7]. We show that the average search can be up to 37% faster than using the Bit-Bound [4] alone, depending on the amount of space dedicated to the data structures needed by the algorithm. © 2010 IEEE.
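For readers unfamiliar with the baseline, the following is a minimal sketch of a threshold search over binary fingerprints using the Tanimoto similarity together with the standard bit-count bound, T(A, B) ≤ min(|A|, |B|) / max(|A|, |B|). It assumes this is the bound the abstract calls Bit-Bound, and it does not implement the β = B algorithm or the Intersection Inequality, whose details are not given here. Fingerprints are represented as Python integers, and all function names are hypothetical.

```python
from typing import List, Tuple


def popcount(fp: int) -> int:
    """Number of bits set in a fingerprint stored as a non-negative int."""
    return bin(fp).count("1")


def tanimoto(a: int, b: int) -> float:
    """Tanimoto similarity: |A AND B| / |A OR B|."""
    union = popcount(a | b)
    if union == 0:
        return 1.0  # assumption: two empty fingerprints count as identical
    return popcount(a & b) / union


def bit_bound(count_a: int, count_b: int) -> float:
    """Upper bound on the Tanimoto similarity from bit counts alone."""
    hi = max(count_a, count_b)
    return min(count_a, count_b) / hi if hi else 1.0


def search(query: int, database: List[int],
           threshold: float) -> Tuple[List[Tuple[int, float]], int]:
    """Return (hits, evaluated): fingerprints with similarity >= threshold,
    and how many full similarity computations were actually needed."""
    q_count = popcount(query)
    hits = []
    evaluated = 0
    for fp in database:
        # Prune: if even the upper bound is below the threshold, skip.
        if bit_bound(q_count, popcount(fp)) < threshold:
            continue
        evaluated += 1
        sim = tanimoto(query, fp)
        if sim >= threshold:
            hits.append((fp, sim))
    return hits, evaluated


if __name__ == "__main__":
    db = [0b1111, 0b0111, 0b0001, 0b11111111]
    print(search(0b1111, db, 0.7))
```

In a real system the bit counts would typically be precomputed and the database organized by count, so that pruned fingerprints are never read at all; the loop above recomputes them only to keep the sketch short.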