ChemAudit: An Open-Source Chemical Structure Validation Suite  

Data quality in chemistry remains one of the biggest bottlenecks in cheminformatics, drug discovery, and machine learning for chemistry. Issues such as incorrect structural representations, undefined stereocenters, PAINS-flagged [1] compounds, and inconsistent standardisation can quietly undermine the reliability of downstream models and analyses.

ChemAudit was built to address this. It’s a free, open-source web platform that brings structure validation, standardisation, structural alert screening, and quality scoring together in one clear, user-friendly interface. No command-line experience needed.

What it does: 

  • Runs 15+ validation checks covering parsability, valence, stereochemistry, and representation consistency
  • Screens against 480+ PAINS patterns and 700+ pharmaceutical alert filters sourced from BMS, Glaxo, Dundee, and other ChEMBL [2] collections
  • Scores ML-readiness (0–100) by testing 451 molecular descriptors and 7 fingerprint types
  • Evaluates drug-likeness via Lipinski [3], QED [4], Veber [5], Ghose [6], and Muegge [7] rules
  • Predicts ADMET properties, including synthetic accessibility, solubility, and CNS penetration
  • Standardises structures using the ChEMBL pipeline [8] (salt stripping, tautomer canonicalisation, charge normalisation)
  • Assesses natural product likeness with scaffold analysis
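
To make the drug-likeness checks above concrete, here is a minimal sketch of how Lipinski and Veber rules can be applied once descriptors are available. It is not ChemAudit's implementation: the `Descriptors` container and the function names are hypothetical, the descriptor values are assumed to be precomputed elsewhere (for example with RDKit), and the "at most one violation" cutoff for Lipinski is a common convention rather than something the text above specifies.

```python
from dataclasses import dataclass
from typing import List


@dataclass(frozen=True)
class Descriptors:
    """Precomputed properties for one molecule (hypothetical container)."""
    mol_weight: float       # molecular weight in Da
    logp: float             # calculated octanol-water partition coefficient
    h_bond_donors: int
    h_bond_acceptors: int
    rotatable_bonds: int
    tpsa: float             # topological polar surface area in square angstroms


def lipinski_violations(d: Descriptors) -> List[str]:
    """Return the Lipinski rule-of-five criteria that the molecule fails."""
    checks = {
        "MW > 500": d.mol_weight > 500,
        "logP > 5": d.logp > 5,
        "HBD > 5": d.h_bond_donors > 5,
        "HBA > 10": d.h_bond_acceptors > 10,
    }
    return [name for name, failed in checks.items() if failed]


def passes_lipinski(d: Descriptors, max_violations: int = 1) -> bool:
    """Assumed convention: tolerate at most `max_violations` failed criteria."""
    return len(lipinski_violations(d)) <= max_violations


def passes_veber(d: Descriptors) -> bool:
    """Veber criteria: at most 10 rotatable bonds and TPSA of at most 140."""
    return d.rotatable_bonds <= 10 and d.tpsa <= 140


# Hypothetical descriptor values, chosen only to exercise the checks.
example = Descriptors(
    mol_weight=620.0, logp=5.6, h_bond_donors=2,
    h_bond_acceptors=8, rotatable_bonds=12, tpsa=95.0,
)
print(lipinski_violations(example))
print(passes_lipinski(example), passes_veber(example))
```

Returning the list of failed criteria, rather than a bare pass/fail flag, keeps the result explainable, which matters when a report needs to say why a compound was flagged.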

Built for scale: Batch processing supports up to 1M molecules with real-time WebSocket progress tracking. Results can be exported to CSV, Excel, SDF, JSON, and PDF.
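
As a rough illustration of batch processing with progress reporting, the sketch below splits an input stream into chunks and calls a progress callback after each one. How ChemAudit actually schedules work or pushes updates is not described here; `chunked`, `process_batch`, the `validate` callable, and the chunk size are all hypothetical. In a real deployment the callback might forward each update over a WebSocket connection.

```python
from itertools import islice
from typing import Callable, Iterable, Iterator, List


def chunked(items: Iterable[str], size: int) -> Iterator[List[str]]:
    """Yield successive lists of at most `size` items without loading everything."""
    it = iter(items)
    while True:
        chunk = list(islice(it, size))
        if not chunk:
            return
        yield chunk


def process_batch(
    smiles: Iterable[str],
    total: int,
    validate: Callable[[str], bool],
    on_progress: Callable[[int, int], None],
    chunk_size: int = 1000,
) -> List[bool]:
    """Validate molecules chunk by chunk, reporting (done, total) after each chunk."""
    results: List[bool] = []
    done = 0
    for chunk in chunked(smiles, chunk_size):
        results.extend(validate(s) for s in chunk)
        done += len(chunk)
        on_progress(done, total)
    return results


# Stand-in validator: treats any non-empty string as valid (not a real chemistry check).
inputs = ["C", "CC", "", "CCC", "O"]
flags = process_batch(
    inputs,
    total=len(inputs),
    validate=lambda s: bool(s),
    on_progress=lambda done, total: print(f"{done}/{total} processed"),
    chunk_size=2,
)
print(flags)
```

Reporting once per chunk rather than once per molecule keeps the number of progress messages manageable for very large batches.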

Built on proven tools: RDKit [9], MolVS [10], and the ChEMBL structure pipeline power the backend. React and RDKit.js deliver interactive 2D depictions with atom-level issue highlighting on the frontend.

ChemAudit is designed for database curators, ML researchers, medicinal chemists, and natural products scientists who need reliable, standardised chemical data without the overhead of stitching together disparate CLI tools or licensing commercial software.

Self-hosted and MIT-licensed. Try it, break it, extend it.

Available at: