ChemAudit: An Open-Source Chemical Structure Validation Suite

Data quality in chemistry remains one of the biggest bottlenecks in cheminformatics, drug discovery, and machine learning for chemistry. Issues such as incorrect structural representations, undefined stereocenters, PAINS-flagged¹ compounds, and inconsistent standardisation can quietly undermine the reliability of downstream models and analyses.

ChemAudit was built to address this. It’s a free, open-source web platform that brings structure validation, standardisation, structural alert screening, and quality scoring together in one clear, user-friendly interface. No command-line experience needed.

What it does:

Runs 15+ validation checks covering parsability, valence, stereochemistry, and representation consistency

Screens against 480+ PAINS patterns and 700+ pharmaceutical alert filters sourced from BMS, Glaxo, Dundee, and other ChEMBL² collections

Scores ML-readiness (0–100) by testing 451 molecular descriptors and 7 fingerprint types

Evaluates drug-likeness via Lipinski³, QED⁴, Veber⁵, Ghose⁶, and Muegge⁷ rules

Predicts ADMET and related properties, including synthetic accessibility, solubility, and CNS penetration

Standardises structures using the ChEMBL pipeline⁸ (salt stripping, tautomer canonicalisation, charge normalisation)

Assesses natural product likeness with scaffold analysis
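To make the drug-likeness item concrete, here is a minimal sketch of how rule-based filters such as Lipinski and Veber can be evaluated once descriptors are available. It is not ChemAudit's own code: the Descriptors container and the function names are hypothetical, the descriptor values are assumed to have been computed elsewhere (for example with RDKit), and the thresholds are the commonly cited ones (Lipinski: MW ≤ 500, logP ≤ 5, H-bond donors ≤ 5, H-bond acceptors ≤ 10, at most one violation tolerated; Veber: rotatable bonds ≤ 10, TPSA ≤ 140 Å²).

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Descriptors:
    """Precomputed descriptor values for one molecule (hypothetical container)."""
    mol_weight: float       # molecular weight in Da
    logp: float             # calculated octanol-water partition coefficient
    h_bond_donors: int
    h_bond_acceptors: int
    rotatable_bonds: int
    tpsa: float             # topological polar surface area in square angstroms


def lipinski_violations(d: Descriptors) -> list[str]:
    """Return the Lipinski rule-of-five criteria that the molecule breaks."""
    checks = {
        "MW <= 500": d.mol_weight <= 500,
        "logP <= 5": d.logp <= 5,
        "HBD <= 5": d.h_bond_donors <= 5,
        "HBA <= 10": d.h_bond_acceptors <= 10,
    }
    return [name for name, ok in checks.items() if not ok]


def passes_lipinski(d: Descriptors, max_violations: int = 1) -> bool:
    """Lipinski is usually applied as 'no more than one violation'."""
    return len(lipinski_violations(d)) <= max_violations


def passes_veber(d: Descriptors) -> bool:
    """Veber criteria: at most 10 rotatable bonds and TPSA of at most 140."""
    return d.rotatable_bonds <= 10 and d.tpsa <= 140
```

Returning the list of broken criteria, rather than a bare pass/fail, mirrors the kind of per-check reporting a validation tool needs: the caller can show exactly which rule failed.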

Built for scale: Batch processing supports up to 1M molecules with real-time WebSocket progress tracking. Results can be exported to CSV, Excel, SDF, JSON, and PDF.
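The general pattern behind batch processing with progress reporting can be sketched as follows. This is an illustration under stated assumptions, not ChemAudit's implementation: chunked and process_in_batches are hypothetical names, the per-molecule check is a placeholder, and the on_progress callback stands in for whatever would push an update over a WebSocket.

```python
from typing import Callable, Iterable, Iterator, List


def chunked(items: Iterable[str], size: int) -> Iterator[List[str]]:
    """Yield successive lists of at most `size` items."""
    if size < 1:
        raise ValueError("size must be at least 1")
    batch: List[str] = []
    for item in items:
        batch.append(item)
        if len(batch) == size:
            yield batch
            batch = []
    if batch:  # final, possibly shorter, batch
        yield batch


def process_in_batches(
    smiles: Iterable[str],
    total: int,
    check: Callable[[str], bool],
    on_progress: Callable[[int, int], None],
    batch_size: int = 1000,
) -> List[bool]:
    """Run `check` on every input, reporting progress after each batch."""
    results: List[bool] = []
    done = 0
    for batch in chunked(smiles, batch_size):
        results.extend(check(s) for s in batch)
        done += len(batch)
        on_progress(done, total)  # e.g. forward to a WebSocket client
    return results


if __name__ == "__main__":
    inputs = ["CCO", "c1ccccc1", "", "CC(=O)O", "CN"]
    flags = process_in_batches(
        inputs,
        total=len(inputs),
        check=lambda s: bool(s.strip()),  # placeholder: non-empty input only
        on_progress=lambda done, total: print(f"{done}/{total} molecules processed"),
        batch_size=2,
    )
```

Reporting once per batch rather than once per molecule keeps the number of progress messages manageable when the input runs to hundreds of thousands of structures.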

Built on proven tools: RDKit⁹, MolVS¹⁰, and the ChEMBL structure pipeline power the backend. React and RDKit.js deliver interactive 2D depictions with atom-level issue highlighting on the frontend.

ChemAudit is designed for database curators, ML researchers, medicinal chemists, and natural products scientists who need reliable, standardised chemical data without the overhead of stitching together disparate CLI tools or licensing commercial software.

Self-hosted and MIT-licensed. Try it, break it, extend it.