AI Summary of Peer-Reviewed Research

This page presents an AI-generated summary of a published research paper. The original authors did not write or review this article. [See full disclosure ↓]

Publishing process signals: STRONG (reflects the venue and review process).

New dataset covers atomization energies across broad chemical space

Research area: Materials Science, Materials Chemistry, Advanced Chemical Physics Studies

What the study found

The study presents the Microsoft Research Accurate Chemistry Collection (MSR-ACC) and its first release, MSR-ACC/TAE25, a dataset of 73,040 total atomization energies calculated at the CCSD(T)/CBS level using the W1-F12 thermochemical protocol. The dataset is designed to cover a broad space of closed-shell, neutral, covalently bound equilibrium molecules with up to 5 non-hydrogen atoms.
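
For readers unfamiliar with the quantity, the conventional definition of a total atomization energy (TAE) is the energy needed to separate a molecule into its isolated atoms. The expression below is that textbook definition, not a formula quoted from the paper. The symbols E(A) and E(mol) are notation chosen here for illustration, and this summary does not say whether the released values include zero-point vibrational energy.

```latex
\mathrm{TAE}(\mathrm{mol}) \;=\; \sum_{A \,\in\, \mathrm{mol}} E(A) \;-\; E(\mathrm{mol})
```

Here the sum runs over every atom A in the molecule, E(A) is the energy of the isolated atom, and E(mol) is the energy of the molecule at its equilibrium geometry. With this sign convention a bound molecule has a positive TAE.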

Why the authors say this matters

The authors define sub-chemical accuracy as agreement within 1 kcal mol^-1 of the empirical ground truth, and say that datasets reaching this level of accuracy remain limited in size or scope. The study suggests that MSR-ACC/TAE25 can help develop data-driven computational chemistry methods with greater predictive accuracy across broad chemical space.
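
To make the 1 kcal mol^-1 threshold concrete, here is a minimal Python sketch that converts an energy from Hartree to kcal mol^-1 and checks whether a prediction falls within that threshold of a reference value. The function names and the rounded conversion factor are choices made for this illustration and do not come from the paper or the dataset.

```python
# Rounded conversion factor: 1 Hartree is approximately 627.509474 kcal/mol.
KCAL_PER_MOL_PER_HARTREE = 627.509474


def hartree_to_kcal_per_mol(energy_hartree: float) -> float:
    """Convert an energy from Hartree to kcal/mol."""
    return energy_hartree * KCAL_PER_MOL_PER_HARTREE


def within_chemical_accuracy(
    predicted_kcal: float,
    reference_kcal: float,
    threshold_kcal: float = 1.0,
) -> bool:
    """Return True if the absolute error is at most threshold_kcal (kcal/mol)."""
    return abs(predicted_kcal - reference_kcal) <= threshold_kcal


if __name__ == "__main__":
    # Hypothetical numbers, used only to exercise the functions.
    print(within_chemical_accuracy(100.4, 101.0))
    print(hartree_to_kcal_per_mol(0.0016))
```

In these units the threshold is small: an error of roughly 0.0016 Hartree already corresponds to about 1 kcal mol^-1.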

What the researchers tested

The researchers built an openly available dataset on Zenodo in QCSchema format under the CDLA Permissive 2.0 license. It includes molecules made from elements up to argon and excludes structures with significant multireference character.
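
The scope criteria can be partly expressed as a check on a QCSchema-style molecule record. The sketch below is an assumption-laden illustration. It relies on the standard QCSchema molecule fields `symbols`, `molecular_charge`, and `molecular_multiplicity`, treats multiplicity 1 as a stand-in for "closed-shell", and assumes conventional element capitalization. The actual file layout of the Zenodo release is not described in this summary, so `in_described_scope` and the example record are hypothetical. The check cannot tell whether a structure is covalently bound, at equilibrium, or free of multireference character.

```python
# Element symbols for atomic numbers 1 through 18 (hydrogen to argon).
SYMBOLS_UP_TO_ARGON = [
    "H", "He", "Li", "Be", "B", "C", "N", "O", "F", "Ne",
    "Na", "Mg", "Al", "Si", "P", "S", "Cl", "Ar",
]
ALLOWED_SYMBOLS = set(SYMBOLS_UP_TO_ARGON)


def heavy_atom_count(symbols: list[str]) -> int:
    """Count atoms that are not hydrogen."""
    return sum(1 for symbol in symbols if symbol != "H")


def in_described_scope(molecule: dict, max_heavy_atoms: int = 5) -> bool:
    """Check the element, size, charge, and spin criteria named in the summary.

    Hypothetical helper: it assumes a QCSchema-style molecule dict and
    defaults to neutral charge and singlet multiplicity when they are absent.
    """
    symbols = molecule["symbols"]
    charge = molecule.get("molecular_charge", 0)
    multiplicity = molecule.get("molecular_multiplicity", 1)
    return (
        all(symbol in ALLOWED_SYMBOLS for symbol in symbols)
        and heavy_atom_count(symbols) <= max_heavy_atoms
        and charge == 0
        and multiplicity == 1
    )


if __name__ == "__main__":
    # Illustrative water record; the geometry is a flat list in Bohr
    # with approximate values, not taken from the dataset.
    water = {
        "symbols": ["O", "H", "H"],
        "geometry": [0.0, 0.0, 0.0, 0.0, 1.43, 1.11, 0.0, -1.43, 1.11],
        "molecular_charge": 0,
        "molecular_multiplicity": 1,
    }
    print(in_described_scope(water))
```

A filter like this is only a first pass: it narrows candidates by composition and charge state, while the remaining criteria require chemical analysis beyond what the record's basic fields provide.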

What worked and what didn't

The release contains 73,040 total atomization energies and is described as exhaustively covering the specified chemical space. The abstract does not report performance comparisons, model benchmarks, or failures.

What to keep in mind

The available summary does not describe limitations beyond the dataset scope itself. The release is restricted to closed-shell, charge-neutral, covalently bound equilibrium structures with up to 5 non-hydrogen atoms and without significant multireference character.

Key points

  • MSR-ACC/TAE25 contains 73,040 total atomization energies.
  • The energies were obtained at the CCSD(T)/CBS level using the W1-F12 thermochemical protocol.
  • The dataset covers closed-shell, neutral, covalently bound equilibrium molecules with up to 5 non-hydrogen atoms.
  • The covered elements extend up to argon, and molecules with significant multireference character are excluded.
  • The dataset and canonical train/validation splits are openly available on Zenodo in QCSchema format under the CDLA Permissive 2.0 license.

Disclosure

Research title:
New dataset covers atomization energies across broad chemical space
Authors:
Sebastian Ehlert, Jan Hermann, Thijs Vogels, Víctor García Satorras, Stephanie Lanius, Marwin Segler, Klaas J. H. Giesbertz, Kenji Takeda, Giulia Luise, Rianne van den Berg, Paola Gori-Giorgi, Amir Karton
Institutions:
Microsoft (Netherlands), Microsoft (United States), Microsoft (Germany), Microsoft Research (United Kingdom), University of New England
Publication date:
2026-04-25
Image credit:
Photo by Polina Tankilevitch on Pexels · Pexels License
AI provenance: This post was generated by OpenAI. The original authors did not write or review this post.