What the study found
BOA Constrictor is a new lossless neural compressor based on the Mamba state space model, and it achieved competitive compression on structured scientific datasets. The authors report that it matched or exceeded LZMA, ZSTD, and ZLIB at their maximum compression settings on several high-energy physics datasets.
Why the authors say this matters
The authors say this matters because petabyte-scale data from high-energy physics experiments creates major storage challenges. They conclude that BOA is a first step toward improving compression for next-generation scientific data.
What the researchers tested
The researchers tested a pseudo-streaming lossless compressor called Bytewise Online Autoregressive (BOA) Constrictor on multiple scientific datasets. These included ATLAS Open Data in HDF5 format, simulated particle collision records in HepMC v3, CMS Open Data in NanoAOD format, computational fluid dynamics data, and CAMELS cosmology data.
What worked and what didn't
BOA achieved an effective compression ratio of 7.23× on ATLAS Open Data and 9.13× on HepMC v3 with the model size counted as overhead, outperforming the next-best traditional algorithm in those cases. On CMS Open Data, it obtained comparable or improved effective compression ratios, within 5% of the next-best traditional algorithm. It also reached 1.61× on computational fluid dynamics data and up to 1.53× on CAMELS cosmology datasets. Its throughput in this proof-of-principle implementation, however, was described as not yet competitive with optimized algorithms such as ZSTD or LZMA.
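To make the "effective compression ratio including model size" metric concrete, here is a minimal sketch. The formula below is the standard way such a metric is defined (original size divided by compressed size plus model size); the specific byte counts are hypothetical illustrations, not figures from the paper.

```python
def effective_ratio(original_size: int, compressed_size: int, model_size: int) -> float:
    """Effective compression ratio, counting the model weights as overhead
    that must be stored or shipped alongside the compressed data."""
    return original_size / (compressed_size + model_size)

# Hypothetical example: a 10 GiB dataset compressed to 1.3 GiB
# by a neural compressor whose weights occupy 80 MiB.
GiB = 1024**3
MiB = 1024**2
ratio = effective_ratio(10 * GiB, int(1.3 * GiB), 80 * MiB)
print(f"{ratio:.2f}x")  # the model overhead pulls the ratio below 10/1.3
```

Counting the model this way matters for neural compressors: a small model amortizes quickly over large datasets, while a large model can erase the gains on small ones.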
What to keep in mind
The abstract describes a proof-of-principle implementation, so the reported compression throughput is limited. It also notes that BOA performed best on high-entropy float32 payloads, and that the model size was counted in the effective compression ratio.
Key points
- BOA Constrictor is a lossless neural compressor built on the Mamba state space model.
- It matched or exceeded LZMA, ZSTD, and ZLIB at maximum compression on several HEP datasets.
- Effective compression ratios including model size were 7.23× on ATLAS Open Data and 9.13× on HepMC v3.
- On CMS Open Data, BOA was within 5% of the next-best traditional algorithm.
- Throughput in the proof-of-principle implementation was not yet competitive with optimized compressors such as ZSTD or LZMA.
Disclosure
- Research title:
- Mamba-based compressor matches or exceeds standard tools on scientific data
- Authors:
- Akshat Gupta, C. Doglioni, Thomas Joseph Elliott
- Institutions:
- University of Manchester
- Publication date:
- 2026-04-24