Tag: Benchmark (surveying)

Strict gating cuts unsafe commitments but raises false positives
What the study found: In a toy robotic-arm simulation, strict binary commitment gating reduced unsafe commitments but produced a high burden of hard false positives. Authority throttling and cost-aware throttled gating kept most of the safe-stop benefit while sharply reducing unnecessary hard stops. Why the authors say…

TRAVELER benchmark reveals weaker LLM temporal reasoning with vague references
What the study found: Current large language models (LLMs) handle explicit time references, such as exact dates, better than implicit or vague ones, such as "yesterday" or "recently." Performance also declines as the number of past events in a set grows, and vague references are the hardest category overall.…

Detailed deduction of the Tennessee-Eastman benchmark model
Detailed mathematical reconstruction of the Tennessee-Eastman benchmark process model with previously unavailable parameters and explicit documentation of assumptions.

GeoGraphNetworks provides validated spatial network data
GeoGraphNetworks: 110 validated spatial networks spanning US and UK transportation and hydrological systems in analysis-ready JSON and XLSX formats with complete topological and geographic data.

Ancient DNA relatedness methods vary in reliability across conditions
Benchmark evaluation of ancient DNA genetic relatedness estimation methods reveals multiple sources of bias and limitations in current approaches.

BenchPCNP provides labeled printed circuit netlist graph data
BenchPCNP: A labeled printed circuit netlist graph dataset for benchmarking partitioning, built from 50 production-verified circuits with 54 distinct module labels that follow the IPC-2612 standard.