Publishing process signals: MODERATE (reflects the venue and review process).

DerStandard dataset spans ten years of comments and votes


Research area: Data science, Metadata, Identifier

What the study found

The study presents a large, longitudinal dataset from DerStandard, an Austrian newspaper platform, covering user activity from 2013 to 2022. It includes over 75 million user comments, more than 400 million votes, and metadata on articles and user interactions.

Why the authors say this matters

The authors say the dataset enables research on discussion dynamics, network structures, and semantic analysis in German, a mid-resourced language. They also describe it as a reusable resource for computational social science and related fields that preserves user privacy.

What the researchers tested

The researchers assembled and released structured conversation threads, explicit up- and downvotes on comments, and editorial topic labels from the DerStandard news forum. Persistent identifiers were anonymized with salted hash functions, raw comment texts were not publicly shared, and pre-computed vector representations from a state-of-the-art embedding model were released instead.
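The abstract says identifiers were anonymized with salted hash functions but does not name the hash function or salting scheme. The sketch below shows the general idea under stated assumptions. It uses HMAC-SHA256 from Python's standard library as the keyed ("salted") hash, and it assumes a single secret salt applied to every identifier in the release. `SALT` and `pseudonymize` are hypothetical names, not part of the published dataset or its tooling.

```python
import hashlib
import hmac
import secrets

# Hypothetical secret salt: generated once, applied to every identifier in a
# release, and never published. Assumption: the paper's actual scheme may differ.
SALT = secrets.token_bytes(32)


def pseudonymize(user_id: str, salt: bytes = SALT) -> str:
    """Map a raw identifier to a stable, non-reversible hex token (HMAC-SHA256)."""
    return hmac.new(salt, user_id.encode("utf-8"), hashlib.sha256).hexdigest()


# The same input with the same salt always yields the same token, so comments,
# votes, and threads that refer to one user remain joinable after anonymization.
token_a = pseudonymize("user_12345")
token_b = pseudonymize("user_12345")
token_c = pseudonymize("user_67890")
print(token_a == token_b)  # True
print(token_a == token_c)  # False
```

The salt must stay secret. If it leaked, anyone holding a list of candidate user IDs could recompute the tokens and re-identify users. Keeping the salt fixed across the whole release is what preserves links between a user's comments and votes.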

What worked and what didn't

The dataset contains detailed metadata and interaction data across ten years, including comment threads, votes, and topic labels. The abstract does not report comparative performance results or tests of specific analyses; it describes the dataset as enabling further research.
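The release is described as containing structured conversation threads and explicit upvotes and downvotes, but the abstract does not give the file layout. The minimal sketch below assumes a hypothetical flat schema: each comment has an `id` and a `parent_id` (`None` for top-level comments), and each vote is a `(comment_id, is_upvote)` pair. From that schema it rebuilds reply depth and per-comment vote counts. All field names and records are illustrative and are not taken from the dataset.

```python
from collections import Counter, defaultdict, deque

# Hypothetical records; the real release's field names and layout may differ.
comments = [
    {"id": "c1", "parent_id": None},
    {"id": "c2", "parent_id": "c1"},
    {"id": "c3", "parent_id": "c1"},
    {"id": "c4", "parent_id": "c2"},
    {"id": "c5", "parent_id": None},
]
votes = [("c1", True), ("c1", True), ("c1", False), ("c2", False), ("c4", True)]


def build_children(comments):
    """Group comment IDs under their parent; top-level comments sit under None."""
    children = defaultdict(list)
    for c in comments:
        children[c["parent_id"]].append(c["id"])
    return children


def depths(children):
    """Breadth-first walk from top-level comments, recording each reply depth."""
    depth = {}
    queue = deque((cid, 0) for cid in children.get(None, []))
    while queue:
        cid, d = queue.popleft()
        depth[cid] = d
        queue.extend((child, d + 1) for child in children.get(cid, []))
    return depth


def tally(votes):
    """Count upvotes and downvotes per comment."""
    ups, downs = Counter(), Counter()
    for cid, is_up in votes:
        (ups if is_up else downs)[cid] += 1
    return ups, downs


children = build_children(comments)
print(depths(children))  # {'c1': 0, 'c5': 0, 'c2': 1, 'c3': 1, 'c4': 2}
ups, downs = tally(votes)
print(ups["c1"], downs["c1"])  # 2 1
```

Storing only a parent pointer per comment keeps the records flat. The full reply tree can be rebuilt in a single pass, and reply depth is a common starting point for discussion-dynamics analyses.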

What to keep in mind

The abstract does not describe analytical findings from the dataset itself. Raw comment text is not publicly available, and the summary provided here is limited to the information stated in the title and abstract.

Key points

The dataset covers DerStandard user activity from 2013 to 2022.
It includes over 75 million comments and more than 400 million votes.
Conversation threads, vote data, and editorial topic labels are included in the release.
User identifiers were anonymized, and raw comment texts were not shared publicly.
The authors say the resource supports research in German-language discourse and computational social science.

Disclosure

Research title: DerStandard dataset spans ten years of comments and votes

AI provenance: Not available for this post.