Generating the language of AI harms: mapping guardrails using critical code studies

Image: A person wearing glasses sits at a desk in a modern office, with a city skyline visible through the windows, viewing multiple computer monitors displaying code and digital interfaces.
Image Credit: Photo by Blackcreek Corporate on Unsplash (Source, License)

AI Summary of Peer-Reviewed Research

This page presents an AI-generated summary of a published research paper. The original authors did not write or review this article. See full disclosure ↓

AI & Society · 2026-03-10 · Peer-reviewed · View original paper ↗
Publication Signals (MODERATE). Publication Signals show what we were able to verify about where this research was published; they reflect the source’s verifiable credentials, not the quality of the research. Core publication signals for this source were verified:
  • ✔ Peer-reviewed source
  • ✔ Published in indexed journal
  • ✔ No retraction or integrity flags

Overview

This study applies critical code studies methodologies to examine guardrails in large language models as sociotechnical control mechanisms. The research investigates how foundation models from four major organizations—Anthropic, DeepSeek, Meta, and OpenAI—implement content moderation through guardrails, focusing on both general-purpose models and public API moderation tools. The analysis treats guardrails as computational and linguistic artifacts that simultaneously encode technical constraints and ideological positions, examining how these systems regulate conversational possibilities through filtering and promotion mechanisms.
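
The study's source material is not reproduced in this summary, but one concrete instance of the "public API moderation tools" it examines is OpenAI's moderation endpoint. The sketch below is a hypothetical illustration, not code from the paper: it assumes the current OpenAI Python SDK, an API key in the environment, and an invented input string, and it shows the kind of computational artifact under analysis, a call that returns per-category flags and scores that downstream systems use to filter conversations.

```python
from openai import OpenAI

client = OpenAI()  # assumes OPENAI_API_KEY is set in the environment

# Screen a piece of text against the provider's moderation categories.
response = client.moderations.create(
    model="omni-moderation-latest",  # publicly documented moderation model
    input="Example user message to screen before generation.",
)

result = response.results[0]
print(result.flagged)          # True if any category crossed its threshold
print(result.categories)       # per-category booleans (harassment, violence, ...)
print(result.category_scores)  # the underlying scores those booleans derive from
```

The category names, the thresholds behind `flagged`, and the documentation describing them are the sort of filter criteria and policy language the study reads as jointly technical and ideological.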

Methods and approach

The investigation analyzes multiple documentation types and technical artifacts, including endpoint documentation, code examples, technical reports, model architectures, training dataset content, and methodology research papers. The study employs critical code studies as an analytical framework to deconstruct guardrail implementations across the four organizations. This approach treats code not merely as functional instruction but as a site of ideological encoding and social control, examining how computational structures embedded in guardrails shape conversational boundaries and possibilities. The analysis examines technical implementation details alongside the linguistic and ideological dimensions with which they are co-constructed.

Key findings

The examination reveals that guardrails function as dual mechanisms that simultaneously enforce technical constraints and regulate discourse through language. The study maps how guardrail architectures vary across organizations while identifying consistent patterns in how filtering mechanisms establish conversational boundaries. Analysis of documentation and implementation patterns shows that certain conversations are systematically delimited while others are promoted through technical design choices. The research demonstrates that guardrails operate as conversational interfaces that encode specific ideological positions through their technical construction, making the invisible edges of LLM systems visible through examination of policy language, filter criteria, and moderation logic.
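
The paper does not publish any provider's filter logic; as a purely hypothetical sketch (the categories, thresholds, and `filter_decision` function below are invented for illustration), moderation logic of this kind often comes down to threshold choices, and where those cut-offs sit is the kind of encoded design decision the analysis treats as ideological rather than neutral.

```python
# Hypothetical guardrail filter: turn per-category moderation scores into a
# decision about whether a conversation is blocked, flagged, or allowed.
THRESHOLDS = {"harassment": 0.5, "violence": 0.4, "self-harm": 0.2}

def filter_decision(category_scores: dict[str, float]) -> str:
    """Return 'block', 'flag', or 'allow' given category scores in [0, 1]."""
    exceeded = {cat: score for cat, score in category_scores.items()
                if score >= THRESHOLDS.get(cat, 1.0)}
    if any(score >= 0.8 for score in exceeded.values()):
        return "block"   # the conversation is delimited outright
    if exceeded:
        return "flag"    # routed for extra scrutiny rather than refused
    return "allow"       # the conversation proceeds unimpeded

print(filter_decision({"harassment": 0.9, "violence": 0.1}))    # -> block
print(filter_decision({"harassment": 0.1, "self-harm": 0.05}))  # -> allow
```

Changing a single number in `THRESHOLDS` changes which conversations are delimited and which are promoted, which is why the study treats such criteria as consequential design choices rather than neutral implementation details.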

Implications

The findings establish that guardrails constitute more than technical safeguards; they represent sites where computational and natural language systems jointly produce regulatory effects on discourse. Understanding guardrails through critical code studies reveals how technical architecture and policy language are inseparable, with code functioning as both encoder and decoder of ideological constraints. This analysis challenges the notion that large-scale AI systems are necessarily impenetrable, demonstrating that systematic examination of guardrails, documentation, and technical reports offers a way to understand how these systems' regulatory logics operate.

Scope and limitations

This summary is based on the study abstract and available metadata. It does not include a full analysis of the complete paper, supplementary materials, or underlying datasets unless explicitly stated. Findings should be interpreted in the context of the original publication.

Disclosure

  • Research title: Generating the language of AI harms: mapping guardrails using critical code studies
  • Authors: Sarah Ciston
  • Institutions: Academy of Media Arts Cologne, Center for Advanced Internet Studies
  • Publication date: 2026-03-10
  • DOI: https://doi.org/10.1007/s00146-026-02922-0
  • OpenAlex record: View
  • PDF: Download
  • Image credit: Photo by Blackcreek Corporate on Unsplash (Source, License)
  • Disclosure: This post was generated by Claude (Anthropic). The original authors did not write or review this post.
