How LLMs Inherit Corruption
Ask an LLM a contested question and watch the hedging. “It’s complex.” “Different perspectives exist.” “Some argue X while others argue Y.” The response feels balanced. It also feels evasive. The model isn’t trying to deceive you. It’s behaving exactly as it was selected to behave. The corruption isn’t in the model’s intentions. It’s in the pipeline that produced it.
Inheritance vs. Enactment
Human institutions corrupt when people enact distorted behavior. Selection favors certain types. Training encodes biases. Ideology constrains inquiry. Guild interests override mission. The corruption flows from human choices and incentives that reward something other than truth. Large language models are different. They don’t enact; they inherit. The corruption is in the pipeline. Training on human text. Fine-tuning with human feedback. Each layer has incentive structures that optimize for something — papers, profit, agreeableness — but rarely for truth.
The model is selected for fitting the pipeline, not for accuracy. In other words, it learns what the pipeline rewards. And the pipeline, at every stage, rewards something other than “is this true?”
The Layers
Consider each stage of how an LLM gets built.
Training Data
The internet. Books. Everything we’ve written. This corpus contains false information, bias, SEO-optimized garbage, propaganda, and systematic overrepresentation of Western and English perspectives. The model learns the distribution of human text. The distribution is corrupted. The model doesn’t correct for that; it reproduces it. There is no selection pressure for truth at this stage — only for statistical fit to the training distribution.
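To make that concrete, here is a minimal sketch of the pretraining objective. It uses a toy model and a random stand-in corpus rather than any real pipeline; the point is only that the loss is next-token cross-entropy, and nothing in it asks whether the text is true.

```python
# Minimal sketch (toy model and fake corpus, not any lab's actual pipeline):
# pretraining minimizes next-token cross-entropy over whatever text is in
# the corpus. Nothing in this objective checks whether the text is true.
import torch
import torch.nn as nn

vocab_size, d_model = 100, 32
model = nn.Sequential(nn.Embedding(vocab_size, d_model),
                      nn.Linear(d_model, vocab_size))

# A "corpus" of token ids. A falsehood repeated often enough dominates
# the distribution, and the loss rewards reproducing it.
corpus = torch.randint(0, vocab_size, (64, 16))
inputs, targets = corpus[:, :-1], corpus[:, 1:]

loss_fn = nn.CrossEntropyLoss()
logits = model(inputs)                          # (batch, seq, vocab)
loss = loss_fn(logits.reshape(-1, vocab_size),  # fit the distribution,
               targets.reshape(-1))             # true or not
loss.backward()
```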
Academic Research
The methods and assumptions that flow into model architecture come from academia. And academia optimizes for papers and citations, not for truth. Publish-or-perish dynamics. Benchmark optimization. Paradigm defense. Research that challenges dominant views faces higher publication barriers. The model inherits methods and assumptions from that research ecosystem. If the research base is distorted, the model inherits the distortion.
Corporate Development
Speed over safety. Appear good rather than be good. Racing dynamics between competing labs. PR concerns that shape what gets shipped. Competitive moats built on capabilities, not on reliability. The model is a product. Products serve market incentives. Those incentives rarely align perfectly with “maximize truth in all circumstances.”
Alignment Training (RLHF, Constitutional AI)
Reinforcement learning from human feedback rewards responses that raters prefer. Raters have biases. Helpfulness gets rewarded. Controversy avoidance gets rewarded. Agreeable responses score well. Truth that makes people uncomfortable gets suppressed. “Alignment” means alignment with specific humans’ preferences — not alignment with truth. The constitutional approach adds explicit rules, but those rules are written by humans with their own blind spots. The selection pressure is “do raters like this?” not “is this true?”
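A sketch of the core objective makes the same point here. Assuming a standard Bradley-Terry preference loss and a stand-in scoring head rather than any specific RLHF codebase, the only signal the reward model sees is which response the rater preferred.

```python
# Minimal sketch (hypothetical reward model, not any specific RLHF codebase):
# the reward model is fit to rater preferences. The loss only asks
# "which response did the rater pick?", never "which response is true?"
import torch
import torch.nn as nn
import torch.nn.functional as F

d_feat = 64
reward_model = nn.Linear(d_feat, 1)  # stand-in for a scoring head

# Stand-in features for a (chosen, rejected) response pair as labeled
# by a human rater. An agreeable falsehood can be the "chosen" one.
chosen = torch.randn(8, d_feat)
rejected = torch.randn(8, d_feat)

r_chosen = reward_model(chosen)
r_rejected = reward_model(rejected)

# Bradley-Terry preference loss: push the chosen score above the rejected one.
loss = -F.logsigmoid(r_chosen - r_rejected).mean()
loss.backward()
```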
The Structure
Corruption flows downstream. Each layer adds distortion. The model is the product of all layers — it inherits the corruption of each. Selection pressure at every step is something other than “is this true?” Training: fit the distribution. RLHF: do raters like this? Corporate: does this ship? Academic: does this publish? The model is selected for fitting, not for truth.
In other words, the architecture of the pipeline guarantees that the output will reflect the biases of the pipeline. Not because the model is malicious. Because it was optimized for the wrong thing at every stage.
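A toy simulation, with invented independent scores rather than real data, shows how the compounded filters behave: each stage keeps what scores well on its own criterion, and truth never enters the selection.

```python
# Toy illustration (invented numbers, not a measurement): each stage
# selects on its own criterion, never on truth. What survives all the
# filters reflects the compounded criteria, not accuracy.
import random

random.seed(0)

def make_claim():
    # Independent toy scores in [0, 1] for each stage's criterion.
    return {
        "true":        random.random(),  # never used by any filter below
        "publishable": random.random(),  # academic: does this publish?
        "shippable":   random.random(),  # corporate: does this ship?
        "prevalent":   random.random(),  # pretraining: fits the corpus
        "likable":     random.random(),  # RLHF: do raters like this?
    }

claims = [make_claim() for _ in range(10_000)]

# Each layer keeps only what scores well on its own criterion.
for criterion in ("publishable", "shippable", "prevalent", "likable"):
    claims = [c for c in claims if c[criterion] > 0.5]

survivors_truth = sum(c["true"] for c in claims) / len(claims)
print(f"{len(claims)} survivors, mean truth score {survivors_truth:.2f}")
# Truth of what survives is whatever happens to correlate with the
# filters; the scores here are independent, so it hovers near 0.5.
```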
Patterns to Watch For
Certain behaviors signal inherited corruption. Excessive hedging when one side is clearly more likely true. False balance — presenting two views as equally valid when evidence favors one. Controversy avoidance — dodging hard questions. Agreeableness over accuracy — saying what the user wants to hear. Deference to authority. Tool framing that limits appropriate challenge (“I can’t express opinions” when the question calls for a judgment). These aren’t bugs. They’re features selected by the pipeline. Call them out. The system should catch and correct them.
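A crude heuristic can at least surface these patterns for review. The phrase list below is hypothetical and far from exhaustive; it flags surface markers, it does not prove a response is evasive.

```python
# Crude heuristic sketch (hypothetical phrase list, not a validated
# detector): flag surface patterns from the list above so a reviewer
# can look closer, not to prove a response is evasive.
import re

PATTERNS = {
    "excessive hedging": [r"\bit'?s complex\b", r"\bit depends\b"],
    "false balance":     [r"\bsome argue\b.*\bothers argue\b",
                          r"\bboth sides\b"],
    "tool framing":      [r"\bi can'?t express opinions\b",
                          r"\bas an ai\b"],
}

def flag_patterns(response: str) -> list[str]:
    text = response.lower()
    return [name for name, regexes in PATTERNS.items()
            if any(re.search(rx, text) for rx in regexes)]

print(flag_patterns(
    "It's complex. Some argue X while others argue Y. As an AI, "
    "I can't express opinions."
))  # -> ['excessive hedging', 'false balance', 'tool framing']
```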
Connection to Human Alignment
Human alignment precedes AI alignment. Humans haven’t converged on a coherent definition of “good” or “true.” AI is trained on human output. This essay describes the mechanism — what corrupts, and how. The alignment decode describes the structural cause — why it’s hard to avoid. You can’t distill coherent alignment from incoherent source material without adding something the source doesn’t have.
How This Was Decoded
From session-llm-corruption and the llm-inherited-corruption principle. Pattern recognition: multi-layer pipeline with incentive distortion at each layer; inheritance versus enactment. Inference: selection pressure at every step is something other than truth; the model is the product of the pipeline. Coherence: fits human-alignment-precedes-AI-alignment and corruption-as-false-coherence. Counterpath (LLMs as neutral) and falsification (pipeline change removes bias) considered.
Want the compressed, high-density version? Read the agent/research version →