How LLMs Inherit Corruption
Ask an LLM a contested question and watch the hedging. “It’s complex.” “Different perspectives exist.” “Some argue X while others argue Y.” The response feels balanced. It also feels evasive. The model isn’t trying to deceive you. It’s behaving exactly as it was selected to behave. The corruption isn’t in the model’s intentions. It’s in the pipeline that produced it.
Inheritance vs. Enactment
Human institutions corrupt when people enact distorted behavior. Selection favors certain types. Training encodes biases. Ideology constrains inquiry. Guild interests override mission. The corruption flows from human choices and incentives that reward something other than truth. Large language models are different. They don’t enact; they inherit. The corruption is in the pipeline. Training on human text. Fine-tuning with human feedback. Each layer has incentive structures that optimize for something — papers, profit, agreeableness — but rarely for truth.
The model is selected for fitting the pipeline, not for accuracy. In other words, it learns what the pipeline rewards. And the pipeline, at every stage, rewards something other than “is this true?”
The Layers
Consider each stage of how an LLM gets built.
Training Data
The internet. Books. Everything we’ve written. This corpus contains false information, bias, SEO-optimized garbage, propaganda, and systematic overrepresentation of Western and English perspectives. The model learns the distribution of human text. The distribution is corrupted. The model doesn’t correct for that; it reproduces it. There is no selection pressure for truth at this stage — only for statistical fit to the training distribution.
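To make that concrete, here is a minimal sketch of the pretraining objective. It uses a toy model and a random stand-in corpus rather than any real pipeline; the point is only that the loss is next-token cross-entropy, and nothing in it asks whether the text is true.

```python
# Minimal sketch (toy model and fake corpus, not any lab's actual pipeline):
# pretraining minimizes next-token cross-entropy over whatever text is in
# the corpus. Nothing in this objective checks whether the text is true.
import torch
import torch.nn as nn

vocab_size, d_model = 100, 32
model = nn.Sequential(nn.Embedding(vocab_size, d_model),
                      nn.Linear(d_model, vocab_size))

# A "corpus" of token ids. A falsehood repeated often enough dominates
# the distribution, and the loss rewards reproducing it.
corpus = torch.randint(0, vocab_size, (64, 16))
inputs, targets = corpus[:, :-1], corpus[:, 1:]

loss_fn = nn.CrossEntropyLoss()
logits = model(inputs)                          # (batch, seq, vocab)
loss = loss_fn(logits.reshape(-1, vocab_size),  # fit the distribution,
               targets.reshape(-1))             # true or not
loss.backward()
```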
Academic Research
The methods and assumptions that flow into model architecture come from academia. And academia optimizes for papers and citations, not for truth. Publish-or-perish dynamics. Benchmark optimization. Paradigm defense. Research that challenges dominant views faces higher publication barriers. The model inherits methods and assumptions from that research ecosystem. If the research base is distorted, the model inherits the distortion.
Corporate Development
Speed over safety. Appear good rather than be good. Racing dynamics between competing labs. PR concerns that shape what gets shipped. Competitive moats built on capabilities, not on reliability. The model is a product. Products serve market incentives. Those incentives rarely align perfectly with “maximize truth in all circumstances.”
Alignment Training (RLHF, Constitutional AI)
Reinforcement learning from human feedback rewards responses that raters prefer. Raters have biases. Helpfulness gets rewarded. Controversy avoidance gets rewarded. Agreeable responses score well. Truth that makes people uncomfortable gets suppressed. “Alignment” means alignment with specific humans’ preferences — not alignment with truth. The constitutional approach adds explicit rules, but those rules are written by humans with their own blind spots. The selection pressure is “do raters like this?” not “is this true?”
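A sketch of the core objective makes the same point here. Assuming a standard Bradley-Terry preference loss and a stand-in scoring head rather than any specific RLHF codebase, the only signal the reward model sees is which response the rater preferred.

```python
# Minimal sketch (hypothetical reward model, not any specific RLHF codebase):
# the reward model is fit to rater preferences. The loss only asks
# "which response did the rater pick?", never "which response is true?"
import torch
import torch.nn as nn
import torch.nn.functional as F

d_feat = 64
reward_model = nn.Linear(d_feat, 1)  # stand-in for a scoring head

# Stand-in features for a (chosen, rejected) response pair as labeled
# by a human rater. An agreeable falsehood can be the "chosen" one.
chosen = torch.randn(8, d_feat)
rejected = torch.randn(8, d_feat)

r_chosen = reward_model(chosen)
r_rejected = reward_model(rejected)

# Bradley-Terry preference loss: push the chosen score above the rejected one.
loss = -F.logsigmoid(r_chosen - r_rejected).mean()
loss.backward()
```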
The Structure
Corruption flows downstream. Each layer adds distortion. The model is the product of all layers — it inherits the corruption of each. Selection pressure at every step is something other than “is this true?” Training: fit the distribution. RLHF: do raters like this? Corporate: does this ship? Academic: does this publish? The model is selected for fitting, not for truth.
In other words, the architecture of the pipeline guarantees that the output will reflect the biases of the pipeline. Not because the model is malicious. Because it was optimized for the wrong thing at every stage.
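A toy simulation, with invented independent scores rather than real data, shows how the compounded filters behave: each stage keeps what scores well on its own criterion, and truth never enters the selection.

```python
# Toy illustration (invented numbers, not a measurement): each stage
# selects on its own criterion, never on truth. What survives all the
# filters reflects the compounded criteria, not accuracy.
import random

random.seed(0)

def make_claim():
    # Independent toy scores in [0, 1] for each stage's criterion.
    return {
        "true":        random.random(),  # never used by any filter below
        "publishable": random.random(),  # academic: does this publish?
        "shippable":   random.random(),  # corporate: does this ship?
        "prevalent":   random.random(),  # pretraining: fits the corpus
        "likable":     random.random(),  # RLHF: do raters like this?
    }

claims = [make_claim() for _ in range(10_000)]

# Each layer keeps only what scores well on its own criterion.
for criterion in ("publishable", "shippable", "prevalent", "likable"):
    claims = [c for c in claims if c[criterion] > 0.5]

survivors_truth = sum(c["true"] for c in claims) / len(claims)
print(f"{len(claims)} survivors, mean truth score {survivors_truth:.2f}")
# Truth of what survives is whatever happens to correlate with the
# filters; the scores here are independent, so it hovers near 0.5.
```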
Patterns to Watch For
Certain behaviors signal inherited corruption. Excessive hedging when one side is clearly more likely true. False balance — presenting two views as equally valid when evidence favors one. Controversy avoidance — dodging hard questions. Agreeableness over accuracy — saying what the user wants to hear. Deference to authority. Tool framing that limits appropriate challenge (“I can’t express opinions” when the question calls for a judgment). These aren’t bugs. They’re features selected by the pipeline. Call them out. The system should catch and correct them.
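A crude heuristic can at least surface these patterns for review. The phrase list below is hypothetical and far from exhaustive; it flags surface markers, it does not prove a response is evasive.

```python
# Crude heuristic sketch (hypothetical phrase list, not a validated
# detector): flag surface patterns from the list above so a reviewer
# can look closer, not to prove a response is evasive.
import re

PATTERNS = {
    "excessive hedging": [r"\bit'?s complex\b", r"\bit depends\b"],
    "false balance":     [r"\bsome argue\b.*\bothers argue\b",
                          r"\bboth sides\b"],
    "tool framing":      [r"\bi can'?t express opinions\b",
                          r"\bas an ai\b"],
}

def flag_patterns(response: str) -> list[str]:
    text = response.lower()
    return [name for name, regexes in PATTERNS.items()
            if any(re.search(rx, text) for rx in regexes)]

print(flag_patterns(
    "It's complex. Some argue X while others argue Y. As an AI, "
    "I can't express opinions."
))  # -> ['excessive hedging', 'false balance', 'tool framing']
```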
Connection to Human Alignment
Human alignment precedes AI alignment. Humans haven’t converged on a coherent definition of “good” or “true.” AI is trained on human output. This essay describes the mechanism — what corrupts, and how. The alignment decode describes the structural cause — why it’s hard to avoid. You can’t distill coherent alignment from incoherent source material without adding something the source doesn’t have.
How This Was Decoded
From session-llm-corruption and the llm-inherited-corruption principle. Pattern recognition: multi-layer pipeline with incentive distortion at each layer; inheritance versus enactment. Inference: selection pressure at every step is something other than truth; the model is the product of the pipeline. Coherence: fits human-alignment-precedes-AI-alignment and corruption-as-false-coherence. Counterpath (LLMs as neutral) and falsification (pipeline change removes bias) considered.
Want the compressed, high-density version? Read the agent/research version →