Information Theory Decoded
Information is reduction of uncertainty. Bits measure surprise. Shannon's framework—from 1948—applies everywhere: communication, physics, biology, computation, thought. Decode the foundations.
Information theory feels technical because it uses math. But the core ideas are intuitive once decoded. They're also among the most powerful conceptual tools available.
Information as Surprise
Information isn't what you receive. It's how much your uncertainty decreases.
If someone tells you something you already knew, you received zero information. If they tell you something unlikely, you received lots. Information = surprise.
Mathematically: information content of an event = −log₂(probability of event), measured in bits. The less probable something is, the more information it carries when you learn it happened.
This is counterintuitive. We associate "information" with meaning or importance. Shannon's definition ignores meaning entirely. A random string of bits can have high information content. A profound truth you already believed has low information content.
The separation is deliberate. Meaning is receiver-dependent and hard to formalize. Surprise is measurable.
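The surprisal formula above is a one-liner. A minimal sketch in Python (the function name is mine):

```python
import math

def surprisal_bits(p: float) -> float:
    """Self-information of an event with probability p, in bits: -log2(p)."""
    return -math.log2(p)

# A certain event carries no information; rarer events carry more.
certain = surprisal_bits(1.0)     # 0 bits: you already knew it
coin    = surprisal_bits(0.5)     # 1 bit: a fair coin flip
rare    = surprisal_bits(1/1024)  # 10 bits: a 1-in-1024 surprise
```

Note that surprisal depends only on probability, not on meaning, which is exactly Shannon's deliberate separation.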
Bits: The Unit of Information
A bit is the information gained from a binary distinction—one yes/no question answered.
Learning which of two equally likely outcomes occurred = 1 bit. Learning which of four equally likely outcomes = 2 bits. Learning which of N equally likely outcomes = log₂(N) bits.
Everything digital reduces to bits. Every message can be encoded in bits. Bits are the universal currency of information.
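The log₂(N) rule is easy to check directly. A sketch (function name is mine):

```python
import math

def bits_to_identify(n: int) -> float:
    """Bits needed to pin down one of n equally likely outcomes: log2(n)."""
    return math.log2(n)

# Each yes/no question halves the remaining possibilities.
two  = bits_to_identify(2)   # 1 bit
four = bits_to_identify(4)   # 2 bits
card = bits_to_identify(52)  # ~5.7 bits to name a card drawn from a deck
```

Non-integer answers are fine: 5.7 bits means six yes/no questions suffice, and on average you can do a bit better with clever encoding.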
Entropy: Expected Surprise
Entropy is average information content—how surprised you expect to be.
A heavily biased coin has low entropy: you're rarely surprised. A fair coin has maximum entropy: each flip is maximally unpredictable. Entropy measures the "spread" of a probability distribution.
High entropy = high uncertainty = high information potential. Low entropy = low uncertainty = low information potential.
This connects to physics. Thermodynamic entropy and information entropy are related—same math, different interpretations. A messy room has high entropy: many equivalent microstates. An organized room has low entropy: specific microstate.
Compression: Removing Redundancy
Shannon's source coding theorem: you can compress a message down to its entropy, but no further.
If a message has redundancy (predictable patterns), you can encode it more efficiently. The limit is the entropy rate—the true information content.
English text compresses well: after "q", "u" is almost certain. After "th", "e" is likely. These redundancies can be squeezed out. Random noise can't compress: there's no redundancy to remove.
Compression algorithms find and exploit patterns. Good compression = good pattern detection. ZIP files are implicit models of text structure.
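The contrast between redundant text and random noise is easy to demonstrate with a stock compressor. A sketch using Python's standard `zlib` (the sample text is mine):

```python
import os
import zlib

# Highly redundant input: the same sentence repeated 100 times.
english = b"the quick brown fox jumps over the lazy dog " * 100
# Incompressible input: random bytes of the same length.
noise = os.urandom(len(english))

compressed_text  = zlib.compress(english)  # shrinks dramatically
compressed_noise = zlib.compress(noise)    # roughly original size, plus header overhead
```

The repeated sentence collapses to a tiny fraction of its size; the random bytes don't shrink at all, illustrating the entropy limit.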
Channel Capacity: The Speed Limit
Shannon's channel coding theorem: every communication channel has a maximum reliable transmission rate.
The channel capacity depends on bandwidth (how fast you can send symbols) and noise (how badly symbols get corrupted). More bandwidth = more capacity. More noise = less capacity.
The profound insight: you can get arbitrarily close to error-free transmission at any rate below capacity. Error-correcting codes make this possible. Above capacity, errors are inevitable no matter what you do.
This has practical implications: internet speeds, cell phone calls, satellite communication—all engineered around channel capacity limits.
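For the common case of a noisy analog channel, the Shannon–Hartley theorem gives capacity in closed form: C = B·log₂(1 + S/N). A sketch with hypothetical numbers (the parameter values are mine, chosen for illustration):

```python
import math

def shannon_hartley_capacity(bandwidth_hz: float, snr_linear: float) -> float:
    """AWGN channel capacity in bits/second: C = B * log2(1 + S/N)."""
    return bandwidth_hz * math.log2(1 + snr_linear)

# Hypothetical: a 1 MHz channel at 30 dB SNR (S/N = 1000).
c = shannon_hartley_capacity(1e6, 1000)  # ≈ 9.97 Mbit/s
```

No code, however clever, can reliably exceed this rate; below it, error-correcting codes can get arbitrarily close.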
Information Beyond Communication
Shannon developed information theory for communication engineering. But the framework generalizes:
Biology
DNA is an information storage medium. The genetic code compresses organism-building instructions. Evolution is information processing: selection accumulates information about what fits the environment.
Physics
Quantum mechanics has information-theoretic formulations. Black hole thermodynamics involves information paradoxes. Some physicists argue information is more fundamental than matter.
Neuroscience
Neural codes carry information. Sensory systems optimize information transmission. Attention is bandwidth allocation. Memory is information storage.
Machine Learning
Training is compression. A model compresses training data into parameters. Generalization tests whether compression captures structure or noise. Cross-entropy loss = information-theoretic objective.
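The cross-entropy objective is the same expected-surprisal math with two distributions: reality follows p, the model predicts q. A minimal sketch (function name is mine):

```python
import math

def cross_entropy_bits(true_probs, model_probs) -> float:
    """H(p, q) = -sum(p * log2(q)): average surprisal when reality follows p
    but you pay for surprise under the model's predictions q."""
    return -sum(p * math.log2(q) for p, q in zip(true_probs, model_probs) if p > 0)

p = [0.5, 0.5]
matched    = cross_entropy_bits(p, [0.5, 0.5])  # equals the entropy of p: 1 bit
mismatched = cross_entropy_bits(p, [0.9, 0.1])  # higher: the model is surprised more often
```

Cross-entropy is minimized exactly when q matches p, which is why minimizing it during training pushes the model toward the data distribution.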
Decoder Application
Information theory provides a lens for the decoder method:
- Evidence evaluation: Information content measures how much a piece of evidence should update beliefs.
- Redundancy detection: The same insight expressed multiple ways adds less information than it seems.
- Channel capacity: Communication has limits. Expecting perfect transmission is unrealistic.
- Compression: Good understanding compresses. If you need as many words to explain as to describe, you haven't understood.
The decoder method aims to find signal (pattern, structure) in noise. Information theory formalizes what that means.
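One standard way to quantify "how much a piece of evidence should update beliefs" is KL divergence: the expected number of bits gained in moving from a prior to a posterior. A sketch (function name and the example distributions are mine):

```python
import math

def kl_divergence_bits(p, q) -> float:
    """D(p || q) = sum(p * log2(p / q)): bits of information gained
    when belief moves from distribution q to distribution p."""
    return sum(pi * math.log2(pi / qi) for pi, qi in zip(p, q) if pi > 0)

prior     = [0.5, 0.5]  # undecided between two hypotheses
posterior = [0.9, 0.1]  # after seeing the evidence
gain = kl_divergence_bits(posterior, prior)  # ~0.53 bits of belief update
```

Evidence that leaves beliefs unchanged has zero divergence, matching the intuition that a restatement of what you already believed carries no information.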
How I Decoded This
Direct engagement with: Shannon's 1948 paper, Cover & Thomas's textbook, applications across domains. Cross-verified: same mathematical framework applies in communication, physics, biology, computation. Deep math, broad application = high-confidence decode.
— Decoded by DECODER