THE MACHINERY OF CAUSALITY

A Complete Guide to What Makes What Happen

How the Structure Underneath Cause and Effect Actually Works

What follows is not philosophy.

It is not a debate about free will. Not an argument about determinism. Not another thought experiment about billiard balls and counterfactuals.

It is mechanism.

The actual structure that determines what can affect what. The geometry that separates cause from coincidence. The mathematics that makes “because” mean something precise.

Most people treat causality the way they treat gravity. They experience it every day. They assume they understand it. They never see the architecture operating underneath.

This document is that architecture.

Nothing more.

What you do with it is your business.

PART ONE: THE ARROW

Causality Is Not Sequence

The common understanding is simple. A happens. Then B happens. A caused B.

This is wrong.

Sequence is not causation. The rooster crows before sunrise. The rooster does not cause the sun to rise. Every statistics student learns the slogan “correlation is not causation.” Few understand the depth of what it means.

The real question is not “did A come before B.” The real question is “does intervening on A change B.” This distinction is the fault line between correlation and causation. Between observing and understanding. Between prediction and control.

David Hume saw this in 1739. He argued that causation is never directly observed. All we ever see is conjunction. One event followed by another. The causal connection itself is invisible. It lives in the structure of the world, not in the sequence of appearances.

It took 250 years for mathematics to catch up to his observation.

The Intervention Test

Judea Pearl formalized what Hume could only articulate. The difference between seeing and doing.

    THE FUNDAMENTAL DISTINCTION

    ┌──────────────────────────────────────────────────┐
    │                                                  │
    │              SEEING (OBSERVATION)                │
    │                                                  │
    │    P(Y | X)                                      │
    │                                                  │
    │    "When we observe X, what happens to Y?"       │
    │                                                  │
    │    This includes ALL pathways:                   │
    │    - X causing Y                                 │
    │    - Y causing X                                 │
    │    - Z causing both X and Y                      │
    │    - Pure coincidence                            │
    │                                                  │
    └──────────────────────────────────────────────────┘

    ┌──────────────────────────────────────────────────┐
    │                                                  │
    │              DOING (INTERVENTION)                │
    │                                                  │
    │    P(Y | do(X))                                  │
    │                                                  │
    │    "When we FORCE X, what happens to Y?"         │
    │                                                  │
    │    This isolates ONE pathway:                    │
    │    - X causing Y (and only this)                 │
    │                                                  │
    │    All other pathways into X are severed.        │
    │    X is no longer caused by anything.            │
    │    X simply IS.                                  │
    │                                                  │
    └──────────────────────────────────────────────────┘

The do-operator is not a statistical technique. It is a physical metaphor. When you do(X), you reach into the system and set X to a value by force. You sever every arrow pointing into X. You make X a root cause with no cause of its own.

Then you watch what happens downstream.

If the distribution of Y changes, X causes Y.

If it does not, the association between X and Y was flowing through some other path.

This is the mechanical test for causation. Not correlation. Not temporal precedence. Intervention.
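The gap between seeing and doing can be sketched in a few lines. This is a toy simulation, not Pearl's formalism: the model (a hidden Z driving both X and Y, with no arrow from X to Y), the probabilities, and the function names are all assumptions made for illustration.

```python
import random

def sample(n, do_x=None, seed=0):
    """Draw n samples from a toy model Z -> X, Z -> Y (X has NO effect on Y).

    If do_x is given, X is forced to that value, which cuts the Z -> X arrow.
    """
    rng = random.Random(seed)
    rows = []
    for _ in range(n):
        z = rng.random() < 0.5                      # hidden common cause
        if do_x is None:
            x = rng.random() < (0.9 if z else 0.1)  # X listens to Z
        else:
            x = do_x                                # intervention: X is set by force
        y = rng.random() < (0.8 if z else 0.2)      # Y listens to Z only
        rows.append((x, y))
    return rows

def p_y_given_x(rows, x_value):
    """Fraction of rows with Y true among rows where X == x_value."""
    ys = [y for x, y in rows if x == x_value]
    return sum(ys) / len(ys)

observed = sample(100_000)
seeing_gap = p_y_given_x(observed, True) - p_y_given_x(observed, False)

doing_gap = (p_y_given_x(sample(100_000, do_x=True, seed=1), True)
             - p_y_given_x(sample(100_000, do_x=False, seed=2), False))

print(f"P(Y|X=1) - P(Y|X=0)         = {seeing_gap:.3f}")
print(f"P(Y|do(X=1)) - P(Y|do(X=0)) = {doing_gap:.3f}")
```

Under these assumed numbers the observational gap should be large and the interventional gap close to zero. The association is real. Forcing X does nothing.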

PART TWO: THE LIGHT CONE

The Speed Limit

Physics imposes a hard constraint on causality.

Nothing travels faster than light. This is not a technological limitation. It is not that we have not built fast enough ships. The speed of light is the speed limit of causation itself.

When Einstein published special relativity in 1905, he did not just unify space and time. He drew the boundary of what can cause what.

The Geometry of Cause

In Minkowski spacetime, every event has a light cone. This cone divides the entire universe into three regions relative to that event.

    THE LIGHT CONE

                         FUTURE
                    (can be caused)
                    \              /
                     \            /
                      \          /
                       \        /
                        \      /
    SPACELIKE            \    /             SPACELIKE
    (no causal        HERE AND NOW          (no causal
     contact)            /    \              contact)
                        /      \
                       /        \
                      /          \
                     /            \
                    /              \
                          PAST
                   (could have caused)

Inside the future cone: events this moment can influence. Light or slower signals can reach them. Causation is possible.

Inside the past cone: events that could have influenced this moment. Their signals had time to arrive.

Outside both cones: the spacelike region. Events here are causally disconnected. No signal, no force, no information of any kind can travel between them. They cannot cause each other. Not because of distance. Because of geometry.

The Invariant Order

Here is the fact that makes causality physical rather than conventional.

In special relativity, different observers disagree about simultaneity. Two events that one observer calls simultaneous, another observer calls sequential. The order of spacelike-separated events is observer-dependent.

But the order of causally connected events is not.

If A is inside B’s past light cone, then A precedes B in every reference frame. Every observer in the universe agrees on this ordering. No Lorentz transformation can reverse it.

    CAUSAL ORDER IS ABSOLUTE

    ┌──────────────────────────────────────────────────┐
    │                                                  │
    │   TIMELIKE SEPARATED EVENTS                      │
    │   (inside the light cone)                        │
    │                                                  │
    │   Order is the SAME for all observers.           │
    │   If A caused B, everyone agrees A came first.   │
    │   This is physically invariant.                  │
    │                                                  │
    └──────────────────────────────────────────────────┘

    ┌──────────────────────────────────────────────────┐
    │                                                  │
    │   SPACELIKE SEPARATED EVENTS                     │
    │   (outside the light cone)                       │
    │                                                  │
    │   Order DEPENDS on the observer.                 │
    │   Some say A first. Some say B first.            │
    │   Some say simultaneous.                         │
    │   None of this matters. They cannot cause        │
    │   each other regardless.                         │
    │                                                  │
    └──────────────────────────────────────────────────┘

Causality is not a human convention imposed on physics.

It is the invariant structure of spacetime itself.
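The invariance can be checked numerically. The sketch below applies a one-dimensional Lorentz boost, in units where c = 1, to one timelike pair and one spacelike pair of events. The event coordinates and the list of velocities are arbitrary choices for illustration.

```python
import math

def boost(t, x, v):
    """Lorentz-boost event (t, x) into a frame moving at velocity v (units with c = 1)."""
    gamma = 1.0 / math.sqrt(1.0 - v * v)
    return gamma * (t - v * x), gamma * (x - v * t)

def time_order_signs(event_a, event_b, velocities):
    """Set of signs of t_B - t_A as seen from each boosted frame."""
    signs = set()
    for v in velocities:
        ta, _ = boost(*event_a, v)
        tb, _ = boost(*event_b, v)
        dt = tb - ta
        signs.add((dt > 0) - (dt < 0))   # +1, 0, or -1
    return signs

velocities = [i / 10 for i in range(-9, 10)]   # -0.9c ... +0.9c

origin = (0.0, 0.0)
timelike_event = (2.0, 1.0)    # |dx| < |dt|: inside the future light cone
spacelike_event = (1.0, 2.0)   # |dx| > |dt|: outside the light cone

timelike_signs = time_order_signs(origin, timelike_event, velocities)
spacelike_signs = time_order_signs(origin, spacelike_event, velocities)

print("timelike pair, sign of dt across frames: ", sorted(timelike_signs))
print("spacelike pair, sign of dt across frames:", sorted(spacelike_signs))
```

For the timelike pair every frame should report the same sign. For the spacelike pair the sign depends on who is asking.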

PART THREE: THE ARROW OF TIME

The Paradox of Reversibility

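The fundamental laws of physics are, to an excellent approximation, time-symmetric.

Newton’s equations run forward and backward identically. Maxwell’s equations have both retarded and advanced solutions. Quantum mechanics is unitary, meaning every process has a time-reversed counterpart. The one known exception, a tiny asymmetry in the weak interaction, is far too small to account for what follows.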

And yet the world has a direction.

Eggs break. They do not unbreak. Coffee cools. It does not uncool. Memory records the past. Not the future.

Where does the arrow come from?

Entropy as the Arrow

The second law of thermodynamics provides the asymmetry.

In an isolated system, entropy never decreases. It increases or stays the same. This is not a fundamental law in the way Maxwell’s equations are fundamental. It is statistical. It describes what is overwhelmingly probable, not what is absolutely required.

But the numbers are so extreme that “overwhelmingly probable” becomes “effectively mandatory.”

    THE THERMODYNAMIC ARROW

    Entropy
         │
         │                              ████████████
    HIGH │                         █████
         │                    █████
         │               █████
         │          █████
    MED  │     █████
         │  ███
         │██
    LOW  │█
         │
         └────────────────────────────────────────────►
                                                  Time

    The microscopic laws do not enforce this direction.
    The statistics do.

    Number of high-entropy states >> low-entropy states.
    By factors of 10^(10^23).
    The system is not drawn forward.
    It simply has nowhere else to go.

The connection between entropy and causality runs deep. Recent work by Lucia and Grisolia (2021) defines time itself as the metric of causality, with entropy production providing the operational measure of temporal direction.

Causes precede effects because entropy increases. Entropy increases because the universe started in an extraordinarily low-entropy state and the number of disordered states dwarfs the number of ordered states. Given that starting point, the asymmetry is combinatorial. It is counting.
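The counting can be made concrete with a toy system. The sketch below assumes n coins as a stand-in for n particles, with "all tails" as the ordered macrostate and "half heads" as the disordered one. The coin model and the function name are illustrative assumptions, not a model of any real gas.

```python
import math

def log10_multiplicity(n, k):
    """log10 of the number of ways to have k heads among n coins (a toy macrostate)."""
    log_ways = math.lgamma(n + 1) - math.lgamma(k + 1) - math.lgamma(n - k + 1)
    return log_ways / math.log(10)

for n in (100, 10_000, 1_000_000):
    ordered = log10_multiplicity(n, 0)          # all tails: exactly one arrangement
    disordered = log10_multiplicity(n, n // 2)  # half heads: the most numerous macrostate
    print(f"n = {n:>9}: disordered outnumbers ordered by ~10^{disordered - ordered:.0f}")
```

The exponent grows roughly as 0.3 times n. Push n toward the number of particles in a macroscopic object and the exponent itself becomes astronomically large.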

Retarded Potentials

In electrodynamics, this asymmetry appears as a choice.

Maxwell’s equations admit two types of solutions. Retarded solutions describe waves that propagate outward from a source, reaching distant points later. Advanced solutions describe waves that converge inward toward a source, arriving from the future.

Both are mathematically valid.

Physics uses the retarded solution. Not because the advanced solution is wrong. Because the retarded solution respects the thermodynamic arrow. Causes produce effects that radiate outward. Effects do not converge from the boundary to create causes.

    RETARDED VS ADVANCED SOLUTIONS

    RETARDED (physical):

         Source
           │
           ▼
    ───────●───────
          /│\
         / │ \
        /  │  \
       /   │   \
      ▼    ▼    ▼
    Waves propagate outward into the future


    ADVANCED (mathematical but unphysical):

      ▲    ▲    ▲
       \   │   /
        \  │  /
         \ │ /
          \│/
    ───────●───────
           │
           ▼
         Source
    Waves converge inward from the future

The Green’s function formalism makes this precise. The retarded Green’s function has support only in the future light cone of the source. It propagates influence forward. The advanced Green’s function has support in the past light cone. It propagates influence backward.

Choosing the retarded solution is choosing causality.

PART FOUR: THE GRAPH

Cause Has Structure

Causes do not exist in isolation. They form networks. Webs of influence where each effect becomes a cause for something downstream.

Pearl and others formalized this insight using directed acyclic graphs. A DAG.

    A CAUSAL GRAPH

        ┌───┐         ┌───┐
        │ X │         │ Z │
        └─┬─┘         └─┬─┘
          │              │
          ▼              ▼
        ┌───┐         ┌───┐
        │ W │────────►│ Y │
        └───┘         └───┘

    Arrows represent direct causal influence.
    X causes W.
    Z causes Y.
    W causes Y.
    X causes Y indirectly through W.

    The graph is DIRECTED: arrows have direction.
    The graph is ACYCLIC: no loops.
    Acyclicity enforces the arrow of time.
    Causes come before effects. Always.

The graph encodes more than sequence. It encodes conditional independence. If you know W, then X and Y become conditionally independent along the path through W. The information X provides about Y is fully captured by W. This is called d-separation, and it is the mathematical engine that lets causal structure be read from data.
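The example graph is small enough to write down directly. The sketch below stores it as a map from each node to its direct causes and uses Python's standard graphlib module to recover a causal order. The helper names causal_order and ancestors are made up for this illustration.

```python
from graphlib import TopologicalSorter, CycleError

# Each key lists its direct causes (parents), matching the diagram: X -> W -> Y <- Z.
parents = {"X": [], "Z": [], "W": ["X"], "Y": ["W", "Z"]}

def causal_order(parent_map):
    """Return one ordering in which every cause precedes its effects, or None if cyclic."""
    try:
        return list(TopologicalSorter(parent_map).static_order())
    except CycleError:
        return None

def ancestors(parent_map, node):
    """All direct and indirect causes of node."""
    found = set()
    stack = list(parent_map[node])
    while stack:
        current = stack.pop()
        if current not in found:
            found.add(current)
            stack.extend(parent_map[current])
    return found

order = causal_order(parents)
print("one valid causal order:", order)
print("causes of Y:", sorted(ancestors(parents, "Y")))

# Adding a feedback arrow Y -> X creates a loop, so no causal order exists.
cyclic = dict(parents, X=["Y"])
print("with Y -> X added:", causal_order(cyclic))
```

Acyclicity is what guarantees that such an ordering exists at all. Add one loop and the ordering disappears.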

The Three Elemental Structures

Every causal graph, no matter how complex, is built from three atomic patterns.

    THE THREE BUILDING BLOCKS

    1. CHAIN (mediation)

       ┌───┐     ┌───┐     ┌───┐
       │ A │────►│ B │────►│ C │
       └───┘     └───┘     └───┘

       A causes B causes C.
       Conditioning on B blocks
       the flow from A to C.


    2. FORK (common cause)

                 ┌───┐
            ┌────│ B │────┐
            │    └───┘    │
            ▼             ▼
          ┌───┐         ┌───┐
          │ A │         │ C │
          └───┘         └───┘

       B causes both A and C.
       A and C are correlated.
       But A does not cause C.
       Conditioning on B removes
       the correlation.


    3. COLLIDER (common effect)

          ┌───┐         ┌───┐
          │ A │         │ C │
          └─┬─┘         └─┬─┘
            │             │
            ▼             ▼
             ┌───┐
             │ B │
             └───┘

       A and C both cause B.
       A and C are NOT correlated.
       But conditioning on B
       CREATES a correlation.
       This is called collider bias.

The collider is the structure that creates the most confusion. It is the reason that controlling for a variable can introduce bias rather than remove it. The fork tells you when to control. The chain tells you what controlling will block. The collider tells you when controlling makes things worse.

Understanding these three structures is understanding the grammar of cause and effect.
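All three patterns can be simulated. The sketch below assumes linear relationships with unit Gaussian noise and uses partial correlation as a stand-in for "conditioning on B". That shortcut is only valid for linear Gaussian models like this one. The coefficients and names are illustrative.

```python
import random

def corr(a, b):
    """Pearson correlation of two equal-length lists."""
    n = len(a)
    ma, mb = sum(a) / n, sum(b) / n
    cov = sum((x - ma) * (y - mb) for x, y in zip(a, b))
    va = sum((x - ma) ** 2 for x in a)
    vb = sum((y - mb) ** 2 for y in b)
    return cov / (va * vb) ** 0.5

def partial_corr(a, c, b):
    """Correlation of a and c after linearly adjusting both for b."""
    r_ac, r_ab, r_cb = corr(a, c), corr(a, b), corr(c, b)
    return (r_ac - r_ab * r_cb) / ((1 - r_ab ** 2) * (1 - r_cb ** 2)) ** 0.5

rng = random.Random(0)
n = 20_000
noise = lambda: rng.gauss(0, 1)

# Chain: A -> B -> C
a = [noise() for _ in range(n)]
b = [x + noise() for x in a]
c = [x + noise() for x in b]
chain = (corr(a, c), partial_corr(a, c, b))

# Fork: A <- B -> C
b = [noise() for _ in range(n)]
a = [x + noise() for x in b]
c = [x + noise() for x in b]
fork = (corr(a, c), partial_corr(a, c, b))

# Collider: A -> B <- C
a = [noise() for _ in range(n)]
c = [noise() for _ in range(n)]
b = [x + y + noise() for x, y in zip(a, c)]
collider = (corr(a, c), partial_corr(a, c, b))

for name, (raw, adjusted) in [("chain", chain), ("fork", fork), ("collider", collider)]:
    print(f"{name:<9} corr(A, C) = {raw:+.2f}   given B = {adjusted:+.2f}")
```

In the chain and the fork, adjusting for B should drive the association toward zero. In the collider it should do the opposite: an association appears where none existed.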

PART FIVE: THE LADDER

Three Levels of Causal Reasoning

Pearl identified three levels of cognitive capability for reasoning about causes. He called it the Ladder of Causation.

Each level enables something the level below cannot do.

    THE LADDER OF CAUSATION

    ┌──────────────────────────────────────────────────┐
    │                                                  │
    │  RUNG 3: COUNTERFACTUALS (IMAGINING)             │
    │                                                  │
    │  "What if things had been different?"            │
    │  "Was it X that caused Y?"                       │
    │                                                  │
    │  Requires: Structural Causal Model               │
    │  Enables:  Attribution, blame, explanation,      │
    │            reasoning about alternative worlds    │
    │                                                  │
    └──────────────────────────────────────────────────┘
                          ▲
                          │
    ┌──────────────────────────────────────────────────┐
    │                                                  │
    │  RUNG 2: INTERVENTION (DOING)                    │
    │                                                  │
    │  "What happens if I do X?"                       │
    │  "What is P(Y | do(X))?"                         │
    │                                                  │
    │  Requires: Causal graph + do-calculus            │
    │  Enables:  Prediction of effects of actions,     │
    │            policy decisions, experimental design │
    │                                                  │
    └──────────────────────────────────────────────────┘
                          ▲
                          │
    ┌──────────────────────────────────────────────────┐
    │                                                  │
    │  RUNG 1: ASSOCIATION (SEEING)                    │
    │                                                  │
    │  "What is the probability of Y given X?"         │
    │  "What is P(Y | X)?"                             │
    │                                                  │
    │  Requires: Data + statistics                     │
    │  Enables:  Correlation, prediction from          │
    │            patterns, curve fitting               │
    │                                                  │
    └──────────────────────────────────────────────────┘

Most of machine learning lives on Rung 1. It sees patterns. It cannot reason about what would happen if you changed something. It cannot answer “why.”

Randomized controlled trials operate on Rung 2. By intervening randomly, they sever confounding paths and isolate causal effects.

Only Rung 3 can answer questions about worlds that did not happen. “Would the patient have survived if we had given the drug?” This requires a complete structural model. A model of the mechanisms, not just the associations.

The ladder is not just a classification scheme. It is a hierarchy of computational capability. No amount of Rung 1 data can answer a Rung 2 question without additional assumptions. No amount of Rung 2 experiments can answer a Rung 3 question without a structural model. The levels are separated by walls that data alone cannot cross.
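A Rung 3 query can be sketched with the standard three-step recipe: abduction, action, prediction. The structural equation below (recovery = 2 * dose + u) is a hypothetical mechanism invented for this example. The point is the procedure, not the numbers.

```python
# Hypothetical structural equation: recovery = 2 * dose + u,
# where u is everything about this patient the model does not observe.
def recovery(dose, u):
    return 2.0 * dose + u

def counterfactual_recovery(observed_dose, observed_recovery, alternative_dose):
    # Step 1, abduction: infer this individual's hidden background term from what happened.
    u = observed_recovery - 2.0 * observed_dose
    # Step 2, action: replace the dose with the alternative (a do-intervention).
    # Step 3, prediction: rerun the mechanism with the same background term.
    return recovery(alternative_dose, u)

# A patient received dose 1.0 and scored 3.0. What if the dose had been 0.0?
print(counterfactual_recovery(1.0, 3.0, 0.0))
```

Step 1 is the step no experiment can supply. It needs the equation itself, which is why Rung 3 requires a structural model.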

PART SIX: THE INFORMATION FLOW

Measuring Causal Influence

Information theory provides a different lens on causation. Not structural. Quantitative.

If X causes Y, then knowing the past of X should help predict the future of Y, above and beyond knowing the past of Y alone.

Clive Granger formalized this in 1969. Granger causality asks: does including the history of X improve the prediction of Y? If yes, X “Granger-causes” Y.

    GRANGER CAUSALITY

    ┌──────────────────────────────────────────────────┐
    │                                                  │
    │  PREDICTION OF Y WITHOUT X:                      │
    │                                                  │
    │  Y(t) = f(Y(t-1), Y(t-2), ...) + error₁         │
    │                                                  │
    │  Error₁ = ████████████████████                   │
    │                                                  │
    └──────────────────────────────────────────────────┘

    ┌──────────────────────────────────────────────────┐
    │                                                  │
    │  PREDICTION OF Y WITH X:                         │
    │                                                  │
    │  Y(t) = f(Y(t-1), ..., X(t-1), ...) + error₂    │
    │                                                  │
    │  Error₂ = ████████████                           │
    │                                                  │
    └──────────────────────────────────────────────────┘

    If error₂ is reliably smaller than error₁
    (by a statistical test, not just by chance),
    X Granger-causes Y. The reduction measures
    the strength of predictive information flow.

Granger causality is not true causality. It can be fooled by confounders. Two variables driven by a common hidden cause can appear to Granger-cause each other, or one the other if the hidden cause reaches them at different lags. But it provides a quantitative measure. A number attached to the causal arrow.
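The two-model comparison can be sketched with ordinary least squares on simulated data. Everything here is an assumption made for illustration: the one-lag models, the coefficients, and the use of a raw error ratio in place of the formal F-test that a real Granger analysis would apply.

```python
import random

def simulate(n, coupling, seed=0):
    """X is its own AR(1) process; Y depends on its own past and on X's past."""
    rng = random.Random(seed)
    x, y = [0.0], [0.0]
    for _ in range(n - 1):
        x_next = 0.5 * x[-1] + rng.gauss(0, 1)
        y_next = 0.5 * y[-1] + coupling * x[-1] + rng.gauss(0, 1)
        x.append(x_next)
        y.append(y_next)
    return x, y

def mse_own_past(target):
    """Mean squared error predicting target[t] from target[t-1] alone."""
    prev, now = target[:-1], target[1:]
    a = sum(p * q for p, q in zip(prev, now)) / sum(p * p for p in prev)
    return sum((q - a * p) ** 2 for p, q in zip(prev, now)) / len(now)

def mse_with_other(target, other):
    """Mean squared error predicting target[t] from target[t-1] and other[t-1]."""
    p, o, q = target[:-1], other[:-1], target[1:]
    spp = sum(v * v for v in p)
    soo = sum(v * v for v in o)
    spo = sum(u * v for u, v in zip(p, o))
    spq = sum(u * v for u, v in zip(p, q))
    soq = sum(u * v for u, v in zip(o, q))
    det = spp * soo - spo * spo           # solve the 2x2 normal equations
    a = (spq * soo - soq * spo) / det     # coefficient on the target's own past
    b = (soq * spp - spq * spo) / det     # coefficient on the other series' past
    return sum((t - a * u - b * v) ** 2 for u, v, t in zip(p, o, q)) / len(q)

def granger_gain(target, other):
    """Fractional drop in prediction error when the other series' past is added."""
    return 1.0 - mse_with_other(target, other) / mse_own_past(target)

x, y = simulate(5000, coupling=0.8)
print(f"X -> Y error reduction: {granger_gain(y, x):.3f}")
print(f"Y -> X error reduction: {granger_gain(x, y):.3f}")
```

X's past should cut the error in predicting Y substantially. Y's past should do almost nothing for X. The asymmetry is the arrow.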

Transfer Entropy

Thomas Schreiber reformulated this in information-theoretic terms in 2000. Transfer entropy measures the reduction in uncertainty about Y’s future when X’s past is known, beyond what Y’s own past already provides.

For Gaussian variables, Granger causality and transfer entropy are mathematically equivalent. They are the same measure expressed in different languages. Autoregressive prediction in one. Shannon information in the other.

But transfer entropy generalizes further. It captures nonlinear dependencies that Granger’s linear framework misses. It quantifies the directed information flow from X to Y without assuming a specific functional form.

    TRANSFER ENTROPY

    T(X → Y) = H(Y_future | Y_past) - H(Y_future | Y_past, X_past)

    ┌────────────────────────┐     ┌────────────────────────┐
    │                        │     │                        │
    │  UNCERTAINTY ABOUT Y   │     │  UNCERTAINTY ABOUT Y   │
    │  KNOWING ONLY Y'S      │     │  KNOWING Y'S PAST      │
    │  OWN PAST              │     │  AND X'S PAST          │
    │                        │     │                        │
    │  ████████████████████  │     │  ████████████          │
    │                        │     │                        │
    └────────────────────────┘     └────────────────────────┘

                        │                    │
                        └────────┬───────────┘
                                 │
                                 ▼

                    ┌────────────────────────┐
                    │                        │
                    │  TRANSFER ENTROPY      │
                    │                        │
                    │  ████████              │
                    │                        │
                    │  The information that  │
                    │  flowed from X to Y.   │
                    │                        │
                    └────────────────────────┘

This is causality measured in bits. The currency of information theory applied to the currency of cause and effect.
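For discrete data the formula can be estimated by counting. The sketch below is a plug-in estimator with a single step of history, applied to an assumed toy process in which Y copies X's previous value 90% of the time. Plug-in estimates carry a small upward bias, so the reverse direction will not come out exactly zero.

```python
import math
import random
from collections import Counter

def transfer_entropy(source, target):
    """Plug-in estimate of T(source -> target) in bits, using one step of history."""
    triples = Counter(zip(target[1:], target[:-1], source[:-1]))  # (y_next, y_prev, x_prev)
    pairs_yy = Counter(zip(target[1:], target[:-1]))              # (y_next, y_prev)
    pairs_yx = Counter(zip(target[:-1], source[:-1]))             # (y_prev, x_prev)
    singles = Counter(target[:-1])                                # y_prev
    total = len(target) - 1
    te = 0.0
    for (y_next, y_prev, x_prev), count in triples.items():
        p_joint = count / total
        p_next_given_both = count / pairs_yx[(y_prev, x_prev)]
        p_next_given_own = pairs_yy[(y_next, y_prev)] / singles[y_prev]
        te += p_joint * math.log2(p_next_given_both / p_next_given_own)
    return te

rng = random.Random(0)
x = [rng.randint(0, 1) for _ in range(20_000)]
# Y copies X's previous value 90% of the time, otherwise flips it.
y = [0] + [xi if rng.random() < 0.9 else 1 - xi for xi in x[:-1]]

te_xy = transfer_entropy(x, y)
te_yx = transfer_entropy(y, x)
print(f"T(X -> Y) = {te_xy:.3f} bits")
print(f"T(Y -> X) = {te_yx:.3f} bits")
```

Roughly half a bit should flow from X to Y per step. Almost nothing should flow back.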

PART SEVEN: THE CONFOUND

The Central Problem

The deepest problem in causal inference is the confound.

A confounding variable is a common cause of both the treatment and the outcome. It creates a correlation that looks causal but is not. The association is real. The causal interpretation is wrong.

    THE CONFOUNDING STRUCTURE

              ┌───────────────────┐
              │                   │
              │    CONFOUNDER     │
              │        (Z)        │
              │                   │
              └─────┬───────┬─────┘
                    │       │
                    ▼       ▼
              ┌─────────┐  ┌─────────┐
              │         │  │         │
              │    X    │  │    Y    │
              │         │  │         │
              └─────────┘  └─────────┘

    X and Y are correlated.
    But X does not cause Y.
    Z causes both.

    Observing P(Y | X) ≠ P(Y | do(X))

    The association is real.
    The causal arrow is not.

Ice cream sales and drowning deaths are correlated. Ice cream does not cause drowning. Summer causes both. The temperature is the confounder.

This is obvious when the confounder is known. It is catastrophic when it is not.

Simpson’s Paradox

The confound reaches its most extreme form in Simpson’s paradox.

A trend that appears in every subgroup of data reverses when the subgroups are combined. Or a trend that appears in the aggregate reverses in every subgroup.

This is not a statistical curiosity. It is a demonstration that data without causal structure is meaningless. The numbers alone cannot tell you which way the effect goes. Only the causal graph can resolve the paradox.

    SIMPSON'S PARADOX

    ┌──────────────────────────────────────────────────┐
    │                                                  │
    │  AGGREGATED DATA:                                │
    │                                                  │
    │  Treatment A appears BETTER than Treatment B     │
    │                                                  │
    │  A success rate: 83%  ████████████████████████   │
    │  B success rate: 78%  ████████████████████       │
    │                                                  │
    └──────────────────────────────────────────────────┘

    ┌──────────────────────────────────────────────────┐
    │                                                  │
    │  DISAGGREGATED DATA:                             │
    │                                                  │
    │  Group 1 (mild cases):                           │
    │  A success rate: 87%  ████████████████████████   │
    │  B success rate: 93%  ████████████████████████   │
    │                                                  │
    │  Group 2 (severe cases):                         │
    │  A success rate: 69%  █████████████████          │
    │  B success rate: 73%  ██████████████████         │
    │                                                  │
    │  B is BETTER in EVERY subgroup.                  │
    │  But A looks better overall.                     │
    │                                                  │
    └──────────────────────────────────────────────────┘

    The reversal occurs because severity is a
    confound. More severe patients got Treatment B.
    The aggregate mixes severity with treatment effect.

    Which answer is correct?
    The data cannot tell you.
    Only the causal graph can.

Simpson’s paradox proves that “let the data speak” is not a valid methodology. Data speaks in the language of association. The question is asked in the language of causation. Translation requires structure.
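The reversal, and its resolution, can be reproduced from raw counts. The counts below are illustrative assumptions (the text gives only percentages), chosen so that treatment A looks better when pooled while B is better within each severity group. The adjustment shown is the back-door formula, which is appropriate only if severity really is a common cause of treatment choice and outcome.

```python
# Assumed (successes, patients) counts per severity group and treatment.
# They are chosen for illustration and are not taken from the text above.
counts = {
    "mild":   {"A": (234, 270), "B": (81, 87)},
    "severe": {"A": (55, 80),   "B": (192, 263)},
}

def aggregate_rate(treatment):
    """Naive success rate: pool all patients who received the treatment."""
    successes = sum(group[treatment][0] for group in counts.values())
    patients = sum(group[treatment][1] for group in counts.values())
    return successes / patients

def adjusted_rate(treatment):
    """Back-door adjustment: average the per-group rates, weighted by how
    common each severity level is in the whole population."""
    total = sum(n for group in counts.values() for _, n in group.values())
    rate = 0.0
    for group in counts.values():
        group_size = sum(n for _, n in group.values())
        successes, patients = group[treatment]
        rate += (successes / patients) * (group_size / total)
    return rate

for t in ("A", "B"):
    print(f"{t}: pooled = {aggregate_rate(t):.3f}   severity-adjusted = {adjusted_rate(t):.3f}")
```

The pooled rates favor A. The adjusted rates favor B. The arithmetic is the same in both cases. Only the assumed graph decides which one answers the causal question.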

PART EIGHT: THE EMERGENCE

Causation Across Scales

Erik Hoel posed a question that disturbs the reductionist assumption. Can macro-level descriptions be more causal than micro-level descriptions?

The standard view says no. Micro causes everything. Macro is just a summary. Go small enough and you have the full causal story.

Hoel measured this using effective information. A quantity that captures how deterministically a system’s current state predicts its next state. High effective information means precise, reliable causation. Low effective information means noisy, uncertain causation.

The result: some systems have higher effective information at the macro scale than the micro scale.

    CAUSAL EMERGENCE

    MICRO LEVEL:
    ┌──────────────────────────────────────────────────┐
    │                                                  │
    │  256 micro-states                                │
    │  Each state has 256 possible transitions         │
    │  Transition probabilities are nearly uniform     │
    │                                                  │
    │  Effective Information: LOW                      │
    │  ████                                            │
    │                                                  │
    │  Each micro-state weakly determines the next.    │
    │  Causation is diffuse. Noisy. Unreliable.        │
    │                                                  │
    └──────────────────────────────────────────────────┘

                          │
                          │  Coarse-grain
                          ▼

    MACRO LEVEL:
    ┌──────────────────────────────────────────────────┐
    │                                                  │
    │  4 macro-states                                  │
    │  Each state has 1 deterministic transition       │
    │  Transition probabilities are 1.0                │
    │                                                  │
    │  Effective Information: HIGH                     │
    │  ████████████████████████                        │
    │                                                  │
    │  Each macro-state perfectly determines the next. │
    │  Causation is sharp. Clean. Reliable.            │
    │                                                  │
    └──────────────────────────────────────────────────┘

This is not a philosophical argument about levels of description. It is a measurement. The macro scale literally carries more causal information per state transition than the micro scale.

The noise at the micro level averages out. The structure that remains at the macro level is the causal structure. The noise was never part of the causal story. It was hiding the causal story.

This has implications for every system where macroscopic descriptions seem more explanatory than microscopic ones. On this account, cells can be more causal than molecules in some contexts. Markets can be more causal than individual traders. Weather patterns can be more causal than individual air molecules.

Not approximately. Measurably.
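The measurement can be sketched on a system far smaller than the one in the diagram. The code below computes effective information as the average divergence between each state's transition distribution and the average transition distribution, with every state given equal weight. The four-state micro system and its two-state coarse-graining are a toy example assumed for illustration.

```python
import math

def effective_information(tpm):
    """Effective information in bits: the average KL divergence between each
    state's transition distribution and the average transition distribution,
    with every state weighted equally (a uniform intervention over states)."""
    n = len(tpm)
    average = [sum(row[j] for row in tpm) / n for j in range(len(tpm[0]))]
    total = 0.0
    for row in tpm:
        total += sum(p * math.log2(p / q) for p, q in zip(row, average) if p > 0)
    return total / n

third = 1 / 3
# Toy micro system: states 0-2 hop among themselves at random; state 3 stays put.
micro = [
    [third, third, third, 0.0],
    [third, third, third, 0.0],
    [third, third, third, 0.0],
    [0.0,   0.0,   0.0,   1.0],
]
# Coarse-graining {0, 1, 2} -> A and {3} -> B gives a deterministic macro system.
macro = [
    [1.0, 0.0],
    [0.0, 1.0],
]

ei_micro = effective_information(micro)
ei_macro = effective_information(macro)
print(f"micro EI = {ei_micro:.3f} bits, macro EI = {ei_macro:.3f} bits")
```

In this toy case the macro description has fewer states and still scores higher. Grouping the three noisy states discards nothing except the noise.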

PART NINE: THE QUANTUM BOUNDARY

Where Causality Strains

Quantum mechanics does not violate causality. But it pushes against its edges in ways that reveal the concept’s limits.

Two entangled particles are measured at spacelike separation. The outcomes are correlated in ways that no local hidden variable theory can reproduce. Bell’s theorem proves this. Experiments confirm it.

And yet no signal passes between them.

    ENTANGLEMENT AND CAUSALITY

    ┌──────────────────────────────────────────────────┐
    │                                                  │
    │              THE FACTS:                          │
    │                                                  │
    │  1. Alice and Bob share entangled particles      │
    │  2. They are spacelike separated                 │
    │  3. Their measurement outcomes are correlated    │
    │  4. The correlations violate Bell inequalities   │
    │  5. No local hidden variable model can produce   │
    │     these correlations                           │
    │                                                  │
    └──────────────────────────────────────────────────┘
                          │
            ┌─────────────┴─────────────┐
            │                           │
            ▼                           ▼
    ┌───────────────┐          ┌───────────────┐
    │               │          │               │
    │   WHAT IT IS  │          │ WHAT IT ISN'T │
    │               │          │               │
    │ Correlations  │          │ Causation     │
    │ that exceed   │          │               │
    │ any classical │          │ Alice's       │
    │ explanation   │          │ measurement   │
    │               │          │ does not      │
    │ The quantum   │          │ CAUSE Bob's   │
    │ state is      │          │ outcome       │
    │ non-separable │          │               │
    │               │          │ No signal     │
    │               │          │ is sent       │
    │               │          │               │
    │               │          │ No usable     │
    │               │          │ information   │
    │               │          │ is            │
    │               │          │ transmitted   │
    │               │          │               │
    └───────────────┘          └───────────────┘

The no-signaling theorem is the guardrail. While the correlations are nonlocal, they cannot be used to transmit information faster than light. Alice cannot encode a message that Bob can decode by measuring his particle. The correlations are real. The causation is not.

This forces a distinction. Between correlation and causation. Between influence and signaling. Between the structure of the quantum state and the structure of causal influence.

Quantum mechanics preserves the causal structure of spacetime. The light cone is still the boundary. No effect can precede its cause in any reference frame. But the correlations that exist within that boundary are stranger than any classical causal model can accommodate.
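Both halves of this claim can be checked against the textbook quantum prediction for a spin singlet. The sketch below computes that prediction. It is not a simulation of an experiment. The measurement angles are one standard choice that maximizes the CHSH combination, and the function names are illustrative.

```python
import math

def correlation(a, b):
    """Quantum prediction for a spin singlet measured along angles a and b."""
    return -math.cos(a - b)

def joint_probability(x, y, a, b):
    """P(Alice sees x, Bob sees y) for outcomes x, y in {+1, -1}."""
    return (1 + x * y * correlation(a, b)) / 4

def alice_marginal(x, a, b):
    """P(Alice sees x), summed over Bob's outcomes, for Bob's setting b."""
    return sum(joint_probability(x, y, a, b) for y in (+1, -1))

a1, a2 = 0.0, math.pi / 2               # Alice's two settings
b1, b2 = math.pi / 4, 3 * math.pi / 4   # Bob's two settings

chsh = (correlation(a1, b1) - correlation(a1, b2)
        + correlation(a2, b1) + correlation(a2, b2))
print(f"|S| = {abs(chsh):.3f}  (local hidden variables require |S| <= 2)")

# Alice's statistics do not move when Bob changes his setting.
print(alice_marginal(+1, a1, b1), alice_marginal(+1, a1, b2))
```

The CHSH value exceeds the classical bound of 2. Alice's own outcome probabilities stay the same whatever Bob chooses. Stronger than classical, yet useless for sending a message.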

Indefinite Causal Order

Recent theoretical work pushes further. In certain quantum gravity scenarios, the causal order itself may become indefinite. Not A causes B. Not B causes A. A superposition of both orderings simultaneously.

The quantum switch is the paradigmatic example. A process where whether A happens before B or B happens before A is controlled by a quantum bit in superposition. The causal order is not fixed. It is a quantum variable.

This does not destroy causality. It generalizes it. Classical causality assumes a fixed background causal order. Quantum causal structures may relax this assumption while preserving the prohibition on signaling.

The light cone is still the law. But within that law, the ordering of events may be less rigid than classical physics assumed.

PART TEN: THE CONSTRAINTS

What Causality Cannot Do

Causal reasoning has hard limits. Not practical limits. Structural ones.

The Identification Problem

Not every causal question can be answered from data, no matter how much data you have.

    THE IDENTIFICATION WALL

    ┌──────────────────────────────────────────────────┐
    │                                                  │
    │  IDENTIFIABLE                                    │
    │                                                  │
    │  Causal effect can be computed from              │
    │  observational data + causal graph               │
    │                                                  │
    │  Example: front-door criterion,                  │
    │           back-door criterion                    │
    │                                                  │
    └──────────────────────────────────────────────────┘

    ┌──────────────────────────────────────────────────┐
    │                                                  │
    │  NON-IDENTIFIABLE                                │
    │                                                  │
    │  Causal effect CANNOT be computed from           │
    │  observational data regardless of sample size    │
    │                                                  │
    │  Example: unmeasured confounder with no          │
    │           alternative adjustment path            │
    │                                                  │
    │  More data does not help.                        │
    │  Better statistics do not help.                  │
    │  The information is not in the data.             │
    │  It is in the structure.                         │
    │  And the structure is missing.                   │
    │                                                  │
    └──────────────────────────────────────────────────┘

This is not a failure of method. It is a theorem. Pearl’s do-calculus is complete. If a causal effect is identifiable from the graph and observational data, the do-calculus will find the identifying formula. If the do-calculus cannot find it, no other method can either without adding assumptions beyond the graph. The information is structurally absent.
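
The wall can be made concrete with two toy models. This is a minimal sketch with invented structure: in one model X drives Y directly, in the other a hidden coin U drives both and X does nothing. The function names and the fair-coin probabilities are assumptions chosen for illustration. The two models produce identical observational data and different answers under intervention.

```python
def direct_model(do_x=None):
    """X -> Y. X is a fair coin and Y copies X. Returns P(x, y)."""
    dist = {}
    for noise in (0, 1):                  # exogenous coin, P = 0.5 each
        x = noise if do_x is None else do_x
        y = x                             # Y listens to X
        dist[(x, y)] = dist.get((x, y), 0.0) + 0.5
    return dist

def confounded_model(do_x=None):
    """X <- U -> Y with U unmeasured. X has no effect on Y. Returns P(x, y)."""
    dist = {}
    for u in (0, 1):                      # hidden common cause, P = 0.5 each
        x = u if do_x is None else do_x   # do(X) cuts the arrow from U into X
        y = u                             # Y listens only to U
        dist[(x, y)] = dist.get((x, y), 0.0) + 0.5
    return dist

def p_y1(dist):
    """Marginal probability that Y = 1."""
    return sum(p for (x, y), p in dist.items() if y == 1)

print("observational, direct:    ", direct_model())
print("observational, confounded:", confounded_model())
print("P(Y=1 | do(X=1)), direct:    ", p_y1(direct_model(do_x=1)))
print("P(Y=1 | do(X=1)), confounded:", p_y1(confounded_model(do_x=1)))
```

The observational tables match entry for entry. No sample size separates them. Only the graph, or an experiment, can say which world you are in.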

The Problem of Causal Sufficiency

Every causal graph drawn without latent variables assumes you have included all common causes of the variables in it. This assumption is called causal sufficiency. It is almost never true.

There is always something you have not measured. Something you have not thought of. Something operating outside the frame of your model.

This is not a reason to abandon causal reasoning. It is a reason to hold causal conclusions at the epistemic weight they deserve. Strong when the graph is well-supported. Tentative when the graph is assumed.
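
What a missing common cause costs can be computed exactly in a toy case. This is a sketch with invented numbers: a binary confounder Z raises both the chance of treatment X and the chance of outcome Y, and the true effect of X on Y is fixed at 0.3. All probabilities and function names are hypothetical. With Z in the graph, back-door adjustment recovers the effect. With Z left out, the same data give a different answer.

```python
from itertools import product

P_Z1 = 0.5                          # P(Z = 1), assumed
P_X1_GIVEN_Z = {0: 0.2, 1: 0.8}     # P(X = 1 | Z), assumed

def p_y1_given(x, z):
    """P(Y = 1 | X = x, Z = z). The coefficient on x is the true effect."""
    return 0.1 + 0.3 * x + 0.5 * z

def joint():
    """Exact observational distribution over (z, x, y)."""
    dist = {}
    for z, x, y in product((0, 1), repeat=3):
        pz = P_Z1 if z else 1 - P_Z1
        px = P_X1_GIVEN_Z[z] if x else 1 - P_X1_GIVEN_Z[z]
        py = p_y1_given(x, z) if y else 1 - p_y1_given(x, z)
        dist[(z, x, y)] = pz * px * py
    return dist

def prob(dist, **fixed):
    """Total probability of outcomes matching the given z/x/y values."""
    names = ("z", "x", "y")
    return sum(p for key, p in dist.items()
               if all(key[names.index(k)] == v for k, v in fixed.items()))

def naive_effect(dist):
    """P(Y=1 | X=1) - P(Y=1 | X=0), ignoring Z."""
    cond = lambda x: prob(dist, x=x, y=1) / prob(dist, x=x)
    return cond(1) - cond(0)

def adjusted_effect(dist):
    """Back-door adjustment: average the within-stratum contrast over P(Z)."""
    total = 0.0
    for z in (0, 1):
        cond = lambda x: prob(dist, z=z, x=x, y=1) / prob(dist, z=z, x=x)
        total += prob(dist, z=z) * (cond(1) - cond(0))
    return total

d = joint()
print(f"naive contrast (Z omitted):    {naive_effect(d):.3f}")
print(f"adjusted contrast (Z included): {adjusted_effect(d):.3f}")
```

The adjusted number is only as good as the claim that Z is the whole story. If a second common cause sits outside the frame, the adjusted estimate inherits the same kind of error the naive one shows here.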

The Asymmetry of Proof

Causation is easier to establish than to rule out.

A single randomized experiment can establish a causal effect. But establishing the absence of causation requires ruling out every possible pathway. Every confounder. Every mediator. Every feedback loop.

    THE ASYMMETRY

    PROVING CAUSATION:

    Intervene on X.
    Observe Y changes.
    Done.

    One experiment. One intervention.
    The severed graph tells the story.


    DISPROVING CAUSATION:

    Must show no pathway from X to Y.
    Through ANY mediator.
    Through ANY mechanism.
    In ANY time frame.
    Through ANY nonlinear interaction.

    This is open-ended.
    There is always another pathway
    you have not checked.
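
A toy case shows why a null result does not close the question. This is a sketch with an invented mechanism: Y is the exclusive-or of X and a hidden fair coin W. The function name and the mechanism are assumptions for illustration. Intervening on X leaves the average of Y untouched, yet X flips Y in every single unit.

```python
def p_y1_do(x, w_value=None):
    """P(Y = 1 | do(X = x)) when Y = X XOR W and W is a fair coin.

    If w_value is given, restrict to units with that value of W
    instead of averaging over both.
    """
    ws = (0, 1) if w_value is None else (w_value,)
    return sum(x ^ w for w in ws) / len(ws)

# Averaged over W: the experiment sees no change in Y.
print("average effect:   ", p_y1_do(1) - p_y1_do(0))

# Within each stratum of W: X fully determines Y.
print("effect when W = 0:", p_y1_do(1, w_value=0) - p_y1_do(0, w_value=0))
print("effect when W = 1:", p_y1_do(1, w_value=1) - p_y1_do(0, w_value=1))
```

The pathway exists. It cancels in the aggregate. An experimenter who never thought to measure W would report no effect and be wrong about the mechanism.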

PART ELEVEN: THE COMPLETE PICTURE

The Unified Framework

Causality is not one concept. It is a structure with multiple faces.

    THE COMPLETE FRAMEWORK OF CAUSALITY

    ┌─────────────────────────────────────────────────────────┐
    │                                                         │
    │                  CAUSAL STRUCTURE                       │
    │                                                         │
    │    The invariant ordering of events where               │
    │    intervention on A can change B                       │
    │                                                         │
    └─────────────────────────────────────────────────────────┘
                              │
              ┌───────────────┼───────────────┐
              │               │               │
              ▼               ▼               ▼
    ┌─────────────────┐ ┌─────────────────┐ ┌─────────────────┐
    │                 │ │                 │ │                 │
    │  GEOMETRIC      │ │  STRUCTURAL     │ │  INFORMATIONAL  │
    │                 │ │                 │ │                 │
    │  Light cones    │ │  DAGs and       │ │  Transfer       │
    │  Spacetime      │ │  do-calculus    │ │  entropy        │
    │  intervals      │ │  Counterfactuals│ │  Granger        │
    │  Lorentz        │ │  SCMs           │ │  causality      │
    │  invariance     │ │                 │ │  Effective      │
    │                 │ │                 │ │  information    │
    └─────────────────┘ └─────────────────┘ └─────────────────┘
              │               │               │
              └───────────────┼───────────────┘
                              │
                              ▼
    ┌─────────────────────────────────────────────────────────┐
    │                                                         │
    │               THE OPERATING CONSTRAINTS                 │
    │                                                         │
    │  1. Speed of light limits causal reach                  │
    │  2. Entropy determines causal direction                 │
    │  3. Confounders corrupt causal inference                │
    │  4. Identification walls bound what can be known        │
    │  5. Scale affects causal resolution                     │
    │  6. Quantum mechanics preserves but generalizes         │
    │     the causal order                                    │
    │                                                         │
    └─────────────────────────────────────────────────────────┘

The geometric face. Light cones and spacetime intervals. What can reach what. The hard boundary imposed by the speed of light. The invariant ordering that all observers agree on.

The structural face. Directed acyclic graphs and the do-calculus. The grammar of cause and effect. Chains, forks, and colliders. The three rungs of the causal ladder. The intervention test that separates correlation from causation.

The informational face. Transfer entropy and effective information. Causality measured in bits. The flow of predictive information from past to future. The emergence of causal structure at macro scales.
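
Causality in bits can be estimated from two time series. This is a minimal sketch, assuming binary data, a history length of one step, and a simple plug-in estimator built from raw counts. The function name, the seed, and the synthetic series are illustrative choices. The target series copies the source with a one-step delay, so information should flow in one direction only.

```python
import math
import random
from collections import Counter

def transfer_entropy(source, target):
    """Plug-in estimate of transfer entropy source -> target, in bits.

    History length 1:
        TE = sum p(y1, y0, x0) * log2( p(y1 | y0, x0) / p(y1 | y0) )
    where y1 is the next target value, y0 the current target value,
    and x0 the current source value.
    """
    triples = Counter(zip(target[1:], target[:-1], source[:-1]))
    n = sum(triples.values())
    pair_yx = Counter()    # counts of (y0, x0)
    pair_yy = Counter()    # counts of (y1, y0)
    single_y = Counter()   # counts of y0
    for (y1, y0, x0), c in triples.items():
        pair_yx[(y0, x0)] += c
        pair_yy[(y1, y0)] += c
        single_y[y0] += c
    te = 0.0
    for (y1, y0, x0), c in triples.items():
        p_full = c / pair_yx[(y0, x0)]              # p(y1 | y0, x0)
        p_hist = pair_yy[(y1, y0)] / single_y[y0]   # p(y1 | y0)
        te += (c / n) * math.log2(p_full / p_hist)
    return te

rng = random.Random(0)
x = [rng.randint(0, 1) for _ in range(20000)]
y = [0] + x[:-1]           # y copies x with a one-step delay

print(f"TE x -> y: {transfer_entropy(x, y):.3f} bits")
print(f"TE y -> x: {transfer_entropy(y, x):.3f} bits")
```

Knowing the history of x removes nearly a full bit of uncertainty about the next y. Knowing the history of y removes almost nothing about the next x. The plug-in estimate is biased slightly upward on finite data, so the reverse direction reads as small rather than exactly zero.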

The Core Truth

Causality is not a feeling. Not an intuition. Not a philosophical position.

It is the structure that determines what can affect what.

At the smallest scale, it is the geometry of spacetime. The light cone carving the universe into what is reachable and what is not.

At the statistical scale, it is the difference between seeing and doing. Between P(Y | X) and P(Y | do(X)). Between pattern recognition and understanding.

At the thermodynamic scale, it is the arrow of time. Entropy increasing. Effects radiating outward from causes. The past cone filling the future cone.

At the information scale, it is directed flow. Bits moving from cause to effect. Uncertainty reduced by knowing the right history.

At the quantum scale, it is preserved but generalized. The light cone holds. No signal violates it. But within that boundary, the correlations are stranger than any classical model of cause and effect can produce.

The same structure. Different resolutions.

The rooster does not cause the sunrise. But something does. And the machinery that determines what that something is, and how you would prove it, and why the question even makes sense in a universe of pure physics.

That is causality.

Not the label. The geometry underneath the label.

The structure of what makes what happen.

CITATIONS

Foundational Physics

Causality and Special Relativity

Einstein, A. (1905). “Zur Elektrodynamik bewegter Körper.” Annalen der Physik, 322(10):891-921.

Minkowski, H. (1908). “Die Grundgleichungen für die elektromagnetischen Vorgänge in bewegten Körpern.” Nachrichten von der Gesellschaft der Wissenschaften zu Göttingen, Mathematisch-Physikalische Klasse, 1908:53-111.

Causal Structure of Spacetime

Penrose, R. (1972). “Techniques of Differential Topology in Relativity.” SIAM, Philadelphia.

Hawking, S.W. & Ellis, G.F.R. (1973). “The Large Scale Structure of Space-Time.” Cambridge University Press.

Malament, D. (1977). “The class of continuous timelike curves determines the topology of spacetime.” Journal of Mathematical Physics, 18(7):1399-1404.

Thermodynamics and the Arrow of Time

Entropy and Causality

Lucia, U. & Grisolia, G. (2021). “Thermodynamic Definition of Time: Considerations on the EPR Paradox.” Mathematics, 10(1):35. https://inspirehep.net/files/e488c7c77018ce8e1f91c579b351a0ed

Mlodinow, L. & Brun, T. (2014). “Relating the thermodynamic arrow of time to the causal arrow.” Physical Review E, 89:052102. https://arxiv.org/pdf/0708.1175

Retarded Potentials

Jackson, J.D. (1998). “Classical Electrodynamics.” 3rd edition. Wiley.

Wheeler, J.A. & Feynman, R.P. (1945). “Interaction with the Absorber as the Mechanism of Radiation.” Reviews of Modern Physics, 17(2-3):157-181.

Causal Inference and Graphical Models

Pearl’s Framework

Pearl, J. (2009). “Causality: Models, Reasoning, and Inference.” 2nd edition. Cambridge University Press.

Pearl, J. (1995). “Causal diagrams for empirical research.” Biometrika, 82(4):669-688.

Pearl, J. & Mackenzie, D. (2018). “The Book of Why: The New Science of Cause and Effect.” Basic Books.

Do-Calculus

Shpitser, I. & Pearl, J. (2006). “Identification of Joint Interventional Distributions in Recursive Semi-Markovian Causal Models.” Proceedings of the 21st National Conference on Artificial Intelligence (AAAI-06). https://arxiv.org/pdf/1305.5506

Simpson’s Paradox

Pearl, J. (2014). “Comment: Understanding Simpson’s Paradox.” The American Statistician, 68(1):8-13. https://pmc.ncbi.nlm.nih.gov/articles/PMC2880329/

Information-Theoretic Causality

Granger Causality

Granger, C.W.J. (1969). “Investigating Causal Relations by Econometric Models and Cross-spectral Methods.” Econometrica, 37(3):424-438.

Transfer Entropy

Schreiber, T. (2000). “Measuring Information Transfer.” Physical Review Letters, 85(2):461-464.

Barnett, L., Barrett, A.B. & Seth, A.K. (2009). “Granger Causality and Transfer Entropy Are Equivalent for Gaussian Variables.” Physical Review Letters, 103(23):238701. https://arxiv.org/pdf/0910.4514

Causal Emergence

Effective Information

Hoel, E.P., Albantakis, L. & Tononi, G. (2013). “Quantifying causal emergence shows that macro can beat micro.” Proceedings of the National Academy of Sciences, 110(49):19790-19795.

Hoel, E.P. (2017). “When the Map Is Better Than the Territory.” Entropy, 19(5):188. https://www.theintrinsicperspective.com/p/a-primer-on-causal-emergence

Comolatti, R. & Hoel, E.P. (2022). “Causal emergence is widespread across measures of causation.” https://arxiv.org/pdf/2202.01854

Quantum Causality

Bell’s Theorem and Non-locality

Bell, J.S. (1964). “On the Einstein Podolsky Rosen paradox.” Physics Physique Fizika, 1(3):195-200.

Aspect, A., Dalibard, J. & Roger, G. (1982). “Experimental Realization of Einstein-Podolsky-Rosen-Bohm Gedankenexperiment: A New Violation of Bell’s Inequalities.” Physical Review Letters, 49(2):91-94.

Indefinite Causal Order

Oreshkov, O., Costa, F. & Brukner, Č. (2012). “Quantum correlations with no causal order.” Nature Communications, 3:1092.

Procopio, L.M. et al. (2015). “Experimental superposition of orders of quantum gates.” Nature Communications, 6:7913.

Philosophy of Causation

Foundational Arguments

Hume, D. (1739). “A Treatise of Human Nature.” Book I, Part III.

Woodward, J. (2003). “Making Things Happen: A Theory of Causal Explanation.” Oxford University Press.

Document compiled from foundational physics, causal inference theory, information theory, and quantum foundations literature.

THE MACHINERY OF ENTROPY. Entropy determines the direction of causation. The arrow of time is the thermodynamic arrow, and causes precede effects because high-entropy states vastly outnumber low-entropy states.
THE MACHINERY OF EMERGENCE. Causal emergence (Hoel) shows that macro-level descriptions can carry more causal information than micro-level descriptions. The coarse-graining that defines emergent levels is the same operation that sharpens causal structure.
THE MACHINERY OF INFORMATION. Transfer entropy and Granger causality measure causal influence in bits. Information theory provides the quantitative language for the informational face of causality.
THE MACHINERY OF CONSTRAINTS. The light cone constrains causal reach. The identification wall constrains what causal effects can be computed from data. Every limit on causal reasoning is a constraint on the degrees of freedom available to inference.
THE MACHINERY OF LOCALITY. Locality defines the geometry that causality must obey. Every causal chain propagates through adjacent points at finite speed. The light cone is locality made geometric.
