# Why `Margins.contrast` exists: joint covariance for two arms ## The problem A common task is to compute a *contrast* between two counterfactual specifications — for example, the population average treatment effect (PATE): $$ \Delta \;=\; \mathbb{E}\bigl[g_A(\beta)\bigr] \;-\; \mathbb{E}\bigl[g_B(\beta)\bigr], $$ where $g_A$ averages the predicted response with `treat=1` and $g_B$ does the same with `treat=0`. A user could compute this naïvely: ```python r_a = M.predict(values={"treat": 1}) r_b = M.predict(values={"treat": 0}) diff = r_a.estimate - r_b.estimate # correct naive_se = np.sqrt(r_a.se**2 + r_b.se**2) # WRONG ``` The point estimate is fine. **The standard error is wrong**, often badly. ## Why the naïve SE is wrong The two predictions $g_A$ and $g_B$ depend on the *same* $\hat\beta$. Their sampling errors are correlated — strongly, since they share every coefficient. The variance of the difference is: $$ \operatorname{Var}\bigl(g_A(\hat\beta) - g_B(\hat\beta)\bigr) \;=\; \operatorname{Var}(g_A) + \operatorname{Var}(g_B) - 2\,\operatorname{Cov}(g_A, g_B). $$ The naïve formula sets the covariance to zero. For PATE in a logit, where $g_A$ and $g_B$ differ only by one regressor, the covariance term is typically **the same order of magnitude as the variances**, and the naïve SE can be off by a factor of 2 or more. ## The delta-method derivation Let $J_A$ and $J_B$ be the Jacobians of $g_A$ and $g_B$ with respect to $\beta$, evaluated at $\hat\beta$. Stack them: $$ J \;=\; \begin{bmatrix} J_A \\ J_B \\ J_A - J_B \end{bmatrix}, \qquad g(\hat\beta) \;=\; \begin{bmatrix} g_A(\hat\beta) \\ g_B(\hat\beta) \\ g_A(\hat\beta) - g_B(\hat\beta) \end{bmatrix}. $$ The first-order delta-method approximation gives a single joint covariance matrix: $$ \operatorname{Var}\bigl(g(\hat\beta)\bigr) \;\approx\; J\,\widehat V\,J^\top, $$ where $\widehat V$ is the estimated covariance of $\hat\beta$. The (3, 3) block of this matrix is the *correct* variance of the difference — and it automatically picks up the off-diagonal $\operatorname{Cov}(g_A, g_B)$ because of the way $J_A - J_B$ is built into the stack. Equivalently, observe that $$ \bigl[J\,\widehat V\,J^\top\bigr]_{33} \;=\; J_A \widehat V J_A^\top + J_B \widehat V J_B^\top - 2\,J_A \widehat V J_B^\top, $$ so the joint computation is simply doing the bookkeeping the user would otherwise have to do by hand. ## What the implementation does `Margins.contrast` (`smmargins/core.py:1442`) builds exactly that stacked statistic and Jacobian: 1. Resolve arm A's frames via the same path `predict()` uses (with `values=a` or `a_newdata=`). 2. Resolve arm B's frames the same way. 3. Form $g(\beta) = [g_A(\beta), g_B(\beta), g_A(\beta) - g_B(\beta)]$ as a single vector-valued statistic. 4. Form $J(\beta)$ by stacking $J_A$, $J_B$, and $J_A - J_B$. 5. Pass through the standard `_delta` runner. Inference kwargs (`vce=`, `cov_type=`, `ci_method=`) compose unchanged. The result is a standard `MarginsResult` with three rows. The third row's SE is the joint-covariance SE. The full $3 \times 3$ block is available as `result.vcov` for users who want to construct further linear combinations. ## Composition with non-delta inference For `vce="bootstrap"` and `vce="simulation"`, the joint computation is *automatic*: a single resample (or a single $\hat\beta$ draw) produces both $g_A^{(b)}$ and $g_B^{(b)}$, and the contrast at draw $b$ is just $g_A^{(b)} - g_B^{(b)}$. The empirical covariance of those draws gives the joint covariance with no extra work — the bookkeeping is implicit in the resampling design. For `ci_method="sup-t"`, the simultaneous CI uses the joint draw matrix across all three rows, so the band on the difference is calibrated against $g_A$ and $g_B$ jointly. ## Tradeoffs and pitfalls - **`a_newdata` and `b_newdata` should usually be the same length.** They don't have to be — `M.contrast(a_newdata=df_treated, b_newdata=df_control)` is well-defined when treated and control populations differ — but the contrast then estimates a *target-population-specific* effect, not PATE on the original sample. Document which population you mean. - **The joint $\widehat V$ assumption requires the two arms to be evaluated against the same $\hat\beta$.** If you fit two separate models and want to contrast them, this machinery is the wrong tool — you need a different covariance estimator (typically a stacked-GMM or seemingly-unrelated approach). - **Bootstrap arm consistency.** Each resample refits once and computes both arms against the refit's coefficients. Don't bootstrap arms independently — that throws away the very correlation the joint computation is designed to capture. ## See also - Tutorial: contrast walkthrough (forthcoming). - API: `smmargins.Margins.contrast` (see {doc}`../api`). - Theory: {doc}`../math` for the general delta-method derivation underlying every `MarginsResult`.