Mathematical motivation ======================= This page derives, in one place, every statistic that :class:`~smmargins.Margins` computes and the standard error attached to each one. The motivation is identical to Stata's `margins-delta-method FAQ `_ and the cases enumerated in Richard Williams' `Margins01 notes `_; the goal here is to write them down using a single primitive so the implementation reduces to one Jacobian. The delta method ---------------- For a fitted parameter vector :math:`\hat\beta` with estimated covariance :math:`\widehat V`, and a (possibly vector-valued) statistic :math:`g(\beta)`, a first-order Taylor expansion gives .. math:: g(\hat\beta) \;\approx\; g(\beta_0) + G\,(\hat\beta - \beta_0), \qquad G = \left.\tfrac{\partial g}{\partial \beta}\right|_{\beta_0}. Taking variances on both sides yields .. math:: \widehat{\operatorname{Var}}\bigl[g(\hat\beta)\bigr] \;\approx\; G\,\widehat V\,G^\top . Three things to notice: 1. The approximation depends only on **first** derivatives of :math:`g`. 2. Once :math:`G\widehat V G^\top` is in hand, *any* linear combination :math:`C\,g(\hat\beta)` has covariance :math:`C(G\widehat V G^\top)C^\top` — no further differentiation is needed. That is why :meth:`~smmargins.MarginsResult.contrast` is exact under the same approximation. 3. The standard error of :math:`g(\hat\beta)` is just the square-root of the diagonal of :math:`G\widehat V G^\top`; we report both. A single statistic schema ------------------------- Let :math:`f(\eta) = E[Y \mid \eta]` be the response-scale map (the identity for OLS, the inverse link for a GLM, etc.). Williams enumerates seven statistics; every one can be written as a linear combination of mean predictions on different design matrices: .. list-table:: :header-rows: 1 :widths: 15 60 * - Statistic - :math:`g(\beta)` * - AAP - :math:`\frac{1}{n}\sum_i f(x_i^\top\beta)` * - APM - :math:`f(\bar x^\top\beta)` * - APR - :math:`\frac{1}{n}\sum_i f(x_i^\top\beta)` with some :math:`x` fixed * - AME - :math:`\frac{1}{n}\sum_i \partial f(x_i^\top\beta)/\partial x_{ij}` * - MEM - :math:`\partial f(\bar x^\top\beta)/\partial x_j` * - MER - AME with some :math:`x` held at representative values * - Contrast - :math:`E[f(\cdot)\mid x_j=\text{level}] - E[f(\cdot)\mid x_j=\text{ref}]` Every entry is a linear combination of mean-predictions :math:`\frac{1}{n}\sum_i f(x_i^\top\beta)` on different design matrices. The single Jacobian primitive is therefore .. math:: \frac{\partial}{\partial\beta}\, \frac{1}{n}\sum_i f(x_i^\top\beta) \;=\; \frac{1}{n}\sum_i f'(x_i^\top\beta)\,x_i . Substituting back into the delta method gives a closed form for the covariance of every statistic in the table. Marginal effects of a single variable ------------------------------------- For a continuous covariate :math:`x_{ij}`, .. math:: \frac{\partial f(x_i^\top\beta)}{\partial x_{ij}} \;=\; f'(x_i^\top\beta)\,\frac{\partial (x_i^\top\beta)}{\partial x_{ij}} . If the design matrix is just :math:`X = [\,\dots,\, x_j,\,\dots]`, the inner derivative is :math:`\beta_j` and the AME collapses to the familiar :math:`\overline{f'(x^\top\beta)}\,\beta_j`. With a formula involving ``I(x**2)``, ``x1:x2``, ``bs(x, df=4)``, or a :class:`~patsy.Categorical` contrast, the inner derivative is whatever patsy produces when the *data column* is perturbed and the design matrix is rebuilt — which is exactly what :class:`~smmargins.Margins` does. Discrete contrasts (factors, booleans, low-cardinality numerics) are handled by computing :math:`f` at the level and at the reference, and differencing — the delta-method covariance follows from stacking the two predictions and applying the contrast matrix :math:`C = [-1, +1]`. Elasticities ------------ The four ``method=`` choices on :meth:`~smmargins.Margins.dydx` apply a per-observation transform to the raw derivative :math:`d_i = \partial f(x_i^\top\beta)/\partial x_{ij}` before averaging: .. list-table:: :header-rows: 1 :widths: 10 30 60 * - Method - Per-obs transform - Interpretation * - ``dydx`` - :math:`d_i` - Level change (default). * - ``dyex`` - :math:`d_i\,x_i` - Semi-elasticity, :math:`dy/d(\ln x)`. * - ``eyex`` - :math:`d_i\,x_i / y_i` - Full elasticity. * - ``eydx`` - :math:`d_i / y_i` - Semi-elasticity, :math:`d(\ln y)/dx`. Elasticities are only defined for continuous variables; the implementation raises if you ask for an elasticity on a discrete one. Outer Jacobian: analytic vs. finite differences ----------------------------------------------- Stata's ``margins`` computes the entire :math:`G` numerically. We compute it analytically when possible: - For any GLM, :math:`f'` is read off ``family.link.inverse_deriv``. This covers Logit, Probit, Poisson, NegBin, Gamma, and Gaussian out of the box. - For ``OLS`` / ``WLS`` / ``GLS``, :math:`f' \equiv 1` (identity link). - For ``MNLogit``, we use an analytic softmax-derivative gradient. Whenever an analytic :math:`f'` or Jacobian is available **and** the linear predictor is :math:`\eta = X\beta` (no offset/exposure), the chain rule gives :math:`\partial g/\partial\beta` in closed form, saving :math:`p` forward ``predict`` calls per statistic. The fallback is central differences, .. math:: \frac{\partial f}{\partial x_j} \approx \frac{f(x + h e_j) - f(x - h e_j)}{2 h_j}, \qquad h_j = \epsilon^{1/3}\cdot\max(|x_j|, 1), with :math:`h_j` chosen to balance truncation error (:math:`O(h^2)`) against round-off (:math:`O(\epsilon/h)`); the optimum scales as :math:`h \sim \epsilon^{1/3}`. See :func:`~smmargins.utils._central_jacobian` and Nocedal & Wright, *Numerical Optimization* (2006), Chapter 8. - **Analytic-vs-FD parity matrix.** Every public API call (AAP, APM, APR, AME, MEM, MER — continuous and discrete) is run twice on each of OLS+poly, Logit+`C(grp)`, Poisson, and MNLogit, with `analytic=True` and `analytic=False`, and the two paths must agree on both estimate and SE. This guards against the analytic chain-rule formula drifting from the FD answer that the other tests pin to Stata / `get_margeff`. Beyond delta: simulation and bootstrap VCEs ------------------------------------------- The delta method is a first-order approximation: :math:`G\widehat V G^\top` is exact only to the extent that :math:`g` is locally linear in :math:`\beta` near :math:`\hat\beta`. For statistics that are sharply nonlinear (a probability near 0 or 1, an elasticity at a near-zero prediction) this can produce symmetric Wald intervals that extend beyond the natural support, or under-cover when the curvature is non-negligible. ``smmargins`` exposes two alternatives behind the same ``vce=`` keyword. **Krinsky–Robb simulation** (``vce="simulation"``). Draw :math:`S` parameter vectors :math:`\beta_s\sim N(\hat\beta,\widehat V)`, evaluate :math:`g(\beta_s)` for each, and read inference off the empirical distribution of the draws: .. math:: \widehat{\operatorname{Var}}_{\mathrm{KR}}\bigl[g(\hat\beta)\bigr] = \frac{1}{S-1}\sum_{s=1}^S \bigl(g(\beta_s) - \bar g\bigr)\!\bigl(g(\beta_s) - \bar g\bigr)^{\!\top}. The reported point estimate stays the *analytic* :math:`g(\hat\beta)`, not :math:`\bar g`; this matches Stata's ``vce(simulation)`` convention and avoids spurious bias from finite :math:`S`. Pointwise intervals default to empirical quantiles of the draws and so are not forced to be symmetric — an asymmetric CI for an extreme probability is the practical reason to reach for KR. **Bootstrap** (``vce="bootstrap"``). Refit the model on :math:`B` resamples of the rows and read inference off the bootstrap distribution. Three resampling schemes are built in (``boot_method=``): - ``"pairs"`` — :math:`(y_i, x_i)` resampled IID (default); - ``"cluster"`` — whole clusters resampled with replacement, required for robustness to within-cluster correlation; - ``"block"`` — moving-block resampling for time series, with ``block_size=`` chosen to span the dependence horizon. Failed refits are caught and counted; a ``RuntimeWarning`` fires when the failure rate exceeds 5%. The point estimate is again the analytic :math:`g(\hat\beta)`, not the bootstrap mean. Simultaneous confidence intervals --------------------------------- For a family of :math:`m` margins returned in a single call, ``ci_method=`` controls the critical value. With pointwise critical value :math:`z_{1-\alpha/2}`: - ``"bonferroni"`` uses :math:`z_{1-\alpha/(2m)}` — works with any VCE; - ``"sidak"`` uses :math:`z_{(1+(1-\alpha)^{1/m})/2}`, slightly narrower than Bonferroni — also works with any VCE; - ``"sup-t"`` uses the :math:`(1-\alpha)` quantile of the maximum standardised absolute deviation across the family, computed from the simulation/bootstrap draw matrix: .. math:: c_{\sup\text{-}t}^{\,(1-\alpha)} = Q_{1-\alpha}\!\left( \max_{j=1,\dots,m} \left|\frac{g_j(\beta_s) - g_j(\hat\beta)}{\widehat\sigma_j}\right| \right). Sup-t requires draws (so ``vce`` must be ``"simulation"`` or ``"bootstrap"``) and is typically narrower than Bonferroni / Šidák when the margins in the family are correlated — exactly the case for margins evaluated at neighbouring covariate profiles. Why the response scale matters for DiD (Ai & Norton 2003) --------------------------------------------------------- In a logit, the coefficient on a ``group:condition`` interaction is on the *log-odds* scale. On the *probability* scale, the difference-in-differences .. math:: \mathrm{DiD}(x_*) = \bigl[f(\eta_{1,1}) - f(\eta_{1,0})\bigr] - \bigl[f(\eta_{0,1}) - f(\eta_{0,0})\bigr] is a nonlinear function of *every* parameter and *every* covariate profile :math:`x_*`. You cannot read it off the interaction coefficient. :meth:`~smmargins.Margins.did` evaluates the four cells on the response scale, with their joint delta-method covariance, and forms the DiD as a contrast — Ai & Norton's recommendation in code. References ---------- - Stata FAQ, *How are the standard errors computed with margins?* https://www.stata.com/support/faqs/statistics/compute-standard-errors-with-margins/ - Williams, R. (2012). *Using the margins command to estimate and interpret adjusted predictions and marginal effects.* Stata Journal, 12(2), 308–331. https://www3.nd.edu/~rwilliam/stats/Margins01.pdf - Ai, C., & Norton, E. C. (2003). *Interaction terms in logit and probit models.* Economics Letters, 80(1), 123–129. https://doi.org/10.1016/S0165-1765(03)00032-6 - Nocedal, J., & Wright, S. J. (2006). *Numerical Optimization*, 2nd ed., Springer. Chapter 8.