Mathematical motivation
=======================
This page derives, in one place, every statistic that
:class:`~smmargins.Margins` computes and the standard error attached to
each one. The motivation is identical to Stata's `margins-delta-method
FAQ `_
and the cases enumerated in Richard Williams' `Margins01 notes
`_; the goal
here is to write them down using a single primitive so the
implementation reduces to one Jacobian.
The delta method
----------------
For a fitted parameter vector :math:`\hat\beta` with estimated
covariance :math:`\widehat V`, and a (possibly vector-valued) statistic
:math:`g(\beta)`, a first-order Taylor expansion gives
.. math::
g(\hat\beta) \;\approx\; g(\beta_0)
+ G\,(\hat\beta - \beta_0),
\qquad
G = \left.\tfrac{\partial g}{\partial \beta}\right|_{\beta_0}.
Taking variances on both sides yields
.. math::
\widehat{\operatorname{Var}}\bigl[g(\hat\beta)\bigr]
\;\approx\; G\,\widehat V\,G^\top .
Three things to notice:
1. The approximation depends only on **first** derivatives of :math:`g`.
2. Once :math:`G\widehat V G^\top` is in hand, *any* linear combination
:math:`C\,g(\hat\beta)` has covariance :math:`C(G\widehat V G^\top)C^\top`
— no further differentiation is needed. That is why
:meth:`~smmargins.MarginsResult.contrast` is exact under the same
approximation.
3. The standard error of :math:`g(\hat\beta)` is just the square-root
of the diagonal of :math:`G\widehat V G^\top`; we report both.
A single statistic schema
-------------------------
Let :math:`f(\eta) = E[Y \mid \eta]` be the response-scale map (the
identity for OLS, the inverse link for a GLM, etc.). Williams
enumerates seven statistics; every one can be written as a linear
combination of mean predictions on different design matrices:
.. list-table::
:header-rows: 1
:widths: 15 60
* - Statistic
- :math:`g(\beta)`
* - AAP
- :math:`\frac{1}{n}\sum_i f(x_i^\top\beta)`
* - APM
- :math:`f(\bar x^\top\beta)`
* - APR
- :math:`\frac{1}{n}\sum_i f(x_i^\top\beta)` with some
:math:`x` fixed
* - AME
- :math:`\frac{1}{n}\sum_i \partial f(x_i^\top\beta)/\partial x_{ij}`
* - MEM
- :math:`\partial f(\bar x^\top\beta)/\partial x_j`
* - MER
- AME with some :math:`x` held at representative values
* - Contrast
- :math:`E[f(\cdot)\mid x_j=\text{level}]
- E[f(\cdot)\mid x_j=\text{ref}]`
Every entry is a linear combination of mean-predictions
:math:`\frac{1}{n}\sum_i f(x_i^\top\beta)` on different design
matrices. The single Jacobian primitive is therefore
.. math::
\frac{\partial}{\partial\beta}\,
\frac{1}{n}\sum_i f(x_i^\top\beta)
\;=\; \frac{1}{n}\sum_i f'(x_i^\top\beta)\,x_i .
Substituting back into the delta method gives a closed form for the
covariance of every statistic in the table.
Marginal effects of a single variable
-------------------------------------
For a continuous covariate :math:`x_{ij}`,
.. math::
\frac{\partial f(x_i^\top\beta)}{\partial x_{ij}}
\;=\; f'(x_i^\top\beta)\,\frac{\partial (x_i^\top\beta)}{\partial x_{ij}} .
If the design matrix is just :math:`X = [\,\dots,\, x_j,\,\dots]`, the
inner derivative is :math:`\beta_j` and the AME collapses to the
familiar :math:`\overline{f'(x^\top\beta)}\,\beta_j`. With a formula
involving ``I(x**2)``, ``x1:x2``, ``bs(x, df=4)``, or a
:class:`~patsy.Categorical` contrast, the inner derivative is whatever
patsy produces when the *data column* is perturbed and the design
matrix is rebuilt — which is exactly what
:class:`~smmargins.Margins` does.
Discrete contrasts (factors, booleans, low-cardinality numerics) are
handled by computing :math:`f` at the level and at the reference, and
differencing — the delta-method covariance follows from stacking the
two predictions and applying the contrast matrix
:math:`C = [-1, +1]`.
Elasticities
------------
The four ``method=`` choices on :meth:`~smmargins.Margins.dydx` apply a
per-observation transform to the raw derivative
:math:`d_i = \partial f(x_i^\top\beta)/\partial x_{ij}` before
averaging:
.. list-table::
:header-rows: 1
:widths: 10 30 60
* - Method
- Per-obs transform
- Interpretation
* - ``dydx``
- :math:`d_i`
- Level change (default).
* - ``dyex``
- :math:`d_i\,x_i`
- Semi-elasticity, :math:`dy/d(\ln x)`.
* - ``eyex``
- :math:`d_i\,x_i / y_i`
- Full elasticity.
* - ``eydx``
- :math:`d_i / y_i`
- Semi-elasticity, :math:`d(\ln y)/dx`.
Elasticities are only defined for continuous variables; the
implementation raises if you ask for an elasticity on a discrete one.
Outer Jacobian: analytic vs. finite differences
-----------------------------------------------
Stata's ``margins`` computes the entire :math:`G` numerically. We
compute it analytically when possible:
- For any GLM, :math:`f'` is read off
``family.link.inverse_deriv``. This covers Logit, Probit, Poisson,
NegBin, Gamma, and Gaussian out of the box.
- For ``OLS`` / ``WLS`` / ``GLS``, :math:`f' \equiv 1` (identity link).
Whenever an analytic :math:`f'` is available **and** the linear
predictor is :math:`\eta = X\beta` (no offset/exposure), the chain rule
gives :math:`\partial g/\partial\beta` in closed form, saving
:math:`p` forward ``predict`` calls per statistic.
The fallback is central differences,
.. math::
\frac{\partial f}{\partial x_j}
\approx \frac{f(x + h e_j) - f(x - h e_j)}{2 h_j},
\qquad
h_j = \epsilon^{1/3}\cdot\max(|x_j|, 1),
with :math:`h_j` chosen to balance truncation error
(:math:`O(h^2)`) against round-off (:math:`O(\epsilon/h)`); the
optimum scales as :math:`h \sim \epsilon^{1/3}`. See
:func:`~smmargins._central_jacobian` and Nocedal & Wright,
*Numerical Optimization* (2006), Chapter 8.
The two paths agree to FD tolerance — there is a parity test that
runs every public API call on three different model classes with
``analytic=True`` and ``analytic=False`` and checks the estimates and
SEs match.
Why the response scale matters for DiD (Ai & Norton 2003)
---------------------------------------------------------
In a logit, the coefficient on a ``group:condition`` interaction is on
the *log-odds* scale. On the *probability* scale, the
difference-in-differences
.. math::
\mathrm{DiD}(x_*)
= \bigl[f(\eta_{1,1}) - f(\eta_{1,0})\bigr]
- \bigl[f(\eta_{0,1}) - f(\eta_{0,0})\bigr]
is a nonlinear function of *every* parameter and *every* covariate
profile :math:`x_*`. You cannot read it off the interaction
coefficient. :meth:`~smmargins.Margins.did` evaluates the four cells
on the response scale, with their joint delta-method covariance, and
forms the DiD as a contrast — Ai & Norton's recommendation in code.
References
----------
- Stata FAQ, *How are the standard errors computed with margins?*
https://www.stata.com/support/faqs/statistics/compute-standard-errors-with-margins/
- Williams, R. (2012). *Using the margins command to estimate and
interpret adjusted predictions and marginal effects.*
Stata Journal, 12(2), 308–331.
https://www3.nd.edu/~rwilliam/stats/Margins01.pdf
- Ai, C., & Norton, E. C. (2003). *Interaction terms in logit and
probit models.* Economics Letters, 80(1), 123–129.
https://doi.org/10.1016/S0165-1765(03)00032-6
- Nocedal, J., & Wright, S. J. (2006). *Numerical Optimization*,
2nd ed., Springer. Chapter 8.