smmargins.Margins

class smmargins.Margins(results, data: DataFrame | None = None, level: float = 0.95, use_t: bool = False, analytic: bool = True)

Compute adjusted predictions and marginal effects for a statsmodels fit.

Rather than hand-differentiating each model’s linear predictor / link combination, every statistic is built as a function of \(\beta\) using model.predict(params, exog) (which handles the inverse link). The Jacobian of that function is taken analytically where the model exposes a link derivative and by central finite differences otherwise (see the analytic parameter), and the delta method is applied with results.cov_params().

Parameters:
  • results (statsmodels results object) – Must expose params and cov_params(), and its model must provide predict(params, exog). OLS, GLM and GLM-like models, Logit, Probit, Poisson, NegBin, GEE, mixed models, … all qualify.

  • data (pandas.DataFrame, optional) – The data the model was fit on. If None, model.data.frame is tried.

  • level (float) – Confidence level for intervals (default 0.95).

  • use_t (bool) – If True and results.df_resid is available, use the t-distribution rather than the normal.

  • analytic (bool) – If True (default), use an analytic outer Jacobian \(\partial g/\partial\beta\) whenever the model exposes a link derivative (any GLM via family.link.inverse_deriv, plus OLS/WLS/GLS via the identity link), and fall back to central finite differences otherwise. Set to False to force finite differences everywhere.
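To make the analytic versus finite-difference distinction concrete, the following is a minimal NumPy sketch, not the package's own code. It assumes the statistic is the average prediction of a logit model, so the chain rule gives \(\partial g/\partial\beta = \frac{1}{n}\sum_i h'(x_i^\top\beta)\,x_i\) with \(h\) the inverse link. The helpers inv_logit and inv_logit_deriv are hypothetical stand-ins for what a GLM link object provides (the latter playing the role of family.link.inverse_deriv).

```python
import numpy as np

def inv_logit(eta):
    """Inverse logit link h(eta)."""
    return 1.0 / (1.0 + np.exp(-eta))

def inv_logit_deriv(eta):
    """Derivative h'(eta) of the inverse logit link (hypothetical stand-in)."""
    p = inv_logit(eta)
    return p * (1.0 - p)

# Illustrative design matrix and coefficients (made up for this sketch).
rng = np.random.default_rng(1)
X = np.column_stack([np.ones(100), rng.standard_normal(100)])
beta = np.array([0.5, -1.2])

def g(b):
    """Statistic of interest: the average predicted probability."""
    return inv_logit(X @ b).mean()

# Analytic Jacobian via the chain rule: mean over rows of h'(x_i' beta) * x_i.
G_analytic = (inv_logit_deriv(X @ beta)[:, None] * X).mean(axis=0)

# Central finite differences, one coefficient at a time.
h = 1e-6
G_fd = np.array([
    (g(beta + h * e) - g(beta - h * e)) / (2.0 * h)
    for e in np.eye(beta.size)
])

max_abs_diff = float(np.max(np.abs(G_analytic - G_fd)))
print(G_analytic, G_fd, max_abs_diff)
```

The two Jacobians should agree to many digits. The analytic route avoids the extra predict calls per coefficient and the choice of step size.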

Notes

Formula vs. Raw Exog Mode

If the model was fit using a formula (e.g. smf.ols("y ~ x1 + x2", data)), Margins uses the model’s DesignInfo to rebuild the design matrix. This ensures that interactions (x1:x2) and transformations (log(x1)) are correctly updated when a single variable is perturbed for marginal effects.

If the model was fit using raw matrices (e.g. sm.OLS(y, X).fit()), Margins operates in “raw mode”. In this mode, only the literal column corresponding to the variable is perturbed. Interaction columns and pre-computed transformations in the design matrix are not updated automatically, so effects for variables that appear in them will be wrong. Fit such models with a formula instead.
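The difference between the two modes can be illustrated with a short NumPy sketch. It assumes a linear predictor with an x1:x2 interaction and made-up coefficients. The design function is a hypothetical stand-in for rebuilding the design matrix from a formula; it is not how Margins is implemented.

```python
import numpy as np

rng = np.random.default_rng(2)
x1 = rng.standard_normal(200)
x2 = rng.standard_normal(200) + 1.0
beta = np.array([1.0, 2.0, -0.5, 0.75])  # intercept, x1, x2, x1:x2

def design(x1, x2):
    """Rebuild the full design matrix, including the interaction column."""
    return np.column_stack([np.ones_like(x1), x1, x2, x1 * x2])

X = design(x1, x2)
h = 1e-5

# "Raw mode": perturb only the literal x1 column; x1:x2 stays stale.
X_up, X_dn = X.copy(), X.copy()
X_up[:, 1] += h
X_dn[:, 1] -= h
ame_raw = np.mean((X_up @ beta - X_dn @ beta) / (2.0 * h))

# "Formula mode": perturb x1 and rebuild every column that depends on it.
ame_rebuilt = np.mean(
    (design(x1 + h, x2) @ beta - design(x1 - h, x2) @ beta) / (2.0 * h)
)

# Closed form for this linear model: d y / d x1 = b1 + b3 * x2, averaged.
ame_true = beta[1] + beta[3] * x2.mean()
print(ame_raw, ame_rebuilt, ame_true)
```

Here the raw-mode result recovers only the x1 coefficient, while the rebuilt version matches the closed-form average effect that includes the interaction term.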

Delta method

For a statistic \(g(\beta)\) with Jacobian \(G = \partial g / \partial \beta|_{\hat\beta}\), the delta method approximates

\[\widehat{\mathrm{Var}}[g(\hat\beta)] \approx G \, \widehat V \, G^\top .\]
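As an illustration of the formula above, here is a self-contained NumPy sketch of the delta method with a central finite-difference Jacobian. The helper names fd_jacobian and delta_method, the step-size rule, and the toy values of beta_hat and V_hat are all assumptions made for this example; they are not taken from the package.

```python
import numpy as np

def fd_jacobian(g, beta, rel_step=1e-6):
    """Central finite-difference Jacobian of g at beta, shape (m, p)."""
    beta = np.asarray(beta, dtype=float)
    g0 = np.atleast_1d(g(beta))
    G = np.empty((g0.size, beta.size))
    for j in range(beta.size):
        h = rel_step * max(1.0, abs(beta[j]))  # scale step to the coefficient
        e = np.zeros_like(beta)
        e[j] = h
        diff = np.atleast_1d(g(beta + e)) - np.atleast_1d(g(beta - e))
        G[:, j] = diff / (2.0 * h)
    return G

def delta_method(g, beta_hat, V_hat):
    """Return g(beta_hat) and its delta-method standard errors."""
    G = fd_jacobian(g, beta_hat)
    cov = G @ V_hat @ G.T              # G V G^T
    return np.atleast_1d(g(beta_hat)), np.sqrt(np.diag(cov))

# Toy inputs standing in for results.params and results.cov_params().
rng = np.random.default_rng(0)
X = np.column_stack([np.ones(50), rng.standard_normal(50)])
beta_hat = np.array([0.3, -0.8])
V_hat = np.array([[0.04, 0.01],
                  [0.01, 0.09]])

def avg_prediction(beta):
    """Average predicted probability under a logit inverse link."""
    return np.mean(1.0 / (1.0 + np.exp(-X @ beta)))

est, se = delta_method(avg_prediction, beta_hat, V_hat)
print(est, se)
```

For a linear statistic \(g(\beta) = a^\top\beta\) the Jacobian is just \(a^\top\), so the result reduces to the exact variance \(a^\top \widehat V a\), which is a convenient sanity check for any implementation.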

Examples

A logit with a categorical predictor and an interaction, which would be awkward to differentiate by hand:

import statsmodels.formula.api as smf
from smmargins import Margins

fit = smf.logit(
    "voted ~ age + income + C(educ) + female + age:female",
    data=df,
).fit()
M = Margins(fit)

Adjusted predictions:

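M.predict()                                # AAP: average adjusted prediction
M.predict(at="mean")                       # APM: adjusted prediction at the means
M.predict(atexog={"age": [25, 45, 65]})    # APR: adjusted predictions at representative values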

Marginal effects on the response (probability) scale:

M.dydx("age")                              # AME: average marginal effect
M.dydx("age", at="mean")                   # MEM: marginal effect at the means
M.dydx("age", atexog={"female": [0, 1]})   # MER: marginal effects at representative values, by sex
M.dydx("educ", reference="college")        # discrete contrasts
M.dydx("kids", count=True)                 # x -> x+1 for integers
M.dydx("age", method="eyex")               # full elasticity

Most calls return a MarginsResult whose __repr__ prints a tidy table of estimates, SEs, z- (or t-) statistics, p-values, and confidence intervals.

A small runnable smoke test:

>>> import numpy as np, pandas as pd, statsmodels.formula.api as smf
>>> from smmargins import Margins
>>> rng = np.random.default_rng(0)
>>> df = pd.DataFrame({
...     "x": rng.standard_normal(200),
...     "g": rng.choice(["A", "B"], 200),
... })
>>> df["y"] = 1.0 + 2.0 * df["x"] + (df["g"] == "B") + rng.standard_normal(200)
>>> fit = smf.ols("y ~ x + C(g)", df).fit()
>>> M = Margins(fit)
>>> aap = M.predict()
>>> ame = M.dydx("x")
>>> aap.estimate.shape, ame.estimate.shape
((1,), (1,))
>>> bool(abs(ame.estimate[0] - 2.0) < 0.2)        # close to truth
True

References

[1] StataCorp. FAQ: How are the standard errors computed with margins. https://www.stata.com/support/faqs/statistics/compute-standard-errors-with-margins/

[2] Williams, R. (2012). Using the margins command to estimate and interpret adjusted predictions and marginal effects. Stata Journal, 12(2), 308–331. https://www3.nd.edu/~rwilliam/stats/Margins01.pdf

[3] Ai, C., & Norton, E. C. (2003). Interaction terms in logit and probit models. Economics Letters, 80(1), 123–129. https://doi.org/10.1016/S0165-1765(03)00032-6

See also

MarginsResult

__init__(results, data: DataFrame | None = None, level: float = 0.95, use_t: bool = False, analytic: bool = True)

Methods

__init__(results[, data, level, use_t, analytic])

did(group, condition[, group_levels, ...])

Difference-in-differences on the response scale.

dydx(variable[, at, atexog, discrete, ...])

Marginal effect of variable on the response.

predict([at, atexog, factor_stat, outcome])

Compute adjusted predictions (expected outcome on the response scale).

Attributes

n_outcomes

Number of outcome classes (K) for the fitted model.

outcome_labels

Outcome class labels for multi-outcome models, or None.