smmargins.Margins
class smmargins.Margins(results, data: DataFrame | None = None, level: float = 0.95, use_t: bool = False, analytic: bool = True)
Compute adjusted predictions and marginal effects for a StatsModels fit.
Rather than hand-differentiating each model’s linear predictor / link combination, every statistic is built as a function of \(\beta\) using model.predict(params, exog) (which handles the inverse link). The Jacobian of that function is taken by central finite differences (or analytically where the model exposes a link derivative; see analytic), and the delta method is applied with results.cov_params().
Parameters:
- results (statsmodels results object) – Must expose params, cov_params(), and have model.predict(params, exog) available (OLS, GLM, GLM-like, Logit, Probit, Poisson, NegBin, GEE, mixed, … all qualify).
- data (pandas.DataFrame, optional) – Original fitting data. If None, we try model.data.frame.
- level (float) – Confidence level for intervals (default 0.95).
- use_t (bool) – If True and results.df_resid is available, use t-distribution.
- analytic (bool) – If True (default), use an analytic outer Jacobian \(\partial g/\partial\beta\) whenever the model exposes a link derivative (any GLM via family.link.inverse_deriv, plus OLS/WLS/GLS via the identity link). Falls back to central finite differences otherwise. Set to False to force FD everywhere.
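The two Jacobian routes described for analytic can be compared on a toy statistic. The following is a minimal NumPy-only sketch, not the library's code: the helper names (expit, avg_prediction, fd_jacobian, analytic_jacobian), the step size eps, and the coefficient values are all assumptions made for illustration. It takes \(g(\beta)\) to be the average predicted probability of a logit and checks that a central finite-difference Jacobian agrees with the chain-rule one.

```python
import numpy as np

def expit(z):
    """Inverse logit link."""
    return 1.0 / (1.0 + np.exp(-z))

def avg_prediction(beta, X):
    """g(beta): average predicted probability over the rows of X (length-1 array)."""
    return np.array([expit(X @ beta).mean()])

def fd_jacobian(g, beta, eps=1e-6):
    """Central finite-difference Jacobian of g at beta, shape (len(g), len(beta))."""
    beta = np.asarray(beta, dtype=float)
    cols = []
    for j in range(beta.size):
        step = np.zeros_like(beta)
        step[j] = eps
        cols.append((g(beta + step) - g(beta - step)) / (2.0 * eps))
    return np.column_stack(cols)

def analytic_jacobian(beta, X):
    """Chain rule: d mean(expit(X b)) / d b = mean(expit'(X b) * X, axis=0)."""
    p = expit(X @ beta)
    return ((p * (1.0 - p))[:, None] * X).mean(axis=0)[None, :]

# Hypothetical design matrix and coefficients, for illustration only.
rng = np.random.default_rng(0)
X = np.column_stack([np.ones(50), rng.standard_normal((50, 2))])
beta = np.array([0.3, -1.0, 0.5])

J_fd = fd_jacobian(lambda b: avg_prediction(b, X), beta)
J_an = analytic_jacobian(beta, X)
print(np.max(np.abs(J_fd - J_an)))  # difference between the two Jacobians
```

The finite-difference route needs only a callable for the statistic, which is why it works as a fallback for any model; the analytic route needs the derivative of the inverse link.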
Notes
Formula vs. Raw Exog Mode
If the model was fit using a formula (e.g. smf.ols("y ~ x1 + x2", data)), Margins uses the model’s DesignInfo to rebuild the design matrix. This ensures that interactions (x1:x2) and transformations (log(x1)) are correctly updated when a single variable is perturbed for marginal effects.
If the model was fit using raw matrices (e.g. sm.OLS(y, X).fit()), Margins operates in “raw mode”. In this mode, only the literal column corresponding to the variable is perturbed. Interactions or pre-computed transformations in the design matrix will not be automatically updated. To correctly handle such models, it is recommended to fit using formulas.
Delta method
For a statistic \(g(\beta)\) with Jacobian \(G = \partial g / \partial \beta|_{\hat\beta}\), the delta method approximates
\[\widehat{\mathrm{Var}}[g(\hat\beta)] \approx G \, \widehat V \, G^\top ,\]
where \(\widehat V\) is the estimated parameter covariance returned by results.cov_params().
Examples
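As a worked instance of the delta method (a standard textbook case, given here for illustration rather than as a description of the library's internals), take the average adjusted prediction of a logit with inverse link \(\Lambda(z) = 1/(1+e^{-z})\) and design rows \(x_i\), \(i = 1, \dots, n\):
\[g(\beta) = \frac{1}{n}\sum_{i=1}^{n} \Lambda(x_i^\top \beta), \qquad G = \frac{1}{n}\sum_{i=1}^{n} \Lambda(x_i^\top \hat\beta)\,\bigl[1 - \Lambda(x_i^\top \hat\beta)\bigr]\, x_i^\top .\]
Here \(G\) is a \(1 \times p\) row vector, so \(G \, \widehat V \, G^\top\) is a scalar. Its square root is the standard error, and a normal-based interval at level \(1 - \alpha\) is \(g(\hat\beta) \pm z_{1-\alpha/2}\sqrt{G \, \widehat V \, G^\top}\).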
A logit with a categorical and an interaction, which would be awkward to differentiate by hand:
    import statsmodels.formula.api as smf
    from smmargins import Margins

    fit = smf.logit(
        "voted ~ age + income + C(educ) + female + age:female",
        data=df,
    ).fit()
    M = Margins(fit)
Adjusted predictions:
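    M.predict()                              # AAP: average adjusted prediction
    M.predict(at="mean")                     # APM: adjusted prediction at the means
    M.predict(atexog={"age": [25, 45, 65]})  # APR: adjusted predictions at representative values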
Marginal effects on the response (probability) scale:
    M.dydx("age")                             # AME: average marginal effect
    M.dydx("age", at="mean")                  # MEM: marginal effect at the means
    M.dydx("age", atexog={"female": [0, 1]})  # MER: marginal effects at representative values, by sex
    M.dydx("educ", reference="college")       # discrete contrasts
    M.dydx("kids", count=True)                # x -> x+1 for integers
    M.dydx("age", method="eyex")              # full elasticity
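The quantities named in the comments above follow standard definitions, which can be sketched in plain NumPy. This is an illustration only, not how Margins computes them: the data, the coefficient vector beta, and all variable names are hypothetical, and the coefficients are fixed rather than estimated.

```python
import numpy as np

def expit(z):
    """Inverse logit link."""
    return 1.0 / (1.0 + np.exp(-z))

# Hypothetical data and logit coefficients (intercept, age, kids).
rng = np.random.default_rng(1)
n = 500
age = rng.uniform(20, 70, n)
kids = rng.integers(0, 4, n).astype(float)
X = np.column_stack([np.ones(n), age, kids])
beta = np.array([-2.0, 0.04, -0.3])

p = expit(X @ beta)

# AME: derivative of the predicted probability w.r.t. age, averaged over observations.
ame_age = np.mean(p * (1.0 - p) * beta[1])

# MEM: the same derivative evaluated once, at the column means of X.
p_bar = expit(X.mean(axis=0) @ beta)
mem_age = p_bar * (1.0 - p_bar) * beta[1]

# Count-style discrete change: move kids to kids + 1 and average the change in probability.
X_plus = X.copy()
X_plus[:, 2] += 1.0
count_kids = np.mean(expit(X_plus @ beta) - p)

print(ame_age, mem_age, count_kids)
```

AME and MEM generally differ for nonlinear models because averaging and applying the inverse link do not commute.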
Most calls return a MarginsResult whose __repr__ prints a tidy table of estimates, SEs, z- (or t-) statistics, p-values, and confidence intervals.
A small runnable smoke test:
>>> import numpy as np, pandas as pd, statsmodels.formula.api as smf
>>> from smmargins import Margins
>>> rng = np.random.default_rng(0)
>>> df = pd.DataFrame({
...     "x": rng.standard_normal(200),
...     "g": rng.choice(["A", "B"], 200),
... })
>>> df["y"] = 1.0 + 2.0 * df["x"] + (df["g"] == "B") + rng.standard_normal(200)
>>> fit = smf.ols("y ~ x + C(g)", df).fit()
>>> M = Margins(fit)
>>> aap = M.predict()
>>> ame = M.dydx("x")
>>> aap.estimate.shape, ame.estimate.shape
((1,), (1,))
>>> bool(abs(ame.estimate[0] - 2.0) < 0.2)  # close to truth
True
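The caveat in the “Formula vs. Raw Exog Mode” note can be seen without the library. The sketch below is an assumption-laden illustration: a linear model with a pre-computed x1*x2 column, hypothetical coefficients, and a hand-written design() helper standing in for a formula-driven rebuild. Perturbing only the literal x1 column misses the interaction, while rebuilding the design matrix picks it up.

```python
import numpy as np

rng = np.random.default_rng(2)
x1 = rng.standard_normal(100)
x2 = rng.standard_normal(100)
beta = np.array([1.0, 2.0, -0.5, 0.8])  # hypothetical: intercept, x1, x2, x1:x2

def design(x1, x2):
    """Rebuild the full design matrix, including the x1*x2 interaction column."""
    return np.column_stack([np.ones_like(x1), x1, x2, x1 * x2])

eps = 1e-5
X = design(x1, x2)

# "Raw mode" analogue: perturb only the literal x1 column; the x1*x2 column stays stale.
X_hi, X_lo = X.copy(), X.copy()
X_hi[:, 1] += eps
X_lo[:, 1] -= eps
raw_effect = np.mean((X_hi @ beta - X_lo @ beta) / (2.0 * eps))

# Formula-style analogue: perturb x1 and rebuild every column that depends on it.
full_effect = np.mean(
    (design(x1 + eps, x2) @ beta - design(x1 - eps, x2) @ beta) / (2.0 * eps)
)

print(raw_effect)   # approximately beta[1]
print(full_effect)  # approximately beta[1] + beta[3] * mean(x2)
```

The gap between the two numbers is the interaction coefficient times the mean of x2, which is the part a raw-matrix perturbation cannot see.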
References
- [1] StataCorp. FAQ: How are the standard errors computed with margins. https://www.stata.com/support/faqs/statistics/compute-standard-errors-with-margins/
- [2] Williams, R. (2012). Using the margins command to estimate and interpret adjusted predictions and marginal effects. Stata Journal, 12(2), 308–331. https://www3.nd.edu/~rwilliam/stats/Margins01.pdf
- [3] Ai, C., & Norton, E. C. (2003). Interaction terms in logit and probit models. Economics Letters, 80(1), 123–129. https://doi.org/10.1016/S0165-1765(03)00032-6
__init__(results, data: DataFrame | None = None, level: float = 0.95, use_t: bool = False, analytic: bool = True)
Methods
- __init__(results[, data, level, use_t, analytic])
- did(group, condition[, group_levels, ...]) – Difference-in-differences on the response scale.
- dydx(variable[, at, atexog, discrete, ...]) – Marginal effect of variable on the response.
- predict([at, atexog, factor_stat, outcome]) – Compute adjusted predictions (expected outcome on the response scale).
Attributes
- n_outcomes – Number of outcome classes (K) for the fitted model.
- outcome_labels – Outcome class labels for multi-outcome models, or None.