Introduction¶
smmargins is a small module that fills in the marginal-effects gaps
in StatsModels: adjusted predictions
and marginal effects at user-specified covariate profiles, with
delta-method standard errors, for any fitted model that exposes
params, cov_params(), and a predict(params, exog) method.
The design target is Stata’s margins command:
the same statistics, the same parameter names where they translate, and
the same answers to the precision both tools agree on.
Why another margins module?¶
StatsModels ships Results.get_margeff, but it is limited:
only marginal effects, not adjusted predictions;
atexogis keyed by column index, not variable name;no
at(...)profiles, no representative-value contrasts;no joint covariance across statistics, so you cannot form contrasts like a difference-in-differences without re-deriving the delta method by hand;
no support for difference-in-differences on the response scale (the Ai & Norton 2003 issue).
smmargins provides:
predict()— adjusted predictions (AAP / APM / APR), withat=and name-keyedatexog=;dydx()— marginal effects (AME / MEM / MER), continuous and discrete, including elasticities (eyex/dyex/eydx);did()— 2x2 difference-in-differences on the response scale, with the joint covariance baked in;contrast()— exact linear combinations of any result, reusing the joint covariance;Multi-outcome support for
MNLogitandOrderedModel.
Multi-outcome models¶
smmargins supports statsmodels.MNLogit (multinomial logit) and
statsmodels.miscmodels.ordinal_model.OrderedModel (ordered logit/probit).
For these models every statistic returns one value per outcome class — K
values in place of the usual scalar — with full joint covariance across
both rows and classes.
ame = M.dydx("x1") # AME of x1 on each class probability; K rows
ame.summary() # long-format DataFrame with `outcome` column
# Subset to specific outcomes
M.predict(outcome=1) # only class 1
M.predict(outcome="versicolor") # by label, if labeled
Difference-in-differences¶
Two small additions turn the module into a full DiD estimator:
contrast()forms any linear combination of the estimates directly on the already-computed joint covariance.did()sets up the 2x2 grid and returns aDiDResultbundling the four cell predictions, the two simple effects, and the DiD — all sharing the same joint covariance.
For multi-outcome models, did() returns a DiDResult where
every field carries the K-outcome axis. The DiD contains K estimates
whose sum is exactly zero.
Installation¶
pip install smmargins
Requires Python ≥3.9. Dependencies (numpy, pandas, statsmodels,
scipy, patsy) are installed automatically.
Quickstart¶
import statsmodels.formula.api as smf
from smmargins import Margins
fit = smf.logit(
"voted ~ age + income + C(educ) + female + age:female",
data=df,
).fit()
M = Margins(fit)
# Adjusted predictions
M.predict() # AAP
M.predict(at="mean") # APM (margins, atmeans)
M.predict(atexog={"age": [25, 45, 65]}) # APR
# Marginal effects on the response (probability) scale
M.dydx("age") # AME
M.dydx("age", at="mean") # MEM
M.dydx("age", atexog={"female": [0, 1]}) # MER, by sex
M.dydx("educ", reference="college") # discrete contrasts
# Difference-in-differences on the response scale
res = M.did("group", "preexist_Y",
group_levels=["A", "B"], condition_levels=[0, 1])
print(res) # cells, simple effects, DiD
Each call returns a MarginsResult with .estimate,
.se, .vcov, .ci_lower, .ci_upper, .pvalue, plus
.summary() returning a pandas.DataFrame. Pass use_t=True
to the Margins constructor for t-distribution
inference (uses results.df_resid).
Why patsy¶
When the formula is y ~ x1 + I(x1**2) + x1:x2 + C(group) and we
want the marginal effect of x1, we cannot just nudge one column of
the design matrix — x1 enters three columns. What we can nudge is
the x1 column of the original data frame, then ask patsy to
rebuild the design matrix using the stored DesignInfo:
patsy.dmatrix(design_info, perturbed_frame, return_type="matrix")
That preserves polynomial terms, interactions, splines (bs(x, df=4)),
and categorical contrasts automatically. It is also the right
abstraction for “hold age=45” or “set group='b'” — you mutate
the data frame, not the design matrix.
Formula vs. raw exog mode¶
Margins supports models fit without formulas
(sm.OLS(y, X).fit()). In this raw mode, variable names are taken
from model.exog_names.
Warning
In raw mode, Margins cannot know about relationships between
columns of the design matrix. If you manually included an
interaction column (e.g. X["x1_x2"] = X["x1"] * X["x2"]),
perturbing x1 for a marginal effect will not automatically
update x1_x2, and the marginal effect will be wrong.
If your model has interactions or transformations, fit it with a
formula so Margins can rebuild the design matrix correctly.
Where to next¶
Mathematical motivation — delta method, statistic schema, analytic vs FD Jacobian.
Demos — full Williams-style and DiD walkthroughs.
API reference — reference documentation for every public class and method.