Demos¶
Two end-to-end walkthroughs ship in the repository root. Each one is a
single self-contained script you can run with
python demo_<name>.py after installing smmargins.
Williams-style logit walkthrough¶
demo_margins.py reproduces, on a simulated voting dataset, every
core statistic in Richard Williams’ Margins01 notes and then
exercises the 0.3 inference surface end-to-end:
Adjusted predictions at specific values (APR /
margins, at(...))APM vs AAP (
margins, atmeansvsmargins)MER vs MEM vs AME for a continuous covariate
Discrete contrasts for a multi-level categorical variable
Discrete change for a 0/1 dummy
Williams’ classic interaction example: AME of
ageby sexPredicted probability over age, by sex (table for plotting)
Analytic vs FD parity check
Robust covariance via
cov_type="HC3"Krinsky–Robb simulation VCE
Pairs bootstrap VCE
Simultaneous CIs via sup-t
Cluster-robust SEs (
cov_type="cluster"withcov_kwds=)Multiple-comparison adjustments side-by-side (Bonferroni / Šidák / sup-t)
User-supplied parameter covariance (
vcov=)
Highlights from the script¶
Fit a logit with an interaction:
fit = smf.logit(
"voted ~ age + income + C(educ) + female + age:female",
data=df,
).fit()
M = Margins(fit)
APR — predictions at policy-relevant ages, averaging everything else over the sample:
M.predict(atexog={"age": [25, 45, 65]})
MER, MEM, and AME for age — these can differ meaningfully in
nonlinear models with interactions:
M.dydx("age", atexog={"age": [25, 45, 65]}) # MER
M.dydx("age", at="mean") # MEM
M.dydx("age") # AME
Discrete AME for a multi-level factor with an explicit reference level:
M.dydx("educ", reference="college")
Williams’ interaction lesson — same model, AME of age for each
sex:
M.dydx("age", atexog={"female": [0, 1]})
Robust SEs and alternative VCEs (sections 9–11):
Margins(fit, cov_type="HC3").dydx("age") # heteroskedastic-robust
M.dydx("age", vce="simulation",
n_sims=2000, sim_seed=42) # Krinsky–Robb
M.dydx("age", vce="bootstrap",
n_boot=500, boot_seed=42) # pairs bootstrap
Cluster-robust SEs through cov_type="cluster" with cluster IDs
passed in cov_kwds (section 13):
Margins(fit, cov_type="cluster",
cov_kwds={"groups": df["household"]}).dydx("age")
Family-wise CI methods side-by-side at five ages (section 14) — for a correlated family of predictions, sup-t is typically narrower than Bonferroni / Šidák:
common = dict(atexog={"age": [25, 35, 45, 55, 65]},
vce="simulation", n_sims=4000, sim_seed=123)
M.predict(**common, ci_method="pointwise")
M.predict(**common, ci_method="bonferroni")
M.predict(**common, ci_method="sidak")
M.predict(**common, ci_method="sup-t")
User-supplied parameter covariance (section 15) — drop in any
\((k, k)\) matrix and smmargins sandwiches it through the
Jacobian:
Margins(fit, vcov=my_vcov_matrix).dydx("age")
Full source¶
1"""
2demo_margins.py
3===============
4
5Walkthrough of the core analyses in Richard Williams' *Margins01* notes
6(https://academicweb.nd.edu/~rwilliam/stats/Margins01.pdf), implemented
7on top of StatsModels + patsy + the ``smmargins`` package.
8
9Sections
10--------
11 1. Adjusted predictions at specific values (APR / ``margins, at(...)``)
12 2. APM vs AAP (``margins, atmeans`` vs ``margins``)
13 3. MER vs MEM vs AME for a continuous covariate
14 4. Discrete contrast for a categorical variable
15 5. Discrete change for a 0/1 dummy
16 6. AME by interaction subgroup (Williams' motivating example)
17 7. Predicted probability over age, by sex (table for plotting)
18 8. Analytic vs FD parity check
19 9. Robust covariance (``cov_type="HC3"``)
20 10. Krinsky–Robb simulation VCE
21 11. Pairs bootstrap VCE
22 12. Simultaneous CIs via sup-t
23 13. Cluster-robust SEs (``cov_type="cluster"``)
24 14. Multiple-comparison adjustments side-by-side (Bonferroni / Sidak / sup-t)
25 15. User-supplied parameter covariance (``vcov=``)
26"""
27
28import numpy as np
29import pandas as pd
30import statsmodels.formula.api as smf
31
32from smmargins import Margins
33
34pd.options.display.width = 120
35pd.options.display.float_format = "{: .4f}".format
36
37# ---------------------------------------------------------------------------
38# Simulate a binary-outcome dataset with structure similar to Williams' notes
39# ---------------------------------------------------------------------------
40rng = np.random.default_rng(7)
41N = 5_000
42df = pd.DataFrame(
43 {
44 "age": rng.normal(45, 12, N).clip(18, 90),
45 "income": rng.lognormal(10.5, 0.4, N), # ~36k median
46 "educ": rng.choice(["hs", "college", "grad"], N, p=[0.4, 0.4, 0.2]),
47 "female": rng.integers(0, 2, N),
48 }
49)
50eta = (
51 -4.0
52 + 0.05 * df["age"]
53 + 0.00001 * df["income"]
54 + 0.8 * (df["educ"] == "college")
55 + 1.4 * (df["educ"] == "grad")
56 + 0.3 * df["female"]
57 - 0.0004 * df["age"] * (df["female"]) # interaction
58)
59df["voted"] = (rng.uniform(0, 1, N) < 1 / (1 + np.exp(-eta))).astype(int)
60
61print("Sample:")
62print(df.head(3), "\n")
63
64# ---------------------------------------------------------------------------
65# Fit a logit with an interaction, like the Williams example
66# ---------------------------------------------------------------------------
67fit = smf.logit(
68 "voted ~ age + income + C(educ) + female + age:female",
69 data=df,
70).fit(disp=False)
71print("=" * 80)
72print("Fitted logit")
73print("=" * 80)
74print(fit.summary().tables[1])
75print()
76
77# `analytic=True` is the default: the outer ∂g/∂β goes through
78# `family.link.inverse_deriv` for any GLM (Logit/Probit/Poisson/...) and
79# the identity link for OLS/WLS/GLS, falling back to central finite
80# differences only when the link derivative isn't available. Set
81# `analytic=False` to force FD; you'll get the same answers (see the
82# parity check at the bottom of this file) but pay p extra forward
83# predict() calls per statistic.
84M = Margins(fit)
85
86# ---------------------------------------------------------------------------
87# 1. Adjusted predictions at representative values (APR)
88# Stata: margins, at(age=(25 45 65))
89# ---------------------------------------------------------------------------
90print("=" * 80)
91print("1. APR (predict at age=25,45,65; everything else at sample values)")
92print("=" * 80)
93print(M.predict(atexog={"age": [25, 45, 65]}))
94print()
95
96# ---------------------------------------------------------------------------
97# 2. Adjusted prediction at means (APM) vs average adjusted prediction (AAP)
98# ---------------------------------------------------------------------------
99print("=" * 80)
100print("2. APM (margins, atmeans) vs AAP (margins)")
101print("=" * 80)
102print("APM:"); print(M.predict(at="mean"))
103print("\nAAP:"); print(M.predict())
104print()
105
106# ---------------------------------------------------------------------------
107# 3. Marginal effect: MER vs MEM vs AME for `age`
108# (Williams points out these three can differ meaningfully in nonlinear
109# models with interactions)
110# ---------------------------------------------------------------------------
111print("=" * 80)
112print("3. d Pr(voted)/d age : MER (at age=25,45,65), MEM, and AME")
113print("=" * 80)
114print("MER (at age=25,45,65):")
115print(M.dydx("age", atexog={"age": [25, 45, 65]}))
116print("\nMEM (at means of everything):")
117print(M.dydx("age", at="mean"))
118print("\nAME (averaged over the sample):")
119print(M.dydx("age"))
120print()
121
122# ---------------------------------------------------------------------------
123# 4. Discrete contrast for the categorical variable `educ`
124# ---------------------------------------------------------------------------
125print("=" * 80)
126print("4. Discrete AME for educ (each level vs 'college' as reference)")
127print("=" * 80)
128print(M.dydx("educ", reference="college"))
129print()
130
131# ---------------------------------------------------------------------------
132# 5. Discrete change for the dummy `female` (auto-detected as discrete)
133# ---------------------------------------------------------------------------
134print("=" * 80)
135print("5. AME for female (0/1 dummy): Pr(voted|female=1) - Pr(voted|female=0)")
136print("=" * 80)
137print(M.dydx("female"))
138print()
139
140# ---------------------------------------------------------------------------
141# 6. Interaction-sensitivity: marginal effect of age, separately for men/women
142# This is Williams' classic motivating example: the interaction coefficient
143# alone tells you little about what the marginal effect actually is for any
144# given subpopulation.
145# ---------------------------------------------------------------------------
146print("=" * 80)
147print("6. AME of age, separately by sex (Williams' interaction illustration)")
148print("=" * 80)
149print(M.dydx("age", atexog={"female": [0, 1]}))
150print()
151
152# ---------------------------------------------------------------------------
153# 7. Adjusted predictions, age by sex — table suitable for plotting
154# ---------------------------------------------------------------------------
155print("=" * 80)
156print("7. Predicted Pr(voted) over age, for each sex")
157print("=" * 80)
158tbl = M.predict(atexog={"age": list(range(20, 91, 10)), "female": [0, 1]})
159print(tbl)
160
161# ---------------------------------------------------------------------------
162# 8. Analytic vs FD: same answers, faster path
163# Logit exposes `family.link.inverse_deriv`, so the analytic outer
164# Jacobian is used by default. Toggling `analytic=False` reroutes
165# every statistic through central finite differences — useful as a
166# sanity check or when working with a custom Link subclass that
167# doesn't implement inverse_deriv.
168# ---------------------------------------------------------------------------
169print()
170print("=" * 80)
171print("8. Analytic vs FD — same numbers, taken via different paths")
172print("=" * 80)
173M_fd = Margins(fit, analytic=False)
174ame_an = M.dydx("age")
175ame_fd = M_fd.dydx("age")
176print(f"AME(age) analytic : est={ame_an.estimate[0]: .8f} se={ame_an.se[0]: .8f}")
177print(f"AME(age) FD : est={ame_fd.estimate[0]: .8f} se={ame_fd.se[0]: .8f}")
178print(f"max abs diff : "
179 f"est {abs(ame_an.estimate[0] - ame_fd.estimate[0]): .2e}, "
180 f"se {abs(ame_an.se[0] - ame_fd.se[0]): .2e}")
181
182# ---------------------------------------------------------------------------
183# 9. Robust covariance (Feature 1)
184# Recompute SEs with HC3 heteroskedasticity-consistent covariance.
185# ---------------------------------------------------------------------------
186print()
187print("=" * 80)
188print("9. Robust covariance — HC3")
189print("=" * 80)
190M_hc3 = Margins(fit, cov_type="HC3")
191print(M_hc3.dydx("age"))
192
193# ---------------------------------------------------------------------------
194# 10. Krinsky–Robb simulation VCE (Feature 2)
195# Draw parameters from their sampling distribution and evaluate margins.
196# ---------------------------------------------------------------------------
197print()
198print("=" * 80)
199print("10. Krinsky–Robb simulation VCE")
200print("=" * 80)
201print(M.dydx("age", vce="simulation", n_sims=2000, sim_seed=42))
202
203# ---------------------------------------------------------------------------
204# 11. Bootstrap VCE (Feature 3)
205# Pairs bootstrap with 500 replications.
206# ---------------------------------------------------------------------------
207print()
208print("=" * 80)
209print("11. Bootstrap VCE")
210print("=" * 80)
211print(M.dydx("age", vce="bootstrap", n_boot=500, boot_seed=42))
212
213# ---------------------------------------------------------------------------
214# 12. Simultaneous CIs — sup-t (Feature 4)
215# Use simulation draws to compute simultaneous CIs for a family of margins.
216# ---------------------------------------------------------------------------
217print()
218print("=" * 80)
219print("12. Simultaneous CIs (sup-t)")
220print("=" * 80)
221print(M.predict(atexog={"age": [25, 45, 65]},
222 vce="simulation", n_sims=2000, sim_seed=42,
223 ci_method="sup-t"))
224
225# ---------------------------------------------------------------------------
226# 13. Cluster-robust SEs
227# Synthesize a clustering structure (e.g., households of ~10 voters who
228# share unobserved local effects). Cluster-robust SEs propagate that
229# correlation through the Jacobian to the AME.
230# ---------------------------------------------------------------------------
231print()
232print("=" * 80)
233print("13. Cluster-robust SEs vs nonrobust (synthetic household clusters)")
234print("=" * 80)
235df_c = df.copy()
236df_c["household"] = rng.integers(0, N // 10, N) # ~10 obs per cluster
237fit_c = smf.logit(
238 "voted ~ age + income + C(educ) + female + age:female",
239 data=df_c,
240).fit(disp=False)
241M_nonrobust = Margins(fit_c)
242M_cluster = Margins(fit_c, cov_type="cluster",
243 cov_kwds={"groups": df_c["household"]})
244ame_nr = M_nonrobust.dydx("age").se[0]
245ame_cl = M_cluster.dydx("age").se[0]
246print(f"AME(age) SE — nonrobust : {ame_nr: .6f}")
247print(f"AME(age) SE — cluster : {ame_cl: .6f} (ratio {ame_cl / ame_nr:.2f}x)")
248
249# ---------------------------------------------------------------------------
250# 14. Multiple-comparison adjustments
251# A family of 5 marginal effects at different ages. Pointwise CIs
252# under-cover the joint event "all 5 contain the truth"; Bonferroni
253# and Sidak inflate the critical value uniformly; sup-t uses the
254# simulation draws to exploit correlation across the family.
255# ---------------------------------------------------------------------------
256print()
257print("=" * 80)
258print("14. Family-wise CI methods at age=25,35,45,55,65")
259print("=" * 80)
260ages = [25, 35, 45, 55, 65]
261common = dict(atexog={"age": ages}, vce="simulation",
262 n_sims=4000, sim_seed=123)
263pw = M.predict(**common, ci_method="pointwise")
264bonf = M.predict(**common, ci_method="bonferroni")
265sidk = M.predict(**common, ci_method="sidak")
266supt = M.predict(**common, ci_method="sup-t")
267
268widths = pd.DataFrame({
269 "age": ages,
270 "pointwise": pw.ci_upper - pw.ci_lower,
271 "bonferroni": bonf.ci_upper - bonf.ci_lower,
272 "sidak": sidk.ci_upper - sidk.ci_lower,
273 "sup-t": supt.ci_upper - supt.ci_lower,
274}).set_index("age")
275print("CI widths:")
276print(widths)
277print("\nBonferroni >= Sidak (always); for correlated margins sup-t is "
278 "typically narrower than both.")
279
280# ---------------------------------------------------------------------------
281# 15. User-supplied vcov
282# Drop in any (k, k) covariance matrix you trust — e.g. a sandwich
283# computed offline, a Bayesian posterior covariance, or the output
284# of a custom resampling scheme — and smmargins will sandwich it
285# through the Jacobian without recomputing anything else.
286# ---------------------------------------------------------------------------
287print()
288print("=" * 80)
289print("15. User-supplied parameter covariance (vcov=)")
290print("=" * 80)
291V_default = fit.cov_params().to_numpy()
292V_inflated = V_default * 1.5 # toy example: assume 50% wider sampling cov
293M_v = Margins(fit, vcov=V_inflated)
294ame_default = M.dydx("age").se[0]
295ame_user = M_v.dydx("age").se[0]
296print(f"AME(age) SE — default cov_params() : {ame_default: .6f}")
297print(f"AME(age) SE — vcov = 1.5 x default : {ame_user: .6f} "
298 f"(ratio {ame_user / ame_default:.3f}, expect ≈ sqrt(1.5)={np.sqrt(1.5):.3f})")
Healthcare-style 2x2 difference-in-differences¶
demo_did.py answers a clinical question:
Is there a rate difference of condition \(X\) between groups A and B, with or without a preexisting condition \(Y\)?
The script fits a logit on simulated patient data and reports, on the probability scale:
4 cell predictions \(P(X \mid \text{group}, Y)\)
2 simple effects \(P(X \mid B, Y) - P(X \mid A, Y)\) at each \(Y\)
1 difference-in-differences (whether the A-vs-B gap depends on \(Y\))
All with delta-method standard errors and confidence intervals. The
DiD here is not the coefficient on the group:Y interaction —
that coefficient is on the log-odds scale, while the clinical question
is about probabilities. This is Ai & Norton (2003) in practice; see
Mathematical motivation for the derivation.
Highlights from the script¶
Fit and call did():
fit = smf.logit(
"condition_X ~ C(group) + preexist_Y + C(group):preexist_Y "
"+ age + female",
data=df,
).fit()
M = Margins(fit)
did = M.did("group", "preexist_Y",
group_levels=["A", "B"],
condition_levels=[0, 1])
print(did) # cells + simple effects + DiD
Same DiD at one specific patient profile (60-year-old male):
M.did("group", "preexist_Y",
group_levels=["A", "B"], condition_levels=[0, 1],
atexog={"age": 60, "female": 0})
Plot-ready cell table:
tbl = did.cells.summary() # estimate / SE / CI per cell
Full source¶
1"""
2demo_did.py
3===========
4
5Difference-in-differences example directly matching the question:
6
7 "Is there a rate difference of condition X between group A and B,
8 with or without preexisting condition Y?"
9
10We fit a logit for P(X=1) on group (A/B), preexisting Y (0/1), their
11interaction, and control covariates. Then we use Margins.did() to get,
12on the *probability* scale:
13
14 * 4 cell predictions P(X | group, Y)
15 * 2 simple effects P(X|B,Y) - P(X|A,Y) at each Y
16 * 1 DiD (simple effect at Y=1) - (simple effect at Y=0)
17
18All with delta-method standard errors and CIs.
19
20The DiD here is the "does the A-vs-B gap depend on Y?" question. It is
21NOT the coefficient on group×Y (that's on the log-odds scale); on the
22probability scale you have to go through the inverse link — which is
23exactly what Margins.did() does.
24"""
25import numpy as np
26import pandas as pd
27import statsmodels.formula.api as smf
28
29from smmargins import Margins
30
31pd.options.display.width = 140
32pd.options.display.float_format = "{: .4f}".format
33
34# ---------------------------------------------------------------------------
35# Simulate patient-level data
36# ---------------------------------------------------------------------------
37rng = np.random.default_rng(42)
38N = 6_000
39df = pd.DataFrame({
40 "group": rng.choice(["A", "B"], N, p=[0.55, 0.45]),
41 "preexist_Y": rng.integers(0, 2, N), # 0 = no Y, 1 = has Y
42 "age": rng.normal(55, 15, N).clip(18, 95),
43 "female": rng.integers(0, 2, N),
44})
45
46# True data-generating process:
47# * baseline rate of X depends on age, sex, and Y
48# * group B has a modest additive bump in the log-odds
49# * the group effect is AMPLIFIED among patients with preexisting Y
50# (this is the thing we want to detect)
51eta = (
52 -3.5
53 + 0.04 * df["age"]
54 - 0.3 * df["female"]
55 + 0.5 * (df["group"] == "B")
56 + 1.1 * df["preexist_Y"]
57 + 0.8 * (df["group"] == "B") * df["preexist_Y"] # interaction
58)
59df["condition_X"] = (rng.uniform(0, 1, N) < 1 / (1 + np.exp(-eta))).astype(int)
60
61print("Raw sample rates of condition X by cell:")
62print(df.groupby(["group", "preexist_Y"])["condition_X"].mean().round(4))
63print()
64
65# ---------------------------------------------------------------------------
66# Fit the logit with the group × preexist_Y interaction + controls
67# ---------------------------------------------------------------------------
68fit = smf.logit(
69 "condition_X ~ C(group) + preexist_Y + C(group):preexist_Y + age + female",
70 data=df,
71).fit(disp=False)
72
73print("=" * 84)
74print("Logit model (coefficients are on the log-odds scale)")
75print("=" * 84)
76print(fit.summary().tables[1])
77print()
78
79# ---------------------------------------------------------------------------
80# DiD on the *probability* (response) scale — what the clinical question asks
81# ---------------------------------------------------------------------------
82# Margins(fit) uses analytic outer Jacobians via family.link.inverse_deriv
83# when available (Logit qualifies), falling back to central finite
84# differences otherwise. did() reuses predict()'s machinery, so it
85# inherits the analytic path automatically. Set Margins(fit,
86# analytic=False) to force FD if you ever want to cross-check.
87M = Margins(fit)
88did = M.did("group", "preexist_Y",
89 group_levels=["A", "B"],
90 condition_levels=[0, 1])
91
92print("=" * 84)
93print("DiD on the probability scale, averaged over age and sex distribution")
94print("=" * 84)
95print(did)
96
97# ---------------------------------------------------------------------------
98# Interpretation
99# ---------------------------------------------------------------------------
100pA0 = did.cells.estimate[0] # group=A, Y=0
101pA1 = did.cells.estimate[1] # group=A, Y=1
102pB0 = did.cells.estimate[2] # group=B, Y=0
103pB1 = did.cells.estimate[3] # group=B, Y=1
104se_simple_Y0 = did.simple_effects.se[0]
105se_simple_Y1 = did.simple_effects.se[1]
106did_est, did_se = did.did.estimate[0], did.did.se[0]
107
108print()
109print("=" * 84)
110print("Plain-language summary")
111print("=" * 84)
112print(f"Condition X rate, group A, no preexisting Y : {pA0:.3%}")
113print(f"Condition X rate, group A, with Y : {pA1:.3%}")
114print(f"Condition X rate, group B, no preexisting Y : {pB0:.3%}")
115print(f"Condition X rate, group B, with Y : {pB1:.3%}")
116print()
117print(f"Rate difference (B - A) among NO-Y patients : "
118 f"{(pB0 - pA0):+.3%} (SE {se_simple_Y0:.3%})")
119print(f"Rate difference (B - A) among WITH-Y patients: "
120 f"{(pB1 - pA1):+.3%} (SE {se_simple_Y1:.3%})")
121print()
122print(f"Difference-in-differences : "
123 f"{did_est:+.3%} (SE {did_se:.3%})")
124print(f" -> the B-vs-A gap is {abs(did_est):.3%} larger among patients "
125 f"with preexisting Y.")
126print(f" -> 95% CI: ({did.did.ci_lower[0]:+.3%}, {did.did.ci_upper[0]:+.3%})")
127print(f" -> p-value: {did.did.pvalues[0]:.4g}")
128
129# ---------------------------------------------------------------------------
130# Sensitivity: DiD at a specific patient profile (e.g. 60-year-old male)
131# ---------------------------------------------------------------------------
132print()
133print("=" * 84)
134print("DiD at a specific profile: 60-year-old male")
135print("=" * 84)
136did_profile = M.did(
137 "group", "preexist_Y",
138 group_levels=["A", "B"], condition_levels=[0, 1],
139 atexog={"age": 60, "female": 0},
140)
141print(did_profile.did)
142
143# ---------------------------------------------------------------------------
144# Bonus: plottable table of cell predictions with CIs
145# ---------------------------------------------------------------------------
146print()
147print("=" * 84)
148print("Cells with 95% CIs (suitable for a plot)")
149print("=" * 84)
150tbl = did.cells.summary().copy()
151print(tbl)
152
153# If you wanted to plot:
154# import matplotlib.pyplot as plt
155# fig, ax = plt.subplots()
156# for g in ["A", "B"]:
157# sub = tbl[tbl.index.str.contains(f"group={g}")]
158# ax.errorbar([0, 1],
159# sub["prediction"].values,
160# yerr=(sub["prediction"] - sub["[95% CI lo]"]).values,
161# marker="o", label=f"group {g}", capsize=4)
162# ax.set_xticks([0, 1]); ax.set_xticklabels(["no Y", "with Y"])
163# ax.set_ylabel("P(condition X)"); ax.legend(); plt.show()