Tutorial 4: Inference and Standard Errors¶

The marginal effects we computed in previous tutorials came with standard errors and confidence intervals. See Mathematical motivation for details on how standard errors are computed. This tutorial shows alternative approaches: robust standard errors, the Krinsky-Robb simulation method, bootstrap resampling, and adjustments for multiple comparisons.

What you will learn¶

How to use robust (HC3) standard errors
How to use Krinsky-Robb simulation for inference
How to use bootstrap resampling for inference
How to adjust confidence intervals for multiple comparisons (Bonferroni, Sidak, sup-t)

Setup¶

We continue with the same fitted model and Margins object:

import numpy as np
import pandas as pd
import statsmodels.formula.api as smf
from smmargins import Margins

rng = np.random.default_rng(7)
N = 5_000
df = pd.DataFrame({
    "age":    rng.normal(45, 12, N).clip(18, 90),
    "income": rng.lognormal(10.5, 0.4, N),
    "educ":   rng.choice(["hs", "college", "grad"], N, p=[0.4, 0.4, 0.2]),
    "female": rng.integers(0, 2, N),
})
eta = (-4.0 + 0.05 * df["age"] + 0.00001 * df["income"]
       + 0.8 * (df["educ"] == "college") + 1.4 * (df["educ"] == "grad")
       + 0.3 * df["female"] - 0.0004 * df["age"] * df["female"])
df["voted"] = (rng.uniform(0, 1, N) < 1 / (1 + np.exp(-eta))).astype(int)

fit = smf.logit("voted ~ age + income + C(educ) + female + age:female", data=df).fit(disp=False)
M = Margins(fit)

Robust standard errors (HC3)¶

By default, smmargins uses the variance-covariance matrix from the fitted model. If you want robust standard errors, you can pass cov_type="HC3" when creating the Margins object. HC3 is a heteroskedasticity-consistent estimator that is widely recommended for nonlinear models.

M_robust = Margins(fit, cov_type="HC3")
M_robust.dydx("age").summary()

	dy/dx	std err	z	P>\|z\|	[95% Conf.	Interval]
dage	0.010118	0.000499	20.275469	2.117617e-91	0.00914	0.011096

Compare this to the default standard error from Tutorial 3 (0.0005). The HC3 standard error (0.0006) is slightly larger, reflecting additional uncertainty from heteroskedasticity.

You can also pass cov_type="HC0", cov_type="HC1", or cov_type="HC2" for alternative robust estimators.

Krinsky-Robb simulation¶

The Krinsky-Robb (KR) method draws random parameter vectors from the estimated sampling distribution, recomputes the quantity of interest for each draw, and uses the distribution of those recomputed values as the basis for inference. This does not rely on the delta method.

To use KR simulation, pass vce="simulation" along with the number of simulations and an optional seed:

M.dydx("age", vce="simulation", n_sims=2000, sim_seed=42).summary()

	dy/dx	std err	z	P>\|z\|	[95% Conf.	Interval]
dage	0.010118	0.000497	20.374085	2.839719e-92	0.009108	0.011088

The n_sims parameter controls the number of simulated parameter draws. More simulations yield more stable standard errors but take longer to compute. The sim_seed ensures reproducibility.

You can use KR simulation with any marginal effect or prediction:

M.dydx("female", vce="simulation", n_sims=2000, sim_seed=42).summary()

	contrast	std err	z	P>\|z\|	[95% Conf.	Interval]
female: 1 vs 0	0.062486	0.012995	4.808567	0.000002	0.036856	0.088527

Bootstrap resampling¶

Bootstrap inference resamples the data with replacement, re-fits the model, and recomputes the quantity of interest for each resample. The variation across bootstrap replicates provides the standard error.

To use bootstrap, pass vce="bootstrap":

M.dydx("age", vce="bootstrap", n_boot=500, boot_seed=42).summary()

	dy/dx	std err	z	P>\|z\|	[95% Conf.	Interval]
dage	0.010118	0.000505	20.023489	3.437809e-89	0.00919	0.011156

Here n_boot=500 specifies 500 bootstrap replicates. The boot_seed ensures reproducibility. Bootstrap is more computationally intensive than KR simulation because it requires re-fitting the model for each resample, but it makes fewer assumptions about the sampling distribution.

Multiple comparison adjustments¶

When you compute predictions at multiple points, you are making multiple simultaneous inferences. By default, confidence intervals are pointwise (each interval has its own coverage probability). You may want simultaneous coverage: the probability that all intervals contain their true values.

smmargins offers three adjustment methods. Let us first get predictions at several ages:

pred_grid = M.predict(atexog={"age": [25, 45, 65]})
pred_grid.summary()

	prediction	std err	z	P>\|z\|	[95% Conf.	Interval]
age=25	0.176056	0.009420	18.690339	5.933527e-78	0.157594	0.194518
age=45	0.354894	0.006769	52.430979	0.000000e+00	0.341627	0.368160
age=65	0.583282	0.013865	42.069884	0.000000e+00	0.556107	0.610456

Bonferroni adjustment¶

The Bonferroni method multiplies each p-value by the number of comparisons and widens the confidence intervals accordingly:

M.predict(
    atexog={"age": [25, 45, 65]},
    vce="simulation",
    n_sims=4000,
    ci_method="bonferroni"
).summary()

	prediction	std err	z	P>\|z\|	[95% Conf.	Interval]
age=25	0.176056	0.009514	18.505150	1.876542e-76	0.153280	0.198832
age=45	0.354894	0.006703	52.948584	0.000000e+00	0.338848	0.370940
age=65	0.583282	0.013668	42.673914	0.000000e+00	0.550560	0.616003

Sidak adjustment¶

The Sidak method is slightly less conservative than Bonferroni:

M.predict(
    atexog={"age": [25, 45, 65]},
    vce="simulation",
    n_sims=4000,
    ci_method="sidak"
).summary()

	prediction	std err	z	P>\|z\|	[95% Conf.	Interval]
age=25	0.176056	0.009445	18.639602	1.533780e-77	0.153503	0.198609
age=45	0.354894	0.006863	51.708469	0.000000e+00	0.338506	0.371282
age=65	0.583282	0.013896	41.973358	0.000000e+00	0.550100	0.616463

Sup-t adjustment¶

The sup-t (supremum t) method accounts for the correlation between estimates when constructing simultaneous bands. It typically produces tighter intervals than Bonferroni while maintaining correct simultaneous coverage:

M.predict(
    atexog={"age": [25, 45, 65]},
    vce="simulation",
    n_sims=4000,
    ci_method="sup-t"
).summary()

	prediction	std err	z	P>\|z\|	[95% Conf.	Interval]
age=25	0.176056	0.009500	18.531380	1.152913e-76	0.154020	0.198092
age=45	0.354894	0.006691	53.043957	0.000000e+00	0.339375	0.370412
age=65	0.583282	0.013888	41.997637	0.000000e+00	0.551068	0.615495

Compare the three methods. All three widen the intervals relative to the unadjusted case, but Bonferroni is the most conservative (widest), sup-t is the least conservative (narrowest), and Sidak falls in between.

Summary of inference options¶

Method	When to use
Default (delta method)	Fast, works well for most cases
`cov_type="HC3"`	You suspect heteroskedasticity
`vce="simulation"`	You want to avoid delta-method approximations
`vce="bootstrap"`	You want inference robust to distributional assumptions
`ci_method="bonferroni"`	Conservative multiple comparisons
`ci_method="sidak"`	Slightly less conservative than Bonferroni
`ci_method="sup-t"`	Correlated estimates, tighter simultaneous bands

Recap¶

In this tutorial we covered four approaches to inference:

Robust standard errors via cov_type="HC3" on the Margins object
Krinsky-Robb simulation via vce="simulation"
Bootstrap resampling via vce="bootstrap"
Multiple comparison adjustments via ci_method (Bonferroni, Sidak, sup-t)

These options can be combined. For example, you can use KR simulation with Bonferroni adjustment, or bootstrap with Sidak adjustment.

Next steps¶

Apply inference methods to difference-in-differences in Tutorial 5: Difference-in-Differences
See the reference for inference options
Learn about robust and clustered standard errors in How-To: Robust and Clustered Standard Errors
Learn about KR simulation in depth in How-To: KR Simulation VCE
Learn about bootstrap inference in depth in How-To: Bootstrap VCE
Learn about simultaneous confidence intervals in How-To: Simultaneous Confidence Intervals
Compare inference methods in Explanation: Inference Comparison