Tutorial 4: Inference and Standard Errors¶
The marginal effects we computed in previous tutorials came with standard errors and confidence intervals. See Mathematical motivation for details on how standard errors are computed. This tutorial shows alternative approaches: robust standard errors, the Krinsky-Robb simulation method, bootstrap resampling, and adjustments for multiple comparisons.
What you will learn¶
How to use robust (HC3) standard errors
How to use Krinsky-Robb simulation for inference
How to use bootstrap resampling for inference
How to adjust confidence intervals for multiple comparisons (Bonferroni, Sidak, sup-t)
Setup¶
We continue with the same fitted model and Margins object:
import numpy as np
import pandas as pd
import statsmodels.formula.api as smf
from smmargins import Margins
rng = np.random.default_rng(7)
N = 5_000
df = pd.DataFrame({
"age": rng.normal(45, 12, N).clip(18, 90),
"income": rng.lognormal(10.5, 0.4, N),
"educ": rng.choice(["hs", "college", "grad"], N, p=[0.4, 0.4, 0.2]),
"female": rng.integers(0, 2, N),
})
eta = (-4.0 + 0.05 * df["age"] + 0.00001 * df["income"]
+ 0.8 * (df["educ"] == "college") + 1.4 * (df["educ"] == "grad")
+ 0.3 * df["female"] - 0.0004 * df["age"] * df["female"])
df["voted"] = (rng.uniform(0, 1, N) < 1 / (1 + np.exp(-eta))).astype(int)
fit = smf.logit("voted ~ age + income + C(educ) + female + age:female", data=df).fit(disp=False)
M = Margins(fit)
Robust standard errors (HC3)¶
By default, smmargins uses the variance-covariance matrix from the fitted model. If you want robust standard errors, you can pass cov_type="HC3" when creating the Margins object. HC3 is a heteroskedasticity-consistent estimator that is widely recommended for nonlinear models.
M_robust = Margins(fit, cov_type="HC3")
M_robust.dydx("age").summary()
| dy/dx | std err | z | P>|z| | [95% Conf. | Interval] | |
|---|---|---|---|---|---|---|
| dage | 0.010118 | 0.000499 | 20.275469 | 2.117617e-91 | 0.00914 | 0.011096 |
Compare this to the default standard error from Tutorial 3 (0.0005). The HC3 standard error (0.0006) is slightly larger, reflecting additional uncertainty from heteroskedasticity.
You can also pass cov_type="HC0", cov_type="HC1", or cov_type="HC2" for alternative robust estimators.
Krinsky-Robb simulation¶
The Krinsky-Robb (KR) method draws random parameter vectors from the estimated sampling distribution, recomputes the quantity of interest for each draw, and uses the distribution of those recomputed values as the basis for inference. This does not rely on the delta method.
To use KR simulation, pass vce="simulation" along with the number of simulations and an optional seed:
M.dydx("age", vce="simulation", n_sims=2000, sim_seed=42).summary()
| dy/dx | std err | z | P>|z| | [95% Conf. | Interval] | |
|---|---|---|---|---|---|---|
| dage | 0.010118 | 0.000497 | 20.374085 | 2.839719e-92 | 0.009108 | 0.011088 |
The n_sims parameter controls the number of simulated parameter draws. More simulations yield more stable standard errors but take longer to compute. The sim_seed ensures reproducibility.
You can use KR simulation with any marginal effect or prediction:
M.dydx("female", vce="simulation", n_sims=2000, sim_seed=42).summary()
| contrast | std err | z | P>|z| | [95% Conf. | Interval] | |
|---|---|---|---|---|---|---|
| female: 1 vs 0 | 0.062486 | 0.012995 | 4.808567 | 0.000002 | 0.036856 | 0.088527 |
Bootstrap resampling¶
Bootstrap inference resamples the data with replacement, re-fits the model, and recomputes the quantity of interest for each resample. The variation across bootstrap replicates provides the standard error.
To use bootstrap, pass vce="bootstrap":
M.dydx("age", vce="bootstrap", n_boot=500, boot_seed=42).summary()
| dy/dx | std err | z | P>|z| | [95% Conf. | Interval] | |
|---|---|---|---|---|---|---|
| dage | 0.010118 | 0.000505 | 20.023489 | 3.437809e-89 | 0.00919 | 0.011156 |
Here n_boot=500 specifies 500 bootstrap replicates. The boot_seed ensures reproducibility. Bootstrap is more computationally intensive than KR simulation because it requires re-fitting the model for each resample, but it makes fewer assumptions about the sampling distribution.
Multiple comparison adjustments¶
When you compute predictions at multiple points, you are making multiple simultaneous inferences. By default, confidence intervals are pointwise (each interval has its own coverage probability). You may want simultaneous coverage: the probability that all intervals contain their true values.
smmargins offers three adjustment methods. Let us first get predictions at several ages:
pred_grid = M.predict(atexog={"age": [25, 45, 65]})
pred_grid.summary()
| prediction | std err | z | P>|z| | [95% Conf. | Interval] | |
|---|---|---|---|---|---|---|
| age=25 | 0.176056 | 0.009420 | 18.690339 | 5.933527e-78 | 0.157594 | 0.194518 |
| age=45 | 0.354894 | 0.006769 | 52.430979 | 0.000000e+00 | 0.341627 | 0.368160 |
| age=65 | 0.583282 | 0.013865 | 42.069884 | 0.000000e+00 | 0.556107 | 0.610456 |
Bonferroni adjustment¶
The Bonferroni method multiplies each p-value by the number of comparisons and widens the confidence intervals accordingly:
M.predict(
atexog={"age": [25, 45, 65]},
vce="simulation",
n_sims=4000,
ci_method="bonferroni"
).summary()
| prediction | std err | z | P>|z| | [95% Conf. | Interval] | |
|---|---|---|---|---|---|---|
| age=25 | 0.176056 | 0.009514 | 18.505150 | 1.876542e-76 | 0.153280 | 0.198832 |
| age=45 | 0.354894 | 0.006703 | 52.948584 | 0.000000e+00 | 0.338848 | 0.370940 |
| age=65 | 0.583282 | 0.013668 | 42.673914 | 0.000000e+00 | 0.550560 | 0.616003 |
Sidak adjustment¶
The Sidak method is slightly less conservative than Bonferroni:
M.predict(
atexog={"age": [25, 45, 65]},
vce="simulation",
n_sims=4000,
ci_method="sidak"
).summary()
| prediction | std err | z | P>|z| | [95% Conf. | Interval] | |
|---|---|---|---|---|---|---|
| age=25 | 0.176056 | 0.009445 | 18.639602 | 1.533780e-77 | 0.153503 | 0.198609 |
| age=45 | 0.354894 | 0.006863 | 51.708469 | 0.000000e+00 | 0.338506 | 0.371282 |
| age=65 | 0.583282 | 0.013896 | 41.973358 | 0.000000e+00 | 0.550100 | 0.616463 |
Sup-t adjustment¶
The sup-t (supremum t) method accounts for the correlation between estimates when constructing simultaneous bands. It typically produces tighter intervals than Bonferroni while maintaining correct simultaneous coverage:
M.predict(
atexog={"age": [25, 45, 65]},
vce="simulation",
n_sims=4000,
ci_method="sup-t"
).summary()
| prediction | std err | z | P>|z| | [95% Conf. | Interval] | |
|---|---|---|---|---|---|---|
| age=25 | 0.176056 | 0.009500 | 18.531380 | 1.152913e-76 | 0.154020 | 0.198092 |
| age=45 | 0.354894 | 0.006691 | 53.043957 | 0.000000e+00 | 0.339375 | 0.370412 |
| age=65 | 0.583282 | 0.013888 | 41.997637 | 0.000000e+00 | 0.551068 | 0.615495 |
Compare the three methods. All three widen the intervals relative to the unadjusted case, but Bonferroni is the most conservative (widest), sup-t is the least conservative (narrowest), and Sidak falls in between.
Summary of inference options¶
Method |
When to use |
|---|---|
Default (delta method) |
Fast, works well for most cases |
|
You suspect heteroskedasticity |
|
You want to avoid delta-method approximations |
|
You want inference robust to distributional assumptions |
|
Conservative multiple comparisons |
|
Slightly less conservative than Bonferroni |
|
Correlated estimates, tighter simultaneous bands |
Recap¶
In this tutorial we covered four approaches to inference:
Robust standard errors via
cov_type="HC3"on theMarginsobjectKrinsky-Robb simulation via
vce="simulation"Bootstrap resampling via
vce="bootstrap"Multiple comparison adjustments via
ci_method(Bonferroni, Sidak, sup-t)
These options can be combined. For example, you can use KR simulation with Bonferroni adjustment, or bootstrap with Sidak adjustment.
Next steps¶
Apply inference methods to difference-in-differences in Tutorial 5: Difference-in-Differences
See the reference for inference options
Learn about robust and clustered standard errors in How-To: Robust and Clustered Standard Errors
Learn about KR simulation in depth in How-To: KR Simulation VCE
Learn about bootstrap inference in depth in How-To: Bootstrap VCE
Learn about simultaneous confidence intervals in How-To: Simultaneous Confidence Intervals
Compare inference methods in Explanation: Inference Comparison