Tutorial 4: Inference and Standard Errors

The marginal effects we computed in previous tutorials came with standard errors and confidence intervals. See Mathematical motivation for details on how standard errors are computed. This tutorial shows alternative approaches: robust standard errors, the Krinsky-Robb simulation method, bootstrap resampling, and adjustments for multiple comparisons.

What you will learn

  • How to use robust (HC3) standard errors

  • How to use Krinsky-Robb simulation for inference

  • How to use bootstrap resampling for inference

  • How to adjust confidence intervals for multiple comparisons (Bonferroni, Sidak, sup-t)

Setup

We continue with the same fitted model and Margins object:

import numpy as np
import pandas as pd
import statsmodels.formula.api as smf
from smmargins import Margins

rng = np.random.default_rng(7)
N = 5_000
df = pd.DataFrame({
    "age":    rng.normal(45, 12, N).clip(18, 90),
    "income": rng.lognormal(10.5, 0.4, N),
    "educ":   rng.choice(["hs", "college", "grad"], N, p=[0.4, 0.4, 0.2]),
    "female": rng.integers(0, 2, N),
})
eta = (-4.0 + 0.05 * df["age"] + 0.00001 * df["income"]
       + 0.8 * (df["educ"] == "college") + 1.4 * (df["educ"] == "grad")
       + 0.3 * df["female"] - 0.0004 * df["age"] * df["female"])
df["voted"] = (rng.uniform(0, 1, N) < 1 / (1 + np.exp(-eta))).astype(int)

fit = smf.logit("voted ~ age + income + C(educ) + female + age:female", data=df).fit(disp=False)
M = Margins(fit)

Robust standard errors (HC3)

By default, smmargins uses the variance-covariance matrix from the fitted model. If you want robust standard errors, you can pass cov_type="HC3" when creating the Margins object. HC3 is a heteroskedasticity-consistent estimator that is widely recommended for nonlinear models.

M_robust = Margins(fit, cov_type="HC3")
M_robust.dydx("age").summary()
dy/dx std err z P>|z| [95% Conf. Interval]
dage 0.010118 0.000499 20.275469 2.117617e-91 0.00914 0.011096

Compare this to the default standard error from Tutorial 3 (0.0005). The HC3 standard error (0.0006) is slightly larger, reflecting additional uncertainty from heteroskedasticity.

You can also pass cov_type="HC0", cov_type="HC1", or cov_type="HC2" for alternative robust estimators.

Krinsky-Robb simulation

The Krinsky-Robb (KR) method draws random parameter vectors from the estimated sampling distribution, recomputes the quantity of interest for each draw, and uses the distribution of those recomputed values as the basis for inference. This does not rely on the delta method.

To use KR simulation, pass vce="simulation" along with the number of simulations and an optional seed:

M.dydx("age", vce="simulation", n_sims=2000, sim_seed=42).summary()
dy/dx std err z P>|z| [95% Conf. Interval]
dage 0.010118 0.000497 20.374085 2.839719e-92 0.009108 0.011088

The n_sims parameter controls the number of simulated parameter draws. More simulations yield more stable standard errors but take longer to compute. The sim_seed ensures reproducibility.

You can use KR simulation with any marginal effect or prediction:

M.dydx("female", vce="simulation", n_sims=2000, sim_seed=42).summary()
contrast std err z P>|z| [95% Conf. Interval]
female: 1 vs 0 0.062486 0.012995 4.808567 0.000002 0.036856 0.088527

Bootstrap resampling

Bootstrap inference resamples the data with replacement, re-fits the model, and recomputes the quantity of interest for each resample. The variation across bootstrap replicates provides the standard error.

To use bootstrap, pass vce="bootstrap":

M.dydx("age", vce="bootstrap", n_boot=500, boot_seed=42).summary()
dy/dx std err z P>|z| [95% Conf. Interval]
dage 0.010118 0.000505 20.023489 3.437809e-89 0.00919 0.011156

Here n_boot=500 specifies 500 bootstrap replicates. The boot_seed ensures reproducibility. Bootstrap is more computationally intensive than KR simulation because it requires re-fitting the model for each resample, but it makes fewer assumptions about the sampling distribution.

Multiple comparison adjustments

When you compute predictions at multiple points, you are making multiple simultaneous inferences. By default, confidence intervals are pointwise (each interval has its own coverage probability). You may want simultaneous coverage: the probability that all intervals contain their true values.

smmargins offers three adjustment methods. Let us first get predictions at several ages:

pred_grid = M.predict(atexog={"age": [25, 45, 65]})
pred_grid.summary()
prediction std err z P>|z| [95% Conf. Interval]
age=25 0.176056 0.009420 18.690339 5.933527e-78 0.157594 0.194518
age=45 0.354894 0.006769 52.430979 0.000000e+00 0.341627 0.368160
age=65 0.583282 0.013865 42.069884 0.000000e+00 0.556107 0.610456

Bonferroni adjustment

The Bonferroni method multiplies each p-value by the number of comparisons and widens the confidence intervals accordingly:

M.predict(
    atexog={"age": [25, 45, 65]},
    vce="simulation",
    n_sims=4000,
    ci_method="bonferroni"
).summary()
prediction std err z P>|z| [95% Conf. Interval]
age=25 0.176056 0.009514 18.505150 1.876542e-76 0.153280 0.198832
age=45 0.354894 0.006703 52.948584 0.000000e+00 0.338848 0.370940
age=65 0.583282 0.013668 42.673914 0.000000e+00 0.550560 0.616003

Sidak adjustment

The Sidak method is slightly less conservative than Bonferroni:

M.predict(
    atexog={"age": [25, 45, 65]},
    vce="simulation",
    n_sims=4000,
    ci_method="sidak"
).summary()
prediction std err z P>|z| [95% Conf. Interval]
age=25 0.176056 0.009445 18.639602 1.533780e-77 0.153503 0.198609
age=45 0.354894 0.006863 51.708469 0.000000e+00 0.338506 0.371282
age=65 0.583282 0.013896 41.973358 0.000000e+00 0.550100 0.616463

Sup-t adjustment

The sup-t (supremum t) method accounts for the correlation between estimates when constructing simultaneous bands. It typically produces tighter intervals than Bonferroni while maintaining correct simultaneous coverage:

M.predict(
    atexog={"age": [25, 45, 65]},
    vce="simulation",
    n_sims=4000,
    ci_method="sup-t"
).summary()
prediction std err z P>|z| [95% Conf. Interval]
age=25 0.176056 0.009500 18.531380 1.152913e-76 0.154020 0.198092
age=45 0.354894 0.006691 53.043957 0.000000e+00 0.339375 0.370412
age=65 0.583282 0.013888 41.997637 0.000000e+00 0.551068 0.615495

Compare the three methods. All three widen the intervals relative to the unadjusted case, but Bonferroni is the most conservative (widest), sup-t is the least conservative (narrowest), and Sidak falls in between.

Summary of inference options

Method

When to use

Default (delta method)

Fast, works well for most cases

cov_type="HC3"

You suspect heteroskedasticity

vce="simulation"

You want to avoid delta-method approximations

vce="bootstrap"

You want inference robust to distributional assumptions

ci_method="bonferroni"

Conservative multiple comparisons

ci_method="sidak"

Slightly less conservative than Bonferroni

ci_method="sup-t"

Correlated estimates, tighter simultaneous bands

Recap

In this tutorial we covered four approaches to inference:

  1. Robust standard errors via cov_type="HC3" on the Margins object

  2. Krinsky-Robb simulation via vce="simulation"

  3. Bootstrap resampling via vce="bootstrap"

  4. Multiple comparison adjustments via ci_method (Bonferroni, Sidak, sup-t)

These options can be combined. For example, you can use KR simulation with Bonferroni adjustment, or bootstrap with Sidak adjustment.

Next steps