How to verify smmargins results against R marginaleffects¶

Prerequisites¶

Tutorial: First steps with smmargins — basic model fitting and margins
R installed locally with the marginaleffects, readr, and sandwich packages

Problem statement¶

You want to confirm that smmargins produces results consistent with the de-facto cross-language reference, R’s marginaleffects package. You need to know which cases are covered, what tolerances are expected, and how to regenerate reference outputs.

Minimal working solution¶

The test suite in tests/comparison/ contains six parity cases. These tests auto-skip if the reference CSVs are missing, so the rest of the suite still runs without R.

import pathlib
import numpy as np
import pandas as pd
import statsmodels.formula.api as smf
import statsmodels.api as sm
from numpy.testing import assert_allclose

from smmargins import Margins

HERE = pathlib.Path("tests/comparison")

def _read_r_csv(path):
    return pd.read_csv(path)

def _match_label(ame_labels, r_term):
    labels = np.array(ame_labels)
    for prefix in ("d", "e"):
        stripped = np.array([lab[len(prefix):] if lab.startswith(prefix) else lab for lab in labels])
        mask = stripped == r_term
        if np.any(mask):
            return np.where(mask)[0][0]
    return None

# Load shared fixture data
df = pd.read_csv(HERE / "data" / "data_seed42.csv")

# Case 1: Logit AME with HC3
fit = smf.logit("y_logit ~ x1 + x2 + C(grp)", data=df).fit(disp=False)
M = Margins(fit, cov_type="HC3")
ame = M.dydx(["x1", "x2"])

ref = _read_r_csv(HERE / "r_logit_ame_hc3.csv")
for _, row in ref.iterrows():
    if row["term"] == "grp":
        continue
    idx = _match_label(ame.labels, row["term"])
    assert_allclose(ame.estimate[idx], row["estimate"], atol=1e-6)
    assert_allclose(ame.se[idx], row["std.error"], atol=5e-5)
print("Case 1 (Logit AME HC3): PASS")

Variations¶

Case 2: Poisson AME with HC3¶

df = pd.read_csv(HERE / "data" / "data_seed42.csv")
fit = smf.glm("count ~ x1 + x2", data=df, family=sm.families.Poisson()).fit()
M = Margins(fit, cov_type="HC3")
ame = M.dydx(["x1", "x2"])

ref = _read_r_csv(HERE / "r_poisson_ame.csv")
for _, row in ref.iterrows():
    idx = _match_label(ame.labels, row["term"])
    assert_allclose(ame.estimate[idx], row["estimate"], atol=1e-6)
    assert_allclose(ame.se[idx], row["std.error"], atol=5e-4)
print("Case 2 (Poisson AME HC3): PASS")

Case 3: OLS pairs bootstrap¶

df = pd.read_csv(HERE / "data" / "data_seed12345.csv")
fit = smf.ols("y_ols ~ x1 + x2", data=df).fit()
M = Margins(fit)
ame = M.dydx(["x1", "x2"], vce="bootstrap", n_boot=1000, boot_seed=42)

ref = _read_r_csv(HERE / "r_ols_bootstrap.csv")
for _, row in ref.iterrows():
    idx = _match_label(ame.labels, row["term"])
    assert_allclose(ame.estimate[idx], row["estimate"], atol=1e-10)
    r_se = row["std.error"] if pd.notna(row.get("std.error")) else (row["conf.high"] - row["conf.low"]) / (2 * 1.96)
    mc_tol = 5 * max(ame.se[idx], r_se) / np.sqrt(1000)
    assert abs(ame.se[idx] - r_se) < mc_tol
print("Case 3 (OLS bootstrap): PASS")

Regenerating reference outputs¶

cd tests/comparison && Rscript generate_r.R

Tolerance summary¶

Case	Estimate tol	SE tol
Logit AME with HC3	1e-6	5e-5
Poisson AME with HC3	1e-6	5e-4
OLS pairs bootstrap (n=1000)	1e-10	Monte Carlo (5 sigma)
OLS with HC1, polynomial + interaction	1e-6	1e-6
Logit cluster-robust AME	1e-5	1e-4
Logit AME at representative x2 values	1e-5	1e-4

⚠️ Trade-off: The R parity tests are the strongest validation of the package’s numerical correctness, but they require a working R installation. They auto-skip when reference files are missing, so CI pipelines without R still pass. The bootstrap SE tolerance is Monte Carlo — it scales with the bootstrap SE itself, not an absolute threshold.

When to use this¶

Run the parity test suite whenever you modify the core inference machinery (Jacobian computation, delta method, VCE workers) or upgrade statsmodels. It is also useful when porting analyses from R to Python — the same data and model specification should match to the tolerances above.

When NOT to use this¶

⚠️ Trade-off: Do not treat these tests as a substitute for understanding your model — they verify numerical parity, not substantive correctness. The factor-contrast handling differs between smmargins and R marginaleffects (see the grp skip in Case 1), so categorical variable comparisons may not align one-for-one.

See also¶

tests/comparison/generate_r.R — the R script that generates reference CSVs
tests/comparison/test_r.py — the full pytest parity suite
Mathematical motivation — the mathematical foundation being verified