How to choose between formula mode and raw exog mode¶
Prerequisites¶
Tutorial: First steps with smmargins — fitting a model and computing a basic AME
Explanation: Design matrix reconstruction — how patsy rebuilds the design matrix when variables are perturbed
Problem statement¶
You need to decide whether to fit your model with a formula (e.g., smf.logit("y ~ x1 + x2", data=df)) or with raw matrices (e.g., sm.Logit(y, X).fit()). The choice affects whether interactions, polynomials, and transformations are correctly propagated when Margins perturbs variables for marginal effects.
Minimal working solution¶
Formula mode (recommended)¶
import numpy as np
import pandas as pd
import statsmodels.formula.api as smf
from smmargins import Margins
rng = np.random.default_rng(7)
n = 500
df = pd.DataFrame({
"x1": rng.standard_normal(n),
"x2": rng.standard_normal(n),
"grp": rng.choice(["A", "B", "C"], n),
})
df["y"] = (0.5 + 1.0 * df.x1 - 0.5 * df.x2 + (df.grp == "B") + 0.3 * rng.standard_normal(n) > 0).astype(int)
# Formula mode: interactions and polynomials are handled correctly
fit_formula = smf.logit("y ~ x1 + I(x1**2) + x1:x2 + C(grp)", data=df).fit(disp=False)
M_formula = Margins(fit_formula)
# Perturbing x1 also rebuilds the derived I(x1**2) and x1:x2 columns
print(M_formula.dydx("x1"))
# grp enters the formula as a categorical via C(grp)
print(M_formula.dydx("grp"))
Raw mode (limited)¶
import statsmodels.api as sm
# Raw mode: manually build design matrix
X = sm.add_constant(df[["x1", "x2"]])
y = df["y"]
fit_raw = sm.Logit(y, X).fit(disp=False)
M_raw = Margins(fit_raw)
# Works for simple additive models
print(M_raw.dydx("x1"))
Variations¶
Raw mode with manually created interaction (incorrect)¶
# Raw mode with an interaction column — MARGINS WILL BE WRONG
X_bad = sm.add_constant(df[["x1", "x2"]].copy())
X_bad["x1_x2"] = X_bad["x1"] * X_bad["x2"] # manually created interaction
fit_bad = sm.Logit(y, X_bad).fit(disp=False)
M_bad = Margins(fit_bad)
# This perturbs only the "x1" column, NOT "x1_x2"
# The marginal effect of x1 will be incorrect
print(M_bad.dydx("x1")) # WARNING: interaction not updated!
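To see where the error comes from, here is a minimal sketch that does not use smmargins at all. It assumes fixed, made-up coefficients (`beta` is hypothetical, not taken from a fit) and a central finite difference. It compares perturbing only the x1 column against perturbing x1 and rebuilding the interaction column.

```python
import numpy as np

rng = np.random.default_rng(7)
n = 500
x1 = rng.standard_normal(n)
x2 = rng.standard_normal(n)

# Hypothetical coefficients for [const, x1, x2, x1_x2]; not from a fitted model
beta = np.array([0.5, 1.0, -0.5, 0.8])

def predict(x1_col, x2_col, inter_col):
    """Logistic probability for a hand-built design [const, x1, x2, x1_x2]."""
    X = np.column_stack([np.ones_like(x1_col), x1_col, x2_col, inter_col])
    return 1.0 / (1.0 + np.exp(-X @ beta))

h = 1e-5
stale = x1 * x2  # interaction column as stored in the hand-built matrix

# Naive: shift only the x1 column, leave the stored interaction untouched
me_naive = (predict(x1 + h, x2, stale) - predict(x1 - h, x2, stale)) / (2 * h)

# Rebuilt: shift x1 and recompute the interaction from the shifted values
me_rebuilt = (
    predict(x1 + h, x2, (x1 + h) * x2) - predict(x1 - h, x2, (x1 - h) * x2)
) / (2 * h)

print("naive AME:  ", me_naive.mean())
print("rebuilt AME:", me_rebuilt.mean())
```

In this sketch the naive version recovers only the `beta[1]` part of the derivative for each observation; the `beta[3] * x2` contribution from the interaction is dropped because the stored column never changes.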
Checking which mode is active¶
# Inspect the _raw_mode flag
print(f"Formula mode: {not M_formula._raw_mode}") # True (formula mode, _raw_mode is False)
print(f"Raw mode: {M_raw._raw_mode}") # True (raw mode)
Forcing raw mode with explicit data¶
# If model.data.frame is missing, pass data explicitly
X_extra = sm.add_constant(df[["x1", "x2"]])
fit_extra = sm.Logit(y, X_extra).fit(disp=False)
M_extra = Margins(fit_extra, data=df[["x1", "x2"]])
print(M_extra.dydx("x1"))
⚠️ Trade-off: Formula mode uses patsy’s DesignInfo to rebuild the design matrix whenever a variable is perturbed. This correctly handles I(x**2), x1:x2, C(group), splines (bs(x, df=4)), and all other patsy transforms. Raw mode perturbs only the literal column with the matching name; interactions and transformations are not updated automatically. In exchange, raw mode is faster and uses less memory for simple additive models without interactions.
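The rebuild step can be illustrated with patsy on its own. This is a sketch of the general mechanism using patsy's public `dmatrix` and `build_design_matrices` functions; whether smmargins calls these functions in exactly this way is an assumption. The shift of 0.1 and the five-row frame are arbitrary illustration values.

```python
import numpy as np
import pandas as pd
from patsy import dmatrix, build_design_matrices

rng = np.random.default_rng(7)
df = pd.DataFrame({"x1": rng.standard_normal(5), "x2": rng.standard_normal(5)})

# Build the design matrix once; patsy records how each column was produced
X = dmatrix("x1 + I(x1**2) + x1:x2", df, return_type="dataframe")
info = X.design_info

# Perturb the raw variable, then let patsy rebuild every derived column
df_shift = df.assign(x1=df["x1"] + 0.1)
(X_shift,) = build_design_matrices([info], df_shift, return_type="dataframe")

# Look up the polynomial column by prefix rather than hard-coding its label
sq_col = [c for c in info.column_names if c.startswith("I(")][0]

print(info.column_names)
print(np.allclose(X_shift[sq_col], df_shift["x1"] ** 2))
print(np.allclose(X_shift["x1:x2"], df_shift["x1"] * df_shift["x2"]))
```

Because the formula is stored in the design info, the squared term and the interaction follow the perturbed x1 without any manual bookkeeping.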
When to use formula mode¶
Use formula mode whenever your model includes interactions (x1:x2), polynomial terms (I(x**2)), categorical variables (C(group)), spline terms (bs(x)), or any other patsy transform. Formula mode is the default recommendation for all but the simplest models.
When to use raw mode¶
Use raw mode when you have a simple additive model with no interactions or transformations and you want to avoid the patsy overhead. Raw mode is also necessary for models that do not support formulas, such as some custom statsmodels subclasses.
When NOT to use raw mode¶
⚠️ Trade-off: Do not use raw mode when your design matrix contains manually created interaction columns, polynomial columns, or any derived feature that depends on the variable being perturbed. The marginal effects will be wrong because Margins does not know about the relationship between columns. If you must use raw mode with interactions, compute the marginal effect by hand or switch to a formula.
See also¶
Explanation: Why patsy — the design matrix reconstruction mechanism
Reference: Margins raw_mode — inspecting the mode flag
StatsModels documentation on
patsyan