How to set covariate profiles with `values=`, `Expr`, and `newdata=`¶

Prerequisites¶

Tutorial: First steps with smmargins — fitting a model and computing basic predictions
Tutorial: Adjusted predictions — at= and atexog=

Problem statement¶

You want to evaluate predictions or marginal effects under hypothetical covariate profiles: setting income to its 25th percentile, increasing everyone’s age by five years, or using an entirely new data frame. The values= keyword provides a per-variable DSL for this; Expr lets you write formula-based transformations; and newdata= is the escape hatch for arbitrary frames.

Minimal working solution¶

import numpy as np
import pandas as pd
import statsmodels.formula.api as smf
from smmargins import Margins, Expr

rng = np.random.default_rng(7)
N = 5_000
df = pd.DataFrame({
    "age": rng.normal(45, 12, N).clip(18, 90),
    "income": rng.lognormal(10.5, 0.4, N),
    "educ": rng.choice(["hs", "college", "grad"], N, p=[0.4, 0.4, 0.2]),
    "female": rng.integers(0, 2, N),
})
df["voted"] = (rng.uniform(0, 1, N) < 1 / (1 + np.exp(-(
    -4 + 0.05 * df.age + 0.00001 * df.income
    + 0.8 * (df.educ == "college") + 1.4 * (df.educ == "grad")
    + 0.3 * df.female
)))).astype(int)

fit = smf.logit("voted ~ age + income + C(educ) + female", data=df).fit(disp=False)
M = Margins(fit)

# Hold income at its median; everything else stays as observed
print(M.predict(values={"income": "p50"}))

# Increase everyone's age by 5 years
print(M.predict(values={"age": Expr("age + 5")}))

# Evaluate at an out-of-sample profile
new = pd.DataFrame({"age": [30, 50, 70], "income": [30000, 50000, 90000],
                    "educ": ["college", "hs", "grad"], "female": [0, 1, 0]})
print(M.predict(newdata=new))

The `values=` DSL¶

values= is a mapping from variable names to specifications. Each specification can be one of several kinds:

Kind	Example	Result
Scalar	`{"age": 45}`	Column set to 45 for every row
Categorical level	`{"educ": "college"}`	Column set to `"college"`
Reducer	`{"income": "mean"}`	Column reduced to its mean
Percentile	`{"income": "p25"}`	Column reduced to its 25th percentile
Sequence	`{"age": [25, 45, 65]}`	Cartesian product over the list
Callable	`{"income": lambda d: d["income"] * 1.1}`	Per-row transform via function
Expr	`{"income": Expr("income * 1.1")}`	Per-row transform via `df.eval`

Built-in reducers¶

Valid reducer strings are:

"mean", "median", "mode", "min", "max", "zero"
Percentiles: "p0" through "p100"
"asobserved" (the default for default_values=)

Numeric reducers raise on non-numeric columns; "mode" and "zero" work for any dtype. "zero" on a categorical column sets it to the reference (first observed) level.

Sequences and the cartesian product¶

If any value is a list, tuple, or ndarray, smmargins computes the cartesian product over all sequence specs:

# 3 ages × 2 sexes = 6 predictions
M.predict(values={"age": [25, 45, 65], "female": [0, 1]})

Non-sequence specs (scalars, reducers, callables, Expr) are applied first and do not multiply the grid.

Callables¶

A callable receives the original data frame and must return a pd.Series or np.ndarray of the same length:

M.predict(values={"income": lambda d: d["income"] * 1.10})

`Expr`¶

:class:~smmargins.Expr wraps a string that is evaluated via :meth:pandas.DataFrame.eval against the original data frame. Use it when the transformation is easier to write as a formula:

M.predict(values={"income": Expr("income * 1.10")})
M.predict(values={"age": Expr("(age - age.mean()) / age.std()")})

Strings that are not wrapped in Expr are interpreted as reducer names or categorical levels, not as formulas.

`default_values=` — what happens to the rest¶

By default, columns not mentioned in values= or atexog= keep their observed values (default_values="asobserved"). You can change this:

# Hold age at 45; set all other numerics to their mean
M.predict(values={"age": 45}, default_values="mean")

# Set all numerics to their median; factors stay as observed
M.predict(values={"age": 45}, default_values="median")

Valid options for default_values are "asobserved", "mean", "median", "mode", and "zero".

`newdata=` — arbitrary frames¶

newdata= accepts any :class:pandas.DataFrame with the same columns as the fitting data. It is mutually exclusive with at, atexog, values, and over:

hypo = pd.DataFrame({"age": [25, 45, 65], "income": [30000, 50000, 80000],
                     "educ": ["college", "hs", "grad"], "female": [0, 1, 0]})
M.predict(newdata=hypo)

Use newdata= for out-of-sample profiles or when the DSL does not give you enough control.

Using the DSL with `dydx` and `contrast`¶

values= and default_values= work on every post-estimation method, not just predict:

# AME of age when income is held at its 25th percentile
M.dydx("age", values={"income": "p25"})

# Marginal effect of education when everyone is female
M.dydx("educ", values={"female": 1})

# Contrast: female=1 vs female=0, holding age at 45
M.contrast(a={"female": 1, "age": 45}, b={"female": 0, "age": 45})

When to use this¶

Use values= for the common case where you want to tweak a few variables and leave the rest at observed or mean values. Use Expr when the counterfactual is naturally expressed as a formula (e.g. “income + 10%”). Use newdata= when you have a full data frame of profiles, especially out-of-sample ones.

When NOT to use this¶

⚠️ Trade-off: Do not mix newdata= with at/atexog/values/over — they are mutually exclusive. Pass one complete frame OR use the DSL, not both.

⚠️ Trade-off: default_values="mean" reduces unspecified numeric columns to their mean, but factors stay as observed (or at their mode/reference depending on factor_stat=). If you want a single representative individual, use at="mean" instead.

How to set covariate profiles with `values=`, `Expr`, and `newdata=`¶

Prerequisites¶

Problem statement¶

Minimal working solution¶

The `values=` DSL¶

Built-in reducers¶

Sequences and the cartesian product¶

Callables¶

`Expr`¶

`default_values=` — what happens to the rest¶

`newdata=` — arbitrary frames¶

Using the DSL with `dydx` and `contrast`¶

When to use this¶

When NOT to use this¶

See also¶

smmargins

Navigation

Related Topics

How to set covariate profiles with values=, Expr, and newdata=¶

Prerequisites¶

Problem statement¶

Minimal working solution¶

The values= DSL¶

Built-in reducers¶

Sequences and the cartesian product¶

Callables¶

Expr¶

default_values= — what happens to the rest¶

newdata= — arbitrary frames¶

Using the DSL with dydx and contrast¶

When to use this¶

When NOT to use this¶

See also¶

How to set covariate profiles with `values=`, `Expr`, and `newdata=`¶

The `values=` DSL¶

`Expr`¶

`default_values=` — what happens to the rest¶

`newdata=` — arbitrary frames¶

Using the DSL with `dydx` and `contrast`¶