# How to set covariate profiles with `values=`, `Expr`, and `newdata=` ## Prerequisites - {doc}`Tutorial: First steps with smmargins ` — fitting a model and computing basic predictions - {doc}`Tutorial: Adjusted predictions ` — `at=` and `atexog=` ## Problem statement You want to evaluate predictions or marginal effects under hypothetical covariate profiles: setting income to its 25th percentile, increasing everyone's age by five years, or using an entirely new data frame. The `values=` keyword provides a per-variable DSL for this; `Expr` lets you write formula-based transformations; and `newdata=` is the escape hatch for arbitrary frames. ## Minimal working solution ```python import numpy as np import pandas as pd import statsmodels.formula.api as smf from smmargins import Margins, Expr rng = np.random.default_rng(7) N = 5_000 df = pd.DataFrame({ "age": rng.normal(45, 12, N).clip(18, 90), "income": rng.lognormal(10.5, 0.4, N), "educ": rng.choice(["hs", "college", "grad"], N, p=[0.4, 0.4, 0.2]), "female": rng.integers(0, 2, N), }) df["voted"] = (rng.uniform(0, 1, N) < 1 / (1 + np.exp(-( -4 + 0.05 * df.age + 0.00001 * df.income + 0.8 * (df.educ == "college") + 1.4 * (df.educ == "grad") + 0.3 * df.female )))).astype(int) fit = smf.logit("voted ~ age + income + C(educ) + female", data=df).fit(disp=False) M = Margins(fit) # Hold income at its median; everything else stays as observed print(M.predict(values={"income": "p50"})) # Increase everyone's age by 5 years print(M.predict(values={"age": Expr("age + 5")})) # Evaluate at an out-of-sample profile new = pd.DataFrame({"age": [30, 50, 70], "income": [30000, 50000, 90000], "educ": ["college", "hs", "grad"], "female": [0, 1, 0]}) print(M.predict(newdata=new)) ``` ## The `values=` DSL `values=` is a mapping from variable names to **specifications**. Each specification can be one of several kinds: | Kind | Example | Result | |------|---------|--------| | Scalar | `{"age": 45}` | Column set to 45 for every row | | Categorical level | `{"educ": "college"}` | Column set to `"college"` | | Reducer | `{"income": "mean"}` | Column reduced to its mean | | Percentile | `{"income": "p25"}` | Column reduced to its 25th percentile | | Sequence | `{"age": [25, 45, 65]}` | Cartesian product over the list | | Callable | `{"income": lambda d: d["income"] * 1.1}` | Per-row transform via function | | Expr | `{"income": Expr("income * 1.1")}` | Per-row transform via `df.eval` | ### Built-in reducers Valid reducer strings are: - `"mean"`, `"median"`, `"mode"`, `"min"`, `"max"`, `"zero"` - Percentiles: `"p0"` through `"p100"` - `"asobserved"` (the default for `default_values=`) Numeric reducers raise on non-numeric columns; `"mode"` and `"zero"` work for any dtype. `"zero"` on a categorical column sets it to the reference (first observed) level. ### Sequences and the cartesian product If any value is a list, tuple, or ndarray, `smmargins` computes the cartesian product over all sequence specs: ```python # 3 ages × 2 sexes = 6 predictions M.predict(values={"age": [25, 45, 65], "female": [0, 1]}) ``` Non-sequence specs (scalars, reducers, callables, `Expr`) are applied first and do not multiply the grid. ### Callables A callable receives the original data frame and must return a `pd.Series` or `np.ndarray` of the same length: ```python M.predict(values={"income": lambda d: d["income"] * 1.10}) ``` ### `Expr` :class:`~smmargins.Expr` wraps a string that is evaluated via :meth:`pandas.DataFrame.eval` against the original data frame. Use it when the transformation is easier to write as a formula: ```python M.predict(values={"income": Expr("income * 1.10")}) M.predict(values={"age": Expr("(age - age.mean()) / age.std()")}) ``` Strings that are *not* wrapped in `Expr` are interpreted as reducer names or categorical levels, not as formulas. ## `default_values=` — what happens to the rest By default, columns not mentioned in `values=` or `atexog=` keep their **observed** values (`default_values="asobserved"`). You can change this: ```python # Hold age at 45; set all other numerics to their mean M.predict(values={"age": 45}, default_values="mean") # Set all numerics to their median; factors stay as observed M.predict(values={"age": 45}, default_values="median") ``` Valid options for `default_values` are `"asobserved"`, `"mean"`, `"median"`, `"mode"`, and `"zero"`. ## `newdata=` — arbitrary frames `newdata=` accepts any :class:`pandas.DataFrame` with the same columns as the fitting data. It is **mutually exclusive** with `at`, `atexog`, `values`, and `over`: ```python hypo = pd.DataFrame({"age": [25, 45, 65], "income": [30000, 50000, 80000], "educ": ["college", "hs", "grad"], "female": [0, 1, 0]}) M.predict(newdata=hypo) ``` Use `newdata=` for out-of-sample profiles or when the DSL does not give you enough control. ## Using the DSL with `dydx` and `contrast` `values=` and `default_values=` work on every post-estimation method, not just `predict`: ```python # AME of age when income is held at its 25th percentile M.dydx("age", values={"income": "p25"}) # Marginal effect of education when everyone is female M.dydx("educ", values={"female": 1}) # Contrast: female=1 vs female=0, holding age at 45 M.contrast(a={"female": 1, "age": 45}, b={"female": 0, "age": 45}) ``` ## When to use this Use `values=` for the common case where you want to tweak a few variables and leave the rest at observed or mean values. Use `Expr` when the counterfactual is naturally expressed as a formula (e.g. "income + 10%"). Use `newdata=` when you have a full data frame of profiles, especially out-of-sample ones. ## When NOT to use this > ⚠️ **Trade-off:** Do not mix `newdata=` with `at`/`atexog`/`values`/`over` — they are mutually exclusive. Pass one complete frame OR use the DSL, not both. > > ⚠️ **Trade-off:** `default_values="mean"` reduces unspecified numeric columns to their mean, but factors stay as observed (or at their mode/reference depending on `factor_stat=`). If you want a single representative individual, use `at="mean"` instead. ## See also - {doc}`How to compute counterfactual predictions with values, Expr, and newdata ` — additional examples including joint contrasts - {doc}`How to compute elasticities ` — `values=` inside elasticity calls - {doc}`Reference: Expr ` — API reference for the `Expr` wrapper