Tutorial 2: Adjusted Predictions at Different Points

In Tutorial 1 we computed a single average adjusted prediction. In practice, we often want predictions at specific covariate profiles: at the means, at representative values, or across a grid. This tutorial introduces three types of adjusted predictions and shows how to compute each.

What you will learn

  • Average Adjusted Prediction (AAP)

  • Adjusted Prediction at the Mean (APM)

  • Adjusted Prediction at Representative values (APR)

  • How to use the atexog parameter

  • How to build tables suitable for plotting

Setup

We continue with the same fitted model and Margins object from Tutorial 1:

import numpy as np
import pandas as pd
import statsmodels.formula.api as smf
from smmargins import Margins

rng = np.random.default_rng(7)
N = 5_000
df = pd.DataFrame({
    "age":    rng.normal(45, 12, N).clip(18, 90),
    "income": rng.lognormal(10.5, 0.4, N),
    "educ":   rng.choice(["hs", "college", "grad"], N, p=[0.4, 0.4, 0.2]),
    "female": rng.integers(0, 2, N),
})
eta = (-4.0 + 0.05 * df["age"] + 0.00001 * df["income"]
       + 0.8 * (df["educ"] == "college") + 1.4 * (df["educ"] == "grad")
       + 0.3 * df["female"] - 0.0004 * df["age"] * df["female"])
df["voted"] = (rng.uniform(0, 1, N) < 1 / (1 + np.exp(-eta))).astype(int)

fit = smf.logit("voted ~ age + income + C(educ) + female + age:female", data=df).fit(disp=False)
M = Margins(fit)

Average Adjusted Prediction (AAP)

The AAP is the default. It computes the predicted probability for every observation in the dataset using each person’s actual covariate values, then averages across all observations:

M.predict()
     prediction  std err          z  P>|z|  [95% Conf.  Interval]
AAP      0.3622  0.00634  57.128076    0.0    0.349774   0.374626

The AAP answers: “What is the average predicted probability of voting in our sample?” It respects the joint distribution of all covariates in the data.
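The two steps behind the AAP (predict for every row, then average) can be sketched by hand. This is a minimal illustration with NumPy only; the coefficients in `predict_proba` are made up for the example and are not the fitted values from this tutorial, and the sketch does not show how smmargins computes standard errors.

```python
import numpy as np

# Simulated covariates, standing in for the tutorial's data frame.
rng = np.random.default_rng(0)
n = 1_000
age = rng.normal(45, 12, n).clip(18, 90)
female = rng.integers(0, 2, n)

def predict_proba(age, female):
    """Logistic prediction from a hypothetical linear predictor (illustrative coefficients)."""
    eta = -2.5 + 0.05 * age + 0.3 * female
    return 1.0 / (1.0 + np.exp(-eta))

# Step 1: one predicted probability per observation, at its own covariate values.
p_hat = predict_proba(age, female)

# Step 2: the AAP is the plain average of those predictions.
aap = p_hat.mean()
print(f"AAP = {aap:.4f}")
```

Because every row keeps its own covariate values, the result reflects the joint distribution of the covariates in the sample.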

Adjusted Prediction at the Mean (APM)

The APM sets every continuous covariate to its sample mean and every categorical covariate to its reference level, then computes a single prediction. Use at="mean":

M.predict(at="mean")
     prediction   std err          z  P>|z|  [95% Conf.  Interval]
APM    0.342606  0.007269  47.133266    0.0    0.328359   0.356853

The APM answers: “What is the predicted probability of voting for an average person?” This is a single representative individual, not an average across the sample. Note that the APM (0.3426) differs slightly from the AAP (0.3622) because the logistic function is nonlinear: the prediction at average covariate values is generally not the average of the individual predictions.
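The effect of nonlinearity can be seen in a toy example. The sketch below uses four hypothetical linear-predictor values (not taken from the fitted model) and compares the average of the predictions with the prediction at the average. It simplifies the APM to "predict at the mean linear predictor", which ignores details such as interaction terms and reference levels.

```python
import numpy as np

def sigmoid(eta):
    """Inverse logit: maps a linear predictor to a probability."""
    return 1.0 / (1.0 + np.exp(-eta))

# Hypothetical linear predictors for four people (illustrative values only).
eta = np.array([-3.0, -1.0, 0.0, 2.0])

aap_style = sigmoid(eta).mean()   # average of the individual predictions
apm_style = sigmoid(eta.mean())   # prediction at the average linear predictor

print(f"average of predictions:    {aap_style:.4f}")
print(f"prediction at the average: {apm_style:.4f}")
```

The two numbers differ even though they are built from the same inputs, which is why the AAP and APM rarely coincide in a logit model.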

Adjusted Prediction at Representative values (APR)

The APR lets you choose specific values for one or more covariates and compute predictions at those profiles. You pass a dictionary to atexog where keys are variable names and values are lists of levels:

M.predict(atexog={"age": [25, 45, 65]})
        prediction   std err          z         P>|z|  [95% Conf.  Interval]
age=25    0.176056  0.009420  18.690339  5.933527e-78    0.157594   0.194518
age=45    0.354894  0.006769  52.430979  0.000000e+00    0.341627   0.368160
age=65    0.583282  0.013865  42.069884  0.000000e+00    0.556107   0.610456

The APR answers: “What is the predicted probability of voting at ages 25, 45, and 65?” Each row is a separate prediction with age fixed at the stated value for every observation; all other variables keep their observed values, and the predictions are then averaged across the sample, as in the AAP.
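The "fix one variable, keep the rest as observed, then average" logic can be written out by hand. As before, this is a sketch under stated assumptions: the data are simulated and the coefficients in `predict_proba` are hypothetical, so the numbers will not match the table above.

```python
import numpy as np

# Simulated covariates, standing in for the tutorial's data frame.
rng = np.random.default_rng(1)
n = 1_000
age = rng.normal(45, 12, n).clip(18, 90)
female = rng.integers(0, 2, n)

def predict_proba(age, female):
    """Logistic prediction from a hypothetical linear predictor (illustrative coefficients)."""
    eta = -2.5 + 0.05 * age + 0.3 * female
    return 1.0 / (1.0 + np.exp(-eta))

apr = {}
for a in [25, 45, 65]:
    # Overwrite age for everyone; female keeps its observed values.
    p_hat = predict_proba(np.full(n, a), female)
    # Average over the sample, exactly as for the AAP.
    apr[f"age={a}"] = p_hat.mean()

for label, value in apr.items():
    print(f"{label}: {value:.4f}")
```

Each entry is an average over the full sample, so the rows differ only through the value assigned to age.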

Grids with multiple variables

You can vary multiple variables simultaneously. smmargins expands the combinations into a full grid. Here we vary age in 10-year increments and gender:

M.predict(atexog={"age": list(range(20, 91, 10)), "female": [0, 1]})
                  prediction   std err          z          P>|z|  [95% Conf.  Interval]
age=20, female=0    0.123347  0.012292  10.035101   1.068496e-23    0.099256   0.147439
age=20, female=1    0.164470  0.014674  11.208286   3.713123e-29    0.135709   0.193230
age=30, female=0    0.186845  0.011839  15.781666   4.160572e-56    0.163641   0.210050
age=30, female=1    0.239910  0.013264  18.087966   3.964516e-73    0.213914   0.265906
age=40, female=0    0.271600  0.009876  27.501919  1.665249e-166    0.252244   0.290956
age=40, female=1    0.334786  0.010396  32.203557  1.573744e-227    0.314410   0.355162
age=50, female=0    0.375697  0.010377  36.205507  4.986692e-287    0.355359   0.396036
age=50, female=1    0.444252  0.010654  41.699137   0.000000e+00    0.423371   0.465133
age=60, female=0    0.491955  0.016432  29.939437  6.038951e-197    0.459750   0.524161
age=60, female=1    0.559185  0.016084  34.765950  7.957754e-265    0.527661   0.590710
age=70, female=0    0.609209  0.023393  26.042006  1.657268e-149    0.563359   0.655059
age=70, female=1    0.668635  0.021582  30.981020  9.712740e-211    0.626335   0.710936
age=80, female=0    0.715867  0.027463  26.066971  8.639534e-150    0.662042   0.769693
age=80, female=1    0.763351  0.024108  31.663543  4.937110e-220    0.716100   0.810603
age=90, female=0    0.803790  0.027474  29.256573  3.703925e-188    0.749942   0.857637
age=90, female=1    0.838400  0.023250  36.060650  9.390909e-285    0.792831   0.883969

This produces 16 rows (8 age levels times 2 gender levels). Each row is a prediction at a specific age-gender profile, with education and income averaged over the sample. This table is ideal for plotting: you can plot the prediction column against age, grouping by female, and use the [95% Conf. and Interval] columns as the lower and upper bounds of an error band.
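The grid expansion itself is a Cartesian product, which can be reproduced with the standard library. The sketch below rebuilds the 16 age-by-female combinations and shows one way to turn row labels such as "age=20, female=0" back into columns for plotting. The helper `parse_label` is hypothetical, and it assumes the labels are plain strings in the format printed above; check the actual index of the object returned by M.predict before relying on it.

```python
from itertools import product

import pandas as pd

# Same value lists as in the atexog call above.
values = {"age": list(range(20, 91, 10)), "female": [0, 1]}

# Cartesian product: every age level paired with every female level.
grid = pd.DataFrame(list(product(*values.values())), columns=list(values))
print(len(grid))  # number of profiles in the grid

def parse_label(label):
    """Hypothetical helper: split 'age=20, female=0' into {'age': 20, 'female': 0}."""
    pairs = (part.split("=") for part in label.split(", "))
    return {name: int(value) for name, value in pairs}

# Turn a couple of row labels into plot-ready columns.
labels = ["age=20, female=0", "age=20, female=1"]
profiles = pd.DataFrame([parse_label(lab) for lab in labels])
print(profiles)
```

With age and female available as ordinary columns, the predictions can be grouped by female and drawn as one line per group.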

AAP versus APM versus APR: when to use each

Type   Code                       Use when
AAP    M.predict()                You want the average prediction in your sample
APM    M.predict(at="mean")       You want a single representative individual
APR    M.predict(atexog={...})    You want predictions at specific covariate profiles

Recap

In this tutorial we learned three types of adjusted predictions:

  1. AAP averages predictions across the observed sample

  2. APM predicts for a single average profile (continuous covariates at their means, categorical covariates at their reference level)

  3. APR predicts at user-specified covariate profiles using atexog

The atexog parameter accepts dictionaries of variable-value lists and expands them into a full grid. These grid tables are the building blocks for plots, which we cover in Tutorial 6.

Next steps