Prediction Scales and Link Functions

The Same Quantity, Different Lenses

If you come from Stata, you know that predict, pr gives predicted probabilities, predict, xb gives linear predictions, and margins defaults to the response scale. What Stata does not make explicit is that these are all the same underlying model evaluated through different lenses — and choosing the wrong lens for your estimand can produce nonsensical results.

If you use the get_margeff() method on statsmodels results, you get marginal effects on the response (probability) scale, with no option to change the scale. smmargins exposes the full set of scales and enforces valid scale-model combinations.


The Link Function Chain

Every GLM consists of three layers:

\[\underbrace{y}_{\text{response}} \leftarrow \underbrace{\mu = f(\eta)}_{\text{mean function}} \leftarrow \underbrace{\eta = X\beta}_{\text{linear predictor}}\]

The link function \(g(\mu) = \eta\) connects the linear predictor to the mean. Its inverse \(f = g^{-1}\) maps back to the response. A “scale” in smmargins is simply the point in this chain at which you choose to report.
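The chain can be sketched in a few lines of NumPy for two common links. This is a minimal illustration with made-up covariates and coefficients; the helper names are hypothetical and are not part of the smmargins API.

```python
import numpy as np

# Hypothetical design matrix (intercept + one covariate) and coefficients.
X = np.array([[1.0, -1.0], [1.0, 0.0], [1.0, 2.0]])
beta = np.array([0.5, 1.0])

eta = X @ beta  # linear predictor: eta = X beta


def logit_inverse(eta):
    """Inverse logit link f(eta) = 1 / (1 + exp(-eta))."""
    return 1.0 / (1.0 + np.exp(-eta))


def logit_link(mu):
    """Logit link g(mu) = log(mu / (1 - mu))."""
    return np.log(mu / (1.0 - mu))


mu_logit = logit_inverse(eta)  # response scale for a logit model
mu_poisson = np.exp(eta)       # response scale for a log-link (Poisson) model

# Applying the link to the mean recovers the linear predictor.
roundtrip_logit = logit_link(mu_logit)
roundtrip_poisson = np.log(mu_poisson)
```

Reporting eta corresponds to the linear scale, and reporting mu_logit or mu_poisson corresponds to the response scale; the round trip confirms that both are views of the same fitted quantity.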

The Response Scale

The default scale is "response" — the mean function applied to the linear predictor:

\[\hat{\mu}_i = f(x_i'\hat{\beta})\]

For logit, this is the predicted probability \(\Lambda(x_i'\hat{\beta})\). For Poisson, this is the predicted count \(\exp(x_i'\hat{\beta})\).

What this means in code: predict(scale="response") evaluates family.link.inverse(eta) for each observation and returns the mean of these values. For a counterfactual prediction, it evaluates the same function at the modified covariate values.
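The averaging and counterfactual steps described above can be sketched with plain NumPy. The data, the coefficient values, and the choice of a binary treatment column are assumptions made for illustration; this does not reproduce smmargins internals.

```python
import numpy as np


def inverse_logit(eta):
    """Inverse logit link: maps the linear predictor to a probability."""
    return 1.0 / (1.0 + np.exp(-eta))


# Hypothetical data: intercept, a binary treatment, a continuous covariate.
X = np.array([
    [1.0, 0.0, -0.5],
    [1.0, 1.0, 0.3],
    [1.0, 0.0, 1.2],
    [1.0, 1.0, -1.0],
])
beta = np.array([-0.2, 0.8, 0.5])  # made-up coefficient estimates

# Average prediction on the response scale: mean of f(x_i' beta).
avg_observed = inverse_logit(X @ beta).mean()

# Counterfactual: set the treatment column to 1 for everyone, then average.
X_treated = X.copy()
X_treated[:, 1] = 1.0
avg_treated = inverse_logit(X_treated @ beta).mean()

# Counterfactual: set the treatment column to 0 for everyone, then average.
X_control = X.copy()
X_control[:, 1] = 0.0
avg_control = inverse_logit(X_control @ beta).mean()
```

The difference avg_treated - avg_control is then an average contrast on the probability scale, computed from the same inverse link evaluated at modified covariates.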

The Linear Scale

The "linear" scale reports the linear predictor directly:

\[\hat{\eta}_i = x_i'\hat{\beta}\]

This is Stata’s predict, xb. It is on the scale of the link function, not the response.

What this means in code: predict(scale="linear") skips the inverse link entirely and returns \(\hat{\eta}\). For the averaged prediction, the delta-method Jacobian is simply the vector of column means of \(X\), that is \(\mathbf{1}'X/n\), and the standard errors come directly from the coefficient covariance matrix.
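As a worked illustration of the delta method on the linear scale, the sketch below averages the linear predictor and computes its standard error from the column means of the design matrix. The design matrix and the covariance matrix V are assumed values, not output from a fitted model.

```python
import numpy as np

# Hypothetical design matrix and coefficient covariance (illustrative only).
X = np.array([[1.0, 0.0], [1.0, 1.0], [1.0, 2.0], [1.0, 3.0]])
beta = np.array([0.2, 0.5])
V = np.array([[0.04, -0.01], [-0.01, 0.01]])  # assumed Cov(beta_hat)

# Average linear prediction: mean of x_i' beta.
avg_eta = (X @ beta).mean()

# Its Jacobian with respect to beta is the vector of column means of X.
J = X.mean(axis=0)

# Delta method: Var(avg_eta) = J V J'.
# This is exact here because avg_eta is linear in beta.
se_avg_eta = np.sqrt(J @ V @ J)
```

Because no inverse link is involved, the Jacobian does not depend on the coefficient values, which is why linear-scale standard errors are the simplest case.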

⚠️ Trade-off: Linear-scale predictions are easier to interpret in terms of “one-unit changes in \(x\),” but they live on the log-odds scale (for logit) or the log-count scale (for Poisson), which is not where policy decisions are made. Response-scale predictions are on the probability or count scale, which is more intuitive but nonlinear in \(\beta\).


Model-Specific Scales

Odds Ratios: "or"

For logit models only, the "or" scale reports exponentiated linear predictions:

\[\text{OR}_i = \exp(x_i'\hat{\beta})\]

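The odds ratio compares the odds of success \((P/(1-P))\) between two covariate profiles. Note that \(\exp(x_i'\hat{\beta})\) is itself the predicted odds of success \(P_i/(1-P_i)\) for profile \(i\); the ratio of this quantity between two profiles is their odds ratio. A coefficient \(\beta_j\) of 0.5 means a one-unit increase in \(x_j\) multiplies the odds by \(e^{0.5} \approx 1.65\).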

What this means in code: predict(scale="or") computes \(\exp(\hat{\eta}_i)\) for each observation. The delta-method Jacobian uses the chain rule: \(\partial \exp(\eta)/\partial \beta = \exp(\eta) \cdot x_i'\).
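The chain-rule Jacobian can be checked numerically. The sketch below compares the analytic expression \(\exp(\eta_i) \cdot x_i'\) against central finite differences, using made-up covariates and coefficients; it illustrates the formula and is not taken from the smmargins source.

```python
import numpy as np

X = np.array([[1.0, -1.0], [1.0, 0.5], [1.0, 2.0]])  # hypothetical covariates
beta = np.array([-0.3, 0.5])                          # made-up estimates


def exp_eta(b):
    """Exponentiated linear predictor for every row of X."""
    return np.exp(X @ b)


# Analytic Jacobian from the chain rule: row i is exp(eta_i) * x_i'.
J_analytic = exp_eta(beta)[:, None] * X

# Central finite differences as an independent check.
h = 1e-6
J_numeric = np.empty_like(X)
for j in range(beta.size):
    step = np.zeros_like(beta)
    step[j] = h
    J_numeric[:, j] = (exp_eta(beta + step) - exp_eta(beta - step)) / (2 * h)

# exp(beta_j) is the multiplicative change in the odds per unit of x_j.
odds_multiplier = np.exp(beta[1])
```

With a coefficient of 0.5, odds_multiplier matches the \(e^{0.5} \approx 1.65\) figure quoted above.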

⚠️ Trade-off: Odds ratios are convenient for reporting (no probability-scale nonlinearity), but they are not collapsible over covariates. The average odds ratio is not the odds ratio at average covariates. Use with care in heterogeneous populations.

Incidence Rates: "ir"

For Poisson and negative binomial models, the "ir" scale is the predicted rate:

\[\hat{\lambda}_i = \exp(x_i'\hat{\beta})\]

This is mathematically identical to "response" for count models (since the mean of a Poisson is its rate), but the naming convention signals that the quantity is an incidence rate per unit exposure.

What this means in code: predict(scale="ir") on a Poisson model returns the same numeric values as scale="response", but the labeling and validation logic ensure the scale-model pairing is documented.

Exponential and Log Scales

The "exp" scale is \(\exp(\eta)\), a generic scale available for any model where it is mathematically defined. The "log" scale is \(\log(\mu)\); for models with a log link, this maps a response-scale prediction back to the linear predictor.

\[\text{"exp"}: \hat{y}_i = \exp(x_i'\hat{\beta})\]
\[\text{"log"}: \hat{y}_i = \log(f(x_i'\hat{\beta}))\]

What this means in code: scale="exp" on an OLS model gives exponentiated linear predictions (useful for log-linear models fit with OLS). scale="log" on a Poisson model gives \(\log(\hat{\lambda})\), which is the linear predictor — effectively undoing the inverse link.
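Both transformations are one-liners in NumPy. The sketch below uses made-up coefficients for a log-linear OLS model and for a Poisson model; the variable names are illustrative only.

```python
import numpy as np

X = np.array([[1.0, 0.0], [1.0, 1.0], [1.0, 2.0]])  # hypothetical covariates

# Log-linear model fit by OLS: the linear predictor is on the log(y) scale,
# so exponentiating it gives a prediction in the original units
# (ignoring any retransformation correction).
beta_ols = np.array([1.0, 0.1])  # made-up estimates
exp_scale = np.exp(X @ beta_ols)

# Poisson with log link: the response scale is exp(eta), so taking the log
# of the response-scale prediction returns the linear predictor.
beta_pois = np.array([0.3, -0.2])  # made-up estimates
eta_pois = X @ beta_pois
log_scale = np.log(np.exp(eta_pois))
```

For the Poisson case, log_scale and eta_pois agree to floating-point precision, which is the sense in which the log scale undoes the inverse link.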


Invalid Combinations

Not all scales make sense for all models. smmargins validates scale-model pairings and raises a clear error for invalid combinations:

| Scale      | Logit          | Probit         | Poisson        | OLS            | MNLogit        |
|------------|----------------|----------------|----------------|----------------|----------------|
| "response" | \(\checkmark\) | \(\checkmark\) | \(\checkmark\) | \(\checkmark\) | \(\checkmark\) |
| "linear"   | \(\checkmark\) | \(\checkmark\) | \(\checkmark\) | \(\checkmark\) | \(\checkmark\) |
| "pr"       | \(\checkmark\) | \(\checkmark\) | \(\times\)     | \(\times\)     | \(\times\)     |
| "ir"       | \(\times\)     | \(\times\)     | \(\checkmark\) | \(\times\)     | \(\times\)     |
| "or"       | \(\checkmark\) | \(\times\)     | \(\times\)     | \(\times\)     | \(\times\)     |
| "exp"      | \(\checkmark\) | \(\checkmark\) | \(\checkmark\) | \(\checkmark\) | \(\checkmark\) |

For example, predict(scale="or") on a Poisson model raises:

ValueError: scale="or" (odds ratio) is only valid for logit models.

What this means in code: the validation happens before any computation, so you get a clear message rather than a silently wrong result.
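One way such a fail-fast check could look is a lookup table consulted before any numerical work. The sketch below is hypothetical: the names VALID_MODELS and check_scale, the lowercase model labels, and the error wording are assumptions for illustration and do not reproduce the smmargins implementation or its exact messages. The pairings are transcribed from the table above.

```python
# Valid model types per scale, transcribed from the table above.
# Hypothetical structure; not the smmargins internals.
ALL_MODELS = {"logit", "probit", "poisson", "ols", "mnlogit"}
VALID_MODELS = {
    "response": ALL_MODELS,
    "linear": ALL_MODELS,
    "pr": {"logit", "probit"},
    "ir": {"poisson"},
    "or": {"logit"},
    "exp": ALL_MODELS,
}


def check_scale(scale, model_type):
    """Raise ValueError before any computation if the pairing is invalid."""
    if scale not in VALID_MODELS:
        raise ValueError(f"unknown scale {scale!r}")
    if model_type not in VALID_MODELS[scale]:
        allowed = ", ".join(sorted(VALID_MODELS[scale]))
        raise ValueError(
            f"scale={scale!r} is not valid for {model_type} models; "
            f"valid models: {allowed}."
        )


check_scale("or", "logit")  # passes silently

try:
    check_scale("or", "poisson")
except ValueError as err:
    message = str(err)  # the error surfaces before anything is computed
```

Keeping the pairings in a single table means the documentation and the check can be compared line by line.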


Why Scale Matters for Marginal Effects

The scale parameter in dydx() controls the scale of the prediction being differentiated; it is not a rescaling applied to the derivative after the fact. For a logit model:

  • dydx(scale="response"): \(\partial \Lambda(\eta) / \partial x_j\) — change in probability per unit change in \(x_j\)

  • dydx(scale="linear"): \(\partial \eta / \partial x_j = \beta_j\) — change in log-odds per unit change in \(x_j\)

The relationship between them is:

\[\frac{\partial \Lambda(\eta)}{\partial x_j} = \Lambda(\eta)(1 - \Lambda(\eta)) \cdot \beta_j\]

What this means in code: scale="response" marginal effects vary with \(x\) (they are largest near \(\Lambda = 0.5\) and approach zero in the tails), while scale="linear" marginal effects are constant. The choice is substantive, not cosmetic.
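The contrast between the two scales is easy to see numerically. The sketch below evaluates the logit identity \(\Lambda(\eta)(1 - \Lambda(\eta)) \cdot \beta_j\) over a grid of linear-predictor values, using a made-up coefficient; it illustrates the formula only and does not call smmargins.

```python
import numpy as np


def inverse_logit(eta):
    """Logistic CDF Lambda(eta)."""
    return 1.0 / (1.0 + np.exp(-eta))


beta_j = 0.8                      # made-up coefficient on x_j
eta = np.linspace(-6.0, 6.0, 13)  # grid of linear-predictor values, includes 0

p = inverse_logit(eta)
me_response = p * (1.0 - p) * beta_j   # d Lambda(eta) / d x_j, varies with eta
me_linear = np.full_like(eta, beta_j)  # d eta / d x_j, constant

peak_eta = eta[np.argmax(me_response)]  # where the response-scale effect peaks
peak_value = me_response.max()          # equals beta_j / 4 at Lambda = 0.5
```

On this grid the response-scale effect peaks at \(\eta = 0\) with value \(\beta_j/4\) and shrinks toward zero at both ends, while the linear-scale effect stays at \(\beta_j\) throughout.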


Related Documentation

  • Tutorial: Comparing Scales in Logit and Poisson Models — visualize how marginal effects differ across scales.

  • Reference: Margins.predict() for the complete list of valid scale-model combinations.
