{ "cells": [ { "cell_type": "markdown", "metadata": {}, "source": "# Tutorial 2: Adjusted Predictions at Different Points\n\nIn Tutorial 1 we computed a single average adjusted prediction. In practice, we often want predictions at specific covariate profiles: at the means, at representative values, or across a grid. This tutorial introduces three types of adjusted predictions and shows how to compute each.\n\n## What you will learn\n\n- Average Adjusted Prediction (AAP)\n- Adjusted Prediction at the Mean (APM)\n- Adjusted Prediction at Representative values (APR)\n- How to use the `atexog` parameter\n- How to build tables suitable for plotting\n\n## Setup\n\nWe recreate the simulated data, fitted model, and `Margins` object from Tutorial 1 so that this notebook runs on its own:", "id": "3089d8e8" }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": "import numpy as np\nimport pandas as pd\nimport statsmodels.formula.api as smf\nfrom smmargins import Margins\n\n# Simulate a voting dataset with a fixed seed for reproducibility\nrng = np.random.default_rng(7)\nN = 5_000\ndf = pd.DataFrame({\n \"age\": rng.normal(45, 12, N).clip(18, 90),\n \"income\": rng.lognormal(10.5, 0.4, N),\n \"educ\": rng.choice([\"hs\", \"college\", \"grad\"], N, p=[0.4, 0.4, 0.2]),\n \"female\": rng.integers(0, 2, N),\n})\n# True linear predictor on the logit scale, including an age-by-female interaction\neta = (-4.0 + 0.05 * df[\"age\"] + 0.00001 * df[\"income\"]\n + 0.8 * (df[\"educ\"] == \"college\") + 1.4 * (df[\"educ\"] == \"grad\")\n + 0.3 * df[\"female\"] - 0.0004 * df[\"age\"] * df[\"female\"])\n# Draw the binary outcome from the implied probabilities\ndf[\"voted\"] = (rng.uniform(0, 1, N) < 1 / (1 + np.exp(-eta))).astype(int)\n\n# Fit the logit model and wrap it in a Margins object\nfit = smf.logit(\"voted ~ age + income + C(educ) + female + age:female\", data=df).fit(disp=False)\nM = Margins(fit)", "id": "d6a9a7bf" }, { "cell_type": "markdown", "metadata": {}, "source": "## Average Adjusted Prediction (AAP)\n\nThe AAP is what `M.predict()` returns by default. 
It computes the predicted probability for every observation in the dataset using each person's actual covariate values, then averages those predictions across all observations:", "id": "48aae78e" }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": "M.predict()", "id": "07adf8b1" }, { "cell_type": "markdown", "metadata": {}, "source": "The AAP answers: \"What is the average predicted probability of voting in our sample?\" Because it uses each observation's actual covariate values, it respects the joint distribution of all covariates in the data.\n\n## Adjusted Prediction at the Mean (APM)\n\nThe APM sets every continuous covariate to its sample mean and every categorical covariate to its reference level, then computes a single prediction for that one profile. Use `at=\"mean\"`:", "id": "b67df4d0" }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": "M.predict(at=\"mean\")", "id": "fe4dac39" }, { "cell_type": "markdown", "metadata": {}, "source": "The APM answers: \"What is the predicted probability of voting for a hypothetical average person?\" It is a prediction for a single constructed profile, not an average across the sample. Note that the APM (0.4387) differs slightly from the AAP (0.4521). Because the logistic function is nonlinear, the average of the individual predicted probabilities is generally not equal to the predicted probability at the average covariate values.\n\n## Adjusted Prediction at Representative values (APR)\n\nThe APR lets you choose specific values for one or more covariates and compute predictions at those profiles. 
You pass a dictionary to `atexog` whose keys are variable names and whose values are lists of the values to evaluate:", "id": "1928a89e" }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": "M.predict(atexog={\"age\": [25, 45, 65]})", "id": "b3b1c647" }, { "cell_type": "markdown", "metadata": {}, "source": "The APR answers: \"What is the predicted probability of voting at ages 25, 45, and 65?\" Each row is a separate prediction: age is set to the specified value for every observation, all other variables keep their observed values, and the resulting predictions are averaged as in the AAP.\n\n## Grids with multiple variables\n\nYou can vary multiple variables simultaneously, and smmargins expands them into a full grid of all combinations. Here we vary age from 20 to 90 in 10-year increments, crossed with both values of `female`:", "id": "819c40ed" }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": "M.predict(atexog={\"age\": list(range(20, 91, 10)), \"female\": [0, 1]})", "id": "243cf356" }, { "cell_type": "markdown", "metadata": {}, "source": "This produces 16 rows (8 age values times 2 values of `female`). Each row is a prediction at a specific age-gender profile, with education and income held at their observed values and averaged over the sample. This table is ideal for plotting: you can plot `estimate` against `age`, grouping by `female`, and draw `conf_low` and `conf_high` as confidence bands.\n\n## AAP versus APM versus APR: when to use each\n\n| Type | Code | Use when |\n|------|------|----------|\n| AAP | `M.predict()` | You want the average prediction in your sample |\n| APM | `M.predict(at=\"mean\")` | You want the prediction for a single representative individual |\n| APR | `M.predict(atexog={...})` | You want predictions at specific covariate profiles |\n\n## Recap\n\nIn this tutorial we covered three types of adjusted predictions:\n\n1. **AAP** averages predictions across the observed sample\n2. **APM** predicts for a single profile with covariates set at their means (categorical covariates at their reference levels)\n3. 
**APR** predicts at user-specified covariate profiles using `atexog`\n\nThe `atexog` parameter accepts a dictionary that maps variable names to lists of values and expands it into a full grid of all combinations. These grid tables are the building blocks for the plots we cover in Tutorial 6.\n\n## Next steps\n\n- Learn about different types of marginal effects in {doc}`Tutorial 3: Marginal Effects `\n- See how to turn prediction grids into plots in {doc}`Tutorial 6: Counterfactuals and Plotting `\n- Read the reference for the {doc}`predict() method `\n- Understand why link functions matter in {doc}`Explanation: Scales and Link Functions `\n- Learn about advanced counterfactual scenarios in {doc}`How-To: Counterfactual Predictions `", "id": "4aaf52d3" } ], "metadata": { "kernelspec": { "display_name": "Python 3", "language": "python", "name": "python3" }, "language_info": { "name": "python" } }, "nbformat": 4, "nbformat_minor": 5 }