{
 "cells": [
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": "# Tutorial 2: Adjusted Predictions at Different Points\n\nIn Tutorial 1 we computed a single average adjusted prediction. In practice, we often want predictions at specific covariate profiles: at the means, at representative values, or across a grid. This tutorial introduces three types of adjusted predictions and shows how to compute each.\n\n## What you will learn\n\n- Average Adjusted Prediction (AAP)\n- Adjusted Prediction at the Mean (APM)\n- Adjusted Prediction at Representative values (APR)\n- How to use the `atexog` parameter\n- How to build tables suitable for plotting\n\n## Setup\n\nWe continue with the same fitted model and `Margins` object from Tutorial 1:",
   "id": "3089d8e8"
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": "import numpy as np\nimport pandas as pd\nimport statsmodels.formula.api as smf\nfrom smmargins import Margins\n\nrng = np.random.default_rng(7)\nN = 5_000\ndf = pd.DataFrame({\n    \"age\":    rng.normal(45, 12, N).clip(18, 90),\n    \"income\": rng.lognormal(10.5, 0.4, N),\n    \"educ\":   rng.choice([\"hs\", \"college\", \"grad\"], N, p=[0.4, 0.4, 0.2]),\n    \"female\": rng.integers(0, 2, N),\n})\neta = (-4.0 + 0.05 * df[\"age\"] + 0.00001 * df[\"income\"]\n       + 0.8 * (df[\"educ\"] == \"college\") + 1.4 * (df[\"educ\"] == \"grad\")\n       + 0.3 * df[\"female\"] - 0.0004 * df[\"age\"] * df[\"female\"])\ndf[\"voted\"] = (rng.uniform(0, 1, N) < 1 / (1 + np.exp(-eta))).astype(int)\n\nfit = smf.logit(\"voted ~ age + income + C(educ) + female + age:female\", data=df).fit(disp=False)\nM = Margins(fit)",
   "id": "d6a9a7bf"
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": "## Average Adjusted Prediction (AAP)\n\nThe AAP is the default. It computes the predicted probability for every observation in the dataset using each person's actual covariate values, then averages across all observations:",
   "id": "48aae78e"
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": "M.predict()",
   "id": "07adf8b1"
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": "The AAP answers: \"What is the average predicted probability of voting in our sample?\" It respects the joint distribution of all covariates in the data.\n\n## Adjusted Prediction at the Mean (APM)\n\nThe APM sets every continuous covariate to its sample mean and every categorical covariate to its reference level, then computes a single prediction. Use `at=\"mean\"`:",
   "id": "b67df4d0"
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": "M.predict(at=\"mean\")",
   "id": "fe4dac39"
  },
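  {
   "cell_type": "markdown",
   "metadata": {},
   "source": "A prediction at the mean is generally not the same number as an average of predictions, because the inverse logit is a curve rather than a straight line. The following is a minimal sketch of that gap using made-up linear predictor values. `eta_toy` is a hypothetical array, not output from the model above, and this is not how smmargins computes its results:",
   "id": "5c1e7a90"
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": "import numpy as np\n\n\ndef sigmoid(z):\n    \"\"\"Inverse logit: map a linear predictor to a probability.\"\"\"\n    return 1.0 / (1.0 + np.exp(-z))\n\n\n# Hypothetical linear predictor values for five observations (illustration only).\neta_toy = np.array([-2.0, -1.0, 0.0, 1.0, 4.0])\n\naap_toy = sigmoid(eta_toy).mean()  # AAP style: predict for each row, then average\napm_toy = sigmoid(eta_toy.mean())  # APM style: average first, then predict once\n\nprint(f\"average of predictions: {aap_toy:.4f}\")\nprint(f\"prediction at the mean: {apm_toy:.4f}\")",
   "id": "e27b4f13"
  },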
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": "The APM answers: \"What is the predicted probability of voting for an average person?\" This is a single representative individual, not an average across the sample. Note that the APM (0.4387) differs slightly from the AAP (0.4521) because the logistic function is nonlinear.\n\n## Adjusted Prediction at Representative values (APR)\n\nThe APR lets you choose specific values for one or more covariates and compute predictions at those profiles. You pass a dictionary to `atexog` where keys are variable names and values are lists of levels:",
   "id": "1928a89e"
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": "M.predict(atexog={\"age\": [25, 45, 65]})",
   "id": "b3b1c647"
  },
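  {
   "cell_type": "markdown",
   "metadata": {},
   "source": "One way to picture a prediction at a representative value: overwrite `age` with the chosen value for every observation, leave the other covariates untouched, predict, and average. Below is a minimal sketch of that recipe on synthetic data, assuming hypothetical coefficients `b0`, `b_age`, and `b_female`. These are not the fitted coefficients, and smmargins may compute its results differently:",
   "id": "9d04b6c2"
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": "import numpy as np\n\nrng_toy = np.random.default_rng(0)\nfemale_toy = rng_toy.integers(0, 2, 1_000)  # observed covariate, left as is\n\n# Hypothetical logit coefficients, chosen only for illustration.\nb0, b_age, b_female = -2.0, 0.04, 0.3\n\n\ndef average_prediction_at_age(age_value):\n    \"\"\"Set age to one value for all rows, predict, then average.\"\"\"\n    linear_predictor = b0 + b_age * age_value + b_female * female_toy\n    return float((1.0 / (1.0 + np.exp(-linear_predictor))).mean())\n\n\napr_toy = {age_value: average_prediction_at_age(age_value) for age_value in (25, 45, 65)}\nfor age_value, estimate in apr_toy.items():\n    print(age_value, round(estimate, 4))",
   "id": "7f3a58de"
  },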
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": "The APR answers: \"What is the predicted probability of voting at ages 25, 45, and 65?\" Each row is a separate prediction at the specified age, with all other variables held at their observed values (averaged in the AAP style).\n\n## Grids with multiple variables\n\nYou can vary multiple variables simultaneously. smmargins expands the combinations into a full grid. Here we vary age in 10-year increments and gender:",
   "id": "819c40ed"
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": "M.predict(atexog={\"age\": list(range(20, 91, 10)), \"female\": [0, 1]})",
   "id": "243cf356"
  },
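  {
   "cell_type": "markdown",
   "metadata": {},
   "source": "Conceptually, a full grid is the Cartesian product of the value lists. Here is a minimal pure-Python sketch of that idea. The names `grid_spec` and `profiles` are illustrative only, and the row order smmargins uses may differ:",
   "id": "b84c2e61"
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": "from itertools import product\n\ngrid_spec = {\"age\": list(range(20, 91, 10)), \"female\": [0, 1]}\n\n# One dictionary per combination of values, in the order product() yields them.\nprofiles = [dict(zip(grid_spec, combo)) for combo in product(*grid_spec.values())]\n\nprint(len(profiles))\nprint(profiles[:3])",
   "id": "2ac97d05"
  },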
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": "This produces 16 rows (8 age levels times 2 gender levels). Each row is a prediction at a specific age-gender profile, with education and income averaged over the sample. This table is ideal for plotting: you can plot `estimate` against `age`, grouping by `female`, and add `conf_low` and `conf_high` as error bands.\n\n## AAP versus APM versus APR: when to use each\n\n| Type | Code | Use when |\n|------|------|----------|\n| AAP | `M.predict()` | You want the average prediction in your sample |\n| APM | `M.predict(at=\"mean\")` | You want a single representative individual |\n| APR | `M.predict(atexog={...})` | You want predictions at specific covariate profiles |\n\n## Recap\n\nIn this tutorial we learned three types of adjusted predictions:\n\n1. **AAP** averages predictions across the observed sample\n2. **APM** predicts at the mean of all covariates\n3. **APR** predicts at user-specified covariate profiles using `atexog`\n\nThe `atexog` parameter accepts dictionaries of variable-value lists and expands them into a full grid. These grid tables are the building blocks for plots, which we cover in Tutorial 6.\n\n## Next steps\n\n- Learn about different types of marginal effects in {doc}`Tutorial 3: Marginal Effects </tutorials/marginal_effects>`\n- See how to turn prediction grids into plots in {doc}`Tutorial 6: Counterfactuals and Plotting </tutorials/counterfactuals_and_plotting>`\n- Read the reference for the {doc}`predict() method </api>`\n- Understand why link functions matter in {doc}`Explanation: Scales and Link Functions </explanations/scales_link_functions>`\n- Learn about advanced counterfactual scenarios in {doc}`How-To: Counterfactual Predictions </howto/counterfactual_predictions>`",
   "id": "4aaf52d3"
  }
 ],
 "metadata": {
  "kernelspec": {
   "display_name": "Python 3",
   "language": "python",
   "name": "python3"
  },
  "language_info": {
   "name": "python"
  }
 },
 "nbformat": 4,
 "nbformat_minor": 5
}