{
 "cells": [
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": "# Tutorial 6: Counterfactual Predictions and Plotting\n\nIn previous tutorials we computed predictions and marginal effects at specific covariate profiles using `atexog`. This tutorial introduces two additional tools:\n\n- The `values=` DSL for specifying counterfactual scenarios with reducers, callables, and `Expr` transforms.\n- Built-in plotting functions (`plot_predictions`, `plot_slopes`, `plot_comparisons`) that render prediction curves with correct confidence bands.\n\n## What you will learn\n\n- `values=` with percentile reducers (`\"p25\"`, `\"p50\"`, `\"p75\"`)\n- `values=` with `Expr` for mathematical counterfactuals\n- `Margins.contrast` for joint-SE treatment comparisons\n- `plot_predictions`, `plot_slopes`, `plot_comparisons`\n- Simultaneous vs pointwise confidence bands",
   "id": "97c2680f"
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": "## Setup\n\nWe continue with the same voter-turnout model from earlier tutorials.",
   "id": "7ec1ec11"
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": "import numpy as np\nimport pandas as pd\nimport matplotlib.pyplot as plt\nimport statsmodels.formula.api as smf\nfrom smmargins import Margins, Expr\nfrom smmargins.plot import plot_predictions, plot_slopes, plot_comparisons\n\nrng = np.random.default_rng(7)\nN = 5_000\ndf = pd.DataFrame({\n    \"age\":    rng.normal(45, 12, N).clip(18, 90),\n    \"income\": rng.lognormal(10.5, 0.4, N),\n    \"educ\":   rng.choice([\"hs\", \"college\", \"grad\"], N, p=[0.4, 0.4, 0.2]),\n    \"female\": rng.integers(0, 2, N),\n})\neta = (-4.0 + 0.05 * df[\"age\"] + 0.00001 * df[\"income\"]\n       + 0.8 * (df[\"educ\"] == \"college\") + 1.4 * (df[\"educ\"] == \"grad\")\n       + 0.3 * df[\"female\"] - 0.0004 * df[\"age\"] * df[\"female\"])\ndf[\"voted\"] = (rng.uniform(0, 1, N) < 1 / (1 + np.exp(-eta))).astype(int)\n\nfit = smf.logit(\"voted ~ age + income + C(educ) + female + age:female\", data=df).fit(disp=False)\nM = Margins(fit)",
   "id": "2ecad0a4"
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": "## Counterfactual predictions with `values=`\n\nThe `values=` parameter is a per-variable DSL. For each key you can supply:\n\n| Kind | Example | Effect |\n|------|---------|--------|\n| Scalar | `{\"age\": 45}` | Fix `age` at 45 for all rows |\n| Sequence | `{\"age\": [30, 45, 60]}` | Cartesian-product grid over three ages |\n| Reducer string | `{\"age\": \"mean\"}` | Fix at the column mean |\n| Percentile | `{\"income\": \"p25\"}` | Fix at the 25th percentile |\n| Callable | `{\"income\": lambda df: df.income * 1.1}` | Row-varying transform |\n| `Expr` | `{\"income\": Expr(\"income * 1.1\")}` | Same via `df.eval` |\n\n`default_values=` controls how unspecified columns are handled (default: `\"asobserved\"`).\n\n### Percentile reducers\n\nPredict at the 25th, 50th, and 75th income percentiles, leaving all other variables as observed:",
   "id": "096399af"
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": "for pct in [25, 50, 75]:\n    res = M.predict(values={\"income\": f\"p{pct}\"})\n    q_val = float(np.percentile(df[\"income\"], pct))\n    print(f\"income p{pct} (${q_val:,.0f}): P(vote) = {res.estimate[0]:.4f} \"\n          f\"[{res.ci_lower[0]:.4f}, {res.ci_upper[0]:.4f}]\")",
   "id": "6b4f68ba"
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": "To hold all other numeric variables at their mean, use `default_values=\"mean\"`.\n\nA sequence in `values=` creates a Cartesian-product grid \u2014 here, three income grid points with everything else at the mean:",
   "id": "7809989c"
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": "income_grid = [float(np.percentile(df[\"income\"], p)) for p in [25, 50, 75]]\nres_grid = M.predict(values={\"income\": income_grid}, default_values=\"mean\")\nfor lab, est, lo, hi in zip(res_grid.labels, res_grid.estimate, res_grid.ci_lower, res_grid.ci_upper):\n    print(f\"{lab}: {est:.4f} [{lo:.4f}, {hi:.4f}]\")",
   "id": "eba528e7"
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": "### Mathematical counterfactuals with `Expr`\n\n`Expr` evaluates a pandas expression against the data frame row by row. Here we ask: what if everyone's income were 10% higher?",
   "id": "cdbad464"
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": "res_baseline = M.predict()\nres_10pct    = M.predict(values={\"income\": Expr(\"income * 1.10\")})\n\nprint(f\"Baseline:     {res_baseline.estimate[0]:.4f}\")\nprint(f\"+10% income:  {res_10pct.estimate[0]:.4f}\")\nprint(f\"Increase:     {res_10pct.estimate[0] - res_baseline.estimate[0]:+.4f}\")",
   "id": "845085ea"
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": "### Joint-SE contrasts with `Margins.contrast`\n\nNa\u00efvely subtracting two predictions loses the off-diagonal covariance between arms (they share the same $\\hat{\\beta}$). `Margins.contrast` computes the PATE with the correct joint SE:",
   "id": "2f826050"
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": "joint = M.contrast(a={\"female\": 1}, b={\"female\": 0})\nprint(joint.summary())\n# Row 3 (A - B) is the gender gap with the correct joint SE.",
   "id": "c2d9c6f4"
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": "## Plotting: `plot_predictions`\n\n`plot_predictions` auto-grids the conditioning variable over its observed range (50 points for numerics), computes predictions at each grid point, and renders the curve with a 95% pointwise CI.",
   "id": "5c9dd9bc"
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": "fig, ax = plot_predictions(M, \"age\")\nax.set_title(\"Predicted P(voted) across age\");",
   "id": "1c413b2a"
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": "### Faceting with `by=`\n\nPass a categorical column to `by=` for one line per level:",
   "id": "43e387ff"
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": "fig, ax = plot_predictions(M, \"age\", by=\"educ\")\nax.set_title(\"Predicted P(voted) across age, by education\");",
   "id": "1f559fe4"
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": "### Simultaneous vs pointwise confidence bands\n\nThe default CI is **pointwise**: each grid point has 95% coverage individually. For claims about the curve as a whole, use a **simultaneous** band. `ci_method=\"sup-t\"` is tightest but requires a draw-based VCE.",
   "id": "58cc2356"
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": "fig, axes = plt.subplots(1, 2, figsize=(11, 4), sharey=True)\n\nplot_predictions(M, \"age\", ax=axes[0])\naxes[0].set_title(\"Pointwise 95% CI\")\n\nplot_predictions(\n    M, \"age\",\n    vce=\"simulation\", n_sims=2000, sim_seed=0,\n    ci_method=\"sup-t\",\n    ax=axes[1],\n)\naxes[1].set_title(\"Simultaneous (sup-t) 95% band\")\nfig.tight_layout();",
   "id": "b9c964b9"
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": "## Plotting slopes: `plot_slopes`\n\nThe first argument is the variable whose slope (`dy/dx`) you want; the second is the x-axis conditioning variable.",
   "id": "5abfc6ea"
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": "fig, ax = plot_slopes(M, \"age\", \"income\")\nax.set_title(\"Marginal effect of age, across income\");",
   "id": "e50c3624"
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": "## Plotting contrasts: `plot_comparisons`\n\n`plot_comparisons` plots a discrete comparison (factor levels or a unit step for numerics) across a conditioning variable. It calls `Margins.contrast` internally \u2014 the CI uses the joint SE, not the na\u00efve sum of individual SEs.",
   "id": "fffb4833"
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": "fig, ax = plot_comparisons(M, \"female\", condition=\"age\")\nax.set_title(\"Gender gap in P(voted), across age\");",
   "id": "ae402197"
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": "### Custom grids\n\nPass a dict for `condition` to override the auto-grid:",
   "id": "7d61b3a9"
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": "fig, ax = plot_predictions(M, {\"age\": np.arange(20, 80, 5)})\nax.set_title(\"Predicted P(voted), 5-year age steps\");",
   "id": "2af97c5c"
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": "## Recap\n\n| Feature | Call |\n|---------|------|\n| Percentile reducer | `M.predict(values={\"x\": \"p25\"})` |\n| Eval transform | `M.predict(values={\"x\": Expr(\"x * 1.1\")})` |\n| Grid + default mean | `M.predict(values={\"x\": [1, 2, 3]}, default_values=\"mean\")` |\n| Joint-SE contrast | `M.contrast(a={\"treat\": 1}, b={\"treat\": 0})` |\n| Prediction curve | `plot_predictions(M, \"x\")` |\n| Slope curve | `plot_slopes(M, \"x1\", \"x2\")` |\n| Contrast curve | `plot_comparisons(M, \"treat\", condition=\"x\")` |\n| Simultaneous band | `plot_predictions(M, \"x\", vce=\"simulation\", ci_method=\"sup-t\")` |\n\n## See also\n\n- How-to: {doc}`Counterfactual predictions </howto/counterfactual_predictions>`\n- Explanation: {doc}`Why contrast uses joint covariance </explanations/contrast_joint_covariance>`\n- API: `smmargins.Margins`, `smmargins.plot`",
   "id": "7a780601"
  }
 ],
 "metadata": {
  "kernelspec": {
   "display_name": "Python 3",
   "language": "python",
   "name": "python3"
  },
  "language_info": {
   "name": "python"
  }
 },
 "nbformat": 4,
 "nbformat_minor": 5
}