{ "cells": [ { "cell_type": "markdown", "metadata": {}, "source": "# Tutorial 3: Marginal Effects\n\nIn Tutorial 1 we computed a basic average marginal effect for age. This tutorial goes deeper. We will learn how to compute marginal effects for continuous and categorical variables, how to evaluate them at different points, and how to use them with interaction terms.\n\n## What you will learn\n\n- Average Marginal Effect (AME), Marginal Effect at the Mean (MEM), and Marginal Effect at Representative values (MER)\n- Discrete contrasts for categorical variables\n- Discrete change for dummy variables\n- Subgroup marginal effects via `atexog`\n\n## Setup\n\nWe continue with the same fitted model and `Margins` object from the previous tutorials:", "id": "270f55ff" }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": "import numpy as np\nimport pandas as pd\nimport statsmodels.formula.api as smf\nfrom smmargins import Margins\n\nrng = np.random.default_rng(7)\nN = 5_000\ndf = pd.DataFrame({\n \"age\": rng.normal(45, 12, N).clip(18, 90),\n \"income\": rng.lognormal(10.5, 0.4, N),\n \"educ\": rng.choice([\"hs\", \"college\", \"grad\"], N, p=[0.4, 0.4, 0.2]),\n \"female\": rng.integers(0, 2, N),\n})\neta = (-4.0 + 0.05 * df[\"age\"] + 0.00001 * df[\"income\"]\n + 0.8 * (df[\"educ\"] == \"college\") + 1.4 * (df[\"educ\"] == \"grad\")\n + 0.3 * df[\"female\"] - 0.0004 * df[\"age\"] * df[\"female\"])\ndf[\"voted\"] = (rng.uniform(0, 1, N) < 1 / (1 + np.exp(-eta))).astype(int)\n\nfit = smf.logit(\"voted ~ age + income + C(educ) + female + age:female\", data=df).fit(disp=False)\nM = Margins(fit)", "id": "5db79f9a" }, { "cell_type": "markdown", "metadata": {}, "source": "## Continuous variables: AME, MEM, and MER\n\n### Average Marginal Effect (AME)\n\nThe AME is the default. 
It computes the marginal effect at each observation using that observation's actual covariate values, then averages:", "id": "12c47b95" }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": "M.dydx(\"age\").summary()", "id": "2d887851" }, { "cell_type": "markdown", "metadata": {}, "source": "The AME answers: \"Averaged across our sample, how much does the predicted probability of voting change for a one-year increase in age?\"\n\n### Marginal Effect at the Mean (MEM)\n\nThe MEM sets all covariates to their means (or reference levels) and computes the marginal effect at that single point:", "id": "15414f32" }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": "M.dydx(\"age\", at=\"mean\").summary()", "id": "4a92705d" }, { "cell_type": "markdown", "metadata": {}, "source": "The MEM answers: \"What is the effect of age for a hypothetical individual with average covariate values?\"\n\n### Marginal Effect at Representative values (MER)\n\nThe MER lets you choose specific values. Here we compute the marginal effect of age at ages 25, 45, and 65:", "id": "536b2b7f" }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": "M.dydx(\"age\", atexog={\"age\": [25, 45, 65]}).summary()", "id": "960ccba8" }, { "cell_type": "markdown", "metadata": {}, "source": "The MER answers: \"How does the effect of age differ at ages 25, 45, and 65?\" The effect changes with age because, in a logit model, the marginal effect equals the slope of the linear predictor multiplied by p(1 - p), where p is the predicted probability. It is largest near p = 0.5 and shrinks as p approaches 0 or 1, where the logistic curve flattens.\n\n## Categorical variables: discrete contrasts\n\nFor categorical variables like `educ`, `dydx()` computes discrete contrasts between levels. 
By default it compares each level to the reference level. Because `C(educ)` uses treatment coding, the reference is the first level alphabetically, `\"college\"` in our case:", "id": "16b3ff77" }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": "M.dydx(\"educ\").summary()", "id": "3d0570c8" }, { "cell_type": "markdown", "metadata": {}, "source": "Each row is the difference in predicted probability between an education level and the reference level, averaged across the sample.\n\nYou can change the reference level. With `\"hs\"` as the reference, the contrasts are expressed relative to a high school diploma; for example, individuals with a graduate degree have a predicted probability of voting that is about 14.2 percentage points higher than those with a high school diploma:", "id": "0aa872f3" }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": "M.dydx(\"educ\", reference=\"hs\").summary()", "id": "e471f996" }, { "cell_type": "markdown", "metadata": {}, "source": "## Binary variables: discrete change\n\nFor binary (dummy) variables like `female`, `dydx()` computes the discrete change, that is, the difference in predicted probability when `female` moves from 0 to 1:", "id": "fde37c31" }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": "M.dydx(\"female\").summary()", "id": "7dbfb205" }, { "cell_type": "markdown", "metadata": {}, "source": "Averaged across the sample, being female is associated with a 1.8 percentage point decrease in the predicted probability of voting. This is a discrete change, not a derivative.\n\n## Subgroup marginal effects with interactions\n\nOur model includes an interaction between `age` and `female`. 
We can compute the marginal effect of age separately for men and women using `atexog`:", "id": "6d4cb944" }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": "M.dydx(\"age\", atexog={\"female\": [0, 1]}).summary()", "id": "598ec595" }, { "cell_type": "markdown", "metadata": {}, "source": "The marginal effect of age is slightly larger for men (0.0128) than for women (0.0122). The difference is small; the `age:female` interaction term is what allows the slope on age to differ by gender.\n\n`atexog` also works when the variable of interest is categorical. Here we compute the education contrasts separately by gender:", "id": "10e0b874" }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": "M.dydx(\"educ\", atexog={\"female\": [0, 1]}).summary()", "id": "f28432f6" }, { "cell_type": "markdown", "metadata": {}, "source": "## Summary table: when to use each type\n\n| Type | Code | Best for |\n|------|------|----------|\n| AME | `M.dydx(\"age\")` | Reporting a single sample-averaged effect |\n| MEM | `M.dydx(\"age\", at=\"mean\")` | Describing the effect at the average covariate profile |\n| MER | `M.dydx(\"age\", atexog={\"age\": [...]})` | Showing how the effect varies across chosen values |\n\n## Recap\n\nIn this tutorial we covered:\n\n1. **AME**: average marginal effect across the sample\n2. **MEM**: marginal effect at the mean covariate profile\n3. **MER**: marginal effect at user-specified values\n4. **Discrete contrasts** for categorical variables like `educ`\n5. **Discrete change** for binary variables like `female`\n6. 
**Subgroup effects** using `atexog` to condition on interaction partners\n\n## Next steps\n\n- Learn how to obtain different kinds of standard errors in {doc}`Tutorial 4: Inference and Standard Errors `\n- Read the reference for the {doc}`dydx() method `\n- Learn about elasticities in {doc}`How-To: Elasticities `\n- Learn about subgroup-specific marginal effects in {doc}`How-To: Subgroup-Specific Marginal Effects `\n- Learn about custom transforms and scales in {doc}`How-To: Custom Transforms and Scales `\n- Understand the distinction between discrete and continuous approaches in {doc}`Explanation: Discrete vs Continuous `", "id": "f04a1b15" } ], "metadata": { "kernelspec": { "display_name": "Python 3", "language": "python", "name": "python3" }, "language_info": { "name": "python" } }, "nbformat": 4, "nbformat_minor": 5 }