{ "cells": [ { "cell_type": "markdown", "metadata": {}, "source": "# Tutorial 5: Difference-in-Differences\n\nDifference-in-differences (DiD) is a popular design for causal inference. In a 2\u00d72 DiD, we compare the change in outcomes for a treated group to the change for a control group. This tutorial shows how to compute a DiD estimate from a nonlinear (logit) model using `smmargins`.\n\n## What you will learn\n\n- How to set up a DiD design with a logit outcome model\n- How to compute cell predictions, simple effects, and the DiD contrast\n- How to compute profile-specific DiD estimates\n- How to interpret the results\n\n## The DiD design\n\nIn our example, we have:\n\n- `group`: treatment group (`\"A\"` = control, `\"B\"` = treated)\n- `preexist_Y`: pre-existing condition indicator (0 = without condition, 1 = with condition)\n- `condition_X`: binary outcome (presence of a health condition)\n- `age`: age in years\n- `female`: gender indicator (1 = female, 0 = male)\n\nThe pre-existing condition (`preexist_Y`) serves as our \"time\" dimension: individuals with `preexist_Y = 1` represent the \"post\" period, and those with `preexist_Y = 0` represent the \"pre\" period. 
The interaction between `group` and `preexist_Y` captures the DiD effect.\n\n## Step 1: Set up the data and model", "id": "4e7b8fe9" }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": "import numpy as np\nimport pandas as pd\nimport statsmodels.formula.api as smf\nfrom smmargins import Margins\n\n# Simulate a 2x2 design: group (A/B) by pre-existing condition (0/1), plus covariates\nrng = np.random.default_rng(42)\nN = 6_000\ndf_did = pd.DataFrame({\n    \"group\": rng.choice([\"A\", \"B\"], N, p=[0.55, 0.45]),\n    \"preexist_Y\": rng.integers(0, 2, N),\n    \"age\": rng.normal(55, 15, N).clip(18, 95),\n    \"female\": rng.integers(0, 2, N),\n})\n\n# Linear predictor on the logit scale; the 0.8 term is the true group-by-condition interaction\neta = (-3.5 + 0.04*df_did[\"age\"] - 0.3*df_did[\"female\"]\n       + 0.5*(df_did[\"group\"]==\"B\") + 1.1*df_did[\"preexist_Y\"]\n       + 0.8*(df_did[\"group\"]==\"B\")*df_did[\"preexist_Y\"])\n\n# Draw the binary outcome from the implied probabilities\ndf_did[\"condition_X\"] = (rng.uniform(0,1,N) < 1/(1+np.exp(-eta))).astype(int)\n\n# Fit the logit model with the interaction, then wrap the fitted result\nfit_did = smf.logit(\"condition_X ~ C(group) + preexist_Y + C(group):preexist_Y + age + female\",\n                    data=df_did).fit(disp=False)\nM_did = Margins(fit_did)", "id": "2a6ab9cc" }, { "cell_type": "markdown", "metadata": {}, "source": "The model includes an interaction between group and pre-existing condition. The coefficient on `C(group)[T.B]:preexist_Y` is this interaction on the logit (log-odds) scale; the steps below translate it into a DiD on the probability scale.\n\n## Step 2: Compute the DiD\n\nCall the `did()` method with the group variable, the condition (\"time\") variable, and their levels:", "id": "71ff0604" }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": "did = M_did.did(\"group\", \"preexist_Y\",\n                group_levels=[\"A\", \"B\"],\n                condition_levels=[0, 1])", "id": "ea28cf85" }, { "cell_type": "markdown", "metadata": {}, "source": "The `did` object contains three components:\n\n1. `did.cells`: Predictions in each of the four cells\n2. `did.simple_effects`: First differences (simple effects)\n3. 
`did.did`: The difference-in-differences contrast\n\n## Step 3: View the four cells", "id": "4bafafe3" }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": "did.cells.summary()", "id": "6c985738" }, { "cell_type": "markdown", "metadata": {}, "source": "These are the predicted probabilities of `condition_X` in each of the four group-condition combinations. Notice that the predicted probability is higher for those with `preexist_Y = 1` in both groups, and higher in group B overall.\n\n## Step 4: View the simple effects\n\nThe simple effects show the change from `preexist_Y = 0` to `preexist_Y = 1` within each group:", "id": "b144f465" }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": "did.simple_effects.summary()", "id": "94652d87" }, { "cell_type": "markdown", "metadata": {}, "source": "In group A (control), the predicted probability increases by 25.5 percentage points. In group B (treated), it increases by 33.9 percentage points.\n\n## Step 5: View the DiD contrast\n\nThe DiD contrast subtracts the simple effect in group A from the simple effect in group B:", "id": "8489a8cd" }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": "did.did.summary()", "id": "b9de9d8a" }, { "cell_type": "markdown", "metadata": {}, "source": "The DiD estimate is 0.0835. This means that the increase in the predicted probability of `condition_X` associated with having `preexist_Y = 1` is 8.35 percentage points larger in group B than in group A. Under the parallel trends assumption, here taken to hold on the probability scale, this is the causal effect of interest.\n\n## Step 6: Profile-specific DiD\n\nBecause the logit model is nonlinear, the DiD estimate can vary by covariate profile. 
We can compute the DiD at specific values of age and gender using `atexog`:", "id": "e2725fa6" }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": "did_profile = M_did.did(\"group\", \"preexist_Y\",\n group_levels=[\"A\", \"B\"],\n condition_levels=[0, 1],\n atexog={\"age\": 60, \"female\": 0})", "id": "8082c9cf" }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": "did_profile.cells.summary()", "id": "f1d4cbf1" }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": "did_profile.did.summary()", "id": "9cae433c" }, { "cell_type": "markdown", "metadata": {}, "source": "For a 60-year-old male, the DiD estimate is 0.0938, slightly larger than the population-averaged estimate of 0.0835. This illustrates how the treatment effect can vary across covariate profiles.\n\n## Summary: the DiD workflow\n\n1. Fit a model with the group-condition interaction\n2. Call `M.did()` with group and condition variables\n3. Inspect `cells`, `simple_effects`, and `did` tables\n4. Use `atexog` for profile-specific estimates\n\n## Recap\n\nIn this tutorial we:\n\n1. Set up a 2\u00d72 DiD design with a logit outcome model\n2. Computed predictions in the four cells\n3. Computed simple effects (first differences) within each group\n4. Computed the DiD contrast (difference of differences)\n5. 
Showed how to get profile-specific DiD estimates with `atexog`\n\n## Next steps\n\n- Learn about counterfactual predictions and plotting in {doc}`Tutorial 6: Counterfactuals and Plotting `\n- Read the reference for the {doc}`did() method ` and {doc}`DiD result object `\n- Understand the Ai-Norton approach to DiD on nonlinear models in {doc}`Explanation: Ai-Norton DiD `\n- Learn about joint tests and pairwise comparisons in {doc}`How-To: Joint Tests and Pairwise Comparisons `", "id": "4e8c1594" } ], "metadata": { "kernelspec": { "display_name": "Python 3", "language": "python", "name": "python3" }, "language_info": { "name": "python" } }, "nbformat": 4, "nbformat_minor": 5 }