{ "cells": [ { "cell_type": "markdown", "metadata": {}, "source": "# Tutorial 1: Getting Started with smmargins\n\nWelcome to smmargins. In this tutorial we will install the package, fit a logistic regression model, and compute our first adjusted predictions and marginal effects. No prior knowledge of marginal effects is assumed.\n\n## What you will learn\n\n- How to install smmargins\n- How to create a `Margins` object from a fitted StatsModels model\n- How to compute average adjusted predictions\n- How to compute average marginal effects\n- How to read a summary table\n\n## Step 1: Installation\n\nInstall smmargins from PyPI:", "id": "daa91bab" }, { "cell_type": "markdown", "metadata": {}, "source": "```\npip install smmargins\n```", "id": "47008e37" }, { "cell_type": "markdown", "metadata": {}, "source": "This tutorial also requires StatsModels and pandas. Both are installed automatically as dependencies of smmargins, so the following command is only needed if either one is missing from your environment:", "id": "8b33a6d3" }, { "cell_type": "markdown", "metadata": {}, "source": "```\npip install statsmodels pandas\n```", "id": "ca2e709f" }, { "cell_type": "markdown", "metadata": {}, "source": "## Step 2: Set up the data and fit a model\n\nWe will use a simulated dataset about voter turnout. 
The dataset contains five variables:\n\n- `age`: age in years\n- `income`: annual income in dollars\n- `educ`: education level (`\"hs\"`, `\"college\"`, or `\"grad\"`)\n- `female`: binary indicator (0 = male, 1 = female)\n- `voted`: binary outcome (0 = did not vote, 1 = voted)\n\nLet us simulate the data and fit a logistic regression model:", "id": "4e0ede63" }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": "import numpy as np\nimport pandas as pd\nimport statsmodels.formula.api as smf\nfrom smmargins import Margins\n\nrng = np.random.default_rng(7)\nN = 5_000\ndf = pd.DataFrame({\n \"age\": rng.normal(45, 12, N).clip(18, 90),\n \"income\": rng.lognormal(10.5, 0.4, N),\n \"educ\": rng.choice([\"hs\", \"college\", \"grad\"], N, p=[0.4, 0.4, 0.2]),\n \"female\": rng.integers(0, 2, N),\n})\neta = (-4.0 + 0.05 * df[\"age\"] + 0.00001 * df[\"income\"]\n + 0.8 * (df[\"educ\"] == \"college\") + 1.4 * (df[\"educ\"] == \"grad\")\n + 0.3 * df[\"female\"] - 0.0004 * df[\"age\"] * df[\"female\"])\ndf[\"voted\"] = (rng.uniform(0, 1, N) < 1 / (1 + np.exp(-eta))).astype(int)\n\nfit = smf.logit(\"voted ~ age + income + C(educ) + female + age:female\", data=df).fit(disp=False)", "id": "f3f71577" }, { "cell_type": "markdown", "metadata": {}, "source": "The model estimates the probability of voting as a function of age, income, education, gender, and an interaction between age and female. The `disp=False` argument suppresses convergence output.\n\n## Step 3: Create a Margins object\n\nAll post-estimation work in smmargins starts with a `Margins` object. We pass our fitted model to it:", "id": "9a3a3e33" }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": "M = Margins(fit)", "id": "1c21997d" }, { "cell_type": "markdown", "metadata": {}, "source": "The `Margins` object stores the fitted model and prepares the data for computing predictions and marginal effects. 
We will use this same `M` object throughout the remaining tutorials.\n\n## Step 4: Compute average adjusted predictions\n\nAn adjusted prediction is the model's predicted probability of voting for a given set of covariate values. Averaging the predictions for all observations in the dataset gives the Average Adjusted Prediction (AAP), which `predict()` returns by default:", "id": "7cd7525f" }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": "M.predict()", "id": "17b813a8" }, { "cell_type": "markdown", "metadata": {}, "source": "The output is a DataFrame with one row. The `estimate` column shows that the average predicted probability of voting across all 5,000 simulated individuals is about 0.45. The `std_err`, `conf_low`, and `conf_high` columns quantify the uncertainty around that estimate.\n\n## Step 5: Compute a marginal effect\n\nA marginal effect tells us how the predicted probability of voting changes when a predictor changes. For a continuous variable like `age`, `dydx(\"age\")` computes the average marginal effect (AME): the average rate of change in the predicted probability with respect to age, evaluated at each observation's actual covariate values:", "id": "d70289e7" }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": "M.dydx(\"age\")", "id": "714b5f8c" }, { "cell_type": "markdown", "metadata": {}, "source": "The estimate of 0.0125 means that, on average across the sample, a one-year increase in age is associated with approximately a 1.25 percentage point increase in the probability of voting. 
This is an average across all individuals, each evaluated at their own values of income, education, and gender.\n\n## Step 6: View a formatted summary\n\nFor a cleaner view, call `.summary()` on any result:", "id": "ca60c9be" }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": "M.dydx(\"age\").summary()", "id": "7506951d" }, { "cell_type": "markdown", "metadata": {}, "source": "The summary method formats the output for readability while preserving all the information.\n\n## Step 7: Try another variable\n\nLet us also look at the effect of gender, which is a binary variable:", "id": "001d3ea9" }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": "M.dydx(\"female\").summary()", "id": "4eac8e3c" }, { "cell_type": "markdown", "metadata": {}, "source": "For binary variables, the contrast is `1 - 0`, showing the change in predicted probability when moving from male to female. The simulation gives women higher log-odds of voting at every age (`0.3 - 0.0004 * age` is positive throughout the 18 to 90 age range), so the estimate should be positive, indicating that being female is associated with a higher predicted probability of voting in this simulated dataset.\n\n## Recap\n\nIn this tutorial we:\n\n1. Installed smmargins\n2. Simulated a voter turnout dataset\n3. Fit a logistic regression model\n4. Created a `Margins` object\n5. Computed an average adjusted prediction\n6. Computed average marginal effects for a continuous variable (`age`) and a binary variable (`female`)\n7. 
Viewed formatted summaries\n\nThese three operations, `Margins()`, `predict()`, and `dydx()`, form the foundation of everything that follows.\n\n## Next steps\n\n- Learn the different types of adjusted predictions in {doc}`Tutorial 2: Adjusted Predictions `\n- Explore the full capabilities of marginal effects in {doc}`Tutorial 3: Marginal Effects `\n- Consult the reference documentation for the {doc}`Margins class ` and the {doc}`dydx() method `\n- Learn about robust and clustered standard errors in {doc}`How-To: Robust and Clustered Standard Errors `\n- Verify your results against R in {doc}`How-To: Verify Against R `", "id": "16397c8f" } ], "metadata": { "kernelspec": { "display_name": "Python 3", "language": "python", "name": "python3" }, "language_info": { "name": "python" } }, "nbformat": 4, "nbformat_minor": 5 }