{
 "cells": [
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": "# Tutorial 1: Getting Started with smmargins\n\nWelcome to smmargins. In this tutorial we will install the package, fit a logistic regression model, and compute our first adjusted predictions and marginal effects. No prior knowledge of marginal effects is assumed.\n\n## What you will learn\n\n- How to install smmargins\n- How to create a `Margins` object from a fitted StatsModels model\n- How to compute average adjusted predictions\n- How to compute average marginal effects\n- How to read a summary table\n\n## Step 1: Installation\n\nInstall smmargins from PyPI:",
   "id": "daa91bab"
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": "```\npip install smmargins\n```",
   "id": "47008e37"
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": "This tutorial also requires StatsModels and pandas, which are installed automatically as dependencies:",
   "id": "8b33a6d3"
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": "```\npip install statsmodels pandas\n```",
   "id": "ca2e709f"
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": "## Step 2: Set up the data and fit a model\n\nWe will use a simulated dataset about voter turnout. The dataset contains five variables:\n\n- `age`: age in years\n- `income`: annual income in dollars\n- `educ`: education level (`\"hs\"`, `\"college\"`, or `\"grad\"`)\n- `female`: binary indicator (0 = male, 1 = female)\n- `voted`: binary outcome (0 = did not vote, 1 = voted)\n\nLet us simulate the data and fit a logistic regression model:",
   "id": "4e0ede63"
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": "import numpy as np\nimport pandas as pd\nimport statsmodels.formula.api as smf\nfrom smmargins import Margins\n\nrng = np.random.default_rng(7)\nN = 5_000\ndf = pd.DataFrame({\n    \"age\":    rng.normal(45, 12, N).clip(18, 90),\n    \"income\": rng.lognormal(10.5, 0.4, N),\n    \"educ\":   rng.choice([\"hs\", \"college\", \"grad\"], N, p=[0.4, 0.4, 0.2]),\n    \"female\": rng.integers(0, 2, N),\n})\neta = (-4.0 + 0.05 * df[\"age\"] + 0.00001 * df[\"income\"]\n       + 0.8 * (df[\"educ\"] == \"college\") + 1.4 * (df[\"educ\"] == \"grad\")\n       + 0.3 * df[\"female\"] - 0.0004 * df[\"age\"] * df[\"female\"])\ndf[\"voted\"] = (rng.uniform(0, 1, N) < 1 / (1 + np.exp(-eta))).astype(int)\n\nfit = smf.logit(\"voted ~ age + income + C(educ) + female + age:female\", data=df).fit(disp=False)",
   "id": "f3f71577"
  },
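   {
    "cell_type": "markdown",
    "metadata": {},
    "source": "To make the simulation concrete, here is a minimal sketch that plugs one hypothetical person into the linear predictor above and converts the log-odds to a probability. The helper `true_vote_probability` and the example person are illustrative and not part of smmargins. The coefficients are copied from the simulation code, so this shows the true data-generating probabilities rather than the fitted model's estimates:",
    "id": "b7e41c02"
   },
   {
    "cell_type": "code",
    "execution_count": null,
    "metadata": {},
    "outputs": [],
    "source": "import math\n\n\ndef true_vote_probability(age, income, educ, female):\n    \"\"\"Probability of voting under the simulation's true coefficients.\"\"\"\n    # Education shifts relative to the \"hs\" group, as in the simulation code\n    educ_shift = {\"hs\": 0.0, \"college\": 0.8, \"grad\": 1.4}[educ]\n    eta = (-4.0 + 0.05 * age + 0.00001 * income + educ_shift\n           + 0.3 * female - 0.0004 * age * female)\n    # Inverse logit: convert log-odds to a probability\n    return 1 / (1 + math.exp(-eta))\n\n\n# Hypothetical people: 45 years old, college-educated, earning $40,000\np_woman = true_vote_probability(45, 40_000, \"college\", 1)\np_man = true_vote_probability(45, 40_000, \"college\", 0)\nprint(f\"woman: {p_woman:.3f}, man: {p_man:.3f}\")",
    "id": "c93d5fa1"
   },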
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": "The model estimates the probability of voting as a function of age, income, education, gender, and an interaction between age and female. The `disp=False` argument suppresses convergence output.\n\n## Step 3: Create a Margins object\n\nAll post-estimation work in smmargins starts with a `Margins` object. We pass our fitted model to it:",
   "id": "9a3a3e33"
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": "M = Margins(fit)",
   "id": "1c21997d"
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": "The `Margins` object stores the fitted model and prepares the data for computing predictions and marginal effects. We will use this same `M` object throughout the remaining tutorials.\n\n## Step 4: Compute average adjusted predictions\n\nAn adjusted prediction is the predicted probability of voting for each observation, averaged over the dataset. By default, `predict()` returns the Average Adjusted Prediction (AAP):",
   "id": "7cd7525f"
  },
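   {
    "cell_type": "markdown",
    "metadata": {},
    "source": "Written out for the logit model fitted above, the AAP is the sample mean of the fitted probabilities. The symbols $N$ (number of observations), $x_i$ (covariate row of observation $i$), and $\\hat\\beta$ (fitted coefficients) are notation introduced here for illustration, not smmargins names:\n\n$$\\text{AAP} = \\frac{1}{N}\\sum_{i=1}^{N} \\hat p_i, \\qquad \\hat p_i = \\frac{1}{1 + \\exp(-x_i^\\top \\hat\\beta)}$$\n\nThe call below returns an estimate of this quantity:",
    "id": "d2a8f6e4"
   },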
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": "M.predict()",
   "id": "17b813a8"
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": "The output is a DataFrame with one row. The `estimate` column shows that the average predicted probability of voting across all 5,000 simulated individuals is about 0.45. The `std_err`, `conf_low`, and `conf_high` columns quantify the uncertainty around that estimate.\n\n## Step 5: Compute a marginal effect\n\nA marginal effect tells us how the predicted probability of voting changes when a predictor changes. For a continuous variable like `age`, `dydx(\"age\")` computes the average marginal effect (AME): the average rate of change in the predicted probability with respect to age, evaluated at each observation's actual covariate values:",
   "id": "d70289e7"
  },
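   {
    "cell_type": "markdown",
    "metadata": {},
    "source": "As a sketch of what this average looks like for the model above, the chain rule applied to the logistic function gives the expression below. Because the formula includes `age:female`, the slope on age differs between men and women. The notation ($\\hat p_i$ for the fitted probability, $\\hat\\beta$ for fitted coefficients) is introduced here for illustration, and smmargins may compute the derivative numerically rather than with this closed form:\n\n$$\\text{AME}_{\\text{age}} = \\frac{1}{N}\\sum_{i=1}^{N} \\left(\\hat\\beta_{\\text{age}} + \\hat\\beta_{\\text{age:female}}\\,\\text{female}_i\\right)\\hat p_i\\,(1-\\hat p_i)$$\n\nThe call below reports an estimate of this average:",
    "id": "e5b90a37"
   },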
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": "M.dydx(\"age\")",
   "id": "714b5f8c"
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": "The estimate of 0.0125 means that, on average across the sample, a one-year increase in age is associated with approximately a 1.25 percentage point increase in the probability of voting. This is an average across all individuals, each evaluated at their own values of income, education, and gender.\n\n## Step 6: View a formatted summary\n\nFor a cleaner view, call `.summary()` on any result:",
   "id": "ca60c9be"
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": "M.dydx(\"age\").summary()",
   "id": "7506951d"
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": "The summary method formats the output for readability while preserving all the information.\n\n## Step 7: Try another variable\n\nLet us also look at the effect of gender, which is a binary variable:",
   "id": "001d3ea9"
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": "M.dydx(\"female\").summary()",
   "id": "4eac8e3c"
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": "For binary variables, the contrast is `1 - 0`, showing the change in predicted probability when moving from male to female. Here the estimate is negative, indicating that being female is associated with a lower predicted probability of voting in this simulated dataset.\n\n## Recap\n\nIn this tutorial we:\n\n1. Installed smmargins\n2. Simulated a voter turnout dataset\n3. Fit a logistic regression model\n4. Created a `Margins` object\n5. Computed an average adjusted prediction\n6. Computed average marginal effects for a continuous variable (`age`) and a binary variable (`female`)\n7. Viewed formatted summaries\n\nThese three operations `Margins()`, `predict()`, and `dydx()` form the foundation of everything that follows.\n\n## Next steps\n\n- Learn the different types of adjusted predictions in {doc}`Tutorial 2: Adjusted Predictions </tutorials/adjusted_predictions>`\n- Explore the full capabilities of marginal effects in {doc}`Tutorial 3: Marginal Effects </tutorials/marginal_effects>`\n- Consult the reference documentation for the {doc}`Margins class </api>` and the {doc}`dydx() method </api>`\n- Learn about robust and clustered standard errors in {doc}`How-To: Robust and Clustered Standard Errors </howto/robust_clustered_ses>`\n- Verify your results against R in {doc}`How-To: Verify Against R </howto/verify_against_r>`",
   "id": "16397c8f"
  }
 ],
 "metadata": {
  "kernelspec": {
   "display_name": "Python 3",
   "language": "python",
   "name": "python3"
  },
  "language_info": {
   "name": "python"
  }
 },
 "nbformat": 4,
 "nbformat_minor": 5
}