Tutorial 1: Getting Started with smmargins
Welcome to smmargins. In this tutorial we will install the package, fit a logistic regression model, and compute our first adjusted predictions and marginal effects. No prior knowledge of marginal effects is assumed.
What you will learn
How to install smmargins
How to create a Margins object from a fitted StatsModels model
How to compute average adjusted predictions
How to compute average marginal effects
How to read a summary table
Step 1: Installation
Install smmargins from PyPI:
pip install smmargins
This tutorial also requires StatsModels and pandas, which are installed automatically as dependencies. If you prefer to install them explicitly, run:
pip install statsmodels pandas
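To confirm that the installation worked, one option is to check that each package can be found by the import system. The sketch below uses only the standard library; the helper name `missing_packages` is hypothetical and not part of smmargins.

```python
from importlib.util import find_spec


def missing_packages(names):
    """Return the top-level package names that cannot be located for import."""
    return [name for name in names if find_spec(name) is None]


required = ["smmargins", "statsmodels", "pandas"]
missing = missing_packages(required)
print("Missing packages:", ", ".join(missing) if missing else "none")
```

If any name is reported as missing, rerun the pip commands above in the same environment that runs your Python interpreter.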
Step 2: Set up the data and fit a model
We will use a simulated dataset about voter turnout. The dataset contains five variables:
age: age in years
income: annual income in dollars
educ: education level ("hs", "college", or "grad")
female: binary indicator (0 = male, 1 = female)
voted: binary outcome (0 = did not vote, 1 = voted)
Let us simulate the data and fit a logistic regression model:
import numpy as np
import pandas as pd
import statsmodels.formula.api as smf
from smmargins import Margins

# Seeded generator so the simulated data are reproducible
rng = np.random.default_rng(7)
N = 5_000

# Simulate the four predictors
df = pd.DataFrame({
    "age": rng.normal(45, 12, N).clip(18, 90),
    "income": rng.lognormal(10.5, 0.4, N),
    "educ": rng.choice(["hs", "college", "grad"], N, p=[0.4, 0.4, 0.2]),
    "female": rng.integers(0, 2, N),
})

# True linear predictor on the log-odds scale, including an age-by-female interaction
eta = (-4.0 + 0.05 * df["age"] + 0.00001 * df["income"]
       + 0.8 * (df["educ"] == "college") + 1.4 * (df["educ"] == "grad")
       + 0.3 * df["female"] - 0.0004 * df["age"] * df["female"])

# Draw the binary outcome from the implied probabilities
df["voted"] = (rng.uniform(0, 1, N) < 1 / (1 + np.exp(-eta))).astype(int)

# Fit the logistic regression
fit = smf.logit("voted ~ age + income + C(educ) + female + age:female", data=df).fit(disp=False)
The model estimates the probability of voting as a function of age, income, education, gender, and an interaction between age and female. The disp=False argument suppresses convergence output.
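As a rough illustration of what the logit model represents, the sketch below maps a linear predictor to a probability with the inverse-logit function. It uses the data-generating coefficients from the simulation above, not the fitted estimates, and the helper names `inv_logit` and `linear_predictor` are hypothetical, not part of smmargins or StatsModels.

```python
import math


def inv_logit(eta):
    """Map a value on the log-odds scale to a probability."""
    return 1.0 / (1.0 + math.exp(-eta))


def linear_predictor(age, income, educ, female):
    """Log-odds of voting under the simulation's true coefficients."""
    return (-4.0 + 0.05 * age + 0.00001 * income
            + 0.8 * (educ == "college") + 1.4 * (educ == "grad")
            + 0.3 * female - 0.0004 * age * female)


# One hypothetical individual: a 45-year-old college-educated woman earning 40,000
eta_one = linear_predictor(age=45, income=40_000, educ="college", female=1)
p_one = inv_logit(eta_one)
print(f"log-odds = {eta_one:.3f}, P(voted) = {p_one:.3f}")
```

The fitted model does the same thing with estimated coefficients in place of the true ones.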
Step 3: Create a Margins object
All post-estimation work in smmargins starts with a Margins object. We pass our fitted model to it:
M = Margins(fit)
The Margins object stores the fitted model and prepares the data for computing predictions and marginal effects. We will use this same M object throughout the remaining tutorials.
Step 4: Compute average adjusted predictions
An adjusted prediction is the model's predicted probability of voting for a single observation. Averaging these predictions over the dataset gives the Average Adjusted Prediction (AAP), which is what predict() returns by default:
M.predict()
     prediction  std err          z  P>|z|  [95% Conf.  Interval]
AAP      0.3622  0.00634  57.128076    0.0    0.349774   0.374626
The output is a DataFrame with one row. The prediction column shows that the average predicted probability of voting across all 5,000 simulated individuals is about 0.36. The std err and [95% Conf. Interval] columns quantify the uncertainty around that estimate.
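To make the idea of averaging concrete, here is a minimal sketch of an average adjusted prediction computed by hand. The coefficients and the four-person sample are hypothetical and chosen only for illustration; this shows the concept, not how smmargins implements predict().

```python
import math


def inv_logit(eta):
    """Map a value on the log-odds scale to a probability."""
    return 1.0 / (1.0 + math.exp(-eta))


# Hypothetical coefficients: intercept, age, female
b0, b_age, b_female = -2.0, 0.04, 0.3

# Hypothetical sample of (age, female) pairs
sample = [(25, 0), (40, 1), (60, 0), (75, 1)]

# One adjusted prediction per observation, at that observation's own covariates
predictions = [inv_logit(b0 + b_age * age + b_female * female)
               for age, female in sample]

# The average adjusted prediction is the mean of those predictions
aap = sum(predictions) / len(predictions)
print(f"AAP = {aap:.4f}")
```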
Step 5: Compute a marginal effect
A marginal effect tells us how the predicted probability of voting changes when a predictor changes. For a continuous variable like age, dydx("age") computes the average marginal effect (AME): the average rate of change in the predicted probability with respect to age, evaluated at each observation’s actual covariate values:
M.dydx("age")
         dy/dx   std err          z         P>|z|  [95% Conf.  Interval]
dage  0.010118  0.000505  20.037431  2.598390e-89    0.009128   0.011108
The estimate of 0.0101 means that, on average across the sample, a one-year increase in age is associated with approximately a 1 percentage point increase in the probability of voting. This is an average across all individuals, each evaluated at their own values of income, education, and gender.
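The sketch below illustrates the idea of an average marginal effect for a continuous variable in a simple logit model without interactions, where the derivative of the probability with respect to age is the age coefficient times p(1 - p). The coefficients and sample are hypothetical, and a central finite difference is included only as a numerical cross-check; how smmargins computes the derivative internally is not assumed here.

```python
import math


def inv_logit(eta):
    """Map a value on the log-odds scale to a probability."""
    return 1.0 / (1.0 + math.exp(-eta))


# Hypothetical coefficients: intercept, age, female
b0, b_age, b_female = -2.0, 0.04, 0.3
sample = [(25, 0), (40, 1), (60, 0), (75, 1)]  # (age, female)


def predict(age, female):
    """Predicted probability for one observation."""
    return inv_logit(b0 + b_age * age + b_female * female)


# Analytic slope at each observation: b_age * p * (1 - p)
slopes = [b_age * predict(a, f) * (1.0 - predict(a, f)) for a, f in sample]
ame_analytic = sum(slopes) / len(slopes)

# Numerical cross-check with a central finite difference in age
h = 1e-5
numeric = [(predict(a + h, f) - predict(a - h, f)) / (2 * h) for a, f in sample]
ame_numeric = sum(numeric) / len(numeric)

print(f"analytic AME = {ame_analytic:.6f}, numeric AME = {ame_numeric:.6f}")
```

Because p(1 - p) is at most 0.25, the marginal effect on the probability scale is always smaller in magnitude than the coefficient itself, which is why the two should not be confused.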
Step 6: View a formatted summary
For a cleaner view, call .summary() on any result:
M.dydx("age").summary()
| | dy/dx | std err | z | P>\|z\| | [95% Conf. | Interval] |
|---|---|---|---|---|---|---|
| dage | 0.010118 | 0.000505 | 20.037431 | 2.598390e-89 | 0.009128 | 0.011108 |
The summary method formats the output for readability while preserving all the information.
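When reading the table, it helps to know how the columns relate to one another. Assuming the z statistic, p-value, and interval are Wald-type quantities based on a normal approximation (an assumption; the tutorial does not state the method), they can be reconstructed from the estimate and standard error in the dage row. The inputs are the rounded values printed above, so the z statistic differs slightly from the table.

```python
import math
from statistics import NormalDist

# Rounded values copied from the dage row
estimate, std_err = 0.010118, 0.000505

# z statistic: estimate divided by its standard error
z = estimate / std_err

# Two-sided p-value from the standard normal; erfc stays accurate in the far tail
p_value = math.erfc(abs(z) / math.sqrt(2))

# 95% interval: estimate plus or minus the 97.5th normal percentile times the standard error
crit = NormalDist().inv_cdf(0.975)
conf_low = estimate - crit * std_err
conf_high = estimate + crit * std_err

print(f"z = {z:.3f}, p = {p_value:.3e}")
print(f"95% CI = [{conf_low:.6f}, {conf_high:.6f}]")
```

Under this assumption the reconstructed interval agrees with the printed bounds to six decimal places.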
Step 7: Try another variable
Let us also look at the effect of gender, which is a binary variable:
M.dydx("female").summary()
| | contrast | std err | z | P>\|z\| | [95% Conf. | Interval] |
|---|---|---|---|---|---|---|
| female: 1 vs 0 | 0.062486 | 0.012681 | 4.927392 | 8.333442e-07 | 0.037631 | 0.087341 |
For binary variables, the contrast is 1 - 0, showing the change in predicted probability when moving from male to female. Here the estimate is positive (about 0.062), indicating that being female is associated with a predicted probability of voting roughly 6.2 percentage points higher in this simulated dataset.
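A common way to define such a contrast is as a counterfactual difference: predict for every observation with the indicator set to 1, predict again with it set to 0, and average the differences. The sketch below illustrates that definition with hypothetical coefficients and ages; whether smmargins computes the contrast in exactly this way is an assumption.

```python
import math


def inv_logit(eta):
    """Map a value on the log-odds scale to a probability."""
    return 1.0 / (1.0 + math.exp(-eta))


# Hypothetical coefficients: intercept, age, female
b0, b_age, b_female = -2.0, 0.04, 0.3
ages = [25, 40, 60, 75]  # hypothetical observed ages


def predict(age, female):
    """Predicted probability for one observation."""
    return inv_logit(b0 + b_age * age + b_female * female)


# For each observation, keep age as observed and switch female from 0 to 1
contrasts = [predict(a, 1) - predict(a, 0) for a in ages]

# The average contrast summarizes the 1 vs 0 difference over the sample
avg_contrast = sum(contrasts) / len(contrasts)
print(f"average contrast (female: 1 vs 0) = {avg_contrast:.4f}")
```

Unlike a derivative, this contrast is a discrete change, so it is directly interpretable as a difference in probabilities between the two groups.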
Recap
In this tutorial we:
Installed smmargins
Simulated a voter turnout dataset
Fit a logistic regression model
Created a Margins object
Computed an average adjusted prediction
Computed average marginal effects for a continuous variable (age) and a binary variable (female)
Viewed formatted summaries
These three operations, Margins(), predict(), and dydx(), form the foundation of everything that follows.
Next steps
Learn the different types of adjusted predictions in Tutorial 2: Adjusted Predictions
Explore the full capabilities of marginal effects in Tutorial 3: Marginal Effects
Consult the reference documentation for the Margins class and the dydx() method
Learn about robust and clustered standard errors in How-To: Robust and Clustered Standard Errors
Verify your results against R in How-To: Verify Against R