Tutorial 1: Getting Started with smmargins

Welcome to smmargins. In this tutorial we will install the package, fit a logistic regression model, and compute our first adjusted predictions and marginal effects. No prior knowledge of marginal effects is assumed.

What you will learn

  • How to install smmargins

  • How to create a Margins object from a fitted StatsModels model

  • How to compute average adjusted predictions

  • How to compute average marginal effects

  • How to read a summary table

Step 1: Installation

Install smmargins from PyPI:

pip install smmargins

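This tutorial also uses StatsModels, pandas, and NumPy. StatsModels and pandas are installed automatically as dependencies of smmargins, and NumPy is installed along with them. To install or upgrade them explicitly, run: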

pip install statsmodels pandas

Step 2: Set up the data and fit a model

We will use a simulated dataset about voter turnout. The dataset contains five variables:

  • age: age in years

  • income: annual income in dollars

  • educ: education level ("hs", "college", or "grad")

  • female: binary indicator (0 = male, 1 = female)

  • voted: binary outcome (0 = did not vote, 1 = voted)

Let us simulate the data and fit a logistic regression model:

import numpy as np
import pandas as pd
import statsmodels.formula.api as smf
from smmargins import Margins

rng = np.random.default_rng(7)
N = 5_000
df = pd.DataFrame({
    "age":    rng.normal(45, 12, N).clip(18, 90),
    "income": rng.lognormal(10.5, 0.4, N),
    "educ":   rng.choice(["hs", "college", "grad"], N, p=[0.4, 0.4, 0.2]),
    "female": rng.integers(0, 2, N),
})
eta = (-4.0 + 0.05 * df["age"] + 0.00001 * df["income"]
       + 0.8 * (df["educ"] == "college") + 1.4 * (df["educ"] == "grad")
       + 0.3 * df["female"] - 0.0004 * df["age"] * df["female"])
df["voted"] = (rng.uniform(0, 1, N) < 1 / (1 + np.exp(-eta))).astype(int)

fit = smf.logit("voted ~ age + income + C(educ) + female + age:female", data=df).fit(disp=False)

The model estimates the probability of voting as a function of age, income, education, gender, and an interaction between age and female. The disp=False argument suppresses convergence output.
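Written out, the formula above fits the model below. By default, C(educ) is expanded with treatment coding, which uses the first level in sorted order ("college") as the reference category, so the fitted model has indicators for "grad" and "hs". The simulation code measures education relative to "hs" instead. Both parameterizations describe the same set of probabilities; only the individual education coefficients differ.

```latex
\Pr(\text{voted}_i = 1) = \frac{1}{1 + e^{-\eta_i}}, \qquad
\eta_i = \beta_0 + \beta_1\,\text{age}_i + \beta_2\,\text{income}_i
       + \beta_3\,[\text{educ}_i = \text{grad}] + \beta_4\,[\text{educ}_i = \text{hs}]
       + \beta_5\,\text{female}_i + \beta_6\,\text{age}_i\,\text{female}_i
```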

Step 3: Create a Margins object

All post-estimation work in smmargins starts with a Margins object. We pass our fitted model to it:

M = Margins(fit)

The Margins object stores the fitted model and prepares the data for computing predictions and marginal effects. We will use this same M object throughout the remaining tutorials.

Step 4: Compute average adjusted predictions

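An adjusted prediction is the model's predicted probability of voting for a single observation, computed at that observation's covariate values. Averaging these predictions over every observation in the dataset gives the Average Adjusted Prediction (AAP), which is what predict() returns by default: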

M.predict()
     prediction  std err          z  P>|z|  [95% Conf.  Interval]
AAP      0.3622  0.00634  57.128076    0.0    0.349774   0.374626

The output is a DataFrame with one row, labeled AAP. The prediction column shows that the average predicted probability of voting across all 5,000 simulated individuals is about 0.362. The std err, z, P>|z|, and [95% Conf. Interval] columns quantify the uncertainty around that estimate.
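To see what the AAP averages, the sketch below recomputes each person's predicted probability with plain NumPy and takes the mean. It is an illustration, not smmargins' implementation: it uses the simulation's true coefficients from Step 2 rather than the fitted estimates, so its result should be close to, but not identical to, the value reported by M.predict(). The helper expit is our own name for the inverse logit.

```python
import numpy as np


def expit(eta):
    """Inverse logit: map a linear predictor to a probability."""
    return 1.0 / (1.0 + np.exp(-eta))


# Re-create the tutorial's simulated covariates as plain NumPy arrays.
# The draws happen in the same order as in the DataFrame constructor,
# so the values should match the tutorial's df for seed 7.
rng = np.random.default_rng(7)
N = 5_000
age = rng.normal(45, 12, N).clip(18, 90)
income = rng.lognormal(10.5, 0.4, N)
educ = rng.choice(["hs", "college", "grad"], N, p=[0.4, 0.4, 0.2])
female = rng.integers(0, 2, N)

# Linear predictor under the TRUE simulation coefficients (not the estimates).
eta = (-4.0 + 0.05 * age + 0.00001 * income
       + 0.8 * (educ == "college") + 1.4 * (educ == "grad")
       + 0.3 * female - 0.0004 * age * female)

# One adjusted prediction per person, then the average over the sample.
p_true = expit(eta)
aap_true = p_true.mean()

# Outcome drawn exactly as in the tutorial, for comparison.
voted = (rng.uniform(0, 1, N) < p_true).astype(int)

print(f"AAP under the true coefficients: {aap_true:.4f}")
print(f"Share of the sample that voted:  {voted.mean():.4f}")
```

A related sanity check: for a logit model with an intercept, the maximum-likelihood fit satisfies sum(voted_i - p_hat_i) = 0, so the average fitted probability over the estimation sample equals the observed share who voted. Assuming M.predict() averages over the estimation sample, its estimate should therefore match df["voted"].mean().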

Step 5: Compute a marginal effect

A marginal effect tells us how the predicted probability of voting changes when a predictor changes. For a continuous variable like age, dydx("age") computes the average marginal effect (AME): the average rate of change in the predicted probability with respect to age, evaluated at each observation’s actual covariate values:

M.dydx("age")
         dy/dx   std err          z         P>|z|  [95% Conf.  Interval]
dage  0.010118  0.000505  20.037431  2.598390e-89    0.009128   0.011108

The estimate of about 0.0101 means that, on average across the sample, a one-year increase in age is associated with an increase of approximately 1.01 percentage points in the probability of voting. This is an average across all individuals, each evaluated at their own values of age, income, education, and gender.
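The AME definition can be made concrete with a small NumPy sketch. Because the model contains age:female, the derivative of the predicted probability with respect to age is p(1 - p)(beta_age + beta_age:female * female), not p(1 - p) * beta_age alone, so a correct AME of age must account for the interaction term. The sketch below uses the simulation's true coefficients (0.05 and -0.0004) instead of the estimates, and checks the analytic formula against a central finite difference. The helpers expit and linear_predictor are illustrative names, not part of smmargins.

```python
import numpy as np


def expit(eta):
    """Inverse logit: map a linear predictor to a probability."""
    return 1.0 / (1.0 + np.exp(-eta))


def linear_predictor(age, income, educ, female):
    """True data-generating linear predictor from Step 2."""
    return (-4.0 + 0.05 * age + 0.00001 * income
            + 0.8 * (educ == "college") + 1.4 * (educ == "grad")
            + 0.3 * female - 0.0004 * age * female)


# Same covariate draws as the tutorial (seed 7).
rng = np.random.default_rng(7)
N = 5_000
age = rng.normal(45, 12, N).clip(18, 90)
income = rng.lognormal(10.5, 0.4, N)
educ = rng.choice(["hs", "college", "grad"], N, p=[0.4, 0.4, 0.2])
female = rng.integers(0, 2, N)

p = expit(linear_predictor(age, income, educ, female))

# Analytic derivative per person, including the age:female interaction,
# averaged over the sample.
ame_analytic = np.mean(p * (1 - p) * (0.05 - 0.0004 * female))

# Numerical check: shift every person's age by +/- h and average the slope.
h = 1e-4
p_up = expit(linear_predictor(age + h, income, educ, female))
p_down = expit(linear_predictor(age - h, income, educ, female))
ame_numeric = np.mean((p_up - p_down) / (2 * h))

print(f"AME of age (analytic): {ame_analytic:.5f}")
print(f"AME of age (numeric):  {ame_numeric:.5f}")
```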

Step 6: View a formatted summary

For a cleaner view, call .summary() on any result:

M.dydx("age").summary()
         dy/dx   std err          z         P>|z|  [95% Conf.  Interval]
dage  0.010118  0.000505  20.037431  2.598390e-89    0.009128   0.011108

The summary method formats the output for readability while preserving all the information.
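The z, P>|z|, and interval columns can be reproduced from the estimate and standard error alone. The rounded values in these tables are consistent with a normal-approximation (Wald) test and interval, as the sketch below shows; that smmargins computes them exactly this way is an inference from the column headers and numbers, not something this tutorial states. The function wald_summary is a hypothetical helper.

```python
import math
from statistics import NormalDist


def wald_summary(estimate, std_err, level=0.95):
    """Return z, two-sided p-value, and confidence bounds (normal approximation)."""
    z = estimate / std_err
    # Two-sided p-value 2 * (1 - Phi(|z|)) written as erfc(|z| / sqrt(2));
    # erfc keeps precision for large |z|, where 1 - Phi(|z|) rounds to 0.
    p_value = math.erfc(abs(z) / math.sqrt(2))
    crit = NormalDist().inv_cdf(0.5 + level / 2)  # about 1.96 for a 95% interval
    return z, p_value, estimate - crit * std_err, estimate + crit * std_err


# Rounded values from the dydx("age") row above.
z, p_value, lo, hi = wald_summary(0.010118, 0.000505)
print(f"z = {z:.3f}, p = {p_value:.3e}, 95% CI = [{lo:.6f}, {hi:.6f}]")
```

Because the inputs are the rounded table values, z and the p-value differ slightly from the printed table, while the interval bounds agree to the six displayed decimals.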

Step 7: Try another variable

Let us also look at the effect of gender, which is a binary variable:

M.dydx("female").summary()
                contrast   std err         z         P>|z|  [95% Conf.  Interval]
female: 1 vs 0  0.062486  0.012681  4.927392  8.333442e-07    0.037631   0.087341

For a binary variable, dydx() reports a discrete contrast rather than a derivative: the difference in predicted probability between female = 1 and female = 0 (labeled "1 vs 0"), averaged over the sample. Here the estimate is positive, about 0.062, indicating that being female is associated with a predicted probability of voting roughly 6.2 percentage points higher in this simulated dataset.
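A discrete contrast of this kind can be sketched as two counterfactual predictions: set female = 1 for everyone, then female = 0 for everyone, keep every other covariate at its observed value, and average the difference. The sketch below does this with the simulation's true coefficients rather than the estimates, so it illustrates the quantity being estimated rather than reproducing 0.062486. The helpers expit and linear_predictor are illustrative names, not part of smmargins.

```python
import numpy as np


def expit(eta):
    """Inverse logit: map a linear predictor to a probability."""
    return 1.0 / (1.0 + np.exp(-eta))


def linear_predictor(age, income, educ, female):
    """True data-generating linear predictor from Step 2."""
    return (-4.0 + 0.05 * age + 0.00001 * income
            + 0.8 * (educ == "college") + 1.4 * (educ == "grad")
            + 0.3 * female - 0.0004 * age * female)


# Same covariate draws as the tutorial (seed 7).
rng = np.random.default_rng(7)
N = 5_000
age = rng.normal(45, 12, N).clip(18, 90)
income = rng.lognormal(10.5, 0.4, N)
educ = rng.choice(["hs", "college", "grad"], N, p=[0.4, 0.4, 0.2])
female = rng.integers(0, 2, N)

# Counterfactual predictions: everyone female, then everyone male.
# Note that the age:female term changes too, not just the female term.
p_female = expit(linear_predictor(age, income, educ, np.ones(N)))
p_male = expit(linear_predictor(age, income, educ, np.zeros(N)))

# Average of the person-level differences: the "1 vs 0" contrast.
contrast = np.mean(p_female - p_male)
print(f"Average female contrast (true coefficients): {contrast:.4f}")
```

Under these coefficients the female effect on the linear predictor is 0.3 - 0.0004 * age, which stays positive over the simulated age range (18 to 90), so every person-level difference is positive.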

Recap

In this tutorial we:

  1. Installed smmargins

  2. Simulated a voter turnout dataset

  3. Fit a logistic regression model

  4. Created a Margins object

  5. Computed an average adjusted prediction

  6. Computed average marginal effects for a continuous variable (age) and a binary variable (female)

  7. Viewed formatted summaries

These three operations, Margins(), predict(), and dydx(), form the foundation of everything that follows.

Next steps