Simple Migration Flows with pytidycensus

This notebook demonstrates how to use the get_flows() function in pytidycensus to retrieve migration flow data from the Census Migration Flows API.

The Migration Flows API provides data on population movement between geographic areas based on American Community Survey (ACS) 5-year estimates.

import pytidycensus as tc
import pandas as pd
import numpy as np
import matplotlib.pyplot as plt
import geopandas as gpd
from shapely.geometry import LineString

# Set your Census API key
# tc.set_census_api_key("your_key_here")

Basic County-to-County Migration Flows

Let’s start with basic county-to-county migration flows for Texas:

# Get county-to-county migration flows for Texas
tx_flows = tc.get_flows(
    geography="county",
    state="TX",
    year=2018,
    output="wide"
)

print(f"Shape: {tx_flows.shape}")
print(f"Columns: {list(tx_flows.columns)}")
tx_flows.head()
Shape: (36641, 10)
Columns: ['GEOID1', 'GEOID2', 'FULL1_NAME', 'FULL2_NAME', 'MOVEDIN', 'MOVEDIN_M', 'MOVEDOUT', 'MOVEDOUT_M', 'MOVEDNET', 'MOVEDNET_M']
GEOID1 GEOID2 FULL1_NAME FULL2_NAME MOVEDIN MOVEDIN_M MOVEDOUT MOVEDOUT_M MOVEDNET MOVEDNET_M
0 48001 None Anderson County, Texas Africa 38 52.0 NaN NaN NaN NaN
1 48001 None Anderson County, Texas Asia 4 6.0 NaN NaN NaN NaN
2 48001 None Anderson County, Texas Central America 2 3.0 NaN NaN NaN NaN
3 48001 01089 Anderson County, Texas Madison County, Alabama 13 20.0 0.0 28.0 13.0 20.0
4 48001 02016 Anderson County, Texas Aleutians West Census Area, Alaska 0 31.0 7.0 9.0 -7.0 9.0

Understanding the Data Structure

The flow data includes several key variables:

  • GEOID1 / FULL1_NAME: ID and name of the focal geography (geography 1)

  • GEOID2 / FULL2_NAME: ID and name of the other geography in the pair (geography 2); GEOID2 is None for regions abroad

  • MOVEDIN: Number of people who moved into geography 1 from geography 2

  • MOVEDOUT: Number of people who moved out of geography 1 to geography 2

  • MOVEDNET: Net migration for geography 1 (MOVEDIN - MOVEDOUT)

  • Variables ending in _M: Margin of error

When geometry=True, two more columns are added:

  • centroid1: Centroid of geography 1

  • centroid2: Centroid of geography 2
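The sketch below makes the direction of each column concrete. It turns two of the Anderson County rows shown above into an origin-to-destination edge list. It assumes the Census convention that MOVEDIN counts movers from geography 2 into geography 1 and MOVEDOUT counts movers from geography 1 to geography 2. The data is a small hand-copied toy frame, not a get_flows() call.

```python
import pandas as pd

# Toy frame: two domestic rows copied by hand from the Anderson County output above.
wide = pd.DataFrame({
    "GEOID1": ["48001", "48001"],
    "GEOID2": ["01089", "02016"],
    "MOVEDIN": [13, 0],
    "MOVEDOUT": [0.0, 7.0],
    "MOVEDNET": [13.0, -7.0],
})

# Assumed convention: MOVEDIN = movers from GEOID2 into GEOID1.
inbound = wide.rename(columns={"GEOID2": "origin", "GEOID1": "destination", "MOVEDIN": "people"})
# Assumed convention: MOVEDOUT = movers from GEOID1 to GEOID2.
outbound = wide.rename(columns={"GEOID1": "origin", "GEOID2": "destination", "MOVEDOUT": "people"})

cols = ["origin", "destination", "people"]
edges = pd.concat([inbound[cols], outbound[cols]], ignore_index=True)
print(edges)

# Net migration for GEOID1 is inflow minus outflow.
print((wide["MOVEDIN"] - wide["MOVEDOUT"] == wide["MOVEDNET"]).all())
```

An edge list like this suits network tools or flow maps where each line has a single direction.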

# Look at the largest inbound flows (people moving into FULL1_NAME from FULL2_NAME)
largest_flows = tx_flows.nlargest(5, 'MOVEDIN')
print("Top 5 flows by MOVEDIN:")
largest_flows[['FULL1_NAME', 'FULL2_NAME', 'MOVEDIN', 'MOVEDIN_M']]
Top 5 flows by MOVEDIN:
FULL1_NAME FULL2_NAME MOVEDIN MOVEDIN_M
12973 Fort Bend County, Texas Harris County, Texas 20139 1842.0
30609 Tarrant County, Texas Dallas County, Texas 19149 1603.0
10162 Denton County, Texas Dallas County, Texas 18807 2114.0
15530 Harris County, Texas Asia 18170 1557.0
6588 Collin County, Texas Dallas County, Texas 17264 1567.0

Tidy Format for Analysis

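The tidy format stacks MOVEDIN, MOVEDOUT and MOVEDNET into a single variable column with matching estimate and moe columns. This layout is easier to group, filter and plot: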

# Get the same data in tidy format
tx_flows_tidy = tc.get_flows(
    geography="county",
    state="TX", 
    year=2018,
    output="tidy"
)

print(f"Tidy format shape: {tx_flows_tidy.shape}")
tx_flows_tidy.head()
Tidy format shape: (109923, 7)
GEOID1 GEOID2 FULL1_NAME FULL2_NAME variable estimate moe
0 48001 None Anderson County, Texas Africa MOVEDIN 38.0 52.0
1 48001 None Anderson County, Texas Africa MOVEDOUT NaN NaN
2 48001 None Anderson County, Texas Africa MOVEDNET NaN NaN
3 48001 None Anderson County, Texas Asia MOVEDIN 4.0 6.0
4 48001 None Anderson County, Texas Asia MOVEDOUT NaN NaN
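Tidy and wide layouts hold the same information. As a minimal sketch, a tidy frame can be spread back into one column per flow variable with pandas' pivot. The toy frame below is hand-copied from the first three tidy rows above. FULL2_NAME is used in the index instead of GEOID2 because GEOID2 is None for regions abroad.

```python
import pandas as pd

nan = float("nan")
# Toy frame: the first three tidy rows shown above (Anderson County, Texas / Africa).
tidy = pd.DataFrame({
    "GEOID1": ["48001"] * 3,
    "FULL2_NAME": ["Africa"] * 3,
    "variable": ["MOVEDIN", "MOVEDOUT", "MOVEDNET"],
    "estimate": [38.0, nan, nan],
    "moe": [52.0, nan, nan],
})

# Spread the variable column back out: one estimate column per flow variable.
wide_est = tidy.pivot(index=["GEOID1", "FULL2_NAME"], columns="variable", values="estimate")
print(wide_est)
```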
# Analyze migration patterns
migration_summary = tx_flows_tidy.groupby('variable')['estimate'].agg(['sum', 'mean', 'std'])
print("Migration flow summary:")
migration_summary
Migration flow summary:
sum mean std
variable
MOVEDIN 1920233.0 52.406676 384.593025
MOVEDNET 130846.0 3.637744 147.398580
MOVEDOUT 1575040.0 43.788818 347.139862
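In the summary above, the MOVEDIN sum minus the MOVEDOUT sum (345,193) does not equal the MOVEDNET sum (130,846). One reason is visible in the earlier output. Rows for regions abroad have a MOVEDIN value but NaN for MOVEDOUT and MOVEDNET, and sum() skips the NaNs. The sketch below uses the five Anderson County rows printed earlier as a toy frame. It shows that restricting to rows with a GEOID2 makes the totals consistent, at least for this subset.

```python
import pandas as pd

nan = float("nan")
# Toy frame: the five Anderson County rows printed earlier (wide format).
rows = pd.DataFrame({
    "GEOID2": [None, None, None, "01089", "02016"],
    "FULL2_NAME": ["Africa", "Asia", "Central America",
                   "Madison County, Alabama", "Aleutians West Census Area, Alaska"],
    "MOVEDIN": [38, 4, 2, 13, 0],
    "MOVEDOUT": [nan, nan, nan, 0.0, 7.0],
    "MOVEDNET": [nan, nan, nan, 13.0, -7.0],
})
flow_cols = ["MOVEDIN", "MOVEDOUT", "MOVEDNET"]

# All rows: the 44 movers from abroad add to MOVEDIN but have no MOVEDOUT/MOVEDNET (NaN is skipped).
totals_all = rows[flow_cols].sum()
# Domestic rows only (GEOID2 present): net = in - out holds for the totals too.
totals_domestic = rows.loc[rows["GEOID2"].notna(), flow_cols].sum()

print(totals_all)
print(totals_domestic)
```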

Migration Flows with Demographic Breakdowns

Note: Breakdown characteristics are only available for the 2006-2010 through 2011-2015 ACS 5-year datasets (year=2010 to year=2015).

# Get flows with age and sex breakdowns (2015 data)
ri_flows_breakdown = tc.get_flows(
    geography="county",
    breakdown=["AGE", "SEX"],
    breakdown_labels=True,
    state="RI",
    year=2015,  # Breakdown only available before 2016
    output="tidy"
)

print(f"With breakdowns shape: {ri_flows_breakdown.shape}")
print(f"Breakdown columns: {[col for col in ri_flows_breakdown.columns if 'label' in col]}")
ri_flows_breakdown.head()
With breakdowns shape: (63072, 11)
Breakdown columns: ['AGE_label', 'SEX_label']
GEOID1 GEOID2 FULL1_NAME FULL2_NAME AGE SEX AGE_label SEX_label variable estimate moe
0 44001 None Bristol County, Rhode Island Asia 00 00 All ages All sexes MOVEDIN 58.0 49.0
1 44001 None Bristol County, Rhode Island Asia 00 00 All ages All sexes MOVEDOUT NaN NaN
2 44001 None Bristol County, Rhode Island Asia 00 00 All ages All sexes MOVEDNET NaN NaN
3 44001 None Bristol County, Rhode Island Europe 00 00 All ages All sexes MOVEDIN 197.0 170.0
4 44001 None Bristol County, Rhode Island Europe 00 00 All ages All sexes MOVEDOUT NaN NaN
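The "All ages" and "All sexes" rows in the breakdown output are totals of the more detailed rows. Summing every row for an age group therefore counts each mover more than once. The sketch below uses made-up numbers in the same layout. The labels "Age group 1", "Age group 2", "Male" and "Female" are hypothetical placeholders, not the API's actual labels. It shows the double count and one way to avoid it.

```python
import pandas as pd

# Made-up numbers in the layout of the breakdown output above. The age-group labels and
# the "Male"/"Female" labels are hypothetical; "All ages"/"All sexes" rows are totals.
df = pd.DataFrame({
    "AGE_label": ["All ages", "Age group 1", "Age group 2"] * 3,
    "SEX_label": ["All sexes"] * 3 + ["Male"] * 3 + ["Female"] * 3,
    "variable": ["MOVEDIN"] * 9,
    "estimate": [100, 60, 40, 55, 30, 25, 45, 30, 15],
})

# Naive: every mover is counted once under "All sexes" and again under Male or Female.
naive = df.groupby("AGE_label")["estimate"].sum()

# Keep the both-sexes rows and drop the all-ages total before grouping.
mask = (df["SEX_label"] == "All sexes") & (df["AGE_label"] != "All ages")
by_age = df[mask].groupby("AGE_label")["estimate"].sum()

print(naive)   # All ages 200, Age group 1 120, Age group 2 80
print(by_age)  # Age group 1 60, Age group 2 40
```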
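# Analyze migration by age group
if 'AGE_label' in ri_flows_breakdown.columns:
    # Keep inbound totals for both sexes combined and drop the "All ages" total,
    # so each mover is counted once in exactly one age group
    movedin = ri_flows_breakdown[
        (ri_flows_breakdown['variable'] == 'MOVEDIN')
        & (ri_flows_breakdown['SEX_label'] == 'All sexes')
        & (ri_flows_breakdown['AGE_label'] != 'All ages')
    ]
    age_migration = movedin.groupby('AGE_label')['estimate'].sum().sort_values(ascending=False)

    plt.figure(figsize=(12, 6))
    age_migration.plot(kind='bar')
    plt.title('Migration into Rhode Island Counties by Age Group (2015)')
    plt.xlabel('Age Group')
    plt.ylabel('Total People Moved In')
    plt.xticks(rotation=45)
    plt.tight_layout()
    plt.show()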
[Figure: bar chart, Migration into Rhode Island Counties by Age Group (2015)]

Metropolitan Statistical Area Flows

Migration flows are also available for metropolitan statistical areas (MSAs), from year=2013 (the 2009-2013 ACS 5-year dataset) onward:

# Get MSA-level migration flows
msa_flows = tc.get_flows(
    geography="metropolitan statistical area",
    year=2018,
    output="wide"
)

print(f"MSA flows shape: {msa_flows.shape}")
msa_flows.head()
MSA flows shape: (70428, 10)
GEOID1 GEOID2 FULL1_NAME FULL2_NAME MOVEDIN MOVEDIN_M MOVEDOUT MOVEDOUT_M MOVEDNET MOVEDNET_M
0 10180 None Abilene, TX Metro Area Outside Metro Area within U.S. or Puerto Rico 3883 508.0 2785.0 410.0 1098.0 684.0
1 10180 None Abilene, TX Metro Area Africa 134 152.0 NaN NaN NaN NaN
2 10180 None Abilene, TX Metro Area Asia 504 207.0 NaN NaN NaN NaN
3 10180 None Abilene, TX Metro Area Central America 56 42.0 NaN NaN NaN NaN
4 10180 None Abilene, TX Metro Area Europe 264 176.0 NaN NaN NaN NaN
# Find the largest inbound flows; MOVEDIN counts people moving from FULL2_NAME into FULL1_NAME
largest_msa_flows = msa_flows.nlargest(10, 'MOVEDIN')
print("Largest migration flows into MSAs:")
for _, row in largest_msa_flows.iterrows():
    print(f"{row['FULL2_NAME']} > {row['FULL1_NAME']}: {row['MOVEDIN']:,} people")
Largest migration flows into MSAs:
Los Angeles-Long Beach-Anaheim, CA Metro Area > Riverside-San Bernardino-Ontario, CA Metro Area: 85,361 people
Asia > New York-Newark-Jersey City, NY-NJ-PA Metro Area: 65,239 people
Asia > Los Angeles-Long Beach-Anaheim, CA Metro Area: 57,528 people
Riverside-San Bernardino-Ontario, CA Metro Area > Los Angeles-Long Beach-Anaheim, CA Metro Area: 42,989 people
Outside Metro Area within U.S. or Puerto Rico > Dallas-Fort Worth-Arlington, TX Metro Area: 34,853 people
New York-Newark-Jersey City, NY-NJ-PA Metro Area > Philadelphia-Camden-Wilmington, PA-NJ-DE-MD Metro Area: 31,621 people
Asia > Washington-Arlington-Alexandria, DC-VA-MD-WV Metro Area: 31,025 people
Caribbean > Miami-Fort Lauderdale-West Palm Beach, FL Metro Area: 30,633 people
Europe > New York-Newark-Jersey City, NY-NJ-PA Metro Area: 29,904 people
Asia > San Francisco-Oakland-Hayward, CA Metro Area: 29,747 people
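The ranking above mixes true MSA-to-MSA flows with rows whose other end is a world region or "Outside Metro Area within U.S. or Puerto Rico". In the msa_flows output those rows have GEOID2 = None, so dropping missing GEOID2 values keeps only metro-to-metro pairs. The sketch below applies this to a toy frame of five ranked rows. The GEOID2 codes are CBSA codes entered by hand for illustration. The print line assumes MOVEDIN counts movers from FULL2_NAME into FULL1_NAME.

```python
import pandas as pd

# Toy frame: five of the rows ranked above. GEOID2 values for the two metro areas are
# hand-entered CBSA codes; rows whose other end is abroad or outside any metro have None.
msa = pd.DataFrame({
    "FULL1_NAME": ["Riverside-San Bernardino-Ontario, CA Metro Area",
                   "New York-Newark-Jersey City, NY-NJ-PA Metro Area",
                   "Los Angeles-Long Beach-Anaheim, CA Metro Area",
                   "Los Angeles-Long Beach-Anaheim, CA Metro Area",
                   "Dallas-Fort Worth-Arlington, TX Metro Area"],
    "FULL2_NAME": ["Los Angeles-Long Beach-Anaheim, CA Metro Area", "Asia", "Asia",
                   "Riverside-San Bernardino-Ontario, CA Metro Area",
                   "Outside Metro Area within U.S. or Puerto Rico"],
    "GEOID2": ["31080", None, None, "40140", None],
    "MOVEDIN": [85361, 65239, 57528, 42989, 34853],
})

# Keep only pairs where both ends are metro areas, then rank.
msa_to_msa = msa.dropna(subset=["GEOID2"]).nlargest(10, "MOVEDIN")
for _, row in msa_to_msa.iterrows():
    # Assumed direction: MOVEDIN counts movers from FULL2_NAME into FULL1_NAME.
    print(f"{row['FULL2_NAME']} > {row['FULL1_NAME']}: {row['MOVEDIN']:,} people")
```

On the full msa_flows frame, the same dropna(subset=["GEOID2"]) step can be applied before nlargest.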

Margin of Error and Confidence Levels

ACS margins of error are published at the 90% confidence level. The moe_level argument rescales them to 95% or 99%:

# Compare different confidence levels
flows_90 = tc.get_flows(geography="county", state="NY", year=2018, moe_level=90, output="wide")
flows_95 = tc.get_flows(geography="county", state="NY", year=2018, moe_level=95, output="wide")
flows_99 = tc.get_flows(geography="county", state="NY", year=2018, moe_level=99, output="wide")

print("Margin of error comparison for first flow:")
print(f"90% confidence: {flows_90['MOVEDIN_M'].iloc[0]:.1f}")
print(f"95% confidence: {flows_95['MOVEDIN_M'].iloc[0]:.1f}") 
print(f"99% confidence: {flows_99['MOVEDIN_M'].iloc[0]:.1f}")
Margin of error comparison for first flow:
90% confidence: 33.0
95% confidence: 39.3
99% confidence: 51.4
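Rescaling a margin of error to another confidence level multiplies it by the ratio of the two z-values. The sketch below reproduces the three values printed above from the 90% MOE of 33.0. It uses z = 2.56 for 99%, which is the tidycensus convention. Treat this as an assumption about pytidycensus, though the printed 51.4 matches it (2.576 would give 51.7).

```python
# ACS margins of error are published at the 90% level (z = 1.645).
# Assumption: 95%/99% use the tidycensus z-values (1.96 and 2.56).
Z = {90: 1.645, 95: 1.96, 99: 2.56}


def rescale_moe(moe_90: float, level: int) -> float:
    """Convert a 90% margin of error to another confidence level."""
    return moe_90 * Z[level] / Z[90]


for level in (90, 95, 99):
    print(f"{level}% confidence: {rescale_moe(33.0, level):.1f}")
# 90% confidence: 33.0
# 95% confidence: 39.3
# 99% confidence: 51.4
```

Rescaling locally avoids requesting the same data three times when you only need different confidence levels.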

Geometry Integration for Mapping

When geometry=True, get_flows() adds two shapely Point columns: centroid1 (centroid of geography 1) and centroid2 (centroid of geography 2).

The plotting and web-map examples below need extra dependencies, which you can install with:

pip install "pytidycensus[map]"

Here we request Rhode Island county flows with centroids attached for mapping:

# Get flows with geometry
flows_geo = tc.get_flows(
    geography="county",
    state="RI",
    year=2018,
    geometry=True,
    output="wide"
)
flows_geo.tail()
/home/mmann1123/Documents/github/pytidycensus/pytidycensus/flows.py:654: UserWarning: Could not find centroids for 9 GEOIDs: ['09011', '09013', '09001', '02261', '09009'].... These flows will not have geometry data.
  warnings.warn(
GEOID1 GEOID2 FULL1_NAME FULL2_NAME MOVEDIN MOVEDIN_M MOVEDOUT MOVEDOUT_M MOVEDNET MOVEDNET_M centroid1 centroid2
1128 44009 55007 Washington County, Rhode Island Bayfield County, Wisconsin 0 30.0 6.0 5.0 -6.0 5.0 POINT (-71.62272 41.46965) POINT (-91.20137011438374 46.52300774226001)
1129 44009 55025 Washington County, Rhode Island Dane County, Wisconsin 0 30.0 9.0 13.0 -9.0 13.0 POINT (-71.62272 41.46965) POINT (-89.41818343109118 43.0673096742963)
1130 44009 55079 Washington County, Rhode Island Milwaukee County, Wisconsin 0 30.0 14.0 23.0 -14.0 23.0 POINT (-71.62272 41.46965) POINT (-87.96684239379597 43.00702311746134)
1131 44009 55101 Washington County, Rhode Island Racine County, Wisconsin 36 56.0 0.0 20.0 36.0 56.0 POINT (-71.62272 41.46965) POINT (-88.0613195061766 42.747515942441765)
1132 44009 72137 Washington County, Rhode Island Toa Baja Municipio, Puerto Rico 2 3.0 0.0 32.0 2.0 3.0 POINT (-71.62272 41.46965) POINT (-66.21454600086274 18.431092395056837)

We can use those points to construct LineString geometries that show flows on a map. Here we look at movements to and from California:

import pytidycensus as tc
import numpy as np
import pandas as pd
import geopandas as gpd
import matplotlib.pyplot as plt
from shapely.geometry import LineString
import contextily as ctx

# Get flows with geometry
flows_geo = tc.get_flows(
    geography="county",
    state="CA",
    year=2018,
    geometry=True,
    output="wide"
)
/home/mmann1123/Documents/github/pytidycensus/pytidycensus/flows.py:654: UserWarning: Could not find centroids for 115 GEOIDs: ['0900352140', '0901372090', '0901301080', '0901171670', '0900566420'].... These flows will not have geometry data.
  warnings.warn(
# Build one LineString per flow, from the geography-1 centroid to the geography-2 centroid
lines = flows_geo.copy()
lines = lines.dropna(subset=['centroid1', 'centroid2'])

lines['geometry'] = lines.apply(lambda r: LineString([r['centroid1'], r['centroid2']]), axis=1)
lines = gpd.GeoDataFrame(lines, geometry='geometry', crs=flows_geo.crs)

# project to web mercator for basemap
lines_web = lines.to_crs(epsg=3857)

# dramatic width scaling: use log1p to increase contrast, then map to [min_w, max_w]
min_w, max_w = 0.5, 10.0
vals = np.log1p(lines_web['MOVEDIN'].astype(float).fillna(0))
max_val = vals.max() if vals.max() > 0 else 1.0
lines_web['lw'] = np.interp(vals, [0, max_val], [min_w, max_w])

# keep pairs where the California county gained more than 200 people on net;
# draw small flows first, large flows last (so large flows sit on top)
to_plot = lines_web[lines_web['MOVEDNET'] > 200].sort_values('MOVEDIN', ascending=True)

fig, ax = plt.subplots(figsize=(12, 10))
to_plot.plot(
    ax=ax,
    linewidth=to_plot['lw'],      # widths from the same filtered rows, so they stay aligned
    column='MOVEDNET',            # color by net migration
    cmap='RdYlBu',
    alpha=0.85,
    legend=True,
    zorder=1
)

# plot origin/destination centroids on top; to_crs only reprojects the active
# geometry column, so the centroid columns are converted explicitly
pts = pd.concat([to_plot['centroid1'], to_plot['centroid2']], ignore_index=True)
gpd.GeoSeries(pts, crs=lines.crs).to_crs(epsg=3857).plot(ax=ax, color='black', markersize=4, zorder=2)

# basemap with fallbacks
try:
    ctx.add_basemap(ax, source=ctx.providers.Stamen.TonerLite)
except Exception:
    try:
        ctx.add_basemap(ax, source=ctx.providers.CartoDB.Positron)
    except Exception:
        ctx.add_basemap(ax, source=ctx.providers.OpenStreetMap.Mapnik)

ax.set_axis_off()
plt.title('County-to-county migration flows (linewidth ~ log(MOVEDIN), color ~ MOVEDNET)', fontsize=14)
plt.tight_layout()
plt.show()
[Figure: California county-to-county migration flows on a basemap (linewidth ~ log(MOVEDIN), color ~ MOVEDNET)]
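The two core steps of that map can be run in isolation without an API key. The first builds a line from each pair of centroids and skips rows with a missing centroid, like the GEOIDs in the warning above. The second maps log1p(MOVEDIN) linearly onto a width range. The points and counts below are hypothetical. Storing the width as a column of the frame that is plotted keeps widths aligned with rows after any filtering.

```python
import numpy as np
import pandas as pd
from shapely.geometry import LineString, Point

# Hypothetical flows: the second row lacks an origin centroid.
flows = pd.DataFrame({
    "centroid1": [Point(0, 0), None, Point(1, 1)],
    "centroid2": [Point(2, 0), Point(3, 3), Point(1, 4)],
    "MOVEDIN": [10, 500, 2000],
})

# Drop rows without both endpoints, then connect origin and destination centroids.
lines = flows.dropna(subset=["centroid1", "centroid2"]).copy()
lines["geometry"] = [LineString([a, b]) for a, b in zip(lines["centroid1"], lines["centroid2"])]

# log1p compresses the range; np.interp maps it linearly onto [min_w, max_w].
min_w, max_w = 0.5, 10.0
vals = np.log1p(lines["MOVEDIN"].astype(float))
max_val = vals.max() if vals.max() > 0 else 1.0
lines["lw"] = np.interp(vals, [0, max_val], [min_w, max_w])
print(lines[["MOVEDIN", "lw"]])
```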

Simple Flow Map Without Basemap

For a cleaner look without external basemap dependencies:

import pytidycensus as tc
import geopandas as gpd
import matplotlib.pyplot as plt
from shapely.geometry import LineString
import numpy as np

# Get flows
flows_geo = tc.get_flows(
    geography="county",
    state="TX",
    year=2018,
    geometry=True,
    output="wide"
)

# Get Texas county boundaries for context
tx_counties = tc.get_geography("county", state="TX", year=2018)

# Create flow lines
lines = flows_geo.copy()
lines = lines.dropna(subset=['centroid1', 'centroid2'])
lines['geometry'] = lines.apply(lambda r: LineString([r['centroid1'], r['centroid2']]), axis=1)
lines = gpd.GeoDataFrame(lines, geometry='geometry', crs=flows_geo.crs)

# Filter significant flows
significant_flows = lines[lines['MOVEDIN'] > 1000].copy()

# Calculate line widths
min_w, max_w = 0.5, 5.0
vals = np.log1p(significant_flows['MOVEDIN'].astype(float))
max_val = vals.max() if vals.max() > 0 else 1.0
significant_flows['lw'] = np.interp(vals, [0, max_val], [min_w, max_w])

# Plot
fig, ax = plt.subplots(figsize=(14, 10))

# County boundaries as background
tx_counties.boundary.plot(ax=ax, linewidth=0.5, color='gray', alpha=0.3, zorder=1)

# Clip flows to Texas plus a ~50 km margin. The buffer is computed in a projected CRS
# (EPSG:3083, NAD83 / Texas Centric Albers Equal Area, in metres) because buffering
# in degrees is distorted, then converted back to the counties' CRS.
buf = tx_counties.to_crs(epsg=3083).buffer(50_000).to_crs(tx_counties.crs)
clipped = significant_flows.clip(mask=buf)

# Flow lines (widths taken from the clipped rows so they stay aligned)
clipped.plot(
    ax=ax,
    linewidth=clipped['lw'],
    column='MOVEDNET',
    cmap='coolwarm',
    alpha=0.8,
    legend=True,
    zorder=2
)

ax.set_axis_off()
plt.title('Texas County-to-County Migration Flows >1,000 people (2018)', fontsize=14)
plt.tight_layout()
plt.show()
/home/mmann1123/Documents/github/pytidycensus/pytidycensus/flows.py:654: UserWarning: Could not find centroids for 97 GEOIDs: ['0900579510', '0900156060', '0901385950', '0900949950', '0900174190'].... These flows will not have geometry data.
  warnings.warn(
[Figure: Texas County-to-County Migration Flows >1,000 people (2018), over county boundaries]
Interactive Flow Map with folium

The same significant flows can be explored on an interactive web map:

import folium
import branca.colormap as cm

# ensure numeric stroke width exists
significant_flows['lw'] = significant_flows['lw'].astype(float).fillna(1.0)

# folium expects GeoJSON in WGS84 (EPSG:4326); drop the shapely Point columns,
# which are not JSON-serializable
gdf = significant_flows.drop(columns=['centroid1', 'centroid2'], errors='ignore').copy()
gjson = gdf.to_crs(epsg=4326).to_json()

# diverging colormap for MOVEDNET, centered on zero so white means no net change
vmax = float(significant_flows['MOVEDNET'].abs().max())
colormap = cm.LinearColormap(['#2166ac', '#ffffff', '#b2182b'], vmin=-vmax, vmax=vmax)

def style_function(feature):
    props = feature.get('properties', {})
    lw = float(props.get('lw', 1.0))
    mv = props.get('MOVEDNET')
    return {
        'weight': max(0.5, lw),                # ensure a minimum width
        'color': colormap(float(mv)) if mv is not None else '#888888',
        'opacity': 0.85
    }

# center map on the data centroid (fallback coordinates for an empty selection)
centroid = significant_flows.to_crs(epsg=4326).geometry.union_all().centroid
center = [centroid.y, centroid.x] if not centroid.is_empty else [37.8, -96.9]

m = folium.Map(location=center, tiles='CartoDB positron', zoom_start=6)
folium.GeoJson(
    gjson,
    style_function=style_function,
    tooltip=folium.GeoJsonTooltip(fields=['FULL1_NAME', 'FULL2_NAME', 'MOVEDIN', 'MOVEDNET'])
).add_to(m)
colormap.caption = 'MOVEDNET'
colormap.add_to(m)

m
[Interactive map: Texas county-to-county flows >1,000 people, colored by MOVEDNET]

Error Handling and Best Practices

get_flows() raises a ValueError for invalid arguments. Here are some common cases and how to catch them:

# Error: Invalid geography
try:
    tc.get_flows(geography="invalid", year=2018)
except ValueError as e:
    print(f"Geography error: {e}")

# Error: Year too early
try:
    tc.get_flows(geography="county", year=2009)
except ValueError as e:
    print(f"Year error: {e}")

# Error: Breakdown variables after 2015
try:
    tc.get_flows(geography="county", breakdown=["AGE"], year=2016)
except ValueError as e:
    print(f"Breakdown error: {e}")

# Error: MSA data before 2013
try:
    tc.get_flows(geography="metropolitan statistical area", year=2012)
except ValueError as e:
    print(f"MSA year error: {e}")
Geography error: Geography must be one of: county, county subdivision, metropolitan statistical area
Year error: Migration flows are available beginning in 2010
Breakdown error: Breakdown characteristics are only available for surveys before 2016
MSA year error: MSA-level data is only available beginning with 2013 (2009-2013 5-year ACS)
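get_flows() already performs these checks. When looping over many years and geographies, it can still help to filter out invalid combinations before any API calls are made. The helper below, check_flows_args, is hypothetical and not part of pytidycensus. It simply mirrors the rules and messages printed above.

```python
VALID_GEOGRAPHIES = {"county", "county subdivision", "metropolitan statistical area"}


def check_flows_args(geography, year, breakdown=None):
    """Hypothetical pre-flight check mirroring the get_flows() errors shown above."""
    if geography not in VALID_GEOGRAPHIES:
        raise ValueError(f"Geography must be one of: {', '.join(sorted(VALID_GEOGRAPHIES))}")
    if year < 2010:
        raise ValueError("Migration flows are available beginning in 2010")
    if breakdown and year > 2015:
        raise ValueError("Breakdown characteristics are only available for surveys before 2016")
    if geography == "metropolitan statistical area" and year < 2013:
        raise ValueError("MSA-level data is only available beginning with 2013")


requests = [
    {"geography": "county", "year": 2018},
    {"geography": "county", "year": 2016, "breakdown": ["AGE"]},
    {"geography": "metropolitan statistical area", "year": 2012},
]
for kwargs in requests:
    try:
        check_flows_args(**kwargs)
        print("ok:", kwargs)
    except ValueError as e:
        print("skipped:", kwargs, "->", e)
```

Only the requests that pass would then be sent to tc.get_flows(**kwargs).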

Summary

The get_flows() function provides comprehensive access to Census migration flow data with features including:

  • Multiple geographic levels: county, county subdivision, MSA

  • Flexible output formats: wide (API format) or tidy (analysis-ready)

  • Demographic breakdowns: age, sex, race, income, etc. (2006-2010 through 2011-2015 ACS datasets only, i.e. year=2010 to 2015)

  • Confidence levels: 90%, 95%, or 99% for margin of error

  • Geometry integration: centroids for mapping flows

  • Input validation: invalid geographies, years and breakdown/year combinations raise a ValueError with an explanatory message

For more information, see the pytidycensus documentation.