Getting Started
This guide will help you get started with pytidycensus, a Python library for accessing US Census data.
Installation
Install pytidycensus using pip:
pip install pytidycensus
For a development (editable) installation from source:
git clone https://github.com/walkerke/tidycensus
cd tidycensus/pytidycensus
pip install -e .
Census API Key
To use pytidycensus, you need a free API key from the US Census Bureau:
- Visit https://api.census.gov/data/key_signup.html
- Fill out the form to request an API key
- Check your email for the key and activate it using the link in the message
Once you have your key, set it in Python:
import pytidycensus as tc
tc.set_census_api_key("your_api_key_here")
Alternatively, you can set it as an environment variable:
export CENSUS_API_KEY="your_api_key_here"
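A script can check for the environment variable before making any requests, so a missing key fails early with a clear message. The sketch below uses only the standard library; the helper name `require_census_key` is hypothetical and not part of pytidycensus.

```python
import os


def require_census_key(env_var: str = "CENSUS_API_KEY") -> str:
    """Return the Census API key from the environment, or fail with a clear message.

    Hypothetical helper for illustration; not part of pytidycensus.
    """
    key = os.environ.get(env_var, "").strip()  # tolerate stray whitespace
    if not key:
        raise RuntimeError(
            f"{env_var} is not set. Request a key at "
            "https://api.census.gov/data/key_signup.html and export it first."
        )
    return key
```

For example, `tc.set_census_api_key(require_census_key())` keeps the key out of your source code while still setting it explicitly.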
Basic Usage
Getting ACS Data
The American Community Survey (ACS) is one of the most widely used Census datasets, with detailed social, economic, housing, and demographic estimates:
import pytidycensus as tc
# Get median household income by state
income_data = tc.get_acs(
    geography="state",
    variables="B19013_001",
    year=2022
)
print(income_data.head())
Adding Geography
To include geographic boundaries for mapping, set geometry=True:
# Get data with geometry
income_geo = tc.get_acs(
    geography="state",
    variables="B19013_001", 
    year=2022,
    geometry=True
)
# Plot a choropleth map of the estimates (plotting requires matplotlib)
income_geo.plot(column='value', legend=True)
Multiple Variables
You can request multiple variables at once:
# Get population and median income
demo_data = tc.get_acs(
    geography="county",
    variables=["B01003_001", "B19013_001"],  # Population, Median Income
    state="CA",
    year=2022
)
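ACS detailed-table variable IDs such as `B19013_001` combine a table ID (`B19013`) with a line number inside that table (`001`). In the raw Census API, the estimate and margin of error for a variable are requested with `E` and `M` suffixes (for example `B19013_001E` and `B19013_001M`), whereas the examples in this guide pass only the bare ID. The small parser below is a hypothetical illustration of that naming convention, not a pytidycensus function.

```python
import re
from typing import NamedTuple

# Detailed-table IDs: a table prefix letter, digits, an optional race/ethnicity
# iteration letter (e.g. B19013A), an underscore, and a 3-digit line number.
_VAR_RE = re.compile(r"([A-Z]\d+[A-Z]?)_(\d{3})")


class AcsVariable(NamedTuple):
    table: str  # e.g. "B19013"
    line: str   # e.g. "001"

    def api_names(self) -> tuple[str, str]:
        """Raw Census API names for the estimate (E) and margin of error (M)."""
        base = f"{self.table}_{self.line}"
        return base + "E", base + "M"


def parse_variable(var_id: str) -> AcsVariable:
    """Split a detailed-table variable ID such as "B19013_001" into table and line."""
    match = _VAR_RE.fullmatch(var_id.strip().upper())
    if match is None:
        raise ValueError(f"not a detailed-table variable ID: {var_id!r}")
    return AcsVariable(table=match.group(1), line=match.group(2))
```

Subject tables (IDs starting with `S`) and data profiles (`DP`) use different patterns and are deliberately rejected by this sketch.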
Searching for Variables
Find variables by searching their descriptions:
# Search the 2022 ACS 5-year variable list for descriptions mentioning "income"
income_vars = tc.search_variables("income", 2022, "acs", "acs5")
print(income_vars[['name', 'label']].head(10))
Data Formats
Tidy Format (Default)
By default, data is returned in “tidy” format where each row represents one geography-variable combination:
data = tc.get_acs(
    geography="state",
    variables=["B01003_001", "B19013_001"],
    output="tidy"  # This is the default
)
# Result: One row per state-variable combination
Wide Format
You can also get data in “wide” format where each row represents one geography:
data = tc.get_acs(
    geography="state",
    variables=["B01003_001", "B19013_001"],
    output="wide"
)
# Result: One row per state, variables as columns
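To see how the two layouts relate, the sketch below reshapes tidy records into wide rows using only the standard library. The numbers and names are placeholders, not Census values, and the `value` column name follows the examples in this guide. With a pandas DataFrame, `DataFrame.pivot(index=["GEOID", "NAME"], columns="variable", values="value")` performs the same reshape.

```python
from collections import defaultdict

# Tidy records: one row per geography-variable combination.
# Values are placeholders for illustration only.
tidy_rows = [
    {"GEOID": "01", "NAME": "State A", "variable": "B01003_001", "value": 1000},
    {"GEOID": "01", "NAME": "State A", "variable": "B19013_001", "value": 50000},
    {"GEOID": "02", "NAME": "State B", "variable": "B01003_001", "value": 2000},
    {"GEOID": "02", "NAME": "State B", "variable": "B19013_001", "value": 60000},
]


def tidy_to_wide(rows):
    """Collapse tidy rows into one dict per geography, with variables as keys."""
    wide = defaultdict(dict)  # GEOID -> wide record, in first-seen order
    for row in rows:
        record = wide[row["GEOID"]]
        record.setdefault("GEOID", row["GEOID"])
        record.setdefault("NAME", row["NAME"])
        record[row["variable"]] = row["value"]
    return list(wide.values())


for record in tidy_to_wide(tidy_rows):
    print(record)
```

Tidy output is convenient for grouping and plotting by variable; wide output is convenient when you want to compute across variables in the same row, such as a ratio of two estimates.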
Geographic Levels
pytidycensus supports many geographic levels, including:
- "us" - United States
- "region" - Census regions
- "division" - Census divisions
- "state" - States
- "county" - Counties
- "tract" - Census tracts
- "block group" - Block groups
- "place" - Places/cities
- "zcta" - ZIP Code Tabulation Areas
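Counties, tracts, and block groups are identified by a GEOID built from nested FIPS codes: 2 digits for the state, 3 for the county, 6 for the tract, and 1 for the block group, so a block-group GEOID is 12 digits long. The hypothetical helper below splits such a GEOID into its parts; it covers only that hierarchy, since places and ZCTAs use their own code schemes.

```python
# Digit widths of each nested FIPS component in a GEOID.
_COMPONENTS = [("state", 2), ("county", 3), ("tract", 6), ("block_group", 1)]


def split_geoid(geoid: str) -> dict[str, str]:
    """Split a state/county/tract/block-group GEOID into its FIPS parts.

    Hypothetical helper; places and ZCTAs use different code schemes.
    """
    if not geoid.isdigit():
        raise ValueError(f"GEOID must contain only digits: {geoid!r}")
    parts, pos = {}, 0
    for name, width in _COMPONENTS:
        if pos == len(geoid):
            break  # stop once every digit has been assigned
        parts[name] = geoid[pos:pos + width]
        pos += width
    if pos != len(geoid):
        raise ValueError(f"unexpected GEOID length {len(geoid)}: {geoid!r}")
    return parts
```

For example, `split_geoid("48201")` returns `{'state': '48', 'county': '201'}`, which is Harris County (county 201) in Texas (state 48).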
Geographic Filtering
Use the state and county parameters to restrict results to specific areas:
# County data for Texas only
tx_counties = tc.get_acs(
    geography="county",
    variables="B01003_001",
    state="TX"
)
# Tract data for Harris County, Texas
harris_tracts = tc.get_acs(
    geography="tract", 
    variables="B01003_001",
    state="TX",
    county="201"  # Harris County FIPS code
)
Counties can also be given by name instead of FIPS code:
    county="Harris County"  # instead of FIPS code
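The general idea behind a name lookup can be sketched as a small table plus some name normalization. This is not pytidycensus's implementation: `county_to_fips` is a hypothetical helper, the table is an excerpt with three Texas counties, and a complete version would load the Census Bureau's county FIPS reference file.

```python
# Hypothetical excerpt of a (state FIPS, county name) -> county FIPS table.
COUNTY_FIPS = {
    ("48", "harris"): "201",
    ("48", "dallas"): "113",
    ("48", "travis"): "453",
}


def county_to_fips(state_fips: str, county: str) -> str:
    """Resolve a county given as a FIPS code or a name to its 3-digit FIPS code."""
    county = county.strip()
    if county.isdigit():
        return county.zfill(3)  # already a code; pad e.g. "1" -> "001"
    key = county.lower()
    if key.endswith(" county"):
        key = key[: -len(" county")]  # accept both "Harris" and "Harris County"
    try:
        return COUNTY_FIPS[(state_fips, key)]
    except KeyError:
        raise ValueError(f"unknown county {county!r} in state {state_fips}") from None
```

Stripping a trailing " County" lets both spellings match; states such as Louisiana, which has parishes rather than counties, would need additional suffixes.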
Survey Types
The ACS publishes estimates for different survey periods:
- "acs5" - 5-year estimates (default; larger sample, more reliable for small areas)
- "acs1" - 1-year estimates (more current, but published only for areas with 65,000 or more people)
# Get 1-year ACS data
current_data = tc.get_acs(
    geography="state",
    variables="B01003_001",
    survey="acs1",
    year=2022
)
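Because 1-year estimates are published only for geographies with at least 65,000 people, a script that loops over areas of different sizes might pick the survey per area. The helper below is a hypothetical sketch of that rule; it does not query the API, and its result would be passed as the `survey` argument.

```python
# Census Bureau population threshold for publishing 1-year ACS estimates.
ACS1_MIN_POPULATION = 65_000


def choose_survey(population: int, prefer_current: bool = True) -> str:
    """Return "acs1" when 1-year estimates should exist and are preferred, else "acs5"."""
    if prefer_current and population >= ACS1_MIN_POPULATION:
        return "acs1"
    return "acs5"
```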
Margin of Error
ACS estimates come from a sample survey, so each one has a margin of error (MOE). pytidycensus includes MOEs automatically:
data = tc.get_acs(
    geography="state",
    variables="B19013_001"
)
# The result includes both estimate and margin of error
print(data.columns)
# ['GEOID', 'NAME', 'variable', 'value', 'B19013_001_moe']
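ACS margins of error are published at the 90 percent confidence level. Following the approximation formulas in the Census Bureau's ACS user guidance, you can convert an MOE to a standard error (divide by 1.645), rescale it to another confidence level, and approximate the MOE of a sum of estimates as the square root of the sum of squared MOEs. The sketch below implements those formulas; the function names are hypothetical, and the sum formula ignores covariance between estimates.

```python
import math

Z_90 = 1.645  # z-score for the 90% level used by published ACS MOEs
Z_95 = 1.96


def moe_to_se(moe: float) -> float:
    """Standard error implied by a 90% margin of error."""
    return moe / Z_90


def moe_to_95(moe: float) -> float:
    """Rescale a 90% MOE to the 95% confidence level."""
    return moe * Z_95 / Z_90


def moe_sum(moes: list[float]) -> float:
    """Approximate MOE of a sum of estimates (ignores covariance)."""
    return math.sqrt(sum(m * m for m in moes))


def interval(estimate: float, moe: float) -> tuple[float, float]:
    """90% confidence interval: the estimate plus or minus its MOE."""
    return (estimate - moe, estimate + moe)


# Placeholder numbers, not real Census values.
print(interval(50_000, 1_200))  # (48800, 51200)
print(moe_sum([300, 400]))      # 500.0
```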
Population Estimates Program
The Population Estimates Program provides annual population estimates and demographic characteristics. For years 2020 and later, pytidycensus retrieves data from CSV files; for earlier years (2015-2019), it uses the Census API.
Basic Population Estimates
# Get total population by state for 2022
state_pop = tc.get_estimates(
    geography="state",
    variables="POP", 
    vintage=2022
)
Components of Population Change
# Get births, deaths, and migration data
components = tc.get_estimates(
    geography="state",
    variables=["BIRTHS", "DEATHS", "DOMESTICMIG", "INTERNATIONALMIG"],
    vintage=2022
)
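These components fit together through the demographic balancing equation: natural increase is births minus deaths, net migration is domestic plus international migration, and their sum approximates the population change over the period. The Census Bureau's published series also include a small residual, so the identity may not hold exactly for published figures. The sketch below applies the equation to placeholder numbers; `population_change` is a hypothetical helper whose keys mirror the variable names requested above.

```python
def population_change(components: dict[str, float]) -> dict[str, float]:
    """Apply the demographic balancing equation to one geography's components."""
    natural = components["BIRTHS"] - components["DEATHS"]
    net_migration = components["DOMESTICMIG"] + components["INTERNATIONALMIG"]
    return {
        "natural_increase": natural,
        "net_migration": net_migration,
        "total_change": natural + net_migration,
    }


# Placeholder values for one hypothetical state (not Census data).
example = {"BIRTHS": 12_000, "DEATHS": 10_500, "DOMESTICMIG": -2_000, "INTERNATIONALMIG": 3_500}
print(population_change(example))
# {'natural_increase': 1500, 'net_migration': 1500, 'total_change': 3000}
```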
Demographic Breakdowns
Use the breakdown parameter to get population estimates by demographics:
# Population by sex and race
demographics = tc.get_estimates(
    geography="state",
    variables="POP",
    breakdown=["SEX", "RACE"],
    breakdown_labels=True,  # Include human-readable labels
    year=2022
)
Geographic Levels
Population estimates support multiple geographies:
# County-level data for Texas
tx_counties = tc.get_estimates(
    geography="county",
    variables="POP",
    state="TX",
    year=2022
)
# Metro areas (CBSAs)
metros = tc.get_estimates(
    geography="cbsa", 
    variables="POP",
    year=2022
)
Time Series Data
Get population estimates across multiple years:
# Time series for states from 2020-2023
time_series = tc.get_estimates(
    geography="state",
    variables="POP",
    time_series=True,
    vintage=2023
)
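With several years of estimates for one geography, a common next step is to compute year-over-year percent change and the compound annual growth rate (CAGR). The sketch below works on a plain mapping of year to population with placeholder values; extracting that mapping from the returned DataFrame is left out, since it depends on the columns in the time-series output. Both function names are hypothetical.

```python
def yearly_change(series: dict[int, float]) -> dict[int, float]:
    """Percent change from the previous year, keyed by the later year."""
    years = sorted(series)
    return {
        later: 100 * (series[later] - series[earlier]) / series[earlier]
        for earlier, later in zip(years, years[1:])
    }


def cagr(series: dict[int, float]) -> float:
    """Compound annual growth rate (percent) between the first and last year."""
    years = sorted(series)
    first, last = years[0], years[-1]
    span = last - first
    if span <= 0:
        raise ValueError("need at least two distinct years")
    return 100 * ((series[last] / series[first]) ** (1 / span) - 1)


# Placeholder populations (not Census data).
pop = {2020: 1_000_000, 2021: 1_010_000, 2022: 1_020_100}
print(yearly_change(pop))
print(round(cagr(pop), 4))
```

CAGR smooths uneven year-to-year changes into a single constant rate that would produce the same start and end values.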
Data Products
Use the product parameter to specify the type of data:
# Basic population totals (default)
population = tc.get_estimates(
    geography="state",
    product="population",  # or omit for default
    variables="POP",
    year=2022
)
# Components of population change
components = tc.get_estimates(
    geography="state", 
    product="components",
    variables=["BIRTHS", "DEATHS"],
    year=2022
)
# Population characteristics by demographics
characteristics = tc.get_estimates(
    geography="state",
    product="characteristics",
    variables="POP",
    breakdown=["SEX"],
    year=2022
)
Next Steps
- Explore comprehensive Jupyter notebook examples 
- Check the API reference for detailed function documentation 
- Visit the GitHub repository for the latest updates 
Come study with us at The George Washington University
