Getting Started
This guide will help you get started with pytidycensus, a Python library for accessing US Census data.
Installation
Install pytidycensus using pip:
pip install pytidycensus
For development installation:
git clone https://github.com/walkerke/tidycensus
cd tidycensus/pytidycensus
pip install -e .
Census API Key
To use pytidycensus, you need a free API key from the US Census Bureau:
1. Visit https://api.census.gov/data/key_signup.html
2. Fill out the form to request an API key
3. Check your email for the API key
Once you have your key, set it in Python:
import pytidycensus as tc
tc.set_census_api_key("your_api_key_here")
Alternatively, you can set it as an environment variable:
export CENSUS_API_KEY="your_api_key_here"
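If your scripts read the key from the environment, it can help to fail early with a clear message when the variable is missing. The sketch below uses only the standard library. `resolve_census_key` is a hypothetical helper name, not part of pytidycensus, and the commented-out lines simply reuse `tc.set_census_api_key` from above.

```python
import os


def resolve_census_key(explicit_key=None, environ=None):
    """Return an API key from an explicit argument or CENSUS_API_KEY.

    Hypothetical helper for illustration; not part of pytidycensus.
    """
    environ = os.environ if environ is None else environ
    key = (explicit_key or environ.get("CENSUS_API_KEY", "")).strip()
    if not key:
        raise RuntimeError(
            "No Census API key found; pass one explicitly or set CENSUS_API_KEY."
        )
    return key


# Example with a stand-in environment mapping (placeholder value, not a real key)
key = resolve_census_key(environ={"CENSUS_API_KEY": "your_api_key_here"})
# import pytidycensus as tc
# tc.set_census_api_key(key)
```

Passing the environment as an argument keeps the helper easy to test without touching your real shell settings.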
Basic Usage
Getting ACS Data
The American Community Survey (ACS) is the most commonly used Census dataset:
import pytidycensus as tc
# Get median household income by state
income_data = tc.get_acs(
    geography="state",
    variables="B19013_001",
    year=2022
)
print(income_data.head())
Adding Geography
To include geographic boundaries for mapping:
# Get data with geometry
income_geo = tc.get_acs(
    geography="state",
    variables="B19013_001",
    year=2022,
    geometry=True
)
# Now you can map it
income_geo.plot(column='value', legend=True)
Multiple Variables
You can request multiple variables at once:
# Get population and median income
demo_data = tc.get_acs(
    geography="county",
    variables=["B01003_001", "B19013_001"],  # Population, Median Income
    state="CA",
    year=2022
)
Searching for Variables
Find variables by searching their descriptions:
# Search for income-related variables
income_vars = tc.search_variables("income", 2022, "acs", "acs5")
print(income_vars[['name', 'label']].head(10))
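To make the idea of a description search concrete, here is a minimal sketch of a case-insensitive substring match over a tiny in-memory catalog. It is an assumption that the library matches in a similar way. `CATALOG` and `search_labels` are hypothetical names, and the labels are abbreviated paraphrases rather than the exact Census wording.

```python
# Tiny stand-in catalog; labels are abbreviated paraphrases for illustration
CATALOG = [
    {"name": "B01003_001", "label": "Total population"},
    {"name": "B19013_001", "label": "Median household income in the past 12 months"},
    {"name": "B19301_001", "label": "Per capita income in the past 12 months"},
]


def search_labels(pattern, catalog=CATALOG):
    """Return catalog rows whose label contains pattern, ignoring case."""
    needle = pattern.lower()
    return [row for row in catalog if needle in row["label"].lower()]


matches = search_labels("INCOME")
for row in matches:
    print(row["name"], "-", row["label"])
```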
Data Formats
Tidy Format (Default)
By default, data is returned in “tidy” format where each row represents one geography-variable combination:
data = tc.get_acs(
    geography="state",
    variables=["B01003_001", "B19013_001"],
    output="tidy"  # This is the default
)
# Result: One row per state-variable combination
Wide Format
You can also get data in “wide” format where each row represents one geography:
data = tc.get_acs(
    geography="state",
    variables=["B01003_001", "B19013_001"],
    output="wide"
)
# Result: One row per state, variables as columns
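The relationship between the two layouts can be sketched in plain Python, without the library. The rows below use made-up placeholder values, not real Census figures, and the exact column names pytidycensus produces in wide output are an assumption here. `tidy_to_wide` is a hypothetical helper.

```python
from collections import defaultdict

# Tidy rows: one per geography-variable pair (values are made-up placeholders)
tidy_rows = [
    {"GEOID": "01", "NAME": "Alabama", "variable": "B01003_001", "value": 100},
    {"GEOID": "01", "NAME": "Alabama", "variable": "B19013_001", "value": 200},
    {"GEOID": "02", "NAME": "Alaska", "variable": "B01003_001", "value": 300},
    {"GEOID": "02", "NAME": "Alaska", "variable": "B19013_001", "value": 400},
]


def tidy_to_wide(rows):
    """Collapse tidy rows into one dict per GEOID, with variables as keys."""
    wide = defaultdict(dict)
    for row in rows:
        record = wide[row["GEOID"]]
        record["GEOID"] = row["GEOID"]
        record["NAME"] = row["NAME"]
        record[row["variable"]] = row["value"]
    return list(wide.values())


wide_rows = tidy_to_wide(tidy_rows)
for record in wide_rows:
    print(record)
```

Tidy output tends to suit grouping and plotting by variable, while wide output is convenient when you want to compute across variables for the same geography.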
Geographic Levels
pytidycensus supports many geographic levels:
- "us" - United States
- "region" - Census regions
- "division" - Census divisions
- "state" - States
- "county" - Counties
- "tract" - Census tracts
- "block group" - Block groups
- "place" - Places/cities
- "zcta" - ZIP Code Tabulation Areas
Geographic Filtering
Filter data to specific geographies:
# County data for Texas only
tx_counties = tc.get_acs(
    geography="county",
    variables="B01003_001",
    state="TX"
)
# Tract data for Harris County, Texas
harris_tracts = tc.get_acs(
    geography="tract",
    variables="B01003_001",
    state="TX",
    county="201"  # Harris County FIPS code
)
pytidycensus also includes a county name lookup, so you can pass the county name instead of its FIPS code:
county="Harris County" # instead of FIPS code
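The FIPS codes used for filtering also explain how GEOID values are built: a state code is 2 digits, a county code is 3 digits, and a tract code is 6 digits, concatenated in that order. The helpers below are a hypothetical sketch of that convention using only the standard library; they are not pytidycensus functions.

```python
def county_geoid(state_fips, county_fips):
    """Join a 2-digit state code and 3-digit county code into a 5-digit GEOID."""
    return f"{int(state_fips):02d}{int(county_fips):03d}"


def split_tract_geoid(geoid):
    """Split an 11-digit tract GEOID into state, county, and tract parts."""
    if len(geoid) != 11 or not geoid.isdigit():
        raise ValueError("tract GEOIDs are 11 digits: SS + CCC + TTTTTT")
    return {"state": geoid[:2], "county": geoid[2:5], "tract": geoid[5:]}


harris = county_geoid("48", "201")  # Texas is state 48, Harris County is 201
print(harris)  # 48201
```

Keeping GEOIDs as zero-padded strings rather than integers avoids losing leading zeros for states such as Alabama ("01").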
Survey Types
The ACS has different survey periods:
- "acs5" - 5-year estimates (default, more reliable for small areas)
- "acs1" - 1-year estimates (more current, less reliable for small areas)
# Get 1-year ACS data
current_data = tc.get_acs(
    geography="state",
    variables="B01003_001",
    survey="acs1",
    year=2022
)
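One practical consequence of the two survey periods: the Census Bureau publishes 1-year ACS estimates only for areas with a population of 65,000 or more, so smaller areas need the 5-year data. The sketch below encodes that rule in a hypothetical helper, `choose_survey`, which is not part of pytidycensus. The returned strings match the `survey` values shown above.

```python
ACS1_MIN_POPULATION = 65_000  # Census Bureau publication threshold for 1-year data


def choose_survey(population, prefer_current=True):
    """Pick "acs1" when the area is large enough and recency matters, else "acs5".

    Hypothetical helper for illustration; not part of pytidycensus.
    """
    if prefer_current and population >= ACS1_MIN_POPULATION:
        return "acs1"
    return "acs5"


print(choose_survey(250_000))  # acs1
print(choose_survey(12_000))   # acs5
```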
Margin of Error
ACS estimates come with margins of error, which pytidycensus returns automatically alongside each estimate:
data = tc.get_acs(
    geography="state",
    variables="B19013_001"
)
# The result includes both estimate and margin of error
print(data.columns)
# ['GEOID', 'NAME', 'variable', 'value', 'B19013_001_moe']
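ACS margins of error are published at the 90% confidence level, which makes a few derived quantities easy to compute once you have an estimate and its margin. The helpers below are a standard-library sketch with hypothetical names (they are not pytidycensus functions), and the numbers in the example are placeholders rather than real Census values.

```python
import math

Z_90 = 1.645  # z-score for the 90% confidence level used by ACS margins of error


def standard_error(moe):
    """Convert a 90% margin of error to a standard error."""
    return moe / Z_90


def confidence_interval(estimate, moe):
    """Return the (lower, upper) bounds of the 90% confidence interval."""
    return (estimate - moe, estimate + moe)


def combined_moe(moes):
    """Approximate the MOE of a sum of estimates as the root sum of squares."""
    return math.sqrt(sum(m * m for m in moes))


def coefficient_of_variation(estimate, moe):
    """Standard error as a percentage of the estimate; lower is more reliable."""
    return 100 * standard_error(moe) / estimate


# Placeholder numbers for illustration only
low, high = confidence_interval(50_000, 1_645)
print(low, high)
```

A large coefficient of variation is a common signal that an estimate for a small area should be interpreted with caution or aggregated with neighboring areas.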
Population Estimates Program
The Population Estimates Program provides annual population estimates and demographic characteristics. For years 2020 and later, pytidycensus retrieves data from CSV files; for earlier years (2015-2019), it uses the Census API.
Basic Population Estimates
# Get total population by state for 2022
state_pop = tc.get_estimates(
    geography="state",
    variables="POP",
    vintage=2022
)
Components of Population Change
# Get births, deaths, and migration data
components = tc.get_estimates(
    geography="state",
    variables=["BIRTHS", "DEATHS", "DOMESTICMIG", "INTERNATIONALMIG"],
    vintage=2022
)
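These four components fit together through the usual demographic accounting: natural change is births minus deaths, and net migration is domestic plus international migration. The sketch below shows that arithmetic with placeholder numbers. `summarize_components` is a hypothetical helper, and published totals may also include a small residual term, so the sum here is an approximation of the reported change.

```python
def summarize_components(births, deaths, domestic_mig, international_mig):
    """Combine components of change into natural change and net migration."""
    natural_change = births - deaths
    net_migration = domestic_mig + international_mig
    return {
        "natural_change": natural_change,
        "net_migration": net_migration,
        # Approximate: ignores any residual in the published estimates
        "total_change": natural_change + net_migration,
    }


# Placeholder values for illustration only
summary = summarize_components(
    births=1_000, deaths=800, domestic_mig=-50, international_mig=120
)
print(summary)
```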
Demographic Breakdowns
Use the breakdown parameter to get population estimates by demographics:
# Population by sex and race
demographics = tc.get_estimates(
    geography="state",
    variables="POP",
    breakdown=["SEX", "RACE"],
    breakdown_labels=True,  # Include human-readable labels
    year=2022
)
Geographic Levels
Population estimates support multiple geographies:
# County-level data for Texas
tx_counties = tc.get_estimates(
    geography="county",
    variables="POP",
    state="TX",
    year=2022
)
# Metro areas (CBSAs)
metros = tc.get_estimates(
    geography="cbsa",
    variables="POP",
    year=2022
)
Time Series Data
Get population estimates across multiple years:
# Time series for states from 2020-2023
time_series = tc.get_estimates(
    geography="state",
    variables="POP",
    time_series=True,
    vintage=2023
)
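A common next step with a multi-year series is to compute year-over-year change. The sketch below does this in plain Python on a list of (year, population) pairs with placeholder values. How the library labels the year column in its time series output is not assumed here, and `year_over_year` is a hypothetical helper.

```python
def year_over_year(series):
    """Return (year, percent change from the previous year) for each later year."""
    ordered = sorted(series)
    changes = []
    for (_, prev_pop), (year, pop) in zip(ordered, ordered[1:]):
        changes.append((year, 100 * (pop - prev_pop) / prev_pop))
    return changes


# Placeholder populations for illustration only
series = [(2020, 1_000), (2021, 1_020), (2022, 1_071)]
growth = year_over_year(series)
for year, pct in growth:
    print(year, f"{pct:.1f}%")
```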
Data Products
Use the product parameter to specify the type of data:
# Basic population totals (default)
population = tc.get_estimates(
    geography="state",
    product="population",  # or omit for default
    variables="POP",
    year=2022
)
# Components of population change
components = tc.get_estimates(
    geography="state",
    product="components",
    variables=["BIRTHS", "DEATHS"],
    year=2022
)
# Population characteristics by demographics
characteristics = tc.get_estimates(
    geography="state",
    product="characteristics",
    variables="POP",
    breakdown=["SEX"],
    year=2022
)
Next Steps
Explore comprehensive Jupyter notebook examples
Check the API reference for detailed function documentation
Visit the GitHub repository for the latest updates