pytidycensus package 

Examples

>>> import pytidycensus as tc
>>> tc.set_census_api_key("your_key_here")
>>>
>>> # Get median household income by county in Texas
>>> tx_income = tc.get_acs(
...     geography="county",
...     variables="B19013_001",
...     state="TX",
...     year=2022
... )
>>>
>>> # Get data with geometry for mapping
>>> tx_income_geo = tc.get_acs(
...     geography="county",
...     variables="B19013_001",
...     state="TX",
...     geometry=True
... )
>>>
>>> # Get data with named variables
>>> tx_demo = tc.get_acs(
...     geography="county",
...     variables={"total_pop": "B01003_001", "median_income": "B19013_001"},
...     state="TX",
...     year=2022
... )

pytidycensus.get_decennial(geography, variables=None, table=None, cache_table=False, year=2020, sumfile=None, state=None, county=None, output='wide', geometry=False, keep_geo_vars=False, shift_geo=False, summary_var=None, pop_group=None, pop_group_label=False, api_key=None, show_call=False, **kwargs)[source]

Obtain data from the US Decennial Census.

Parameters:

geography (str) – The geography of your data (e.g., ‘county’, ‘tract’, ‘block group’).
variables (str, list of str, or dict, optional) – Variable ID(s) to retrieve. Can be a single variable, list of variables, or dictionary mapping custom names to variable IDs. If not provided, must specify table.
table (str, optional) – Census table ID to retrieve all variables from.
cache_table (bool, default False) – Whether to cache table names for faster future access.
year (int, default 2020) – Census year (2000, 2010, or 2020). Note: 1990 data is not available via the API.
sumfile (str, optional) – Summary file to use. Defaults to ‘pl’ for 2020, ‘sf1’ for earlier years. Available options vary by year.
state (str, int, or list, optional) – State(s) to retrieve data for. Accepts names, abbreviations, or FIPS codes.
county (str, int, or list, optional) – County(ies) to retrieve data for. Must be used with state.
output (str, default "wide") – Output format (“tidy” or “wide”).
geometry (bool, default False) – Whether to include geometry for mapping.
keep_geo_vars (bool, default False) – Whether to keep all geographic variables from shapefiles.
shift_geo (bool, default False) – (Deprecated) If True, a warning advises the user to shift geometry by other means.
summary_var (str, optional) – Summary variable from the decennial Census to include for comparison.
pop_group (str, optional) – Population group code for which you’d like to request data (for selected sumfiles).
pop_group_label (bool, default False) – If True, return a pop_group_label column with the population group description.
api_key (str, optional) – Census API key. If not provided, looks for CENSUS_API_KEY environment variable.
show_call (bool, default False) – Whether to print the API call URL.
**kwargs – Additional parameters passed to geography functions.

Returns:

Decennial Census data, optionally with geometry.

Return type:

Examples

>>> import pytidycensus as tc
>>> tc.set_census_api_key("your_key_here")
>>>
>>> # Get total population by state for 2020
>>> pop_2020 = tc.get_decennial(
...     geography="state",
...     variables="P1_001N",
...     year=2020
... )
>>>
>>> # Get race/ethnicity data with geometry
>>> race_data = tc.get_decennial(
...     geography="county",
...     variables=["P1_003N", "P1_004N", "P1_005N"],
...     state="CA",
...     year=2020,
...     geometry=True
... )
>>>
>>> # Get data with named variables and summary variable
>>> pop_data = tc.get_decennial(
...     geography="county",
...     variables={"total": "P1_001N", "white": "P1_003N"},
...     state="TX",
...     year=2020,
...     summary_var="P1_001N"
... )
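
A summary variable is usually requested so that counts can be expressed as shares of a total. The sketch below shows that calculation on plain dictionaries. The keys `value` and `summary_value`, the helper name `add_share`, and the sample figures are all illustrative assumptions; they are not the column names or data that get_decennial() returns.

```python
def add_share(rows, value_key="value", summary_key="summary_value"):
    """Return copies of rows with a 'percent' field (value as a share of the summary total)."""
    out = []
    for row in rows:
        total = row[summary_key]
        # Avoid dividing by zero for areas with an empty summary total.
        percent = None if not total else row[value_key] * 100 / total
        out.append({**row, "percent": percent})
    return out


# Made-up figures, for illustration only.
rows = [
    {"name": "County A", "value": 250, "summary_value": 1000},
    {"name": "County B", "value": 0, "summary_value": 0},
]

for row in add_share(rows):
    print(row["name"], row["percent"])
```

Returning `None` for an empty total keeps the missing share explicit instead of raising or silently reporting zero.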

pytidycensus.get_estimates(geography, product=None, variables=None, breakdown=None, breakdown_labels=False, vintage=2024, year=None, state=None, county=None, time_series=False, output='tidy', geometry=False, keep_geo_vars=False, api_key=None, show_call=False, **kwargs)[source]

Obtain data from the US Census Bureau Population Estimates Program.

The Population Estimates Program (PEP) produces estimates of the population for the United States, its states, counties, cities, and towns. For years 2020 and later, data is retrieved from flat CSV files. For years 2019 and earlier, data comes from the Census API.

Parameters:

geography (str) – The geography of your data. Options include:
    - ‘us’ (United States)
    - ‘region’ (Census regions)
    - ‘division’ (Census divisions)
    - ‘state’ (States and DC)
    - ‘county’ (Counties)
    - ‘cbsa’ (Core Based Statistical Areas)
    - ‘metropolitan statistical area/micropolitan statistical area’ (alias for cbsa)
    - ‘combined statistical area’ (Combined Statistical Areas)
    - ‘place’ (Incorporated places and Census designated places)
product (str, optional) – The data product. For years 2020+, only ‘characteristics’ requires this parameter. Options include:
    - ‘population’ (population totals)
    - ‘components’ (components of population change)
    - ‘characteristics’ (population by demographics)
variables (str or list of str, optional) – Variable ID(s) to retrieve. Use ‘all’ to get all available variables. Common variables include: ‘POP’, ‘BIRTHS’, ‘DEATHS’, ‘DOMESTICMIG’, ‘INTERNATIONALMIG’.
breakdown (list of str, optional) – Population breakdown for characteristics product. Options can be combined, e.g., [‘SEX’, ‘RACE’], and include:
    - ‘AGEGROUP’ (age groups)
    - ‘SEX’ (sex)
    - ‘RACE’ (race)
    - ‘HISP’ (Hispanic origin)
breakdown_labels (bool, default False) – Whether to include human-readable labels for breakdown categories.
vintage (int, default 2024) – The PEP vintage (dataset version year). Recommended to use the most recent.
year (int, optional) – The specific data year. Defaults to vintage if not specified.
state (str, int, or list, optional) – State(s) to retrieve data for. Accepts names, abbreviations, or FIPS codes.
county (str, int, or list, optional) – County(ies) to retrieve data for. Must be used with state.
time_series (bool, default False) – Whether to retrieve time series data back to 2010.
output (str, default "tidy") – Output format (“tidy” or “wide”).
geometry (bool, default False) – Whether to include geometry for mapping.
keep_geo_vars (bool, default False) – Whether to keep all geographic variables from shapefiles.
api_key (str, optional) – Census API key for years 2019 and earlier.
show_call (bool, default False) – Whether to print the API call URL (for API-based requests).
**kwargs – Additional parameters passed to geography functions.

Returns:

Population estimates data, optionally with geometry.

Return type:

Examples

>>> import pytidycensus as tc
>>> tc.set_census_api_key("your_key_here")
>>>
>>> # Get total population estimates by state
>>> state_pop = tc.get_estimates(
...     geography="state",
...     variables="POP",
...     year=2022
... )
>>>
>>> # Get population by age and sex for counties in Texas
>>> tx_pop_demo = tc.get_estimates(
...     geography="county",
...     variables="POP",
...     breakdown=["SEX", "AGEGROUP"],
...     state="TX",
...     breakdown_labels=True
... )

pytidycensus.get_flows(geography, variables=None, breakdown=None, breakdown_labels=False, year=2018, output='tidy', state=None, county=None, msa=None, geometry=False, api_key=None, moe_level=90, show_call=False)[source]

Retrieve migration flow data from the Census Migration Flows API.

The Migration Flows API provides data on population movement between geographic areas based on American Community Survey 5-year estimates.

Parameters:

geography (str) – Geographic level for the data. Must be one of:
    - “county”
    - “county subdivision”
    - “metropolitan statistical area”
variables (list of str, optional) – Census variable names to retrieve. If None, returns default flow variables.
breakdown (list of str, optional) – Demographic breakdown characteristics. Available only when year is 2015 or earlier (the 2006-2010 through 2011-2015 5-year datasets). Options include: AGE, SEX, RACE, HSGP, REL, HHT, TEN, ENG, POB, YEARS, ESR, OCC, WKS, SCHL, AHINC, APINC, HISP_ORIGIN.
breakdown_labels (bool, default False) – If True, replace breakdown variable codes with descriptive labels.
year (int, default 2018) – ACS 5-year survey ending year. Available years: 2010-2018.
output (str, default "tidy") – Output format. Options:
    - “tidy”: Long format focusing on core migration variables (MOVEDIN, MOVEDOUT, MOVEDNET)
    - “wide”: Wide format matching API response structure (recommended for breakdown variables)
state (str or list of str, optional) – State(s) to filter by. Can be state abbreviation, name, or FIPS code.
county (str or list of str, optional) – County(ies) to filter by. Can be county name or FIPS code.
msa (str or list of str, optional) – Metropolitan Statistical Area(s) to filter by.
geometry (bool, default False) –
If True, include geographic centroids for mapping flows. Raises RuntimeError if geometry data cannot be downloaded.

Centroids are calculated in EPSG:2163 (US National Atlas Equal Area) projection for accuracy and to properly position Alaska and Hawaii, then transformed back to EPSG:4269 (NAD83) for compatibility.

Note: The Census API may return mixed geographic levels in GEOID2 (destination). For example, when requesting county-level flows, GEOID2 may contain both 5-digit county codes and 10-digit county subdivision codes. When geometry=True, centroids for both counties and subdivisions will be automatically retrieved.
api_key (str, optional) – Census API key. If None, uses CENSUS_API_KEY environment variable.
moe_level (int, default 90) – Confidence level for margin of error. Options: 90, 95, 99.
show_call (bool, default False) – If True, print the API call URL.

Returns:

Migration flow data. If geometry=True, returns GeoDataFrame with origin and destination centroids.

GEOID columns contain Census geographic identifiers:
- GEOID1, GEOID2: Origin and destination GEOIDs
- 5-digit codes: County level (e.g., ‘12133’ = Washington County, FL)
- 10-digit codes: County subdivision level (e.g., ‘3400557510’ = Pemberton township, Burlington County, NJ)
- First 5 digits of subdivision codes represent the parent county

Use identify_geoid_type() to determine the geographic level of any GEOID.

Return type:

Examples

Get county-to-county migration flows for Texas:

>>> import pytidycensus as tc
>>> tx_flows = tc.get_flows(
...     geography="county",
...     state="TX",
...     year=2018
... )

Get flows with demographic breakdowns (pre-2016 only):

>>> flows_by_age = tc.get_flows(
...     geography="county",
...     breakdown=["AGE", "SEX"],
...     breakdown_labels=True,
...     state="CA",
...     year=2015
... )

pytidycensus.identify_geoid_type(geoid)[source]

Identify the geographic level of a GEOID based on its length.

Parameters: geoid (str, int, or None) – A Census GEOID code
Returns: Geographic type: ‘county’, ‘county subdivision’, ‘state’, ‘tract’, or ‘unknown’
Return type: str

Examples

>>> identify_geoid_type('12133')
'county'
>>> identify_geoid_type('3400557510')
'county subdivision'
>>> identify_geoid_type(None)
'unknown'
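
Length-based classification can be sketched in a few lines. The version below is a stand-alone illustration, not the library's implementation: `classify_geoid_by_length` and `parent_county` are hypothetical names, and the length table uses the standard GEOID widths (2 digits for a state, 5 for a county, 10 for a county subdivision, 11 for a tract).

```python
# Standard GEOID widths for the levels named in the return value above.
GEOID_LEVELS = {2: "state", 5: "county", 10: "county subdivision", 11: "tract"}


def classify_geoid_by_length(geoid):
    """Hypothetical helper: guess the geographic level of a GEOID from its length."""
    if geoid is None:
        return "unknown"
    code = str(geoid).strip()
    return GEOID_LEVELS.get(len(code), "unknown")


def parent_county(geoid):
    """Return the 5-digit county prefix of a county subdivision or tract GEOID."""
    code = str(geoid).strip()
    if len(code) < 5:
        raise ValueError("GEOID is too short to contain a county code")
    return code[:5]


print(classify_geoid_by_length("3400557510"))
print(parent_county("3400557510"))
```

Passing GEOIDs as strings is the safer choice in a sketch like this, because an integer drops leading zeros (6 instead of "06") and would then be misclassified by length.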

pytidycensus.get_geography(geography, year=2022, state=None, county=None, keep_geo_vars=False, cache_dir=None, cb=True, **kwargs)[source]

Download and load geographic boundary data using pygris.

Parameters:

geography (str) – Geography type (e.g., ‘county’, ‘tract’, ‘block group’, ‘state’, ‘zcta’, ‘place’)
year (int, default 2022) – Census year for boundaries
state (str, int, or list, optional) – State(s) to filter data for. Can be state name, abbreviation, or FIPS code.
county (str, int, or list, optional) – County(ies) to filter data for (requires state). Can be county name or FIPS code.
keep_geo_vars (bool, default False) – Whether to keep all geographic variables
cache_dir (str, optional) – Directory for caching downloaded files (currently not used with pygris)
cb (bool, default True) – If True, download generalized cartographic boundary files (1:500k). If False, download detailed TIGER/Line files. Note: For 2020 state-level data, cartographic boundaries may fail due to Census Bureau access restrictions. The function will automatically fall back to detailed TIGER/Line files (cb=False) if this occurs.
**kwargs – Additional parameters passed to underlying pygris functions

Returns:

Geographic boundary data

Return type:

Notes

Automatic Fallback: If downloading cartographic boundary files (cb=True) fails with file system errors (common for 2020 state-level GENZ files), the function will automatically retry with detailed TIGER/Line files (cb=False) and issue a warning. This ensures robust data retrieval without requiring manual intervention.

Examples

>>> # Get county boundaries for Texas
>>> tx_counties = get_geography("county", state="TX", year=2022)
>>>
>>> # Get tract boundaries for Harris County, TX
>>> harris_tracts = get_geography(
...     "tract",
...     state="TX",
...     county="201",
...     year=2022
... )
>>>
>>> # Get 2020 state boundaries (will auto-fallback if needed)
>>> states_2020 = get_geography("state", year=2020)
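
The automatic fallback described in the Notes follows a common try, warn, retry pattern. The sketch below illustrates that pattern with a stand-in downloader. `fetch_with_fallback` and `fake_download` are hypothetical names, and treating "file system errors" as `OSError` is an assumption about how the failure surfaces.

```python
import warnings


def fetch_with_fallback(download, **request):
    """Try cartographic boundaries first; on a file error, retry with TIGER/Line files."""
    try:
        return download(cb=True, **request)
    except OSError as err:
        warnings.warn(
            f"Cartographic boundary download failed ({err}); retrying with cb=False",
            UserWarning,
        )
        return download(cb=False, **request)


def fake_download(cb, **request):
    """Stand-in for a real downloader: fails for cartographic files, succeeds otherwise."""
    if cb:
        raise FileNotFoundError("generalized boundary file not found")
    return {"cb": cb, **request}


with warnings.catch_warnings(record=True) as caught:
    warnings.simplefilter("always")
    result = fetch_with_fallback(fake_download, geography="state", year=2020)

print(result["cb"], len(caught))
```

Only one retry is attempted, so a failure of the detailed download still propagates to the caller.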

pytidycensus.get_time_series(geography, variables, years, dataset='acs5', base_year=None, extensive_variables=None, intensive_variables=None, geometry=True, output='wide', crs='EPSG:3857', **kwargs)[source]

Collect time series data from Census APIs with area interpolation support.

This function automatically handles boundary changes by interpolating data to a consistent set of geographic boundaries (base year). It supports both ACS and Decennial Census data.

Parameters:

geography (str) – Geographic level (e.g., ‘tract’, ‘county’, ‘state’).
variables (str, list, or dict) – Variable codes to retrieve. Can be:
    - Single variable code as string
    - List of variable codes
    - Dictionary mapping custom names to variable codes
years (list of int) – Years to retrieve data for.
dataset (str, default "acs5") – Dataset type. Options:
    - “acs5”: ACS 5-year estimates
    - “acs1”: ACS 1-year estimates
    - “decennial”: Decennial Census
base_year (int, optional) – Year to use for base geography boundaries. If None, uses the most recent year. All other years will be interpolated to these boundaries.
extensive_variables (list of str, optional) – Variables representing counts/totals that should be redistributed proportionally by area during interpolation (e.g., population, housing units). REQUIRED when area interpolation is needed (changing tract/block group boundaries).
intensive_variables (list of str, optional) – Variables representing rates/densities that should be area-weighted during interpolation (e.g., median income, poverty rate, percentages). REQUIRED when area interpolation is needed (changing tract/block group boundaries).
geometry (bool, default True) – Whether to include geographic boundaries. Required for area interpolation.
output (str, default "wide") – Output format:
    - “wide”: Variables as columns, years as separate DataFrames or multi-index
    - “tidy”: Long format with separate rows for each variable-year combination
crs (str or dict, default "EPSG:3857") – Coordinate reference system to use for area calculations during interpolation.
**kwargs – Additional arguments passed to get_acs() or get_decennial().

Returns:

Time series data with consistent geographic boundaries.
- If output="wide": Multi-index DataFrame with years and variables as columns
- If output="tidy": Long format with ‘year’, ‘variable’, ‘estimate’ columns

Return type:

pd.DataFrame

Examples

>>> # ACS 5-year time series with area interpolation
>>> data = get_time_series(
...     geography="tract",
...     variables={"total_pop": "B01003_001E", "median_income": "B19013_001E"},
...     years=[2015, 2020],
...     dataset="acs5",
...     state="CA",
...     county="037",
...     base_year=2020,
...     extensive_variables=["total_pop"],
...     intensive_variables=["median_income"]
... )

>>> # Decennial census time series
>>> data = get_time_series(
...     geography="tract",
...     variables={"total_pop": {"2010": "P001001", "2020": "P1_001N"}},
...     years=[2010, 2020],
...     dataset="decennial",
...     state="DC",
...     base_year=2020
... )

Notes

- Area interpolation requires the tobler package: pip install tobler
- For geographies that don’t change (state, county), interpolation is skipped.
- Decennial census variables may differ between years; use a dict to specify the code for each year.
- When base_year is None, the most recent year is used as the base.
- IMPORTANT: When area interpolation is needed, ALL variables must be classified as either extensive or intensive. This ensures proper redistribution of values across changing boundaries.
    - Extensive: counts/totals (population, housing units), redistributed by area
    - Intensive: rates/medians/percentages (median income, poverty rate), area-weighted
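
The difference between the two variable classes is easiest to see with a small hand calculation. The sketch below applies simple areal weighting to made-up zones in plain Python. It is not the tobler implementation, the function name `areal_interpolate` is hypothetical, and it assumes every target zone is fully covered by source zones.

```python
def areal_interpolate(overlaps, source_area, source_value, kind):
    """Move values from source zones to target zones using overlap areas.

    overlaps: list of (source_id, target_id, overlap_area) tuples.
    kind: "extensive" splits each source value by the share of the source
          area in the overlap; "intensive" averages source values weighted
          by the share of the target area in the overlap.
    """
    target_area = {}
    for _, target, area in overlaps:
        target_area[target] = target_area.get(target, 0.0) + area

    result = {}
    for source, target, area in overlaps:
        if kind == "extensive":
            weight = area / source_area[source]
        elif kind == "intensive":
            weight = area / target_area[target]
        else:
            raise ValueError("kind must be 'extensive' or 'intensive'")
        result[target] = result.get(target, 0.0) + source_value[source] * weight
    return result


# Made-up geometry: source A (area 4) is split evenly between targets X and Y,
# and source B (area 2) lies entirely inside Y.
overlaps = [("A", "X", 2.0), ("A", "Y", 2.0), ("B", "Y", 2.0)]
source_area = {"A": 4.0, "B": 2.0}

population = areal_interpolate(overlaps, source_area, {"A": 400, "B": 100}, "extensive")
income = areal_interpolate(overlaps, source_area, {"A": 60000, "B": 30000}, "intensive")
print(population)
print(income)
```

The extensive result preserves the overall total (500 people before and after), while the intensive result is an average, so it never exceeds the largest source value.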

pytidycensus.compare_time_periods(data, base_period, comparison_period, variables=None, calculate_change=True, calculate_percent_change=True)[source]

Compare data between two time periods.

Parameters:

data (pd.DataFrame) – Time series data from get_time_series() with wide format.
base_period (int or str) – Base time period for comparison.
comparison_period (int or str) – Comparison time period.
variables (list of str, optional) – Variables to compare. If None, uses all available variables.
calculate_change (bool, default True) – Whether to calculate absolute change.
calculate_percent_change (bool, default True) – Whether to calculate percent change.

Returns:

DataFrame with comparison results.

Return type:

pd.DataFrame
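
The two calculations controlled by calculate_change and calculate_percent_change are ordinary differences and relative differences. The sketch below shows them on a nested dictionary rather than a DataFrame. The data layout, the helper name `compare_periods`, and the sample values are illustrative assumptions, not the structure that get_time_series() returns.

```python
def compare_periods(values, base_period, comparison_period):
    """values maps variable name -> {period: number}; returns change metrics per variable."""
    results = {}
    for variable, by_period in values.items():
        base = by_period[base_period]
        new = by_period[comparison_period]
        change = new - base
        # Percent change is undefined when the base value is zero.
        percent_change = None if base == 0 else change * 100 / base
        results[variable] = {"change": change, "percent_change": percent_change}
    return results


# Made-up values, for illustration only.
series = {
    "total_pop": {2015: 1000, 2020: 1100},
    "median_income": {2015: 50000, 2020: 45000},
}
summary = compare_periods(series, 2015, 2020)
print(summary["total_pop"])
print(summary["median_income"])
```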

pytidycensus.load_variables(year, dataset=None, survey=None, cache=True, cache_dir=None)[source]

Load Census variables for a given dataset and year.

Parameters:

year (int) – Census year
dataset (str, optional) – Dataset name (‘acs’, ‘dec’, ‘pep’, etc.). Provide either dataset or survey.
survey (str, optional) – Survey type (e.g., ‘acs5’, ‘acs1’, ‘sf1’, ‘pl’). If provided, the dataset will be inferred from the survey. Provide either dataset or survey, not both.
cache (bool, default True) – Whether to cache variables for faster future access
cache_dir (str, optional) – Directory for caching. Defaults to user cache directory.

Returns:

Variables with columns: name, label, concept, predicateType, group, limit

Return type:

pd.DataFrame

Examples

>>> # Load ACS 5-year variables for 2022
>>> acs_vars = load_variables(2022, "acs", "acs5")
>>>
>>> # Search for income-related variables
>>> income_vars = acs_vars[acs_vars['label'].str.contains('income', case=False)]
>>>
>>> # Load decennial census variables for 2020
>>> dec_vars = load_variables(2020, "dec", "pl")

pytidycensus.search_variables(pattern, year, dataset, survey=None, field='concept')[source]

Search for variables by pattern in labels, concepts, or names.

Parameters:

pattern (str) – Search pattern (case-insensitive)
year (int) – Census year
dataset (str, optional) – Dataset name (‘acs’, ‘dec’, ‘pep’, etc.). Provide either dataset or survey.
survey (str, optional) – Survey type (e.g., ‘acs5’, ‘acs1’, ‘sf1’, ‘pl’). If provided, the dataset will be inferred from the survey. Provide either dataset or survey, not both.
field (str, default "concept") – Field to search in (‘label’, ‘concept’, ‘name’, or ‘all’)

Returns:

Matching variables

Return type:

pd.DataFrame

Examples

>>> # Search for income variables in ACS
>>> income_vars = search_variables("income", 2022, "acs", "acs5")
>>>
>>> # Search for population in concepts
>>> pop_vars = search_variables("population", 2020, "dec", "pl", field="concept")
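
A case-insensitive search over one field, or over all three, can be sketched without any network access. The version below works on a list of dictionaries and uses plain substring matching. Whether search_variables() matches substrings or regular expressions is not stated here, so treat the matching rule, the helper name `search_records`, and the abbreviated sample records as assumptions.

```python
SEARCHABLE_FIELDS = ("label", "concept", "name")


def search_records(records, pattern, field="concept"):
    """Return records whose chosen field (or any field, for 'all') contains pattern."""
    if field != "all" and field not in SEARCHABLE_FIELDS:
        raise ValueError(f"field must be one of {SEARCHABLE_FIELDS} or 'all'")
    needle = pattern.lower()
    fields = SEARCHABLE_FIELDS if field == "all" else (field,)
    return [
        record
        for record in records
        if any(needle in str(record.get(name, "")).lower() for name in fields)
    ]


# Abbreviated, illustrative variable records.
records = [
    {"name": "B19013_001E", "label": "Estimate!!Median household income", "concept": "Median Household Income"},
    {"name": "B01003_001E", "label": "Estimate!!Total", "concept": "Total Population"},
]

print([r["name"] for r in search_records(records, "income")])
print([r["name"] for r in search_records(records, "b01003", field="all")])
```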

pytidycensus.get_table_variables(table, year, dataset, survey=None)[source]

Get all variables for a specific table.

Parameters:

table (str) – Table code (e.g., ‘B19013’, ‘P1’)
year (int) – Census year
dataset (str) – Dataset name
survey (str, optional) – Survey type

Returns:

Variables for the specified table

Return type:

pd.DataFrame

Examples

>>> # Get all variables for median household income table
>>> b19013_vars = get_table_variables("B19013", 2022, "acs", "acs5")
>>>
>>> # Get all variables for race table in 2020 Census
>>> p1_vars = get_table_variables("P1", 2020, "dec", "pl")

pytidycensus.get_credentials()[source]

Submodules

pytidycensus.api module

Core Census API client for making requests to the US Census Bureau APIs.

class pytidycensus.api.CensusAPI(api_key=None, cache_dir=None)[source]

Bases: object

Core client for interacting with US Census Bureau APIs.

Handles authentication, rate limiting, caching, and error handling for Census API requests.

BASE_URL = 'https://api.census.gov/data'

__init__(api_key=None, cache_dir=None)[source]

Initialize Census API client.

Parameters:

api_key (str, optional) – Census API key. If not provided, will look for CENSUS_API_KEY environment variable.
cache_dir (str, optional) – Directory for caching API responses. Defaults to user cache directory.

get(year, dataset, variables, geography, survey=None, show_call=False)[source]

Make a request to the Census API.

Parameters:

year (int) – Census year
dataset (str) – Dataset name (e.g., ‘acs’, ‘dec’)
variables (List[str]) – List of variable codes to retrieve
geography (Dict[str, str]) – Geography specification (e.g., {‘for’: ‘county:*’, ‘in’: ‘state:06’})
survey (str, optional) – Survey type (e.g., ‘acs5’, ‘acs1’)
show_call (bool, default False) – Whether to print the API call URL

Returns:

Parsed JSON response from API

Return type:

List[Dict[str, Any]]

Raises:

requests.RequestException – If API request fails
ValueError – If API returns error response
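
The arguments of get() map closely onto the parts of a Census Data API URL: the year, dataset, and survey form the path, while the variables and the geography dictionary become query parameters. The sketch below assembles such a URL with the standard library. `build_census_url` is a hypothetical helper, and how CensusAPI builds its requests internally is an assumption.

```python
from urllib.parse import urlencode

BASE_URL = "https://api.census.gov/data"


def build_census_url(year, dataset, variables, geography, survey=None, api_key=None):
    """Assemble a Census Data API request URL from its parts (illustrative only)."""
    path = f"{BASE_URL}/{year}/{dataset}"
    if survey:
        path += f"/{survey}"
    params = {"get": ",".join(variables)}
    params.update(geography)  # e.g. {"for": "county:*", "in": "state:06"}
    if api_key:
        params["key"] = api_key
    # Keep commas, colons, and asterisks readable instead of percent-encoding them.
    return f"{path}?{urlencode(params, safe=',:*')}"


url = build_census_url(
    2022,
    "acs",
    ["NAME", "B19013_001E"],
    {"for": "county:*", "in": "state:06"},
    survey="acs5",
)
print(url)
```

Building the query from a dictionary keeps the geography specification in the same shape that get() accepts.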

get_geography_codes(year, dataset, survey=None)[source]

Get available geography codes for a dataset.

Parameters:

year (int) – Census year
dataset (str) – Dataset name
survey (str, optional) – Survey type

Returns:

Available geography codes

Return type:

Dict[str, Any]

get_variables(year, dataset, survey=None)[source]

Get available variables for a dataset.

Parameters:

year (int) – Census year
dataset (str) – Dataset name
survey (str, optional) – Survey type

Returns:

Available variables with metadata

Return type:

Dict[str, Any]

pytidycensus.api.set_census_api_key(api_key)[source]

Set Census API key as environment variable.

Parameters: api_key (str) – Census API key obtained from https://api.census.gov/data/key_signup.html
Raises: ValueError – If the API key is not a string of exactly 40 characters
Return type: None
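
The documented behaviour (validate the key, then store it in the environment) can be sketched as follows. `store_census_key` is a hypothetical stand-in for illustration, and the error message text is an assumption. The CENSUS_API_KEY variable name is the one the other functions on this page look up.

```python
import os


def store_census_key(api_key):
    """Validate a Census API key and expose it through the CENSUS_API_KEY variable."""
    if not isinstance(api_key, str) or len(api_key) != 40:
        raise ValueError("Census API key must be a string of exactly 40 characters")
    os.environ["CENSUS_API_KEY"] = api_key


# A placeholder key of the right length; real keys come from the signup page.
placeholder_key = "0" * 40
store_census_key(placeholder_key)
print(os.environ["CENSUS_API_KEY"] == placeholder_key)
```

A key set this way lasts only for the current process, so it has to be set again in each new session unless it is exported by the shell.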

pytidycensus.acs module

American Community Survey (ACS) data retrieval functions.

pytidycensus.acs.get_acs(geography, variables=None, table=None, cache_table=False, year=2022, survey='acs5', state=None, county=None, zcta=None, output='wide', geometry=False, keep_geo_vars=False, shift_geo=False, summary_var=None, moe_level=90, api_key=None, show_call=False, **kwargs)[source]

Obtain data from the American Community Survey (ACS).

Parameters:

geography (str) – The geography of your data (e.g., ‘county’, ‘tract’, ‘block group’).
variables (str, list of str, or dict, optional) – Variable ID(s) to retrieve. Can be a single variable, list of variables, or dictionary mapping custom names to variable IDs. If not provided, must specify table.
table (str, optional) – ACS table ID to retrieve all variables from.
cache_table (bool, default False) – Whether to cache table names for faster future access.
year (int, default 2022) – Year of ACS data (2009-2022 for 5-year, 2005-2022 for 1-year).
survey (str, default "acs5") – ACS survey type (“acs1”, “acs3”, or “acs5”).
state (str, int, or list, optional) – State(s) to retrieve data for. Accepts names, abbreviations, or FIPS codes.
county (str, int, or list, optional) – County(ies) to retrieve data for. Must be used with state.
zcta (str or list of str, optional) – ZIP Code Tabulation Area(s) to retrieve data for. Geography must be “zcta”.
output (str, default "wide") – Output format (“tidy” or “wide”).
geometry (bool, default False) – Whether to include geometry for mapping.
keep_geo_vars (bool, default False) – Whether to keep all geographic variables from shapefiles.
shift_geo (bool, default False) – (Deprecated) If True, a warning advises the user to shift geometry by other means.
summary_var (str, optional) – Summary variable from the ACS to include for comparison (e.g. total population).
moe_level (int, default 90) – Confidence level for margin of error (90, 95, or 99).
api_key (str, optional) – Census API key. If not provided, looks for CENSUS_API_KEY environment variable.
show_call (bool, default False) – Whether to print the API call URL.
**kwargs – Additional parameters passed to geography functions.

Returns:

ACS data, optionally with geometry.

Return type:

Examples

>>> import pytidycensus as tc
>>> tc.set_census_api_key("your_key_here")
>>>
>>> # Get median household income by county in Texas
>>> tx_income = tc.get_acs(
...     geography="county",
...     variables="B19013_001",
...     state="TX",
...     year=2022
... )
>>>
>>> # Get data with geometry for mapping
>>> tx_income_geo = tc.get_acs(
...     geography="county",
...     variables="B19013_001",
...     state="TX",
...     geometry=True
... )
>>>
>>> # Get data with named variables
>>> tx_demo = tc.get_acs(
...     geography="county",
...     variables={"total_pop": "B01003_001", "median_income": "B19013_001"},
...     state="TX",
...     year=2022
... )
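
The moe_level parameter selects the confidence level of the reported margin of error. The Census Bureau publishes ACS margins of error at the 90 percent level, and a common way to express them at another level is to rescale by the ratio of standard normal critical values (1.645, 1.96, and 2.576). The helper below sketches that conversion. `rescale_moe` is a hypothetical name, and whether get_acs() performs the conversion exactly this way is an assumption.

```python
# Standard normal critical values for the supported confidence levels.
Z_SCORES = {90: 1.645, 95: 1.96, 99: 2.576}


def rescale_moe(moe_90, level):
    """Convert a 90 percent margin of error to the requested confidence level."""
    if level not in Z_SCORES:
        raise ValueError("level must be 90, 95, or 99")
    return moe_90 * Z_SCORES[level] / Z_SCORES[90]


# Made-up margin of error, for illustration only.
for level in (90, 95, 99):
    print(level, round(rescale_moe(1645, level), 1))
```

Higher confidence levels widen the margin, so a difference that looks significant at 90 percent may not be at 99 percent.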

pytidycensus.acs.get_acs_variables(year=2022, survey='acs5')[source]

Get available ACS variables for a given year and survey.

Parameters:

year (int, default 2022) – ACS year
survey (str, default "acs5") – Survey type (“acs1” or “acs5”)

Returns:

Available variables with metadata

Return type:

pd.DataFrame

pytidycensus.decennial module

Decennial Census data retrieval functions.

pytidycensus.decennial.get_decennial(geography, variables=None, table=None, cache_table=False, year=2020, sumfile=None, state=None, county=None, output='wide', geometry=False, keep_geo_vars=False, shift_geo=False, summary_var=None, pop_group=None, pop_group_label=False, api_key=None, show_call=False, **kwargs)[source]

Obtain data from the US Decennial Census.

Parameters:

geography (str) – The geography of your data (e.g., ‘county’, ‘tract’, ‘block group’).
variables (str, list of str, or dict, optional) – Variable ID(s) to retrieve. Can be a single variable, list of variables, or dictionary mapping custom names to variable IDs. If not provided, must specify table.
table (str, optional) – Census table ID to retrieve all variables from.
cache_table (bool, default False) – Whether to cache table names for faster future access.
year (int, default 2020) – Census year (2000, 2010, or 2020). Note: 1990 data is not available via the API.
sumfile (str, optional) – Summary file to use. Defaults to ‘pl’ for 2020, ‘sf1’ for earlier years. Available options vary by year.
state (str, int, or list, optional) – State(s) to retrieve data for. Accepts names, abbreviations, or FIPS codes.
county (str, int, or list, optional) – County(ies) to retrieve data for. Must be used with state.
output (str, default "wide") – Output format (“tidy” or “wide”).
geometry (bool, default False) – Whether to include geometry for mapping.
keep_geo_vars (bool, default False) – Whether to keep all geographic variables from shapefiles.
shift_geo (bool, default False) – (Deprecated) If True, a warning advises the user to shift geometry by other means.
summary_var (str, optional) – Summary variable from the decennial Census to include for comparison.
pop_group (str, optional) – Population group code for which you’d like to request data (for selected sumfiles).
pop_group_label (bool, default False) – If True, return a pop_group_label column with the population group description.
api_key (str, optional) – Census API key. If not provided, looks for CENSUS_API_KEY environment variable.
show_call (bool, default False) – Whether to print the API call URL.
**kwargs – Additional parameters passed to geography functions.

Returns:

Decennial Census data, optionally with geometry.

Return type:

Examples

>>> import pytidycensus as tc
>>> tc.set_census_api_key("your_key_here")
>>>
>>> # Get total population by state for 2020
>>> pop_2020 = tc.get_decennial(
...     geography="state",
...     variables="P1_001N",
...     year=2020
... )
>>>
>>> # Get race/ethnicity data with geometry
>>> race_data = tc.get_decennial(
...     geography="county",
...     variables=["P1_003N", "P1_004N", "P1_005N"],
...     state="CA",
...     year=2020,
...     geometry=True
... )
>>>
>>> # Get data with named variables and summary variable
>>> pop_data = tc.get_decennial(
...     geography="county",
...     variables={"total": "P1_001N", "white": "P1_003N"},
...     state="TX",
...     year=2020,
...     summary_var="P1_001N"
... )

pytidycensus.decennial.get_decennial_variables(year=2020, sumfile=None)[source]

Get available decennial Census variables for a given year.

Parameters:

year (int, default 2020) – Census year
sumfile (str, optional) – Summary file. Defaults to ‘pl’ for 2020, ‘sf1’ for earlier years.

Returns:

Available variables with metadata

Return type:

pd.DataFrame

pytidycensus.estimates module

Population estimates data retrieval functions.

exception pytidycensus.estimates.PopulationEstimatesError[source]

Bases: Exception

Base exception class for Population Estimates errors.

exception pytidycensus.estimates.InvalidGeographyError[source]

Raised when an invalid geography is specified.

exception pytidycensus.estimates.InvalidVariableError[source]

Raised when an invalid variable is specified.

exception pytidycensus.estimates.DataNotAvailableError[source]

Raised when requested data is not available.

exception pytidycensus.estimates.APIError[source]

Raised when there are issues with API requests.

pytidycensus.estimates.get_estimates(geography, product=None, variables=None, breakdown=None, breakdown_labels=False, vintage=2024, year=None, state=None, county=None, time_series=False, output='tidy', geometry=False, keep_geo_vars=False, api_key=None, show_call=False, **kwargs)[source]

Obtain data from the US Census Bureau Population Estimates Program.

The Population Estimates Program (PEP) produces estimates of the population for the United States, its states, counties, cities, and towns. For years 2020 and later, data is retrieved from flat CSV files. For years 2019 and earlier, data comes from the Census API.

Parameters:

geography (str) – The geography of your data. Options include:
    - ‘us’ (United States)
    - ‘region’ (Census regions)
    - ‘division’ (Census divisions)
    - ‘state’ (States and DC)
    - ‘county’ (Counties)
    - ‘cbsa’ (Core Based Statistical Areas)
    - ‘metropolitan statistical area/micropolitan statistical area’ (alias for cbsa)
    - ‘combined statistical area’ (Combined Statistical Areas)
    - ‘place’ (Incorporated places and Census designated places)
product (str, optional) – The data product. For years 2020+, only ‘characteristics’ requires this parameter. Options include:
    - ‘population’ (population totals)
    - ‘components’ (components of population change)
    - ‘characteristics’ (population by demographics)
variables (str or list of str, optional) – Variable ID(s) to retrieve. Use ‘all’ to get all available variables. Common variables include: ‘POP’, ‘BIRTHS’, ‘DEATHS’, ‘DOMESTICMIG’, ‘INTERNATIONALMIG’.
breakdown (list of str, optional) – Population breakdown for characteristics product. Options can be combined, e.g., [‘SEX’, ‘RACE’], and include:
    - ‘AGEGROUP’ (age groups)
    - ‘SEX’ (sex)
    - ‘RACE’ (race)
    - ‘HISP’ (Hispanic origin)
breakdown_labels (bool, default False) – Whether to include human-readable labels for breakdown categories.
vintage (int, default 2024) – The PEP vintage (dataset version year). Recommended to use the most recent.
year (int, optional) – The specific data year. Defaults to vintage if not specified.
state (str, int, or list, optional) – State(s) to retrieve data for. Accepts names, abbreviations, or FIPS codes.
county (str, int, or list, optional) – County(ies) to retrieve data for. Must be used with state.
time_series (bool, default False) – Whether to retrieve time series data back to 2010.
output (str, default "tidy") – Output format (“tidy” or “wide”).
geometry (bool, default False) – Whether to include geometry for mapping.
keep_geo_vars (bool, default False) – Whether to keep all geographic variables from shapefiles.
api_key (str, optional) – Census API key for years 2019 and earlier.
show_call (bool, default False) – Whether to print the API call URL (for API-based requests).
**kwargs – Additional parameters passed to geography functions.
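
To make the difference between the “tidy” and “wide” output formats concrete, here is a minimal standard-library sketch that reshapes wide rows into tidy rows. The rows are made-up numbers, `wide_to_tidy` is a hypothetical helper, and the `variable` / `value` column names are an assumption about the tidy layout rather than something stated above.

```python
# Made-up rows in "wide" layout: one row per geography, one column per variable.
wide_rows = [
    {"GEOID": "48", "NAME": "Texas", "POP": 100, "BIRTHS": 10},
    {"GEOID": "06", "NAME": "California", "POP": 200, "BIRTHS": 20},
]


def wide_to_tidy(rows, id_cols=("GEOID", "NAME")):
    """Reshape wide rows into tidy rows: one row per (geography, variable).

    Hypothetical helper; the "variable" and "value" column names are assumed.
    """
    tidy = []
    for row in rows:
        ids = {col: row[col] for col in id_cols}
        for key, val in row.items():
            if key not in id_cols:
                tidy.append({**ids, "variable": key, "value": val})
    return tidy


tidy_rows = wide_to_tidy(wide_rows)
for r in tidy_rows:
    print(r)
```

The tidy layout tends to suit grouping and plotting by variable, while the wide layout is convenient when computing across variables within one geography.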

Returns:

Population estimates data, optionally with geometry.

Return type:

Examples

>>> import pytidycensus as tc
>>> tc.set_census_api_key("your_key_here")
>>>
>>> # Get total population estimates by state
>>> state_pop = tc.get_estimates(
...     geography="state",
...     variables="POP",
...     year=2022
... )
>>>
>>> # Get population by age and sex for counties in Texas
>>> tx_pop_demo = tc.get_estimates(
...     geography="county",
...     variables="POP",
...     breakdown=["SEX", "AGEGROUP"],
...     state="TX",
...     breakdown_labels=True
... )

pytidycensus.estimates.discover_available_variables(vintage=2024, geography='state')[source]

Discover all available variables in a PEP dataset.

Parameters:

vintage (int, default 2024) – The vintage year of the dataset
geography (str, default "state") – The geography to check for available variables

Returns:

DataFrame with variable names and descriptions

Return type:

pd.DataFrame

pytidycensus.estimates.get_estimates_variables(year=2022)[source]

Get available population estimates variables for a given year.

Parameters:

year (int, default 2022) – Estimates year

Returns:

Available variables with metadata

Return type:

pd.DataFrame

pytidycensus.geography module

Geographic boundary data retrieval and processing using pygris.

pytidycensus.geography.get_geography(geography, year=2022, state=None, county=None, keep_geo_vars=False, cache_dir=None, cb=True, **kwargs)[source]

Download and load geographic boundary data using pygris.

Parameters:

geography (str) – Geography type (e.g., ‘county’, ‘tract’, ‘block group’, ‘state’, ‘zcta’, ‘place’)
year (int, default 2022) – Census year for boundaries
state (str, int, or list, optional) – State(s) to filter data for. Can be state name, abbreviation, or FIPS code.
county (str, int, or list, optional) – County(ies) to filter data for (requires state). Can be county name or FIPS code.
keep_geo_vars (bool, default False) – Whether to keep all geographic variables
cache_dir (str, optional) – Directory for caching downloaded files (currently not used with pygris)
cb (bool, default True) – If True, download generalized cartographic boundary files (1:500k). If False, download detailed TIGER/Line files. Note: For 2020 state-level data, cartographic boundary downloads may fail due to Census Bureau access restrictions. If this occurs, the function automatically falls back to detailed TIGER/Line files (cb=False).
**kwargs – Additional parameters passed to underlying pygris functions
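
The `state` and `county` rules above (names, abbreviations, or FIPS codes are accepted, and `county` requires `state`) could be handled along these lines. This is a sketch under stated assumptions: `to_state_fips`, `check_county_args`, and the three-state lookup table are hypothetical and do not reflect how pytidycensus or pygris actually implement the lookup.

```python
from typing import Optional, Union

# Tiny illustrative lookup (assumption); a real table would cover every state and DC.
_STATE_FIPS = {
    "TX": "48", "TEXAS": "48",
    "CA": "06", "CALIFORNIA": "06",
    "NY": "36", "NEW YORK": "36",
}


def to_state_fips(state: Union[str, int]) -> str:
    """Normalize a name, abbreviation, or FIPS code to a 2-digit FIPS string."""
    text = str(state).strip()
    if text.isdigit():
        # Numeric input is treated as a FIPS code and zero-padded.
        return text.zfill(2)
    try:
        return _STATE_FIPS[text.upper()]
    except KeyError:
        raise ValueError(f"Unknown state: {state!r}") from None


def check_county_args(state: Optional[Union[str, int]],
                      county: Optional[Union[str, int]]) -> None:
    """Enforce the documented rule that a county filter needs a state."""
    if county is not None and state is None:
        raise ValueError("county requires state")


print(to_state_fips("tx"), to_state_fips(6), to_state_fips("New York"))
```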

Returns:

Geographic boundary data

Return type:

Notes

Automatic Fallback: If downloading cartographic boundary files (cb=True) fails with file system errors (common for 2020 state-level GENZ files), the function will automatically retry with detailed TIGER/Line files (cb=False) and issue a warning. This ensures robust data retrieval without requiring manual intervention.
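
The fallback behavior described in this note follows a common try, warn, retry pattern. The sketch below is not the library's implementation: `download_with_fallback` and `fake_download` are hypothetical names, and catching `OSError` is an assumption about how "file system errors" surface.

```python
import warnings
from typing import Any, Callable


def download_with_fallback(download: Callable[..., Any], **kwargs: Any) -> Any:
    """Try cartographic boundaries first, then retry with TIGER/Line files."""
    try:
        return download(cb=True, **kwargs)
    except OSError as exc:  # assumption: file system errors are raised as OSError
        warnings.warn(
            f"cb=True failed ({exc}); retrying with cb=False", UserWarning
        )
        return download(cb=False, **kwargs)


def fake_download(cb: bool, year: int) -> str:
    """Stand-in downloader that mimics the 2020 cartographic-file failure."""
    if cb and year == 2020:
        raise OSError("cartographic boundary file unavailable")
    return f"{'cb' if cb else 'tiger'}-{year}"


# Capture warnings so the fallback is visible without cluttering output.
with warnings.catch_warnings(record=True) as caught:
    warnings.simplefilter("always")
    result = download_with_fallback(fake_download, year=2020)

print(result, len(caught))
```

Issuing a warning rather than failing silently lets callers notice that the returned boundaries are the detailed TIGER/Line version instead of the generalized one they requested.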

Examples

>>> from pytidycensus.geography import get_geography
>>>
>>> # Get county boundaries for Texas
>>> tx_counties = get_geography("county", state="TX", year=2022)
>>>
>>> # Get tract boundaries for Harris County, TX
>>> harris_tracts = get_geography(
...     "tract",
...     state="TX",
...     county="201",
...     year=2022
... )
>>>
>>> # Get 2020 state boundaries (will auto-fallback if needed)
>>> states_2020 = get_geography("state", year=2020)

pytidycensus.geography.get_state_boundaries(year=2022, cb=True, **kwargs)[source]

Get US state boundaries.

Parameters:

year (int, default 2022) – Census year for boundaries
cb (bool, default True) – If True, download generalized cartographic boundary files
**kwargs – Additional parameters

Returns:

State boundaries

Return type:

pytidycensus.geography.get_county_boundaries(state=None, year=2022, cb=True, **kwargs)[source]

Get US county boundaries, optionally filtered by state.

Parameters:

state (str, int, or list, optional) – State(s) to filter by
year (int, default 2022) – Census year for boundaries
cb (bool, default True) – If True, download generalized cartographic boundary files
**kwargs – Additional parameters

Returns:

County boundaries

Return type:

pytidycensus.geography.get_tract_boundaries(state, county=None, year=2022, cb=True, **kwargs)[source]

Get census tract boundaries for a state, optionally filtered by county.

Parameters:

state (str or int) – State to get tracts for
county (str, int, or list, optional) – County(ies) to filter by
year (int, default 2022) – Census year for boundaries
cb (bool, default True) – If True, download generalized cartographic boundary files
**kwargs – Additional parameters
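
One way to picture the optional county filter is as a GEOID prefix match: a tract GEOID is the 2-digit state FIPS code, then the 3-digit county FIPS code, then a 6-digit tract code. The sketch below assumes that structure and uses made-up tract codes; `geoid_prefixes` and `filter_by_prefix` are hypothetical helpers, and whether the library filters this way is an assumption.

```python
def geoid_prefixes(state_fips, counties=None):
    """Build GEOID prefixes: 2-digit state FIPS plus optional 3-digit county FIPS."""
    state = str(state_fips).zfill(2)
    if counties is None:
        return [state]
    if isinstance(counties, (str, int)):
        counties = [counties]  # allow a single county as well as a list
    return [state + str(c).zfill(3) for c in counties]


def filter_by_prefix(geoids, prefixes):
    """Keep only the GEOIDs that start with one of the given prefixes."""
    return [g for g in geoids if g.startswith(tuple(prefixes))]


# Made-up 11-digit tract GEOIDs (state + county + tract), for illustration only.
tract_geoids = ["48201100000", "48201210400", "48113000100", "06037101110"]

# County 201 in state 48 (Harris County, TX, as in the example above).
harris = filter_by_prefix(tract_geoids, geoid_prefixes("48", "201"))
print(harris)
```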

Returns:

Tract boundaries

Return type:

pytidycensus.geography.get_block_group_boundaries(state, county=None, year=2022, cb=True, **kwargs)[source]

Get block group boundaries for a state, optionally filtered by county.

Parameters:

state (str or int) – State to get block groups for
county (str, int, or list, optional) – County(ies) to filter by
year (int, default 2022) – Census year for boundaries
cb (bool, default True) – If True, download generalized cartographic boundary files
**kwargs – Additional parameters

Returns:

Block group boundaries

Return type: