pytidycensus package

pytidycensus: Python interface to US Census Bureau APIs

A Python library that provides an integrated interface to several United States Census Bureau APIs and geographic boundary files. It returns Census and ACS data as pandas DataFrames and can optionally return GeoPandas GeoDataFrames with feature geometry for mapping and spatial analysis.

class pytidycensus.CensusAPI(api_key=None, cache_dir=None)[source]

Bases: object

Core client for interacting with US Census Bureau APIs.

Handles authentication, rate limiting, caching, and error handling for Census API requests.

BASE_URL = 'https://api.census.gov/data'

__init__(api_key=None, cache_dir=None)[source]

Initialize Census API client.

Parameters:
  • api_key (str, optional) – Census API key. If not provided, will look for CENSUS_API_KEY environment variable.

  • cache_dir (str, optional) – Directory for caching API responses. Defaults to user cache directory.

get(year, dataset, variables, geography, survey=None, show_call=False)[source]

Make a request to the Census API.

Parameters:
  • year (int) – Census year

  • dataset (str) – Dataset name (e.g., ‘acs’, ‘dec’)

  • variables (List[str]) – List of variable codes to retrieve

  • geography (Dict[str, str]) – Geography specification (e.g., {‘for’: ‘county:*’, ‘in’: ‘state:06’})

  • survey (str, optional) – Survey type (e.g., ‘acs5’, ‘acs1’)

  • show_call (bool, default False) – Whether to print the API call URL

Returns:

Parsed JSON response from API

Return type:

List[Dict[str, Any]]

Raises:
  • requests.RequestException – If API request fails

  • ValueError – If API returns error response
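The arguments to get() map closely onto the parts of a Census API request URL. The following is a minimal sketch of that mapping, not the library's implementation: build_census_url is a hypothetical helper, and the assumption that variable codes are passed with their suffix already attached (for example B19013_001E) is ours.

```python
from urllib.parse import urlencode

BASE_URL = "https://api.census.gov/data"


def build_census_url(year, dataset, variables, geography, survey=None, api_key=None):
    """Assemble a Census API request URL (hypothetical helper, not part of pytidycensus)."""
    # The endpoint path is year/dataset, plus the survey when one is given.
    parts = [BASE_URL, str(year), dataset]
    if survey:
        parts.append(survey)
    endpoint = "/".join(parts)

    # 'get' carries the comma-separated variable codes; the geography dict
    # supplies the 'for' and optional 'in' predicates.
    params = {"get": ",".join(variables)}
    params.update(geography)
    if api_key:
        params["key"] = api_key

    # Leave ':', '*' and ',' unescaped so the URL stays readable.
    return f"{endpoint}?{urlencode(params, safe=':*,')}"


url = build_census_url(
    2022,
    "acs",
    ["NAME", "B19013_001E"],
    {"for": "county:*", "in": "state:06"},
    survey="acs5",
)
print(url)
```

Printing the URL this way is roughly what show_call=True is described as doing, which helps when a request needs to be checked in a browser.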

get_geography_codes(year, dataset, survey=None)[source]

Get available geography codes for a dataset.

Parameters:
  • year (int) – Census year

  • dataset (str) – Dataset name

  • survey (str, optional) – Survey type

Returns:

Available geography codes

Return type:

Dict[str, Any]

get_variables(year, dataset, survey=None)[source]

Get available variables for a dataset.

Parameters:
  • year (int) – Census year

  • dataset (str) – Dataset name

  • survey (str, optional) – Survey type

Returns:

Available variables with metadata

Return type:

Dict[str, Any]

pytidycensus.set_census_api_key(api_key)[source]

Set Census API key as environment variable.

Parameters:

api_key (str) – Census API key obtained from https://api.census.gov/data/key_signup.html

Raises:

ValueError – If the API key is not a string of exactly 40 characters

Return type:

None
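The documented behavior (validate the key, then export it) can be sketched as follows. set_key is a hypothetical stand-in written for illustration; only the 40-character rule and the CENSUS_API_KEY variable name come from this reference.

```python
import os


def set_key(api_key):
    """Validate a key and export it (hypothetical stand-in for set_census_api_key)."""
    # Mirror the documented rule: a string of exactly 40 characters.
    if not isinstance(api_key, str) or len(api_key) != 40:
        raise ValueError("Census API key must be a string of exactly 40 characters")
    os.environ["CENSUS_API_KEY"] = api_key


set_key("a" * 40)  # placeholder value, not a real key
print(len(os.environ["CENSUS_API_KEY"]))
```

Because the key lives in the process environment, it applies only to the current session unless it is also set in the shell profile.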

pytidycensus.get_acs(geography, variables=None, table=None, cache_table=False, year=2022, survey='acs5', state=None, county=None, zcta=None, output='tidy', geometry=False, keep_geo_vars=False, shift_geo=False, summary_var=None, moe_level=90, api_key=None, show_call=False, **kwargs)[source]

Obtain data from the American Community Survey (ACS).

Parameters:
  • geography (str) – The geography of your data (e.g., ‘county’, ‘tract’, ‘block group’).

  • variables (str, list of str, or dict, optional) – Variable ID(s) to retrieve. Can be a single variable, list of variables, or dictionary mapping custom names to variable IDs. If not provided, must specify table.

  • table (str, optional) – ACS table ID to retrieve all variables from.

  • cache_table (bool, default False) – Whether to cache table names for faster future access.

  • year (int, default 2022) – Year of ACS data (2009-2022 for 5-year, 2005-2022 for 1-year).

  • survey (str, default "acs5") – ACS survey type (“acs1”, “acs3”, or “acs5”).

  • state (str, int, or list, optional) – State(s) to retrieve data for. Accepts names, abbreviations, or FIPS codes.

  • county (str, int, or list, optional) – County(ies) to retrieve data for. Must be used with state.

  • zcta (str or list of str, optional) – ZIP Code Tabulation Area(s) to retrieve data for. Geography must be “zcta”.

  • output (str, default "tidy") – Output format (“tidy” or “wide”).

  • geometry (bool, default False) – Whether to include geometry for mapping.

  • keep_geo_vars (bool, default False) – Whether to keep all geographic variables from shapefiles.

  • shift_geo (bool, default False) – (Deprecated) If True, a warning is issued directing the user to an alternative method for shifting geometry.

  • summary_var (str, optional) – Summary variable from the ACS to include for comparison (e.g. total population).

  • moe_level (int, default 90) – Confidence level for margin of error (90, 95, or 99).

  • api_key (str, optional) – Census API key. If not provided, looks for CENSUS_API_KEY environment variable.

  • show_call (bool, default False) – Whether to print the API call URL.

  • **kwargs – Additional parameters passed to geography functions.

Returns:

ACS data, optionally with geometry.

Return type:

pandas.DataFrame or geopandas.GeoDataFrame

Examples

>>> import pytidycensus as tc
>>> tc.set_census_api_key("your_key_here")  # replace with your 40-character key
>>>
>>> # Get median household income by county in Texas
>>> tx_income = tc.get_acs(
...     geography="county",
...     variables="B19013_001",
...     state="TX",
...     year=2022
... )
>>>
>>> # Get data with geometry for mapping
>>> tx_income_geo = tc.get_acs(
...     geography="county",
...     variables="B19013_001",
...     state="TX",
...     geometry=True
... )
>>>
>>> # Get data with named variables
>>> tx_demo = tc.get_acs(
...     geography="county",
...     variables={"total_pop": "B01003_001", "median_income": "B19013_001"},
...     state="TX",
...     year=2022
... )
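The moe_level parameter can be understood as a rescaling of the published margin of error. The ACS publishes margins of error at the 90 percent confidence level; the sketch below rescales one to 95 or 99 percent using the standard z-score ratio. rescale_moe and Z_SCORES are illustrative names, and whether pytidycensus applies the conversion in exactly this way is an assumption.

```python
# Z-scores for the three documented confidence levels.
Z_SCORES = {90: 1.645, 95: 1.960, 99: 2.576}


def rescale_moe(moe_90, moe_level=90):
    """Rescale a 90% margin of error to another confidence level (illustrative helper)."""
    if moe_level not in Z_SCORES:
        raise ValueError("moe_level must be 90, 95, or 99")
    # Divide out the 90% z-score and multiply by the target one.
    return moe_90 * Z_SCORES[moe_level] / Z_SCORES[90]


# A made-up margin of error of 1000 at 90% confidence, widened to 95%.
print(round(rescale_moe(1000, 95), 1))
```

A higher confidence level always yields a wider margin, so estimates that differ significantly at 90 percent may not at 99 percent.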

pytidycensus.get_decennial(geography, variables=None, table=None, cache_table=False, year=2020, sumfile=None, state=None, county=None, output='tidy', geometry=False, keep_geo_vars=False, shift_geo=False, summary_var=None, pop_group=None, pop_group_label=False, api_key=None, show_call=False, **kwargs)[source]

Obtain data from the US Decennial Census.

Parameters:
  • geography (str) – The geography of your data (e.g., ‘county’, ‘tract’, ‘block group’).

  • variables (str, list of str, or dict, optional) – Variable ID(s) to retrieve. Can be a single variable, list of variables, or dictionary mapping custom names to variable IDs. If not provided, must specify table.

  • table (str, optional) – Census table ID to retrieve all variables from.

  • cache_table (bool, default False) – Whether to cache table names for faster future access.

  • year (int, default 2020) – Census year (2000, 2010, or 2020). Note: 1990 data is not available via the API.

  • sumfile (str, optional) – Summary file to use. Defaults to ‘pl’ for 2020, ‘sf1’ for earlier years. Available options vary by year.

  • state (str, int, or list, optional) – State(s) to retrieve data for. Accepts names, abbreviations, or FIPS codes.

  • county (str, int, or list, optional) – County(ies) to retrieve data for. Must be used with state.

  • output (str, default "tidy") – Output format (“tidy” or “wide”).

  • geometry (bool, default False) – Whether to include geometry for mapping.

  • keep_geo_vars (bool, default False) – Whether to keep all geographic variables from shapefiles.

  • shift_geo (bool, default False) – (Deprecated) If True, a warning is issued directing the user to an alternative method for shifting geometry.

  • summary_var (str, optional) – Summary variable from the decennial Census to include for comparison.

  • pop_group (str, optional) – Population group code for which you’d like to request data (for selected sumfiles).

  • pop_group_label (bool, default False) – If True, return a pop_group_label column with the population group description.

  • api_key (str, optional) – Census API key. If not provided, looks for CENSUS_API_KEY environment variable.

  • show_call (bool, default False) – Whether to print the API call URL.

  • **kwargs – Additional parameters passed to geography functions.

Returns:

Decennial Census data, optionally with geometry.

Return type:

pandas.DataFrame or geopandas.GeoDataFrame

Examples

>>> import pytidycensus as tc
>>> tc.set_census_api_key("your_key_here")  # replace with your 40-character key
>>>
>>> # Get total population by state for 2020
>>> pop_2020 = tc.get_decennial(
...     geography="state",
...     variables="P1_001N",
...     year=2020
... )
>>>
>>> # Get race/ethnicity data with geometry
>>> race_data = tc.get_decennial(
...     geography="county",
...     variables=["P1_003N", "P1_004N", "P1_005N"],
...     state="CA",
...     year=2020,
...     geometry=True
... )
>>>
>>> # Get data with named variables and summary variable
>>> pop_data = tc.get_decennial(
...     geography="county",
...     variables={"total": "P1_001N", "white": "P1_003N"},
...     state="TX",
...     year=2020,
...     summary_var="P1_001N"
... )
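The difference between output="tidy" and output="wide" is one of layout: wide data has one column per variable, while tidy data has one row per geography and variable. The sketch below shows the reshaping on plain dictionaries. The column names "variable" and "value" and the population counts are assumptions made up for illustration, not values returned by the library.

```python
# One row per county, one column per variable: the "wide" layout.
# The counts are toy numbers, not Census data.
wide_rows = [
    {"GEOID": "48201", "total": 100, "white": 40},
    {"GEOID": "48113", "total": 80, "white": 30},
]


def wide_to_tidy(rows, id_col="GEOID"):
    """Melt wide rows into one row per (geography, variable) pair."""
    tidy = []
    for row in rows:
        for name, value in row.items():
            if name != id_col:
                tidy.append({id_col: row[id_col], "variable": name, "value": value})
    return tidy


tidy_rows = wide_to_tidy(wide_rows)
for r in tidy_rows:
    print(r)
```

The tidy layout tends to suit grouped plotting and filtering by variable, while the wide layout is convenient for computing ratios between columns.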

pytidycensus.get_estimates(geography, product=None, variables=None, breakdown=None, breakdown_labels=False, vintage=2024, year=None, state=None, county=None, time_series=False, output='tidy', geometry=False, keep_geo_vars=False, api_key=None, show_call=False, **kwargs)[source]

Obtain data from the US Census Bureau Population Estimates Program.

The Population Estimates Program (PEP) produces estimates of the population for the United States, its states, counties, cities, and towns. For years 2020 and later, data is retrieved from flat CSV files. For years 2019 and earlier, data comes from the Census API.

Parameters:
  • geography (str) – The geography of your data. Options include:
      - ‘us’ (United States)
      - ‘region’ (Census regions)
      - ‘division’ (Census divisions)
      - ‘state’ (States and DC)
      - ‘county’ (Counties)
      - ‘cbsa’ (Core Based Statistical Areas)
      - ‘metropolitan statistical area/micropolitan statistical area’ (alias for cbsa)
      - ‘combined statistical area’ (Combined Statistical Areas)
      - ‘place’ (Incorporated places and Census designated places)

  • product (str, optional) – The data product. Options include:
      - ‘population’ (population totals)
      - ‘components’ (components of population change)
      - ‘characteristics’ (population by demographics)
    For years 2020+, only ‘characteristics’ requires this parameter.

  • variables (str or list of str, optional) – Variable ID(s) to retrieve. Use ‘all’ to get all available variables. Common variables include: ‘POP’, ‘BIRTHS’, ‘DEATHS’, ‘DOMESTICMIG’, ‘INTERNATIONALMIG’

  • breakdown (list of str, optional) – Population breakdown for characteristics product. Options include:
      - ‘AGEGROUP’ (age groups)
      - ‘SEX’ (sex)
      - ‘RACE’ (race)
      - ‘HISP’ (Hispanic origin)
    Can be combined, e.g., [‘SEX’, ‘RACE’]

  • breakdown_labels (bool, default False) – Whether to include human-readable labels for breakdown categories.

  • vintage (int, default 2024) – The PEP vintage (dataset version year). Recommended to use the most recent.

  • year (int, optional) – The specific data year. Defaults to vintage if not specified.

  • state (str, int, or list, optional) – State(s) to retrieve data for. Accepts names, abbreviations, or FIPS codes.

  • county (str, int, or list, optional) – County(ies) to retrieve data for. Must be used with state.

  • time_series (bool, default False) – Whether to retrieve time series data back to 2010.

  • output (str, default "tidy") – Output format (“tidy” or “wide”).

  • geometry (bool, default False) – Whether to include geometry for mapping.

  • keep_geo_vars (bool, default False) – Whether to keep all geographic variables from shapefiles.

  • api_key (str, optional) – Census API key for years 2019 and earlier.

  • show_call (bool, default False) – Whether to print the API call URL (for API-based requests).

  • **kwargs – Additional parameters passed to geography functions.

Returns:

Population estimates data, optionally with geometry.

Return type:

pandas.DataFrame or geopandas.GeoDataFrame

Examples

>>> import pytidycensus as tc
>>> tc.set_census_api_key("your_key_here")  # replace with your 40-character key
>>>
>>> # Get total population estimates by state
>>> state_pop = tc.get_estimates(
...     geography="state",
...     variables="POP",
...     year=2022
... )
>>>
>>> # Get population by age and sex for counties in Texas
>>> tx_pop_demo = tc.get_estimates(
...     geography="county",
...     variables="POP",
...     breakdown=["SEX", "AGEGROUP"],
...     state="TX",
...     breakdown_labels=True
... )
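Two documented rules interact here: year falls back to vintage when omitted, and the data source depends on the year requested. A minimal sketch of that decision follows. resolve_estimates_request is a hypothetical name, and keying the split on the data year rather than the vintage is an assumption drawn from the wording of the description above.

```python
def resolve_estimates_request(vintage=2024, year=None):
    """Return (data_year, source) following the documented rules (illustrative only)."""
    # 'year' falls back to the vintage when it is not given.
    data_year = vintage if year is None else year
    # Documented split: flat CSV files for 2020 and later, the API for 2019 and earlier.
    source = "csv" if data_year >= 2020 else "api"
    return data_year, source


print(resolve_estimates_request())                    # default vintage, no year
print(resolve_estimates_request(vintage=2019))        # older vintage
print(resolve_estimates_request(vintage=2024, year=2022))
```

Under this reading, an API key matters only for requests that resolve to the API source, which matches the description of the api_key parameter.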

pytidycensus.get_geography(geography, year=2022, state=None, county=None, keep_geo_vars=False, cache_dir=None, **kwargs)[source]

Download and load geographic boundary data from TIGER/Line shapefiles.

Parameters:
  • geography (str) – Geography type (e.g., ‘county’, ‘tract’, ‘block group’)

  • year (int, default 2022) – Census year for boundaries

  • state (str, int, or list, optional) – State(s) to filter data for

  • county (str, int, or list, optional) – County(ies) to filter data for (requires state)

  • keep_geo_vars (bool, default False) – Whether to keep all geographic variables

  • cache_dir (str, optional) – Directory for caching downloaded files

  • **kwargs – Additional filtering parameters

Returns:

Geographic boundary data

Return type:

geopandas.GeoDataFrame

Examples

>>> from pytidycensus import get_geography
>>>
>>> # Get county boundaries for Texas
>>> tx_counties = get_geography("county", state="TX", year=2022)
>>>
>>> # Get tract boundaries for Harris County, TX
>>> harris_tracts = get_geography(
...     "tract",
...     state="TX",
...     county="201",
...     year=2022
... )
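For orientation, the TIGER/Line archive for recent years follows a predictable directory layout, with nationwide files for states and counties and per-state files for tracts and block groups. The sketch below builds download URLs from that layout. tiger_url and LAYOUT are hypothetical names, and whether pytidycensus requests exactly these files (rather than, say, cartographic boundary files) is an assumption.

```python
BASE_URL = "https://www2.census.gov/geo/tiger"

# Folder name, file suffix, and whether the file is published per state.
LAYOUT = {
    "state": ("STATE", "state", False),
    "county": ("COUNTY", "county", False),
    "tract": ("TRACT", "tract", True),
    "block group": ("BG", "bg", True),
}


def tiger_url(geography, year=2022, state_fips=None):
    """Build a TIGER/Line zip URL for a geography (illustrative helper)."""
    folder, suffix, per_state = LAYOUT[geography]
    if per_state and state_fips is None:
        raise ValueError(f"{geography!r} files are published per state; pass state_fips")
    # Nationwide files use "us" where per-state files use the 2-digit FIPS code.
    scope = state_fips if per_state else "us"
    return f"{BASE_URL}/TIGER{year}/{folder}/tl_{year}_{scope}_{suffix}.zip"


print(tiger_url("county"))
print(tiger_url("tract", state_fips="48"))
```

This layout is why tract and block group requests need a state, while county boundaries can be downloaded once and filtered afterwards.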

pytidycensus.load_variables(year, dataset, survey=None, cache=True, cache_dir=None)[source]

Load Census variables for a given dataset and year.

Parameters:
  • year (int) – Census year

  • dataset (str) – Dataset name (‘acs’, ‘dec’, ‘pep’, etc.)

  • survey (str, optional) – Survey type (e.g., ‘acs5’, ‘acs1’, ‘sf1’, ‘pl’)

  • cache (bool, default True) – Whether to cache variables for faster future access

  • cache_dir (str, optional) – Directory for caching. Defaults to user cache directory.

Returns:

Variables with columns: name, label, concept, predicateType, group, limit

Return type:

pd.DataFrame

Examples

>>> from pytidycensus import load_variables
>>>
>>> # Load ACS 5-year variables for 2022
>>> acs_vars = load_variables(2022, "acs", "acs5")
>>>
>>> # Search for income-related variables
>>> income_vars = acs_vars[acs_vars['label'].str.contains('income', case=False)]
>>>
>>> # Load decennial census variables for 2020
>>> dec_vars = load_variables(2020, "dec", "pl")
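The cache and cache_dir parameters describe a read-through cache: fetch once, store on disk, and reuse on later calls. A minimal sketch of that pattern, under stated assumptions, is given below. load_with_cache, fake_fetch, and the cache file naming are all hypothetical; the library's actual cache layout may differ.

```python
import json
import tempfile
from pathlib import Path


def load_with_cache(year, dataset, survey, fetch, cache_dir):
    """Return variable metadata, reading from disk when a cached copy exists."""
    # Hypothetical file naming, chosen only for this illustration.
    cache_file = Path(cache_dir) / f"{dataset}_{year}_{survey}_variables.json"
    if cache_file.exists():
        return json.loads(cache_file.read_text())
    rows = fetch()  # only reached on a cache miss
    cache_file.parent.mkdir(parents=True, exist_ok=True)
    cache_file.write_text(json.dumps(rows))
    return rows


calls = []


def fake_fetch():
    """Stand-in for a network request; records each time it is invoked."""
    calls.append(1)
    return [{"name": "B19013_001", "label": "Median household income"}]


with tempfile.TemporaryDirectory() as tmp:
    first = load_with_cache(2022, "acs", "acs5", fake_fetch, tmp)
    second = load_with_cache(2022, "acs", "acs5", fake_fetch, tmp)

print(len(calls), first == second)
```

The second call is served from disk, so the stand-in fetch function runs only once.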

pytidycensus.search_variables(pattern, year, dataset, survey=None, field='label')[source]

Search for variables by pattern in labels, concepts, or names.

Parameters:
  • pattern (str) – Search pattern (case-insensitive)

  • year (int) – Census year

  • dataset (str) – Dataset name

  • survey (str, optional) – Survey type

  • field (str, default "label") – Field to search in (‘label’, ‘concept’, ‘name’, or ‘all’)

Returns:

Matching variables

Return type:

pd.DataFrame

Examples

>>> from pytidycensus import search_variables
>>>
>>> # Search for income variables in ACS
>>> income_vars = search_variables("income", 2022, "acs", "acs5")
>>>
>>> # Search for population in concepts
>>> pop_vars = search_variables("population", 2020, "dec", "pl", field="concept")
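The field argument selects which metadata column the pattern is matched against, with ‘all’ checking every searchable column. The sketch below reproduces that behavior on a small list of dictionaries. The rows are shortened, hand-written metadata, search is a hypothetical helper, and treating the pattern as a regular expression is an assumption.

```python
import re

# Shortened, hand-written metadata rows for illustration only.
variables = [
    {
        "name": "B19013_001",
        "label": "Median household income in the past 12 months",
        "concept": "MEDIAN HOUSEHOLD INCOME",
    },
    {"name": "B01003_001", "label": "Total", "concept": "TOTAL POPULATION"},
]


def search(rows, pattern, field="label"):
    """Return rows whose chosen field(s) match the pattern, ignoring case."""
    fields = ["label", "concept", "name"] if field == "all" else [field]
    rx = re.compile(pattern, re.IGNORECASE)
    return [row for row in rows if any(rx.search(row[f]) for f in fields)]


print([r["name"] for r in search(variables, "INCOME")])
print([r["name"] for r in search(variables, "population", field="concept")])
```

Searching concepts is often more reliable than searching labels, since many labels are as terse as "Total".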

pytidycensus.get_table_variables(table, year, dataset, survey=None)[source]

Get all variables for a specific table.

Parameters:
  • table (str) – Table code (e.g., ‘B19013’, ‘P1’)

  • year (int) – Census year

  • dataset (str) – Dataset name

  • survey (str, optional) – Survey type

Returns:

Variables for the specified table

Return type:

pd.DataFrame

Examples

>>> from pytidycensus import get_table_variables
>>>
>>> # Get all variables for median household income table
>>> b19013_vars = get_table_variables("B19013", 2022, "acs", "acs5")
>>>
>>> # Get all variables for race table in 2020 Census
>>> p1_vars = get_table_variables("P1", 2020, "dec", "pl")

pytidycensus.get_credentials()[source]

Submodules

pytidycensus.api module

Core Census API client for making requests to the US Census Bureau APIs.

class pytidycensus.api.CensusAPI(api_key=None, cache_dir=None)[source]

Bases: object

Core client for interacting with US Census Bureau APIs.

Handles authentication, rate limiting, caching, and error handling for Census API requests.

BASE_URL = 'https://api.census.gov/data'

__init__(api_key=None, cache_dir=None)[source]

Initialize Census API client.

Parameters:
  • api_key (str, optional) – Census API key. If not provided, will look for CENSUS_API_KEY environment variable.

  • cache_dir (str, optional) – Directory for caching API responses. Defaults to user cache directory.

get(year, dataset, variables, geography, survey=None, show_call=False)[source]

Make a request to the Census API.

Parameters:
  • year (int) – Census year

  • dataset (str) – Dataset name (e.g., ‘acs’, ‘dec’)

  • variables (List[str]) – List of variable codes to retrieve

  • geography (Dict[str, str]) – Geography specification (e.g., {‘for’: ‘county:*’, ‘in’: ‘state:06’})

  • survey (str, optional) – Survey type (e.g., ‘acs5’, ‘acs1’)

  • show_call (bool, default False) – Whether to print the API call URL

Returns:

Parsed JSON response from API

Return type:

List[Dict[str, Any]]

Raises:
  • requests.RequestException – If API request fails

  • ValueError – If API returns error response

get_geography_codes(year, dataset, survey=None)[source]

Get available geography codes for a dataset.

Parameters:
  • year (int) – Census year

  • dataset (str) – Dataset name

  • survey (str, optional) – Survey type

Returns:

Available geography codes

Return type:

Dict[str, Any]

get_variables(year, dataset, survey=None)[source]

Get available variables for a dataset.

Parameters:
  • year (int) – Census year

  • dataset (str) – Dataset name

  • survey (str, optional) – Survey type

Returns:

Available variables with metadata

Return type:

Dict[str, Any]

pytidycensus.api.set_census_api_key(api_key)[source]

Set Census API key as environment variable.

Parameters:

api_key (str) – Census API key obtained from https://api.census.gov/data/key_signup.html

Raises:

ValueError – If the API key is not a string of exactly 40 characters

Return type:

None

pytidycensus.acs module

American Community Survey (ACS) data retrieval functions.

pytidycensus.acs.get_acs(geography, variables=None, table=None, cache_table=False, year=2022, survey='acs5', state=None, county=None, zcta=None, output='tidy', geometry=False, keep_geo_vars=False, shift_geo=False, summary_var=None, moe_level=90, api_key=None, show_call=False, **kwargs)[source]

Obtain data from the American Community Survey (ACS).

Parameters:
  • geography (str) – The geography of your data (e.g., ‘county’, ‘tract’, ‘block group’).

  • variables (str, list of str, or dict, optional) – Variable ID(s) to retrieve. Can be a single variable, list of variables, or dictionary mapping custom names to variable IDs. If not provided, must specify table.

  • table (str, optional) – ACS table ID to retrieve all variables from.

  • cache_table (bool, default False) – Whether to cache table names for faster future access.

  • year (int, default 2022) – Year of ACS data (2009-2022 for 5-year, 2005-2022 for 1-year).

  • survey (str, default "acs5") – ACS survey type (“acs1”, “acs3”, or “acs5”).

  • state (str, int, or list, optional) – State(s) to retrieve data for. Accepts names, abbreviations, or FIPS codes.

  • county (str, int, or list, optional) – County(ies) to retrieve data for. Must be used with state.

  • zcta (str or list of str, optional) – ZIP Code Tabulation Area(s) to retrieve data for. Geography must be “zcta”.

  • output (str, default "tidy") – Output format (“tidy” or “wide”).

  • geometry (bool, default False) – Whether to include geometry for mapping.

  • keep_geo_vars (bool, default False) – Whether to keep all geographic variables from shapefiles.

  • shift_geo (bool, default False) – (Deprecated) If True, a warning is issued directing the user to an alternative method for shifting geometry.

  • summary_var (str, optional) – Summary variable from the ACS to include for comparison (e.g. total population).

  • moe_level (int, default 90) – Confidence level for margin of error (90, 95, or 99).

  • api_key (str, optional) – Census API key. If not provided, looks for CENSUS_API_KEY environment variable.

  • show_call (bool, default False) – Whether to print the API call URL.

  • **kwargs – Additional parameters passed to geography functions.

Returns:

ACS data, optionally with geometry.

Return type:

pandas.DataFrame or geopandas.GeoDataFrame

Examples

>>> import pytidycensus as tc
>>> tc.set_census_api_key("your_key_here")  # replace with your 40-character key
>>>
>>> # Get median household income by county in Texas
>>> tx_income = tc.get_acs(
...     geography="county",
...     variables="B19013_001",
...     state="TX",
...     year=2022
... )
>>>
>>> # Get data with geometry for mapping
>>> tx_income_geo = tc.get_acs(
...     geography="county",
...     variables="B19013_001",
...     state="TX",
...     geometry=True
... )
>>>
>>> # Get data with named variables
>>> tx_demo = tc.get_acs(
...     geography="county",
...     variables={"total_pop": "B01003_001", "median_income": "B19013_001"},
...     state="TX",
...     year=2022
... )

pytidycensus.acs.get_acs_variables(year=2022, survey='acs5')[source]

Get available ACS variables for a given year and survey.

Parameters:
  • year (int, default 2022) – ACS year

  • survey (str, default "acs5") – Survey type (“acs1” or “acs5”)

Returns:

Available variables with metadata

Return type:

pd.DataFrame

pytidycensus.decennial module

Decennial Census data retrieval functions.

pytidycensus.decennial.get_decennial(geography, variables=None, table=None, cache_table=False, year=2020, sumfile=None, state=None, county=None, output='tidy', geometry=False, keep_geo_vars=False, shift_geo=False, summary_var=None, pop_group=None, pop_group_label=False, api_key=None, show_call=False, **kwargs)[source]

Obtain data from the US Decennial Census.

Parameters:
  • geography (str) – The geography of your data (e.g., ‘county’, ‘tract’, ‘block group’).

  • variables (str, list of str, or dict, optional) – Variable ID(s) to retrieve. Can be a single variable, list of variables, or dictionary mapping custom names to variable IDs. If not provided, must specify table.

  • table (str, optional) – Census table ID to retrieve all variables from.

  • cache_table (bool, default False) – Whether to cache table names for faster future access.

  • year (int, default 2020) – Census year (2000, 2010, or 2020). Note: 1990 data is not available via the API.

  • sumfile (str, optional) – Summary file to use. Defaults to ‘pl’ for 2020, ‘sf1’ for earlier years. Available options vary by year.

  • state (str, int, or list, optional) – State(s) to retrieve data for. Accepts names, abbreviations, or FIPS codes.

  • county (str, int, or list, optional) – County(ies) to retrieve data for. Must be used with state.

  • output (str, default "tidy") – Output format (“tidy” or “wide”).

  • geometry (bool, default False) – Whether to include geometry for mapping.

  • keep_geo_vars (bool, default False) – Whether to keep all geographic variables from shapefiles.

  • shift_geo (bool, default False) – (Deprecated) If True, a warning is issued directing the user to an alternative method for shifting geometry.

  • summary_var (str, optional) – Summary variable from the decennial Census to include for comparison.

  • pop_group (str, optional) – Population group code for which you’d like to request data (for selected sumfiles).

  • pop_group_label (bool, default False) – If True, return a pop_group_label column with the population group description.

  • api_key (str, optional) – Census API key. If not provided, looks for CENSUS_API_KEY environment variable.

  • show_call (bool, default False) – Whether to print the API call URL.

  • **kwargs – Additional parameters passed to geography functions.

Returns:

Decennial Census data, optionally with geometry.

Return type:

pandas.DataFrame or geopandas.GeoDataFrame

Examples

>>> import pytidycensus as tc
>>> tc.set_census_api_key("your_key_here")  # replace with your 40-character key
>>>
>>> # Get total population by state for 2020
>>> pop_2020 = tc.get_decennial(
...     geography="state",
...     variables="P1_001N",
...     year=2020
... )
>>>
>>> # Get race/ethnicity data with geometry
>>> race_data = tc.get_decennial(
...     geography="county",
...     variables=["P1_003N", "P1_004N", "P1_005N"],
...     state="CA",
...     year=2020,
...     geometry=True
... )
>>>
>>> # Get data with named variables and summary variable
>>> pop_data = tc.get_decennial(
...     geography="county",
...     variables={"total": "P1_001N", "white": "P1_003N"},
...     state="TX",
...     year=2020,
...     summary_var="P1_001N"
... )

pytidycensus.decennial.get_decennial_variables(year=2020, sumfile=None)[source]

Get available decennial Census variables for a given year.

Parameters:
  • year (int, default 2020) – Census year

  • sumfile (str, optional) – Summary file. Defaults to ‘pl’ for 2020, ‘sf1’ for earlier years.

Returns:

Available variables with metadata

Return type:

pd.DataFrame

pytidycensus.estimates module

Population estimates data retrieval functions.

exception pytidycensus.estimates.PopulationEstimatesError[source]

Bases: Exception

Base exception class for Population Estimates errors.

exception pytidycensus.estimates.InvalidGeographyError[source]

Bases: PopulationEstimatesError

Raised when an invalid geography is specified.

exception pytidycensus.estimates.InvalidVariableError[source]

Bases: PopulationEstimatesError

Raised when an invalid variable is specified.

exception pytidycensus.estimates.DataNotAvailableError[source]

Bases: PopulationEstimatesError

Raised when requested data is not available.

exception pytidycensus.estimates.APIError[source]

Bases: PopulationEstimatesError

Raised when there are issues with API requests.
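Because every exception above derives from PopulationEstimatesError, a caller can handle all of them with a single except clause. The sketch below re-declares part of the hierarchy locally so that it runs without the library installed. fetch and safe_fetch are hypothetical stand-ins, not pytidycensus functions.

```python
# Local re-declaration mirroring the documented hierarchy, so the sketch is self-contained.
class PopulationEstimatesError(Exception):
    """Base class for Population Estimates errors."""


class InvalidGeographyError(PopulationEstimatesError):
    """Raised when an invalid geography is specified."""


def fetch(geography):
    """Stand-in for a real request; raises for an unsupported geography."""
    if geography not in {"state", "county"}:
        raise InvalidGeographyError(f"unsupported geography: {geography}")
    return []


def safe_fetch(geography):
    """Catch the base class so that any subclass is handled in one place."""
    try:
        return fetch(geography)
    except PopulationEstimatesError as exc:
        return f"{type(exc).__name__}: {exc}"


print(safe_fetch("state"))
print(safe_fetch("planet"))
```

Catching a specific subclass first, then the base class, allows targeted recovery for one failure mode with a general fallback for the rest.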

pytidycensus.estimates.get_estimates(geography, product=None, variables=None, breakdown=None, breakdown_labels=False, vintage=2024, year=None, state=None, county=None, time_series=False, output='tidy', geometry=False, keep_geo_vars=False, api_key=None, show_call=False, **kwargs)[source]

Obtain data from the US Census Bureau Population Estimates Program.

The Population Estimates Program (PEP) produces estimates of the population for the United States, its states, counties, cities, and towns. For years 2020 and later, data is retrieved from flat CSV files. For years 2019 and earlier, data comes from the Census API.

Parameters:
  • geography (str) – The geography of your data. Options include:
      - ‘us’ (United States)
      - ‘region’ (Census regions)
      - ‘division’ (Census divisions)
      - ‘state’ (States and DC)
      - ‘county’ (Counties)
      - ‘cbsa’ (Core Based Statistical Areas)
      - ‘metropolitan statistical area/micropolitan statistical area’ (alias for cbsa)
      - ‘combined statistical area’ (Combined Statistical Areas)
      - ‘place’ (Incorporated places and Census designated places)

  • product (str, optional) – The data product. Options include:
      - ‘population’ (population totals)
      - ‘components’ (components of population change)
      - ‘characteristics’ (population by demographics)
    For years 2020+, only ‘characteristics’ requires this parameter.

  • variables (str or list of str, optional) – Variable ID(s) to retrieve. Use ‘all’ to get all available variables. Common variables include: ‘POP’, ‘BIRTHS’, ‘DEATHS’, ‘DOMESTICMIG’, ‘INTERNATIONALMIG’

  • breakdown (list of str, optional) – Population breakdown for characteristics product. Options include:
      - ‘AGEGROUP’ (age groups)
      - ‘SEX’ (sex)
      - ‘RACE’ (race)
      - ‘HISP’ (Hispanic origin)
    Can be combined, e.g., [‘SEX’, ‘RACE’]

  • breakdown_labels (bool, default False) – Whether to include human-readable labels for breakdown categories.

  • vintage (int, default 2024) – The PEP vintage (dataset version year). Recommended to use the most recent.

  • year (int, optional) – The specific data year. Defaults to vintage if not specified.

  • state (str, int, or list, optional) – State(s) to retrieve data for. Accepts names, abbreviations, or FIPS codes.

  • county (str, int, or list, optional) – County(ies) to retrieve data for. Must be used with state.

  • time_series (bool, default False) – Whether to retrieve time series data back to 2010.

  • output (str, default "tidy") – Output format (“tidy” or “wide”).

  • geometry (bool, default False) – Whether to include geometry for mapping.

  • keep_geo_vars (bool, default False) – Whether to keep all geographic variables from shapefiles.

  • api_key (str, optional) – Census API key for years 2019 and earlier.

  • show_call (bool, default False) – Whether to print the API call URL (for API-based requests).

  • **kwargs – Additional parameters passed to geography functions.

Returns:

Population estimates data, optionally with geometry.

Return type:

pandas.DataFrame or geopandas.GeoDataFrame

Examples

>>> import pytidycensus as tc
>>> tc.set_census_api_key("your_key_here")  # replace with your 40-character key
>>>
>>> # Get total population estimates by state
>>> state_pop = tc.get_estimates(
...     geography="state",
...     variables="POP",
...     year=2022
... )
>>>
>>> # Get population by age and sex for counties in Texas
>>> tx_pop_demo = tc.get_estimates(
...     geography="county",
...     variables="POP",
...     breakdown=["SEX", "AGEGROUP"],
...     state="TX",
...     breakdown_labels=True
... )

pytidycensus.estimates.discover_available_variables(vintage=2024, geography='state')[source]

Discover all available variables in a PEP dataset.

Parameters:
  • vintage (int, default 2024) – The vintage year of the dataset

  • geography (str, default "state") – The geography to check for available variables

Returns:

DataFrame with variable names and descriptions

Return type:

pd.DataFrame

pytidycensus.estimates.get_estimates_variables(year=2022)[source]

Get available population estimates variables for a given year.

Parameters:

year (int, default 2022) – Estimates year

Returns:

Available variables with metadata

Return type:

pd.DataFrame

pytidycensus.geography module

Geographic boundary data retrieval and processing using TIGER shapefiles.

class pytidycensus.geography.TigerDownloader(cache_dir=None)[source]

Bases: object

Downloads and processes TIGER/Line shapefiles from the US Census Bureau.

BASE_URL = 'https://www2.census.gov/geo/tiger'

__init__(cache_dir=None)[source]

Initialize TIGER downloader.

Parameters:

cache_dir (str, optional) – Directory for caching downloaded files

static download_with_wget_or_curl(url, zip_path)[source]

download_and_extract(url, filename)[source]

Download and extract TIGER shapefile.

Parameters:
  • url (str) – Download URL

  • filename (str) – Local filename for caching

Returns:

Path to extracted shapefile directory

Return type:

str

get_shapefile_path(extract_dir)[source]

Find the shapefile (.shp) in the extracted directory.

Parameters:

extract_dir (str) – Directory containing extracted files

Returns:

Path to .shp file

Return type:

str
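A TIGER archive unpacks into several sidecar files (.shp, .dbf, .shx and others), so locating the .shp file is a small search problem. One way this lookup could be written is sketched below. find_shapefile is a hypothetical helper, and the recursive search and the choice of the first match in sorted order are assumptions.

```python
import tempfile
from pathlib import Path


def find_shapefile(extract_dir):
    """Return the path of the first .shp file under extract_dir."""
    matches = sorted(Path(extract_dir).rglob("*.shp"))
    if not matches:
        raise FileNotFoundError(f"no .shp file under {extract_dir}")
    return str(matches[0])


# Simulate an extracted archive with empty placeholder files.
with tempfile.TemporaryDirectory() as tmp:
    (Path(tmp) / "tl_2022_us_county.shp").touch()
    (Path(tmp) / "tl_2022_us_county.dbf").touch()
    found = find_shapefile(tmp)

print(Path(found).name)
```

Raising FileNotFoundError on an empty directory surfaces a failed or partial extraction early, before any attempt to read the geometry.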

pytidycensus.geography.get_geography(geography, year=2022, state=None, county=None, keep_geo_vars=False, cache_dir=None, **kwargs)[source]

Download and load geographic boundary data from TIGER/Line shapefiles.

Parameters:
  • geography (str) – Geography type (e.g., ‘county’, ‘tract’, ‘block group’)

  • year (int, default 2022) – Census year for boundaries

  • state (str, int, or list, optional) – State(s) to filter data for

  • county (str, int, or list, optional) – County(ies) to filter data for (requires state)

  • keep_geo_vars (bool, default False) – Whether to keep all geographic variables

  • cache_dir (str, optional) – Directory for caching downloaded files

  • **kwargs – Additional filtering parameters

Returns:

Geographic boundary data

Return type:

geopandas.GeoDataFrame

Examples

>>> from pytidycensus import get_geography
>>>
>>> # Get county boundaries for Texas
>>> tx_counties = get_geography("county", state="TX", year=2022)
>>>
>>> # Get tract boundaries for Harris County, TX
>>> harris_tracts = get_geography(
...     "tract",
...     state="TX",
...     county="201",
...     year=2022
... )

pytidycensus.geography.get_state_boundaries(year=2022, **kwargs)[source]

Get US state boundaries.

Return type:

GeoDataFrame

pytidycensus.geography.get_county_boundaries(state=None, year=2022, **kwargs)[source]

Get US county boundaries, optionally filtered by state.

Return type:

GeoDataFrame

pytidycensus.geography.get_tract_boundaries(state, county=None, year=2022, **kwargs)[source]

Get census tract boundaries for a state, optionally filtered by county.

Return type:

GeoDataFrame

pytidycensus.geography.get_block_group_boundaries(state, county=None, year=2022, **kwargs)[source]

Get block group boundaries for a state, optionally filtered by county.

Return type:

GeoDataFrame

pytidycensus.utils module

Utility functions for data processing and validation.

pytidycensus.utils.get_credentials()[source]

pytidycensus.utils.load_county_lookup()[source]

Load county lookup table from national_county.txt.

Returns:

DataFrame with columns: state_abbrev, state_fips, county_fips, county_name

Return type:

pd.DataFrame

pytidycensus.utils.add_name_column(df)[source]

Add NAME column using national_county.txt lookup table for geographic areas.

Works for state-, county-, and tract-level geographies by matching on GEOID. For tract-level data, NAME contains the county and state names without the tract number.

Parameters:

df (pd.DataFrame) – DataFrame with GEOID column

Returns:

DataFrame with NAME column added

Return type:

pd.DataFrame
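
Matching on GEOID works because the identifier is a concatenation of fixed-width FIPS codes: 2 digits for the state, 3 more for the county, and 6 more for the tract. The sketch below shows that slicing; the helper name `split_geoid` and the `tract` key are hypothetical, while `state_fips` and `county_fips` mirror the lookup-table columns listed for `load_county_lookup`.

```python
from typing import Dict


def split_geoid(geoid: str) -> Dict[str, str]:
    """Split a state, county, or tract GEOID into its FIPS components."""
    parts = {"state_fips": geoid[:2]}
    if len(geoid) >= 5:
        # Characters 3-5 hold the county code.
        parts["county_fips"] = geoid[2:5]
    if len(geoid) >= 11:
        # Characters 6-11 hold the tract code.
        parts["tract"] = geoid[5:11]
    return parts
```

For a tract GEOID, the first five characters are enough to look up the county and state names, which is consistent with the tract number being omitted from NAME.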

pytidycensus.utils.validate_state(state)[source]

Validate and convert state identifiers to FIPS codes.

Parameters:

state (str, int, or list) – State name(s), abbreviation(s), or FIPS code(s)

Returns:

List of 2-digit FIPS codes

Return type:

List[str]

Raises:

ValueError – If state identifier is invalid
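
The normalization described above can be sketched as follows. The lookup table is deliberately tiny and the function name `normalize_state` is hypothetical; the library's own table and matching rules may differ.

```python
from typing import List, Union

# Illustrative subset only; a complete table would cover every state.
_STATE_FIPS = {
    "CA": "06", "CALIFORNIA": "06",
    "TX": "48", "TEXAS": "48",
}


def normalize_state(state: Union[str, int, list]) -> List[str]:
    """Convert names, abbreviations, or FIPS codes to 2-digit FIPS strings."""
    items = state if isinstance(state, list) else [state]
    codes = []
    for item in items:
        text = str(item).strip()
        if text.isdigit():
            # Integers such as 6 lose their leading zero, so pad back to 2 digits.
            code = text.zfill(2)
            if code not in _STATE_FIPS.values():
                raise ValueError(f"Invalid state identifier: {item!r}")
        else:
            code = _STATE_FIPS.get(text.upper())
            if code is None:
                raise ValueError(f"Invalid state identifier: {item!r}")
        codes.append(code)
    return codes
```

Returning a list even for a single input keeps the return type uniform, matching the documented `List[str]`.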

pytidycensus.utils.validate_county(county, state_fips)[source]

Validate and convert county identifiers to FIPS codes.

Parameters:
  • county (str, int, or list) – County name(s) or FIPS code(s)

  • state_fips (str) – State FIPS code

Returns:

List of 3-digit county FIPS codes

Return type:

List[str]

Raises:

ValueError – If county identifier is invalid

pytidycensus.utils.lookup_county_fips(county_name, state_fips)[source]

Look up county FIPS code by name.

Parameters:
  • county_name (str) – County name to look up

  • state_fips (str) – State FIPS code

Returns:

County FIPS code if found, None otherwise

Return type:

Optional[str]
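
A name lookup of this kind usually normalizes case and an optional "County" suffix before matching. The sketch below assumes that behavior; the table is an illustrative subset and `find_county_fips` is a hypothetical name, not the library's function.

```python
from typing import Optional

# Illustrative subset keyed by (state FIPS, normalized county name).
_COUNTIES = {
    ("48", "harris"): "201",
    ("06", "los angeles"): "037",
}


def find_county_fips(county_name: str, state_fips: str) -> Optional[str]:
    """Return the 3-digit county FIPS code, or None when no match exists."""
    name = county_name.strip().lower()
    # Accept both "Harris" and "Harris County".
    if name.endswith(" county"):
        name = name[: -len(" county")]
    return _COUNTIES.get((state_fips, name))
```

Returning None instead of raising lets a caller such as a validator decide how to report the failure.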

pytidycensus.utils.validate_year(year, dataset)[source]

Validate year for given dataset.

Parameters:
  • year (int) – Census year

  • dataset (str) – Dataset type (‘acs’, ‘dec’, ‘estimates’)

Returns:

Validated year

Return type:

int

Raises:

ValueError – If year is not available for dataset

pytidycensus.utils.validate_geography(geography, dataset=None)[source]

Validate geography parameter.

Parameters:
  • geography (str) – Geography level

  • dataset (str, optional) – Dataset type (“acs”, “decennial”, “estimates”) for context-aware validation

Returns:

Validated geography

Return type:

str

Raises:

pytidycensus.utils.build_geography_params(geography, state=None, county=None, **kwargs)[source]

Build geography parameters for Census API call.

Parameters:
  • geography (str) – Geography level

  • state (str, int, or list, optional) – State identifier(s)

  • county (str, int, or list, optional) – County identifier(s)

  • **kwargs – Additional geography parameters

Returns:

Geography parameters for API call

Return type:

Dict[str, str]

Raises:

NotImplementedError – If geography is recognized but not yet implemented
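
The returned dictionary follows the `for`/`in` form shown for `CensusAPI.get`, e.g. {'for': 'county:*', 'in': 'state:06'}. The sketch below covers three geography levels only and takes already-validated FIPS codes; the function name `geography_params` and its argument names are hypothetical, and the library's handling of other levels is not shown.

```python
from typing import Dict, Optional


def geography_params(
    geography: str,
    state_fips: Optional[str] = None,
    county_fips: Optional[str] = None,
) -> Dict[str, str]:
    """Build 'for'/'in' predicates for state, county, and tract requests."""
    if geography == "state":
        return {"for": f"state:{state_fips or '*'}"}

    if geography == "county":
        params = {"for": f"county:{county_fips or '*'}"}
        if state_fips:
            params["in"] = f"state:{state_fips}"
        return params

    if geography == "tract":
        if not state_fips:
            raise ValueError("tract requests need a state")
        # Parent geographies are space-separated; '*' selects every county.
        return {
            "for": "tract:*",
            "in": f"state:{state_fips} county:{county_fips or '*'}",
        }

    raise NotImplementedError(f"Geography not covered by this sketch: {geography}")
```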

pytidycensus.utils.process_census_data(data, variables, output='tidy')[source]

Process raw Census API response into pandas DataFrame.

Parameters:
  • data (List[Dict[str, Any]]) – Raw Census API response

  • variables (List[str]) – Variable codes requested

  • output (str, default "tidy") – Output format (“tidy” or “wide”)

Returns:

Processed data

Return type:

pd.DataFrame
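
The difference between the two output formats can be illustrated without pandas. In the sketch below, "wide" records hold one key per variable and "tidy" rows hold one variable per row. The column names `variable` and `value` and the helper name `to_tidy` are assumptions for illustration; the library's actual column names may differ.

```python
from typing import Any, Dict, List


def to_tidy(records: List[Dict[str, Any]], variables: List[str]) -> List[Dict[str, Any]]:
    """Reshape wide records (one key per variable) into long rows."""
    tidy = []
    for record in records:
        # Keys that are not requested variables identify the geography.
        ids = {k: v for k, v in record.items() if k not in variables}
        for var in variables:
            tidy.append({**ids, "variable": var, "value": record[var]})
    return tidy


# Placeholder values, not real Census estimates.
wide = [{"GEOID": "48201", "B19013_001E": 1, "B01003_001E": 2}]
long_rows = to_tidy(wide, ["B19013_001E", "B01003_001E"])
```

Each wide record yields one tidy row per requested variable, so `long_rows` holds two rows for the single county above.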

pytidycensus.utils.add_margin_of_error(df, variables, moe_level=90, output='tidy')[source]

Add margin of error columns for ACS data with confidence level adjustment.

Parameters:
  • df (pd.DataFrame) – Census data

  • variables (List[str]) – Variable codes

  • moe_level (int, default 90) – Confidence level (90, 95, or 99)

  • output (str, default "tidy") – Output format (“tidy” or “wide”)

Returns:

Data with margin of error columns

Return type:

pd.DataFrame
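
ACS margins of error are published at the 90 percent confidence level, so adjusting to another level amounts to rescaling by a ratio of z-scores: MOE(level) = MOE(90) × z(level) / 1.645. The sketch below applies that ratio to a single value; the helper name `scale_moe` is hypothetical, and the exact constants used by the library are an assumption.

```python
# Standard normal z-scores for the supported confidence levels.
Z_SCORES = {90: 1.645, 95: 1.96, 99: 2.576}


def scale_moe(moe: float, moe_level: int = 90) -> float:
    """Rescale a 90 percent margin of error to another confidence level."""
    if moe_level not in Z_SCORES:
        raise ValueError("moe_level must be 90, 95, or 99")
    # Dividing by the 90 percent z-score recovers the standard error.
    return moe * Z_SCORES[moe_level] / Z_SCORES[90]
```

With `moe_level=90` the value is returned unchanged; higher levels widen the margin.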

pytidycensus.variables module

Census variable loading and caching functionality.

pytidycensus.variables.load_variables(year, dataset, survey=None, cache=True, cache_dir=None)[source]

Load Census variables for a given dataset and year.

Parameters:
  • year (int) – Census year

  • dataset (str) – Dataset name (‘acs’, ‘dec’, ‘pep’, etc.)

  • survey (str, optional) – Survey type (e.g., ‘acs5’, ‘acs1’, ‘sf1’, ‘pl’)

  • cache (bool, default True) – Whether to cache variables for faster future access

  • cache_dir (str, optional) – Directory for caching. Defaults to user cache directory.

Returns:

Variables with columns: name, label, concept, predicateType, group, limit

Return type:

pd.DataFrame

Examples

>>> # Load ACS 5-year variables for 2022
>>> acs_vars = load_variables(2022, "acs", "acs5")
>>>
>>> # Search for income-related variables
>>> income_vars = acs_vars[acs_vars['label'].str.contains('income', case=False)]
>>>
>>> # Load decennial census variables for 2020
>>> dec_vars = load_variables(2020, "dec", "pl")

pytidycensus.variables.search_variables(pattern, year, dataset, survey=None, field='label')[source]

Search for variables by pattern in labels, concepts, or names.

Parameters:
  • pattern (str) – Search pattern (case-insensitive)

  • year (int) – Census year

  • dataset (str) – Dataset name

  • survey (str, optional) – Survey type

  • field (str, default "label") – Field to search in (‘label’, ‘concept’, ‘name’, or ‘all’)

Returns:

Matching variables

Return type:

pd.DataFrame

Examples

>>> # Search for income variables in ACS
>>> income_vars = search_variables("income", 2022, "acs", "acs5")
>>>
>>> # Search for population in concepts
>>> pop_vars = search_variables("population", 2020, "dec", "pl", field="concept")
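
The `field` options can be pictured as a case-insensitive match against one column or against all three. The sketch below works on plain dictionaries rather than a DataFrame and treats the pattern as a regular expression, which is an assumption; `search_records` is a hypothetical name and the sample rows are illustrative, not actual API output.

```python
import re
from typing import Dict, List

SEARCHABLE_FIELDS = ("label", "concept", "name")


def search_records(records: List[Dict[str, str]], pattern: str, field: str = "label") -> List[Dict[str, str]]:
    """Return records whose chosen field(s) match pattern, ignoring case."""
    regex = re.compile(pattern, re.IGNORECASE)
    # 'all' searches every searchable field; otherwise only the named one.
    targets = SEARCHABLE_FIELDS if field == "all" else (field,)
    return [
        record
        for record in records
        if any(regex.search(record.get(name, "")) for name in targets)
    ]


# Illustrative rows with shortened labels.
sample = [
    {"name": "B19013_001", "label": "Median household income", "concept": "Income"},
    {"name": "P1_001N", "label": "Total", "concept": "Race"},
]
income = search_records(sample, "INCOME")
by_name = search_records(sample, "^p1", field="name")
```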

pytidycensus.variables.get_table_variables(table, year, dataset, survey=None)[source]

Get all variables for a specific table.

Parameters:
  • table (str) – Table code (e.g., ‘B19013’, ‘P1’)

  • year (int) – Census year

  • dataset (str) – Dataset name

  • survey (str, optional) – Survey type

Returns:

Variables for the specified table

Return type:

pd.DataFrame

Examples

>>> # Get all variables for median household income table
>>> b19013_vars = get_table_variables("B19013", 2022, "acs", "acs5")
>>>
>>> # Get all variables for race table in 2020 Census
>>> p1_vars = get_table_variables("P1", 2020, "dec", "pl")

pytidycensus.variables.clear_cache(cache_dir=None)[source]

Clear the variables cache.

Parameters:

cache_dir (str, optional) – Cache directory to clear. Defaults to user cache directory.

Return type:

None

pytidycensus.variables.list_available_datasets(year)[source]

List available datasets for a given year.

Parameters:

year (int) – Census year

Returns:

Available datasets and their surveys

Return type:

Dict[str, list]
