pytidycensus package
pytidycensus: Python interface to US Census Bureau APIs
pytidycensus is a Python library that provides an integrated interface to several United States Census Bureau APIs and geographic boundary files. It returns Census and ACS data as pandas DataFrames and can optionally return GeoPandas GeoDataFrames with feature geometry for mapping and spatial analysis.
- class pytidycensus.CensusAPI(api_key=None, cache_dir=None)[source]
Bases:
object
Core client for interacting with US Census Bureau APIs.
Handles authentication, rate limiting, caching, and error handling for Census API requests.
- BASE_URL = 'https://api.census.gov/data'
- get(year, dataset, variables, geography, survey=None, show_call=False)[source]
Make a request to the Census API.
- Parameters:
year (int) – Census year
dataset (str) – Dataset name (e.g., ‘acs’, ‘dec’)
variables (List[str]) – List of variable codes to retrieve
geography (Dict[str, str]) – Geography specification (e.g., {'for': 'county:*', 'in': 'state:06'})
survey (str, optional) – Survey type (e.g., ‘acs5’, ‘acs1’)
show_call (bool, default False) – Whether to print the API call URL
- Returns:
Parsed JSON response from API
- Return type:
List[Dict[str, Any]]
- Raises:
requests.RequestException – If API request fails
ValueError – If API returns error response
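To make the parameters above concrete, here is a minimal sketch of how they could map onto a Census API request URL. The helper build_census_url is hypothetical and not part of pytidycensus; the layout (BASE_URL/year/dataset/survey with get, for, and in query parameters) follows the public Census API convention, but how CensusAPI.get assembles its request internally is an assumption.

```python
from urllib.parse import urlencode

BASE_URL = "https://api.census.gov/data"


def build_census_url(year, dataset, variables, geography, survey=None, api_key=None):
    """Hypothetical helper: assemble a Census API URL from get()-style arguments."""
    # Path segments: /<year>/<dataset>[/<survey>]
    path = "/".join(str(p) for p in (year, dataset, survey) if p is not None)
    # The API expects a comma-separated variable list in the "get" parameter.
    params = {"get": ",".join(variables)}
    # Geography keys such as "for" and "in" pass through as query parameters.
    params.update(geography)
    if api_key:
        params["key"] = api_key
    # Keep ":", "*" and "," readable instead of percent-encoding them.
    return f"{BASE_URL}/{path}?{urlencode(params, safe=':*,')}"


url = build_census_url(
    2022,
    "acs",
    ["NAME", "B19013_001E"],
    {"for": "county:*", "in": "state:06"},
    survey="acs5",
)
print(url)
```

Setting show_call=True on the real method is documented to print the URL it actually uses, which is the authoritative way to check a request.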
- pytidycensus.set_census_api_key(api_key)[source]
Set Census API key as environment variable.
- Parameters:
api_key (str) – Census API key obtained from https://api.census.gov/data/key_signup.html
- Raises:
ValueError – If the API key is not a string of exactly 40 characters
- Return type:
None
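The documented contract (a 40-character string stored in an environment variable) can be sketched as follows. The function set_key_sketch is a hypothetical stand-in, not the library's implementation; the variable name CENSUS_API_KEY is the one the data functions are documented to read.

```python
import os


def set_key_sketch(api_key):
    """Hypothetical stand-in for set_census_api_key: validate, then export the key."""
    # Documented contract: a string of exactly 40 characters, otherwise ValueError.
    if not isinstance(api_key, str) or len(api_key) != 40:
        raise ValueError("Census API key must be a string of exactly 40 characters")
    # Store the key where later calls can find it.
    os.environ["CENSUS_API_KEY"] = api_key


placeholder_key = "a" * 40  # placeholder with the right length, not a real key
set_key_sketch(placeholder_key)
print(len(os.environ["CENSUS_API_KEY"]))
```

Because the key lives in the process environment, it applies only to the current session unless it is also exported in the shell or a startup file.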
- pytidycensus.get_acs(geography, variables=None, table=None, cache_table=False, year=2022, survey='acs5', state=None, county=None, zcta=None, output='tidy', geometry=False, keep_geo_vars=False, shift_geo=False, summary_var=None, moe_level=90, api_key=None, show_call=False, **kwargs)[source]
Obtain data from the American Community Survey (ACS).
- Parameters:
geography (str) – The geography of your data (e.g., ‘county’, ‘tract’, ‘block group’).
variables (str, list of str, or dict, optional) – Variable ID(s) to retrieve. Can be a single variable, list of variables, or dictionary mapping custom names to variable IDs. If not provided, must specify table.
table (str, optional) – ACS table ID to retrieve all variables from.
cache_table (bool, default False) – Whether to cache table names for faster future access.
year (int, default 2022) – Year of ACS data (2009-2022 for 5-year, 2005-2022 for 1-year).
survey (str, default "acs5") – ACS survey type (“acs1”, “acs3”, or “acs5”).
state (str, int, or list, optional) – State(s) to retrieve data for. Accepts names, abbreviations, or FIPS codes.
county (str, int, or list, optional) – County(ies) to retrieve data for. Must be used with state.
zcta (str or list of str, optional) – ZIP Code Tabulation Area(s) to retrieve data for. Geography must be “zcta”.
output (str, default "tidy") – Output format (“tidy” or “wide”).
geometry (bool, default False) – Whether to include geometry for mapping.
keep_geo_vars (bool, default False) – Whether to keep all geographic variables from shapefiles.
shift_geo (bool, default False) – (Deprecated) If True, a warning is issued advising the use of an alternative geometry-shifting approach.
summary_var (str, optional) – Summary variable from the ACS to include for comparison (e.g. total population).
moe_level (int, default 90) – Confidence level for margin of error (90, 95, or 99).
api_key (str, optional) – Census API key. If not provided, looks for CENSUS_API_KEY environment variable.
show_call (bool, default False) – Whether to print the API call URL.
**kwargs – Additional parameters passed to geography functions.
- Returns:
ACS data, optionally with geometry.
- Return type:
Examples
>>> import pytidycensus as tc
>>> tc.set_census_api_key("your_key_here")  # replace with your 40-character key
>>>
>>> # Get median household income by county in Texas
>>> tx_income = tc.get_acs(
...     geography="county",
...     variables="B19013_001",
...     state="TX",
...     year=2022
... )
>>>
>>> # Get data with geometry for mapping
>>> tx_income_geo = tc.get_acs(
...     geography="county",
...     variables="B19013_001",
...     state="TX",
...     geometry=True
... )
>>>
>>> # Get data with named variables
>>> tx_demo = tc.get_acs(
...     geography="county",
...     variables={"total_pop": "B01003_001", "median_income": "B19013_001"},
...     state="TX",
...     year=2022
... )
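The moe_level parameter refers to the confidence level of the margin of error. The Census Bureau publishes ACS margins of error at the 90 percent level, and the standard conversion to another level scales by the ratio of z-scores. The sketch below illustrates that arithmetic; rescale_moe is a hypothetical helper, and whether get_acs applies exactly this formula internally is an assumption.

```python
# z-scores for the three confidence levels accepted by moe_level
Z_SCORES = {90: 1.645, 95: 1.96, 99: 2.576}


def rescale_moe(moe_90, moe_level=90):
    """Hypothetical helper: convert a 90% margin of error to another level."""
    if moe_level not in Z_SCORES:
        raise ValueError("moe_level must be 90, 95, or 99")
    # Scale by the ratio of the target z-score to the published 90% z-score.
    return moe_90 * Z_SCORES[moe_level] / Z_SCORES[90]


# An illustrative margin of 1000 at 90% widens at the 95% level.
print(round(rescale_moe(1000, 95), 1))
```

A higher confidence level always produces a wider margin, so estimates that look distinct at 90 percent may overlap at 99 percent.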
- pytidycensus.get_decennial(geography, variables=None, table=None, cache_table=False, year=2020, sumfile=None, state=None, county=None, output='tidy', geometry=False, keep_geo_vars=False, shift_geo=False, summary_var=None, pop_group=None, pop_group_label=False, api_key=None, show_call=False, **kwargs)[source]
Obtain data from the US Decennial Census.
- Parameters:
geography (str) – The geography of your data (e.g., ‘county’, ‘tract’, ‘block group’).
variables (str, list of str, or dict, optional) – Variable ID(s) to retrieve. Can be a single variable, list of variables, or dictionary mapping custom names to variable IDs. If not provided, must specify table.
table (str, optional) – Census table ID to retrieve all variables from.
cache_table (bool, default False) – Whether to cache table names for faster future access.
year (int, default 2020) – Census year (2000, 2010, or 2020). Note: 1990 data is not available via the API.
sumfile (str, optional) – Summary file to use. Defaults to ‘pl’ for 2020, ‘sf1’ for earlier years. Available options vary by year.
state (str, int, or list, optional) – State(s) to retrieve data for. Accepts names, abbreviations, or FIPS codes.
county (str, int, or list, optional) – County(ies) to retrieve data for. Must be used with state.
output (str, default "tidy") – Output format (“tidy” or “wide”).
geometry (bool, default False) – Whether to include geometry for mapping.
keep_geo_vars (bool, default False) – Whether to keep all geographic variables from shapefiles.
shift_geo (bool, default False) – (Deprecated) If True, a warning is issued advising the use of an alternative geometry-shifting approach.
summary_var (str, optional) – Summary variable from the decennial Census to include for comparison.
pop_group (str, optional) – Population group code for which you’d like to request data (for selected sumfiles).
pop_group_label (bool, default False) – If True, return a pop_group_label column with the population group description.
api_key (str, optional) – Census API key. If not provided, looks for CENSUS_API_KEY environment variable.
show_call (bool, default False) – Whether to print the API call URL.
**kwargs – Additional parameters passed to geography functions.
- Returns:
Decennial Census data, optionally with geometry.
- Return type:
Examples
>>> import pytidycensus as tc
>>> tc.set_census_api_key("your_key_here")  # replace with your 40-character key
>>>
>>> # Get total population by state for 2020
>>> pop_2020 = tc.get_decennial(
...     geography="state",
...     variables="P1_001N",
...     year=2020
... )
>>>
>>> # Get race/ethnicity data with geometry
>>> race_data = tc.get_decennial(
...     geography="county",
...     variables=["P1_003N", "P1_004N", "P1_005N"],
...     state="CA",
...     year=2020,
...     geometry=True
... )
>>>
>>> # Get data with named variables and summary variable
>>> pop_data = tc.get_decennial(
...     geography="county",
...     variables={"total": "P1_001N", "white": "P1_003N"},
...     state="TX",
...     year=2020,
...     summary_var="P1_001N"
... )
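The sumfile default described above depends on the year. A minimal sketch of that rule, assuming nothing beyond what the parameter description states (the helper name default_sumfile is hypothetical):

```python
def default_sumfile(year, sumfile=None):
    """Hypothetical helper: pick the summary file the way the docs describe."""
    # An explicit choice always wins over the default.
    if sumfile is not None:
        return sumfile
    # Documented default: 'pl' for 2020, 'sf1' for earlier years.
    return "pl" if year == 2020 else "sf1"


for census_year in (2000, 2010, 2020):
    print(census_year, default_sumfile(census_year))
```

Variable codes differ between summary files, so a variable such as P1_001N that exists in the 2020 'pl' file should not be assumed to exist under the same name in an 'sf1' file.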
- pytidycensus.get_estimates(geography, product=None, variables=None, breakdown=None, breakdown_labels=False, vintage=2024, year=None, state=None, county=None, time_series=False, output='tidy', geometry=False, keep_geo_vars=False, api_key=None, show_call=False, **kwargs)[source]
Obtain data from the US Census Bureau Population Estimates Program.
The Population Estimates Program (PEP) produces estimates of the population for the United States, its states, counties, cities, and towns. For years 2020 and later, data is retrieved from flat CSV files. For years 2019 and earlier, data comes from the Census API.
- Parameters:
geography (str) – The geography of your data. Options include:
  - ‘us’ (United States)
  - ‘region’ (Census regions)
  - ‘division’ (Census divisions)
  - ‘state’ (States and DC)
  - ‘county’ (Counties)
  - ‘cbsa’ (Core Based Statistical Areas)
  - ‘metropolitan statistical area/micropolitan statistical area’ (alias for cbsa)
  - ‘combined statistical area’ (Combined Statistical Areas)
  - ‘place’ (Incorporated places and Census designated places)
product (str, optional) – The data product. Options include:
  - ‘population’ (population totals)
  - ‘components’ (components of population change)
  - ‘characteristics’ (population by demographics)
  For years 2020+, only ‘characteristics’ requires this parameter.
variables (str or list of str, optional) – Variable ID(s) to retrieve. Use ‘all’ to get all available variables. Common variables include: ‘POP’, ‘BIRTHS’, ‘DEATHS’, ‘DOMESTICMIG’, ‘INTERNATIONALMIG’
breakdown (list of str, optional) – Population breakdown for characteristics product. Options include:
  - ‘AGEGROUP’ (age groups)
  - ‘SEX’ (sex)
  - ‘RACE’ (race)
  - ‘HISP’ (Hispanic origin)
  Can be combined, e.g., [‘SEX’, ‘RACE’]
breakdown_labels (bool, default False) – Whether to include human-readable labels for breakdown categories.
vintage (int, default 2024) – The PEP vintage (dataset version year). Recommended to use the most recent.
year (int, optional) – The specific data year. Defaults to vintage if not specified.
state (str, int, or list, optional) – State(s) to retrieve data for. Accepts names, abbreviations, or FIPS codes.
county (str, int, or list, optional) – County(ies) to retrieve data for. Must be used with state.
time_series (bool, default False) – Whether to retrieve time series data back to 2010.
output (str, default "tidy") – Output format (“tidy” or “wide”).
geometry (bool, default False) – Whether to include geometry for mapping.
keep_geo_vars (bool, default False) – Whether to keep all geographic variables from shapefiles.
api_key (str, optional) – Census API key for years 2019 and earlier.
show_call (bool, default False) – Whether to print the API call URL (for API-based requests).
**kwargs – Additional parameters passed to geography functions.
- Returns:
Population estimates data, optionally with geometry.
- Return type:
Examples
>>> import pytidycensus as tc
>>> tc.set_census_api_key("your_key_here")  # replace with your 40-character key
>>>
>>> # Get total population estimates by state
>>> state_pop = tc.get_estimates(
...     geography="state",
...     variables="POP",
...     year=2022
... )
>>>
>>> # Get population by age and sex for counties in Texas
>>> tx_pop_demo = tc.get_estimates(
...     geography="county",
...     variables="POP",
...     breakdown=["SEX", "AGEGROUP"],
...     state="TX",
...     breakdown_labels=True
... )
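Two rules stated above interact: year falls back to vintage when omitted, and the data year determines whether flat CSV files or the Census API are used. The sketch below spells that out; resolve_estimates_request is a hypothetical name, and the return shape is chosen purely for illustration.

```python
def resolve_estimates_request(vintage=2024, year=None):
    """Hypothetical helper: work out the data year and where the data comes from."""
    # Documented rule: year defaults to vintage if not specified.
    data_year = vintage if year is None else year
    # Documented rule: 2020 and later use flat CSV files, 2019 and earlier use the API.
    source = "csv" if data_year >= 2020 else "api"
    return data_year, source


print(resolve_estimates_request())
print(resolve_estimates_request(vintage=2019))
```

This is also why api_key is described as applying to years 2019 and earlier: only the API path needs a key.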
- pytidycensus.get_geography(geography, year=2022, state=None, county=None, keep_geo_vars=False, cache_dir=None, **kwargs)[source]
Download and load geographic boundary data from TIGER/Line shapefiles.
- Parameters:
geography (str) – Geography type (e.g., ‘county’, ‘tract’, ‘block group’)
year (int, default 2022) – Census year for boundaries
state (str, int, or list, optional) – State(s) to filter data for
county (str, int, or list, optional) – County(ies) to filter data for (requires state)
keep_geo_vars (bool, default False) – Whether to keep all geographic variables
cache_dir (str, optional) – Directory for caching downloaded files
**kwargs – Additional filtering parameters
- Returns:
Geographic boundary data
- Return type:
Examples
>>> from pytidycensus import get_geography
>>>
>>> # Get county boundaries for Texas
>>> tx_counties = get_geography("county", state="TX", year=2022)
>>>
>>> # Get tract boundaries for Harris County, TX
>>> harris_tracts = get_geography(
...     "tract",
...     state="TX",
...     county="201",
...     year=2022
... )
- pytidycensus.load_variables(year, dataset, survey=None, cache=True, cache_dir=None)[source]
Load Census variables for a given dataset and year.
- Parameters:
year (int) – Census year
dataset (str) – Dataset name (‘acs’, ‘dec’, ‘pep’, etc.)
survey (str, optional) – Survey type (e.g., ‘acs5’, ‘acs1’, ‘sf1’, ‘pl’)
cache (bool, default True) – Whether to cache variables for faster future access
cache_dir (str, optional) – Directory for caching. Defaults to user cache directory.
- Returns:
Variables with columns: name, label, concept, predicateType, group, limit
- Return type:
pd.DataFrame
Examples
>>> from pytidycensus import load_variables
>>>
>>> # Load ACS 5-year variables for 2022
>>> acs_vars = load_variables(2022, "acs", "acs5")
>>>
>>> # Search for income-related variables
>>> income_vars = acs_vars[acs_vars['label'].str.contains('income', case=False)]
>>>
>>> # Load decennial census variables for 2020
>>> dec_vars = load_variables(2020, "dec", "pl")
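The returned columns mirror the per-variable metadata that the Census API publishes at each dataset's variables.json endpoint (for example https://api.census.gov/data/2022/acs/acs5/variables.json). The sketch below flattens a payload of that shape into rows. The sample is hand-written with abbreviated labels, variables_to_rows is a hypothetical helper, and the library's own parsing and caching are not shown.

```python
COLUMNS = ["name", "label", "concept", "predicateType", "group", "limit"]

# Hand-written sample shaped like a variables.json payload (labels abbreviated).
SAMPLE_PAYLOAD = {
    "variables": {
        "B19013_001E": {
            "label": "Estimate!!Median household income in the past 12 months",
            "concept": "Median Household Income",
            "predicateType": "int",
            "group": "B19013",
            "limit": 0,
        },
        "B01003_001E": {
            "label": "Estimate!!Total",
            "concept": "Total Population",
            "predicateType": "int",
            "group": "B01003",
            "limit": 0,
        },
    }
}


def variables_to_rows(payload):
    """Hypothetical helper: one row per variable, with the documented columns."""
    rows = []
    for name, meta in sorted(payload["variables"].items()):
        row = {"name": name}
        # Missing metadata fields become None rather than raising KeyError.
        for column in COLUMNS[1:]:
            row[column] = meta.get(column)
        rows.append(row)
    return rows


rows = variables_to_rows(SAMPLE_PAYLOAD)
print([row["name"] for row in rows])
```

A list of rows in this form can be passed directly to pandas.DataFrame to obtain a table with the same column names.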
- pytidycensus.search_variables(pattern, year, dataset, survey=None, field='label')[source]
Search for variables by pattern in labels, concepts, or names.
- Parameters:
- Returns:
Matching variables
- Return type:
pd.DataFrame
Examples
>>> from pytidycensus import search_variables
>>>
>>> # Search for income variables in ACS
>>> income_vars = search_variables("income", 2022, "acs", "acs5")
>>>
>>> # Search for population in concepts
>>> pop_vars = search_variables("population", 2020, "dec", "pl", field="concept")
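Conceptually, a search like this filters the variable table on one field. The sketch below does so over plain dictionaries with a case-insensitive regular expression. Both the helper search_rows and the choice of regex matching are assumptions; the documentation above does not say whether pattern is treated as a regex or as a literal substring.

```python
import re


def search_rows(rows, pattern, field="label"):
    """Hypothetical helper: keep rows whose chosen field matches the pattern."""
    regex = re.compile(pattern, re.IGNORECASE)
    # Treat missing or None fields as empty strings so they never match by accident.
    return [row for row in rows if regex.search(str(row.get(field) or ""))]


# Hand-written sample rows with abbreviated text.
sample_rows = [
    {"name": "B19013_001E", "label": "Estimate!!Median household income", "concept": "Median Household Income"},
    {"name": "B01003_001E", "label": "Estimate!!Total", "concept": "Total Population"},
    {"name": "B25077_001E", "label": "Estimate!!Median value (dollars)", "concept": None},
]

print([row["name"] for row in search_rows(sample_rows, "income")])
print([row["name"] for row in search_rows(sample_rows, "population", field="concept")])
```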
- pytidycensus.get_table_variables(table, year, dataset, survey=None)[source]
Get all variables for a specific table.
- Parameters:
- Returns:
Variables for the specified table
- Return type:
pd.DataFrame
Examples
>>> from pytidycensus import get_table_variables
>>>
>>> # Get all variables for median household income table
>>> b19013_vars = get_table_variables("B19013", 2022, "acs", "acs5")
>>>
>>> # Get all variables for race table in 2020 Census
>>> p1_vars = get_table_variables("P1", 2020, "dec", "pl")
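Census variable IDs encode their table: the part before the underscore is the table ID, followed by a line number and an optional suffix (for example E for an ACS estimate). The sketch below splits an ID into those parts. The helper split_variable_id and its regular expression are illustrative assumptions and may not cover every ID format the Census Bureau uses.

```python
import re

# Table ID, underscore, line number, optional letter suffix.
VARIABLE_ID = re.compile(r"(?P<table>[A-Z]+\d+[A-Z]*)_(?P<line>\d+)(?P<suffix>[A-Z]*)")


def split_variable_id(variable_id):
    """Hypothetical helper: return (table, line, suffix) for a variable ID."""
    match = VARIABLE_ID.fullmatch(variable_id)
    if match is None:
        raise ValueError(f"Unrecognized variable ID: {variable_id!r}")
    return match.group("table"), match.group("line"), match.group("suffix")


print(split_variable_id("B19013_001E"))
print(split_variable_id("P1_001N"))
```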
Submodules
pytidycensus.api module
Core Census API client for making requests to the US Census Bureau APIs.
- class pytidycensus.api.CensusAPI(api_key=None, cache_dir=None)[source]
Bases:
object
Core client for interacting with US Census Bureau APIs.
Handles authentication, rate limiting, caching, and error handling for Census API requests.
- BASE_URL = 'https://api.census.gov/data'
- get(year, dataset, variables, geography, survey=None, show_call=False)[source]
Make a request to the Census API.
- Parameters:
year (int) – Census year
dataset (str) – Dataset name (e.g., ‘acs’, ‘dec’)
variables (List[str]) – List of variable codes to retrieve
geography (Dict[str, str]) – Geography specification (e.g., {'for': 'county:*', 'in': 'state:06'})
survey (str, optional) – Survey type (e.g., ‘acs5’, ‘acs1’)
show_call (bool, default False) – Whether to print the API call URL
- Returns:
Parsed JSON response from API
- Return type:
List[Dict[str, Any]]
- Raises:
requests.RequestException – If API request fails
ValueError – If API returns error response
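The Census API itself responds with a JSON array of arrays whose first row is the header. A return type of List[Dict[str, Any]] therefore implies a conversion along the lines sketched below. The helper rows_to_records is hypothetical, the sample values are placeholders rather than real data, and the library's actual parsing may differ.

```python
import json

# Placeholder response in the Census API's array-of-arrays layout (values are made up).
RAW_RESPONSE = """
[["NAME", "B19013_001E", "state", "county"],
 ["Example County A", "50000", "06", "001"],
 ["Example County B", "62000", "06", "003"]]
"""


def rows_to_records(payload):
    """Hypothetical helper: turn header + rows into a list of dictionaries."""
    header, *rows = payload
    # Pair each value with its column name; values stay as strings, as the API sends them.
    return [dict(zip(header, row)) for row in rows]


records = rows_to_records(json.loads(RAW_RESPONSE))
print(records[0])
```

Numeric values arrive as strings, so any arithmetic requires an explicit conversion step afterwards.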
- pytidycensus.api.set_census_api_key(api_key)[source]
Set Census API key as environment variable.
- Parameters:
api_key (str) – Census API key obtained from https://api.census.gov/data/key_signup.html
- Raises:
ValueError – If the API key is not a string of exactly 40 characters
- Return type:
None
pytidycensus.acs module
American Community Survey (ACS) data retrieval functions.
- pytidycensus.acs.get_acs(geography, variables=None, table=None, cache_table=False, year=2022, survey='acs5', state=None, county=None, zcta=None, output='tidy', geometry=False, keep_geo_vars=False, shift_geo=False, summary_var=None, moe_level=90, api_key=None, show_call=False, **kwargs)[source]
Obtain data from the American Community Survey (ACS).
- Parameters:
geography (str) – The geography of your data (e.g., ‘county’, ‘tract’, ‘block group’).
variables (str, list of str, or dict, optional) – Variable ID(s) to retrieve. Can be a single variable, list of variables, or dictionary mapping custom names to variable IDs. If not provided, must specify table.
table (str, optional) – ACS table ID to retrieve all variables from.
cache_table (bool, default False) – Whether to cache table names for faster future access.
year (int, default 2022) – Year of ACS data (2009-2022 for 5-year, 2005-2022 for 1-year).
survey (str, default "acs5") – ACS survey type (“acs1”, “acs3”, or “acs5”).
state (str, int, or list, optional) – State(s) to retrieve data for. Accepts names, abbreviations, or FIPS codes.
county (str, int, or list, optional) – County(ies) to retrieve data for. Must be used with state.
zcta (str or list of str, optional) – ZIP Code Tabulation Area(s) to retrieve data for. Geography must be “zcta”.
output (str, default "tidy") – Output format (“tidy” or “wide”).
geometry (bool, default False) – Whether to include geometry for mapping.
keep_geo_vars (bool, default False) – Whether to keep all geographic variables from shapefiles.
shift_geo (bool, default False) – (Deprecated) If True, a warning is issued advising the use of an alternative geometry-shifting approach.
summary_var (str, optional) – Summary variable from the ACS to include for comparison (e.g. total population).
moe_level (int, default 90) – Confidence level for margin of error (90, 95, or 99).
api_key (str, optional) – Census API key. If not provided, looks for CENSUS_API_KEY environment variable.
show_call (bool, default False) – Whether to print the API call URL.
**kwargs – Additional parameters passed to geography functions.
- Returns:
ACS data, optionally with geometry.
- Return type:
Examples
>>> import pytidycensus as tc
>>> tc.set_census_api_key("your_key_here")  # replace with your 40-character key
>>>
>>> # Get median household income by county in Texas
>>> tx_income = tc.get_acs(
...     geography="county",
...     variables="B19013_001",
...     state="TX",
...     year=2022
... )
>>>
>>> # Get data with geometry for mapping
>>> tx_income_geo = tc.get_acs(
...     geography="county",
...     variables="B19013_001",
...     state="TX",
...     geometry=True
... )
>>>
>>> # Get data with named variables
>>> tx_demo = tc.get_acs(
...     geography="county",
...     variables={"total_pop": "B01003_001", "median_income": "B19013_001"},
...     state="TX",
...     year=2022
... )
pytidycensus.decennial module
Decennial Census data retrieval functions.
- pytidycensus.decennial.get_decennial(geography, variables=None, table=None, cache_table=False, year=2020, sumfile=None, state=None, county=None, output='tidy', geometry=False, keep_geo_vars=False, shift_geo=False, summary_var=None, pop_group=None, pop_group_label=False, api_key=None, show_call=False, **kwargs)[source]
Obtain data from the US Decennial Census.
- Parameters:
geography (str) – The geography of your data (e.g., ‘county’, ‘tract’, ‘block group’).
variables (str, list of str, or dict, optional) – Variable ID(s) to retrieve. Can be a single variable, list of variables, or dictionary mapping custom names to variable IDs. If not provided, must specify table.
table (str, optional) – Census table ID to retrieve all variables from.
cache_table (bool, default False) – Whether to cache table names for faster future access.
year (int, default 2020) – Census year (2000, 2010, or 2020). Note: 1990 data is not available via the API.
sumfile (str, optional) – Summary file to use. Defaults to ‘pl’ for 2020, ‘sf1’ for earlier years. Available options vary by year.
state (str, int, or list, optional) – State(s) to retrieve data for. Accepts names, abbreviations, or FIPS codes.
county (str, int, or list, optional) – County(ies) to retrieve data for. Must be used with state.
output (str, default "tidy") – Output format (“tidy” or “wide”).
geometry (bool, default False) – Whether to include geometry for mapping.
keep_geo_vars (bool, default False) – Whether to keep all geographic variables from shapefiles.
shift_geo (bool, default False) – (Deprecated) If True, a warning is issued advising the use of an alternative geometry-shifting approach.
summary_var (str, optional) – Summary variable from the decennial Census to include for comparison.
pop_group (str, optional) – Population group code for which you’d like to request data (for selected sumfiles).
pop_group_label (bool, default False) – If True, return a pop_group_label column with the population group description.
api_key (str, optional) – Census API key. If not provided, looks for CENSUS_API_KEY environment variable.
show_call (bool, default False) – Whether to print the API call URL.
**kwargs – Additional parameters passed to geography functions.
- Returns:
Decennial Census data, optionally with geometry.
- Return type:
Examples
>>> import pytidycensus as tc
>>> tc.set_census_api_key("your_key_here")  # replace with your 40-character key
>>>
>>> # Get total population by state for 2020
>>> pop_2020 = tc.get_decennial(
...     geography="state",
...     variables="P1_001N",
...     year=2020
... )
>>>
>>> # Get race/ethnicity data with geometry
>>> race_data = tc.get_decennial(
...     geography="county",
...     variables=["P1_003N", "P1_004N", "P1_005N"],
...     state="CA",
...     year=2020,
...     geometry=True
... )
>>>
>>> # Get data with named variables and summary variable
>>> pop_data = tc.get_decennial(
...     geography="county",
...     variables={"total": "P1_001N", "white": "P1_003N"},
...     state="TX",
...     year=2020,
...     summary_var="P1_001N"
... )
pytidycensus.estimates module
Population estimates data retrieval functions.
- exception pytidycensus.estimates.PopulationEstimatesError[source]
Bases:
Exception
Base exception class for Population Estimates errors.
- exception pytidycensus.estimates.InvalidGeographyError[source]
Bases:
PopulationEstimatesError
Raised when an invalid geography is specified.
- exception pytidycensus.estimates.InvalidVariableError[source]
Bases:
PopulationEstimatesError
Raised when an invalid variable is specified.
- exception pytidycensus.estimates.DataNotAvailableError[source]
Bases:
PopulationEstimatesError
Raised when requested data is not available.
- exception pytidycensus.estimates.APIError[source]
Bases:
PopulationEstimatesError
Raised when there are issues with API requests.
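Because every exception above derives from PopulationEstimatesError, a single except clause on the base class catches all of them. The sketch below demonstrates this with local stand-in classes that mirror the documented hierarchy, so it runs without the library installed. The function fetch_estimates is hypothetical and exists only to raise an error.

```python
# Local stand-ins mirroring the documented hierarchy (not imported from pytidycensus).
class PopulationEstimatesError(Exception):
    """Base class for Population Estimates errors."""


class InvalidGeographyError(PopulationEstimatesError):
    """Raised when an invalid geography is specified."""


class DataNotAvailableError(PopulationEstimatesError):
    """Raised when requested data is not available."""


def fetch_estimates(geography):
    """Hypothetical function that rejects unknown geographies."""
    if geography not in {"us", "state", "county"}:
        raise InvalidGeographyError(f"Unsupported geography: {geography!r}")
    return []


try:
    fetch_estimates("neighborhood")
except PopulationEstimatesError as exc:
    # The base-class handler also receives the more specific subclass.
    caught = type(exc).__name__
    print(caught)
```

With the real library, the same pattern would import the classes from pytidycensus.estimates instead of defining them locally.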
- pytidycensus.estimates.get_estimates(geography, product=None, variables=None, breakdown=None, breakdown_labels=False, vintage=2024, year=None, state=None, county=None, time_series=False, output='tidy', geometry=False, keep_geo_vars=False, api_key=None, show_call=False, **kwargs)[source]
Obtain data from the US Census Bureau Population Estimates Program.
The Population Estimates Program (PEP) produces estimates of the population for the United States, its states, counties, cities, and towns. For years 2020 and later, data is retrieved from flat CSV files. For years 2019 and earlier, data comes from the Census API.
- Parameters:
geography (str) – The geography of your data. Options include:
  - ‘us’ (United States)
  - ‘region’ (Census regions)
  - ‘division’ (Census divisions)
  - ‘state’ (States and DC)
  - ‘county’ (Counties)
  - ‘cbsa’ (Core Based Statistical Areas)
  - ‘metropolitan statistical area/micropolitan statistical area’ (alias for cbsa)
  - ‘combined statistical area’ (Combined Statistical Areas)
  - ‘place’ (Incorporated places and Census designated places)
product (str, optional) – The data product. Options include:
  - ‘population’ (population totals)
  - ‘components’ (components of population change)
  - ‘characteristics’ (population by demographics)
  For years 2020+, only ‘characteristics’ requires this parameter.
variables (str or list of str, optional) – Variable ID(s) to retrieve. Use ‘all’ to get all available variables. Common variables include: ‘POP’, ‘BIRTHS’, ‘DEATHS’, ‘DOMESTICMIG’, ‘INTERNATIONALMIG’
breakdown (list of str, optional) – Population breakdown for characteristics product. Options include:
  - ‘AGEGROUP’ (age groups)
  - ‘SEX’ (sex)
  - ‘RACE’ (race)
  - ‘HISP’ (Hispanic origin)
  Can be combined, e.g., [‘SEX’, ‘RACE’]
breakdown_labels (bool, default False) – Whether to include human-readable labels for breakdown categories.
vintage (int, default 2024) – The PEP vintage (dataset version year). Recommended to use the most recent.
year (int, optional) – The specific data year. Defaults to vintage if not specified.
state (str, int, or list, optional) – State(s) to retrieve data for. Accepts names, abbreviations, or FIPS codes.
county (str, int, or list, optional) – County(ies) to retrieve data for. Must be used with state.
time_series (bool, default False) – Whether to retrieve time series data back to 2010.
output (str, default "tidy") – Output format (“tidy” or “wide”).
geometry (bool, default False) – Whether to include geometry for mapping.
keep_geo_vars (bool, default False) – Whether to keep all geographic variables from shapefiles.
api_key (str, optional) – Census API key for years 2019 and earlier.
show_call (bool, default False) – Whether to print the API call URL (for API-based requests).
**kwargs – Additional parameters passed to geography functions.
- Returns:
Population estimates data, optionally with geometry.
- Return type:
Examples
>>> import pytidycensus as tc
>>> tc.set_census_api_key("your_key_here")  # replace with your 40-character key
>>>
>>> # Get total population estimates by state
>>> state_pop = tc.get_estimates(
...     geography="state",
...     variables="POP",
...     year=2022
... )
>>>
>>> # Get population by age and sex for counties in Texas
>>> tx_pop_demo = tc.get_estimates(
...     geography="county",
...     variables="POP",
...     breakdown=["SEX", "AGEGROUP"],
...     state="TX",
...     breakdown_labels=True
... )
pytidycensus.geography module
Geographic boundary data retrieval and processing using TIGER shapefiles.
- class pytidycensus.geography.TigerDownloader(cache_dir=None)[source]
Bases:
object
Downloads and processes TIGER/Line shapefiles from the US Census Bureau.
- BASE_URL = 'https://www2.census.gov/geo/tiger'
- __init__(cache_dir=None)[source]
Initialize TIGER downloader.
- Parameters:
cache_dir (str, optional) – Directory for caching downloaded files
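As a rough illustration of what a downloader like this has to work out, the sketch below builds TIGER/Line download URLs under BASE_URL and a matching cache file path. The directory and file naming shown (TIGER<year>/COUNTY/tl_<year>_us_county.zip and TIGER<year>/TRACT/tl_<year>_<state>_tract.zip) follows the Census Bureau's layout for recent years. The helper names are hypothetical, and which files TigerDownloader actually requests is an assumption.

```python
from pathlib import Path

BASE_URL = "https://www2.census.gov/geo/tiger"


def county_url(year):
    """Hypothetical helper: national county shapefile for a given year."""
    return f"{BASE_URL}/TIGER{year}/COUNTY/tl_{year}_us_county.zip"


def tract_url(year, state_fips):
    """Hypothetical helper: tract shapefile for one state (2-digit FIPS code)."""
    return f"{BASE_URL}/TIGER{year}/TRACT/tl_{year}_{state_fips}_tract.zip"


def cache_path(cache_dir, url):
    """Hypothetical helper: store each download under its remote file name."""
    return Path(cache_dir) / url.rsplit("/", 1)[-1]


url = tract_url(2022, "48")
print(url)
print(cache_path("census_cache", url))
```

Keying the cache on the remote file name means a repeated request for the same year and state can reuse the local copy instead of downloading again.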
- pytidycensus.geography.get_geography(geography, year=2022, state=None, county=None, keep_geo_vars=False, cache_dir=None, **kwargs)[source]
Download and load geographic boundary data from TIGER/Line shapefiles.
- Parameters:
geography (str) – Geography type (e.g., ‘county’, ‘tract’, ‘block group’)
year (int, default 2022) – Census year for boundaries
state (str, int, or list, optional) – State(s) to filter data for
county (str, int, or list, optional) – County(ies) to filter data for (requires state)
keep_geo_vars (bool, default False) – Whether to keep all geographic variables
cache_dir (str, optional) – Directory for caching downloaded files
**kwargs – Additional filtering parameters
- Returns:
Geographic boundary data
- Return type:
Examples
>>> from pytidycensus.geography import get_geography
>>>
>>> # Get county boundaries for Texas
>>> tx_counties = get_geography("county", state="TX", year=2022)
>>>
>>> # Get tract boundaries for Harris County, TX
>>> harris_tracts = get_geography(
...     "tract",
...     state="TX",
...     county="201",
...     year=2022
... )
- pytidycensus.geography.get_state_boundaries(year=2022, **kwargs)[source]
Get US state boundaries.
- Return type:
- pytidycensus.geography.get_county_boundaries(state=None, year=2022, **kwargs)[source]
Get US county boundaries, optionally filtered by state.
- Return type:
- pytidycensus.geography.get_tract_boundaries(state, county=None, year=2022, **kwargs)[source]
Get census tract boundaries for a state, optionally filtered by county.
- Return type:
pytidycensus.utils module
Utility functions for data processing and validation.
- pytidycensus.utils.load_county_lookup()[source]
Load county lookup table from national_county.txt.
- Returns:
DataFrame with columns: state_abbrev, state_fips, county_fips, county_name
- Return type:
pd.DataFrame
- pytidycensus.utils.add_name_column(df)[source]
Add NAME column using national_county.txt lookup table for geographic areas.
Works for state, county, and tract level geographies by matching GEOID. For tract-level data, shows county and state name without tract number.
- Parameters:
df (pd.DataFrame) – DataFrame with GEOID column
- Returns:
DataFrame with NAME column added
- Return type:
pd.DataFrame
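Matching by GEOID works because Census GEOIDs are built hierarchically: 2 digits for the state, 3 more for the county, and 6 more for the tract. The sketch below splits a GEOID into those parts, which is the information a name lookup needs. The helper split_geoid is hypothetical and does not reproduce the library's lookup against national_county.txt.

```python
def split_geoid(geoid):
    """Hypothetical helper: break a state, county, or tract GEOID into its parts."""
    geoid = str(geoid)
    if len(geoid) not in (2, 5, 11) or not geoid.isdigit():
        raise ValueError(f"Expected a 2-, 5-, or 11-digit GEOID, got {geoid!r}")
    parts = {"state": geoid[:2]}
    if len(geoid) >= 5:
        parts["county"] = geoid[2:5]
    if len(geoid) == 11:
        parts["tract"] = geoid[5:11]
    return parts


print(split_geoid("48"))
print(split_geoid("48201"))
print(split_geoid("48201100000"))
```

For a tract GEOID, the state and county parts are enough to produce a "county, state" name, which matches the documented behavior of omitting the tract number.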
- pytidycensus.utils.validate_state(state)[source]
Validate and convert state identifiers to FIPS codes.
- Parameters:
state (str, int, or list) – State name(s), abbreviation(s), or FIPS code(s)
- Returns:
List of 2-digit FIPS codes
- Return type:
List[str]
- Raises:
ValueError – If state identifier is invalid
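The normalization described above (names, abbreviations, or FIPS codes in; 2-digit FIPS strings out) can be sketched with a small lookup table. Only three states are included here for brevity, and to_state_fips is a hypothetical helper rather than the library's implementation.

```python
# Abbreviation -> (name, FIPS code); a three-state excerpt for illustration.
STATE_TABLE = {
    "CA": ("California", "06"),
    "NY": ("New York", "36"),
    "TX": ("Texas", "48"),
}


def to_state_fips(state):
    """Hypothetical helper: normalize one or more state identifiers to FIPS codes."""
    items = state if isinstance(state, (list, tuple)) else [state]
    valid_fips = {fips for _, fips in STATE_TABLE.values()}
    by_name = {name.lower(): fips for name, fips in STATE_TABLE.values()}
    result = []
    for item in items:
        text = str(item).strip()
        if text.isdigit() and text.zfill(2) in valid_fips:
            # Integers such as 6 become zero-padded strings such as "06".
            result.append(text.zfill(2))
        elif text.upper() in STATE_TABLE:
            result.append(STATE_TABLE[text.upper()][1])
        elif text.lower() in by_name:
            result.append(by_name[text.lower()])
        else:
            raise ValueError(f"Invalid state identifier: {item!r}")
    return result


print(to_state_fips("TX"))
print(to_state_fips([6, "New York"]))
```

Returning a list even for a single input keeps the calling code uniform whether one state or several were requested.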
- pytidycensus.utils.validate_county(county, state_fips)[source]
Validate and convert county identifiers to FIPS codes.
- pytidycensus.utils.lookup_county_fips(county_name, state_fips)[source]
Look up county FIPS code by name.
- pytidycensus.utils.validate_year(year, dataset)[source]
Validate year for given dataset.
- Parameters:
- Returns:
Validated year
- Return type:
- Raises:
ValueError – If year is not available for dataset
- pytidycensus.utils.validate_geography(geography, dataset=None)[source]
Validate geography parameter.
- Parameters:
- Returns:
Validated geography
- Return type:
- Raises:
ValueError – If geography is not recognized
NotImplementedError – If geography is recognized but not implemented for the specified dataset
- pytidycensus.utils.build_geography_params(geography, state=None, county=None, **kwargs)[source]
Build geography parameters for Census API call.
- Parameters:
- Returns:
Geography parameters for API call
- Return type:
- Raises:
NotImplementedError – If geography is recognized but not yet implemented
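The result is a dictionary of the kind CensusAPI.get accepts, such as {'for': 'county:*', 'in': 'state:06'}. The sketch below shows one way such a dictionary could be assembled for state, county, and sub-county geographies. The helper geography_params is hypothetical, it expects FIPS codes that have already been validated, and the library's handling of other geographies is not shown.

```python
def geography_params(geography, state_fips=None, county_fips=None):
    """Hypothetical helper: build 'for'/'in' parameters from validated FIPS codes."""
    if county_fips and not state_fips:
        raise ValueError("county must be used with state")
    if geography == "state":
        return {"for": f"state:{state_fips or '*'}"}
    if geography == "county":
        params = {"for": f"county:{county_fips or '*'}"}
        if state_fips:
            params["in"] = f"state:{state_fips}"
        return params
    # Sub-county geographies (e.g. tract) nest inside state and, optionally, county.
    params = {"for": f"{geography}:*"}
    parents = [f"state:{state_fips}"] if state_fips else []
    if county_fips:
        parents.append(f"county:{county_fips}")
    if parents:
        # The Census API accepts space-separated parent geographies in "in".
        params["in"] = " ".join(parents)
    return params


print(geography_params("county", state_fips="06"))
print(geography_params("tract", state_fips="48", county_fips="201"))
```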
- pytidycensus.utils.process_census_data(data, variables, output='tidy')[source]
Process raw Census API response into pandas DataFrame.
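The difference between the 'tidy' and 'wide' outputs is one of shape: wide keeps one row per geography with a column per variable, while tidy has one row per geography and variable. The sketch below reshapes wide ACS-style records, where the E suffix marks an estimate and M a margin of error, into tidy rows. The helper to_tidy, the output column names (GEOID, variable, estimate, moe), and the sample values are all illustrative assumptions, not the library's guaranteed schema or real data.

```python
def to_tidy(records, variables):
    """Hypothetical helper: one output row per (geography, variable) pair."""
    tidy = []
    for record in records:
        for variable in variables:
            tidy.append(
                {
                    "GEOID": record["GEOID"],
                    "variable": variable,
                    # The API returns strings, so convert before doing arithmetic.
                    "estimate": float(record[f"{variable}E"]),
                    "moe": float(record[f"{variable}M"]),
                }
            )
    return tidy


# Placeholder wide records (values are made up).
wide_records = [
    {"GEOID": "00001", "B19013_001E": "50000", "B19013_001M": "1200",
     "B01003_001E": "2500", "B01003_001M": "150"},
    {"GEOID": "00002", "B19013_001E": "62000", "B19013_001M": "1800",
     "B01003_001E": "4100", "B01003_001M": "210"},
]

tidy_rows = to_tidy(wide_records, ["B19013_001", "B01003_001"])
print(len(tidy_rows))
print(tidy_rows[0])
```

Tidy output is convenient for grouping and plotting by variable, while wide output is easier when computing ratios between variables for the same geography.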
pytidycensus.variables module
Census variable loading and caching functionality.
- pytidycensus.variables.load_variables(year, dataset, survey=None, cache=True, cache_dir=None)[source]
Load Census variables for a given dataset and year.
- Parameters:
year (int) – Census year
dataset (str) – Dataset name (‘acs’, ‘dec’, ‘pep’, etc.)
survey (str, optional) – Survey type (e.g., ‘acs5’, ‘acs1’, ‘sf1’, ‘pl’)
cache (bool, default True) – Whether to cache variables for faster future access
cache_dir (str, optional) – Directory for caching. Defaults to user cache directory.
- Returns:
Variables with columns: name, label, concept, predicateType, group, limit
- Return type:
pd.DataFrame
Examples
>>> from pytidycensus.variables import load_variables
>>>
>>> # Load ACS 5-year variables for 2022
>>> acs_vars = load_variables(2022, "acs", "acs5")
>>>
>>> # Search for income-related variables
>>> income_vars = acs_vars[acs_vars['label'].str.contains('income', case=False)]
>>>
>>> # Load decennial census variables for 2020
>>> dec_vars = load_variables(2020, "dec", "pl")
- pytidycensus.variables.search_variables(pattern, year, dataset, survey=None, field='label')[source]
Search for variables by pattern in labels, concepts, or names.
- Parameters:
- Returns:
Matching variables
- Return type:
pd.DataFrame
Examples
>>> from pytidycensus.variables import search_variables
>>>
>>> # Search for income variables in ACS
>>> income_vars = search_variables("income", 2022, "acs", "acs5")
>>>
>>> # Search for population in concepts
>>> pop_vars = search_variables("population", 2020, "dec", "pl", field="concept")
- pytidycensus.variables.get_table_variables(table, year, dataset, survey=None)[source]
Get all variables for a specific table.
- Parameters:
- Returns:
Variables for the specified table
- Return type:
pd.DataFrame
Examples
>>> from pytidycensus.variables import get_table_variables
>>>
>>> # Get all variables for median household income table
>>> b19013_vars = get_table_variables("B19013", 2022, "acs", "acs5")
>>>
>>> # Get all variables for race table in 2020 Census
>>> p1_vars = get_table_variables("P1", 2020, "dec", "pl")
- pytidycensus.variables.list_available_datasets(year)[source]
List available datasets for a given year.
