pytidycensus.utils

Utility functions for data processing and validation.

Functions

`add_margin_of_error`(df, variables[, ...])	Add margin of error columns for ACS data with confidence level adjustment.
`add_name_column`(df)	Add NAME column using national_county.txt lookup table for geographic areas.
`build_geography_params`(geography[, state, ...])	Build geography parameters for Census API call.
`check_overlapping_acs_periods`(years, survey)	Check for and warn about overlapping ACS periods.
`get_credentials`()
`load_county_lookup`()	Load county lookup table from national_county.txt.
`lookup_county_fips`(county_name, state_fips)	Look up county FIPS code by name.
`process_census_data`(data, variables[, output])	Process raw Census API response into pandas DataFrame.
`validate_county`(county, state_fips)	Validate and convert county identifiers to FIPS codes.
`validate_geography`(geography[, dataset])	Validate geography parameter.
`validate_state`(state)	Validate and convert state identifiers to FIPS codes.
`validate_year`(year, dataset)	Validate year for given dataset.

pytidycensus.utils.get_credentials()[source]

pytidycensus.utils.load_county_lookup()[source]

Load county lookup table from national_county.txt.

Returns: DataFrame with columns: state_abbrev, state_fips, county_fips, county_name
Return type: pd.DataFrame

pytidycensus.utils.add_name_column(df)[source]

Add NAME column using national_county.txt lookup table for geographic areas.

Works for state-, county-, and tract-level geographies by matching on GEOID. For tract-level data, NAME holds the county and state name without the tract number.

Parameters: df (pd.DataFrame) – DataFrame with GEOID column
Returns: DataFrame with NAME column added
Return type: pd.DataFrame
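
Matching on GEOID works because standard Census GEOIDs are positional: the first 2 digits are the state FIPS code, the next 3 the county, and the next 6 the tract. The sketch below splits a GEOID by length. `split_geoid` is a hypothetical helper written for illustration and is not part of pytidycensus.

```python
from typing import Dict


def split_geoid(geoid: str) -> Dict[str, str]:
    """Hypothetical helper: split a GEOID into FIPS components by position."""
    parts = {"state": geoid[:2]}          # 2-digit state FIPS
    if len(geoid) >= 5:
        parts["county"] = geoid[2:5]      # 3-digit county FIPS
    if len(geoid) >= 11:
        parts["tract"] = geoid[5:11]      # 6-digit tract code
    return parts


state_level = split_geoid("48")
county_level = split_geoid("48201")
tract_level = split_geoid("48201100000")
```

A tract-level GEOID still carries its state and county prefix, which is why a county lookup table is enough to name the enclosing county and state.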

pytidycensus.utils.validate_state(state)[source]

Validate and convert state identifiers to FIPS codes.

Parameters: state (str, int, or list) – State name(s), abbreviation(s), or FIPS code(s)
Returns: List of 2-digit FIPS codes
Return type: List[str]
Raises: ValueError – If state identifier is invalid
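
To illustrate the kind of normalization described here, the following is a minimal sketch, not the library's implementation. The helper name `to_state_fips` and the two-state lookup table are assumptions made for the example; a real table would cover every state and territory.

```python
from typing import List, Union

# Illustrative subset only (assumption): names and abbreviations mapped to FIPS.
_STATE_FIPS = {"texas": "48", "tx": "48", "california": "06", "ca": "06"}
_VALID_FIPS = set(_STATE_FIPS.values())


def to_state_fips(state: Union[str, int, list]) -> List[str]:
    """Hypothetical helper: normalize names, abbreviations, or codes to FIPS."""
    items = state if isinstance(state, (list, tuple)) else [state]
    result = []
    for item in items:
        text = str(item).strip()
        if text.isdigit():
            code = text.zfill(2)                    # 6 or "6" becomes "06"
        else:
            code = _STATE_FIPS.get(text.lower())    # None if unknown
        if code not in _VALID_FIPS:
            raise ValueError(f"Invalid state identifier: {item!r}")
        result.append(code)
    return result


codes = to_state_fips(["California", "TX", 48])
```

Zero-padding matters because FIPS codes are strings: an integer `6` and the string `"06"` must resolve to the same state.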

pytidycensus.utils.validate_county(county, state_fips)[source]

Validate and convert county identifiers to FIPS codes.

Parameters:

county (str, int, or list) – County name(s) or FIPS code(s)
state_fips (str) – State FIPS code

Returns:

List of 3-digit county FIPS codes

Return type:

List[str]

Raises:

ValueError – If county identifier is invalid

pytidycensus.utils.lookup_county_fips(county_name, state_fips)[source]

Look up county FIPS code by name.

Parameters:

county_name (str) – County name to look up
state_fips (str) – State FIPS code

Returns:

County FIPS code if found, None otherwise

Return type:

Optional[str]
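
A name-based lookup with an `Optional[str]` result can be sketched as follows. This is an assumption-laden illustration: `find_county_fips`, the in-memory table, and the rule for stripping a trailing "County" are all hypothetical, and the library itself reads from national_county.txt.

```python
from typing import Optional

# Illustrative subset only (assumption): (state FIPS, lowercase name) -> county FIPS.
_COUNTIES = {
    ("48", "harris"): "201",
    ("48", "travis"): "453",
    ("06", "los angeles"): "037",
}


def find_county_fips(county_name: str, state_fips: str) -> Optional[str]:
    """Hypothetical helper: return the 3-digit county FIPS code, or None."""
    name = county_name.strip().lower()
    if name.endswith(" county"):
        name = name[: -len(" county")]    # "Harris County" -> "harris"
    return _COUNTIES.get((state_fips, name))


match = find_county_fips("Harris County", "48")
missing = find_county_fips("Harris", "06")
```

Returning `None` instead of raising lets a caller such as a validator decide whether a missing county is an error.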

pytidycensus.utils.validate_year(year, dataset)[source]

Validate year for given dataset.

Parameters:

year (int) – Census year
dataset (str) – Dataset type (‘acs’, ‘dec’, ‘estimates’)

Returns:

Validated year

Return type:

int

Raises:

ValueError – If year is not available for dataset

pytidycensus.utils.check_overlapping_acs_periods(years, survey)[source]

Check for and warn about overlapping ACS periods.

Overlapping ACS periods share survey years and should not be used for statistical comparisons or trend analysis. For example, the 2018 and 2019 ACS5 releases cover 2014–2018 and 2015–2019, so four of their five years are the same.

Parameters:

years (list of int) – Years being requested
survey (str) – Survey type (‘acs1’, ‘acs3’, or ‘acs5’)

Return type:

None

Warns:

UserWarning – If overlapping periods are detected
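
The overlap rule can be sketched under one assumption: a release labelled with end year Y and period length n covers years Y-n+1 through Y, so two releases overlap when their end years differ by less than n. `find_overlaps` is a hypothetical helper; unlike the documented function, which returns None, it also returns the overlapping pairs so the result can be inspected.

```python
import warnings
from itertools import combinations
from typing import List, Tuple

# Period length in years for each survey type.
_PERIOD_LENGTH = {"acs1": 1, "acs3": 3, "acs5": 5}


def find_overlaps(years: List[int], survey: str) -> List[Tuple[int, int]]:
    """Hypothetical helper: list pairs of end years whose periods overlap."""
    span = _PERIOD_LENGTH[survey]
    ordered = sorted(set(years))
    # Periods ending in a < b overlap when b - a < span.
    pairs = [(a, b) for a, b in combinations(ordered, 2) if b - a < span]
    if pairs:
        warnings.warn(f"Overlapping {survey} periods: {pairs}", UserWarning)
    return pairs


overlapping = find_overlaps([2014, 2018, 2019], "acs5")
```

With `acs1` the period length is one year, so distinct years never overlap and no warning is issued.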

pytidycensus.utils.validate_geography(geography, dataset=None)[source]

Validate geography parameter.

Parameters:

geography (str) – Geography level
dataset (str, optional) – Dataset type (“acs”, “decennial”, “estimates”) for context-aware validation

Returns:

Validated geography

Return type:

str

Raises:

ValueError – If geography is not recognized
NotImplementedError – If geography is recognized but not implemented for the specified dataset

pytidycensus.utils.build_geography_params(geography, state=None, county=None, **kwargs)[source]

Build geography parameters for Census API call.

Parameters:

geography (str) – Geography level
state (str, int, or list, optional) – State identifier(s)
county (str, int, or list, optional) – County identifier(s)
**kwargs – Additional geography parameters

Returns:

Geography parameters for API call

Return type:

Dict[str, str]

Raises:

NotImplementedError – If geography is recognized but not yet implemented
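
The Census API selects geographies with a `for` clause and an optional `in` clause (for example `for=county:*` with `in=state:48`). The sketch below shows how such a dictionary might be assembled for three levels. `sketch_geo_params` is hypothetical, takes FIPS codes that are already validated, and may not match the keys or formatting the library actually produces.

```python
from typing import Dict, Optional


def sketch_geo_params(
    geography: str,
    state: Optional[str] = None,
    county: Optional[str] = None,
) -> Dict[str, str]:
    """Hypothetical helper: build `for`/`in` clauses for a Census API request."""
    if geography == "state":
        return {"for": f"state:{state or '*'}"}
    if geography == "county":
        params = {"for": f"county:{county or '*'}"}
        if state:
            params["in"] = f"state:{state}"
        return params
    if geography == "tract":
        if not state:
            raise ValueError("tract requests need a state")
        in_clause = f"state:{state}"
        if county:
            in_clause += f" county:{county}"
        return {"for": "tract:*", "in": in_clause}
    # Mirrors the documented behaviour for levels that are not yet supported.
    raise NotImplementedError(f"geography {geography!r} is not sketched here")


params = sketch_geo_params("county", state="48")
```

Keeping the result a flat `Dict[str, str]` means it can be merged directly into the query parameters of an HTTP request.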

pytidycensus.utils.process_census_data(data, variables, output='tidy')[source]

Process raw Census API response into pandas DataFrame.

Parameters:

data (List[Dict[str, Any]]) – Raw Census API response
variables (List[str]) – Variable codes requested
output (str, default "tidy") – Output format (“tidy” or “wide”)

Returns:

Processed data

Return type:

pd.DataFrame
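
The difference between the two output formats can be shown with a dependency-free sketch. In wide form each variable is its own column; in tidy form each row holds one variable for one geography. `to_tidy`, the `variable` and `value` column names, and the sample figures are assumptions for illustration only; the library returns a pandas DataFrame and may name its columns differently.

```python
from typing import Any, Dict, List


def to_tidy(rows: List[Dict[str, Any]], variables: List[str]) -> List[Dict[str, Any]]:
    """Hypothetical helper: reshape wide records into one row per variable."""
    tidy = []
    for row in rows:
        # Everything that is not a requested variable identifies the geography.
        ids = {key: val for key, val in row.items() if key not in variables}
        for var in variables:
            if var in row:
                tidy.append({**ids, "variable": var, "value": row[var]})
    return tidy


# Made-up values, for illustration only.
wide = [{"GEOID": "48", "B01003_001E": 100, "B19013_001E": 5}]
long_rows = to_tidy(wide, ["B01003_001E", "B19013_001E"])
```

One wide record with two variables becomes two tidy records that share the same GEOID.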

pytidycensus.utils.add_margin_of_error(df, variables, moe_level=90, output='tidy')[source]

Add margin of error columns for ACS data with confidence level adjustment.

Parameters:

df (pd.DataFrame) – Census data
variables (List[str]) – Variable codes
moe_level (int, default 90) – Confidence level (90, 95, or 99)
output (str, default "tidy") – Output format (“tidy” or “wide”)

Returns:

Data with margin of error columns

Return type:

pd.DataFrame
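
ACS margins of error are published at the 90 percent confidence level. A common way to express one at another level is to scale it by the ratio of the corresponding z-scores (1.645, 1.960, and 2.576 for 90, 95, and 99 percent). The sketch below applies that ratio; `rescale_moe` is a hypothetical helper, and whether the library uses exactly these constants is an assumption.

```python
# z-scores for the supported confidence levels.
_Z_SCORES = {90: 1.645, 95: 1.960, 99: 2.576}


def rescale_moe(moe_90: float, moe_level: int = 90) -> float:
    """Hypothetical helper: convert a 90% margin of error to another level."""
    if moe_level not in _Z_SCORES:
        raise ValueError("moe_level must be 90, 95, or 99")
    return moe_90 * _Z_SCORES[moe_level] / _Z_SCORES[90]


moe_95 = rescale_moe(164.5, moe_level=95)
```

At the default level of 90 the factor is 1, so published values pass through unchanged.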