{ "cells": [ { "cell_type": "markdown", "metadata": {}, "source": [ "# Simple Time Series Analysis with Census Data\n", "\n", "This tutorial teaches you how to analyze changes in demographic data over time using the US Census Bureau's datasets. We'll focus on **geographic levels whose boundaries rarely change** (states and counties) to keep things simple and avoid complex boundary adjustments.\n", "\n", "## What You'll Learn\n", "\n", "1. **The Golden Rule**: Only compare like survey types (ACS5↔ACS5, Decennial↔Decennial)\n", "2. **Population trends** using decennial census data (2010 vs 2020)\n", "3. **Income trends** using ACS 5-year data (2012 vs 2020)\n", "4. **Best practices** for temporal analysis\n", "5. **Visualization techniques** for demographic change\n", "\n", "## Why This Matters\n", "\n", "Understanding demographic change over time helps with:\n", "- Urban planning and policy decisions\n", "- Business location and market analysis\n", "- Research on social and economic trends\n", "- Grant writing and community development" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## Setup: Import Libraries and API Key\n", "\n", "First, let's import the libraries we need and set up our Census API key." 
] }, { "cell_type": "code", "execution_count": 14, "metadata": {}, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "Libraries imported successfully!\n", "Using pytidycensus version: 1.0.4\n" ] } ], "source": [ "# Import required libraries\n", "import pytidycensus as tc\n", "import pandas as pd\n", "import matplotlib.pyplot as plt\n", "import numpy as np\n", "\n", "# Make plots look nicer\n", "plt.style.use('default')\n", "plt.rcParams['figure.figsize'] = (12, 8)\n", "plt.rcParams['font.size'] = 11\n", "\n", "print(\"Libraries imported successfully!\")\n", "print(f\"Using pytidycensus version: {tc.__version__}\")" ] }, { "cell_type": "code", "execution_count": 15, "metadata": {}, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ " Remember to set your Census API key above!\n", " Get one free at: https://api.census.gov/data/key_signup.html\n" ] } ], "source": [ "# Set your Census API key here\n", "# Get a free key at: https://api.census.gov/data/key_signup.html\n", "\n", "# UNCOMMENT and add your key:\n", "# tc.set_census_api_key(\"YOUR_API_KEY_HERE\")\n", "\n", "print(\" Remember to set your Census API key above!\")\n", "print(\" Get one free at: https://api.census.gov/data/key_signup.html\")" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## The Golden Rule of Census Time Series\n", "\n", "**CRITICAL**: Only compare surveys of the same type!\n", "\n", "### ✅ CORRECT Comparisons\n", "- **Decennial 2010 ↔ Decennial 2020**: Complete population counts\n", "- **ACS 5-year 2012 ↔ ACS 5-year 2020**: Same methodology, sample size\n", "- **ACS 1-year 2019 ↔ ACS 1-year 2021**: Recent estimates for large areas\n", "\n", "### ❌ WRONG Comparisons\n", "- **ACS 1-year ↔ ACS 5-year**: Different sample sizes and time periods\n", "- **Decennial ↔ ACS**: Different methodologies (complete count vs. 
sample)\n", "\n", "### Why This Matters" ] }, { "cell_type": "code", "execution_count": 16, "metadata": {}, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "SURVEY TYPE COMPARISON:\n", "==================================================\n", "DECENNIAL CENSUS:\n", " • Complete count of every person and housing unit\n", " • No sampling error (full count, not a sample)\n", " • Every 10 years (2010, 2020, 2030...)\n", " • Best for: Long-term trends, small areas\n", "\n", "ACS 5-YEAR:\n", " • Sample survey (~3.5M addresses/year)\n", " • 5 years of data combined for stability\n", " • Available down to the block group level\n", " • Best for: Small areas, stable trends\n", "\n", "ACS 1-YEAR:\n", " • Sample survey (~3.5M addresses/year)\n", " • Single year of data\n", " • Only areas with 65,000+ population\n", " • Best for: Large areas, recent trends\n" ] } ], "source": [ "# Let's demonstrate why survey type matters\n", "print(\"SURVEY TYPE COMPARISON:\")\n", "print(\"=\" * 50)\n", "print(\"DECENNIAL CENSUS:\")\n", "print(\" • Complete count of every person and housing unit\")\n", "print(\" • No sampling error (full count, not a sample)\")\n", "print(\" • Every 10 years (2010, 2020, 2030...)\")\n", "print(\" • Best for: Long-term trends, small areas\")\n", "print()\n", "print(\"ACS 5-YEAR:\")\n", "print(\" • Sample survey (~3.5M addresses/year)\")\n", "print(\" • 5 years of data combined for stability\")\n", "print(\" • Available down to the block group level\")\n", "print(\" • Best for: Small areas, stable trends\")\n", "print()\n", "print(\"ACS 1-YEAR:\")\n", "print(\" • Sample survey (~3.5M addresses/year)\")\n", "print(\" • Single year of data\")\n", "print(\" • Only areas with 65,000+ population\")\n", "print(\" • Best for: Large areas, recent trends\")" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## Part 1: Population Change Analysis (Decennial Census)\n", "\n", "Let's start by analyzing population change between 2010 and 2020 with decennial census 
data for the three jurisdictions at the core of the Washington DC metro area. Keep in mind that whole-state totals for Maryland and Virginia include many areas outside the metro. 
We'll compare **DC, Maryland, and Virginia** at the state level.\n", "\n", "### Why Use State Level?\n", "- State boundaries don't change over time\n", "- No need for complex boundary adjustments\n", "- Reliable and straightforward comparison" ] }, { "cell_type": "code", "execution_count": 17, "metadata": {}, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ " Fetching 2010 decennial census data...\n", " Variable: P001001 (Total Population)\n", "Getting data from the 2010 decennial Census\n", "Using Census Summary File 1\n", "Retrieved data for 3 states\n", "\n", "2010 Population Data:\n", " NAME total_pop\n", " DC 601723\n", "Maryland 5773552\n", "Virginia 8001024\n" ] } ], "source": [ "# Step 1: Get 2010 population data for DC metro states\n", "print(\" Fetching 2010 decennial census data...\")\n", "print(\" Variable: P001001 (Total Population)\")\n", "\n", "# Define the states we want to analyze\n", "metro_states = [\"DC\", \"MD\", \"VA\"]\n", "\n", "# Get 2010 data\n", "pop_2010 = tc.get_decennial(\n", " geography=\"state\",\n", " variables={\"total_pop\": \"P001001\"}, # P001001 = Total Population in 2010\n", " state=metro_states,\n", " year=2010,\n", " output=\"wide\" # Wide format puts variables as columns\n", ")\n", "\n", "print(f\"Retrieved data for {len(pop_2010)} states\")\n", "print(\"\\n2010 Population Data:\")\n", "print(pop_2010[['NAME', 'total_pop']].to_string(index=False))" ] }, { "cell_type": "code", "execution_count": 18, "metadata": {}, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ " Fetching 2020 decennial census data...\n", " Variable: P1_001N (Total Population)\n", " Note: Variable codes changed between 2010 and 2020!\n", "Getting data from the 2020 decennial Census\n", "Using the PL 94-171 Redistricting Data Summary File\n", "Retrieved data for 3 states\n", "\n", "2020 Population Data:\n", " NAME total_pop\n", " DC 689545\n", "Maryland 6177224\n", "Virginia 8631393\n" ] }, { "name": "stderr", 
"output_type": "stream", "text": [ "/home/mmann1123/Documents/github/pytidycensus/pytidycensus/decennial.py:429: UserWarning: Note: 2020 decennial Census data use differential privacy, a technique that introduces errors into data to preserve respondent confidentiality. Small counts should be interpreted with caution. See https://www.census.gov/library/fact-sheets/2021/protecting-the-confidentiality-of-the-2020-census-redistricting-data.html for additional guidance.\n", " warnings.warn(\n" ] } ], "source": [ "# Step 2: Get 2020 population data\n", "print(\" Fetching 2020 decennial census data...\")\n", "print(\" Variable: P1_001N (Total Population)\")\n", "print(\" Note: Variable codes changed between 2010 and 2020!\")\n", "\n", "# Get 2020 data - NOTE: Different variable code!\n", "pop_2020 = tc.get_decennial(\n", " geography=\"state\",\n", " variables={\"total_pop\": \"P1_001N\"}, # P1_001N = Total Population in 2020\n", " state=metro_states,\n", " year=2020,\n", " output=\"wide\"\n", ")\n", "\n", "print(f\"Retrieved data for {len(pop_2020)} states\")\n", "print(\"\\n2020 Population Data:\")\n", "print(pop_2020[['NAME', 'total_pop']].to_string(index=False))" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "### 🔍 Key Learning Point: Variable Codes Change!\n", "\n", "Notice that we used different variable codes:\n", "- **2010**: `P001001` \n", "- **2020**: `P1_001N`\n", "\n", "This is common when comparing across census years. Always check variable definitions!" 
] }, { "cell_type": "code", "execution_count": 19, "metadata": {}, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ " Calculating population changes...\n", "Population change analysis complete!\n", "\n", "Population Change Summary (2010-2020):\n", "============================================================\n", "DC:\n", " 2010: 601,723\n", " 2020: 689,545\n", " Change: +87,822 (+14.6%)\n", "\n", "Maryland:\n", " 2010: 5,773,552\n", " 2020: 6,177,224\n", " Change: +403,672 (+7.0%)\n", "\n", "Virginia:\n", " 2010: 8,001,024\n", " 2020: 8,631,393\n", " Change: +630,369 (+7.9%)\n", "\n" ] } ], "source": [ "# Step 3: Merge the data and calculate changes\n", "print(\" Calculating population changes...\")\n", "\n", "# Merge 2010 and 2020 data on state name\n", "pop_change = pd.merge(\n", " pop_2010[['NAME', 'total_pop']].rename(columns={'total_pop': 'pop_2010'}),\n", " pop_2020[['NAME', 'total_pop']].rename(columns={'total_pop': 'pop_2020'}),\n", " on='NAME'\n", ")\n", "\n", "# Calculate absolute and percentage changes\n", "pop_change['change_absolute'] = pop_change['pop_2020'] - pop_change['pop_2010']\n", "pop_change['change_percent'] = (pop_change['change_absolute'] / pop_change['pop_2010']) * 100\n", "\n", "print(\"Population change analysis complete!\")\n", "print(\"\\nPopulation Change Summary (2010-2020):\")\n", "print(\"=\" * 60)\n", "\n", "for _, row in pop_change.iterrows():\n", " print(f\"{row['NAME']}:\")\n", " print(f\" 2010: {row['pop_2010']:,}\")\n", " print(f\" 2020: {row['pop_2020']:,}\")\n", " print(f\" Change: {row['change_absolute']:+,} ({row['change_percent']:+.1f}%)\")\n", " print()" ] }, { "cell_type": "code", "execution_count": 20, "metadata": {}, "outputs": [ { "data": { "image/png": "", "text/plain": [ "
" ] }, "metadata": {}, "output_type": "display_data" }, { "name": "stdout", "output_type": "stream", "text": [ "DC METRO AREA SUMMARY:\n", " Total 2010 Population: 14,376,299\n", " Total 2020 Population: 15,498,162\n", " Net Change: +1,121,863 (+7.8%)\n" ] } ], "source": [ "# Step 4: Visualize the population changes\n", "fig, (ax1, ax2) = plt.subplots(1, 2, figsize=(16, 6))\n", "\n", "# Chart 1: Absolute change\n", "colors = ['red' if x < 0 else 'steelblue' for x in pop_change['change_absolute']]\n", "bars1 = ax1.bar(pop_change['NAME'], pop_change['change_absolute'], color=colors, alpha=0.7)\n", "ax1.set_title('Population Change 2010-2020\\n(Absolute Numbers)', fontsize=14, fontweight='bold')\n", "ax1.set_ylabel('Population Change')\n", "ax1.axhline(y=0, color='black', linestyle='-', alpha=0.3)\n", "ax1.yaxis.set_major_formatter(plt.FuncFormatter(lambda x, p: f'{x/1000:.0f}K'))\n", "\n", "# Add value labels on bars\n", "for bar, value in zip(bars1, pop_change['change_absolute']):\n", " height = bar.get_height()\n", " ax1.text(bar.get_x() + bar.get_width()/2., height + (5000 if height >= 0 else -15000),\n", " f'{value/1000:.0f}K', ha='center', va='bottom' if height >= 0 else 'top', fontweight='bold')\n", "\n", "# Chart 2: Percentage change\n", "colors2 = ['red' if x < 0 else 'darkgreen' for x in pop_change['change_percent']]\n", "bars2 = ax2.bar(pop_change['NAME'], pop_change['change_percent'], color=colors2, alpha=0.7)\n", "ax2.set_title('Population Change 2010-2020\\n(Percentage)', fontsize=14, fontweight='bold')\n", "ax2.set_ylabel('Percent Change (%)')\n", "ax2.axhline(y=0, color='black', linestyle='-', alpha=0.3)\n", "\n", "# Add value labels on bars\n", "for bar, value in zip(bars2, pop_change['change_percent']):\n", " height = bar.get_height()\n", " ax2.text(bar.get_x() + bar.get_width()/2., height + (0.3 if height >= 0 else -0.8),\n", " f'{value:.1f}%', ha='center', va='bottom' if height >= 0 else 'top', fontweight='bold')\n", "\n", "plt.tight_layout()\n", 
"plt.show()\n", "\n", "# Summary statistics\n", "total_2010 = pop_change['pop_2010'].sum()\n", "total_2020 = pop_change['pop_2020'].sum()\n", "total_change = total_2020 - total_2010\n", "total_pct = (total_change / total_2010) * 100\n", "\n", "print(f\"DC METRO AREA SUMMARY:\")\n", "print(f\" Total 2010 Population: {total_2010:,}\")\n", "print(f\" Total 2020 Population: {total_2020:,}\")\n", "print(f\" Net Change: {total_change:+,} ({total_pct:+.1f}%)\")" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## Part 2: Income Change Analysis (ACS 5-Year Data)\n", "\n", "Now let's analyze how median household income changed in the DC metro area using **ACS 5-year data**. We'll compare 2012 (2008-2012 ACS) with 2020 (2016-2020 ACS).\n", "\n", "### Why These Years?\n", "- **2012 ACS 5-year**: Represents 2008-2012 period (pre-recession recovery)\n", "- **2020 ACS 5-year**: Represents 2016-2020 period (recent data)\n", "- **8-year gap**: Provides meaningful temporal separation\n", "\n", "### County-Level Analysis\n", "We'll look at specific counties in the DC metro area that are economically important." 
] }, { "cell_type": "code", "execution_count": 21, "metadata": {}, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "Target Counties for Income Analysis:\n", " • Washington DC\n", " • Montgomery County, MD\n", " • Arlington County, VA\n", " • Fairfax County, VA\n", "\n", "Variable: B19013_001E (Median Household Income)\n", "Comparing: 2012 ACS 5-year vs 2020 ACS 5-year\n" ] } ], "source": [ "# Define the counties we want to analyze\n", "counties_to_analyze = [\n", " {\"state\": \"DC\", \"county\": None, \"display_name\": \"Washington DC\"},\n", " {\"state\": \"MD\", \"county\": \"Montgomery County\", \"display_name\": \"Montgomery County, MD\"},\n", " {\"state\": \"VA\", \"county\": \"Arlington County\", \"display_name\": \"Arlington County, VA\"},\n", " {\"state\": \"VA\", \"county\": \"Fairfax County\", \"display_name\": \"Fairfax County, VA\"}\n", "]\n", "\n", "print(\"Target Counties for Income Analysis:\")\n", "for county in counties_to_analyze:\n", " print(f\" • {county['display_name']}\")\n", "print()\n", "print(\"Variable: B19013_001E (Median Household Income)\")\n", "print(\"Comparing: 2012 ACS 5-year vs 2020 ACS 5-year\")" ] }, { "cell_type": "code", "execution_count": 22, "metadata": {}, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "Fetching 2012 ACS 5-year data (2008-2012)...\n", "Getting data from the 2008-2012 5-year ACS\n", " Washington DC: $64,267\n", "Getting data from the 2008-2012 5-year ACS\n", " Montgomery County, MD: $96,985\n", "Getting data from the 2008-2012 5-year ACS\n", " Arlington County, VA: $102,459\n", "Getting data from the 2008-2012 5-year ACS\n", " Fairfax County, VA: $109,383\n", "\n", " Successfully retrieved 2012 data for 4 counties\n" ] } ], "source": [ "# Step 1: Get 2012 ACS income data\n", "print(\"Fetching 2012 ACS 5-year data (2008-2012)...\")\n", "\n", "income_2012_data = []\n", "\n", "for county_info in counties_to_analyze:\n", " try:\n", " income_data = tc.get_acs(\n", " 
geography=\"county\",\n", " variables={\"median_income\": \"B19013_001E\"},\n", " state=county_info[\"state\"],\n", " county=county_info[\"county\"], # None for DC (state-equivalent)\n", " year=2012,\n", " survey=\"acs5\",\n", " output=\"wide\"\n", " )\n", " \n", " # Add display name for easier tracking\n", " income_data['display_name'] = county_info['display_name']\n", " income_2012_data.append(income_data)\n", " \n", " print(f\" {county_info['display_name']}: ${income_data.iloc[0]['median_income']:,}\")\n", " \n", " except Exception as e:\n", " print(f\" {county_info['display_name']}: Error - {str(e)[:50]}...\")\n", "\n", "# Combine all 2012 data\n", "if income_2012_data:\n", " income_2012_combined = pd.concat(income_2012_data, ignore_index=True)\n", " print(f\"\\n Successfully retrieved 2012 data for {len(income_2012_combined)} counties\")\n", "else:\n", " print(\"\\n No 2012 data retrieved\")" ] }, { "cell_type": "code", "execution_count": 23, "metadata": {}, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "Fetching 2020 ACS 5-year data (2016-2020)...\n", "Getting data from the 2016-2020 5-year ACS\n", " Washington DC: $90,842\n", "Getting data from the 2016-2020 5-year ACS\n", " Montgomery County, MD: $111,812\n", "Getting data from the 2016-2020 5-year ACS\n", " Arlington County, VA: $122,604\n", "Getting data from the 2016-2020 5-year ACS\n", " Fairfax County, VA: $127,866\n", "\n", " Successfully retrieved 2020 data for 4 counties\n" ] } ], "source": [ "# Step 2: Get 2020 ACS income data\n", "print(\"Fetching 2020 ACS 5-year data (2016-2020)...\")\n", "\n", "income_2020_data = []\n", "\n", "for county_info in counties_to_analyze:\n", " try:\n", " income_data = tc.get_acs(\n", " geography=\"county\",\n", " variables={\"median_income\": \"B19013_001E\"},\n", " state=county_info[\"state\"],\n", " county=county_info[\"county\"],\n", " year=2020,\n", " survey=\"acs5\",\n", " output=\"wide\"\n", " )\n", " \n", " income_data['display_name'] = 
county_info['display_name']\n", " income_2020_data.append(income_data)\n", " \n", " print(f\" {county_info['display_name']}: ${income_data.iloc[0]['median_income']:,}\")\n", " \n", " except Exception as e:\n", " print(f\" {county_info['display_name']}: Error - {str(e)[:50]}...\")\n", "\n", "# Combine all 2020 data\n", "if income_2020_data:\n", " income_2020_combined = pd.concat(income_2020_data, ignore_index=True)\n", " print(f\"\\n Successfully retrieved 2020 data for {len(income_2020_combined)} counties\")\n", "else:\n", " print(\"\\n No 2020 data retrieved\")" ] }, { "cell_type": "code", "execution_count": 24, "metadata": {}, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ " Calculating income changes...\n", " Income change analysis complete!\n", "\n", "Median Income Change Summary (2012-2020):\n", "======================================================================\n", " Washington DC:\n", " 2012: $64,267\n", " 2020: $90,842\n", " Change: $+26,575 (+41.4%)\n", "\n", " Montgomery County, MD:\n", " 2012: $96,985\n", " 2020: $111,812\n", " Change: $+14,827 (+15.3%)\n", "\n", " Arlington County, VA:\n", " 2012: $102,459\n", " 2020: $122,604\n", " Change: $+20,145 (+19.7%)\n", "\n", " Fairfax County, VA:\n", " 2012: $109,383\n", " 2020: $127,866\n", " Change: $+18,483 (+16.9%)\n", "\n" ] } ], "source": [ "# Step 3: Calculate income changes\n", "if 'income_2012_combined' in locals() and 'income_2020_combined' in locals():\n", " print(\" Calculating income changes...\")\n", " \n", " # Merge data on display name\n", " income_change = pd.merge(\n", " income_2012_combined[['display_name', 'median_income']].rename(columns={'median_income': 'income_2012'}),\n", " income_2020_combined[['display_name', 'median_income']].rename(columns={'median_income': 'income_2020'}),\n", " on='display_name'\n", " )\n", " \n", " # Calculate changes\n", " income_change['change_absolute'] = income_change['income_2020'] - income_change['income_2012']\n", " 
income_change['change_percent'] = (income_change['change_absolute'] / income_change['income_2012']) * 100\n", " \n", " print(\" Income change analysis complete!\")\n", " print(\"\\nMedian Income Change Summary (2012-2020):\")\n", " print(\"=\" * 70)\n", " \n", " for _, row in income_change.iterrows():\n", " print(f\" {row['display_name']}:\")\n", " print(f\" 2012: ${row['income_2012']:,}\")\n", " print(f\" 2020: ${row['income_2020']:,}\")\n", " print(f\" Change: ${row['change_absolute']:+,} ({row['change_percent']:+.1f}%)\")\n", " print()\n", "else:\n", " print(\"Cannot calculate changes - missing data\")\n", " income_change = pd.DataFrame() # Empty dataframe for later checks" ] }, { "cell_type": "code", "execution_count": 25, "metadata": {}, "outputs": [ { "data": { "image/png": "", "text/plain": [ "
" ] }, "metadata": {}, "output_type": "display_data" }, { "name": "stdout", "output_type": "stream", "text": [ "INCOME ANALYSIS SUMMARY:\n", " Average income change: 23.3%\n", " Counties with income growth: 4\n", " Counties with income decline: 0\n" ] } ], "source": [ "# Step 4: Visualize income changes\n", "if not income_change.empty:\n", " fig, (ax1, ax2) = plt.subplots(1, 2, figsize=(18, 7))\n", " \n", " # Chart 1: Income levels comparison\n", " x = np.arange(len(income_change))\n", " width = 0.35\n", " \n", " bars1 = ax1.bar(x - width/2, income_change['income_2012'], width, \n", " label='2012 (2008-2012 ACS)', color='lightcoral', alpha=0.8)\n", " bars2 = ax1.bar(x + width/2, income_change['income_2020'], width,\n", " label='2020 (2016-2020 ACS)', color='steelblue', alpha=0.8)\n", " \n", " ax1.set_title('Median Household Income Comparison\\n2012 vs 2020', fontsize=14, fontweight='bold')\n", " ax1.set_ylabel('Median Income ($)')\n", " ax1.set_xticks(x)\n", " ax1.set_xticklabels([name.replace(' County', '').replace(', VA', '').replace(', MD', '') \n", " for name in income_change['display_name']], rotation=45)\n", " ax1.legend()\n", " ax1.yaxis.set_major_formatter(plt.FuncFormatter(lambda x, p: f'${x/1000:.0f}K'))\n", " \n", " # Add value labels\n", " for bar in bars1:\n", " height = bar.get_height()\n", " ax1.text(bar.get_x() + bar.get_width()/2., height + 1000,\n", " f'${height/1000:.0f}K', ha='center', va='bottom', fontsize=9)\n", " \n", " for bar in bars2:\n", " height = bar.get_height()\n", " ax1.text(bar.get_x() + bar.get_width()/2., height + 1000,\n", " f'${height/1000:.0f}K', ha='center', va='bottom', fontsize=9)\n", " \n", " # Chart 2: Percentage change\n", " colors = ['red' if x < 0 else 'darkgreen' for x in income_change['change_percent']]\n", " bars3 = ax2.bar(income_change['display_name'].str.replace(' County', '').str.replace(', VA', '').str.replace(', MD', ''),\n", " income_change['change_percent'], color=colors, alpha=0.7)\n", " \n", " 
ax2.set_title('Income Change 2012-2020\\n(Percentage)', fontsize=14, fontweight='bold')\n", "    ax2.set_ylabel('Percent Change (%)')\n", "    ax2.axhline(y=0, color='black', linestyle='-', alpha=0.3)\n", "    ax2.tick_params(axis='x', rotation=45)\n", "\n", "    # Add value labels\n", "    for bar, value in zip(bars3, income_change['change_percent']):\n", "        height = bar.get_height()\n", "        ax2.text(bar.get_x() + bar.get_width()/2., height + (0.5 if height >= 0 else -1.5),\n", "                 f'{value:.1f}%', ha='center', va='bottom' if height >= 0 else 'top', fontweight='bold')\n", "\n", "    plt.tight_layout()\n", "    plt.show()\n", "\n", "    # Summary statistics\n", "    avg_change = income_change['change_percent'].mean()\n", "    print(\"INCOME ANALYSIS SUMMARY:\")\n", "    print(f\" Average income change: {avg_change:.1f}%\")\n", "    print(f\" Counties with income growth: {(income_change['change_percent'] > 0).sum()}\")\n", "    print(f\" Counties with income decline: {(income_change['change_percent'] < 0).sum()}\")\n", "\n", "else:\n", "    print(\" Cannot create visualization - no income data available\")\n", "    print(\" This might be due to API key issues or data availability\")" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## Part 3: Understanding Your Results\n", "\n", "### What Do These Numbers Mean?\n", "\n", "**Population Changes (Decennial Census)**\n", "- Show population growth or decline based on complete counts\n", "- In this example DC grew fastest (+14.6%), ahead of Virginia (+7.9%) and Maryland (+7.0%)\n", "- State totals blend urban, suburban, and rural areas, so county-level data may show different patterns\n", "\n", "**Income Changes (ACS 5-Year)**\n", "- Reflect economic conditions over time\n", "- These are **nominal** changes: each 5-year release reports income in dollars of its final year (2012 dollars vs. 
2020 dollars), so adjust for inflation before interpreting them as changes in purchasing power\n", "- In this example, DC had the lowest 2012 median yet the fastest percentage growth (+41.4%)\n", "\n", "### Important Considerations" ] }, { "cell_type": "code", "execution_count": 26, "metadata": {}, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ " IMPORTANT DATA QUALITY CONSIDERATIONS:\n", 
"==================================================\n", "\n", " DECENNIAL CENSUS:\n", " Advantages:\n", " • Complete population count (not a sample)\n", " • Very accurate for population totals\n", " • Available for all geographic levels\n", " Limitations:\n", " • Only every 10 years\n", " • Limited variables (basic demographics only)\n", " • 2020 data uses differential privacy (slight noise added)\n", "\n", " ACS 5-YEAR:\n", " Advantages:\n", " • Rich set of variables (income, education, housing, etc.)\n", " • Annual updates\n", " • Available for small geographies\n", " Limitations:\n", " • Sample-based (margins of error)\n", " • 5-year averages may mask recent changes\n", " • Smaller areas have larger margins of error\n", "\n", " BEST PRACTICES:\n", " • Always check margins of error for ACS data\n", " • Consider real vs. nominal changes (adjust for inflation)\n", " • Look for consistent patterns across multiple indicators\n", " • Document your methodology and assumptions\n" ] } ], "source": [ "# Let's discuss data quality and limitations\n", "print(\" IMPORTANT DATA QUALITY CONSIDERATIONS:\")\n", "print(\"=\" * 50)\n", "print()\n", "print(\" DECENNIAL CENSUS:\")\n", "print(\" Advantages:\")\n", "print(\" • Complete population count (not a sample)\")\n", "print(\" • Very accurate for population totals\")\n", "print(\" • Available for all geographic levels\")\n", "print(\" Limitations:\")\n", "print(\" • Only every 10 years\")\n", "print(\" • Limited variables (basic demographics only)\")\n", "print(\" • 2020 data uses differential privacy (slight noise added)\")\n", "print()\n", "print(\" ACS 5-YEAR:\")\n", "print(\" Advantages:\")\n", "print(\" • Rich set of variables (income, education, housing, etc.)\")\n", "print(\" • Annual updates\")\n", "print(\" • Available for small geographies\")\n", "print(\" Limitations:\")\n", "print(\" • Sample-based (margins of error)\")\n", "print(\" • 5-year averages may mask recent changes\")\n", "print(\" • Smaller areas have 
larger margins of error\")\n", "print()\n", "print(\" BEST PRACTICES:\")\n", "print(\" • Always check margins of error for ACS data\")\n", "print(\" • Consider real vs. nominal changes (adjust for inflation)\")\n", "print(\" • Look for consistent patterns across multiple indicators\")\n", "print(\" • Document your methodology and assumptions\")" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## Part 4: Advanced Topics and Next Steps\n", "\n", "### When Geographic Boundaries Change\n", "\n", "For this tutorial, we used **states and counties** because their boundaries are largely stable over time. But what if you need to analyze **census tracts** or other small geographies whose boundaries can be revised with each decennial census?\n", "\n", "**Solution: Area Interpolation**\n", "- Use the `tobler` library's `area_interpolate()` function\n", "- Redistributes values from old boundaries to new ones in proportion to overlapping area\n", "- Accounts for how areas were split or merged\n", "\n", "See our advanced tutorial: `time_series_analysis.md` for tract-level analysis with boundary changes." ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## Summary: Key Takeaways\n", "\n", "### What You've Learned\n", "\n", "1. **The Golden Rule**: Only compare like survey types\n", "   - Decennial ↔ Decennial for population counts\n", "   - ACS 5-year ↔ ACS 5-year for detailed demographics\n", "\n", "2. **Geographic Strategy**: Use stable boundaries when possible\n", "   - State boundaries are fixed and county boundaries change only rarely\n", "   - Avoids complex boundary adjustments\n", "\n", "3. **Variable Consistency**: Check codes across years\n", "   - 2010: `P001001` vs 2020: `P1_001N` for population\n", "   - ACS variables are generally more consistent\n", "\n", "4. **Data Quality**: Understand limitations\n", "   - Decennial = complete count, ACS = sample\n", "   - Check margins of error for ACS data\n", "   - Consider real vs. 
nominal changes\n", "\n", "### Practical Applications\n", "\n", "- **Urban Planning**: Population growth patterns\n", "- **Economic Development**: Income trend analysis\n", "- **Policy Research**: Demographic change impacts\n", "- **Business Analysis**: Market area dynamics\n", "\n", "### Your Assignment\n", "\n", "Try this analysis with your own area of interest:\n", "1. Pick 3-4 states or counties\n", "2. Run the population change analysis\n", "3. Add income or another ACS variable\n", "4. Create your own visualizations\n", "5. Write a brief interpretation of the results\n", "\n", "**Happy analyzing!**" ] } ], "metadata": { "kernelspec": { "display_name": "census", "language": "python", "name": "python3" }, "language_info": { "codemirror_mode": { "name": "ipython", "version": 3 }, "file_extension": ".py", "mimetype": "text/x-python", "name": "python", "nbconvert_exporter": "python", "pygments_lexer": "ipython3", "version": "3.11.13" } }, "nbformat": 4, "nbformat_minor": 4 }