It can be useful to know how many people live near a landmark / point-of-interest (POI). For example, a location is often considered “walkable” if you can walk to it in 10 minutes or less. Understanding how many people live near a POI is one way of estimating how many people are within walking distance of a POI, if they were to walk from their home to the POI.
In this post, I start with a point-of-interest, “Times Square, NYC”, and using the Census API I find out how many people live within the census tract that contains this POI (a tract is one of the smallest sub-divisions for which the Census provides population estimates).
If we wanted to go a bit further down this path of estimating “population within walking distance” we could take this approach and expand it to include all census tracts within a certain distance of our POI census tract. I used this approach one time to understand the potential “reach” of outdoor neighborhood advertising.
from census import Census
from us import states
import censusgeocode as cg
import folium
import geocoder
import pprint
import wget
import zipfile
import shapefile as shp
import pandas as pd
from shapely.geometry import Polygon
import numpy as np
= Census("MY_API_KEY") c
Geocoding the point-of-interest
First we’ll geocode our point-of-interest (POI) using geocoder
. This will give us the latitude and longitude coordinates, among other things.
= geocoder.osm('Times Square').json
poi
# Print keys
pprint.pprint(poi.keys())
print("\n" + poi['address'])
dict_keys(['accuracy', 'address', 'bbox', 'city', 'confidence', 'country', 'country_code', 'county', 'icon', 'importance', 'lat', 'lng', 'ok', 'osm_id', 'osm_type', 'place_id', 'place_rank', 'postal', 'quality', 'raw', 'region', 'state', 'status', 'street', 'suburb', 'type'])
Times Square, West 45th Street, Times Square, Theater District, Manhattan Community Board 5, Manhattan, New York County, New York, 10036, United States of America
Next, we’ll use censusgeocode
to get the geo IDs used by the census. As input, we’ll use the latitude and longitude from the previous geocoding.
= cg.coordinates(y = poi['lat'], x = poi['lng'])
poi_cg
# Print top-level keys
pprint.pprint(poi_cg.keys())
# Print Census Tract keys
'Census Tracts'][0].keys()) pprint.pprint(poi_cg[
dict_keys(['2010 Census Blocks', 'States', 'Counties', 'Census Tracts'])
dict_keys(['GEOID', 'CENTLAT', 'AREAWATER', 'STATE', 'BASENAME', 'OID', 'LSADC', 'FUNCSTAT', 'INTPTLAT', 'NAME', 'OBJECTID', 'TRACT', 'CENTLON', 'AREALAND', 'INTPTLON', 'MTFCC', 'COUNTY', 'CENT', 'INTPT'])
Mapping the tract
First we’ll find the shapefiles for NY state tracts using the us
library.
Getting shapefiles
= states.NY.shapefile_urls()
shpurls for region, url in shpurls.items():
print(region, url)
tract https://www2.census.gov/geo/tiger/TIGER2010/TRACT/2010/tl_2010_36_tract10.zip
cd https://www2.census.gov/geo/tiger/TIGER2010/CD/111/tl_2010_36_cd111.zip
county https://www2.census.gov/geo/tiger/TIGER2010/COUNTY/2010/tl_2010_36_county10.zip
state https://www2.census.gov/geo/tiger/TIGER2010/STATE/2010/tl_2010_36_state10.zip
zcta https://www2.census.gov/geo/tiger/TIGER2010/ZCTA5/2010/tl_2010_36_zcta510.zip
block https://www2.census.gov/geo/tiger/TIGER2010/TABBLOCK/2010/tl_2010_36_tabblock10.zip
blockgroup https://www2.census.gov/geo/tiger/TIGER2010/BG/2010/tl_2010_36_bg10.zip
Now we’ll download and unzip the shapefiles for the tract zip.
'https://www2.census.gov/geo/tiger/TIGER2010/TRACT/2010/tl_2010_36_tract10.zip')
wget.download(with zipfile.ZipFile('tl_2010_36_tract10.zip', 'r') as zip_ref:
zip_ref.extractall()
-1 / unknown
Converting shapefile to LNG/LAT coords
Next, we’ll read the shape files into a dataframe, giving us a “coords” column, using the shapefile
library. Note that these will be in LNG, LAT (not LAT, LNG).
def read_shapefile(sf):
"""
Read a shapefile into a Pandas dataframe with a 'coords'
column holding the geometry information. This uses the pyshp
package
"""
= [x[0] for x in sf.fields][1:]
fields = sf.records()
records = [s.points for s in sf.shapes()]
shps = pd.DataFrame(columns=fields, data=records)
df = df.assign(coords=shps)
df return df
= 'tl_2010_36_tract10.shp'
shp_path = shp.Reader(shp_path)
sf = read_shapefile(sf) df
This is the tract we’re interested in:
= df[df['GEOID10'] == poi_cg['Census Tracts'][0]['GEOID']].reset_index()
poi_tract 'coords'] poi_tract[
0 [(-73.985085, 40.758589), (-73.98225599999999,...
Name: coords, dtype: object
Map the POI marker and tract polygon
Finally, we’re ready to map the POI and the tract polygon in which it is located. We’ll use shapely
to get the polygon and folium
to do the mapping.
# Convert to Polygon
= Polygon(poi_tract['coords'][0])
poi_poly
# Initialize map with zoom and custom tileset, centered on POI
= folium.Map(location=[poi['lat'], poi['lng']],
m =16,
zoom_start='cartodbpositron')
tiles
# Add a map pin
'lat'], poi['lng']]).add_to(m)
folium.Marker([poi[
# Add the polygon
folium.GeoJson(poi_poly).add_to(m)
m
Population for census tract
Now, the last step is finding the population for this census tract.
The variable we are interested in is B01003_001E
, an estimate of the total population. A full list of the variable codes and definitions can be found at the ACS developer page (see the “2018 ACS Detailed Tables Variables”).
= c.acs5.state_county_tract(('NAME', 'B01003_001E'),
poi_pop 'Census Tracts'][0]['STATE'],
poi_cg['Census Tracts'][0]['COUNTY'],
poi_cg['Census Tracts'][0]['TRACT'])
poi_cg[
= str(round(poi_pop[0]['B01003_001E']))
poi_pop_str
poi_pop_str
'1027'
folium.Marker('lat'], poi['lng']],
[poi[=folium.DivIcon(
icon=(150, 36),
icon_size=(0, 0),
icon_anchor='<div style="font-size: 10pt;\
html background:white;\
padding:5px;\
border:1px solid red">\
The population in the census tract\
containing Times Square is ' + poi_pop_str + '\
</div>',
)
).add_to(m)
m