Use SQL to Find Central ZIP Codes for Metro Areas

At work we have a copy of the zip-codes.com Business database, which I tend to reference somewhat frequently due to the nature of projects I work on.

Today I needed a list of postal codes from major metro areas which would be used to drive a test data generator. The data generator has access to a simpler, non-commercial zip code list, and can do radius searches, but that's about it. I needed one code per metro area, and it worked out best to have the one closest to the center of the metro areas.

What I ended up with was a script to select the ZIP code closest to the approximate center for each of the N most populous CBSAs.

Some notes:

  • PrimaryRecord ='P' : As it comes from the data provider, the data is not fully de-normalized. A ZIPcode could have multiple records, when it is part of a place that has multiple names. The 'P' record indicates the primary record for a given ZIP.
  • This query uses the faster Pythagorean distance, which works ok over small distances and lower latitudes. The great circle distance would be better.
WITH
  topPopulationAreas AS (
   SELECT TOP 50 CBSA_Name,
     avg(Latitude)  AS lat,
     avg(Longitude) AS lon
   FROM zipcode
   WHERE PrimaryRecord = 'P' AND CBSA_Name != '' AND Population > 0
   GROUP BY CBSA_Name
   ORDER BY sum(Population) DESC
 ),
  zipDistance AS (
   SELECT DISTINCT ZipCode,
     zc.CBSA_Name,
     sqrt(square(lat - latitude) + square(lon - longitude)) AS dist
   FROM zipcode zc
   JOIN topPopulationAreas iq ON iq.CBSA_Name = zc.CBSA_Name
 ),
  shortest AS (
   SELECT CBSA_Name,
     min(dist) AS mindist
   FROM zipDistance
   GROUP BY CBSA_Name
 )
SELECT min(zd.ZipCode), zd.CBSA_Name, dist, Latitude, Longitude
FROM zipDistance zd
 INNER JOIN shortest  ON zd.dist = shortest.mindist AND zd.CBSA_Name = shortest.CBSA_Name
 INNER JOIN zipcode z ON z.ZipCode = zd.ZipCode AND z.PrimaryRecord = 'P'
GROUP BY zd.CBSA_Name, dist, Latitude, Longitude
cbsaCenterPoints

Leave a Reply