At work we have a copy of the zip-codes.com Business database, which I tend to reference somewhat frequently due to the nature of projects I work on.

Today I needed a list of postal codes from major metro areas, which would be used to drive a test data generator. The data generator has access to a simpler, non-commercial zip code list, and can do radius searches, but thats about it. I needed one code per metro area, and it worked out best to have the one closest to the center of the metro areas.

What I ended up with was a script to select the zip code closest to the rough center of the top N CBSA, looking only at populated zip codes.

Sure, there are other ways to do this, but given the data source that I had access to, this was the quickest.

WITH
  topPopulationAreas AS (
   SELECT TOP 50 CBSA_Name, avg(Latitude) AS lat, avg(Longitude) AS lon
   FROM zipcode
   WHERE PrimaryRecord = 'P' AND CBSA_Name != '' AND Population > 0
   GROUP BY CBSA_Name
   ORDER BY sum(Population) DESC
 ),
  zipDistance AS (
   SELECT DISTINCT ZipCode, zc.CBSA_Name, sqrt(square(lat - latitude) + square(lon - longitude)) AS dist
   FROM zipcode zc
   JOIN topPopulationAreas iq ON iq.CBSA_Name = zc.CBSA_Name
 ),
  shortest AS (
   SELECT CBSA_Name, min(dist) AS mindist
   FROM zipDistance
   GROUP BY CBSA_Name
 )
SELECT min(zd.ZipCode), zd.CBSA_Name, dist, Latitude, Longitude
FROM zipDistance zd
 INNER JOIN shortest ON zd.dist = shortest.mindist AND zd.CBSA_Name = shortest.CBSA_Name
 INNER JOIN zipcode z ON z.ZipCode = zd.ZipCode AND z.PrimaryRecord = 'P'
GROUP BY zd.CBSA_Name, dist, Latitude, Longitude

cbsaCenterPoints