A point or icon on a map can be a handy guide, but it often doesn’t tell the whole story about a place. What does it look like? How big is it? Are there other, smaller points of interest contained within it? These kinds of questions are why polygon data is used to map the physical boundaries of places.
But what goes into creating polygon data beyond simply drawing the outline of a building? And what practical problems can polygon data be used to solve? Bryan Bonack from SafeGraph has the answers in this seminar, which is summarized in the sections below.
To begin, we’ll explain – from a geospatial perspective – what polygon data is.
Polygon data consists of sets of connected vertices that form closed shapes. In geospatial terms, these shapes are used to represent the boundaries of places as accurately as possible. As polygon data is two-dimensional, it can be used to measure the area and perimeter of the places it represents.
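To make the area and perimeter point concrete, here is a minimal sketch using the shoelace formula. It assumes the vertices are already in a planar (projected) coordinate system measured in meters; raw latitude and longitude pairs would need to be projected first. The function names and the sample footprint are illustrative, not part of any SafeGraph tooling.

```python
import math

def polygon_area(vertices):
    """Area of a simple polygon via the shoelace formula.

    `vertices` is a list of (x, y) tuples in planar units (assumed meters).
    The closing edge from the last vertex back to the first is implied.
    """
    total = 0.0
    n = len(vertices)
    for i in range(n):
        x1, y1 = vertices[i]
        x2, y2 = vertices[(i + 1) % n]
        total += x1 * y2 - x2 * y1
    return abs(total) / 2.0

def polygon_perimeter(vertices):
    """Sum of the edge lengths, including the closing edge."""
    n = len(vertices)
    return sum(math.dist(vertices[i], vertices[(i + 1) % n]) for i in range(n))

# Hypothetical 40 m x 25 m rectangular building footprint.
footprint = [(0, 0), (40, 0), (40, 25), (0, 25)]
area_m2 = polygon_area(footprint)
perimeter_m = polygon_perimeter(footprint)
```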
Capturing polygon data is not always as simple as it first seems, because places vary so much. Sometimes it involves outlining a building footprint, while other times it requires capturing a smaller place that is part of a larger building (like a store within a mall). Still other times, it requires creating complex sets of shapes for a place that corresponds to multiple buildings or other places, such as a theme park or a college campus.
There are even more considerations that go into creating polygon data accurate and informative enough to be used in professional settings. We’ll summarize a few of them here, along with some use cases for polygon data at businesses and other professional organizations.
Time in Video: 7:00
In simpler situations, polygon data can be extracted from aerial imagery using AI-powered object recognition, which offers the best economy of scale. However, places with more complex geometries may require polygons to be drawn by hand, which helps ensure accuracy and allows errors to be corrected.
Time in Video: 8:04
Sometimes, we are unable to match an accurate polygon from our sources to a specific place. In this case, we can generate a synthetic polygon based on a radius around the centroid (i.e. the latitudinal and longitudinal center) of a place. This can be adjusted to an approximate size based on the place’s category, and to a shape that avoids overlapping with other nearby objects. However, synthetic polygons are not totally accurate representations of the boundaries of places, so we avoid using them as much as possible.
Time in Video: 9:45
Polygons can sometimes overlap because a distinct place may still be considered part of a larger place, such as a baseball field within a park. In this case, we refer to the smaller place as a “child” polygon, and the larger place as a “parent” polygon. This relationship is what we call “spatial hierarchy”.
We only allow places of certain categories to be considered parents, as this helps to avoid confusion in parent-child polygon relationships. For example, in the case of a restaurant inside of a supermarket, the former should never be the parent and the latter should never be the child.
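The category rule can be sketched as a simple lookup. The set of parent-eligible categories below is hypothetical (the actual list is not given here), and `Place` and `resolve_parent` are illustrative names. The sketch assumes the two places are already known to overlap.

```python
from dataclasses import dataclass

# Hypothetical whitelist: the real set of parent-eligible categories
# is an assumption made for this example.
PARENT_ELIGIBLE = {"Shopping Mall", "Park", "Supermarket"}

@dataclass(frozen=True)
class Place:
    name: str
    category: str

def resolve_parent(a, b):
    """Return (parent, child) for two overlapping places, or None.

    Only a place in a parent-eligible category may be the parent. If both
    or neither are eligible, the categories alone cannot decide, so the
    function returns None and other evidence would be needed.
    """
    a_ok = a.category in PARENT_ELIGIBLE
    b_ok = b.category in PARENT_ELIGIBLE
    if a_ok and not b_ok:
        return (a, b)
    if b_ok and not a_ok:
        return (b, a)
    return None

restaurant = Place("Deli Counter Cafe", "Restaurant")
supermarket = Place("Corner Grocer", "Supermarket")
pair = resolve_parent(restaurant, supermarket)
```

Whichever order the two places are passed in, the supermarket comes back as the parent and the restaurant as the child.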
Time in Video: 12:03
Places can have different spatial hierarchy relationships than the ones listed in the point above. For instance, some POIs are completely enclosed indoors and can only be accessed by entering their parent buildings. We can use place category and brand relationships to determine when this is or is not the case. We also only attribute visits to the parent POI in these cases because GPS accuracy is poor indoors.
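As a rough illustration of this attribution rule, the sketch below uses a standard ray-casting point-in-polygon test and falls back to the parent whenever the child is flagged as enclosed. The dictionary layout, the `enclosed` flag, and the planar coordinates are assumptions made for the example, not the actual schema.

```python
def point_in_polygon(point, vertices):
    """Ray-casting test: True if `point` lies inside the polygon."""
    x, y = point
    inside = False
    n = len(vertices)
    for i in range(n):
        x1, y1 = vertices[i]
        x2, y2 = vertices[(i + 1) % n]
        # Does a horizontal ray from the point cross this edge?
        if (y1 > y) != (y2 > y):
            x_cross = x1 + (y - y1) * (x2 - x1) / (y2 - y1)
            if x < x_cross:
                inside = not inside
    return inside

def attribute_visit(point, child, parent):
    """Return the name of the POI that should receive the visit, or None.

    An enclosed child never receives the visit directly, because indoor
    GPS fixes are too imprecise; the visit goes to the parent instead.
    """
    if not child["enclosed"] and point_in_polygon(point, child["polygon"]):
        return child["name"]
    if point_in_polygon(point, parent["polygon"]):
        return parent["name"]
    return None

# Hypothetical mall with an indoor kiosk (coordinates in planar meters).
mall = {"name": "Mall", "polygon": [(0, 0), (100, 0), (100, 60), (0, 60)], "enclosed": False}
kiosk = {"name": "Kiosk", "polygon": [(10, 10), (20, 10), (20, 20), (10, 20)], "enclosed": True}
visit = attribute_visit((15, 15), kiosk, mall)
```

Here the GPS point falls inside the kiosk's polygon, but because the kiosk is enclosed, the visit is credited to the mall.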
Ideally, most polygons describe the exact shape and size of a single distinct POI (not including any parent or child polygons); we refer to these as “owned” polygons. However, exact-fitting polygons are difficult to draw or source for some locations (such as stores inside malls). So sometimes the best-fitting polygon for a place will actually include at least two different POIs whose polygons do not have parent-child relationships. We refer to these as “shared” polygons.
Time in Video: 17:07
Our polygon data is often used by advertisers, in combination with raw mobility data, to attribute visits to stores. This allows them to measure metrics like campaign effectiveness, customer journeys, and brand loyalty. Our polygon data is also used in mapping, particularly for data visualizations, as a polygon is a more accurate representation of a place than a simple point on a map.
Retail and real estate customers use our polygon data for site selection, factoring in things like spatial hierarchies and the proximity of other types of places. And commercial insurers are interested in the spatial hierarchy metadata of our polygon data. This is because co-tenancy and parent-child polygon relationships can affect how a place is assessed in terms of risk, as well as how much of that risk will be underwritten.
Synthetic polygons, in geospatial terms, are polygons that approximate the boundaries of places when precise information about them is unavailable. A synthetic polygon is created by drawing a radius around the centroid, the latitudinal and longitudinal center, of a place. Then its size is adjusted based on the place’s category (e.g. a shopping mall likely takes up more space than a coffee shop), and its shape is modified to avoid overlap with other objects (such as roads, impassable terrain, and other buildings).
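The first two steps (a radius around the centroid, sized by category) can be sketched as follows. The radii, the default, and the vertex count are hypothetical values chosen for illustration, the meters-to-degrees conversion is a common approximation, and the final step of reshaping the polygon to avoid roads and neighboring buildings is deliberately left out.

```python
import math

# Hypothetical radii in meters; real category-based sizes are not given here.
CATEGORY_RADIUS_M = {"Coffee Shop": 10.0, "Shopping Mall": 150.0}
DEFAULT_RADIUS_M = 25.0
METERS_PER_DEG_LAT = 111_320.0  # approximate length of one degree of latitude

def synthetic_polygon(lat, lon, category, n_vertices=16):
    """Approximate a circle around a centroid as a closed ring of (lat, lon).

    Does NOT adjust the shape to avoid overlapping other objects.
    """
    radius = CATEGORY_RADIUS_M.get(category, DEFAULT_RADIUS_M)
    dlat = radius / METERS_PER_DEG_LAT
    # A degree of longitude shrinks with the cosine of the latitude.
    dlon = radius / (METERS_PER_DEG_LAT * math.cos(math.radians(lat)))
    ring = []
    for i in range(n_vertices):
        theta = 2 * math.pi * i / n_vertices
        ring.append((lat + dlat * math.sin(theta), lon + dlon * math.cos(theta)))
    ring.append(ring[0])  # repeat the first vertex to close the ring
    return ring

cafe_ring = synthetic_polygon(40.0, -105.0, "Coffee Shop")
mall_ring = synthetic_polygon(40.0, -105.0, "Shopping Mall")
```

Both rings share the same centroid, but the mall's ring spans a much larger area than the coffee shop's, mirroring the category-based sizing described above.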
We avoid using synthetic polygons as much as possible, as they do not provide accurate data regarding a place’s boundaries. However, we are sometimes forced to use them because we cannot always match a precise polygon to each of the several million places in our Geometry dataset. Less than 4% of our polygon data consists of synthetic polygons, and we make clear in our metadata when we do use them.
We use the term “spatial hierarchy” to describe the relationships between points of interest that share (part of) the same physical space. Polygons with proper metadata are important to spatial hierarchy because they help to map the exact natures of these relationships. Not all spatial hierarchy relationships are the same, and this can be significant in some cases.
For example, you can sometimes find a baseball field in a public park, but not a public park inside of a baseball field. This is because one category of POI generally has to be larger than another to contain a POI of that other category. So it’s important to use polygon data attributed with POI categories (and sometimes brands) to make clear which “child” POIs are part of a larger “parent” POI, and not the other way around.
This gets even trickier when dealing with child POIs that are completely enclosed in the building of their parent POI. In these cases, the parent POI’s structure poses two problems. First, it hides the child POIs from aerial imagery, so you would have to determine their footprint polygons from inside the building (which is difficult and time-consuming). Second, it obstructs GPS signals. This makes it difficult to attribute a visit to a specific child POI within the parent building, rather than merely to the parent POI itself, even if each child POI is given an accurate polygon.
To account for these limitations in establishing correct spatial hierarchy relationships, we attribute polygons with special classes. We consider a polygon “owned” if it describes the precise shape and size of a particular POI. Sometimes, however, a POI is a parent to several child POIs that are neither parents nor children to each other, and that do not have precise polygons available. In this case, the child POIs are considered to have a “shared” polygon with their parent. This helps to clarify whether a particular POI is represented exactly by a polygon, or is merely somewhere within that polygon.
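One simplified way to picture the two classes: if each POI record points at a polygon identifier, a polygon referenced by exactly one POI is “owned”, and a polygon referenced by several POIs is “shared”. The identifiers and the mapping below are hypothetical, and this is only a sketch of the idea, not how the classes are actually assigned.

```python
from collections import Counter

def classify_polygons(poi_to_polygon):
    """Label each POI's polygon as 'owned' or 'shared'.

    `poi_to_polygon` maps a POI identifier to the identifier of the
    polygon that represents it (both hypothetical for this example).
    """
    counts = Counter(poi_to_polygon.values())
    return {
        poi: "owned" if counts[polygon_id] == 1 else "shared"
        for poi, polygon_id in poi_to_polygon.items()
    }

# A mall and two indoor stores fall back to the mall's polygon,
# while a standalone cafe has an exact polygon of its own.
assignments = {
    "mall": "poly-1",
    "shoe-store": "poly-1",
    "phone-store": "poly-1",
    "cafe": "poly-2",
}
classes = classify_polygons(assignments)
```

A consumer of the data could then treat a visit inside a “shared” polygon as evidence that the visitor was somewhere within that footprint, rather than at one specific POI.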
SafeGraph’s “Geometry” dataset consists of polygon data for points of interest all around the globe. The schema for this dataset is full of useful contextual information, such as which polygons are computer-generated and which are hand-drawn. It also includes metadata regarding spatial hierarchy, so you can see the relationships between polygons that overlap or are very close to each other.
For more information about property-related data and potential use cases for it, watch some of these other webinars we’ve run: