Michael Taylor from SafeGraph outlines key considerations for maintaining a database on points of interest, along with why getting it right matters.
Michael Taylor is a product manager at SafeGraph responsible for ensuring SafeGraph products map cleanly to the needs of all markets, but especially new markets. Previously, he spent almost 10 years in early-stage technology companies building out products in the financial space (after having left venture capital and wealth management). Michael graduated from Santa Clara University with a degree in finance and a minor in mathematics. He is a native and current resident of the San Francisco Bay Area.
Cataloging data on points of interest is a lot of work. For starters, you have to decide what does or doesn’t count as a “point of interest”. Then you have to decide which geographical areas you’ll cover, and that presents challenges with parsing brand relationships and differing information formats. Then there’s the question of how to verify the data, and how often to do this so the data reflects current circumstances. Put it all together, and maintaining a database of information about physical locations becomes a monumental task.
SafeGraph product manager Michael Taylor takes a deep dive into the obstacles and other considerations data companies face when building location datasets. But he also emphasizes why it’s important to get this kind of data right by looking at some real-world scenarios that rely on having correct and up-to-date data on points of interest. Here’s an overview of what he covers:
By way of introduction, we’ll explain what we mean by data on places (or locations or points of interest), and why people would want to use it in the first place.
POI data (point of interest data) is information describing any location on Earth, other than a private residence, that someone may want to visit. POIs include places like restaurants, retail stores, shopping malls, libraries, hospitals, warehouses, bus stops, and parks. The information we register about them includes attributes such as the place’s name, address, geographical coordinates, brand affiliations, industry categories, open/closed status, phone numbers, and much more.
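To make the shape of a POI record concrete, here is a minimal sketch of what one entry might look like. The field names are illustrative only, not SafeGraph's actual schema:

```python
from dataclasses import dataclass
from typing import Optional

# Hypothetical POI record; fields mirror the attributes described above
# (name, address, coordinates, brand, category, open/closed status, phone).
@dataclass
class PointOfInterest:
    name: str
    address: str
    latitude: float
    longitude: float
    brand: Optional[str] = None      # e.g. a chain name, or None for independents
    category: Optional[str] = None   # industry category, e.g. "Coffee Shops"
    is_open: bool = True             # open/closed status
    phone: Optional[str] = None

poi = PointOfInterest(
    name="Example Cafe",
    address="123 Main St, Springfield",
    latitude=37.77,
    longitude=-122.42,
    category="Coffee Shops",
)
print(poi.name, poi.is_open)
```

Real datasets carry many more attributes than this, but the idea is the same: each row is anchored to one physical place.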
POI data is important because it’s the foundation for all other types of geolocation data, as it’s anchored to specific physical places. Other types of geospatial data, such as foot traffic measurements, can provide context valuable to various use cases in retail, real estate, government, insurance, and more. But if the places they reference (or the facts about those places) are incorrect, it throws the whole analysis off and can lead to very misinformed decisions.
So what does it take to create and sustain a correct, up-to-date, and comprehensive database on points of interest around the world? Here are five key things geospatial data companies should be thinking about as they construct and review their datasets.
Time in Video: 5:43
The terms “accuracy” and “precision” are often used in describing the quality of data, but their meanings aren’t interchangeable. Accuracy refers to how well a piece of information reflects reality or some other source of truth. Precision refers to how closely a set of data points cluster around one another, independent of how close any of them are to the truth.
In geospatial terms, “accuracy” would refer to cataloging the correct information about a place. “Precision” would refer more to how well correct information is cataloged across a number of places, including monitoring information about each specific place over time to ensure what is cataloged remains correct.
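A small numeric illustration of the distinction, using invented coordinates for a single storefront: one batch of recorded points is tightly clustered but offset from the true location (precise, not accurate), while the other is centered on the truth but scattered (accurate, not precise).

```python
import math
from statistics import mean

# Illustrative ground truth for one storefront (made-up coordinates).
true_point = (37.7750, -122.4180)

def dist(a, b):
    return math.hypot(a[0] - b[0], a[1] - b[1])

# Batch A: tightly clustered (precise) but offset from truth (inaccurate).
batch_a = [(37.7790, -122.4140), (37.7791, -122.4141), (37.7789, -122.4139)]
# Batch B: centered on truth (accurate) but scattered (imprecise).
batch_b = [(37.7740, -122.4190), (37.7760, -122.4170), (37.7750, -122.4180)]

def accuracy_error(points):
    # Average distance from the ground-truth location (lower = more accurate).
    return mean(dist(p, true_point) for p in points)

def spread(points):
    # Average distance from the batch centroid (lower = more precise).
    cx = mean(p[0] for p in points)
    cy = mean(p[1] for p in points)
    return mean(dist(p, (cx, cy)) for p in points)

print(accuracy_error(batch_a) > accuracy_error(batch_b))  # True: A is less accurate
print(spread(batch_a) < spread(batch_b))                  # True: A is more precise
```

Good places data needs both: each record close to reality, and that closeness held consistently across the whole dataset over time.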
Time in Video: 6:43
Unfortunately, maintaining precision for point of interest data is difficult. This is not only because there are so many POIs in the world, but also because information about many of them changes frequently and with little warning. They could get new contact information, alter their operating hours, rebrand themselves, merge with other brands, or even close at some locations to open at different ones.
There are challenges for accuracy as well. For example, it can be difficult to tell whether sets of similar information that are formatted differently refer to the same place or to different places; addresses are a notorious example. Also, some organizations – depending on their functions – may interpret location-related data to represent things that don’t fit our classification of a “place”, such as streets, post office boxes, or businesses people operate out of their homes. In addition, there’s the issue of whether a place is open or closed; that is, whether there remains something at a POI that would actually entice people to visit it.
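The address-matching problem above can be sketched with a toy normalizer: two strings that look different to a naive comparison describe the same place once common abbreviations and punctuation are standardized. This is illustrative only; production systems use far richer rules (or dedicated libraries such as libpostal).

```python
import re

# Toy abbreviation table; real address normalization handles hundreds of
# variants, ordinals, directionals, and locale-specific formats.
ABBREVIATIONS = {"street": "st", "avenue": "ave", "boulevard": "blvd",
                 "suite": "ste", "north": "n", "south": "s"}

def normalize(address: str) -> str:
    # Lowercase, strip punctuation, and collapse known abbreviations.
    tokens = re.findall(r"[a-z0-9]+", address.lower())
    return " ".join(ABBREVIATIONS.get(t, t) for t in tokens)

a = "123 N. Main Street, Suite 4"
b = "123 North Main St Ste 4"
print(a == b)                        # False: naive string comparison fails
print(normalize(a) == normalize(b))  # True: same place, formatted differently
```

Even with good normalization, deciding whether two near-identical records are one place or two remains one of the harder judgment calls in maintaining places data.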
Time in Video: 13:00
Keeping POI location data precise takes a lot of time and effort, given the number of associated challenges. This is especially true when new types and attributes of locations are added. So some datasets are updated rather infrequently, such as quarterly or annually, to conserve resources and minimize the chance of errors.
Unfortunately, this risks failing to reflect important changes to information about specific places in a timely manner. An important one is whether a “point of interest” still counts as such, or has ceased operations and thus lost its “interest” factor. So those who manage location data have to balance the amount of work needed to keep it current (“fresh”) against the danger of the data losing its relevance (becoming “stale”) if it isn’t reviewed frequently enough. This is why SafeGraph chooses to update its Places dataset monthly.
Time in Video: 16:07
There isn’t a single authoritative source of truth for information about places. So those who manage location datasets have to measure their quality by cross-checking them against other sources. For example, examining data on transactions or foot traffic can make it easier to tell if a POI is still open or has closed, based on either a presence or lack of economic activity and footfall. Or satellite imagery showing a place has changed shape may indicate the place has closed or become an entirely new place.
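One way to operationalize this kind of cross-check is a simple staleness heuristic: flag a POI as possibly closed if an activity feed (foot traffic or transactions) shows no visits for several consecutive recent months. The data and threshold below are invented for illustration; a real pipeline would weigh many signals together.

```python
# Hypothetical rule: a POI with zero visits for `quiet_months` straight
# recent months is a candidate for a "closed" review.
def possibly_closed(monthly_visits, quiet_months=3):
    recent = monthly_visits[-quiet_months:]
    return len(recent) == quiet_months and all(v == 0 for v in recent)

active_store  = [120, 98, 130, 110, 105, 99]
vacated_store = [120, 98,  45,   0,   0,   0]

print(possibly_closed(active_store))   # False
print(possibly_closed(vacated_store))  # True: flag for verification
```

A flag like this wouldn't mark the place closed on its own; it would route the record to further checks, such as satellite imagery or an on-the-ground review.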
Some of this validation can be done through machine learning, as machines can often catch patterns humans may miss. Other times, there is no substitute for a “boots on the ground” review to verify existing data about places, or to spot new information about a place a machine hasn’t been taught to look for yet. Often, a balance of both yields the best quality.
Time in Video: 19:07
An extra hurdle in maintaining a quality locations database is the fact that conventions for representing places data can vary in different areas of the world. A basic example is how specific countries or regions use different languages or alphabets to communicate. Another is how different nations may format addresses or organize their political jurisdictions differently (e.g. states and census block groups in the US vs. provinces/territories and dissemination areas in Canada).
Furthermore, it’s not uncommon for POIs of a certain brand to operate only within a certain country, or even a specific region within a country. Some brands may even operate under a particular name in one country or region, but under another name somewhere else. An example is how Burger King operates under the brand name Hungry Jack’s in Australia.
There are several challenges to ensuring the validity of entries in a database about physical locations. One is simply that there is no single source of truth for places data in the world. So the best we can do is cross-reference our entries with related data from multiple sources to see how closely they line up. However, this process can have its own obstacles if formatting, brand identities, or even the definition of what a “place” is differs across databases. This can cause confusion as to whether similar data refers to the same place or to different places, or even whether the data refers to what could be considered a “place” at all.
A key question in validating a location’s entry is whether that location is still in operation or has closed permanently. This requires performing validation on a consistent enough basis, which can be challenging. On one hand, we can’t validate massive numbers of POI data entries as fast as POIs themselves open or close. But if we validate POI data too infrequently, we risk it failing to capture newly opened places, or to reflect that specific places have changed or no longer exist. This can cause those who use the data to make critical mistakes in decision-making based on incomplete or incorrect information.
We can use a few techniques to validate entries for location data. One is Placekey, a universal standard for location ID we co-developed with a number of other geospatial data companies. This helps to ensure each place in the database is only represented once. Another is cross-referencing other types of datasets, such as foot traffic, transactions, or satellite imagery. These can give clues regarding whether certain places have opened, changed, or closed.
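The deduplication benefit of a shared identifier can be sketched as follows. The Placekey strings here are made up for illustration (real Placekeys are issued by the Placekey service after address and coordinate matching), but the principle holds: records from different sources that resolve to the same key collapse to one place.

```python
from collections import defaultdict

# Records from two hypothetical sources; the identifier values are invented.
records = [
    {"name": "Joe's Coffee",     "placekey": "abc-123@xyz"},
    {"name": "Joes Coffee Shop", "placekey": "abc-123@xyz"},  # same place, other source
    {"name": "Corner Books",     "placekey": "def-456@xyz"},
]

# Group records by identifier; groups larger than one are duplicate entries
# describing a single physical place.
by_key = defaultdict(list)
for r in records:
    by_key[r["placekey"]].append(r)

duplicates = {k: v for k, v in by_key.items() if len(v) > 1}
print(len(duplicates))  # 1: the two coffee-shop rows describe one place
```

Without a shared key, the same match would require fuzzy comparison of names and addresses, which is exactly the error-prone process a universal identifier is meant to replace.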
Sometimes, machine learning algorithms can be taught to process places data and look for patterns in it that point to whether a location entry is valid or not. In massive datasets, this can greatly speed up the validation process while reducing the risk of human error. In other cases, it’s better to actually send someone to a location to gather information about it. Looking for details manually that a machine may not be able to identify can both expand the available data on a location and verify whether existing data is still correct.
It’s crucial for both commercial and non-commercial organizations to have quality places data to work with, as they base a lot of pivotal decisions on this information. So not having correct data can lead to mistakes that end up being incredibly costly, both for the organization itself and sometimes even for its clientele. Here are a few sectors that rely on getting location data right.
A retail chain seeking to open a new location in a particular area needs to know some important information about that area first. In general, it should look at how successful businesses – especially similar ones – have typically been in the area. Then it should examine more specifically which businesses are there now, where they are, and whether they compete with or complement the chain’s offering. Quality places data is essential for determining these factors, as a poorly-placed store can grossly underperform.
Companies that make map applications and other geographic information software need the location data they base their programs on to be accurate and precise. This goes beyond simply knowing a location’s name, address, and geographic coordinates. It also includes things like what type of place it is (i.e. why it’s “of interest”), how to contact the place’s managers, and when the place is open to the public. As an extension of that last one, a map app should be able to indicate when new places open, or when old ones close down and cease to exist.
Governments need to balance the presence of different POI types within cities. They need to make sure essential services such as transportation, food, medical care, and green space are accessible to as many people as possible. They also need to monitor how the opening or closing of certain POIs may affect the surrounding area, in terms of traffic or pollution for example. To these ends, quality places data is essential for their work.
Out-of-home marketers aim to place advertisements in locations where they can direct significant numbers of people to shop at nearby client stores. To do this, they need to understand what types of POIs are near high concentrations of foot traffic, and which ones are currently getting the most visits. They also need to be able to measure the success of advertisements by how well they shift foot traffic and visits towards client stores. They are focused on very short-term results, so they need the places data they rely on to be as up-to-date as possible.
Those investing in commercial equity have many similar factors to consider as those selecting sites for retail chains. They need to examine the business turnover rates in a target area, especially for the types of companies they typically invest in. Part of this includes looking at what types of businesses are opening, operating, and closing near their current (or potential) tenants, as well as whether they compete with or complement those tenants. So having quality places data that accurately reflects an area’s commercial landscape is of utmost importance.
Quality location data is also of interest to the insurance industry. Insurers have to know details about a place that may contribute to its risk factor, such as its business category, operating hours, or square footage. They want to see how well these details match up with the ones the policyholder provides (i.e. how honest they’re being about their own property’s risk). They also want to compare this data against industry standards to get a general sense of whether the place’s risk is higher or lower than average. Also important for them to consider are the potential risk factors of nearby POIs, as these may affect the risk factor of the place being underwritten.
Consumer packaged goods companies need quality location data to determine how to effectively get their products to market. They need to conduct trade area analysis to find regions with high concentrations of the types of stores that will sell their products. But they should also look at the rates at which target store types are opening or closing in an area, to get an idea of where distribution opportunities may be about to flourish or dry up. Based on this, they can determine which areas and brands to focus on first, as well as which distribution support businesses (manufacturing, warehousing, logistics, etc.) in those areas should be approached for strategic partnerships.
“Places” is SafeGraph’s database on points of interest, covering millions of places worldwide (including parking lots in the US). Each entry contains more than 30 attributes of information, including a unique and persistent identifier called a Placekey. If the place is encompassed by a larger place, such as a mall or an airport, we include that location’s Placekey as an attribute as well (see the section on spatial hierarchy in our Geometry dataset documentation).
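The parent-identifier idea behind spatial hierarchy can be sketched as a simple lookup: a store's record points at the enclosing mall's record. The keys and names here are invented for illustration, not actual Placekeys.

```python
# Hypothetical places table keyed by identifier; "parent" holds the
# identifier of the enclosing place, or None for top-level places.
places = {
    "pk-mall-001":  {"name": "Riverside Mall", "parent": None},
    "pk-store-042": {"name": "Shoe Outlet",    "parent": "pk-mall-001"},
}

def enclosing_place(placekey: str):
    # Follow the parent link one level up, if there is one.
    parent = places[placekey]["parent"]
    return places[parent]["name"] if parent else None

print(enclosing_place("pk-store-042"))  # Riverside Mall
print(enclosing_place("pk-mall-001"))   # None (not inside a larger place)
```

This containment link is what lets, say, foot-traffic analysis roll a store's visits up into its mall's totals.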
If a place is associated with a certain brand, we include attributes relevant to that fact (see our November 2018 release notes for more information). We currently have information relating to branded POIs for over 8,600 brands worldwide. Also, where possible, we include attributes indicating when a place was opened and/or closed. We update all of the information in Places every month to maintain its freshness.
See our Places documentation for a more detailed look at our methodology in putting this dataset together.