In today’s ever-evolving physical world, accurate and timely point-of-interest (POI) data is more important than ever. Businesses, organizations, and research institutions gather and use POI data to run their operations, from food delivery services to find-my-nearest apps to marketing and advertising campaigns. However, POI-based applications add genuine value to a product or service only when the underlying information is up to date, and many POI resources fail to deliver adequate or accurate location data because the physical world changes so quickly.
A key concern when using POI data for business applications is data validation. While many POI data resources use semi-reliable data validation approaches (think manual verification and directory checks), many fail to validate on a timely schedule. In fact, most POI data providers only update their databases every three to six months, which can be problematic depending on the data’s application. For example, when evaluating another POI data provider’s free match service in February of 2022, we found that 17% of their POI records were invalid. We confirmed this by checking website domains and searching for news articles about store locations that recently closed or relocated.
“We can’t be on the ground in every local market we operate in—we need access to data that can be our ‘eyes’ on the ground and give us a more accurate idea of what the local market looks like. But it wasn’t a good use of the team’s time to append partially complete POI data with open-source data to fill in the gaps.” - Julian Adams, Director of Data Science at Avison Young
According to the National Retail Federation, more than 8,100 retail store locations opened in 2021 - and that’s just in the US. Whether these were new brand openings, brand expansions, or store relocations, this stat indicates just how much POIs change every day. At a global scale, these changes are challenging to stay ahead of, and many companies building mapping and location-based platforms or applications struggle to curate an accurate and up-to-date database of places.
When POI data is an integral part of an organization’s operations, the risk associated with such a significant gap in database updates is high. A company that relies on an up-to-date record of places for its trade area analysis, for example, risks building catchments around incorrect competitor locations and misallocating resources if its data is stale. Similarly, a consumer-facing mapping application built on outdated POI data makes for a poor user experience and drives up churn. These are just two examples among many of the importance of data veracity when representing the dynamic physical world.
Working with stale and inaccurate data is also highly inefficient. According to research by Gartner, poor data quality costs large corporations nearly $15 million per year in losses, both in time and resources. Modern data scientists spend approximately 19% of their time collecting baseline data and 60% of their time cleaning and organizing it. With the majority of time spent remediating ‘dirty data,’ companies can drastically reduce operational data costs by simply obtaining high-quality data from the start.
"We pored through spreadsheets to isolate categories and look for issues in the data and SafeGraph was the clear winner. There was just so much weird, junky stuff in the other datasets, it just didn't pass basic data quality. So kudos to you guys for a solid product." - Nic, Babb, VP of Engineering at Adomni
This need for fresh and reliable POI data is why SafeGraph was founded in 2016, and we remain focused on one thing only: being the source of truth for physical places. The SafeGraph Places dataset is curated each month to empower organizations with an up-to-date view of global market landscapes, brand relationships, and how places share physical spaces.
SafeGraph Places is a comprehensive dataset composed of high-quality POIs, leveraged by thousands of organizations globally who trust the data as their primary source of truth. It’s a database created to address the most pressing challenges involved with POI data collection and upkeep, providing data scientists, product managers, and analysts with accurate and timely location information to ensure their products, services, analytics, and strategies are built on real-world facts. Places contains a robust set of geospatial attributes to provide deep context about physical locations, including address string, geographic coordinates, brand affiliation, open/close date, and NAICS/category codes.
An advantage of SafeGraph’s Places dataset is the breadth of location types included. While many POI providers only provide traditional commercial places, such as restaurants and retail stores, SafeGraph additionally curates POI data for parks, warehouses, EV charging stations, oil rigs, and other important, non-traditional places. This comprehensive coverage of global places under one unified data schema enables efficient data ingestion, modeling, and analysis - eliminating the need to prep data from multiple sources.
SafeGraph’s data curation process ensures the POIs included in Places are geographically precise and contain fresh and accurate attributes about what is actually occurring at that place. In the next section, we’ll dive into our data curation methodology and how we maintain freshness in a changing world.
Each month, SafeGraph creates the Places dataset using machine learning (ML) technology, web crawling, and third-party licensing. More specifically, SafeGraph curates POIs by:
The combination of all these sources results in a clean, ready-to-use dataset that reflects the current state of POIs around the world.
“With other data providers, we would have to spend a lot of time cleaning the data to make it useable. Of course, the quality of the data was important to us, but the ethics of SafeGraph’s methodology really stood out.” - Scott Stoltzman, Director of Data Science at RCLCO
Each column in the Places data schema is designed to provide relevant and up-to-date information about global POIs. We describe each column in more detail below:
SafeGraph is a founding member of Placekey, the universal standard unique identifier for places. Placekey was developed out of a need to make location datasets from different sources easily joinable. To make sure our data is interoperable with other location data, SafeGraph appends Placekeys and parent Placekeys to all of our datasets.
Within SafeGraph Places, the Placekey and parent Placekey columns help identify the physical location of a POI and how it is spatially related to other places. Together, the two components of a Placekey capture the ‘what’ and the ‘where’ of a specific POI, and the resulting ID serves as a join key that simplifies bringing multiple location-based datasets together.
Placekey is a unique and persistent ID tied to an individual POI that simplifies joining location-based datasets from multiple sources. Think of a Starbucks location inside of a shopping mall - that Starbucks will have a unique Placekey because of its geographic location and the type of place it is. Each record in the Places dataset contains a Placekey.
The parent Placekey column, on the other hand, is only populated in rows representing places that are contained by another place. Using the previous example, the Starbucks store inside the mall will have both a Placekey and a parent Placekey, where the Placekey represents the store itself, and the parent Placekey represents the entire shopping mall. This concept of representing how places are related to each other physically is what we call ‘spatial hierarchy.’ Spatial hierarchy metadata appended to SafeGraph Places indicates when a place is standalone, exists within a larger structure, or shares a physical location with another place.
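The two ideas above, joining on Placekey and reading spatial hierarchy from the parent Placekey, can be sketched roughly as follows. This is a minimal illustration, not SafeGraph tooling: the column names `placekey` and `parent_placekey`, the Placekey strings, and the `visits` dataset are all hypothetical stand-ins.

```python
from collections import defaultdict

# Hypothetical Places rows; the Placekey strings are invented for illustration.
places = [
    {"placekey": "aaa-222@abc-def-ghi", "parent_placekey": None,
     "location_name": "Example Mall"},
    {"placekey": "bbb-222@abc-def-ghi", "parent_placekey": "aaa-222@abc-def-ghi",
     "location_name": "Starbucks"},
    {"placekey": "ccc-222@xyz-uvw-rst", "parent_placekey": None,
     "location_name": "Corner Bakery"},
]

# A second, hypothetical dataset from another source, keyed by the same Placekey.
visits = {"bbb-222@abc-def-ghi": 1200, "ccc-222@xyz-uvw-rst": 300}


def join_on_placekey(rows, other, field):
    """Left-join `other` (a dict keyed by Placekey) onto `rows` as `field`."""
    return [{**row, field: other.get(row["placekey"])} for row in rows]


def children_by_parent(rows):
    """Map each parent Placekey to the names of the POIs it contains."""
    tree = defaultdict(list)
    for row in rows:
        if row["parent_placekey"] is not None:
            tree[row["parent_placekey"]].append(row["location_name"])
    return dict(tree)


joined = join_on_placekey(places, visits, "visits")
hierarchy = children_by_parent(places)
```

Here the Starbucks row picks up its visit count through the shared key, and the mall's Placekey maps to the list of places inside it.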
To provide the base information about what exists at each geographic location, SafeGraph includes three closely related columns: location name, brands, and brand IDs. The location name column delivers the name of each place, such as 7-Eleven. Sometimes this matches the value in the brands column, particularly when a location name is simple, like Walmart, and falls under the Walmart brand. However, the location name column can differentiate between a Walmart and a Walmart Supercenter, while the brand for that location will still be just Walmart.
The brands field is helpful for seeing entire brand footprints regardless of whether individual locations have different naming conventions. SafeGraph brand IDs also help surface brand relationships by serving as a unique and persistent identifier for different brands. Brand IDs remain the same in the event of a brand renaming itself so as not to break any existing models or queries.
SafeGraph brand IDs also detail parent and child brands. Similar to Placekey denoting spatial hierarchy, SafeGraph brand IDs show brand hierarchies. For example, Yum! Brands owns multiple restaurant brands, so POIs for those restaurant brand locations will contain a brand ID for that restaurant, and a parent brand ID for Yum! Brands. This takes identifying brand footprints and market landscapes a step further to show how some brands are related to each other, and provides another field option for querying and modeling places data.
Every POI in SafeGraph Places includes a location name, but not all records include a brand or brand ID. This is because many places do not belong to a larger brand, such as independent restaurants or local museums. SafeGraph defines a brand as a branded store which has multiple locations all under the same logo or store banner.
While location name, brands, and brand IDs are included in the main file delivered for SafeGraph Places, we include a supplementary brand info file in each delivery to provide the parent brand ID and more brand-specific information. The brand info file is easily joinable to SafeGraph Places through the brand ID column, and includes brand categorization information, stock symbol, stock exchange presence, and lists of which countries the brand currently has opened and closed locations in.
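As a rough sketch of how the brand info file might be joined to Places through the brand ID, the example below looks up a POI's parent brand. The field names (`brand_id`, `parent_brand_id`, `stock_symbol`) and the ID strings are assumptions made for illustration; the delivered files may name them differently.

```python
# Hypothetical Places rows; brand IDs are invented for illustration.
places = [
    {"location_name": "KFC", "brand_id": "BRAND_kfc"},
    {"location_name": "Joe's Diner", "brand_id": None},  # unbranded, independent POI
]

# Hypothetical brand info file, keyed by brand ID.
brand_info = {
    "BRAND_kfc": {"brand_name": "KFC", "parent_brand_id": "BRAND_yum",
                  "stock_symbol": None},
    "BRAND_yum": {"brand_name": "Yum! Brands", "parent_brand_id": None,
                  "stock_symbol": "YUM"},
}


def parent_brand_name(place, brands):
    """Return the parent brand's name for a POI, or None if it has none."""
    info = brands.get(place["brand_id"])
    if info is None or info["parent_brand_id"] is None:
        return None
    parent = brands.get(info["parent_brand_id"])
    return parent["brand_name"] if parent else None


kfc_parent = parent_brand_name(places[0], brand_info)
diner_parent = parent_brand_name(places[1], brand_info)
```

Unbranded POIs simply fall through to `None`, which mirrors the fact that not every record carries a brand or brand ID.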
"From the beginning of our data sourcing process, SafeGraph provided the most comprehensive and actionable POI dataset. Their coverage of the top 1,000 restaurants is unmatched and invaluable.” - Ben Anderson, Senior Manager of Market, Customer, and Competitive Intelligence at Sysco
The Places dataset includes separate columns for the latitude and longitude of each POI to make the data easily mappable. It also has columns for parsed-out address strings, including separate columns for street address, city, region, postal code, and ISO country code. These foundational columns not only locate the POIs in the physical world (as does Placekey) but also power geocoding services in mapping applications and serve as valuable filters for selecting POIs from specific geographic areas.
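To illustrate the filtering use case, here is a minimal sketch that selects POIs either by parsed address columns or by a latitude/longitude bounding box. The column names and sample rows are hypothetical, and the bounding box is a rough one chosen only for the example.

```python
def in_bounding_box(poi, south, west, north, east):
    """True if the POI's point falls inside the box (no antimeridian handling)."""
    return (south <= poi["latitude"] <= north
            and west <= poi["longitude"] <= east)


# Hypothetical rows with assumed column names.
pois = [
    {"location_name": "A", "latitude": 37.77, "longitude": -122.42,
     "region": "CA", "iso_country_code": "US"},
    {"location_name": "B", "latitude": 40.71, "longitude": -74.01,
     "region": "NY", "iso_country_code": "US"},
]

# Filter by a rough box around San Francisco.
in_box = [p for p in pois if in_bounding_box(p, 37.70, -122.52, 37.83, -122.35)]

# Filter by parsed address columns instead of coordinates.
in_california = [p for p in pois
                 if p["iso_country_code"] == "US" and p["region"] == "CA"]
```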
Store IDs are unique identifiers within a brand for store locations. The store ID column enables users to easily join with other datasets that include store IDs. Most often, this involves transaction information, financial statements, quarterly reports, and first-party company data.
To provide further foundational context for each POI, the Places dataset includes three columns related to how people can engage with that place: phone number, website, and open hours. These are particularly useful for mapping applications or platforms that surface information to people looking to interact with a place. The open hours column contains specific hours of operation by day in an easily explorable JSON format.
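As an example of working with the open hours column, the sketch below checks whether a place is open at a given time. The JSON layout shown here (day names mapping to lists of start/end pairs) is an assumption for illustration, so check the actual schema before relying on it.

```python
import json

# Hypothetical open hours value; the exact JSON layout is an assumption.
open_hours = (
    '{"Mon": [["8:00", "12:00"], ["13:00", "17:30"]],'
    ' "Tue": [["8:00", "17:30"]], "Sun": []}'
)


def to_minutes(hhmm):
    """Convert an 'H:MM' or 'HH:MM' string to minutes after midnight."""
    hours, minutes = hhmm.split(":")
    return int(hours) * 60 + int(minutes)


def is_open(open_hours_json, day, hhmm):
    """True if `hhmm` falls inside any interval listed for `day`.

    Days missing from the JSON are treated as closed; intervals that
    run past midnight are not handled in this sketch.
    """
    schedule = json.loads(open_hours_json)
    now = to_minutes(hhmm)
    return any(to_minutes(start) <= now < to_minutes(end)
               for start, end in schedule.get(day, []))
```

For instance, `is_open(open_hours, "Mon", "12:30")` is false because the time falls in the midday gap between the two Monday intervals.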
The NAICS code, top category, and sub category columns categorize a POI by what type of place it is. These categorizations follow the North American Industry Classification System (NAICS), which the US Census Bureau publishes to distinguish different types of establishments, and are all closely related to each other.
“SafeGraph data adheres to industry standards, like NAICS codes. This makes it a lot easier for us to work with and join to other data sources without having to do a big cleanup effort.” - Matt Taaffe, VP of Product at Olvin
NAICS codes define a POI by a 6-digit code from a taxonomy developed to classify each type of POI numerically. Burger King, a ‘limited-service restaurant,’ has a NAICS code of 722513. Top category is a string label that describes a POI’s purpose based on the first 4 digits of its NAICS code (7225 in this case), so a Burger King is labeled ‘restaurants and other eating places.’ Sub category is a more specific string label based on the full 6-digit NAICS code. The same Burger King location labeled ‘restaurants and other eating places’ in the top category column is labeled ‘limited-service restaurant’ in the sub category column.
SafeGraph strives to provide 6-digit NAICS codes for most POIs, but for some places our models cannot meaningfully differentiate between two adjacent 6-digit NAICS codes. In these situations we err on the side of caution so as not to provide false facts and assign only a 3- or 4-digit code, meaning the sub category column will be null.
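The relationship between a NAICS code and the two category labels can be sketched as a pair of prefix lookups. The lookup tables below contain only the Burger King example from the text, and treating a 3-digit code as having no top category is an assumption of this sketch rather than documented behavior.

```python
# Labels taken from the Burger King example; a real table would be far larger.
TOP_CATEGORY = {"7225": "restaurants and other eating places"}
SUB_CATEGORY = {"722513": "limited-service restaurant"}


def categorize(naics_code):
    """Return (top_category, sub_category) for a NAICS code string.

    Top category is looked up from the first 4 digits and sub category
    from the full 6-digit code. Shorter codes leave the finer label None.
    """
    code = str(naics_code)
    top = TOP_CATEGORY.get(code[:4]) if len(code) >= 4 else None
    sub = SUB_CATEGORY.get(code) if len(code) == 6 else None
    return top, sub


full = categorize("722513")
partial = categorize("7225")   # only a 4-digit code was assigned
coarse = categorize("722")     # only a 3-digit code was assigned
```

A POI that was only assigned the 4-digit code keeps its top category while its sub category stays null, as described above.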
The category tag column expands on this categorization, providing further flexibility and granularity where the NAICS codes fall short. For example, category tags for a fast food restaurant may include terms like ‘counter service,’ ‘sandwich shop,’ ‘late-night,’ ‘drive-through,’ and more, while the sub category would remain ‘limited-service restaurant’ regardless of the type of food served. Category tags are also helpful in distinguishing between different types of medical offices or retailers. This information is typically used to:
Because each POI can contain multiple category tags, category tags are included as JSON in one column if applicable to a specific place.
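As a small illustration, the sketch below parses the category tag column and filters POIs by tag. It assumes the column holds a JSON array of strings and is null when no tags apply; the column names and rows are hypothetical.

```python
import json

# Hypothetical rows; `category_tags` is assumed to be a JSON array or None.
pois = [
    {"location_name": "Sub Stop", "sub_category": "limited-service restaurant",
     "category_tags": '["sandwich shop", "counter service"]'},
    {"location_name": "Night Burger", "sub_category": "limited-service restaurant",
     "category_tags": '["late-night", "drive-through"]'},
    {"location_name": "City Museum", "sub_category": None,
     "category_tags": None},
]


def tags_of(poi):
    """Return the POI's category tags as a set (empty if the column is null)."""
    raw = poi["category_tags"]
    return set(json.loads(raw)) if raw else set()


def with_tag(rows, tag):
    """Names of POIs carrying `tag`, regardless of their sub category."""
    return [p["location_name"] for p in rows if tag in tags_of(p)]


drive_throughs = with_tag(pois, "drive-through")
```

Both restaurants share the same sub category, so the tags are what separate the sandwich shop from the late-night drive-through.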
To indicate the real-world status of a POI and make it clear when places open and close, SafeGraph includes three date-related columns. The opened on column provides the month and year that the POI opened, while the closed on column details the month and year that the POI closed, if applicable. A null closed on value indicates the POI is still open. A null opened on value means we are still acquiring the metadata to confidently report when that place opened, or that it opened before we had rich enough metadata to infer a date. We also include a tracking closed since column to note when we began reporting on that place’s opened or closed status.
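The null-handling rules above can be sketched as a single check. This example assumes the dates are stored as zero-padded 'YYYY-MM' strings under the hypothetical column names `opened_on` and `closed_on`, and it treats the closing month itself as closed, which is a choice made for this sketch.

```python
def was_open(poi, month):
    """True if the POI is believed to be open during `month` ('YYYY-MM').

    A null opened_on is treated as "opened at some unknown earlier date";
    a null closed_on means the POI is still open.
    """
    opened, closed = poi.get("opened_on"), poi.get("closed_on")
    if opened is not None and month < opened:
        return False
    # Zero-padded 'YYYY-MM' strings compare correctly as plain strings.
    return closed is None or month < closed


closed_store = {"opened_on": "2019-05", "closed_on": "2021-11"}
legacy_store = {"opened_on": None, "closed_on": None}
```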
The SafeGraph product and engineering teams have developed a detailed and thorough logic for determining if POIs are opened or closed. If a new place from an existing source repeatedly appears in our build pipeline, it is flagged as opened during the month in which it first appears. Similarly, if a POI from an existing source repeatedly disappears from our build pipeline, it is flagged as closed during the month in which it first disappears. These flags are added to the Places product pending final QA and data hygiene checks. SafeGraph does not track temporary closures so as not to mistakenly mark places as permanently closed. You can read more about our open and close logic here.
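The general idea of flagging a change only after it repeats across builds can be sketched as below. This is a simplified illustration and not SafeGraph's actual pipeline: the `confirm` threshold (how many consecutive monthly builds count as "repeatedly") and the data layout are assumptions.

```python
def detect_changes(builds, poi_id, confirm=2):
    """Return a list of (event, month) tuples for one POI.

    `builds` is a chronological list of (month, set_of_poi_ids) from a
    single source. A change is recorded in the month it first shows up,
    but only if it persists for `confirm` consecutive builds.
    """
    events = []
    state = poi_id in builds[0][1]  # present in the first build?
    for i in range(1, len(builds)):
        seen = poi_id in builds[i][1]
        if seen == state:
            continue
        window = builds[i:i + confirm]
        confirmed = (len(window) == confirm
                     and all((poi_id in ids) == seen for _, ids in window))
        if confirmed:
            events.append(("opened" if seen else "closed", builds[i][0]))
            state = seen
    return events


# Hypothetical monthly builds from one source.
builds = [
    ("2022-01", {"a", "c"}),
    ("2022-02", {"a", "b", "c"}),
    ("2022-03", {"b", "c"}),       # "a" is missing for one month only
    ("2022-04", {"a", "b"}),
    ("2022-05", {"a", "b"}),
]
```

In this sample, "b" is flagged as opened in 2022-02 and "c" as closed in 2022-04, while the one-month gap for "a" is ignored, so a brief disappearance is not mistaken for a permanent closure.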
While SafeGraph Places is ultimately a file of latitude and longitude coordinates for POIs, we do indicate whether each location exists in the real world as a polygonal space. For example, while the record for Golden Gate Park in SafeGraph Places is represented as geospatial coordinates for a single point, the geometry type field indicates that the park can actually be represented as a polygon. Places that do not have a polygonal geometry type include bus stops and ATMs, since they often do not have physical extents large enough for a person to traverse. SafeGraph uses the Places dataset to build Geometry data, providing the polygon data for places with geometry.