Figure 1 Yelp coverage of CBP restaurants by ZIP code in 2015
We test whether Yelp data predict establishment growth in County Business Patterns for the years prior to 2015 at the ZIP code level. There is strong persistence in establishment growth rates, and so we control for two years of lags in County Business Patterns establishment growth, which, along with year fixed effects, can explain 14.8% of the variation in establishment growth across US ZIP codes. Adding Yelp data from the year in question boosts the r-squared to 22.5%. Our point estimate suggests that one extra Yelp business is associated with 0.6 more businesses in County Business Patterns. In many cases, including today, we don’t even have the one-year lag of County Business Patterns growth, which means that the marginal contribution of Yelp data would be even larger.
Yelp’s predictive power in the restaurant sector is even more impressive, because past restaurant growth doesn’t predict current restaurant growth. Year effects and lagged growth can explain less than 1% of the variation in restaurant growth rates across ZIP codes. Including growth in Yelp-reviewed restaurants boosts the r-squared to 11%. Using a richer set of Yelp variables increases the r-squared to almost 14%.
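The comparison described above (a baseline with lagged growth and year effects, then the same model with a contemporaneous Yelp measure added) can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the data are simulated, every variable name and the `ols` helper are hypothetical, and the simulated Yelp slope of 0.6 is chosen merely to echo the point estimate quoted in the text. It is not the authors' estimation code, and its r-squared values are not meant to match the reported ones.

```python
import random

def ols(X, y):
    """OLS with an intercept via the normal equations; returns (coefficients, r-squared)."""
    rows = [[1.0] + list(r) for r in X]
    k = len(rows[0])
    # Augmented matrix [X'X | X'y]
    A = [[sum(r[i] * r[j] for r in rows) for j in range(k)]
         + [sum(r[i] * yi for r, yi in zip(rows, y))]
         for i in range(k)]
    # Gauss-Jordan elimination with partial pivoting
    for c in range(k):
        p = max(range(c, k), key=lambda i: abs(A[i][c]))
        A[c], A[p] = A[p], A[c]
        pivot = A[c][c]
        A[c] = [v / pivot for v in A[c]]
        for i in range(k):
            if i != c:
                factor = A[i][c]
                A[i] = [vi - factor * vc for vi, vc in zip(A[i], A[c])]
    beta = [A[i][k] for i in range(k)]
    mean_y = sum(y) / len(y)
    ss_tot = sum((yi - mean_y) ** 2 for yi in y)
    ss_res = sum((yi - sum(b * x for b, x in zip(beta, r))) ** 2
                 for r, yi in zip(rows, y))
    return beta, 1.0 - ss_res / ss_tot

# Simulated ZIP-code-by-year observations (assumption: not real CBP or Yelp data).
random.seed(0)
YEARS = [2012, 2013, 2014]
YEAR_EFFECT = {2012: 0.0, 2013: 0.3, 2014: -0.2}  # made-up year shocks
TRUE_YELP_SLOPE = 0.6                             # echoes the quoted estimate

X_base, X_full, growth = [], [], []
for _ in range(1500):
    year = random.choice(YEARS)
    lag1 = random.gauss(0, 1)   # CBP growth, one-year lag
    lag2 = random.gauss(0, 1)   # CBP growth, two-year lag
    yelp = random.gauss(0, 1)   # same-year growth in Yelp businesses
    dummies = [1.0 if year == t else 0.0 for t in YEARS[1:]]  # year fixed effects
    y = (0.2 * lag1 + 0.1 * lag2 + TRUE_YELP_SLOPE * yelp
         + YEAR_EFFECT[year] + random.gauss(0, 1))
    X_base.append([lag1, lag2] + dummies)
    X_full.append([lag1, lag2, yelp] + dummies)
    growth.append(y)

beta_base, r2_base = ols(X_base, growth)
beta_full, r2_full = ols(X_full, growth)
yelp_coef = beta_full[3]  # order: intercept, lag1, lag2, yelp, year dummies

print(f"r-squared, lags + year effects: {r2_base:.3f}")
print(f"r-squared, adding Yelp:         {r2_full:.3f}")
print(f"estimated Yelp coefficient:     {yelp_coef:.2f}")
```

The quantity of interest is the gap between the two r-squared values: it measures what the Yelp variable explains beyond lagged growth and year effects. Dropping `lag1` from both models would mimic the case where the one-year lag is not yet available.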
The fact that Yelp’s current data can predict current economic events suggests that patterns in Yelp data reflect patterns in the underlying economy, rather than simply patterns in the adoption of Yelp. At the same time, Yelp is more predictive in some places than in others, because this online source of information is not used uniformly across places. Richer, denser, and better-educated places have better Yelp coverage, presumably because they have a more internet-savvy population that likes to go out more. We find that one Yelp business is only associated with 0.2 extra County Business Patterns establishments in places that are poorly educated, low income, and less dense. The relatively low coefficient reflects Yelp’s weaker coverage in those areas. The coefficient rises to 0.5 in richer parts of the US, even if they are less dense or less educated. The coefficient increases to almost 0.75 when a ZIP code has money, education, and density. We expect that these spatial differences may decline if Yelp spreads further, but for now, ‘nowcasting’ with Yelp is safer in richer and better-educated cities, such as Boston. Yelp’s predictive power also differs by industry. It provides little ability to predict manufacturing growth, and does best in predicting retail establishment growth and the growth of business and professional services.
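One way to read the coefficients by type of place is as separate slopes estimated on subsamples of ZIP codes. The sketch below simulates three hypothetical groups whose true slopes are set to the values quoted above (0.2, 0.5, and 0.75) and recovers them with a one-regressor least-squares fit. The group labels, sample sizes, noise level, and the `slope` helper are all assumptions made for illustration; this is not the authors' specification.

```python
import random

def slope(x, y):
    """OLS slope of y on x with an intercept: cov(x, y) / var(x)."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var = sum((a - mean_x) ** 2 for a in x)
    return cov / var

random.seed(1)

# Hypothetical ZIP code groups; true slopes echo the coefficients in the text.
TRUE_SLOPES = {
    "low income, low education, low density": 0.2,
    "higher income, but less dense or less educated": 0.5,
    "high income, education, and density": 0.75,
}

estimated = {}
for group, true_b in TRUE_SLOPES.items():
    # Simulated change in Yelp businesses and in CBP establishments
    yelp = [random.gauss(0, 1) for _ in range(2000)]
    cbp = [true_b * v + random.gauss(0, 0.5) for v in yelp]
    estimated[group] = slope(yelp, cbp)

for group, b_hat in estimated.items():
    print(f"{group}: {b_hat:.2f}")
```

A lower slope in a group means that each additional Yelp business there corresponds to fewer additional County Business Patterns establishments, which is how weaker coverage shows up in the estimates.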
Yelp not only provides timelier local economic data than County Business Patterns, but also provides data at a more granular level than is available in the public-facing County Business Patterns data. Block-by-block, street-by-street, Yelp can show policymakers recent changes in economic geography. In principle, these data can be a useful input for real estate investors, builders, and even business owners rethinking their location. Furthermore, Yelp makes it possible to measure new outcomes that were never included in traditional data sources. To illustrate in the context of New York City, Figure 2 shows how Yelp data can be used to analyse the types of restaurants that open across neighbourhoods, based on the price levels of their menus.
Figure 2 Number of mid-to-expensive ($$+) Yelp restaurants opening per capita in 2015
Ultimately, digital data from online platforms can offer faster and geographically detailed images of the economy, but they are a complement to, rather than a substitute for, official government statistics. Our confidence in the value of Yelp data comes entirely from the fact that they are cross-validated with those statistics. The new big data frontier provides enormous opportunities, but it should never be an excuse for cutting the funding of traditional government data, which – when combined with new data sources – will provide a more complete picture of the economy.
References
Cavallo, A (2012), “Scraped Data and Sticky Prices”, MIT Sloan Working Paper.
Choi, H, and H Varian (2012), “Predicting the Present with Google Trends”, Economic Record 88: 2–9.
Einav, L, and J Levin (2014), “The Data Revolution and Economic Analysis”, Innovation Policy and the Economy 14.
Glaeser, E L, H Kim, and M Luca (2017), “Nowcasting the Local Economy: Using Yelp Data to Measure Economic Activity”, NBER Working Paper 24010.
Glaeser, E L, S D Kominers, M Luca, and N Naik (2018), “Big Data and Big Cities: The Promises and Limitations of Improved Measures of Urban Life”, Economic Inquiry 56(1): 114–137.
Guzman, J, and S Stern (2016), “Nowcasting and Placecasting Entrepreneurial Quality and Performance”, Working Paper.
Luca, D, and M Luca (2017), “Survival of the Fittest: The Impact of the Minimum Wage on Firm Exit”, Working Paper.
Wu, L, and E Brynjolfsson (2015), “The Future of Prediction: How Google Searches Foreshadow Housing Prices and Sales”, in A Goldfarb, S M Greenstein, and C E Tucker (eds.), Economic Analysis of the Digital Economy, Chicago: University of Chicago Press.
Endnotes
[1] These Yelp numbers exclude any businesses in Yelp that are missing a ZIP code, price range, or any recommended reviews.