Welcome to the Hoberg-Phillips Data Library
NEW: Data extended to 2023 (overall coverage now 1989 to 2023)!
Data Provided by Gerard Hoberg and Gordon Phillips
Data Repositories
Text-based Network Industry Classifications (TNIC) data
* These new industry classifications are based on firm pairwise similarity scores from text analysis of firm 10-K product descriptions. Competitors are firm-centric, with each firm having its own distinct set of competitors, analogous to networks or a "Facebook" circle of friends. We provide the classifications from Hoberg and Phillips (2016, Journal of Political Economy) and also the new Doc2Vec embeddings-based classifications from Hoberg and Phillips (2025, Journal of Finance). These new industry classifications are updated annually. They offer more research flexibility and are more informative than fixed industry classifications (FIC) such as SIC, NAICS, and the 10-K based FIC classifications below. Our research shows they sharply improve upon SIC and NAICS codes in explaining many different firm-specific outcomes and decisions, including firm profitability, Tobin's Q, and dividends. These benefits are outlined in Hoberg and Phillips (2010, 2016, 2025), with references available by clicking on the above link.
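To make the "firm-centric" idea concrete, here is a minimal Python sketch. It is an illustration of the general idea only, not the authors' procedure: the toy descriptions, the binary word-set cosine similarity, the `build_peers` helper, and the 0.5 cutoff are all assumptions made for this example.

```python
import math

def cosine_similarity(words_a, words_b):
    """Cosine similarity between two binary word-presence vectors."""
    a, b = set(words_a), set(words_b)
    if not a or not b:
        return 0.0
    return len(a & b) / math.sqrt(len(a) * len(b))

# Hypothetical toy product descriptions (not real 10-K text).
descriptions = {
    "A": "cloud software database analytics".split(),
    "B": "cloud software security analytics".split(),
    "C": "oil gas drilling pipeline".split(),
    "D": "database software storage".split(),
}

THRESHOLD = 0.5  # hypothetical cutoff, chosen only for this example

def build_peers(descriptions, threshold):
    """Map each firm to its own set of peers and their similarity scores."""
    firms = sorted(descriptions)
    peers = {firm: {} for firm in firms}
    for i, firm in enumerate(firms):
        for other in firms[i + 1:]:
            score = cosine_similarity(descriptions[firm], descriptions[other])
            if score >= threshold:
                # Similarity is symmetric, so record the pair in both directions.
                peers[firm][other] = score
                peers[other][firm] = score
    return peers

peers = build_peers(descriptions, THRESHOLD)
for firm, firm_peers in peers.items():
    print(firm, sorted(firm_peers))
```

In this toy data, firms B and D are both peers of A but not of each other. Fixed classifications cannot represent that kind of non-transitive relationship, because they force every firm into exactly one group.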
For an exercise with Python code and underlying data to replicate one year of the Hoberg-Phillips TNIC network from the original JPE 2016 paper, please download this zip file: Download here. Examine the document inside this zipped file for details.
Industry Concentration and Total Similarity Data
* HHI concentration metrics and Total Similarity data are available, based on TNIC industries.
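As a rough illustration of how such measures could be computed on a firm-centric network, the Python sketch below uses the textbook Herfindahl-Hirschman index (the sum of squared sales shares) over a focal firm and its peers, and takes total similarity as the sum of a firm's pairwise similarity scores. The peer sets, sales figures, and function names are hypothetical, and the library's exact construction may differ.

```python
# Hypothetical firm-centric peer sets with pairwise similarity scores.
peers = {
    "A": {"B": 0.75, "D": 0.5},
    "B": {"A": 0.75},
    "C": {},
    "D": {"A": 0.5},
}

# Hypothetical sales figures, in arbitrary units.
sales = {"A": 50.0, "B": 30.0, "C": 10.0, "D": 20.0}

def tnic_hhi(focal, peers, sales):
    """Sum of squared sales shares over the focal firm and its own peers."""
    group = [focal, *peers.get(focal, {})]
    total = sum(sales[firm] for firm in group)
    if total == 0:
        return float("nan")
    return sum((sales[firm] / total) ** 2 for firm in group)

def total_similarity(focal, peers):
    """Sum of the focal firm's similarity scores with all of its peers."""
    return sum(peers.get(focal, {}).values())

for firm in sorted(peers):
    hhi = tnic_hhi(firm, peers, sales)
    tsim = total_similarity(firm, peers)
    print(f"{firm}: HHI={hhi:.3f} total_similarity={tsim:.2f}")
```

Because each firm has its own peer set, each firm also gets its own concentration value. In this sketch, firm C has no peers, so its HHI is 1 (on a 0 to 1 scale) and its total similarity is 0.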
Product Market Fluidity Data
* Product Market Fluidity assesses the degree of competitive threat and product market change surrounding a firm, based on "Product Market Threats, Payouts, and Financial Flexibility," Hoberg, Phillips, and Prabhala (2014, JF).
Vertical TNIC Data (VTNIC)
* Vertical TNIC data comprises two key databases and is based on Fresard, Hoberg, and Phillips (2020, RFS). The first is a firm-year panel indicating the extent to which firms are vertically integrated. The second is a firm-pair-year database indicating the potential for vertical relatedness for every pair of firms in every year.
Firm Scope Data
* Firm scope data is based on Doc2Vec embeddings. The data comprises two key databases and is based on Hoberg and Phillips (2025, Journal of Finance). The first is a firm-year panel indicating the scope of each firm (the number of product markets it operates in). The second is a firm-market-year database indicating the specific text-based markets each firm operates in for each year. These product markets are calculated using Doc2Vec embeddings. See: TNIC Doc2Vec Embeddings.
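The relationship between the two scope databases can be sketched in Python as follows, assuming scope is simply the count of distinct markets per firm-year. The record layout, identifiers, and `firm_scope` helper are hypothetical and do not reflect the actual file format.

```python
from collections import defaultdict

# Hypothetical firm-market-year records: (firm_id, year, market_id).
records = [
    ("F1", 2020, "m1"),
    ("F1", 2020, "m2"),
    ("F1", 2020, "m1"),  # duplicate row, counted only once
    ("F1", 2021, "m1"),
    ("F2", 2020, "m3"),
]

def firm_scope(records):
    """Count the distinct markets each firm operates in, per year."""
    markets = defaultdict(set)
    for firm, year, market in records:
        markets[(firm, year)].add(market)
    return {key: len(market_set) for key, market_set in markets.items()}

scope = firm_scope(records)
for (firm, year), n_markets in sorted(scope.items()):
    print(firm, year, n_markets)
```

Collecting markets in a set per firm-year keeps the count robust to repeated rows in the firm-market-year input.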