Welcome to the Hoberg-Phillips Data Library
NEW: Data extended to 2023 (overall coverage now 1989 to 2023)!
Data Provided by Gerard Hoberg and Gordon Phillips
Data Repositories
Text-based Network Industry Classifications (TNIC) data
* These new industry classifications are based on firm pairwise similarity scores from text analysis of firm 10-K product descriptions. Competitors are firm-centric: each firm has its own distinct set of competitors, analogous to a network or a "Facebook" circle of friends. We provide the classifications from Hoberg and Phillips (2016, Journal of Political Economy) and also the new Doc2Vec embeddings-based classifications from Hoberg and Phillips (2025, Journal of Finance). These classifications are updated annually, and they are both more flexible for research and more informative than fixed industry classifications (FIC) such as SIC, NAICS, and the 10-K based FIC classifications below. Our research shows they sharply improve upon SIC and NAICS codes in explaining many different firm-specific outcomes and decisions, including firm profitability, Tobin's Q, and dividends. These benefits are outlined in Hoberg and Phillips (2010, 2016, 2025), with references available by clicking on the above link.
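To make the firm-centric idea concrete, here is a minimal Python sketch of how competitor sets could be derived from pairwise similarity scores. It is an illustration only, not the library's actual construction: the `(year, firm_a, firm_b, score)` tuple layout, the function name `competitor_sets`, the threshold value, and the toy scores are all assumptions made for this example.

```python
from collections import defaultdict


def competitor_sets(pair_scores, threshold):
    """Build firm-centric competitor sets from pairwise similarity scores.

    pair_scores: iterable of (year, firm_a, firm_b, score) tuples
                 (hypothetical layout, chosen for this sketch).
    threshold:   minimum score for a pair to count as competitors
                 (illustrative value, not taken from the data library).
    """
    peers = defaultdict(set)
    for year, firm_a, firm_b, score in pair_scores:
        if firm_a != firm_b and score >= threshold:
            # Similarity is symmetric, so record the link in both directions.
            peers[(year, firm_a)].add(firm_b)
            peers[(year, firm_b)].add(firm_a)
    return dict(peers)


# Made-up scores for three firms in one year.
toy_scores = [
    (2020, "A", "B", 0.30),
    (2020, "B", "C", 0.25),
    (2020, "A", "C", 0.05),
]
toy_peers = competitor_sets(toy_scores, threshold=0.20)
```

In this toy example, firm B competes with both A and C, while A and C are not competitors of each other. A fixed classification, which places firms into mutually exclusive groups, cannot represent that kind of non-transitive relationship.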
Industry Concentration and Total Similarity Data
* HHI concentration metrics and Total Similarity data are available based on TNIC industries.
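As a rough illustration of the two measures, the sketch below computes a Herfindahl-Hirschman Index (sum of squared market shares) and a total similarity score (sum of a firm's pairwise similarities to its peers) for one firm's peer group. It makes several assumptions that may not match the published data: that shares are based on sales, that the focal firm is included in its own peer group for the HHI, and that the inputs are plain dictionaries. The function names and toy numbers are hypothetical.

```python
def hhi(sales_by_firm):
    """Herfindahl-Hirschman Index: sum of squared sales shares.

    sales_by_firm: dict mapping firm id -> sales for the focal firm and its
                   peers (assumed input layout for this sketch).
    """
    total = sum(sales_by_firm.values())
    if total <= 0:
        raise ValueError("total sales must be positive")
    return sum((sales / total) ** 2 for sales in sales_by_firm.values())


def total_similarity(peer_scores):
    """Sum of pairwise similarity scores between a firm and its peers.

    peer_scores: dict mapping peer firm id -> similarity with the focal firm.
    """
    return sum(peer_scores.values())


# Made-up numbers: focal firm A with peers B and C.
toy_sales = {"A": 50.0, "B": 30.0, "C": 20.0}
toy_hhi = hhi(toy_sales)  # shares 0.5, 0.3, 0.2 give about 0.38
toy_totsim = total_similarity({"B": 0.30, "C": 0.25})  # about 0.55
```

Because each firm has its own peer set under TNIC, both measures are firm-specific: two firms that share some competitors can still have different concentration and total similarity values.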
Product Market Fluidity Data
* Product Market Fluidity assesses the degree of competitive threat and product market change surrounding a firm, based on "Product Market Threats, Payouts, and Financial Flexibility," Hoberg, Phillips, and Prabhala (2014, JF).
Vertical TNIC Data (VTNIC)
* Vertical TNIC data consists of two key databases and is based on Fresard, Hoberg, and Phillips (2020, RFS). The first is a firm-year panel indicating the extent to which firms are vertically integrated. The second is a firm-pair-year database indicating the potential for vertical relatedness for every pair of firms in every year.
Firm Scope Data
* Scope data consists of two key databases and is based on Hoberg and Phillips (2025, Journal of Finance). The first is a firm-year panel indicating the scope of each firm (the number of product markets it operates in). The second is a firm-market-year database indicating the specific text-based markets each firm operates in for each year. These product markets are calculated using Doc2Vec embeddings. See: TNIC Doc2Vec Embeddings.