Welcome to the Hoberg-Phillips Data Library
NEW: Data extended to 2023 (overall coverage now 1989 to 2023)!
Data Provided by Gerard Hoberg and Gordon Phillips
10-K Text-based Network Industry Classifications (TNIC) data
TNIC pairwise industry classification data is the richest form of industry classification data from the textual network project (an unrestricted pairwise network). The benefits are outlined in the readme file below, and in the Hoberg and Phillips (2016) paper noted below.
I. [Baseline TNIC-3 Data] Download the baseline version of the TNIC database (the "standard version" used in most research projects). This version is at a granularity consistent with three-digit SIC codes (we refer to this database as TNIC-3 data). [Download TNIC-3 Data] [View Readme for TNIC-3 Data]
II. [Larger TNIC-2 Data] Download a larger version of the TNIC database, which is at a granularity consistent with two-digit SIC codes (we refer to this database as TNIC-2 data). This has more pairs as it is a coarser industry classification. [Download TNIC-2 Data] [View Readme for TNIC-2 Data]
III. [Complete TNIC-All Data] Download the complete version of the TNIC database (files are much larger, advanced users only). This version is referred to as "TNIC-All" and has all pairwise similarity scores for all firms in the database (including those not in the same industry). [Advanced Users TNIC Portal]
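As a rough illustration of what working with pairwise data involves, the sketch below reads a small in-memory sample and collects each firm's peers by year, ranked by similarity score. The tab-delimited layout and the column names `year`, `gvkey1`, `gvkey2` and `score` are assumptions made for this example, and the sample rows are invented; check the readme for the actual file format before adapting it.

```python
import csv
import io
from collections import defaultdict

# Invented sample rows in an ASSUMED layout: one row per ordered firm pair per year.
# Column names (year, gvkey1, gvkey2, score) are assumptions; confirm them in the readme.
SAMPLE = (
    "year\tgvkey1\tgvkey2\tscore\n"
    "2020\t1001\t1002\t0.31\n"
    "2020\t1002\t1001\t0.31\n"
    "2020\t1001\t1003\t0.12\n"
    "2020\t1003\t1001\t0.12\n"
    "2021\t1001\t1002\t0.28\n"
    "2021\t1002\t1001\t0.28\n"
)


def load_pairs(handle):
    """Return {(year, gvkey): [(peer_gvkey, score), ...]}, most similar peer first."""
    peers = defaultdict(list)
    for row in csv.DictReader(handle, delimiter="\t"):
        key = (int(row["year"]), int(row["gvkey1"]))
        peers[key].append((int(row["gvkey2"]), float(row["score"])))
    for key in peers:
        # Sort each firm's peer list by similarity score, descending.
        peers[key].sort(key=lambda item: item[1], reverse=True)
    return dict(peers)


peers = load_pairs(io.StringIO(SAMPLE))
print(peers[(2020, 1001)])
```

In this invented sample, firms 1002 and 1003 are both peers of firm 1001 in 2020 but are not peers of each other, which is the kind of relationship a pairwise network can represent. For a real download, the same function could be called on an open file handle instead of the in-memory string.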
* The following studies provided the key innovations to the creation of this data:
- Text-Based Network Industries and Endogenous Product Differentiation - Gerard Hoberg and Gordon Phillips, 2016, Journal of Political Economy, 124 (5), 1423-1465
- Product Market Synergies and Competition in Mergers and Acquisitions: A Text-Based Analysis - Gerard Hoberg and Gordon Phillips, 2010, Review of Financial Studies, 23 (10), 3773-3811
10-K Text-based Fixed Industry Classifications (FIC Industries)
Base TNIC industry classification data (above) comes in pairwise network form, which retains full information. FIC industries apply clustering techniques to provide a simpler but less informative classification that comes in the form of "industry codes" assigned to each firm (the result is a firm-year panel in which each firm is assigned an FIC industry code). This data is substantially less informative than the full-information TNIC data and we do not recommend it for most projects. The differences between TNIC and FIC data are outlined in the readme file below and in the Hoberg and Phillips (2016) paper (link below). FIC data comes in various granularities ranging from 100 industries to 500 industries.
Download the FIC database (all granularities): [Download FIC Data] [View Readme for FIC Data]
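To make the contrast with pairwise data concrete, the sketch below treats FIC data as a firm-year panel and derives each firm's peers as the other firms sharing its industry code in that year. The tuple layout `(gvkey, year, fic_code)` and all values are hypothetical and chosen only for illustration; the readme describes the real columns.

```python
from collections import defaultdict

# Hypothetical firm-year panel rows: (gvkey, year, fic_code). All values are invented.
PANEL = [
    (1001, 2020, 17),
    (1002, 2020, 17),
    (1003, 2020, 42),
    (1001, 2021, 17),
    (1002, 2021, 17),
]


def industry_members(panel):
    """Group firms by (year, industry code)."""
    members = defaultdict(set)
    for gvkey, year, code in panel:
        members[(year, code)].add(gvkey)
    return dict(members)


def fic_peers(panel, gvkey, year):
    """Return the other firms that share the given firm's industry code in that year."""
    for (member_year, _code), firms in industry_members(panel).items():
        if member_year == year and gvkey in firms:
            return firms - {gvkey}
    return set()  # Firm not present in the panel for that year.


print(fic_peers(PANEL, 1001, 2020))
print(fic_peers(PANEL, 1003, 2020))
```

With fixed codes, every firm in a group has the same set of peers and there is no score ranking among them, which is one way to see why this form carries less information than the pairwise network.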
* The following study provided key innovations to the creation of this data:
- Text-Based Network Industries and Endogenous Product Differentiation - Gerard Hoberg and Gordon Phillips, 2016, Journal of Political Economy, 124 (5), 1423-1465