Saturday, July 15, 2006

Where Do You Get Data For Research?

After my recent post on academic research, one of my readers asked:
"Where do you get the data for most of your studies? Do you use the same sources most of the time, or is it different depending on the study?"
Those are a couple of great questions. The answer to your second question is "Yes". There's a lot of variety in the types of data that finance researchers use in their studies. Not surprisingly, the data used depends on the research questions you're trying to answer. For example, if you work in a narrowly defined area, your research projects could easily end up using the same data again and again.

Some studies use commonly available data sources, some use hand-collected data, and some use both. There are quite a few "standard" data sets that accounting and finance academic researchers use on a regular basis (while the two fields are different in many ways, there's a lot of overlap between finance and financial accounting). Some of the most common data sets include:
  • The Center for Research in Security Prices (a.k.a. "CRSP") data compiled by the University of Chicago, which has daily price, distribution, volume, and return data for over 5,000 publicly traded stocks back to the 1960s;
  • Standard and Poor's COMPUSTAT database of annual and quarterly financial statement data;
  • Thomson Financial's Institutional Brokers' Estimate System (IBES) database of historical individual and consensus analyst forecasts;
  • Thomson's other databases covering merger and acquisition and securities issue transactions, insider trades, institutional holdings, and bankruptcies;
  • The NYSE Trade and Quote (TAQ) database of intraday stock transaction (i.e., trade and quotation) data.
There are also quite a few others. I've used (or am currently using) all of the above except the TAQ data in one project or another (the TAQ data is used mostly in market microstructure research, and I'm a corporate finance guy).

The advantage of data sources that are on computer-readable media is that they can be "sliced and diced" in multiple ways if you're a good enough programmer. This makes it possible to do large-scale studies of things like which factors determine the market reactions to insider trades or whether cap-weighted or fundamental-weighted indexes provide better risk-adjusted returns. Some studies, in fact, use only computer-ready data. Not surprisingly, the people who do these kinds of studies are usually either pretty good programmers themselves (usually in SAS, C, or some other package or language), or have graduate students who can grind the data for them.
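To give a flavor of what "slicing and dicing" looks like, here is a minimal Python sketch of the cap-weighted versus fundamental-weighted comparison mentioned above. Everything in it is made up for illustration: the tickers, the numbers, the choice of book value as the "fundamental", and the helper name `weighted_return` are all my assumptions, not fields from CRSP or COMPUSTAT. It also computes only a raw one-period return; a real study would use many periods and adjust for risk.

```python
# Toy data: hypothetical tickers and values, NOT drawn from any real database.
stocks = [
    {"ticker": "AAA", "market_cap": 600.0, "book_value": 200.0, "ret": 0.10},
    {"ticker": "BBB", "market_cap": 300.0, "book_value": 300.0, "ret": 0.02},
    {"ticker": "CCC", "market_cap": 100.0, "book_value": 500.0, "ret": -0.05},
]


def weighted_return(rows, weight_key):
    """Return the one-period index return when each stock is weighted
    by its share of the total of `weight_key` across all stocks."""
    total = sum(row[weight_key] for row in rows)
    return sum(row[weight_key] / total * row["ret"] for row in rows)


# Same stocks, same returns; only the weighting scheme changes.
cap_weighted = weighted_return(stocks, "market_cap")
fundamental_weighted = weighted_return(stocks, "book_value")

print(f"cap-weighted return:         {cap_weighted:.4f}")
print(f"fundamental-weighted return: {fundamental_weighted:.4f}")
```

The point is only that once the data is machine-readable, switching the weighting scheme is a one-word change rather than a new data collection effort.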

But like anything else, this (relative) ease of access makes it harder to find truly interesting new ideas that merely use these data sources. So, many other studies also use hand-collected data (this is particularly true in corporate finance research). The hand collection of data can be a long and tedious process, but it has some real advantages.

For example, I am working with a graduate student on the relationship between corporate governance and the success of a certain type of corporate decision. Before we can do any analysis, he first has to gather and code (yes, I'm management, and he's operations) board, ownership, and compensation data on about 500-600 transactions. This will involve him spending about 150-200 hours reading through corporate proxy statements to determine firms' board composition, ownership structure, and compensation structures (that's where this data is typically found).

Once we have the governance data, we can write programs to pull the matching analyst, financial statement, and stock market data from the databases mentioned above.
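The matching step is conceptually simple. Here is a rough sketch in Python, assuming both the hand-collected file and the database extract are keyed by a firm identifier and a year. The column names (`firm_id`, `board_size`, `total_assets`, and so on), the sample rows, and the helper names are all hypothetical; the real databases use their own identifiers and layouts, and linking them is usually messier than this.

```python
import csv
import io

# Hypothetical hand-collected governance data (in practice, coded from proxy statements).
GOVERNANCE_CSV = """firm_id,year,board_size,pct_independent
1001,2004,9,0.67
1002,2004,7,0.43
1003,2005,11,0.82
"""

# Hypothetical extract standing in for a financial statement database.
FINANCIALS_CSV = """firm_id,year,total_assets,net_income
1001,2004,5200.0,310.0
1003,2005,870.0,-12.5
1004,2005,140.0,9.1
"""


def read_rows(text):
    """Parse CSV text into a list of dicts (all values stay as strings)."""
    return list(csv.DictReader(io.StringIO(text)))


def merge_on_firm_year(hand_rows, db_rows):
    """Left join: keep every hand-collected row and attach the database
    fields when a row with the same (firm_id, year) exists."""
    lookup = {(row["firm_id"], row["year"]): row for row in db_rows}
    merged = []
    for row in hand_rows:
        match = lookup.get((row["firm_id"], row["year"]))
        combined = dict(row)
        combined["matched"] = match is not None
        if match:
            combined.update(match)
        merged.append(combined)
    return merged


merged = merge_on_firm_year(read_rows(GOVERNANCE_CSV), read_rows(FINANCIALS_CSV))

# Firms we coded by hand but could not find in the database extract.
unmatched = [row["firm_id"] for row in merged if not row["matched"]]
print(unmatched)
```

Keeping the unmatched rows (rather than silently dropping them) matters here, since every hand-collected observation represents real hours of work and is worth a second look before it is discarded.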

Gathering this type of data is slow going, but that has its advantages. The cost of data acquisition serves as a significant barrier to entry, which makes it difficult for anyone else to easily replicate our idea. So, if we play our cards right, we could conceivably get 2-3 (or even more) papers out of the data set before anyone else gets something similar. In fact, the data will also form the basis for his dissertation, and he will probably end up expanding it (and using it to examine other issues) for the next couple of years.
