Home > Business Databases > Getting a linking table from Compustat via WRDS Cloud using SSH and SAS

Getting a linking table from Compustat via WRDS Cloud using SSH and SAS

Researchers commonly need to combine data from multiple sources. Even when the sources are offered through the same platform such as Wharton Research Data Service (WRDS), this can still be a difficult process with many choices. Essentially, you need to find a common entity identifier in both sources; if none exists, find a reference table or other third source.

The help provided by WRDS is often sparse in detail and tailored for their more advanced users. I have previously written in detail about using WRDS Cloud and SAS to gain access to resources not available on the WRDS website, in the blog post Getting S&P 500 constituents from CRSP via WRDS Cloud using SSH and SAS.

In this post, I will suggest a way to link I/B/E/S data with Compustat North America data. I will summarise the points that are addressed in the aforementioned blog post.

Get a linking table from Compustat North America

You will need the following to begin.

  1. AUTHENTICATION. Your WRDS username and password, specifically an individual PGR or Academic login.
  2. SENDING COMMANDS. Windows users: the Putty program. Mac/Linux users: the Terminal app or command line.
  3. FILE TRANSFER. Windows users: WinSCP or FileZilla. Mac/Linux users: CyberDuck or FileZilla or command line.
  4. FILE EDITING: Notepad, Notepad++ or another text editor. Excel or other spreadsheet application.

The method is very similar to getting the S&P 500 constituents. Essentially, Compustat has a dataset called securities which lists all the identifiers for all the securities. It is in SAS format. It cannot be accessed directly by us, but we can write a SAS script to read its entire contents to a plain text file, then copy the file to our own PCs. This file is kept at location /wrds/comp/sasdata/d_na/security/security.sas7bdat for North American data and is updated often.

Your script to read this file should look like this:

options nosource nodate nocenter nonumber ps=max ls = max;
title ' ';
options source;
proc print data=comp.security noobs width=UNIFORM;

This is different from last time in a few ways.

  1. The ‘ls’ or ‘line size’ option has been changed from ’72’ to ‘max’. This is because there are more columns for Compustat securities than for S&P 500 constituents.
  2. The data table we are reading from is ‘data=comp.security’.
  3. The ‘labels’ option has been removed so we get the clean variable names as the header.
  4. The ‘width’ option is set to ‘UNIFORM’ so the columns are the same number of characters wide on each ‘page’.

The method is as follows; remember to refer to the aforementioned blog post.

  1. Save the SAS script to a new text file on your computer, compsecurity.sas or similar.
  2. Use WinSCP or equivalent to copy this file to your own space on WRDS Cloud.
  3. Use Putty or equivalent to run the script, log in and type qsas compsecurity.sas and press Enter.
  4. Wait 10 seconds and use WinSCP to download the generated files compsecurity.log and compsecurity.lst to your computer. The ‘lst’ file contains the data, if the program was successful (read the ‘log’ file to check).
  5. Use Notepad++ or equivalent to remove the blank lines from the top of compsecurity.lst and save it as compsecurity.txt – then check every 32,764 lines for a few more blank lines and repeat of the headers to be removed (these are ‘page breaks’).
  6. In Excel, open compsecurity.txt (via Data > Get External Data from Text) and use fixed column width option, carefully setting the locations for the column breaks, remembering to set most columns as Text format instead of General (particularly GVKEY and CUSIP). Save as CSV or Excel format.

Compustat securities via WRDS Cloud

Using the linking table

You will need the columns GVKEY and IBTIC (IBES Ticker).

  • If your Compustat and I/B/E/S data are in Stata format, import this table and consider joining the tables.
  • If your Compustat and I/B/E/S data are in Excel format, use VLOOKUP formulas (some more processing may be required).

Still to do: there is not necessarily a one-to-one match between these two identifiers. The same GVKEY will appear on more than one row, but will those rows contain more than one unique ticker? And is the data frequency the same in each source? You will need to find the answers to these questions out for yourself.

Advertisements
  1. No comments yet.
  1. No trackbacks yet.

Leave a Reply

Fill in your details below or click an icon to log in:

WordPress.com Logo

You are commenting using your WordPress.com account. Log Out / Change )

Twitter picture

You are commenting using your Twitter account. Log Out / Change )

Facebook photo

You are commenting using your Facebook account. Log Out / Change )

Google+ photo

You are commenting using your Google+ account. Log Out / Change )

Connecting to %s