Archive

Posts Tagged ‘SSH’

Getting a linking table from Compustat via WRDS Cloud using SSH and SAS

27 June 2017 Leave a comment

Researchers commonly need to combine data from multiple sources. Even when the sources are offered through the same platform such as Wharton Research Data Service (WRDS), this can still be a difficult process with many choices. Essentially, you need to find a common entity identifier in both sources; if none exists, find a reference table or other third source.

The help provided by WRDS is often sparse in detail and tailored for their more advanced users. I have previously written in detail about using WRDS Cloud and SAS to gain access to resources not available on the WRDS website, in the blog post Getting S&P 500 constituents from CRSP via WRDS Cloud using SSH and SAS.

In this post, I will suggest a way to link I/B/E/S data with Compustat North America data. I will summarise the points that are addressed in the aforementioned blog post.

Get a linking table from Compustat North America

You will need the following to begin.

  1. AUTHENTICATION. Your WRDS username and password, specifically an individual PGR or Academic login.
  2. SENDING COMMANDS. Windows users: the Putty program. Mac/Linux users: the Terminal app or command line.
  3. FILE TRANSFER. Windows users: WinSCP or FileZilla. Mac/Linux users: CyberDuck or FileZilla or command line.
  4. FILE EDITING: Notepad, Notepad++ or another text editor. Excel or other spreadsheet application.

The method is very similar to getting the S&P 500 constituents. Essentially, Compustat has a dataset called securities which lists all the identifiers for all the securities. It is in SAS format. It cannot be accessed directly by us, but we can write a SAS script to read its entire contents to a plain text file, then copy the file to our own PCs. This file is kept at location /wrds/comp/sasdata/d_na/security/security.sas7bdat for North American data and is updated often.

Your script to read this file should look like this:

options nosource nodate nocenter nonumber ps=max ls = max;
title ' ';
options source;
proc print data=comp.security noobs width=UNIFORM;

This is different from last time in a few ways.

  1. The ‘ls’ or ‘line size’ option has been changed from ’72’ to ‘max’. This is because there are more columns for Compustat securities than for S&P 500 constituents.
  2. The data table we are reading from is ‘data=comp.security’.
  3. The ‘labels’ option has been removed so we get the clean variable names as the header.
  4. The ‘width’ option is set to ‘UNIFORM’ so the columns are the same number of characters wide on each ‘page’.

The method is as follows; remember to refer to the aforementioned blog post.

  1. Save the SAS script to a new text file on your computer, compsecurity.sas or similar.
  2. Use WinSCP or equivalent to copy this file to your own space on WRDS Cloud.
  3. Use Putty or equivalent to run the script, log in and type qsas compsecurity.sas and press Enter.
  4. Wait 10 seconds and use WinSCP to download the generated files compsecurity.log and compsecurity.lst to your computer. The ‘lst’ file contains the data, if the program was successful (read the ‘log’ file to check).
  5. Use Notepad++ or equivalent to remove the blank lines from the top of compsecurity.lst and save it as compsecurity.txt – then check every 32,764 lines for a few more blank lines and repeat of the headers to be removed (these are ‘page breaks’).
  6. In Excel, open compsecurity.txt (via Data > Get External Data from Text) and use fixed column width option, carefully setting the locations for the column breaks, remembering to set most columns as Text format instead of General (particularly GVKEY and CUSIP). Save as CSV or Excel format.

Compustat securities via WRDS Cloud

Using the linking table

You will need the columns GVKEY and IBTIC (IBES Ticker).

  • If your Compustat and I/B/E/S data are in Stata format, import this table and consider joining the tables.
  • If your Compustat and I/B/E/S data are in Excel format, use VLOOKUP formulas (some more processing may be required).

Still to do: there is not necessarily a one-to-one match between these two identifiers. The same GVKEY will appear on more than one row, but will those rows contain more than one unique ticker? And is the data frequency the same in each source? You will need to find the answers to these questions out for yourself.

Getting S&P 500 constituents from CRSP via WRDS Cloud using SSH and SAS

2 June 2016 2 comments

The title of this blog post uses far more abbreviations than I would like! Before I get into the purpose of this blog post I shall define them.

  • S&P 500: Standards & Poor’s 500, an index of the 500 largest US companies (by market capitalisation) on the NASDAQ or NYSE stock indices.
  • CRSP: The Center for Research In Security Prices, a database of end-of-day and end-of-month prices for US equity indices.
  • WRDS: Wharton Research Data Services, a platform of financial and company databases with a common user interface, including CRSP.
  • SSH: Secure Socket Shell, a network protocol that provides a secure connection between remote computers, where commands are typed out.
  • SAS: Statistical Analysis System, a software tool for data analysis that can run on remote servers or your local PC.

Outline of this blog post

There are 500 companies that make up the S&P 500 equity index. This list can change from month to month as the market capitalisation of each company changes and the list needs to comprise the largest 500. When conducting historical analysis against this index it is important to have the right list of constituents for that period. There are various ways to obtain historical constituent lists [also link2],  but one way not previous discussed on this blog is via the CRSP master list, available on the WRDS Cloud to SAS programs executed over SSH or PC-SAS. This post shows how to log in remotely, write a short SAS script and download the master list.

Edit 6 June 2016: This post explains how to get the constituent list of equities by PERMNO identifier, a five-digit number that never changes for a company, given by CRSP. You can get the equivalent GVKEY list from Compustat quite easily and from the web-based WRDS access. Armed with the PERMNO list it is easier to interact with CRSP and for example get stock returns for the S&P500 constituents. [Thanks to Alex Taylor for contributing this information.]

Please note, the SSH access route to WRDS is only available to academic research staff and PhD students – sorry to MSc and undergraduates!

Getting into WRDS Cloud using SSH/SFTP/SCP

You will need to use a tool like WinSCP to see the WRDS Cloud remote computer, to “secure copy” files between it and your PC. WinSCP is a free graphical tool, familiar if you have used FTP before; it gives a view of a directory on your computer (left) and a view of a directory on the server (right). (FileZilla and Cyberduck are alternatives for Windows or Mac.)

WinSCP login

The WinSCP login screen. My user name is blacked out for security purposes.

Login with the following details.

  • File protocol: SFTP
  • Host name:  wrds-cloud.wharton.upenn.edu
  • User name and Password: your regular WRDS account details (PhD and staff only)
  • When prompted, choose “Yes” to save the key to the local machine.
WinSCP files

The standard WinSCP view of files on your computer (left) and files on the server (right). The red ringed icon is the Create New Directory button. I have used it to create a directory called “dsp500list”.

Your personal file space on the server will appear on the right, initially none. The directory labelled “..” is a short-cut meaning “the directory one up from this one (parent)”. The path of this directory is “/home/manchester/<wrdsusername>”. Your new SAS script will be saved here. Create a new directory on the server to save your work in, called “dsp500list” (press F7 or the icon hightlighted above).

The WinSCP tool will allow you to easily copy files but will not let you execute commands (run programs). For this, you will need a tool like Putty (or just the Terminal on Mac). Log in with the details below.

  • Connection type: SSH
  • Host name:  wrds-cloud.wharton.upenn.edu
  • User name and Password: your regular WRDS account details (PhD and staff only)
  • When prompted, choose “Yes” to save the key to the local machine.
Putty login

The Putty configuration and terminal screens. You will be prompted for user name and password later (press Enter after each). Commands are typed at the “$” prompt at the bottom of the screen.

You will need to use a few Unix commands to get about here. The most important one to start with is “chdir” to change directory. Type “chdir dsp500list” to change the working directory to the one you created above. We will put the SAS script and results file here.

Writing a SAS script

The SAS script to copy, saved to file dsp500list_0.sas:


options nosource nodate nocenter nonumber ps=max ls = 72;
title ' ';
options source;
proc print data=crsp.dsp500list noobs label;

It is easiest to write this on your PC in your chosen text editor then secure copy it to the “dsp500list” directory via WinSCP. (You may choose instead to type it directly in the Putty terminal using the command line editor “vim“, but explaining that is out of scope of this post.)

WinSCP copy

On the right of WinSCP, double-click the “dsp500list” directory to enter it, then drag the new SAS script from the local hard drive (left) into it (select “Copy” not “Move”).

This script accesses the SAS dataset stored at the location “/wrds/crsp/sasdata/q_indexes/dsp500list.sas7bdat”, the daily S&P500 list changes. You could use the monthly file “msp500list” instead by changing the “data=crsp.dsp500list” part of the script accordingly.

Getting the data out

Run qsas dsp500list_0.sas on the Putty command line to execute the SAS query. It will take a few seconds but then there will be two files produced.

  1. dsp500list_0.log – a log file of the working of SAS, telling you exactly what is happening and any error messages.
  2. dsp500list_0.lst – the output text file, effectively the same data as the daily S&P 500 list.
QSAS results

Running the command “qsas dsp500list_0.sas” from the correct directory in Putty will execute the script. The resulting files can be seen with the “ls” command in Putty or by refreshing in WinSCP (green round arrows icon).

The output text file dsp500list_0.lst begins like this, although in plain text with fixed width columns and some blank lines at the start.

CRSP Permanent Number Date when the stock included in S&P500 Index Date when the stock excluded from S&P500 Index
10006 19570301 19840718
10030 19570301 19690108
10049 19251231 19320101

This file can be used in your local SAS scripts, perhaps with a little work converting it back to a true sas7bdat file — this exercise is left for the reader to complete!

Summary

This is a lot to take on if you have never worked with Unix systems before, and I would advise the regular WRDS web interface for most users. If you need to use WRDS Cloud to run your SAS scripts to directly access files such as the master CRSP lists, this post shows you how to get started.

See also the WRDS Cloud manual and Intoduction to WRDS Unix system.