Archive

Posts Tagged ‘WRDS’

Dissertation Research

19 October 2017 1 comment

When seeking to complete a dissertation in the area of Accounting & Finance, a key consideration is ‘Research Feasibility’. This can be summarised by the statement:

 

Can I obtain the data I require, in a timely manner, to successfully complete my research?

 

A typical one year MSc course would allocate 3 months at the end, to complete the disseration. The gathering of company financial quantitative data, (from sources such as: Thomson Reuters Datastream, S&P Capital IQ, Bloomberg Professional, Compustat via WRDS: Wharton Research Data Service) is fundamental to the success of the research.

 

What can go wrong?

 

  1. Data is not available:  It is not contained within the databases the university subscribes to. The years required are not covered. The student is off-campus and the data is only accessible on-campus (as is the case for Datastream and Bloomberg Professional).
  2. Research Proposal:  This may be too ambitious. For example, a student reads an accounting/finance journal article and decides to try to replicate all or part of the research contained within the article. This can be problematic, as the academic probably spent two or more years completing the research – greater than the time available for an MSc dissertation.
  3. Topic:  The choice of topic can be influenced by a desire to work in a particular area of finance. Unfortunately, this may lead to the key difficulty when conducting research – Data is Not Available.

Data is the foundation on which any analysis is based. Where this is difficult to obtain, time pressures may result, leading to the possibility of failure to submit the dissertation on time.

Whilst it could be argued that the difficulties experienced by students in working on their dissertation are part of the research process, as a Librarian, my approach is different: how can I be most helpful, in assisting the student to successfully complete their research?

 

Helpful Suggestions

 

  1. Pilot Project: Essentially this means establishing the best source – there could be more than one available. Also, how to search the source productively. Also, whether all the years of data required are covered.
  2. Seek Guidance:  This follows directly from point one above. It may be that the most efficient method (shortest time to collect what is required) is not known to the student. Guidance from a Librarian can demonstrate the best source and search method, drawing on years of experience in supporting student dissertation research.
  3. Explore Resources:  With so many sources available to students, the difficulty is often one of familiarity – knowing which databases are available and how they can be accessed. A Library web site is a good place to start. The example below is the subject guide for ‘Business and Management’, at the University of Manchester.

 

Database Guide

Business & Management Guide

 

One of the sections is  for ‘Specialist financial databases’. These are useful for dissertation research:

 

Financial Databases

Specialist Databases

 

Summary

 

Making the best use of resources by seeking guidance from Librarians and planning ahead (pilot study) can help to ensure a dissertation is successfully completed. The key factor being, the ability to secure data, on which to base any analysis.

 

Previous related post, in the Library Research Plus blog:

Research Feasibility [18 February 2015]

 

 

Advertisements

Updated conditional statements feature in WRDS

4 October 2017 Leave a comment
WRDS conditional statements

An example of nested conditional statements, taken from WRDS help pages.

Wharton Research Data Service (WRDS) is praised for offering a consistent user interface across the many databases that it houses, so, when a new search feature is released, it will be available for all users. They have updated the conditional statements builder and made it more powerful and easy to use. We will look at Compustat Global in this example.

What is the conditional statement builder?

To quote WRDS support, you can “build unlimited conditional statements using the list of variables, comparison operators, desired values, and logical operators”. The conditional statement builder still forms part of Step 2: Apply your company codes, but you will need to come back after completing Step 3 Query variables before it will work.

How do I use the conditional statement builder?

Search the entire database

As before, you can choose to select companies individually, upload a list, or “Search the entire database”. Choose this last option if you wish to use the conditional statement builder to reduce the scope of your search using any query variable. For example, you may want to search for companies located in one country, or those with total assets over a threshold, or those in a particular sector. Note that you will need to include the relevant query variable in Step 3: Query variables before applying the conditional statement builder.

Select query variables

Scroll back up to Step 2, click “Activate Conditional Statement Builder”.

Activate conditional statement builder

Add rules one at a time, selecting one of the chosen query variables and a condition (equals, not equals, greater than, begins with, and so on). You can choose to join with AND or OR logical operators. You can add groups of statements and join them.

Build your statements

What help is available?

You must take care to use the right units, scale and codes. To check the acceptable values to use, check the Variable Descriptions tab at the top of the page, or perform a small search and look at the kind of results it gives.

There is a short help page by WRDS on conditional statements, click the link labelled “How does this work?” above the builder.

Categories: Business Databases Tags:

Reshaping wide and long data in Stata

29 July 2016 1 comment
Reshape from long to wide via Stata

Reshape from long data (left) to wide data (right) via Stata

Downloading data from a platform like WRDS can give you the data you need but not in the layout you desire. This post shows you how to change from long data to wide data; that is, from a stacked column of companies to a column for each company, using date as unique identifier. It requires the use of Stata, available on PCs at the Library Finance Zone. (Version 14 was used in this test but it should work just as well on others.)

Preparation

Excel data (long)

Long data in Excel

In Excel, keep only the essential columns (you can add the others back later using VLOOKUP). In this example, some kind of ID (like PERMNO), Date and Return.The first row of the sheet contains the variable names. Save this file as type CSV (Comma delimited) *.csv, ignoring any warnings about features or multiple sheets. Close the file.

Stata import text delimited

Import text data delimited in Stata

Open Stata, click File > Import > Text data (delimited, *.csv). Click the Browse button to select your CSV file the click OK. You will see a command like the following was performed: import delimited P:\Reshape\long.csv. You can look at a summary of the data by typing list in the Command window.

Performing the command

The reshape command in Stata works in two modes.

  • Wide: instead of repeating dates and companies in long columns, create a new column for return for each company so that there is only one row for each date. This is also called long-to-wide. (You could instead have a column for each year with one row per company.)
  • Long: the reverse process, also called wide-to-long.

Here we are going from wide-to-long, with Date as the unique variable (i). We will lose the current Return column. Each new column will be Return for each company ID (j), with a name that is Return followed by that ID, such as Return100500, Return222622. The command to type in the Command window is:

reshape wide return, i(date) j(id)
Stata reshape

Reshape in Stata (click to enlarge)

After this is typed, you will get a summary of what just happened. If the text is in red, an error occurred, often if there are other variables that were not mentioned in the reshape command. You can list more than one variable to be reshaped (the command might then be reshape wide return volume price, i(date) j(id) for example).

Again, you can type the list command to see the data.

Finishing

Stata export

Export to Excel from Stata

You probably wish to export the data to Excel format to continue with your work. To do this, click File > Export > Data to Excel spreadsheet (*.xls, *.xlsx). Click the Save as button to choose where you wish to save to, and if you want older (pre-2007) or current Excel format. Tick the box “Save variable names to first row in Excel file” and press OK. You will see a command generated like this:  export excel using “P:\Reshape\reshaped.xlsx”, firstrow(variables)

Excel results

Now you have wide data in Excel!

You can now open the new file in Excel to see the reshaped data set.

More information

The Institute for Digital Education and Research at UCLA have two excellent example pages, one for reshaping long-to-wide and one for reshaping wide-to-long.

 

 

Categories: Data Analysis Tags: , ,

Getting S&P 500 constituents from CRSP via WRDS Cloud using SSH and SAS

2 June 2016 2 comments

The title of this blog post uses far more abbreviations than I would like! Before I get into the purpose of this blog post I shall define them.

  • S&P 500: Standards & Poor’s 500, an index of the 500 largest US companies (by market capitalisation) on the NASDAQ or NYSE stock indices.
  • CRSP: The Center for Research In Security Prices, a database of end-of-day and end-of-month prices for US equity indices.
  • WRDS: Wharton Research Data Services, a platform of financial and company databases with a common user interface, including CRSP.
  • SSH: Secure Socket Shell, a network protocol that provides a secure connection between remote computers, where commands are typed out.
  • SAS: Statistical Analysis System, a software tool for data analysis that can run on remote servers or your local PC.

Outline of this blog post

There are 500 companies that make up the S&P 500 equity index. This list can change from month to month as the market capitalisation of each company changes and the list needs to comprise the largest 500. When conducting historical analysis against this index it is important to have the right list of constituents for that period. There are various ways to obtain historical constituent lists [also link2],  but one way not previous discussed on this blog is via the CRSP master list, available on the WRDS Cloud to SAS programs executed over SSH or PC-SAS. This post shows how to log in remotely, write a short SAS script and download the master list.

Edit 6 June 2016: This post explains how to get the constituent list of equities by PERMNO identifier, a five-digit number that never changes for a company, given by CRSP. You can get the equivalent GVKEY list from Compustat quite easily and from the web-based WRDS access. Armed with the PERMNO list it is easier to interact with CRSP and for example get stock returns for the S&P500 constituents. [Thanks to Alex Taylor for contributing this information.]

Please note, the SSH access route to WRDS is only available to academic research staff and PhD students – sorry to MSc and undergraduates!

Getting into WRDS Cloud using SSH/SFTP/SCP

You will need to use a tool like WinSCP to see the WRDS Cloud remote computer, to “secure copy” files between it and your PC. WinSCP is a free graphical tool, familiar if you have used FTP before; it gives a view of a directory on your computer (left) and a view of a directory on the server (right). (FileZilla and Cyberduck are alternatives for Windows or Mac.)

WinSCP login

The WinSCP login screen. My user name is blacked out for security purposes.

Login with the following details.

  • File protocol: SFTP
  • Host name:  wrds-cloud.wharton.upenn.edu
  • User name and Password: your regular WRDS account details (PhD and staff only)
  • When prompted, choose “Yes” to save the key to the local machine.
WinSCP files

The standard WinSCP view of files on your computer (left) and files on the server (right). The red ringed icon is the Create New Directory button. I have used it to create a directory called “dsp500list”.

Your personal file space on the server will appear on the right, initially none. The directory labelled “..” is a short-cut meaning “the directory one up from this one (parent)”. The path of this directory is “/home/manchester/<wrdsusername>”. Your new SAS script will be saved here. Create a new directory on the server to save your work in, called “dsp500list” (press F7 or the icon hightlighted above).

The WinSCP tool will allow you to easily copy files but will not let you execute commands (run programs). For this, you will need a tool like Putty (or just the Terminal on Mac). Log in with the details below.

  • Connection type: SSH
  • Host name:  wrds-cloud.wharton.upenn.edu
  • User name and Password: your regular WRDS account details (PhD and staff only)
  • When prompted, choose “Yes” to save the key to the local machine.
Putty login

The Putty configuration and terminal screens. You will be prompted for user name and password later (press Enter after each). Commands are typed at the “$” prompt at the bottom of the screen.

You will need to use a few Unix commands to get about here. The most important one to start with is “chdir” to change directory. Type “chdir dsp500list” to change the working directory to the one you created above. We will put the SAS script and results file here.

Writing a SAS script

The SAS script to copy, saved to file dsp500list_0.sas:


options nosource nodate nocenter nonumber ps=max ls = 72;
title ' ';
options source;
proc print data=crsp.dsp500list noobs label;

It is easiest to write this on your PC in your chosen text editor then secure copy it to the “dsp500list” directory via WinSCP. (You may choose instead to type it directly in the Putty terminal using the command line editor “vim“, but explaining that is out of scope of this post.)

WinSCP copy

On the right of WinSCP, double-click the “dsp500list” directory to enter it, then drag the new SAS script from the local hard drive (left) into it (select “Copy” not “Move”).

This script accesses the SAS dataset stored at the location “/wrds/crsp/sasdata/q_indexes/dsp500list.sas7bdat”, the daily S&P500 list changes. You could use the monthly file “msp500list” instead by changing the “data=crsp.dsp500list” part of the script accordingly.

Getting the data out

Run qsas dsp500list_0.sas on the Putty command line to execute the SAS query. It will take a few seconds but then there will be two files produced.

  1. dsp500list_0.log – a log file of the working of SAS, telling you exactly what is happening and any error messages.
  2. dsp500list_0.lst – the output text file, effectively the same data as the daily S&P 500 list.
QSAS results

Running the command “qsas dsp500list_0.sas” from the correct directory in Putty will execute the script. The resulting files can be seen with the “ls” command in Putty or by refreshing in WinSCP (green round arrows icon).

The output text file dsp500list_0.lst begins like this, although in plain text with fixed width columns and some blank lines at the start.

CRSP Permanent Number Date when the stock included in S&P500 Index Date when the stock excluded from S&P500 Index
10006 19570301 19840718
10030 19570301 19690108
10049 19251231 19320101

This file can be used in your local SAS scripts, perhaps with a little work converting it back to a true sas7bdat file — this exercise is left for the reader to complete!

Summary

This is a lot to take on if you have never worked with Unix systems before, and I would advise the regular WRDS web interface for most users. If you need to use WRDS Cloud to run your SAS scripts to directly access files such as the master CRSP lists, this post shows you how to get started.

See also the WRDS Cloud manual and Intoduction to WRDS Unix system.

WRDS new interface for selecting output variables

6 April 2016 Leave a comment

Following on from the refresh last summer, the Wharton WRDS web portal has been updated again. It is now easier to find and select the output variables you want to use in your request.

WRDS variables 2016

Finding and selecting output variables in WRDS is now much easier

The screen above uses the CRSP monthly stock database as an example. In the ‘Query Variables’ step, there are now multiple tabs containing categories of variables (which used to be listed in one column of boxes). The first tab contains all the variables, the subsequent tabs are categories like identifiers, balance sheet items, profit & loss items, and so on. There are orange and white arrows to scroll left and right between the tabs.

There is a search field in the first tab where you can type the name of the variable or any substring of it. For example, type “code” to get any variable with “code” in its name, it does not have to begin with “code”. Tick each match that you want to use and it appears in the column on the right, labelled ‘Selected. You can browse through all the tabs and see the same master list of Selected variables appear in each. This is much easier than using the browser’s ‘CTRL+F’ find function that was required before.

New feature: Saved Codelists

WRDS saved lists

You can save lists of codes to use later

Another new feature is the Saved Codelists. Usually we recommend people upload a text file containing the codes for the companies (or other entities) that they wish to run a query on. WRDS lets you type in or select companies one-by-one as well (useful for looking up a small number of companies). There is a blue link to ‘Code Lookup’ in most of the WRDS databases which lets you type the whole or part of the name of a company to find its code.

The new feature is that you can save these short lists to use later. The screen above shows Bureau van Dijk Amadeus. Simply tick the first box highlighted in the image above (to the right of the codes) and type a name for the list. On a later visit to WRDS, you can find the list in the second box highlighted above, ‘Select Saved Codelists’.

Insider trading

16 July 2015 Leave a comment

Definition of Insider Trading (from Investopedia):

The buying or selling of a security by someone who has access to material, nonpublic information about the security.

Companies in any country

Use ThomsonONE.com

  • Search for the company you want.
  • Click on the Company Analysis > Ownership folder, and select Insider Holdings or Insider Transactions report.
  • Click “Click here for full history” for historic holdings (older than 24 months).

Use Capital IQ

  • Search for the company you want.
  • On the long menu down the left, choose Investors > Public Ownership, then the Insider Trading tab.
  • Only the last 12 months are available.

Use Bloomberg (available in the Finance Zone and Precinct Library)

  • Try Bloomberg function PHDC for a specific company (e.g. VOD LN <F8> PHDC <GO>) and select Insiders, sorted by size.
  • You can also look up the biographies of executies/directors (BIO) and then choose Reported Holdings.
  • For stocks that have experienced the most purchases and sales by insiders, type: INSD and hit Go. Use the fields at the top of the screen to select country, time period, type of transaction. Click on a stock to display a graph of its historical prices with date of insider transactions.
  • For more info, type: insider and hit the green Help key <F1>.

Companies in the U.S. only

Use WRDS

  • Choose Thomson Reuters database, and then Insiders Data. Search by company ticker. Check the variables you want in your report.

Others

  • Insider filings with the U.S. SEC are available through Edgar. Enter your company name or ticker symbol or CIK and Include Ownership forms.
  • Insider transactions are available by issuer and reporting owner. Take a look at the website SecForm4.Com

Intraday quotes for equities and indices

Global source

Use Bloomberg.

For Equities and Indices:

Type the stock’s ticker symbol then hit [EQUITY] and hit <GO>

Choose: GIP – Intraday Price Graph

You can change the date and the time interval.

or

Type the index symbol then hit [INDEX], and then type GIP or GIT and hit <GO>

For example, to get intraday price intervals for the S&P 500:

SPX [INDEX] GIT and hit <GO>

Note: limited historical intraday data is available approx 900 trading days for GIP and 50 trading days for GIT.

Other sources

For US stocks, go to Wharton Research Data Services (WRDS) TAQ (New York Stock Exchange Trade and Quote, or NYSE TAQ.

For historical (1983-1992) NYSE and AMEX tick-by-tick data and for historical NASDAQ (1987-1992) tick-by-tick data, click on ISSM – Institute for the Study of Security Markets

TICKDATA provides a commercial service offering intraday data from the UK and other exchanges.

Finally, Thomson Reuters Tick History database offers global intra-day Time and Sales, Time and Quotes and Market Depth content from the 1st January 1996 for an extensive range of Equities, including their derivatives. Access to this resource is limited; please contact us.