From specialist databases to business literature, Business Research Plus provides advice and tips based on The University of Manchester Library Business Data Service resources and expertise. See our About page for more details.
- Library Research Plus provides expert insight from The University of Manchester Library’s Research Services for researchers in all disciplines including business.
- My Learning Essentials, the Library’s award-winning skills progamme includes online resources with a wealth of useful tips on searching, referencing, writing to support personal and professional development.
Exploring our resources – try the Business and Management Resources page (Subject Guides), scroll down to category or tag cloud in the right-hand column or search this blog (top right). For latest news see @UML_BDS Follow @UML_BDS
Downloading data from a platform like WRDS can give you the data you need but not in the layout you desire. This post shows you how to change from long data to wide data; that is, from a stacked column of companies to a column for each company, using date as unique identifier. It requires the use of Stata, available on PCs at the Library Finance Zone. (Version 14 was used in this test but it should work just as well on others.)
In Excel, keep only the essential columns (you can add the others back later using VLOOKUP). In this example, some kind of ID (like PERMNO), Date and Return.The first row of the sheet contains the variable names. Save this file as type CSV (Comma delimited) *.csv, ignoring any warnings about features or multiple sheets. Close the file.
Open Stata, click File > Import > Text data (delimited, *.csv). Click the Browse button to select your CSV file the click OK. You will see a command like the following was performed: import delimited P:\Reshape\long.csv. You can look at a summary of the data by typing list in the Command window.
Performing the command
The reshape command in Stata works in two modes.
- Wide: instead of repeating dates and companies in long columns, create a new column for return for each company so that there is only one row for each date. This is also called long-to-wide. (You could instead have a column for each year with one row per company.)
- Long: the reverse process, also called wide-to-long.
Here we are going from wide-to-long, with Date as the unique variable (i). We will lose the current Return column. Each new column will be Return for each company ID (j), with a name that is Return followed by that ID, such as Return100500, Return222622. The command to type in the Command window is:
reshape wide return, i(date) j(id)
After this is typed, you will get a summary of what just happened. If the text is in red, an error occurred, often if there are other variables that were not mentioned in the reshape command. You can list more than one variable to be reshaped (the command might then be reshape wide return volume price, i(date) j(id) for example).
Again, you can type the list command to see the data.
You probably wish to export the data to Excel format to continue with your work. To do this, click File > Export > Data to Excel spreadsheet (*.xls, *.xlsx). Click the Save as button to choose where you wish to save to, and if you want older (pre-2007) or current Excel format. Tick the box “Save variable names to first row in Excel file” and press OK. You will see a command generated like this: export excel using “P:\Reshape\reshaped.xlsx”, firstrow(variables)
You can now open the new file in Excel to see the reshaped data set.
One of the most useful Excel functions I help students with is VLOOKUP – extremely useful to automatically bring in data from one table to another. If you have company accounts data from Compustat Global (via WRDS) for multiple companies in multiple currencies, you will need to bring in the change rate for each currency to show every field in US Dollars.
|1||Company||Currency||Total Assets Local||Exchange rate||Total Assets USD|
|2||Co. X||AUD||100||=VLOOKUP(B2, currencies, 2, FALSE)||=C2*D2|
|3||Co. Y||PLN||400||=VLOOKUP(B3, currencies, 2, FALSE)||=C3*D3|
Where ‘currencies’ is a named range elsewhere in the workbook:
The VLOOKUP command looks for the currency code in the first column and returns the rate from the second column (use ‘FALSE’ to ensure an exact match). The conversion is simply a multiplication. (Note that you can convert from USD to USD at a rate of 1.0.) But what about making sure that the exchange rate is correct for the date of the data, and what if you have data from multiple years?
Performing VLOOKUP when matching a pair of variables
Thanks to Professor Marie Dutordoir for suggesting this kind of technique!
Usually you will want to look at accounts data over several years and for several companies. If you are working with several currencies then you will need a more complicated solution to this VLOOKUP exchange rate table, considering a day/month/year for each rate-currency pair. You do not need a second lookup table or other Excel functions, you just need to have another variable of date and a variable of currency-date pairs.
The new column F (mapping) is formed from currency and year, so F2 contains “=G2&H2” (see image above). The ‘currencies’ named range now begins with a different first column produced in the same way (see image below). A VLOOKUP always matches on the first column of its range, and now this one pulls the rate from the fourth column, so rate is “=VLOOKUP(F2, currencies, 4, FALSE)” for row 2.
In the ‘currencies’ named range, I have combined the exchange rate calculation and lookup into one formula (column K). Repeat this for each data type that you wish to convert the currency for. Remember to have an entry for USD at rate 1 for each year.
This example only has the average exchange rate for each year, you may wish to extend this approach for monthly or even daily rates. If you do this, you will need to take extra care with the format of the mapping cells, but this exceeds the scope of the post.
Where to find historical exchange rates?
- Datastream (as shown in the screen above).
- The documentation for Compustat suggests using WRDS Cloud, but this requires knowledge of Unix and SAS, similar to the S&P 500 constituents example.
- Other sources of historical exchange rates (last updated July 2015).
Identifiers can take many forms (e.g. Company Name, Ticker, CUSIP, SEDOL, Datastream Code, ISIN) and be national, such as CUSIP for US / Canadian companies or international, such as ISIN (International Security Identification Number). The ISIN code incorporates the national identifier. For example, for Tesco PLC: GB0008847096, which includes the SEDOL (UK company identifier) code: 0884709.
Company Lists and Datastream
In seeking to further their dissertation research, MSc students often make use of more than one research database. Consequently, a typical scenario involves a request to import a list of company identifiers into Datastream, then conducting further searches to secure data, prior to analysis. The ‘Create List (From Range)’ function is useful in this regard.
Create List (From Range)
This function within the Excel Add-In version of Datastream is extremely effective when working with lists (up to 5,000 Series [companies] per list). This enables Company Identifiers to be copied and pasted into Excel and saved as a List on the specific computer being used (Store List Locally) or saved to the Datastream mainframe computer (Upload List) – accessible from computers with Datastream installed.
The code generated to represent the list of companies can then be entered directly into a ‘Static‘ or ‘Time Series‘ request search screen in Excel. Hence, one or more Datatypes can be obtained for an entire list of companies and the results would be displayed in a single Excel sheet.
Company Identifiers which work with the ‘Create List (From Range)’ function include: ISIN, Ticker, Datastream Code. Both Company Name and CUSIP don’t work, even though CUSIP is meant to.
Within Excel, with the Datastream tab selected, enter (paste in) the company identifiers. Next, Select the identifiers (ISINs below) and then click on ‘Create List (From Range)’. To illustrate the process, a short list of two US companies will be used (Apple, Microsoft) – this could of course be a list of many hundreds of companies.
From the ‘List Creation for Excel’ dialog box, the default option is ‘Store List Locally’ (on the computer being used at the time), with ‘Upload List’ as an option.
This generates a confirmation message – click ‘OK’.
The ‘List File Name’ [ New_0032.LLT ] is entered in the Series/Lists field in this Static Request (as at 03/06/2016) to locate: Company Name [NAME], CUSIP [WC06004], Ticker [WC05601], Datastream Code [DSCD] and Market Value [MV].
Results: widen columns as necessary, to display data.
It would be possible to repeat the above process, to create lists with different identifiers (e.g. Datastream Code, Ticker) and also select ‘Time Series’ request for historical data. For Ticker identifiers, it is advisable to format the cells in the Excel column to ‘Text’ (before identifiers are pasted in) where US companies can have an ‘@’ symbol as part of the identifier (e.g. @AAPL, for Apple). If the cells are not formatted as Text, the ‘@’ symbol will be interpreted by Excel as the start of a function.
If a list contains both US and UK Ticker identifiers, this can cause a problem where an incompatible datatype is specified in a search request. For example, if CUSIP (datatype: WC06004) is included, this means the search will fail (i.e. it gives no results), as UK companies do not have CUSIP codes.
Converting from Ticker to Datastream Code identifiers is a way around this difficulty. The Datastream Code identifiers could then be used to create a new list: for Apple (992816), Microsoft (719643), Tesco (900803) and J Sainsbury (926002).
A Static Request (03/06/2016) search for datatypes: Company Name, CUSIP and Market Value is successful (i.e. it gives results) and merely has a blank cell under CUSIP for Tesco and J Sainsbury.
The title of this blog post uses far more abbreviations than I would like! Before I get into the purpose of this blog post I shall define them.
- S&P 500: Standards & Poor’s 500, an index of the 500 largest US companies (by market capitalisation) on the NASDAQ or NYSE stock indices.
- CRSP: The Center for Research In Security Prices, a database of end-of-day and end-of-month prices for US equity indices.
- WRDS: Wharton Research Data Services, a platform of financial and company databases with a common user interface, including CRSP.
- SSH: Secure Socket Shell, a network protocol that provides a secure connection between remote computers, where commands are typed out.
- SAS: Statistical Analysis System, a software tool for data analysis that can run on remote servers or your local PC.
Outline of this blog post
There are 500 companies that make up the S&P 500 equity index. This list can change from month to month as the market capitalisation of each company changes and the list needs to comprise the largest 500. When conducting historical analysis against this index it is important to have the right list of constituents for that period. There are various ways to obtain historical constituent lists [also link2], but one way not previous discussed on this blog is via the CRSP master list, available on the WRDS Cloud to SAS programs executed over SSH or PC-SAS. This post shows how to log in remotely, write a short SAS script and download the master list.
Edit 6 June 2016: This post explains how to get the constituent list of equities by PERMNO identifier, a five-digit number that never changes for a company, given by CRSP. You can get the equivalent GVKEY list from Compustat quite easily and from the web-based WRDS access. Armed with the PERMNO list it is easier to interact with CRSP and for example get stock returns for the S&P500 constituents. [Thanks to Alex Taylor for contributing this information.]
Please note, the SSH access route to WRDS is only available to academic research staff and PhD students – sorry to MSc and undergraduates!
Getting into WRDS Cloud using SSH/SFTP/SCP
You will need to use a tool like WinSCP to see the WRDS Cloud remote computer, to “secure copy” files between it and your PC. WinSCP is a free graphical tool, familiar if you have used FTP before; it gives a view of a directory on your computer (left) and a view of a directory on the server (right). (FileZilla and Cyberduck are alternatives for Windows or Mac.)
Login with the following details.
- File protocol: SFTP
- Host name: wrds-cloud.wharton.upenn.edu
- User name and Password: your regular WRDS account details (PhD and staff only)
- When prompted, choose “Yes” to save the key to the local machine.
Your personal file space on the server will appear on the right, initially none. The directory labelled “..” is a short-cut meaning “the directory one up from this one (parent)”. The path of this directory is “/home/manchester/<wrdsusername>”. Your new SAS script will be saved here. Create a new directory on the server to save your work in, called “dsp500list” (press F7 or the icon hightlighted above).
The WinSCP tool will allow you to easily copy files but will not let you execute commands (run programs). For this, you will need a tool like Putty (or just the Terminal on Mac). Log in with the details below.
- Connection type: SSH
- Host name: wrds-cloud.wharton.upenn.edu
- User name and Password: your regular WRDS account details (PhD and staff only)
- When prompted, choose “Yes” to save the key to the local machine.
You will need to use a few Unix commands to get about here. The most important one to start with is “chdir” to change directory. Type “chdir dsp500list” to change the working directory to the one you created above. We will put the SAS script and results file here.
Writing a SAS script
The SAS script to copy, saved to file dsp500list_0.sas:
options nosource nodate nocenter nonumber ps=max ls = 72; title ' '; options source; proc print data=crsp.dsp500list noobs label;
It is easiest to write this on your PC in your chosen text editor then secure copy it to the “dsp500list” directory via WinSCP. (You may choose instead to type it directly in the Putty terminal using the command line editor “vim“, but explaining that is out of scope of this post.)
This script accesses the SAS dataset stored at the location “/wrds/crsp/sasdata/q_indexes/dsp500list.sas7bdat”, the daily S&P500 list changes. You could use the monthly file “msp500list” instead by changing the “data=crsp.dsp500list” part of the script accordingly.
Getting the data out
Run qsas dsp500list_0.sas on the Putty command line to execute the SAS query. It will take a few seconds but then there will be two files produced.
- dsp500list_0.log – a log file of the working of SAS, telling you exactly what is happening and any error messages.
- dsp500list_0.lst – the output text file, effectively the same data as the daily S&P 500 list.
The output text file dsp500list_0.lst begins like this, although in plain text with fixed width columns and some blank lines at the start.
|CRSP Permanent Number||Date when the stock included in S&P500 Index||Date when the stock excluded from S&P500 Index|
This file can be used in your local SAS scripts, perhaps with a little work converting it back to a true sas7bdat file — this exercise is left for the reader to complete!
This is a lot to take on if you have never worked with Unix systems before, and I would advise the regular WRDS web interface for most users. If you need to use WRDS Cloud to run your SAS scripts to directly access files such as the master CRSP lists, this post shows you how to get started.
If you have a data file from CRSP (or many other sources), the date may be recorded as a text string of the form YYYYMMDD (for example, 20151231 for the last New Year’s Eve). If you wish for Excel to interpret this as a date, for the purpose of analysis, this can be done with the four formulae below.
=DATE(LEFT(A2,4), MID(A2,5,2), RIGHT(A2,2))
This reads the following fields.
- Year from the first 4 characters of cell A2.
- Month from the middle 2 characters of cell A2 starting from position 5.
- Day from the last 2 characters of cell A2.
See the image below for an example.
Now that the dates are recorded this way, Excel can interpret them correctly when doing analysis or when using features like pivot tables.
This post follows on from Historical Index Constituents (e.g. S&P 500) written in October 2014.
It is easy to find the current constituents for an equity index such as the FTSE 100 or S&P 500 but finding the historical members for a given date or range of dates requires more work. This blog post shows briefly how to use four screens in Bloomberg.
In the Bloomberg search bar, type ‘SECF<GO>’ (that is, SECF then the Enter key) to use the Security Finder to find the Equity Index you need. You could just type the name of the Index in the search box and see if the correct Index is suggested. In this example, click on SPX Index (S&P 500 Index). A menu will appear, choose Member Weightings MEMB.
You can get straight to the Member Weightings screen if you know the code of the index. Type, for example, ‘SPX <Index> MEMB <GO>’ (that is, SPX then the F10 key then MEMB then Enter). The date can be changed in the ‘As of’ box at the top-right and extra columns can be added. From here, on the red menu, click Output > Excel to export the data to a new Excel document (requires the Excel add-in to be configured, see desktop icon “Install Bloomberg Office Add-ins” at Manchester).
If you requested ISIN or CUSIP identifier numbers, these can be used in other databases such as Datastream, Compustat or Capital IQ.
Getting constituents for a range of dates
Getting the membership for the start date and the end date in your range will obviously not consider any company that joined and left in the intervening time. It is common to collect the membership at month end for each month in the range, paste the results into one long list in Excel then use the Data > Remove Duplicates function to get a tidy list of each company once only.
You can do this and keep all the columns (name, weight, ISIN etc.) but the image above just keeps Ticker. You can get all the other data again using the Bloomberg Excel add-in if you wish (with formulas such as =BDP(“AAPL US Equity”, “ID_ISIN”) where the first parameter is a reference to the cell with the Ticker).
Is there a way to do this more easily in Bloomberg? Well, yes and no. There is a screen in Bloomberg called Historical Index Member Changes CHNG.
For the FTSE 100, type ‘UKX <Index> CHNG <GO>’. Choose the date range and click the top ‘Enter 1<GO> to generate report’. A report will be generated and it will tell you the number of the report. Go to the Retrieve Reports screen RPT.
You will see a line in the report for each member’s entry and exit to the index, with more detail. But you cannot export the list of member tickers to Excel or do anything else systematic like that. It will be better to use the MEMB screen and export each month.
Do you know a better way? Please let us know by commenting on this post.
See also Historical FTSE100 Index Constituents on Datastream, dated July 2012.
Following on from the refresh last summer, the Wharton WRDS web portal has been updated again. It is now easier to find and select the output variables you want to use in your request.
The screen above uses the CRSP monthly stock database as an example. In the ‘Query Variables’ step, there are now multiple tabs containing categories of variables (which used to be listed in one column of boxes). The first tab contains all the variables, the subsequent tabs are categories like identifiers, balance sheet items, profit & loss items, and so on. There are orange and white arrows to scroll left and right between the tabs.
There is a search field in the first tab where you can type the name of the variable or any substring of it. For example, type “code” to get any variable with “code” in its name, it does not have to begin with “code”. Tick each match that you want to use and it appears in the column on the right, labelled ‘Selected. You can browse through all the tabs and see the same master list of Selected variables appear in each. This is much easier than using the browser’s ‘CTRL+F’ find function that was required before.
New feature: Saved Codelists
Another new feature is the Saved Codelists. Usually we recommend people upload a text file containing the codes for the companies (or other entities) that they wish to run a query on. WRDS lets you type in or select companies one-by-one as well (useful for looking up a small number of companies). There is a blue link to ‘Code Lookup’ in most of the WRDS databases which lets you type the whole or part of the name of a company to find its code.
The new feature is that you can save these short lists to use later. The screen above shows Bureau van Dijk Amadeus. Simply tick the first box highlighted in the image above (to the right of the codes) and type a name for the list. On a later visit to WRDS, you can find the list in the second box highlighted above, ‘Select Saved Codelists’.