Welcome to Business Research Plus

27 July 2011 1 comment

Business Research Plus header 2015

From specialist databases to business literature, Business Research Plus provides advice and tips based on The University of Manchester Library Business Data Service resources and expertise.  See our About page for more details.

  • Library Research Plus provides expert insight from The University of Manchester Library’s Research Services for researchers in all disciplines including business.
  • My Learning Essentials, the Library’s award-winning skills progamme includes online resources with a wealth of useful tips on searching, referencing, writing to support personal and professional development.

Exploring our resources – try the Business and Management Resources page (Subject Guides),  scroll down to category or tag cloud in the right-hand column or search this blog (top right). For latest news see @UML_BDS 

Reshaping wide and long data in Stata

29 July 2016 Leave a comment
Reshape from long to wide via Stata

Reshape from long data (left) to wide data (right) via Stata

Downloading data from a platform like WRDS can give you the data you need but not in the layout you desire. This post shows you how to change from long data to wide data; that is, from a stacked column of companies to a column for each company, using date as unique identifier. It requires the use of Stata, available on PCs at the Library Finance Zone. (Version 14 was used in this test but it should work just as well on others.)


Excel data (long)

Long data in Excel

In Excel, keep only the essential columns (you can add the others back later using VLOOKUP). In this example, some kind of ID (like PERMNO), Date and Return.The first row of the sheet contains the variable names. Save this file as type CSV (Comma delimited) *.csv, ignoring any warnings about features or multiple sheets. Close the file.

Stata import text delimited

Import text data delimited in Stata

Open Stata, click File > Import > Text data (delimited, *.csv). Click the Browse button to select your CSV file the click OK. You will see a command like the following was performed: import delimited P:\Reshape\long.csv. You can look at a summary of the data by typing list in the Command window.

Performing the command

The reshape command in Stata works in two modes.

  • Wide: instead of repeating dates and companies in long columns, create a new column for return for each company so that there is only one row for each date. This is also called long-to-wide. (You could instead have a column for each year with one row per company.)
  • Long: the reverse process, also called wide-to-long.

Here we are going from wide-to-long, with Date as the unique variable (i). We will lose the current Return column. Each new column will be Return for each company ID (j), with a name that is Return followed by that ID, such as Return100500, Return222622. The command to type in the Command window is:

reshape wide return, i(date) j(id)
Stata reshape

Reshape in Stata (click to enlarge)

After this is typed, you will get a summary of what just happened. If the text is in red, an error occurred, often if there are other variables that were not mentioned in the reshape command. You can list more than one variable to be reshaped (the command might then be reshape wide return volume price, i(date) j(id) for example).

Again, you can type the list command to see the data.


Stata export

Export to Excel from Stata

You probably wish to export the data to Excel format to continue with your work. To do this, click File > Export > Data to Excel spreadsheet (*.xls, *.xlsx). Click the Save as button to choose where you wish to save to, and if you want older (pre-2007) or current Excel format. Tick the box “Save variable names to first row in Excel file” and press OK. You will see a command generated like this:  export excel using “P:\Reshape\reshaped.xlsx”, firstrow(variables)

Excel results

Now you have wide data in Excel!

You can now open the new file in Excel to see the reshaped data set.

More information

The Institute for Digital Education and Research at UCLA have two excellent example pages, one for reshaping long-to-wide and one for reshaping wide-to-long.



Categories: Data Analysis Tags: , ,

Use pairs in one VLOOKUP with historical exchange rates from Datastream

20 July 2016 Leave a comment

One of the most useful Excel functions I help students with is VLOOKUP – extremely useful to automatically bring in data from one table to another. If you have company accounts data from Compustat Global (via WRDS) for multiple companies in multiple currencies, you will need to bring in the change rate for each currency to show every field in US Dollars.

1 Company Currency Total Assets Local Exchange rate Total Assets USD
2 Co. X AUD 100 =VLOOKUP(B2, currencies, 2, FALSE) =C2*D2
3 Co. Y PLN 400 =VLOOKUP(B3, currencies, 2, FALSE) =C3*D3

Where ‘currencies’ is a named range elsewhere in the workbook:

From Currency Rate
AUD 1.3158
PLN 3.95685

The VLOOKUP command looks for the currency code in the first column and returns the rate from the second column (use ‘FALSE’ to ensure an exact match). The conversion is simply a multiplication. (Note that you can convert from USD to USD at a rate of 1.0.) But what about making sure that the exchange rate is correct for the date of the data, and what if you have data from multiple years?

Performing VLOOKUP when matching a pair of variables

Thanks to Professor Marie Dutordoir for suggesting this kind of technique!

Usually you will want to look at accounts data over several years and for several companies. If you are working with several currencies then you will need a more complicated solution to this VLOOKUP exchange rate table, considering a day/month/year for each rate-currency pair. You do not need a second lookup table or other Excel functions, you just need to have another variable of date and a variable of currency-date pairs.

Global company accounts data

Global company accounts data. Given column G (currency) and column H (year), create new column F (map = currency&year).

The new column F (mapping) is formed from currency and year, so F2 contains “=G2&H2” (see image above). The ‘currencies’ named range now begins with a different first column produced in the same way (see image below). A VLOOKUP always matches on the first column of its range, and now this one pulls the rate from the fourth column, so rate is “=VLOOKUP(F2, currencies, 4, FALSE)” for row 2.

In the ‘currencies’ named range, I have combined the exchange rate calculation and lookup into one formula (column K). Repeat this for each data type that you wish to convert the currency for. Remember to have an entry for USD at rate 1 for each year.


Stacked historical exchange rates in a named range ‘currencies’. Created column A (map) from B&C.

This example only has the average exchange rate for each year, you may wish to extend this approach for monthly or even daily rates. If you do this, you will need to take extra care with the format of the mapping cells, but this exceeds the scope of the post.

Where to find historical exchange rates?

Use the Datastream Navigator to find exchange rates

Use the Datastream Navigator to find exchange rates. Start by searching one category (Exchange Rates), limit it “To Currency: United States Dollar”, then use the search box for the source currency.


Company Identifiers in Datastream.

Identifiers can take many forms (e.g. Company Name, Ticker, CUSIP, SEDOL, Datastream Code, ISIN) and be national, such as CUSIP for US / Canadian companies or international, such as ISIN (International Security Identification Number). The ISIN code incorporates the national identifier. For example, for Tesco PLC: GB0008847096, which includes the SEDOL (UK company identifier) code: 0884709.


Company Lists and Datastream


In seeking to further their dissertation research, MSc students often make use of more than one research database. Consequently, a typical scenario involves a request to import a list of company identifiers into Datastream, then conducting further searches to secure data, prior to analysis. The ‘Create List (From Range)’ function is useful in this regard.


Create List (From Range)


This function within the Excel Add-In version of Datastream is extremely effective when working with lists (up to 5,000 Series [companies] per list). This enables Company Identifiers to be copied and pasted into Excel and saved as a List on the specific computer being used (Store List Locally) or saved to the Datastream mainframe computer (Upload List) – accessible from computers with Datastream installed.

The code generated to represent the list of companies can then be entered directly into a ‘Static‘ or ‘Time Series‘ request search screen in Excel. Hence, one or more Datatypes can be obtained for an entire list of companies and the results would be displayed in a single Excel sheet.

Company Identifiers which work with the ‘Create List (From Range)’ function include: ISIN, Ticker, Datastream Code. Both Company Name and CUSIP don’t work, even though CUSIP is meant to.


Example Search


Within Excel, with the Datastream tab selected, enter (paste in) the company identifiers. Next, Select the identifiers (ISINs below) and then click on ‘Create List (From Range)’. To illustrate the process, a short list of two US companies will be used (Apple, Microsoft) – this could of course be a list of many hundreds of companies.


Datastream - Create List (From Range)

Datastream – Create List (From Range)


From the ‘List Creation for Excel’ dialog box, the default option is ‘Store List Locally’ (on the computer being used at the time), with ‘Upload List’ as an option.


Create List - Store Locally

Create List – Store Locally


This generates a confirmation message – click ‘OK’.


Create List - Confirmation Message

Create List – Confirmation Message


The ‘List File Name’ [ New_0032.LLT ] is entered in the Series/Lists field in this Static Request (as at 03/06/2016) to locate: Company Name [NAME], CUSIP [WC06004], Ticker [WC05601], Datastream Code [DSCD] and Market Value [MV].


Datastream - Static Request

Datastream – Static Request


Results:  widen columns as necessary, to display data.





It would be possible to repeat the above process, to create lists with different identifiers         (e.g. Datastream Code, Ticker) and also select ‘Time Series’ request for historical data. For Ticker identifiers, it is advisable to format the cells in the Excel column to ‘Text’ (before identifiers are pasted in) where US companies can have an ‘@’ symbol as part of the identifier (e.g. @AAPL, for Apple). If the cells are not formatted as Text, the ‘@’ symbol will be interpreted by Excel as the start of a function.


Incompatible Datatypes


If a list contains both US and UK Ticker identifiers, this can cause a problem where an incompatible datatype is specified in a search request. For example, if CUSIP (datatype: WC06004) is included, this means the search will fail (i.e. it gives no results), as UK companies do not have CUSIP codes.

Converting from Ticker to Datastream Code identifiers is a way around this difficulty. The Datastream Code identifiers could then be used to create a new list: for Apple (992816), Microsoft (719643), Tesco (900803) and J Sainsbury (926002).

A Static Request (03/06/2016) search for datatypes:  Company Name,  CUSIP and Market Value is successful (i.e. it gives results) and merely has a blank cell under CUSIP for Tesco and             J Sainsbury.



Datastream Code List - Static Request Results

Datastream Code List – Static Request Results


Getting S&P 500 constituents from CRSP via WRDS Cloud using SSH and SAS

2 June 2016 1 comment

The title of this blog post uses far more abbreviations than I would like! Before I get into the purpose of this blog post I shall define them.

  • S&P 500: Standards & Poor’s 500, an index of the 500 largest US companies (by market capitalisation) on the NASDAQ or NYSE stock indices.
  • CRSP: The Center for Research In Security Prices, a database of end-of-day and end-of-month prices for US equity indices.
  • WRDS: Wharton Research Data Services, a platform of financial and company databases with a common user interface, including CRSP.
  • SSH: Secure Socket Shell, a network protocol that provides a secure connection between remote computers, where commands are typed out.
  • SAS: Statistical Analysis System, a software tool for data analysis that can run on remote servers or your local PC.

Outline of this blog post

There are 500 companies that make up the S&P 500 equity index. This list can change from month to month as the market capitalisation of each company changes and the list needs to comprise the largest 500. When conducting historical analysis against this index it is important to have the right list of constituents for that period. There are various ways to obtain historical constituent lists [also link2],  but one way not previous discussed on this blog is via the CRSP master list, available on the WRDS Cloud to SAS programs executed over SSH or PC-SAS. This post shows how to log in remotely, write a short SAS script and download the master list.

Edit 6 June 2016: This post explains how to get the constituent list of equities by PERMNO identifier, a five-digit number that never changes for a company, given by CRSP. You can get the equivalent GVKEY list from Compustat quite easily and from the web-based WRDS access. Armed with the PERMNO list it is easier to interact with CRSP and for example get stock returns for the S&P500 constituents. [Thanks to Alex Taylor for contributing this information.]

Please note, the SSH access route to WRDS is only available to academic research staff and PhD students – sorry to MSc and undergraduates!

Getting into WRDS Cloud using SSH/SFTP/SCP

You will need to use a tool like WinSCP to see the WRDS Cloud remote computer, to “secure copy” files between it and your PC. WinSCP is a free graphical tool, familiar if you have used FTP before; it gives a view of a directory on your computer (left) and a view of a directory on the server (right). (FileZilla and Cyberduck are alternatives for Windows or Mac.)

WinSCP login

The WinSCP login screen. My user name is blacked out for security purposes.

Login with the following details.

  • File protocol: SFTP
  • Host name:  wrds-cloud.wharton.upenn.edu
  • User name and Password: your regular WRDS account details (PhD and staff only)
  • When prompted, choose “Yes” to save the key to the local machine.
WinSCP files

The standard WinSCP view of files on your computer (left) and files on the server (right). The red ringed icon is the Create New Directory button. I have used it to create a directory called “dsp500list”.

Your personal file space on the server will appear on the right, initially none. The directory labelled “..” is a short-cut meaning “the directory one up from this one (parent)”. The path of this directory is “/home/manchester/<wrdsusername>”. Your new SAS script will be saved here. Create a new directory on the server to save your work in, called “dsp500list” (press F7 or the icon hightlighted above).

The WinSCP tool will allow you to easily copy files but will not let you execute commands (run programs). For this, you will need a tool like Putty (or just the Terminal on Mac). Log in with the details below.

  • Connection type: SSH
  • Host name:  wrds-cloud.wharton.upenn.edu
  • User name and Password: your regular WRDS account details (PhD and staff only)
  • When prompted, choose “Yes” to save the key to the local machine.
Putty login

The Putty configuration and terminal screens. You will be prompted for user name and password later (press Enter after each). Commands are typed at the “$” prompt at the bottom of the screen.

You will need to use a few Unix commands to get about here. The most important one to start with is “chdir” to change directory. Type “chdir dsp500list” to change the working directory to the one you created above. We will put the SAS script and results file here.

Writing a SAS script

The SAS script to copy, saved to file dsp500list_0.sas:

options nosource nodate nocenter nonumber ps=max ls = 72;
title ' ';
options source;
proc print data=crsp.dsp500list noobs label;

It is easiest to write this on your PC in your chosen text editor then secure copy it to the “dsp500list” directory via WinSCP. (You may choose instead to type it directly in the Putty terminal using the command line editor “vim“, but explaining that is out of scope of this post.)

WinSCP copy

On the right of WinSCP, double-click the “dsp500list” directory to enter it, then drag the new SAS script from the local hard drive (left) into it (select “Copy” not “Move”).

This script accesses the SAS dataset stored at the location “/wrds/crsp/sasdata/q_indexes/dsp500list.sas7bdat”, the daily S&P500 list changes. You could use the monthly file “msp500list” instead by changing the “data=crsp.dsp500list” part of the script accordingly.

Getting the data out

Run qsas dsp500list_0.sas on the Putty command line to execute the SAS query. It will take a few seconds but then there will be two files produced.

  1. dsp500list_0.log – a log file of the working of SAS, telling you exactly what is happening and any error messages.
  2. dsp500list_0.lst – the output text file, effectively the same data as the daily S&P 500 list.
QSAS results

Running the command “qsas dsp500list_0.sas” from the correct directory in Putty will execute the script. The resulting files can be seen with the “ls” command in Putty or by refreshing in WinSCP (green round arrows icon).

The output text file dsp500list_0.lst begins like this, although in plain text with fixed width columns and some blank lines at the start.

CRSP Permanent Number Date when the stock included in S&P500 Index Date when the stock excluded from S&P500 Index
10006 19570301 19840718
10030 19570301 19690108
10049 19251231 19320101

This file can be used in your local SAS scripts, perhaps with a little work converting it back to a true sas7bdat file — this exercise is left for the reader to complete!


This is a lot to take on if you have never worked with Unix systems before, and I would advise the regular WRDS web interface for most users. If you need to use WRDS Cloud to run your SAS scripts to directly access files such as the master CRSP lists, this post shows you how to get started.

See also the WRDS Cloud manual and Intoduction to WRDS Unix system.

Convert CRSP date YYYYMMDD to Excel true date

10 May 2016 1 comment

If you have a data file from CRSP (or many other sources), the date may be recorded as a text string of the form YYYYMMDD (for example, 20151231 for the last New Year’s Eve). If you wish for Excel to interpret this as a date, for the purpose of analysis, this can be done with the four formulae below.

=DATE(LEFT(A2,4), MID(A2,5,2), RIGHT(A2,2))

This reads the following fields.

  • Year from the first 4 characters of cell A2.
  • Month from the middle 2 characters of cell A2 starting from position 5.
  • Day from the last 2 characters of cell A2.

See the image below for an example.

Excel date from CRSP YYYYMMDD

Use Excel’s DATE formula to interpret text or numbers as a date, for analysis purposes. You may need the LEFT, MID and RIGHT formulae to pull out the year, month and day values from another cell.

Now that the dates are recorded this way, Excel can interpret them correctly when doing analysis or when using features like pivot tables.

Categories: Business Databases Tags: ,

Historical Index Constituents in Bloomberg

12 April 2016 2 comments

This post follows on from Historical Index Constituents (e.g. S&P 500) written in October 2014.

It is easy to find the current constituents for an equity index such as the FTSE 100 or S&P 500 but finding the historical members for a given date or range of dates requires more work. This blog post shows briefly how to use four screens in Bloomberg.

Bloomberg screen SECF

Use Security Finder SECF to find the code for the Equity Index. Choose Category: Index/Stats and Sub-Category: Equity Index. You can type part of the name or description to further filter the list.

In the Bloomberg search bar, type ‘SECF<GO>’ (that is, SECF then the Enter key) to use the Security Finder to find the Equity Index you need. You could just type the name of the Index in the search box and see if the correct Index is suggested. In this example, click on SPX Index (S&P 500 Index). A menu will appear, choose Member Weightings MEMB.

Bloomberg screen MEMB

For an Index, the MEMB screen shows constituent members (and weightings) for a given date. You can add extra Fields such as ISIN or CUSIP identifiers to use later.

You can get straight to the Member Weightings screen if you know the code of the index. Type, for example, ‘SPX <Index> MEMB <GO>’ (that is, SPX then the F10 key then MEMB then Enter). The date can be changed in the ‘As of’ box at the top-right and extra columns can be added. From here, on the red menu, click Output > Excel to export the data to a new Excel document (requires the Excel add-in to be configured, see desktop icon “Install Bloomberg Office Add-ins” at Manchester).

Bloomberg Export MEMB to Excel

Open the output file, accept a warning message about file types.

If you requested ISIN or CUSIP identifier numbers, these can be used in other databases such as Datastream, Compustat or Capital IQ.

Getting constituents for a range of dates

Getting the membership for the start date and the end date in your range will obviously not consider any company that joined and left in the intervening time. It is common to collect the membership at month end for each month in the range, paste the results into one long list in Excel then use the Data > Remove Duplicates function to get a tidy list of each company once only.

Bloomberg output without duplicates

(1) Export each month’s constituents. (2) Copy and paste the Tickers into one long column for all months (without copying the heading ‘Ticker’ each time). (3) Select that column and choose Data > Remove Duplicates.

You can do this and keep all the columns (name, weight, ISIN etc.) but the image above just keeps Ticker. You can get all the other data again using the Bloomberg Excel add-in if you wish (with formulas such as =BDP(“AAPL US Equity”, “ID_ISIN”) where the first parameter is a reference to the cell with the Ticker).

Is there a way to do this more easily in Bloomberg? Well, yes and no. There is a screen in Bloomberg called Historical Index Member Changes CHNG.

Bloomberg screen CHNG

Historical Index Member Changes for a given Index and date range. But only works for some indices and the report cannot be exported.

For the FTSE 100, type ‘UKX <Index> CHNG <GO>’. Choose the date range and click the top ‘Enter 1<GO> to generate report’. A report will be generated and it will tell you the number of the report. Go to the Retrieve Reports screen RPT.

Bloomberg screen RPT

Retrieve reports generated in Bloomberg. View these on screen or save one page at a time to an image or printer.

You will see a line in the report for each member’s entry and exit to the index, with more detail. But you cannot export the list of member tickers to Excel or do anything else systematic like that. It will be better to use the MEMB screen and export each month.

Do you know a better way? Please let us know by commenting on this post.

See also Historical FTSE100 Index Constituents on Datastream, dated July 2012.

Categories: Business Databases Tags:

WRDS new interface for selecting output variables

6 April 2016 Leave a comment

Following on from the refresh last summer, the Wharton WRDS web portal has been updated again. It is now easier to find and select the output variables you want to use in your request.

WRDS variables 2016

Finding and selecting output variables in WRDS is now much easier

The screen above uses the CRSP monthly stock database as an example. In the ‘Query Variables’ step, there are now multiple tabs containing categories of variables (which used to be listed in one column of boxes). The first tab contains all the variables, the subsequent tabs are categories like identifiers, balance sheet items, profit & loss items, and so on. There are orange and white arrows to scroll left and right between the tabs.

There is a search field in the first tab where you can type the name of the variable or any substring of it. For example, type “code” to get any variable with “code” in its name, it does not have to begin with “code”. Tick each match that you want to use and it appears in the column on the right, labelled ‘Selected. You can browse through all the tabs and see the same master list of Selected variables appear in each. This is much easier than using the browser’s ‘CTRL+F’ find function that was required before.

New feature: Saved Codelists

WRDS saved lists

You can save lists of codes to use later

Another new feature is the Saved Codelists. Usually we recommend people upload a text file containing the codes for the companies (or other entities) that they wish to run a query on. WRDS lets you type in or select companies one-by-one as well (useful for looking up a small number of companies). There is a blue link to ‘Code Lookup’ in most of the WRDS databases which lets you type the whole or part of the name of a company to find its code.

The new feature is that you can save these short lists to use later. The screen above shows Bureau van Dijk Amadeus. Simply tick the first box highlighted in the image above (to the right of the codes) and type a name for the list. On a later visit to WRDS, you can find the list in the second box highlighted above, ‘Select Saved Codelists’.


Get every new post delivered to your Inbox.

Join 1,174 other followers