Author Archive

Extending your Excel worksheet formulas via Python, for Capital IQ

17 March 2017 3 comments

Downloading bulk data from financial databases can be a tiring process. You may need panel data structured in a very specific way for analysis in a tool like Stata, and the data provider provides data in a different and limited way; perhaps for one company or one year at a time, leaving you to build it into the shape you need. Each product, like Bloomberg, Datastream, and Capital IQ, provide a different Excel interface. Sometimes you can build a formula for each data item for each year for each company, then fill or copy it across or down to save time. This extended blog post will show you how to do this for the Capital IQ Office Plug-in, and how to save time extending it for panel data using a short Python script.

What kind of data are we looking at?

First, let’s consider the types of variables in turn. In no particular order:

  1. Entities: That is, Companies, Funds, Equities, Indices, Persons…
  2. Time: Usually one of Years, Quarters, Months, Days…
  3. Data items/types: Time series or line items such as Total assets, Turnover, Closing price…

If we consider any one variable or dimension at a time, this is easy; it’s just a list of identifiers or numbers.

  2. 1995, 1996, 1997

We can easily create a table to list any two of these dimensions simply.

For example: (fictional) figures for one company:

Item\Year 2001 2002 2003
Total assets ($m) 6.3 7.2 8.9
Net profit ($m) 2.3 4.7 6.1

Or, (fictional) figures for Year 2004:

Company\Item Employees (thousands) Credit rating

But if we want to look at three at once, it gets more complicated. More on that later.

Getting data from Capital IQ Office Plug-in

This post assumes you are familiar with using Capital IQ and its Excel add-in. If not, search for other posts on Capital IQ or read the hand-out guides we have written before proceeding.

In short, you can use an Excel formula to pull data from the Capital IQ servers to your spreadsheet. These can be created in the Formula Bulider menu, or typed out manually. For example, to get total assets, the formula will be =CIQ(“IQ18671”, “IQ_TOTAL_ASSETS”, “FY2012”) if the company identifier is IQ18671 and the selected date is financial year 2012. [Note that the double-quote characters appear ‘curved’ here in WordPress and you’ll need to retype them clean in Excel.]

Tip: The data retrieved will only load on a PC with the Capital IQ Excel add-in set up; you will need to copy then paste-as-values, or save as CSV to keep the data for use elsewhere.

Using the fill down/across command

The formula includes several arguments — the options separated by commas — which are “hard coded” – the values for which company, date or item are included as text strings.

Basic Capital IQ formula

The CIQ formula in the yellow cell is shown shown in the formula editor. It references the company in A2, the item “IQ_TOTAL_ASSETS” and the date “FY2012”.

You can replace these arguments with cell references, to a cell that contains the same text, or part of it.

Replacing the item with a cell reference

The yellow cell’s formula now includes a cell reference B$1 for the item type. Note the dollar sign locks the row number when we fill the formula down.

This will allow you to use the Excel feature fill down or fill across, to extend the formula down or right.

Fill the formula down and right

The cell reference for date follows the text “FY” then an ampersand, as the cell B2 contains just the year number. Note the dollar sign in B$2, and now also in the company cell reference $A3, so that the formula fills right correctly.

It will work for any argument, including item or date, although you need to take more care with date. In the example above we use year, and Capital IQ expects the year number to follow the letters “FY” for Financial Year (or “CY” for Calendar Year). We must therefore use the ampersand ‘&’ character to join the strings.

Adding the third dimension

So far we have looked tables to represent two of our three dimensions. There are six permutations for this:

  • Companies down and dates across, for one item
  • Companies down and items across, for one date
  • Dates down and items across, for one company
  • and each of those transposed the other way around.

So if we want to show all three, we have to essentially stack more tables across or down. There are many more ways to arrange the data like this, such as:

  • Companies down, items across, dates across
  • Companies down, date across, items across

It might be easier to show these in a dummy table:

Company ID Item1-2001 Item2-2001 Item1-2002 Item2-2002 Item1-2003 Item2-2003
comp1  a  b  c  d  e  f
comp2  g  h  i  j  k  l


Company ID Item1-2001 Item1-2002 Item1-2003 Item2-2001 Item2-2002 Item2-2003
comp1  a  c  e  b  d  f
comp2  g  i  k  h  j l

You might prefer to stack into panels as below (this is like reshaping from wide to long in Stata). We have to repeat the first column (company ID) for each date… do not be tempted to use merged cells here! If you did, you won’t be able to use data filters, Excel tables or correctly export to other programs.

Company ID Year Item1 Item2
comp1 2001  a  b
comp1 2002  c  d
comp1 2003  e  f
comp2 2001  g  h
comp2 2002  i  j
comp2 2003  k  l

Often we start with a list of thousands of companies. It would be very time consuming to manually insert extra lines for each year after each company! This is where the script comes in…

Extending the process with a Python script

In order to perform a predictable and repetitive task, it can be a good idea to write a short script instead. The method below uses a combination of Excel and an external programming language, Python, and some human intervention. (I assume you have Python installed – the University PCs tend to have it, sometimes as part of SPSS.)

  1. Start with a column of the company IDs in Excel, and save this as a text file. You can do this by copying the column and pasting it into Notepad, remembering to delete any header rows. (You can instead paste it to a new workbook in Excel and save as Text (Tab delimited) (*.txt).) The file must be named ids.txt for the script to work.

    List of company IDs

    Save the list of IDs to a text file ids.txt

  2. Now for the Python script, which is shown at the end of the post for you to copy. Copy and paste this into a text editor like Notepad, although Notepad++ is included on University of Manchester PCs and is much better. If the list of years {1995, 1996, 1997} is not right for you, change it here! Save this as or similar – note the file extension is .py and not .txt – this is a Python script. Save it to the same directory as the ID file.

    Python code, showing whitespace

    Notepad++ showing Python code (you can copy-paste it from the end of this blog post), showing whitespace/tabs in orange. Python scripts must have consistent line indents, tabs or spaces.

  3. Run the script. To do this, open Windows Explorer, navigate to the folder you have your two new files in and hold Shift + Right-click then choose “Open command window here”. A black command window will open, type “python” (no quotes) and press enter. Hopefully no error message will show, you’ll get no message if it was successful.

    Run the script

    Shift+Right-click to get “Open command window here”, type “python” and press Enter. File template.txt is created.

  4. Find the newly created file, template.txt, and open it in Excel. There are several ways to open tab-delimited text files in Excel, any should do.
    Open .txt file in Excel, part 1

    Excel, open file, select type “Text Files (*.prn; *.txt; *.csv)” and choose template.txt. Accept any warning messages.

    Open .txt file in Excel - part 2

    This is a delimited text file, with Tab delimiters only.

  5. You can type the item names in C1, D1, E1 etc., such as “IQ_FINISHED_INV” and “IQ_TOTAL_ASSETS” (no quotes). You can type the template Capital IQ formula in C2, remembering to replace the literal arguments with cell references, and remembering to lock some of the references with dollar signs. For example, =CIQ($A2, C$1, “FY”&$B2)

    Fill the template columns and first formula

    Complete the green cell column headers (item codes). Fill in the yellow cell formula with correct cell references and locks, here it is =CIQ($A2, C$1, “FY”&$B2)

  6. Save the workbook as regular Excel file (*.xlsx). Now select the cell, hover the mouse over the bottom-right corner for the little black square icon, and drag the icon to the right to select that row in all the columns you need. The formula will copy across, and hopefully correctly. You may need to edit it and try again if the dollar sign locks went wrong. Now fill the formula down (just a few rows at first to test, then all the way down by double-clicking the black square icon).

    Final fill out

    Save before you start! Fill the formula right, then down a few rows, then further. Don’t just fill all the way to the bottom if there are thousands of rows, this might overload Capital IQ.

The Python code, which should have indents on the lines in each for loop:

fin = open('ids.txt')
fout = open('template.txt', 'w')
years = {"1995", "1996", "1997"}
# header
for line in fin:
   id = line.rstrip()
   for year in years:
       lineout = id + "\t" + year + "\n"

Moving on

  • The script is not the best example of Python, but it works! For example, the list of years should really be done better.
  • You could extend the script to include the items and Excel formulas, but for a one-off script, it is probably quicker to do some of the work in Excel by hand (using the fill function).
  • You could chose any other programming language over Python, including VBA which is built into Excel.
  • You could apply this technique to other databases with an Excel interface such as Bloomberg.

Related posts

Bond ISIN to company ISIN — EDSC manuals, tips & tricks

9 November 2016 Leave a comment

Suppose you created a list with bonds and the accompanying ISIN codes. You can’t use these ISIN codes to download information from the issuer of this bond. But here is way to use the ISIN codes you collected so far to find the ISIN code of the issuer An example: the ISIN code US44044KAA97 belongs […]

via Bond ISIN to company ISIN — EDSC manuals, tips & tricks

Finding ESG data in Datastream using the ASSET4 template

25 October 2016 Leave a comment

A detailed source of environmental, social and governance (ESG) data can be found in Thomson Reuters Datastream.

Instead of choosing the appropriate datatypes in a regular Datastream time series request, you can download the ASSET4 template. This is hard to find in the InfoBase support platform (replacement for Extranet) so I suggest you get it from the Datastream Excel add-in toolbar, although this method is not quite as easy as you would like.

Open Datastream in Excel the usual way (remember the desktop icon “DSSetup – Shortcut”). Click on the Datastream ribbon tab, go to the Request Tables group and click Sample Sheets. Tick the button ‘Equities : ESG ASSET4 – Sector Industry  analysis’ then click the Download button.

ASSET4 - find the template

Find the ASSEST4 template with the ‘Sample Sheets’ button and download it.

The Download & Open button does the same as the Download button, it doesn’t open the file.

ASSET4 template location

ASSET4 template downloads to a hidden folder.

The template takes the form of a macro-enabled Excel workbook and is downloaded to a hidden folder. Click on Start > Computer, click in the address bar and type the following:
C:\ProgramData\datastream\datastream advance\User

Note that ‘ProgramData’ is not the same as ‘Program Files’ or ‘Program Files (x86)’, it is a hidden folder. As you type the rest of the path, Windows should suggest auto-completions, which you can accept.

ASSET4 file location

ASSET4 file location, in the hidden C:\ProgramData folder.

Find downloaded ASSET4 (animated)

ASSET4 file location (animated)

From here, open the file and accept any warnings about enabling content.

ASSET4 intro

ASSET4 introduction

There are many sheets (tabs), starting with Home. Click on ESG Filters to select your search criteria. Choose (1) a sector or industry such as Telecommunications Services and (2) criteria across the environmental, social and corporate governance categories such as Water Use Total, Women Managers and Size of Board. When ready, click the Go button.

ASSET4 search criteria

ASSET4 search criteria

Wait a few minutes while the data is downloaded to the Data Table sheet.

ASSET4 data table

ASSET4 data table

The tables and charts of the many other sheets are automatically updated to reflect your search (except the Data Fields definitions sheet). For example, the Bottom 5’s sheet.

ASSET4 example results

ASSET4 example results

If you find a datatype you would like to use in a regular time series request, make a note of its code.

See also earlier post Environmental, Social and Governance (ESG) Data from July 2013.

Advanced support for using the Bloomberg Excel add-in

4 August 2016 2 comments


An example Bloomberg Excel template

This post is a collection of frequently and infrequently asked questions about the Bloomberg Professional API, most specifically the Bloomberg Excel Add-in. For the most part, it should be considered for experts, so please don’t be disheartened if it is too advanced for you; it’s effectively everything I know about the topic.

The questions are as follows.

  1. Where is the add-in? I don’t have it.
  2. Which ways can one use Bloomberg and Excel together?
  3. Where can I find the field codes?
  4. I don’t want to be restricted to Excel, can I use Java, C++, .NET, Python or R?
  5. Where can I get more help?

Where is the add-in? I don’t have it.

Install Bloomberg Excel add-in

  • If the ribbon tab labelled “Bloomberg” is not showing in Excel, close it then click on Start > All Programs > Bloomberg > Install Office Add-Ins. Follow the instructions, but be prepared to run the little install program a few times, perhaps opening and closing Excel and running it again.

Which ways can one use Bloomberg and Excel together?

Terminal copy or export

Bloomberg types of copy/export to Excel

Some of the ways to get data from Bloomberg to Excel (remember to work at the terminal though!). Top-left (FA): Red menu > Output > Excel > Current Template (data is live). Top-right (GP): Right-click on chart > Copy/Export Options > Copy Data to Clipboard (then paste in Excel). Bottom-left (MEMB): Red menu > Output > Excel (data is fixed). Bottom-right (CHNG): Not possible to export report data.

  • Various functions in the Bloomberg terminal let you get the on-screen data into Excel in different ways. Look for commands such as “Copy data to clipboard”, “Output > Excel” or similar, by clicking on the red menu bar (Actions) or right-clicking on charts or data.
  • Sometimes the action will download and open a new Excel document, either with the data written in directly or loading later via the Bloomberg API. Sometimes the data is copied to the clipboard for you to paste into a worksheet of your choice.
  • Remember that a sheet which contains Bloomberg formulas to load live data may not load on a PC without Bloomberg unless you save as CSV or copy/paste-as-values. (This is also true for all the Excel options below.)

Excel templates

Excel template library

Bloomberg template library, browsed in Excel. Could also use XLTP function in terminal.

  • Usually most useful if you are looking up one company, bond, exchange rate or commodity, a Bloomberg template will give you a detailed Excel workbook filled with data and visualisations that are updated live from Bloomberg via the Excel API. Amber coloured fields are editable, often to change the company, country, sector, date or other variable. (See the first image in the post.)
  • The templates can be found in the terminal with the XLTP<GO> function, and in Excel under Bloomberg > Explore > Template Library.

Excel import

Bloomberg Excel historical end of day

Use the Import Data menu in Excel to get historical end of data and other data.

  • A commonly used feature that is described in our Bloomberg Workbook (available in the Bloomberg Suite and at the Precinct Library) is the Historical End of Day wizard. In Excel, click Bloomberg > Import > Import Data > Real-Time / Historical > Historical End of Day.
  • The wizard will let you type security identifiers or select from a common index, then choose your data types and data range. It will then produce the results in the cell you selected.

Excel function builder

Bloomberg Excel function builder

Build a function from scratch. The formula in cell B1 is =BDP(“AAPL US EQUITY”, “INDUSTRY_SECTOR”) and the value is Technology.

  • If you want a little more control, use the function builder, found in Bloomberg > Create > Function Builder. This more advanced tool will expose the Bloomberg API to you, starting by asking you to choose one of three major Bloomberg Functions:
    • BDP: (Bloomberg Data Point) Import a single data point of current data.
    • BDH: (Bloomberg Data History) returns the historical data for a selected security.
    • BDS: (Bloomberg Data Series) imports a set of bulk data such as peers.
  • For your chosen function, you will be asked to type in a security (such as “AAPL US EQUITY”), a field (such as “INDUSTRY_SECTOR” or “PX_LAST”) , and dates (depending on the function).
  • The tool will suggest auto-completion if you don’t happen to know the exact security or field code. It will only suggest valid responses.
  • You can add optional extra parameters such as orientation=H|V, currency, or “array=True” which puts all the output data into one cell instead many rows/columns (requires array formulas afterwards). Note that row and column counts will be added as extra parameters automatically after the formula has called.
  • The security, field and dates can be written into the formula or referenced from other cells.
  • Notice that the security ID needs to end with what kind of entity it is, so equities end “EQUITY”, bonds end “CORP” etc.

Excel manual function creation

Bloomberg Excel function builder manual edits

In the formula in cell B1, by replacing the security ID with a cell reference A1, you can then copy the formula down or across.

  • Once you have used the function builder, you will have a working formula that you may wish to copy out for each of your many securities, fields or dates. If you use a cell reference for this variable, you may copy the formula across or down. For example, you can have a list of security identifiers in column A {AAPL US EQUITY, IBM US EQUITY, VOD LN EQUITY, …} and the formula in column B =BDP(A1, “INDUSTRY_SECTOR”) and copy down the formula in column B.
  • What if your formula produces data in two dimensions and you need to leave a gap between each call, for example with amendment history of bonds? I have addressed that problem by writing a Python script to prepare the formulas and spacing. Assuming noblanks.txt is a file with one security ID per line (without the “CORP” bit) and withblanks.txt is our output:
fin = open('noblanks.txt')
fout = open('withblanks.txt', 'w')
for line in fin:
    id = line.rstrip()
    fout.write(id + "\t=BDS(\"" + id + "Corp\",\"AMENDMENT_HISTORY\",\"cols=3;rows=100\")")
    fout.write('\t\n' * 100)

The generated file is tab-delimited and can be opened in Excel for Bloomberg to action. The first column is the security ID (without the “CORP” bit), the second column contains the formulas, and there are 100 blank rows between each Bloomberg call to ensure enough space. There is room to improve this approach!

Where can I find the field codes?

The mnemonic codes for each field are not the same as in the Bloomberg terminal but can be looked up using the FLDS<GO> function. The auto-complete feature of the function builder in Excel is a good alternative.

I don’t want to be restricted to Excel, can I use Java, C++, .NET, Python or R?

  • In theory, yes, that is possible, although you will need technical support and all the necessary development environment to be set up on a Bloomberg terminal. In summary, the APIv3 needs be installed (from the WAPI<GO> function) which provides the necessary libraries for Java, C, C++ and .NET.
  • To use R (or RStudio), you’ll need to connect via Java (with the standard rJava library and Rbbg library from the repository). Python connects via the C library.
  • This is too advanced to be part of the regular Business Data Service, sorry!

Last year, I worked with PhD candidate Ali Bayat, and we got the following R script working from RStudio with R 3.1.3 and Java 8u22. [Thank you, Ali.]

install.packages("Rbbg", repos = "")
conn <- blpConnect()
bdp(conn, "AMZN US Equity", "NAME")

Where can I get more help?

  • As mentioned, our white binder Bloomberg Workbook (at the Bloomberg Suite and Precinct Library) describes how to use the historical end of day import from Excel.
  • There is help throughout the Excel add-in (look for white question mark in a blue circle icons). The templates all have a help sheet (coloured green tab). Remember to press the F1 key at any terminal function to get context-sensitive support.


Reshaping wide and long data in Stata

29 July 2016 1 comment
Reshape from long to wide via Stata

Reshape from long data (left) to wide data (right) via Stata

Downloading data from a platform like WRDS can give you the data you need but not in the layout you desire. This post shows you how to change from long data to wide data; that is, from a stacked column of companies to a column for each company, using date as unique identifier. It requires the use of Stata, available on PCs at the Library Finance Zone. (Version 14 was used in this test but it should work just as well on others.)


Excel data (long)

Long data in Excel

In Excel, keep only the essential columns (you can add the others back later using VLOOKUP). In this example, some kind of ID (like PERMNO), Date and Return.The first row of the sheet contains the variable names. Save this file as type CSV (Comma delimited) *.csv, ignoring any warnings about features or multiple sheets. Close the file.

Stata import text delimited

Import text data delimited in Stata

Open Stata, click File > Import > Text data (delimited, *.csv). Click the Browse button to select your CSV file the click OK. You will see a command like the following was performed: import delimited P:\Reshape\long.csv. You can look at a summary of the data by typing list in the Command window.

Performing the command

The reshape command in Stata works in two modes.

  • Wide: instead of repeating dates and companies in long columns, create a new column for return for each company so that there is only one row for each date. This is also called long-to-wide. (You could instead have a column for each year with one row per company.)
  • Long: the reverse process, also called wide-to-long.

Here we are going from wide-to-long, with Date as the unique variable (i). We will lose the current Return column. Each new column will be Return for each company ID (j), with a name that is Return followed by that ID, such as Return100500, Return222622. The command to type in the Command window is:

reshape wide return, i(date) j(id)
Stata reshape

Reshape in Stata (click to enlarge)

After this is typed, you will get a summary of what just happened. If the text is in red, an error occurred, often if there are other variables that were not mentioned in the reshape command. You can list more than one variable to be reshaped (the command might then be reshape wide return volume price, i(date) j(id) for example).

Again, you can type the list command to see the data.


Stata export

Export to Excel from Stata

You probably wish to export the data to Excel format to continue with your work. To do this, click File > Export > Data to Excel spreadsheet (*.xls, *.xlsx). Click the Save as button to choose where you wish to save to, and if you want older (pre-2007) or current Excel format. Tick the box “Save variable names to first row in Excel file” and press OK. You will see a command generated like this:  export excel using “P:\Reshape\reshaped.xlsx”, firstrow(variables)

Excel results

Now you have wide data in Excel!

You can now open the new file in Excel to see the reshaped data set.

More information

The Institute for Digital Education and Research at UCLA have two excellent example pages, one for reshaping long-to-wide and one for reshaping wide-to-long.



Categories: Data Analysis Tags: , ,

Use pairs in one VLOOKUP with historical exchange rates from Datastream

20 July 2016 Leave a comment

One of the most useful Excel functions I help students with is VLOOKUP – extremely useful to automatically bring in data from one table to another. If you have company accounts data from Compustat Global (via WRDS) for multiple companies in multiple currencies, you will need to bring in the change rate for each currency to show every field in US Dollars.

1 Company Currency Total Assets Local Exchange rate Total Assets USD
2 Co. X AUD 100 =VLOOKUP(B2, currencies, 2, FALSE) =C2*D2
3 Co. Y PLN 400 =VLOOKUP(B3, currencies, 2, FALSE) =C3*D3

Where ‘currencies’ is a named range elsewhere in the workbook:

From Currency Rate
AUD 1.3158
PLN 3.95685

The VLOOKUP command looks for the currency code in the first column and returns the rate from the second column (use ‘FALSE’ to ensure an exact match). The conversion is simply a multiplication. (Note that you can convert from USD to USD at a rate of 1.0.) But what about making sure that the exchange rate is correct for the date of the data, and what if you have data from multiple years?

Performing VLOOKUP when matching a pair of variables

Thanks to Professor Marie Dutordoir for suggesting this kind of technique!

Usually you will want to look at accounts data over several years and for several companies. If you are working with several currencies then you will need a more complicated solution to this VLOOKUP exchange rate table, considering a day/month/year for each rate-currency pair. You do not need a second lookup table or other Excel functions, you just need to have another variable of date and a variable of currency-date pairs.

Global company accounts data

Global company accounts data. Given column G (currency) and column H (year), create new column F (map = currency&year).

The new column F (mapping) is formed from currency and year, so F2 contains “=G2&H2” (see image above). The ‘currencies’ named range now begins with a different first column produced in the same way (see image below). A VLOOKUP always matches on the first column of its range, and now this one pulls the rate from the fourth column, so rate is “=VLOOKUP(F2, currencies, 4, FALSE)” for row 2.

In the ‘currencies’ named range, I have combined the exchange rate calculation and lookup into one formula (column K). Repeat this for each data type that you wish to convert the currency for. Remember to have an entry for USD at rate 1 for each year.


Stacked historical exchange rates in a named range ‘currencies’. Created column A (map) from B&C.

This example only has the average exchange rate for each year, you may wish to extend this approach for monthly or even daily rates. If you do this, you will need to take extra care with the format of the mapping cells, but this exceeds the scope of the post.

Where to find historical exchange rates?

Use the Datastream Navigator to find exchange rates

Use the Datastream Navigator to find exchange rates. Start by searching one category (Exchange Rates), limit it “To Currency: United States Dollar”, then use the search box for the source currency.


Getting S&P 500 constituents from CRSP via WRDS Cloud using SSH and SAS

2 June 2016 1 comment

The title of this blog post uses far more abbreviations than I would like! Before I get into the purpose of this blog post I shall define them.

  • S&P 500: Standards & Poor’s 500, an index of the 500 largest US companies (by market capitalisation) on the NASDAQ or NYSE stock indices.
  • CRSP: The Center for Research In Security Prices, a database of end-of-day and end-of-month prices for US equity indices.
  • WRDS: Wharton Research Data Services, a platform of financial and company databases with a common user interface, including CRSP.
  • SSH: Secure Socket Shell, a network protocol that provides a secure connection between remote computers, where commands are typed out.
  • SAS: Statistical Analysis System, a software tool for data analysis that can run on remote servers or your local PC.

Outline of this blog post

There are 500 companies that make up the S&P 500 equity index. This list can change from month to month as the market capitalisation of each company changes and the list needs to comprise the largest 500. When conducting historical analysis against this index it is important to have the right list of constituents for that period. There are various ways to obtain historical constituent lists [also link2],  but one way not previous discussed on this blog is via the CRSP master list, available on the WRDS Cloud to SAS programs executed over SSH or PC-SAS. This post shows how to log in remotely, write a short SAS script and download the master list.

Edit 6 June 2016: This post explains how to get the constituent list of equities by PERMNO identifier, a five-digit number that never changes for a company, given by CRSP. You can get the equivalent GVKEY list from Compustat quite easily and from the web-based WRDS access. Armed with the PERMNO list it is easier to interact with CRSP and for example get stock returns for the S&P500 constituents. [Thanks to Alex Taylor for contributing this information.]

Please note, the SSH access route to WRDS is only available to academic research staff and PhD students – sorry to MSc and undergraduates!

Getting into WRDS Cloud using SSH/SFTP/SCP

You will need to use a tool like WinSCP to see the WRDS Cloud remote computer, to “secure copy” files between it and your PC. WinSCP is a free graphical tool, familiar if you have used FTP before; it gives a view of a directory on your computer (left) and a view of a directory on the server (right). (FileZilla and Cyberduck are alternatives for Windows or Mac.)

WinSCP login

The WinSCP login screen. My user name is blacked out for security purposes.

Login with the following details.

  • File protocol: SFTP
  • Host name:
  • User name and Password: your regular WRDS account details (PhD and staff only)
  • When prompted, choose “Yes” to save the key to the local machine.
WinSCP files

The standard WinSCP view of files on your computer (left) and files on the server (right). The red ringed icon is the Create New Directory button. I have used it to create a directory called “dsp500list”.

Your personal file space on the server will appear on the right, initially none. The directory labelled “..” is a short-cut meaning “the directory one up from this one (parent)”. The path of this directory is “/home/manchester/<wrdsusername>”. Your new SAS script will be saved here. Create a new directory on the server to save your work in, called “dsp500list” (press F7 or the icon hightlighted above).

The WinSCP tool will allow you to easily copy files but will not let you execute commands (run programs). For this, you will need a tool like Putty (or just the Terminal on Mac). Log in with the details below.

  • Connection type: SSH
  • Host name:
  • User name and Password: your regular WRDS account details (PhD and staff only)
  • When prompted, choose “Yes” to save the key to the local machine.
Putty login

The Putty configuration and terminal screens. You will be prompted for user name and password later (press Enter after each). Commands are typed at the “$” prompt at the bottom of the screen.

You will need to use a few Unix commands to get about here. The most important one to start with is “chdir” to change directory. Type “chdir dsp500list” to change the working directory to the one you created above. We will put the SAS script and results file here.

Writing a SAS script

The SAS script to copy, saved to file

options nosource nodate nocenter nonumber ps=max ls = 72;
title ' ';
options source;
proc print data=crsp.dsp500list noobs label;

It is easiest to write this on your PC in your chosen text editor then secure copy it to the “dsp500list” directory via WinSCP. (You may choose instead to type it directly in the Putty terminal using the command line editor “vim“, but explaining that is out of scope of this post.)

WinSCP copy

On the right of WinSCP, double-click the “dsp500list” directory to enter it, then drag the new SAS script from the local hard drive (left) into it (select “Copy” not “Move”).

This script accesses the SAS dataset stored at the location “/wrds/crsp/sasdata/q_indexes/dsp500list.sas7bdat”, the daily S&P500 list changes. You could use the monthly file “msp500list” instead by changing the “data=crsp.dsp500list” part of the script accordingly.

Getting the data out

Run qsas on the Putty command line to execute the SAS query. It will take a few seconds but then there will be two files produced.

  1. dsp500list_0.log – a log file of the working of SAS, telling you exactly what is happening and any error messages.
  2. dsp500list_0.lst – the output text file, effectively the same data as the daily S&P 500 list.
QSAS results

Running the command “qsas” from the correct directory in Putty will execute the script. The resulting files can be seen with the “ls” command in Putty or by refreshing in WinSCP (green round arrows icon).

The output text file dsp500list_0.lst begins like this, although in plain text with fixed width columns and some blank lines at the start.

CRSP Permanent Number Date when the stock included in S&P500 Index Date when the stock excluded from S&P500 Index
10006 19570301 19840718
10030 19570301 19690108
10049 19251231 19320101

This file can be used in your local SAS scripts, perhaps with a little work converting it back to a true sas7bdat file — this exercise is left for the reader to complete!


This is a lot to take on if you have never worked with Unix systems before, and I would advise the regular WRDS web interface for most users. If you need to use WRDS Cloud to run your SAS scripts to directly access files such as the master CRSP lists, this post shows you how to get started.

See also the WRDS Cloud manual and Intoduction to WRDS Unix system.