Archive for the ‘Data Analysis’ Category

Reshaping wide and long data in Stata

29 July 2016 1 comment
Reshape from long to wide via Stata

Reshape from long data (left) to wide data (right) via Stata

Downloading data from a platform like WRDS can give you the data you need but not in the layout you desire. This post shows you how to change from long data to wide data; that is, from a stacked column of companies to a column for each company, using date as unique identifier. It requires the use of Stata, available on PCs at the Library Finance Zone. (Version 14 was used in this test but it should work just as well on others.)


Excel data (long)

Long data in Excel

In Excel, keep only the essential columns (you can add the others back later using VLOOKUP). In this example, some kind of ID (like PERMNO), Date and Return.The first row of the sheet contains the variable names. Save this file as type CSV (Comma delimited) *.csv, ignoring any warnings about features or multiple sheets. Close the file.

Stata import text delimited

Import text data delimited in Stata

Open Stata, click File > Import > Text data (delimited, *.csv). Click the Browse button to select your CSV file the click OK. You will see a command like the following was performed: import delimited P:\Reshape\long.csv. You can look at a summary of the data by typing list in the Command window.

Performing the command

The reshape command in Stata works in two modes.

  • Wide: instead of repeating dates and companies in long columns, create a new column for return for each company so that there is only one row for each date. This is also called long-to-wide. (You could instead have a column for each year with one row per company.)
  • Long: the reverse process, also called wide-to-long.

Here we are going from wide-to-long, with Date as the unique variable (i). We will lose the current Return column. Each new column will be Return for each company ID (j), with a name that is Return followed by that ID, such as Return100500, Return222622. The command to type in the Command window is:

reshape wide return, i(date) j(id)
Stata reshape

Reshape in Stata (click to enlarge)

After this is typed, you will get a summary of what just happened. If the text is in red, an error occurred, often if there are other variables that were not mentioned in the reshape command. You can list more than one variable to be reshaped (the command might then be reshape wide return volume price, i(date) j(id) for example).

Again, you can type the list command to see the data.


Stata export

Export to Excel from Stata

You probably wish to export the data to Excel format to continue with your work. To do this, click File > Export > Data to Excel spreadsheet (*.xls, *.xlsx). Click the Save as button to choose where you wish to save to, and if you want older (pre-2007) or current Excel format. Tick the box “Save variable names to first row in Excel file” and press OK. You will see a command generated like this:  export excel using “P:\Reshape\reshaped.xlsx”, firstrow(variables)

Excel results

Now you have wide data in Excel!

You can now open the new file in Excel to see the reshaped data set.

More information

The Institute for Digital Education and Research at UCLA have two excellent example pages, one for reshaping long-to-wide and one for reshaping wide-to-long.



Categories: Data Analysis Tags: , ,

Company Identifiers in Datastream.

Identifiers can take many forms (e.g. Company Name, Ticker, CUSIP, SEDOL, Datastream Code, ISIN) and be national, such as CUSIP for US / Canadian companies or international, such as ISIN (International Security Identification Number). The ISIN code incorporates the national identifier. For example, for Tesco PLC: GB0008847096, which includes the SEDOL (UK company identifier) code: 0884709.


Company Lists and Datastream


In seeking to further their dissertation research, MSc students often make use of more than one research database. Consequently, a typical scenario involves a request to import a list of company identifiers into Datastream, then conducting further searches to secure data, prior to analysis. The ‘Create List (From Range)’ function is useful in this regard.


Create List (From Range)


This function within the Excel Add-In version of Datastream is extremely effective when working with lists (up to 5,000 Series [companies] per list). This enables Company Identifiers to be copied and pasted into Excel and saved as a List on the specific computer being used (Store List Locally) or saved to the Datastream mainframe computer (Upload List) – accessible from computers with Datastream installed.

The code generated to represent the list of companies can then be entered directly into a ‘Static‘ or ‘Time Series‘ request search screen in Excel. Hence, one or more Datatypes can be obtained for an entire list of companies and the results would be displayed in a single Excel sheet.

Company Identifiers which work with the ‘Create List (From Range)’ function include: ISIN, Ticker, Datastream Code. Both Company Name and CUSIP don’t work, even though CUSIP is meant to.


Example Search


Within Excel, with the Datastream tab selected, enter (paste in) the company identifiers. Next, Select the identifiers (ISINs below) and then click on ‘Create List (From Range)’. To illustrate the process, a short list of two US companies will be used (Apple, Microsoft) – this could of course be a list of many hundreds of companies.


Datastream - Create List (From Range)

Datastream – Create List (From Range)


From the ‘List Creation for Excel’ dialog box, the default option is ‘Store List Locally’ (on the computer being used at the time), with ‘Upload List’ as an option.


Create List - Store Locally

Create List – Store Locally


This generates a confirmation message – click ‘OK’.


Create List - Confirmation Message

Create List – Confirmation Message


The ‘List File Name’ [ New_0032.LLT ] is entered in the Series/Lists field in this Static Request (as at 03/06/2016) to locate: Company Name [NAME], CUSIP [WC06004], Ticker [WC05601], Datastream Code [DSCD] and Market Value [MV].


Datastream - Static Request

Datastream – Static Request


Results:  widen columns as necessary, to display data.





It would be possible to repeat the above process, to create lists with different identifiers         (e.g. Datastream Code, Ticker) and also select ‘Time Series’ request for historical data. For Ticker identifiers, it is advisable to format the cells in the Excel column to ‘Text’ (before identifiers are pasted in) where US companies can have an ‘@’ symbol as part of the identifier (e.g. @AAPL, for Apple). If the cells are not formatted as Text, the ‘@’ symbol will be interpreted by Excel as the start of a function.


Incompatible Datatypes


If a list contains both US and UK Ticker identifiers, this can cause a problem where an incompatible datatype is specified in a search request. For example, if CUSIP (datatype: WC06004) is included, this means the search will fail (i.e. it gives no results), as UK companies do not have CUSIP codes.

Converting from Ticker to Datastream Code identifiers is a way around this difficulty. The Datastream Code identifiers could then be used to create a new list: for Apple (992816), Microsoft (719643), Tesco (900803) and J Sainsbury (926002).

A Static Request (03/06/2016) search for datatypes:  Company Name,  CUSIP and Market Value is successful (i.e. it gives results) and merely has a blank cell under CUSIP for Tesco and             J Sainsbury.



Datastream Code List - Static Request Results

Datastream Code List – Static Request Results


Changing Datastream data & Stata

24 March 2016 Leave a comment

A wonderful post from Research Financial again, this time giving a thorough worked example of using Stata to reshape Datastream data. Stata and Datastream are available at The University of Manchester Library in the Finance Zone.


The past few weeks I have been learning about and working with Stata. This program can handle a lot of data and uses commands to edit data or analyse it. A sequence of commands can be saved in .do files and then rerun as a script. There are many commands available and one of them is very handy when it comes to changing data from columns to rows. It is similar to the transpose option that Microsoft Excel offers for quick changes. It is the reshape command.

In this blog post I will use the reshape command to change Datastream data as an example. Similar work can be done for other downloads from databases like Amadeus or Bankscope.
For downloads it can take a bit of work to change the data and rework it before Stata can be used to merge it with other data. In the case of…

View original post 674 more words

Categories: Data Analysis Tags: ,

Company Employee Data Using Different Databases

24 February 2016 1 comment

How many employees work for a company?

This would appear to be a simple question. But perhaps not…

Although the company in question will have a more exacting answer as to values at any particular point in time, this would be considered commercially sensitive information. Therefore, values are typically made available at a single point in time, such as the end of the company’s reporting period. This is then accessible via a number of business databases.

Unfortunately, if you look in different databases, you tend to get different answers – for example, looking at Tesco PLC, which is a publicly quoted UK company, a number of values are specified depending on the database consulted:


Bloomberg Professional

This shows 386,086 employees and helpfully specifies a date point of 28 February 2015, but displayed in US date format [02/28/15]. This reflects Tesco’s reporting period dates – essentially the start of March to the end of February.

Bb Tesco EE



This gives a different value [506,984] but specifies the same date – the end of the reporting period.

T1 Tesco EE PDF


Datastream (Thomson Reuters)

The datatype is ‘WC07011 – Employees’ and the definition notes: Employees represents the number of both full time and part time employees of the company. It excludes: Seasonal employees, Emergency employees. Footnotes:  D. Average employees.  This gives a value of 506,984.

Datastream Tesco EE



This specifies a value of 386,086.

Fame Tesco EE


Capital IQ

When searching for Tesco PLC, the ‘Number of Employees’ is stated as: 517,802.

Capital IQ Tesco EE


Company Annual Report

Perhaps one way to obtain clarity is to look in the ‘Tesco Annual Report and Financial Statements 2015’, accessible through Thomson Research or PI Navigator databases, for example.

Page 20, shows the following breakdown, for a total of 517,802 (220,257 + 297,545).

ARS 2015 Tesco EE p20

If you were to stop at this point in the Annual Report, you would miss further details. Hence, on page 100, there are additional details:


ARS 2015 Tesco EE p100

This helps to show which values the various databases have selected. Thus, Capital IQ has used the combined values [220,257 + 297,545 = 517,802] from page 20 of the Annual Report, in contrast to the other databases.

Thomson have used the ‘Average Number of Employees’ measure [506,984] and Bloomberg/Fame have used the ‘Average Number of Full-Time Equivalents’ [386,086] for 2015 – that is, the end of the reporting period, specifically 28 February 2015.


The lesson to be drawn when using data from a particular database – always quote the source, to ensure clarity and remove uncertainty. In the case of Employee data, the company’s Annual Report is the ultimate source.

However, the difference in total values between page 20 [517,802 (220,257 + 297,545)] and those quoted on page 100 [506,984 and 386,086] is a mystery. Specifying a page number in your reference is therefore recommended!


Bloomberg Professional, Datastream, Thomson, Thomson Research, Fame and Capital IQ are databases available to current students and staff of The University of Manchester.


Company Results by Geographical Location

The ability to locate results by geographical location can be troublesome when having to consult multiple copies of a company’s Annual Report & Accounts.


Using a business database can make this task quicker and easier.



To get the best out of this database, it is useful to know that it is designed to work with the Internet Explorer browser.

In addition, it is advisable to select ‘Compatibility View Settings’ from the tools menu (represented by the ‘cog’ icon, then ‘Add’ and ‘Close’) to ensure all options display correctly.


T1 Tools - Compatibility View Settings


Company Search


Type in the company name in the search box at the top left of the screen and select:

T1 Tesco (Company Search)


Next, click on ‘Full Summary’ from the Company Overview screen.


T1 TEsco - Full Summary


From the Financial Reports page displayed, it is possible to reveal more search options by selecting the small icon towards the upper right of the screen.


Search Selections


T1 Geog Breakdown Tesco

With the ‘Summary’ tab selected at the top of the search screen, the Report categories available include Annual Key Items, Interim Key Items, Interim Key Items TTM, Business Segment and Geographic Segment.

For ‘Geographic Segment’ this means that data (‘Measure’ options: Sales, Assets, Capital Expenditure, Depreciation and Operating Income) are available for those areas in which results are reported, for a given year. In the case of Tesco PLC for Fiscal Period 2015* above (looking at data for the 12 months to 28 February 2015, displayed in US date format: 02/28/2015), this includes the UK, Asia and Rest of Europe.

Repeat the process for each year by selecting a new year and clicking on the ‘OK’ button to display data.


Whole Company Results – Multiple Years


For individual financial statements, select the appropriate tab at the top of the screen. These include Balance Sheet, Income Statement and Cash Flow Statement, with (Report type ‘Annual’ showing) data for 5, 10, 15, 20, 25 or All years.

T1 Tesco Income St Multiple Years


Data Definitions


Clicking on the small square icon at the end of an item in the financial statement being viewed will provide a pop-up window with a definition. For example, in the Income Statement for Tesco PLC above, ‘Operating Income’ is defined.

*Fiscal Period:  the period a company uses for accounting purposes – reporting results. In the case of Tesco PLC, March to the end of February.


Converting an SAS dataset into an Excel file

The Statistical package SAS is often used by researchers who have to deal with very large datasets. It is also used by WRDS (Wharton Research Data Services): a WRDS web query runs a SAS program (using the values on the web form as program parameters) and makes the result dataset available.

For some Eventus web queries on WRDS the results are only available as a SAS dataset. You can convert this to Excel (if it is not too large) using the SAS software.

SAS Export

SAS 9.4 Export wizard

SAS 9.4

  1.     Open the SAS dataset file
  2.     Select File and Export Data
  3.     Follow the SAS Export wizard (Excel is the default)
  4.     Select Finish to start the export
  5.     If the export fails, this is probably due to an incompatability between SAS and Microsoft Office/Excel. The simplest workaround is to use a different format: comma separated values (*.csv), Stata file (*.dta), SPSS file (*.sav) etc.

SAS Enterprise Guide

  •     Open the SAS dataset file
  •     Select File and Export
  •     Select .xls as output format

For detail on converting from SAS to Excel see the 64-bit-gotchas post on The SAS Dummy blog.

SPSS Statistics

If SAS is not available on your PC, you can open a SAS data file in SPSS Statistics and then export to Excel.

Getting data in and out of Eventus

13 February 2015 Leave a comment

WRDS top cornerEventus is a tool to perform event studies on US companies, pulling data directly out of the CRSP stock database. It is available through the WRDS portal for postgraduate students and members of staff at The University of Manchester.

The interface of Eventus is similar to the other platforms available through WRDS, but it can be a little more confusing to the newcomer. The post aims to help you get data in and out of Eventus, resulting in a usable Excel or CSV document. It does not aim to explain the mechanics of Eventus or what is really happening under the surface!

Preparing company identifiers and dates

Eventus 1 - source data

Mergers and acquisitions data, including acquirer PERMNO and announcement date

Eventus 2 - prepare columnsTypically, people who decide to use Eventus already have a list of acquiring companies, identified by PERMNO codes, and a corresponding list of dates to study (in YYYYMMDD format). Copy these two columns to a new Excel workbook in this order, then delete the header row if there is one.

Eventus 3 - ready to uploadThis workbook will need to be Saved as type: Text (Tab delimited) (*.txt). Ignore any warnings about the file type not accepting multiple sheets — you only need the active worksheet. Accept any warning about features not compatible with this file format. After the file has saved, close it, ignoring any further warnings about not saving changes.

You should now have a plain text file that begins like this if you open it in Notepad++ or similar.

Using Eventus

Your file is now ready to be uploaded to Eventus. Log onto WRDS, then choose Eventus from the drop-down list Select a Data Set on the top-left. Choose Cross-Sectional Analysis Daily. (Only the cross-sectional analysis options produce data that can be used in the methods described by this blog post.) You will see that CUSIP codes could have been chosen instead of PERMNO, but I suggest PERMNO where possible — there can be confusion over CUSIP codes sometimes, whether you need 6, 8, or 9-digit versions.

Eventus 4 - fill in fields

Launch Eventus Cross-Sectional Analysis, fill in the fields and upload the prepared file

Choose your new text file to upload. Fill in the options as you require, including the event period and alternative windows.

Eventus 5 - download

Download the files produced by Eventus; the first is the most useful (*.sas7bdat)

When ready, click on the Submit Query button. The next screen can take a few seconds to load, but hopefully it will look like this (above). If not, read the error message and check what you entered on the previous screen.

You can download three files, but the first one is the most useful. It has a filename ending in .sas7bdat. This is a SAS dataset.

Opening or converting the results

If you have SAS installed on your computer, you can open the results file directly in that (double-click the file) then choose File > Export and Files of type: Microsoft Excel Workbooks.

Eventus 6 - Open output in SAS

Output from Eventus opened in SAS Enterprise Guide which can be exported to Microsoft Excel

If you have SPSS instead, you can import the results file then export it to Excel format or CSV.

  • To launch SPSS, go to Start > IBM SPSS Statistics > IBM SPSS Statistics 22 (or 20).
  • To import in SPSS, use File > Open > Data and then Files of type: SAS (*.sas7bdat).
  • To export it from SPSS, use File > Save As and then Save as type: Excel 2007 through 2010 (*.xlsx) or Comma delimited (*.csv).
Eventus 6 - SPSS and Excel

Output from Eventus opened in IBM SPSS Statistics and exported to Microsoft Excel


This guide should help you to upload data to and download data from Eventus, ready for the next stage of your study. If you found this useful, or if you need further help explaining these steps, please leave a comment below.

Updated 20 January 2016 with some new screenshots for WRDS and SAS.

Categories: Data Analysis Tags: , ,