Downloading data from a platform like WRDS can give you the data you need but not in the layout you desire. This post shows you how to change from long data to wide data; that is, from a stacked column of companies to a column for each company, using date as unique identifier. It requires the use of Stata, available on PCs at the Library Finance Zone. (Version 14 was used in this test but it should work just as well on others.)
In Excel, keep only the essential columns (you can add the others back later using VLOOKUP). In this example, some kind of ID (like PERMNO), Date and Return.The first row of the sheet contains the variable names. Save this file as type CSV (Comma delimited) *.csv, ignoring any warnings about features or multiple sheets. Close the file.
Open Stata, click File > Import > Text data (delimited, *.csv). Click the Browse button to select your CSV file the click OK. You will see a command like the following was performed: import delimited P:\Reshape\long.csv. You can look at a summary of the data by typing list in the Command window.
Performing the command
The reshape command in Stata works in two modes.
- Wide: instead of repeating dates and companies in long columns, create a new column for return for each company so that there is only one row for each date. This is also called long-to-wide. (You could instead have a column for each year with one row per company.)
- Long: the reverse process, also called wide-to-long.
Here we are going from wide-to-long, with Date as the unique variable (i). We will lose the current Return column. Each new column will be Return for each company ID (j), with a name that is Return followed by that ID, such as Return100500, Return222622. The command to type in the Command window is:
reshape wide return, i(date) j(id)
After this is typed, you will get a summary of what just happened. If the text is in red, an error occurred, often if there are other variables that were not mentioned in the reshape command. You can list more than one variable to be reshaped (the command might then be reshape wide return volume price, i(date) j(id) for example).
Again, you can type the list command to see the data.
You probably wish to export the data to Excel format to continue with your work. To do this, click File > Export > Data to Excel spreadsheet (*.xls, *.xlsx). Click the Save as button to choose where you wish to save to, and if you want older (pre-2007) or current Excel format. Tick the box “Save variable names to first row in Excel file” and press OK. You will see a command generated like this: export excel using “P:\Reshape\reshaped.xlsx”, firstrow(variables)
You can now open the new file in Excel to see the reshaped data set.
A wonderful post from Research Financial again, this time giving a thorough worked example of using Stata to reshape Datastream data. Stata and Datastream are available at The University of Manchester Library in the Finance Zone.
The past few weeks I have been learning about and working with Stata. This program can handle a lot of data and uses commands to edit data or analyse it. A sequence of commands can be saved in .do files and then rerun as a script. There are many commands available and one of them is very handy when it comes to changing data from columns to rows. It is similar to the transpose option that Microsoft Excel offers for quick changes. It is the reshape command.
In this blog post I will use the reshape command to change Datastream data as an example. Similar work can be done for other downloads from databases like Amadeus or Bankscope.
For downloads it can take a bit of work to change the data and rework it before Stata can be used to merge it with other data. In the case of…
View original post 674 more words
There are times when the database query doesn’t match the data that you are trying to obtain. For example, you have a list of X companies and for each company the date of an event E (e.g. a merger) but the database query has to be in terms of a list of X companies and a date range Ya-Yb.
WRDS is a typical example – you have a list of 100 gvkey – year pairs where you want some accounting data from Compustat NA (North America). The company identifier gvkey is used to identify companies in Compustat so you can use it in WRDS Compustat queries.
Option 1 – Multiple WRDS requests
Sort your list by the year and create a company list for each year, say N lists. You then make N WRDS-Compustat requests using the relevant company list and year. Finally you combine the N results.
This option is good if you have to make a few requests (i.e N is small)
Option 2 – One WRDS request and select the results you want
Calculate the range of years that you need. You then make one WRDS-Compustat request covering all these years.
If you have 1990-2010 you could now have a result set with 2100 observations (21 years times 100 companies) and you need to select the 100 relevant ones.
You can do this selection using the Merge command in a Statistics package (the following uses Stata as an example)
- Decide on the variable names in your WRDS results that are going to identify the observations of interest (for example gvkey and fyear)
- If necessary, Convert the WRDS results into a Stata dataset and save this.
- Create a Stata dataset of your 100 gvkey-year pairs using the same variable names as in your WRDS results dataset (e.g. gvkey and fyear).
(If necessary you may need to note and eliminate duplicates from this dataset – Stata commands duplicates list and drop duplicates)
- Merge your gvkey-year dataset with your WRDS results:
merge gvkey fyear using “P:\Projects\BMAN70162\acqwrds.dta”, unique sort
(Note “P:\Projects\BMAN70162\acqwrds.dta” is the file saved at step 2)
- You should have one match for each observation in your gvkey-year dataset. This is indicated by the new Stata variable _merge = 3.
Observations with _merge = 1 indicate there was no matching entry in the WRDS result dataset – this might require further investigation.
- Observations with _merge = 2 are all the extra results you want to eliminate.
This technique does not depend on Stata – other Statistics packages provide similar merge functions but the detail of the commands will vary. (Indeed, it is probably possible to do this in Excel.)