Reshaping wide and long data in Stata
Downloading data from a platform like WRDS can give you the data you need, but not in the layout you want. This post shows you how to change long data into wide data: from a stacked column of companies to a separate column for each company, with the date as the unique identifier. You need Stata for this, which is available on the PCs in the Library Finance Zone. (Version 14 was used for this guide; the steps should work the same way in other recent versions.)
In Excel, keep only the essential columns (you can add the others back later using VLOOKUP). Extra columns that are not part of the reshape can make it fail. In this example, the essential columns are an identifier (such as PERMNO), Date and Return. The first row of the sheet should contain the variable names. Save the file as type CSV (Comma delimited) (*.csv), ignoring any warnings about features or multiple sheets. Then close the file.
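If you would rather trim the columns in Stata than in Excel, the keep command does the same job after import. The sketch below is self-contained and uses made-up values throughout. The IDs, dates, returns and the extra ticker column are illustrative only. It drops everything except the three columns the reshape needs.

```stata
* Made-up long data: one row per company per date, plus an extra column.
* date is stored as long so the 8-digit YYYYMMDD values are kept exactly.
clear
input long id long date double return str6 ticker
100500 20160129  0.012 "AAAA"
100500 20160229 -0.004 "AAAA"
222622 20160129  0.031 "BBBB"
222622 20160229  0.008 "BBBB"
end

* ticker differs between companies on the same date, so it would block
* a reshape by date; keep only the identifier, the date and the return
keep id date return
describe
```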
Open Stata and click File > Import > Text data (delimited, *.csv). Click the Browse button to select your CSV file, then click OK. Stata shows the command it ran, for example: import delimited P:\Reshape\long.csv. To check the data, type list in the Command window.
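You can also do the import step from a do-file. Here is a minimal sketch. So that it runs anywhere, it first writes a small made-up CSV (the file name long_example.csv and all values are hypothetical) and then reads it back. With your own data, point import delimited at the CSV you saved from Excel instead.

```stata
* Make a small made-up CSV, standing in for the file saved from Excel
clear
input long id long date double return
100500 20160129  0.012
100500 20160229 -0.004
222622 20160129  0.031
222622 20160229  0.008
end
export delimited using "long_example.csv", replace

* Equivalent of File > Import > Text data: read the CSV into memory
import delimited using "long_example.csv", clear

* Check the variable names and types, then the observations
describe
list
```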
Performing the command
The reshape command in Stata works in two modes.
- Wide: instead of repeating dates and companies down long columns, create a separate return column for each company, so that there is only one row per date. This is also called long-to-wide. (Alternatively, you could create a column for each year, with one row per company.)
- Long: the reverse process, also called wide-to-long.
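The long mode is sketched below, starting from made-up data that is already wide. The variable names return100500 and return222622 and all values are hypothetical. reshape long stacks the return columns and recovers the company ID from the part of each column name after the stub return.

```stata
* Made-up wide data: one row per date, one return column per company
clear
input long date double return100500 double return222622
20160129  0.012 0.031
20160229 -0.004 0.008
end

* Wide to long: one row per company per date; the numeric suffix of each
* returnNNNNNN column becomes the new variable id
reshape long return, i(date) j(id)
list
```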
Here we are going from long to wide, with date as the variable that uniquely identifies each row of the result (i). The existing return column is replaced by one new column per company ID (j). Each new column is named return followed by that ID, such as return100500 and return222622. Variable names are case-sensitive in Stata, so type them exactly as they appear after import. The command to type in the Command window is:
reshape wide return, i(date) j(id)
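To see what this command produces, you can run it on a small made-up data set. The IDs, dates and returns below are hypothetical, and the data is typed in rather than imported so the snippet runs on its own.

```stata
* Made-up long data: one row per company per date
clear
input long id long date double return
100500 20160129  0.012
100500 20160229 -0.004
222622 20160129  0.031
222622 20160229  0.008
end

* Long to wide: one row per date, one return column per company ID
reshape wide return, i(date) j(id)

* Expect the variables date, return100500 and return222622
describe
list
```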
After you press Enter, Stata displays a summary of what it did. Red text means an error occurred. A common cause is another variable in the data set that was not included in the reshape command. Another is a company that appears more than once on the same date. You can reshape more than one variable at once; for example, the command might be reshape wide return volume price, i(date) j(id).
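Here is a sketch that reshapes two variables at once and checks uniqueness first. It uses a made-up data set with a hypothetical volume column. The isid command stops with an error if any date and id pair appears more than once, which is one of the situations that makes reshape wide fail.

```stata
* Made-up long data with two variables to reshape
clear
input long id long date double return long volume
100500 20160129  0.012 1500
100500 20160229 -0.004 1200
222622 20160129  0.031  800
222622 20160229  0.008  950
end

* Stop here if any company appears twice on the same date
isid date id

* Reshape both variables in one command: creates returnNNNNNN and volumeNNNNNN
reshape wide return volume, i(date) j(id)
list
```

If a reshape does stop with an error, typing reshape error in the Command window lists the observations that caused the problem.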
Again, you can type the list command to see the data.
You will probably want to export the data to Excel to continue your work. To do this, click File > Export > Data to Excel spreadsheet (*.xls, *.xlsx). Click the Save as button to choose where to save the file and whether to use the older (pre-2007, .xls) or current (.xlsx) Excel format. Tick the box “Save variable names to first row in Excel file” and press OK. Stata will run a command like this: export excel using "P:\Reshape\reshaped.xlsx", firstrow(variables)
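The export step can also go in a do-file. This minimal sketch builds and reshapes a made-up data set, then writes it to a hypothetical file reshaped_example.xlsx in the current working directory. The replace option, which the menu command above does not show, overwrites any earlier copy of the file.

```stata
* Made-up long data, reshaped to wide as in the steps above
clear
input long id long date double return
100500 20160129  0.012
100500 20160229 -0.004
222622 20160129  0.031
222622 20160229  0.008
end
reshape wide return, i(date) j(id)

* Equivalent of File > Export > Data to Excel spreadsheet with
* "Save variable names to first row" ticked; replace overwrites an old file
export excel using "reshaped_example.xlsx", firstrow(variables) replace
```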
You can now open the new file in Excel to see the reshaped data set.