Extending your Excel worksheet formulas via Python, for Capital IQ

17 March 2017

Downloading bulk data from financial databases can be a tiring process. You may need panel data structured in a very specific way for analysis in a tool like Stata, while the data provider supplies it in a different and limited form, perhaps one company or one year at a time, leaving you to build it into the shape you need. Each product, such as Bloomberg, Datastream, ThomsonONE.com and Capital IQ, provides a different Excel interface. Sometimes you can build a formula for each data item for each year for each company, then fill or copy it across or down to save time. This extended blog post will show you how to do this for the Capital IQ Office Plug-in, and how to save time extending it for panel data using a short Python script.

What kind of data are we looking at?

First, let’s consider the types of variables in turn. In no particular order:

  1. Entities: That is, Companies, Funds, Equities, Indices, Persons…
  2. Time: Usually one of Years, Quarters, Months, Days…
  3. Data items/types: Time series or line items such as Total assets, Turnover, Closing price…

If we consider any one variable or dimension at a time, this is easy; it’s just a list of identifiers or numbers.

  1. AAPL, MSFT, GOOG
  2. 1995, 1996, 1997
  3. TOTAL_ASSETS, NUM_EMPLOYEES, CREDIT_RATING

We can easily create a table to list any two of these dimensions simply.

For example: (fictional) figures for one company:

Item\Year           2001   2002   2003
Total assets ($m)   6.3    7.2    8.9
Net profit ($m)     2.3    4.7    6.1

Or, (fictional) figures for Year 2004:

Company\Item   Employees (thousands)   Credit rating
AAPL           50                      AAA
MSFT           65                      AA
GOOG           14                      AAA

But if we want to look at three at once, it gets more complicated. More on that later.

Getting data from Capital IQ Office Plug-in

This post assumes you are familiar with using Capital IQ and its Excel add-in. If not, search for other posts on Capital IQ or read the hand-out guides we have written before proceeding.

In short, you can use an Excel formula to pull data from the Capital IQ servers to your spreadsheet. Such formulas can be created in the Formula Builder menu, or typed out manually. For example, to get total assets, the formula will be =CIQ(“IQ18671”, “IQ_TOTAL_ASSETS”, “FY2012”) if the company identifier is IQ18671 and the selected date is financial year 2012. [Note that the double-quote characters appear ‘curved’ here in WordPress and you’ll need to retype them clean in Excel.]

Tip: The data retrieved will only load on a PC with the Capital IQ Excel add-in set up; you will need to copy then paste-as-values, or save as CSV to keep the data for use elsewhere.
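If you ever want to generate such formulas from a script rather than type them, it comes down to simple string assembly. The sketch below is only an illustration: the helper name ciq_formula is hypothetical, it builds text and never contacts Capital IQ, and it uses straight double quotes, which is what Excel expects.

```python
def ciq_formula(company_id, item, period):
    """Return the text of a CIQ worksheet formula with hard-coded arguments.

    ciq_formula is a hypothetical helper for illustration; it only builds
    a string and does not talk to Capital IQ.
    """
    return '=CIQ("{}", "{}", "{}")'.format(company_id, item, period)


formula = ciq_formula("IQ18671", "IQ_TOTAL_ASSETS", "FY2012")
print(formula)
```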

Using the fill down/across command

The formula includes several arguments (the options separated by commas), and these are “hard coded”: the values for the company, date or item are included as text strings.

Basic Capital IQ formula

The CIQ formula in the yellow cell is shown in the formula editor. It references the company in A2, the item “IQ_TOTAL_ASSETS” and the date “FY2012”.

You can replace any of these arguments with a reference to a cell that contains the same text, or part of it.

Replacing the item with a cell reference

The yellow cell’s formula now includes a cell reference B$1 for the item type. Note the dollar sign locks the row number when we fill the formula down.

This will allow you to use the Excel feature fill down or fill across, to extend the formula down or right.

Fill the formula down and right

The cell reference for date follows the text “FY” then an ampersand, as the cell B2 contains just the year number. Note the dollar sign in B$2, and now also in the company cell reference $A3, so that the formula fills right correctly.

It will work for any argument, including item or date, although you need to take more care with date. In the example above we use year, and Capital IQ expects the year number to follow the letters “FY” for Financial Year (or “CY” for Calendar Year). We must therefore use the ampersand ‘&’ character to join the strings.
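To see how the dollar-sign locks behave, it can help to write out the formula text each cell ends up with after filling. The sketch below assumes a layout like the one in the screenshot (company IDs down column A from row 3, years across row 2 from column B, item hard coded); the function names column_letter and filled_formula are my own, for illustration only.

```python
def column_letter(index):
    """Convert a 1-based column index to an Excel column letter (1 -> A, 27 -> AA)."""
    letters = ""
    while index > 0:
        index, remainder = divmod(index - 1, 26)
        letters = chr(ord("A") + remainder) + letters
    return letters


def filled_formula(row, col, item="IQ_TOTAL_ASSETS", header_row=2):
    """Formula text expected at (row, col) after filling down and right.

    $A<row>: column locked to A, so only the row changes when filling down.
    <col>$<header_row>: row locked, so only the column changes when filling right.
    """
    company_ref = "$A{}".format(row)
    year_ref = "{}${}".format(column_letter(col), header_row)
    # The ampersand joins the text "FY" to the year held in the header cell.
    return '=CIQ({}, "{}", "FY"&{})'.format(company_ref, item, year_ref)


# Rows 3-4 and columns B-C, as if the formula in B3 had been filled down and right.
for row in (3, 4):
    print([filled_formula(row, col) for col in (2, 3)])
```

If the text Excel shows in a filled cell differs from this pattern, a lock is probably missing or misplaced.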

Adding the third dimension

So far we have looked at tables that represent two of our three dimensions. There are six permutations for this:

  • Companies down and dates across, for one item
  • Companies down and items across, for one date
  • Dates down and items across, for one company
  • and each of those transposed the other way around.
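The count of six can be checked mechanically: choose one dimension to run down and a different one to run across, and the third is held fixed. A small sketch using itertools.permutations from the standard library (the dimension labels are just illustrative strings):

```python
from itertools import permutations

dimensions = ["companies", "dates", "items"]

# Pick one dimension to run down and another to run across; the third is held fixed.
for down, across in permutations(dimensions, 2):
    fixed = next(d for d in dimensions if d not in (down, across))
    print("{} down, {} across, {} held fixed".format(down, across, fixed))

layouts = list(permutations(dimensions, 2))
print(len(layouts))
```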

So if we want to show all three, we have to essentially stack more tables across or down. There are many more ways to arrange the data like this, such as:

  • Companies down, items across, dates across
  • Companies down, date across, items across

It might be easier to show these in a dummy table:

Company ID Item1-2001 Item2-2001 Item1-2002 Item2-2002 Item1-2003 Item2-2003
comp1  a  b  c  d  e  f
comp2  g  h  i  j  k  l

Or

Company ID Item1-2001 Item1-2002 Item1-2003 Item2-2001 Item2-2002 Item2-2003
comp1  a  c  e  b  d  f
comp2  g  i  k  h  j  l
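The two column orders above differ only in which dimension varies slowest. As a rough sketch, the header rows of both dummy tables can be generated with itertools.product (the item and year labels are the dummy ones from the tables):

```python
from itertools import product

items = ["Item1", "Item2"]
years = ["2001", "2002", "2003"]

# Years vary slowest: both items for 2001, then both for 2002, and so on.
by_year = ["{}-{}".format(item, year) for year, item in product(years, items)]

# Items vary slowest: all years for Item1, then all years for Item2.
by_item = ["{}-{}".format(item, year) for item, year in product(items, years)]

print(by_year)
print(by_item)
```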

You might prefer to stack into panels as below (this is like reshaping from wide to long in Stata). We have to repeat the first column (company ID) for each date… do not be tempted to use merged cells here! If you do, you won’t be able to use data filters, Excel tables or correctly export to other programs.

Company ID Year Item1 Item2
comp1 2001  a  b
comp1 2002  c  d
comp1 2003  e  f
comp2 2001  g  h
comp2 2002  i  j
comp2 2003  k  l
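For readers who think in code, the wide-to-long reshaping amounts to slicing each wide row into one chunk per year. The sketch below is an illustration only, using the dummy values from the tables above and assuming Python 3 and a wide layout with the items grouped by year:

```python
years = ["2001", "2002", "2003"]
n_items = 2

# Wide rows from the first dummy table: company, then Item1 and Item2 for each year.
wide = [
    ["comp1", "a", "b", "c", "d", "e", "f"],
    ["comp2", "g", "h", "i", "j", "k", "l"],
]

long_rows = []
for company, *values in wide:
    for i, year in enumerate(years):
        start = i * n_items
        # One long row per company-year: ID, year, then that year's items.
        long_rows.append([company, year] + values[start:start + n_items])

for row in long_rows:
    print("\t".join(row))
```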

Often we start with a list of thousands of companies. It would be very time consuming to manually insert extra lines for each year after each company! This is where the script comes in…

Extending the process with a Python script

In order to perform a predictable and repetitive task, it can be a good idea to write a short script instead. The method below uses a combination of Excel and an external programming language, Python, and some human intervention. (I assume you have Python installed – the University PCs tend to have it, sometimes as part of SPSS.)

  1. Start with a column of the company IDs in Excel, and save this as a text file. You can do this by copying the column and pasting it into Notepad, remembering to delete any header rows. (You can instead paste it to a new workbook in Excel and save as Text (Tab delimited) (*.txt).) The file must be named ids.txt for the script to work.

    List of company IDs

    Save the list of IDs to a text file ids.txt

  2. Now for the Python script, which is shown at the end of the post for you to copy. Copy and paste this into a text editor like Notepad, although Notepad++ is included on University of Manchester PCs and is much better. If the list of years {1995, 1996, 1997} is not right for you, change it here! Save this as script01.py or similar – note the file extension is .py and not .txt – this is a Python script. Save it to the same directory as the ID file.

    Python code, showing whitespace

    Notepad++ showing Python code (you can copy-paste it from the end of this blog post), showing whitespace/tabs in orange. Python scripts must have consistent line indents, tabs or spaces.

  3. Run the script. To do this, open Windows Explorer, navigate to the folder containing your two new files, hold Shift and right-click, then choose “Open command window here”. A black command window will open. Type “python script01.py” (no quotes) and press Enter. If the script ran successfully, no message will appear; you will only see output if there was an error.

    Run the script

    Shift+Right-click to get “Open command window here”, type “python script01.py” and press Enter. File template.txt is created.

  4. Find the newly created file, template.txt, and open it in Excel. There are several ways to open tab-delimited text files in Excel; any should do.
    Open .txt file in Excel, part 1

    Excel, open file, select type “Text Files (*.prn; *.txt; *.csv)” and choose template.txt. Accept any warning messages.

    Open .txt file in Excel - part 2

    This is a delimited text file, with Tab delimiters only.

  5. You can type the item names in C1, D1, E1 etc., such as “IQ_FINISHED_INV” and “IQ_TOTAL_ASSETS” (no quotes). You can type the template Capital IQ formula in C2, remembering to replace the literal arguments with cell references, and remembering to lock some of the references with dollar signs. For example, =CIQ($A2, C$1, “FY”&$B2)

    Fill the template columns and first formula

    Complete the green cell column headers (item codes). Fill in the yellow cell formula with correct cell references and locks; here it is =CIQ($A2, C$1, “FY”&$B2)

  6. Save the workbook as a regular Excel file (*.xlsx). Now select the formula cell (C2), hover the mouse over its bottom-right corner until the little black square icon appears, and drag the icon to the right to select that row in all the columns you need. The formula will copy across, and hopefully correctly. You may need to edit it and try again if the dollar sign locks went wrong. Now fill the formula down (just a few rows at first to test, then all the way down by double-clicking the black square icon).

    Final fill out

    Save before you start! Fill the formula right, then down a few rows, then further. Don’t just fill all the way to the bottom if there are thousands of rows, as this might overload Capital IQ.

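The Python code, which must keep its indents on the lines inside each with block and for loop:

# Build an ID/Year template: one row per company per year
years = ["1995", "1996", "1997"]  # a list, so the years stay in this order
with open('ids.txt') as fin, open('template.txt', 'w') as fout:
    # header
    fout.write("ID\tYear\n")
    # body
    for line in fin:
        company_id = line.strip()
        if not company_id:
            continue  # skip blank lines in ids.txt
        for year in years:
            # write inside the inner loop, so every company-year pair gets a row
            fout.write(company_id + "\t" + year + "\n")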
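As a variation, the same loop can be wrapped in functions that take a first and last year, so the years need not be typed out. This is a sketch under my own assumptions, not a replacement for the script above: the names template_rows and write_template are hypothetical, the IDs in the demonstration are placeholders, and it assumes Python 3.

```python
def template_rows(id_lines, first_year, last_year):
    """Yield (company_id, year) pairs: every year for each company in turn."""
    for line in id_lines:
        company_id = line.strip()
        if not company_id:
            continue  # skip blank lines
        # range() stops before its end value, hence the + 1
        for year in range(first_year, last_year + 1):
            yield company_id, str(year)


def write_template(ids_path, out_path, first_year, last_year):
    """Read IDs from ids_path and write the tab-delimited template to out_path."""
    with open(ids_path) as fin, open(out_path, "w") as fout:
        fout.write("ID\tYear\n")
        for company_id, year in template_rows(fin, first_year, last_year):
            fout.write(company_id + "\t" + year + "\n")


# In-memory demonstration with placeholder IDs instead of a real ids.txt
rows = list(template_rows(["comp1\n", "comp2\n"], 1995, 1997))
print(rows[0], rows[-1], len(rows))
```

Calling write_template("ids.txt", "template.txt", 1995, 1997) from the folder holding ids.txt should then produce the template.txt described in the steps above.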

Moving on

  • The script is not the best example of Python, but it works! For example, the years are typed out by hand, when they could be generated from a start and end year.
  • You could extend the script to include the items and Excel formulas, but for a one-off script, it is probably quicker to do some of the work in Excel by hand (using the fill function).
  • You could choose any other programming language over Python, including VBA which is built into Excel.
  • You could apply this technique to other databases with an Excel interface such as Bloomberg.
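For the curious, here is a rough sketch of the second idea above: generating the item headers and formula text as well as the ID and Year columns. It assumes the same sheet layout as in step 5 (IDs in column A, years in column B, item codes in row 1 from column C), uses placeholder IDs, and assumes Python 3. Whether Excel evaluates the formulas when it opens the resulting text file is something to check in your own setup; I have not verified it here.

```python
import csv
import io

items = ["IQ_FINISHED_INV", "IQ_TOTAL_ASSETS"]
ids = ["comp1", "comp2"]  # placeholder IDs, not real Capital IQ identifiers
years = ["1995", "1996", "1997"]


def formula(row, col_letter):
    """CIQ formula text for one sheet cell, with the same locks as the hand-typed one."""
    return '=CIQ($A{0}, {1}$1, "FY"&$B{0})'.format(row, col_letter)


table = [["ID", "Year"] + items]
sheet_row = 2  # row 1 holds the headers
for company_id in ids:
    for year in years:
        # Item columns start at C; this simple lettering only covers C to Z.
        cells = [formula(sheet_row, chr(ord("C") + i)) for i in range(len(items))]
        table.append([company_id, year] + cells)
        sheet_row += 1

# csv.writer quotes each formula field and doubles the quotes inside it.
buffer = io.StringIO()
csv.writer(buffer, delimiter="\t", lineterminator="\n").writerows(table)
print(buffer.getvalue())
```

For a one-off job, filling the formula by hand as described above is still likely to be quicker.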
