Archive

Posts Tagged ‘import’

Importing data into SPSS Statistics (part 3)

7 May 2014 1 comment

SPSS logoThere are many ways to bring your data into IBM SPSS Statistics, for whatever manner of analysis or reorganisation you wish to perform. Perhaps you just want to open a very large text document that is too big for Excel. Please read Part 2 of this blog post before proceeding, which covers preparing an Excel document and the direct but limited way to import data into SPSS.

Approach 3: Read text data via Comma-Separated Values format

You may have data in a text file. If so, each line must represents a reading/case/observation, and the variables are delimited (separated) by a tab, comma or other character. The most common of these is the Comma-Separated Values (CSV) file, which may be saved with a *.CSV file extension or a *.TXT one.

Prepared CSV dataAn Excel worksheet can be saved in this format using the Save As command and the option “Save as type: CSV (Comma delimited) (*.csv)”. Doing this will save just the current worksheet and the current text formatting of the data, with a comma between each cell value, and with double quote marks around every cell value that itself contains a comma. Once saved, close the Excel document and dismiss any warnings, assuming you already have saved the document in Excel format as a backup. You can open a CSV file in Notepad or a similar text editor to check the format, but don’t just double-click the file or it will be imported straight back into Excel.

Now you are ready to bring the data into SPSS. Choose the File -> Read Text Data menu, and change the filter to “Files of type: Text (*.txt, *.dat, *.csv)”. Open your CSV file, and the Text Import Wizard will begin.

Text Import Wizard

1Text Import Wizard Step 1 2Text Import Wizard Step 2 3Text Import Wizard Step 3

The wizard asks questions about the format of your text file. It lets you save all the answers you give at the end as a format, so you can skip all the questions on subsequent imports. Step 1 simply asks if you already have a format file to load. If you don’t, just click Next.

Step 2 asks if your data is delimited (values have a comma or other character between them) or fixed width (the number of characters along each line determines where the value boundaries are). A CSV file is delimited. The next question asks if the first row contains variable names or not. The preview at the bottom will update with your answers, then click Next. Don’t worry if it doesn’t look totally right yet.

Step 3 asks about how many lines of data to read in. Usually the default values are correct here, but check with the preview to be sure, then click Next.

Text Import Wizard Step 4 5Text Import Wizard Step 5 6Text Import Wizard Step 6

If your data is delimited, Step 4 asks which delimiter characters you have, and if there are any text qualifiers (quote marks around the values). If you data has been exported from Excel to CSV, choose comma (and space), then double quote mark text qualifier. The preview should be looking better by now. Click Next.

Step 5 gives you the chance to name your variables and set which variable type (data format) each one is. (Comma is used if your number has commas for thousands marks.) You cannot change the number of decimal places — that had to be done in the formatting in Excel before exporting — but variable names that would not be valid in SPSS are automatically fixed. Click Next.

Step 6 is the final stage, which just offers the chance to save all your answers to a format file. Click Finish to see you data appear in SPSS, and remember to save immediately to the native SPSS Statistics format.

You can now add value labels and variable labels. One further thing to note if your data includes CUSIP codes (a type of US company identifier): if one of the entries contains the letter E with numbers and no other letters, it may get converted to a number in scientific format. This is a danger in Excel too, so it might be worth checking for this manually after the data has been imported.

Going further: Create a new query using Database Wizard

You may open from an ODBC database, such as a Microsoft Access file, but that is beyond the scope of this blog post.

Notes

IBM SPSS Statistics version 20 was used in this blog post, but the methods should apply to older and newer versions too. The data used in these examples comes from UK’s carbon footprint, 1997-2011, retrieved 3 April 2014.

Advertisements
Categories: Data Analysis Tags: , , ,

Importing data into SPSS Statistics (part 2)

24 April 2014 2 comments

SPSS logoThere are many ways to bring your data into IBM SPSS Statistics, for whatever manner of analysis or reorganisation you wish to perform. Here are a few approaches to consider, with some of their relative merits and shortcomings. Please read Part 1 of this blog post before proceeding, which covers opening an existing data source in SPSS Statistics format and typing or pasting in data.

Approach 2: Read text data directly from other formats

You can open data files created in other applications, from the File -> Open menu or the File -> Read Text Data menu. (If you spot any difference between these two, please comment below.) The file type choices include:

  • Text (*.txt, *.dat, *.csv) — tab, comma or otherwise-delimited
  • Excel (*.xls, *.xlsx, *.xlsm) — one sheet only
  • SAS (*.sas7bdat or others) — one data set
  • Stata (*.dta)

The plain text or Excel document may have variable names in the first row, but only the first row. It may instead have no variable names at all. It is worth preparing your source data file and checking that it has no surprises for SPSS like a variable type changing from 2 decimal places to no decimal places to text within the same column.

Preparing an Excel document

Excel original data layoutThis example uses UK carbon emissions data across difference streams from 1997 to 2011, starting in an Excel document with several sheets and some formatting.

  • The numerical data has 9 decimal places but is formatted to show none.
  • The variable names are split over two rows and contain space characters.
  • The data does not start in the top-left corner.

Excel file with prepared data layoutThe images to the right show how the data has been copied to a new workbook with one worksheet (tab).

  • The variable names are in the first row only, with no spaces or punctuation (although underscores and dots would be okay).
  • The data starts from the first column (which contains an identifier “year”), and from the second row.

Example: reading text directly from Excel (quick)

SPSS: CO2 import from ExcelIn, SPSS, choose the File -> Read Text Data menu and select your prepared Excel document. You can choose if the file has variable names in the first row or now, choose which sheet to read from, and optionally specify a cell range and maximum column character width for string (text) columns.

SPSS file imported from ExcelAfter clicking on OK, the data will immediately appear in SPSS (see right). The variable types were guessed by SPSS based on the content of the first row of actual data, and the number of decimal places is set to 1 for numerical data. Although you can change the number of decimal places in the Variable View tab, the data past the first place is lost. The identifier variable “year” has had one decimal place added too, which should be removed too.

This method does not allow you to change the variable type or properties, so consider the longer approach outlined in Part 3, going via a plain text file.

This post continues with Part 3.

Notes

IBM SPSS Statistics version 20 was used in this blog post, but the methods should apply to older and newer versions too. The data used in these examples comes from UK’s carbon footprint, 1997-2011, retrieved 3 April 2014.

See also: Research Financial: Using plain text files in Excel

 

Categories: Data Analysis Tags: , , ,

Importing data into SPSS Statistics (part 1)

4 April 2014 1 comment

SPSS logoThere are many ways to bring your data into IBM SPSS Statistics, for whatever manner of analysis or reorganisation you wish to perform. Here are a few approaches to consider, with some of their relative merits and shortcomings. This post is split over three parts.

Before you proceed, you should be at least slightly familiar with the main window in SPSS Statistics, the Data Editor. Specifically, there are two views as identified by the orange tab at the bottom-left of the screen: Data View and Variable View. The former has the variables in columns with observations/readings in rows; the latter has the variables in rows with their meta-data in columns.

SPSS Data Editor showing labels SPSS Data Editor tabs  SPSS Data Editor with Variables View

Best-case scenario: Open an existing data source in SPSS Statistics format

You might be fortunate enough to already have data in the native format to SPSS Statistics (*.sav). This is the format to choose when saving your data while working in SPSS.

Each variable has a Name, and that name cannot contain spaces, punctuation (except dots or underscores) and cannot begin with a number. In older versions of SPSS, variable names could only be 8 characters long. It is good practice to use more explanatory Labels with your variables as well as short-hand Names. This will help you if you come back to your data in the future and cannot remember it as well as you thought (it happens to everyone!) or if you pass on your data to somebody else.

Approach 1: Type in data, or copy and paste

You may create a new, blank document and save it in SPSS Statistics default format. Set up your variables carefully, including the variable type (e.g. Number) and number of decimal places, before you type any data in. This is especially important if you choose to copy and paste your data in from another source such as a spreadsheet, or you risk your data being rounded down to integers.

SPSS Data Editor showing valuesRemember to use a numerical variable type wherever possible, even if your data appears to be in labelled categories such as Yes/No or UK/Europe/World. SPSS works best with numbers, so record your categories as integers (e.g. 0/1 for No/Yes) then assign value labels once it is in SPSS (that’s the Values column in the Variable View).

To toggle the display of the data category labels and the numbers behind them, go to View -> Value Labels when in the Data View tab.

This post continues with Part 2.

Notes

IBM SPSS Statistics version 20 was used in this blog post, but the methods should apply to older and newer versions too. The data in the screenshots come from a British Crime Survey, 2010, and were prepared by The Cathy Marsh Centre for Census and Survey Research.

Categories: Data Analysis Tags: , , ,