Archive

Posts Tagged ‘Statistics’

National statistics office websites across the world

8 October 2014 2 comments

Governments and authority bodies around the world collect population and economic data and make it available for the public online.

Here is a compilation of some of those websites, most of which have an English language option. If you find one for a country that is not in the list below, please suggest it in the comments.

Country/Region | National statistics website name | Website name in English (where available)
Austria | Statistik Austria | Statistics Austria
Belgium | Statistics Belgium | Statistics Belgium
Bosnia and Herzegovina | Federalni zavod za statistiku | Federal Office of Statistics
Catalonia (Spain) | Institut d’Estadística de Catalunya | Statistical Institute of Catalonia
Chile | Registro Civil e Identificación |
Croatia | imeHrvatsko.net |
Czech Republic | Český statistický úřad | Czech Statistical Office
Denmark | Danmarks Statistik | Statistics Denmark
Finland | Befolkningsregistercentralen | Population Registration Centre
France | Institut national de la statistique et des études économiques | National Institute of Statistics and Economic Studies
Hungary | Közigazgatási és Elektronikus Közszolgáltatások Központi Hivatala | Central Office for Administrative and Electronic Public Services
Iceland | Hagstofa Íslands | Statistics Iceland
Ireland | Central Statistics Office | Central Statistics Office
Italy | Istat | Istat
New Zealand | The Department of Internal Affairs | The Department of Internal Affairs
Northern Ireland | Northern Ireland Statistics and Research Agency | Northern Ireland Statistics and Research Agency
Norway | Statistisk sentralbyrå | Statistics Norway
Poland | Ministerstwo Spraw Wewnętrznych | Ministry of Interior and Administration
Scotland | National Records of Scotland | National Records of Scotland
Slovenia | Statistični urad Republike Slovenije | Statistical Office of the Republic of Slovenia
Spain | Instituto Nacional de Estadística | National Statistics Institute of Spain
Sweden | Statistiska centralbyrån | Statistics Sweden
Turkey | Türkiye İstatistik Kurumu | Turkish Statistical Institute
United Kingdom | Office for National Statistics | Office for National Statistics
United States | Social Security Administration | Social Security Administration
Categories: Economic data

Importing data into SPSS Statistics (part 3)

7 May 2014 1 comment

There are many ways to bring your data into IBM SPSS Statistics, for whatever manner of analysis or reorganisation you wish to perform. Perhaps you just want to open a very large text document that is too big for Excel. Please read Part 2 of this blog post before proceeding, which covers preparing an Excel document and the direct but limited way to import data into SPSS.

Approach 3: Read text data via Comma-Separated Values format

You may have data in a text file. If so, each line must represent a reading/case/observation, and the variables must be delimited (separated) by a tab, comma or other character. The most common of these is the Comma-Separated Values (CSV) file, which may be saved with a *.CSV file extension or a *.TXT one.

An Excel worksheet can be saved in this format using the Save As command and the option “Save as type: CSV (Comma delimited) (*.csv)”. Doing this will save just the current worksheet and the current text formatting of the data, with a comma between each cell value, and with double quote marks around every cell value that itself contains a comma. Once saved, close the Excel document and dismiss any warnings, assuming you have already saved the document in Excel format as a backup. You can open a CSV file in Notepad or a similar text editor to check the format, but don’t just double-click the file or it will be imported straight back into Excel.
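To see what the comma-delimited layout described above looks like, here is a minimal Python sketch using the standard csv module. The rows and the function name are made up for illustration and are not the carbon footprint data; the point is only that a value containing a comma is wrapped in double quote marks while the others are left bare.

```python
import csv
import io

# Hypothetical rows standing in for one worksheet (not the real data)
rows = [
    ["year", "source", "emissions"],
    [1997, "Households, direct", 143.2],
    [1998, "Imports", 250.0],
]

def to_csv_text(rows):
    """Write rows as comma-delimited text, quoting only where needed."""
    buffer = io.StringIO()
    writer = csv.writer(buffer, quoting=csv.QUOTE_MINIMAL, lineterminator="\n")
    writer.writerows(rows)
    return buffer.getvalue()

csv_text = to_csv_text(rows)
print(csv_text)
```

Only the second line of the output has quote marks, because "Households, direct" is the only value with a comma inside it.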

Now you are ready to bring the data into SPSS. Choose the File -> Read Text Data menu, and change the filter to “Files of type: Text (*.txt, *.dat, *.csv)”. Open your CSV file, and the Text Import Wizard will begin.

Text Import Wizard

[Screenshots: Text Import Wizard, Steps 1 to 3]

The wizard asks questions about the format of your text file. It lets you save all the answers you give at the end as a format, so you can skip all the questions on subsequent imports. Step 1 simply asks if you already have a format file to load. If you don’t, just click Next.

Step 2 asks if your data is delimited (values have a comma or other character between them) or fixed width (the number of characters along each line determines where the value boundaries are). A CSV file is delimited. The next question asks if the first row contains variable names or not. The preview at the bottom will update with your answers, then click Next. Don’t worry if it doesn’t look totally right yet.
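The difference between delimited and fixed-width data can be sketched in a few lines of Python. This is an illustration only, not how SPSS reads files internally; the two sample lines, the column widths and the function names are assumptions made up for the example.

```python
import csv
import io

# The same hypothetical record in the two layouts
delimited_line = "1997,Households,143.2"
fixed_width_line = "1997Households 143.2"  # columns are 4, 11 and 5 characters wide

def parse_delimited(line, delimiter=","):
    """Split one line wherever the delimiter character occurs."""
    return next(csv.reader(io.StringIO(line), delimiter=delimiter))

def parse_fixed_width(line, widths):
    """Split one line by counting characters along it."""
    values = []
    start = 0
    for width in widths:
        values.append(line[start:start + width].strip())
        start += width
    return values

print(parse_delimited(delimited_line))
print(parse_fixed_width(fixed_width_line, [4, 11, 5]))
```

Both calls give the same three values; the fixed-width version only works if you know the column widths in advance, which is why the wizard has to ask.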

Step 3 asks about how many lines of data to read in. Usually the default values are correct here, but check with the preview to be sure, then click Next.

[Screenshots: Text Import Wizard, Steps 4 to 6]

If your data is delimited, Step 4 asks which delimiter characters you have, and if there are any text qualifiers (quote marks around the values). If your data has been exported from Excel to CSV, choose comma (and space) as the delimiter, then the double quote mark as the text qualifier. The preview should be looking better by now. Click Next.
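Why the text qualifier matters can be shown with a small Python sketch. It uses the standard csv module rather than SPSS, and the sample line is invented; it simply reads the same line once with the double quote mark treated as a qualifier and once with quote marks ignored.

```python
import csv
import io

# One hypothetical line in which a value contains a comma
line = '2011,"Households, direct",143.2\n'

# Double quote mark treated as the text qualifier
with_qualifier = next(csv.reader(io.StringIO(line), delimiter=",", quotechar='"'))

# No text qualifier: quote marks are treated as ordinary characters
without_qualifier = next(csv.reader(io.StringIO(line), delimiter=",", quoting=csv.QUOTE_NONE))

print(len(with_qualifier), with_qualifier)
print(len(without_qualifier), without_qualifier)
```

With the qualifier the line has three values; without it the comma inside the quoted value is taken as a delimiter and the line splits into four, which is the kind of misalignment to look for in the wizard preview.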

Step 5 gives you the chance to name your variables and set which variable type (data format) each one is. (Comma is used if your number has commas for thousands marks.) You cannot change the number of decimal places — that had to be done in the formatting in Excel before exporting — but variable names that would not be valid in SPSS are automatically fixed. Click Next.

Step 6 is the final stage, which just offers the chance to save all your answers to a format file. Click Finish to see your data appear in SPSS, and remember to save immediately to the native SPSS Statistics format.

You can now add value labels and variable labels. One further thing to note if your data includes CUSIP codes (a type of US company identifier): if one of the entries contains the letter E with numbers and no other letters, it may get converted to a number in scientific format. This is a danger in Excel too, so it might be worth checking for this manually after the data has been imported.
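One way to check for this before importing is to scan the identifier column for values that could be read as a number in scientific format. The following Python sketch is an assumption about what such a check might look like; the pattern, the function name and the sample codes are illustrative only.

```python
import re

# Digits, then a single letter E, then digits: the shape that can be
# mistaken for scientific notation (for example 12345E102)
LOOKS_SCIENTIFIC = re.compile(r"\d+[Ee]\d+")

def risky_codes(codes):
    """Return the codes that could be converted to a number by mistake."""
    return [code for code in codes if LOOKS_SCIENTIFIC.fullmatch(code)]

# Illustrative strings only, not taken from any real dataset
sample = ["037833100", "12345E102", "594918104", "00846U101"]
print(risky_codes(sample))
```

Any code the function returns is worth comparing by eye against the imported value in SPSS.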

Going further: Create a new query using Database Wizard

You may open from an ODBC database, such as a Microsoft Access file, but that is beyond the scope of this blog post.

Notes

IBM SPSS Statistics version 20 was used in this blog post, but the methods should apply to older and newer versions too. The data used in these examples comes from UK’s carbon footprint, 1997-2011, retrieved 3 April 2014.

Categories: Data Analysis

Importing data into SPSS Statistics (part 2)

24 April 2014 2 comments

There are many ways to bring your data into IBM SPSS Statistics, for whatever manner of analysis or reorganisation you wish to perform. Here are a few approaches to consider, with some of their relative merits and shortcomings. Please read Part 1 of this blog post before proceeding, which covers opening an existing data source in SPSS Statistics format and typing or pasting in data.

Approach 2: Read text data directly from other formats

You can open data files created in other applications, from the File -> Open menu or the File -> Read Text Data menu. (If you spot any difference between these two, please comment below.) The file type choices include:

  • Text (*.txt, *.dat, *.csv) — tab, comma or otherwise-delimited
  • Excel (*.xls, *.xlsx, *.xlsm) — one sheet only
  • SAS (*.sas7bdat or others) — one data set
  • Stata (*.dta)

The plain text or Excel document may have variable names in the first row, but only the first row. It may instead have no variable names at all. It is worth preparing your source data file and checking that it has no surprises for SPSS like a variable type changing from 2 decimal places to no decimal places to text within the same column.
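If the source data is available as delimited text, a quick script can flag columns that mix kinds of value before SPSS sees them. This is a rough Python sketch under stated assumptions: the sample text, the function names and the four categories (empty, integer, decimal, text) are all invented for illustration.

```python
import csv
import io

def classify(value):
    """Return 'empty', 'integer', 'decimal' or 'text' for one cell."""
    text = (value or "").strip()
    if text == "":
        return "empty"
    try:
        int(text)
        return "integer"
    except ValueError:
        pass
    try:
        float(text)
        return "decimal"
    except ValueError:
        return "text"

def column_kinds(csv_text):
    """Map each column name to the set of value kinds found in it."""
    reader = csv.DictReader(io.StringIO(csv_text))
    kinds = {}
    for row in reader:
        for name, value in row.items():
            kinds.setdefault(name, set()).add(classify(value))
    return kinds

# Hypothetical data: the second column drifts from decimal to integer to text
sample = "year,emissions\n1997,143.25\n1998,150\n1999,n/a\n"
kinds = column_kinds(sample)
mixed = {name: found for name, found in kinds.items() if len(found - {"empty"}) > 1}
print(mixed)
```

Here only the emissions column is reported, because it contains a decimal, an integer and a text value.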

Preparing an Excel document

This example uses UK carbon emissions data across different streams from 1997 to 2011, starting in an Excel document with several sheets and some formatting.

  • The numerical data has 9 decimal places but is formatted to show none.
  • The variable names are split over two rows and contain space characters.
  • The data does not start in the top-left corner.

The images to the right show how the data has been copied to a new workbook with one worksheet (tab).

  • The variable names are in the first row only, with no spaces or punctuation (although underscores and dots would be okay).
  • The data starts from the first column (which contains an identifier “year”), and from the second row.
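The tidying of variable names described above was done by hand in Excel, but the same idea can be sketched in Python. The header text and the function name below are hypothetical: the function joins a two-row header into one name per column and replaces spaces and punctuation with underscores.

```python
import re

def merge_header_rows(top, bottom):
    """Combine two header rows into single names without spaces or punctuation."""
    names = []
    for upper, lower in zip(top, bottom):
        combined = f"{upper} {lower}".strip()
        # Replace each run of anything other than letters, digits or underscores
        cleaned = re.sub(r"[^A-Za-z0-9_]+", "_", combined).strip("_")
        names.append(cleaned)
    return names

# Hypothetical two-row header, not the real spreadsheet
top = ["Year", "Households", "Imports"]
bottom = ["", "direct (Mt CO2)", "to UK"]
names = merge_header_rows(top, bottom)
print(names)
```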

Example: reading text directly from Excel (quick)

In SPSS, choose the File -> Read Text Data menu and select your prepared Excel document. You can choose if the file has variable names in the first row or not, choose which sheet to read from, and optionally specify a cell range and maximum column character width for string (text) columns.

After clicking on OK, the data will immediately appear in SPSS (see right). The variable types were guessed by SPSS based on the content of the first row of actual data, and the number of decimal places is set to 1 for numerical data. Although you can change the number of decimal places in the Variable View tab, the data past the first place is lost. The identifier variable “year” has had one decimal place added as well, which should be removed.

This method does not allow you to change the variable type or properties, so consider the longer approach outlined in Part 3, going via a plain text file.

This post continues with Part 3.

Notes

IBM SPSS Statistics version 20 was used in this blog post, but the methods should apply to older and newer versions too. The data used in these examples comes from UK’s carbon footprint, 1997-2011, retrieved 3 April 2014.

See also: Research Financial: Using plain text files in Excel

 

Categories: Data Analysis

Importing data into SPSS Statistics (part 1)

4 April 2014 1 comment

There are many ways to bring your data into IBM SPSS Statistics, for whatever manner of analysis or reorganisation you wish to perform. Here are a few approaches to consider, with some of their relative merits and shortcomings. This post is split over three parts.

Before you proceed, you should be at least slightly familiar with the main window in SPSS Statistics, the Data Editor. Specifically, there are two views as identified by the orange tab at the bottom-left of the screen: Data View and Variable View. The former has the variables in columns with observations/readings in rows; the latter has the variables in rows with their meta-data in columns.

[Screenshots: SPSS Data Editor showing labels; Data Editor tabs; Data Editor in Variable View]

Best-case scenario: Open an existing data source in SPSS Statistics format

You might be fortunate enough to already have data in the native format to SPSS Statistics (*.sav). This is the format to choose when saving your data while working in SPSS.

Each variable has a Name, and that name cannot contain spaces or punctuation (except dots or underscores) and cannot begin with a number. In older versions of SPSS, variable names could only be 8 characters long. It is good practice to use more explanatory Labels with your variables as well as short-hand Names. This will help you if you come back to your data in the future and cannot remember it as well as you thought (it happens to everyone!) or if you pass on your data to somebody else.
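The naming rules above can be turned into a quick pre-check on a list of proposed names. This Python sketch covers only the rules stated in this post and assumes plain ASCII letters; SPSS may apply further rules that are not checked here. The function name and the optional max_length argument are made up for the example.

```python
import re

def name_problems(name, max_length=None):
    """List which of the rules stated in this post a proposed name breaks."""
    problems = []
    if name[:1].isdigit():
        problems.append("begins with a number")
    if re.search(r"\s", name):
        problems.append("contains a space")
    # Assumption: only ASCII letters count as letters in this sketch
    if re.search(r"[^A-Za-z0-9_.\s]", name):
        problems.append("contains punctuation other than dots or underscores")
    if max_length is not None and len(name) > max_length:
        problems.append(f"longer than {max_length} characters")
    return problems

for candidate in ["co2_total", "2011 total", "CO2 (Mt)"]:
    print(candidate, name_problems(candidate))

# For an older version of SPSS, pass the 8-character limit
print(name_problems("household_emissions", max_length=8))
```

An empty list means the name passes these particular checks.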

Approach 1: Type in data, or copy and paste

You may create a new, blank document and save it in SPSS Statistics default format. Set up your variables carefully, including the variable type (e.g. Number) and number of decimal places, before you type any data in. This is especially important if you choose to copy and paste your data in from another source such as a spreadsheet, or you risk your data being rounded down to integers.

Remember to use a numerical variable type wherever possible, even if your data appears to be in labelled categories such as Yes/No or UK/Europe/World. SPSS works best with numbers, so record your categories as integers (e.g. 0/1 for No/Yes) then assign value labels once it is in SPSS (that’s the Values column in the Variable View).
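If your categories are already typed out as text, the recoding can be done before the data reaches SPSS. A minimal Python sketch, assuming a made-up list of Yes/No responses: the dictionary plays the role of the value labels, and the list of integers is what you would paste or import.

```python
# Value labels as they would be entered in the Values column: code -> label
value_labels = {0: "No", 1: "Yes"}

# Reverse lookup used to turn text responses into integer codes
codes_by_label = {label: code for code, label in value_labels.items()}

# Hypothetical responses
responses = ["Yes", "No", "No", "Yes"]
coded = [codes_by_label[response] for response in responses]
print(coded)

# Going back from codes to labels, as the Value Labels display does
decoded = [value_labels[code] for code in coded]
print(decoded)
```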

To toggle the display of the data category labels and the numbers behind them, go to View -> Value Labels when in the Data View tab.

This post continues with Part 2.

Notes

IBM SPSS Statistics version 20 was used in this blog post, but the methods should apply to older and newer versions too. The data in the screenshots come from a British Crime Survey, 2010, and were prepared by The Cathy Marsh Centre for Census and Survey Research.

Categories: Data Analysis

Quandl – a search engine for time-series datasets

5 November 2013 4 comments

Quandl has indexed over 7,000,000 time-series datasets from over 400 sources. All these datasets are open and free. The long term goal of Quandl is to make all the numerical data on the internet easy to find and easy to use (Quandl, 2013a).

You have to like a web site that gives you quick and easy access to oil prices, exchange rates, unemployment, world stock market indices, and Fama French factors. The screenshot below shows the spot price for Brent crude oil (US Department of Energy, 2013).

Try scrolling down the Quandl home page to the “browse pages” section to get an overview of the coverage, and also look at the Quandl data sources page (Quandl, 2013b).

When using Quandl, like other web resources, you need to do a little work to check that the quality of the data is appropriate.

For example, the oil price in the screenshot below is from the US Department of Energy, a reputable source, and if you want you could double-check with other sources (see Oil price – historical data (January 2011)). In contrast, trying to find non-US stock prices can be tricky: for Unilever you tend to get the prices for US listings rather than the primary listing in Amsterdam or London (Unilever is dual-listed), and the NYSE Euronext figures jump around, suggesting some conversion error (NYSE Euronext, 2013).

[Screenshot: Oil price graph (Brent spot)]

The “Data Sources” page (Quandl, 2013b) gives a good overview of the range of free datasets available through Quandl. The following lists some of the major sources and some that look to be of particular interest to Business Research Plus readers (approximate dataset counts in brackets).

  • United Nations (2,100,000)
  • World Bank (792,000)
  • Eurostat (321,000)
  • Federal Reserve Economic Data (61,000)
  • US Department of Energy (440,000)
  • UK Office for National Statistics (14,000)
  • Ken French (U Dartmouth) (25)

Quandl’s search often returns a long list of results and further filtering or revision may be needed. If you want to get 3 month treasury bill data from the Bank of England then a search “uk risk free rate” or “uk treasury bill” will not help because the series name is “End Month Level Of Discount Rate, 3 Month Treasury Bills, Sterling” (Bank of England, 2013).

Acknowledgements

Quandl was first mentioned on a couple of other business library blogs earlier this year (Datasets in Quandl, reposted Feb 2013) but at that time I didn’t explore further – thanks to the Warwick librarian who prompted this post – Quandl- freely available time-series data.

References

Bank of England (2013) End Month Level Of Discount Rate, 3 Month Treasury Bills, Sterling. Available at http://www.quandl.com/BOE-Bank-of-England/IUMAJNB-End-Month-Level-Of-Discount-Rate-3-Month-Treasury-Bills-Sterling (Accessed: 05 November 2013)

NYSE Euronext (2013) NYSE Euronext – UNILEVER OS (UNIA). Available at http://www.quandl.com/NYX-NYSE-Euronext/XAMS_UNIA-UNILEVER-OS-UNIA (Accessed: 05 November 2013)

Quandl (2013a) Quandl > About > Overview. Available at http://www.quandl.com/about/overview (Accessed: 04 November 2013)

Quandl (2013b) Quandl > Data > Data Sources. Available at http://www.quandl.com/data/sources (Accessed: 05 November 2013)

US Department of Energy (2013) Europe Brent Crude Oil Spot Price FOB (Dollars per Barrel). Available at http://www.quandl.com/DOE-US-Department-of-Energy/RBRTE-Europe-Brent-Crude-Oil-Spot-Price-FOB (Accessed: 05 November 2013)

Researching Regional Intelligence: European/International Regional Data

We recently posted on how to find UK regional data (see Researching Regional Intelligence: UK Regional Data, 31/03/2011) and have subsequently been asked how to find similar data for European and international regions.

Europe: Eurostat
Eurostat, the European Union’s official statistical agency, produces an annual publication, “Eurostat Regional Yearbook” (available in pdf), plus a wide range of regional datasets available to search, view and download free of charge.

Search Tip: To access the publication and the datasets, select the “Statistics Tab”, then “Regions and Cities”. There are options to view the Regional Yearbook plus access to the main data tables and more detailed datasets for regions and cities. See our quick video for “Finding European Regional Datasets”.

North America: FedStats
The US Statistical agency Fedstats produces a wealth of detailed regional data, see the sections for “MapStats” and various datasets by theme via the “Statistics by Geography”.

International: Official Government Statistical Agencies
To find regional statistics for other countries, try the official Government Statistical Agency for the specific country. Data will vary from country to country, but you will often find a great deal of data published online, and you can contact the agency for further information. You can locate official statistical agencies using a simple search within any search engine, eg: “Australia and Official Statistics”, or alternatively you can use directories collated by the United Nations or US Bureau of Statistics:


Researching Regional Intelligence: UK Regional Data

31 March 2011 2 comments

The John Rylands University Library subscribes to a range of data sources providing national economic and socio-demographic data. Databases such as GMID (Global Market Information Database), ESDS International, Global Insights and Global Financial data provide a wealth of datasets giving a valuable insight into economic and social trends from a country perspective. Regional data can, however, be a little trickier to find but there are many freely available sources you can use to research UK regional data.

Regional Trends: Office for National Statistics
Regional Trends is a comprehensive annual publication collated by the Office for National Statistics (the UK Government Statistical Office) and is the most authoritative publication for regional data. It provides detailed demographic, social, industrial and economic statistics for the sub-regions of the UK (economy, education, environment, health, housing, labour, lifestyles, population, transport and crime). Selecting either the latest online or pdf report provides interactive access to related analysis and datasets. Navigating the data can be confusing; the following videos demonstrate three key ways of finding data:

ONS Neighbourhood Statistics
You can drill even further for local data using the Office for National Statistics Neighbourhood Statistics site. This site provides an ever increasing range of small area data, allowing you to paint a picture of life in local communities. It is possible to locate detailed data using the “Area search” (enter an area name or a postcode) or a snapshot using the “Neighbourhood Summary” search. Don’t know the postcode or area name? Use the “Topic Search” to find predefined interactive datasets by theme. As local data is more difficult to research and obtain, much of the data is sourced from the UK Census (taken every 10 years), although recent data is available depending on the dataset. Note: using this website does require some knowledge of local authority/ward area names or postcode areas, and you may need to do a little preparatory work to identify your area before searching.

Regional Development Agencies (RDAs)
RDAs are regional government bodies established to encourage regional growth and investment. They generally have research units, (eg North West Development Agency’s North West Regional Intelligence Unit),  collating facts and figures for the region which are published on their websites, (where on the website will vary but look for links to publications, economic statistics, research and statistics, economic strategy under various headings). It may take a little time to find but can be worth the effort!

UK City Councils: Economic Development Units
Finally another way of finding local regional data is to try the local City or Borough Council. Most councils will produce an economic strategy or have an Economic Development Unit collating local area statistics, (an example is Manchester City Council’s Corporate Research & Intelligence Unit). Note: This can be a more time consuming route as you need to consult each individual website, the layout of which will vary from council to council and it is not always guaranteed data will be published online. Details of UK councils can be found online from the UK Government website.

Knowing where to start to find high quality data from external websites can be a daunting task. Why not try our Delicious page, which provides details of useful web sites evaluated by our expert staff? You can access the site via the MBS Library Website or directly via http://www.delicious.com/mbslibrary. Alternatively, contact us and speak to a member of our expert team, who can help and advise you on finding relevant resources for your research:

Telephone: 0161 275 6507
Email: libdesk@mbs.ac.uk