Bangor University SPSS

By | 12th April 2017

This page will show you how to manage data in SPSS: how to create a data file, name and label variables, deal with missing values, compute new variables from existing variable scores, and recode scores. The instructions on this page are for SPSS Version 9.0 or lower. Some procedures are different in newer versions (10.0 and higher).

The data used to illustrate these procedures are questionnaire-based and taken from a study conducted by students taking a Health Behaviours module. The fact that the data are from a questionnaire makes no difference to the procedures. You would do exactly the same whatever the nature of your data; they are only numbers after all!

For the study, students generated a set of questionnaire items to measure the various constructs of the theory of planned behaviour, aimed at predicting avoidance of over-exposure to the sun. To keep things simple for this example, we will only use two of the subscales of the questionnaire: behavioural beliefs, which are people’s perceptions of the consequences of engaging in the behaviour, and perceived behavioural control, which is people’s perception of the degree of control they have over the behaviour. In addition, I have included the variables age and sex. A total of 315 participants completed the questionnaire.

Each of the subscales comprises four questionnaire items which are scored on a five-point scale. The items follow in the order in which they appear in the questionnaire (although they are intermingled with the other items not included in this example):

Response scale for every item: 1 = Strongly disagree, 5 = Strongly agree

1. Too much sunshine can lead to skin cancer    1  2  3  4  5
2. I could easily avoid overexposure to the sun if I needed to    1  2  3  4  5
3. Overexposure to the sun causes premature aging of the skin    1  2  3  4  5
4. I find it difficult to avoid getting too much sun when the weather is nice    1  2  3  4  5
5. Too much sun can damage your eyes    1  2  3  4  5
6. I don’t find it easy to avoid overexposure to the sun    1  2  3  4  5
7. Overexposure to the sun is not that bad for your health    1  2  3  4  5
8. Following the experts’ advice on avoidance of too much sun is easier said than done    1  2  3  4  5

Items 1, 3, 5, and 7 are behavioural belief items whilst 2, 4, 6 and 8 are perceived behavioural control items.

Note that some of these items are keyed in a different direction to others. That is, for items 1, 2, 3 and 5 a high score (strongly agree) means more of the property being measured, while for items 4, 6, 7 and 8 a high score means less. This is often the case with questionnaire items and is designed to prevent extreme response biases by the participants (that is, circling scores at one end of the scale or the other for all the items).

When we have items keyed in different directions we have to decide which way we want them scored. Although it does not really matter which way round we do it, in this case it makes sense for high scores to indicate stronger behavioural beliefs about the harmful effects of the sun and greater perceptions of control. For behavioural beliefs, items 1, 3 and 5 are already keyed in the right direction. For perceived behavioural control only item 2 is keyed in the right direction. For the other items we will have to recode the scores so that high scores are changed to low scores and vice versa. We will look at how to do this later on.
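The recoding itself is done in SPSS later on, but the arithmetic behind it can be sketched in a few lines of Python. This is only an illustration of the idea, not the SPSS procedure: the function names and the example respondent are made up, and the sketch assumes every score is a valid 1 to 5 response (missing value codes are not handled here).

```python
# Illustrative sketch only: reverse-scoring items on a five-point scale.
SCALE_MIN, SCALE_MAX = 1, 5

# Items that need recoding according to the text:
# item 7 (behavioural beliefs) and items 4, 6, 8 (perceived behavioural control).
REVERSED_ITEMS = {4, 6, 7, 8}


def reverse_score(score):
    """Flip a score on the scale: 1->5, 2->4, 3->3, 4->2, 5->1."""
    if not SCALE_MIN <= score <= SCALE_MAX:
        raise ValueError(f"score {score} is outside the {SCALE_MIN}-{SCALE_MAX} scale")
    return SCALE_MIN + SCALE_MAX - score


def recode_responses(responses):
    """Return a copy of {item_number: score} with the reversed items flipped."""
    return {
        item: reverse_score(score) if item in REVERSED_ITEMS else score
        for item, score in responses.items()
    }


# One hypothetical respondent (made-up scores, not taken from the study data).
example = {1: 5, 2: 4, 3: 5, 4: 2, 5: 4, 6: 1, 7: 2, 8: 3}
recoded = recode_responses(example)
print(recoded)
```

After recoding, a high score on every item points the same way: stronger beliefs about the harmful effects of the sun, or greater perceived control.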

Creating the data file

Having collected the data, the first step is to create the data file in SPSS. To do this, open SPSS and you will see a blank spreadsheet ready for data input.

We want to enter the data in the order it appears in the questionnaire: age, sex, then the items 1 to 8. Each variable will be in a column and each respondent’s scores (called cases in SPSS) will be in a row. So the file will have 10 columns and 315 rows. Each data point is entered into a cell in the file.

Defining the data

You could start entering data now, but it makes more sense to first define the data so that it is easier to keep track of where you are as you enter it. Defining the data involves (at least) giving each variable a name and specifying missing values.

Missing values

Missing values are what they sound like: data points for which you have no score. For example, some participants might not have turned up for a data collection session but you still have other data from them that you want to use. So you need to enter the data you do have whilst taking account of missing data points. With questionnaires, people often fail to complete one or more items. This may be purely accidental, in which case the missing data points are referred to as missing at random because there is no systematic reason for their omission. Sometimes participants may deliberately skip an item because they do not want to answer it. If lots of participants fail to complete a particular item, it suggests there is something wrong with it; it may be ambiguous or perceived as too sensitive, for example. Such systematic missing data points are more problematic than data missing at random because they mean that you have to do something about the offending item. Data missing at random, however, can essentially be ignored (actually it’s not that simple but this issue goes beyond what I want to cover in this lesson).

You can simply leave cells with missing data points blank in the SPSS data file. In this case, SPSS inserts a full stop in the cell to indicate missing values and they are referred to as system missing values. Alternatively, you can specify a code number that represents missing values. This is called a user-defined missing value. This gives you more control over the data and subsequent computations and analyses and allows you to maximise the use of all the data you have worked so hard to collect (more on this below).

So, where there are missing data, we assign a code number that will designate a missing value. This number must be a value that cannot appear in the data for that variable. If it could, then any cases that genuinely had that value would be treated as having missing values! For example, the questionnaire data in our example can only take values from 1 to 5, so any other number would do for specifying missing values. Age, though, could take on many more values, so we would need to assign a number that cannot be the age of any of our participants. It makes things easier if you assign the same number to missing values for all the variables. I nearly always use either 99 or 999, since those values do not appear in the sort of data I normally collect. In this case, we’ll use 99; there are no 99 year olds in the sample that provided these data!
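To see why the missing value code has to be declared rather than just typed in, here is a minimal Python sketch of the idea. It is not how SPSS works internally: the function names and the ages are invented for illustration, and the only detail taken from the text is the code 99.

```python
# Illustrative sketch only: treating 99 as a user-defined missing value.
MISSING = 99  # the missing value code chosen in the text


def valid_scores(scores, missing=MISSING):
    """Keep only the scores that are not the missing value code."""
    return [s for s in scores if s != missing]


def mean_ignoring_missing(scores, missing=MISSING):
    """Mean of the non-missing scores, or None if every score is missing."""
    kept = valid_scores(scores, missing)
    if not kept:
        return None
    return sum(kept) / len(kept)


# Hypothetical ages; 99 marks a participant who did not give an age.
ages = [19, 21, 99, 20]
print(mean_ignoring_missing(ages))  # the 99 is excluded from the calculation
print(sum(ages) / len(ages))        # naive mean, distorted by the code
```

The second figure shows what goes wrong if the code is treated as a real score, and it also shows why the code must be a value that no participant could genuinely have.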

Naming variables

Data are defined variable by variable (i.e. one column at a time). To do this, double-click on a column heading where it says var, and the Define Variable dialogue box will open. Type a meaningful name in the Variable Name box. In this case, I’ve named the first variable age. The name can be up to eight characters long and must begin with a letter.
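If you plan your variable names before opening SPSS, the two naming rules just mentioned are easy to check in advance. The Python sketch below checks only those two rules (eight characters at most, first character a letter); SPSS may impose further restrictions that are not covered here. Apart from age, and sex from the variable list earlier, the names (item1 to item8) are hypothetical.

```python
# Illustrative sketch only: checking planned names against the two rules
# stated in the text (at most eight characters, must begin with a letter).
MAX_NAME_LENGTH = 8


def is_valid_name(name):
    """True if the name is 1-8 characters long and starts with a letter."""
    return 0 < len(name) <= MAX_NAME_LENGTH and name[0].isalpha()


# Ten planned column names: age, sex, then the eight questionnaire items.
planned_names = ["age", "sex"] + [f"item{i}" for i in range(1, 9)]

for name in planned_names:
    print(name, is_valid_name(name))

# Names that break one of the two rules.
print(is_valid_name("behavioural"))  # too long (11 characters)
print(is_valid_name("1stitem"))      # does not begin with a letter
```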
