winidams tips

WINIDAMS TIP PAGE


Introduction

This is an example of simple data analysis in WinIdams. It covers only some basic steps and a few key points.

It shows how to import data from a CSV file and how to read data from a text file.

Just to mention: when I asked the WinIdams staff questions, I got fairly quick and helpful responses.

WinIDAMS is available at http://portal.unesco.org/ci/en/ev.php-URL_ID=2070&URL_DO=DO_TOPIC&URL_SECTION=201.html

Analyzing data in WinIdams involves three steps:

1. Getting the data ready.
2. Defining the data.
3. Defining and running the analysis.

Below are an example data file, a dictionary file, and analysis files (the 'set ups') for frequency tables and for correlation. The data file holds the data. The dictionary file defines the variables, missing values, labels, and so on. The analysis or set up file tells WinIdams which analyses to run.

In this tips page, the data, dictionary and set up files begin and end with a row of asterisks, just to separate them from the explanation. The actual files, as they are really used, are everything between (but not including) the rows of asterisks. Similarly, where I've included text from the WinIdams manual, I've enclosed it in rows of asterisks.

Also, for the example of reading a text data file, I made the data file in a text editor, where all the columns lined up as they were supposed to. I'm not sure they line up on this web page. If you copy the text into a text editor like EditPad, I hope the columns line back up.

After each file, I list some key points that you need to know.

Getting the data ready

This data set has an ID variable, then four other variables, and 10 cases.

This is the data file as text
**********************
 1 1 -9  3 -9
 2 2  5  4 -9
 3 3  3 -9  6
 4 1  4  0  5
 5 2  7  8  4
 6 3  8  9  3
 7 1 -9  6  3
 8 2  6 -9  2
 9 3  5 -9  1
10 3  0  1  0
**********************

KEY POINTS about data files in text format:

The variables are all aligned in their specific columns.

No variable can be left blank. For missing values, put in -9, -99, or some other code. The actual missing-data code is defined below in the dictionary file.

Don't put in a row of variable names at the top. This data file only includes the actual numbers. The dictionary file below gives the variable names.

Numeric fields can be at most 9 columns wide, including any decimal places.
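If your data start out in a spreadsheet or CSV file, something has to turn each row into fixed columns. Here is a minimal Python sketch of that step (my own illustration, not a WinIdams feature). It assumes the column layout of the example dictionary below (starting columns 1, 3, 6, 9 and 12, each 2 columns wide), writes numbers right-justified in their fields (an assumption on my part), and puts -9 in empty cells. The names `format_case` and `DATA_LAYOUT` are made up.

```python
import csv
import io

# Hypothetical layout copied from the example dictionary: (start column, width).
DATA_LAYOUT = [(1, 2), (3, 2), (6, 2), (9, 2), (12, 2)]
MISSING = -9  # missing-data code written for empty cells


def format_case(values, layout=DATA_LAYOUT, missing=MISSING):
    """Turn one case into a fixed-width line; None becomes the missing code."""
    line = ""
    for value, (start, width) in zip(values, layout):
        if width > 9:
            raise ValueError("numeric fields can be at most 9 columns wide")
        if len(line) > start - 1:
            raise ValueError(f"field starting in column {start} overlaps the previous one")
        text = str(missing if value is None else value).rjust(width)
        if len(text) > width:
            raise ValueError(f"{text!r} does not fit in {width} columns")
        # Pad with blanks up to the column just before this field starts.
        line = line.ljust(start - 1) + text
    return line


if __name__ == "__main__":
    # Two rows as they might come out of a CSV export; blank cells are missing.
    csv_text = "1,1,,3,\n10,3,0,1,0\n"
    for row in csv.reader(io.StringIO(csv_text)):
        values = [int(cell) if cell.strip() else None for cell in row]
        print(format_case(values))
```

Running it prints the first and last cases of the example data, laid out in the columns the dictionary below describes.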

Defining the data

This is the dictionary file ('.dic' file) for the data above
**********************
   3   1   5   1   1
T   1 varID                       1   20
T   2 var2                        3   20
C   2         1      okay
C   2         2      notokay
C   2         3      between
T   3 var3                        6   20         -9
T   4 var4                        9   20         -9
T   5 var5                       12   20         -9
**********************

KEY POINTS about the dictionary file:

-On my computer, the data and dictionary files are in this directory (under the folder where WinIdams is installed):

C:\Program Files\WinIDAMS11-EN\data
testing.dat (the data file)
testing.dic (the dictionary file)

WinIdams will automatically find these files in the data directory.

From the WinIdams manual, the first line in the dictionary always has the following form:

****************
column   content
4        3 (indicates the type of dictionary).
5-8      First variable number (right justified).
9-12     Last variable number (right justified).
13-16    Number of records per case (right justified).
20       Form in which variable location is specified (columns 32-39).
         blank   Record number and starting and ending columns. (I didn't use this method)
         1       Starting location and field width.
****************

I have 5 variables (varID and var2 to var5), 1 record (or line) per case, and "1" in column 20, meaning I'm going to define the variables by their starting location and field width.

In my dictionary file above:

Column   Code     Meaning
4        "3"      the type of dictionary (the manual only says this indicates the dictionary type).
5-8      "   1"   the first variable is variable number 1.
9-12     "   5"   the last variable is variable number 5.
13-16    "   1"   there is one record per case.
20       "1"      how the variables are going to be defined. A "1" here means each variable is defined by
                  the column it starts in and its field width. I explain this more below.
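The first line is easy to get wrong by one column, so it can help to build it from the numbers instead of typing the spaces. This Python sketch is a hypothetical helper (not part of WinIdams) that pads each value into the columns listed in the manual excerpt above.

```python
def first_record(first_var, last_var, records_per_case, start_and_width=True):
    """Build the first dictionary line (dictionary type 3)."""
    for value in (first_var, last_var, records_per_case):
        if not 0 < value < 10000:
            raise ValueError("each number must fit in 4 columns")
    # Column 4: "3"; columns 5-8, 9-12, 13-16: right-justified numbers;
    # columns 17-19: blank; column 20: "1" (start + width) or blank.
    form = "1" if start_and_width else ""
    return f"   3{first_var:>4}{last_var:>4}{records_per_case:>4}   {form}".rstrip()


print(repr(first_record(1, 5, 1)))
```

With the values from my dictionary, `first_record(1, 5, 1)` gives the same line as the first line of testing.dic.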

Then, in the rest of the dictionary, I define the variables using this format, taken from the WinIdams manual (pages 14-15):

****************
Variable-descriptor records (T-records). The dictionary contains one such record for each variable. These records are arranged in ascending order by the variable number. The variable numbers need not be contiguous. The maximum number of variables is 1000.

column      content
1           T
2-5         Variable number.
7-30        Variable name.
32-35       Starting location of the variable within the case.
36-39       Field width (1-9 for numeric variables and 1-255 for alphabetic variables).
40          Number of decimal places (numeric variables only).
            Blank implies no decimal places.
41          Type of variable.
            blank means Numeric.
45-51       First missing data code for numeric variables (or blanks if no 1st missing data code).
           (right justified)
****************

I named the variables varID and var2 to var5, but you can call them anything. The dictionary above lists the five variables, starting in columns 1, 3, 6, 9, and 12, each 2 columns wide with 0 decimal places, and (for var3 to var5) with -9 as the missing-data code.

For example:

T   3 var3                        6   20         -9

Means the following:

T in column 1 means this is a variable-descriptor record. This line describes the variable, e.g., how many columns it uses, what its missing code is, and so on.
3 in columns 2-5 means this is variable number 3.
var3 in columns 7-30 means the variable name is "var3".
6 in columns 32-35 means this variable starts in column 6.
2 in columns 36-39 means this variable is 2 columns wide.
0 in column 40 means this variable has no decimal places.
(Thus, the "20" in the line above is actually a "2" in columns 36-39 followed by a "0" in column 40.)
-9 in columns 45-51 means the missing value for this variable is -9.
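Counting columns by hand for every T-record is tedious. A small Python sketch like the one below can pad each piece into the columns from the manual excerpt. It's my own helper (the name `t_record` is made up), it only handles numeric variables, and it writes a 0 in column 40 the way my example does, although the manual says a blank there also means no decimal places.

```python
def t_record(number, name, start, width, decimals=0, missing=None):
    """Build a numeric T-record using the column positions quoted from the manual."""
    if not 1 <= number <= 9999:
        raise ValueError("the variable number must fit in columns 2-5")
    if not 1 <= width <= 9:
        raise ValueError("numeric fields are 1-9 columns wide")
    if len(name) > 24:
        raise ValueError("the variable name must fit in columns 7-30")
    if not 0 <= decimals <= 9:
        raise ValueError("decimal places go in a single column")
    # Col 1 "T", cols 2-5 number, col 6 blank, cols 7-30 name, col 31 blank,
    # cols 32-35 start, cols 36-39 width, col 40 decimals.
    line = f"T{number:>4} {name:<24} {start:>4}{width:>4}{decimals}"
    if missing is not None:
        # Cols 41-44 blank (col 41 blank = numeric), cols 45-51 missing code.
        line += " " * 4 + f"{missing:>7}"
    return line


print(t_record(3, "var3", 6, 2, missing=-9))
```

If the dictionary lines are aligned the way the manual describes, the printed line should match the var3 line in testing.dic.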

Variable 2 is a categorical variable ("okay", "notokay", "between"). So, following its variable-descriptor record (the 3rd line of the dictionary), I defined the values of this variable. These lines (lines 4-6) give the value labels:

C   2         1      okay
C   2         2      notokay
C   2         3      between

C in the first column means this line defines a value label.
2 in column 5 means this value is for variable 2.
1, 2 and 3 in column 15 tell which labels go with which values. So, as listed above:
    1 = okay
    2 = notokay
    3 = between
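To see how the starting columns, widths and missing codes in the dictionary map onto the data file, here is a small Python sketch that cuts one line of data into fields and turns the missing-data code into None. It only illustrates the layout; it is not how WinIdams reads files internally. `DICT_LAYOUT` and `read_case` are made-up names, and the values are copied from the dictionary above.

```python
# (start column, width, missing-data code or None), copied from testing.dic.
DICT_LAYOUT = {
    "varID": (1, 2, None),
    "var2": (3, 2, None),
    "var3": (6, 2, -9),
    "var4": (9, 2, -9),
    "var5": (12, 2, -9),
}


def read_case(line, layout=DICT_LAYOUT):
    """Split one fixed-width line into named values; missing codes become None."""
    case = {}
    for name, (start, width, missing) in layout.items():
        field = line[start - 1:start - 1 + width]  # dictionary columns are 1-based
        # int() ignores a leading blank; an all-blank field raises ValueError here.
        value = int(field)
        case[name] = None if value == missing else value
    return case


print(read_case(" 1 1 -9  3 -9"))
```

For the first case this gives varID 1, var2 1, var4 3, and None for var3 and var5.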

Doing some analysis - TABLES

(frequencies)

This is the 'set up' file for TABLES

**********************
$COMMENT basic freqs of testing data
$RUN TABLES
$FILES
DICTIN = testing.dic
DATAIN = testing.dat
$SETUP
FREQUENCY TABLES
PRINT=(CDICT)
TABLES
ROWVARS=(V2) CELLS=(ROWP,FREQS)
**********************

KEY POINTS: about TABLES (frequency analysis)

-The set up file is in this directory

C:\Program Files\WinIDAMS11-EN\work

testingfreq.set (the set up file)

$RUN     tells what statistical procedure (e.g., tables) you are going to run.
$FILES   the following lines tell the name of the data and dictionary files.
$SETUP   this part tells how to do the statistical procedure (e.g., tables)

The first line after $SETUP can be a filter, which is optional. A filter selects a subset of cases. I didn't use one (yet), so there is no filter line.

The second line is the LABEL, which is the title of your analysis. I followed the WinIdams manual and just wrote "FREQUENCY TABLES". You must have a label line. Since I have no filter, it is actually the first line.

The third line is the PARAMETER line, which is also required. Again, since I have no filter line, it is actually the second line. Here you can set the maximum number of cases, how to treat non-numeric data, the names of the dictionary and data files, and some other options.

I used PRINT=(CDICT), which tells WinIdams to print the input dictionary for the variables you use in the results file.
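Since every set up file has the same skeleton ($RUN, $FILES, $SETUP, then the optional filter, the label, and the parameters), you could also write it from a script. This Python sketch is only an illustration: `build_setup` is a made-up helper, and it passes the filter, label, parameter and table lines through as plain text without checking WinIdams syntax.

```python
def build_setup(comment, program, dictin, datain, label, parameters,
                extra_lines=(), filter_line=None):
    """Assemble set up file text in the order described above."""
    lines = [
        f"$COMMENT {comment}",
        f"$RUN {program}",
        "$FILES",
        f"DICTIN = {dictin}",
        f"DATAIN = {datain}",
        "$SETUP",
    ]
    if filter_line:
        lines.append(filter_line)  # optional; comes before the label
    lines.append(label)            # required title line
    lines.append(parameters)       # required parameter line
    lines.extend(extra_lines)      # program-specific lines, e.g. table definitions
    return "\n".join(lines) + "\n"


setup = build_setup(
    "basic freqs of testing data", "TABLES", "testing.dic", "testing.dat",
    "FREQUENCY TABLES", "PRINT=(CDICT)",
    extra_lines=["TABLES", "ROWVARS=(V2) CELLS=(ROWP,FREQS)"],
)
print(setup, end="")
```

With these arguments it reproduces the TABLES set up file above, line for line.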

The TABLES line says that the following lines define the table you want. In this case,

ROWVARS=(V2) CELLS=(ROWP,FREQS)

defines a simple one-way table with variable 2 as the row variable. In the cells, it prints the row percentage (ROWP) and the frequencies (FREQS).
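As a sanity check on the output, you can count the values of var2 yourself. This Python sketch does that for the 10 cases in the example data file, using the value labels from the dictionary. The percentages here are simply percent of all 10 cases; I haven't checked exactly how WinIdams lays out ROWP for a one-way table, so treat this as a cross-check of the counts rather than a copy of the WinIdams output.

```python
from collections import Counter

# var2 for the 10 cases in the example data file, and its value labels.
VAR2 = [1, 2, 3, 1, 2, 3, 1, 2, 3, 3]
LABELS = {1: "okay", 2: "notokay", 3: "between"}


def frequency_table(values, labels):
    """Return (code, label, count, percent of all cases) for each code, sorted."""
    counts = Counter(values)
    total = len(values)
    return [
        (code, labels.get(code, ""), counts[code], 100.0 * counts[code] / total)
        for code in sorted(counts)
    ]


for code, label, count, percent in frequency_table(VAR2, LABELS):
    print(f"{code}  {label:<8} {count:>3} {percent:6.1f}%")
```

Here the counts come out as 3 okay, 3 notokay and 4 between.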

Doing some analysis - Correlation

This is the 'set up' file for correlation
**********************
$COMMENT correlation of testing data
$RUN PEARSON
$FILES
DICTIN   = testing.dic
DATAIN   = testing.dat
$SETUP
CORRELATION COEFFICIENTS
MATRIX=SQUA ROWV=(V2,V3,V4,V5) -
   PRINT=(PAIR,CDICT,CORR) WRITE=CORR -
   MDHANDLING=PAIR
**********************

KEY POINTS: about correlation analysis

-The set up file is in this directory

C:\Program Files\WinIDAMS11-EN\work

testingcorr.set (the set up file)

$RUN     tells what statistical procedure (e.g., correlation) you are going to run.
$FILES   the following lines tell the name of the data and dictionary files.
$SETUP   this part tells how to do the statistical procedure (e.g., correlation)

MATRIX=SQUA ROWV=(V2,V3,V4,V5) This tells which variables to include. Refer to variables by their WinIdams numbers (e.g., V1, V2, V3, etc.) and NOT by the names you gave them in the dictionary.

MDHANDLING=PAIR means pairwise deletion of missing data, i.e., for each pair of variables to be correlated, use all cases that have valid data for those two variables. The alternative is casewise deletion, which uses only those cases that have valid values for all the variables listed in the correlation procedure (e.g., in ROWV=(list of vars)).
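To make the difference concrete, this Python sketch computes Pearson's r for var3 and var4 from the example data both ways. It's my own illustration, not the PEARSON program: it computes each coefficient from only the cases used for that pair, and I haven't checked whether WinIdams computes its pairwise means and standard deviations exactly the same way. The -9 missing codes are shown as None.

```python
from math import sqrt

# V2-V5 from the example data file; None marks the -9 missing code.
ROWS = [
    (1, None, 3, None), (2, 5, 4, None), (3, 3, None, 6), (1, 4, 0, 5),
    (2, 7, 8, 4), (3, 8, 9, 3), (1, None, 6, 3), (2, 6, None, 2),
    (3, 5, None, 1), (3, 0, 1, 0),
]
DATA = [dict(zip(("V2", "V3", "V4", "V5"), row)) for row in ROWS]


def pearson(xs, ys):
    """Plain Pearson correlation of two equal-length sequences."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    syy = sum((y - my) ** 2 for y in ys)
    return sxy / sqrt(sxx * syy)


def pairwise(data, a, b):
    """Pairwise deletion: use every case with valid values for a and b."""
    pairs = [(c[a], c[b]) for c in data if c[a] is not None and c[b] is not None]
    xs, ys = zip(*pairs)
    return len(pairs), pearson(xs, ys)


def casewise(data, a, b, variables):
    """Casewise deletion: use only cases valid on every listed variable."""
    complete = [c for c in data if all(c[v] is not None for v in variables)]
    return len(complete), pearson([c[a] for c in complete], [c[b] for c in complete])


print("pairwise:", pairwise(DATA, "V3", "V4"))
print("casewise:", casewise(DATA, "V3", "V4", ["V2", "V3", "V4", "V5"]))
```

With pairwise deletion, var3 and var4 are correlated over 5 cases (cases 2, 4, 5, 6 and 10). With casewise deletion over V2 to V5, only 4 cases (4, 5, 6 and 10) have no missing values, so the coefficient rests on fewer cases.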

Command lines: if a command takes more than one line, you need a "-" at the end of the line to show that the command continues on the next line. So the line starting "MATRIX" ends with a "-", and so does the "PRINT" line, because it's all one command.
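Here is a toy Python model of that continuation rule, which joins lines ending in "-" into one command. It's only meant to show how the three PEARSON parameter lines read as a single command; `join_continuations` is a made-up name, and I don't know exactly how WinIdams treats the spaces around the dash.

```python
def join_continuations(lines):
    """Join set up lines ending in "-" with the line that follows."""
    commands, current = [], ""
    for line in lines:
        text = line.strip()
        if text.endswith("-"):
            # Drop the dash and keep collecting the same command.
            current += text[:-1].rstrip() + " "
        else:
            commands.append(current + text)
            current = ""
    if current:  # input ended on a dangling "-"
        commands.append(current.rstrip())
    return commands


params = [
    "MATRIX=SQUA ROWV=(V2,V3,V4,V5) -",
    "   PRINT=(PAIR,CDICT,CORR) WRITE=CORR -",
    "   MDHANDLING=PAIR",
]
print(join_continuations(params))
```

The three parameter lines come back as one command string.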

Last modified 5/23/04
last checked 5/31/08