Introduction |
This is an example of a simple data analysis in WinIdams. It covers only some basic steps and a few key points.
It shows how to import data from a csv file, and how to read data from a text file.
Just to mention, when I asked questions of the WinIdams staff, I got fairly quick and good responses.
WinIDAMS is available at http://portal.unesco.org/ci/en/ev.php-URL_ID=2070&URL_DO=DO_TOPIC&URL_SECTION=201.html
There are several steps involved in the analysis of data.
1. Getting the data ready.
2. Defining the data.
3. Defining and running the analysis.
Below are an example data file, dictionary file and analysis file (the 'set up') for correlation. The data file is the data. The dictionary file defines the variables, missing values, labels, and so on. The analysis or set up file tells what kind of analyses to do.
Also, for the example of reading a text data file, I made this file in a text editor, and in that program all the columns lined up as they were supposed to. I'm not sure whether they do on this web page. If you copy this text to a text editor like EditPad, I hope the columns line back up.
Following the files, I mention some key points that you need to know.
Getting the data ready |
Basically, this data set has an ID, then four variables, and there are 10 cases.
This is the data file as text
**********************
 1 1 -9  3 -9
 2 2  5  4 -9
 3 3  3 -9  6
 4 1  4  0  5
 5 2  7  8  4
 6 3  8  9  3
 7 1 -9  6  3
 8 2  6 -9  2
 9 3  5 -9  1
10 3  0  1  0
**********************
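Since the columns have to line up exactly, it can help to let a small script write the lines rather than typing them by hand. Below is a minimal Python sketch (not part of WinIdams). It assumes the layout used by the dictionary file further down this page (five fields, each 2 columns wide, starting in columns 1, 3, 6, 9 and 12) and assumes values are right-justified in their fields. The helper name `format_case` is made up for this illustration.

```python
# Assumed layout: five 2-column fields starting in columns 1, 3, 6, 9, 12
# (1-based), values right-justified. format_case is a hypothetical helper.
STARTS = [1, 3, 6, 9, 12]
WIDTH = 2

def format_case(values):
    """Return one fixed-width data line for a single case."""
    line_length = STARTS[-1] + WIDTH - 1
    line = [" "] * line_length
    for start, value in zip(STARTS, values):
        text = str(value).rjust(WIDTH)
        if len(text) > WIDTH:
            raise ValueError(f"{value!r} does not fit in {WIDTH} columns")
        # Overwrite the field's columns with the padded value.
        line[start - 1:start - 1 + WIDTH] = text
    return "".join(line)

cases = [
    (1, 1, -9, 3, -9),
    (2, 2, 5, 4, -9),
    (10, 3, 0, 1, 0),
]
lines = [format_case(case) for case in cases]
for line in lines:
    print(line)
```

Every line produced this way is 13 characters long, so a value can never drift into a neighbouring field.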
KEY POINTS about data files in text format:
Defining the data |
This is the dictionary file ('.dic' file) for the data above
**********************
   3   1   5   1   1
T   1 varID                       1   20
T   2 var2                        3   20
C   2         1      okay
C   2         2      notokay
C   2         3      between
T   3 var3                        6   20         -9
T   4 var4                        9   20         -9
T   5 var5                       12   20         -9
**********************
KEY POINTS about the dictionary file:
-On my computer, the data and dictionary files are in this directory (where WinIdams is installed):
C:\Program Files\WinIDAMS11-EN\data
testing.dat (the data file)
testing.dic (the dictionary file)
WinIdams will automatically find these files in the data directory.
According to the WinIdams manual, the first line in the dictionary always has the same fixed form. In mine, I have 5 variables (varID and var2 to var5), 1 record (or row) per case, and a "1" in column 20, indicating that I'm going to define the variables by giving their starting location and field width.
In my dictionary file above:
Column   Code     What the code means
4        "3"      this is a certain type of file (I forgot what this means exactly)
5-8      "   1"   the first variable is variable number 1
9-12     "   5"   the last variable is variable number 5
13-16    "   1"   there is one record per case
20       "1"      this tells how the variables are going to be defined
If column 20 has a "1", then variables are defined by giving the column number each one starts in and its field width. I'll explain this more below.
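To make the column positions concrete, here is a small Python sketch that slices the first dictionary line into the fields described above. It is only an illustration of the layout as I have described it, not code from WinIdams; the function name and the dictionary keys are made up for the example.

```python
def parse_dictionary_header(line):
    """Split the first dictionary line into its fixed-column fields."""
    line = line.ljust(20)  # pad so that short lines can still be sliced
    return {
        "type": line[3:4].strip(),                # column 4
        "first_variable": int(line[4:8]),         # columns 5-8
        "last_variable": int(line[8:12]),         # columns 9-12
        "records_per_case": int(line[12:16]),     # columns 13-16
        "location_format": line[19:20].strip(),   # column 20
    }

header = parse_dictionary_header("   3   1   5   1   1")
print(header)
```

Python slices are 0-based and exclude the end index, so columns 5-8 become `line[4:8]`.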
Then in the rest of the lines in the data dictionary, I define the variables, using this format, taken from the WinIdams manual (pages 14, 15):
****************
Variable-descriptor records (T-records). The dictionary contains one such record for each variable. These records are arranged in ascending order by the variable number. The variable numbers need not be contiguous. The maximum number of variables is 1000.
Column   Content
1        T
2-5      Variable number.
7-30     Variable name.
32-35    Starting location of the variable within the case.
36-39    Field width (1-9 for numeric variables and 1-255 for alphabetic variables).
40       Number of decimal places (numeric variables only). Blank implies no decimal places.
41       Type of variable. Blank means numeric.
45-51    First missing data code for numeric variables (or blanks if no 1st missing data code). (Right justified.)
****************
I gave the variables the names varID and var2 to var5, but you can call them anything. The dictionary I gave above lists the five variables, starting in columns 1, 3, 6, 9, and 12, each 2 columns wide with 0 decimal places, and with -9 as the missing data code for var3 to var5.
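For example:
T   3 var3                        6   20         -9
means the following: this is a variable-descriptor record (T) for variable number 3, named var3. The variable starts in column 6 of each case, is 2 columns wide and has 0 decimal places (the "20" is the width 2 in column 39 followed directly by the 0 in column 40), and -9 is its first missing data code.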
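The same record can also be decoded column by column. The Python sketch below follows the T-record layout quoted from the manual above; the function name and the key names are my own, and the record is built with explicit padding so that each field is guaranteed to sit in the intended columns.

```python
def parse_t_record(line):
    """Decode one T-record using the column layout quoted from the manual."""
    line = line.ljust(51)
    if line[0] != "T":
        raise ValueError("not a T-record")
    decimals = line[39:40].strip()   # column 40
    missing1 = line[44:51].strip()   # columns 45-51
    return {
        "number": int(line[1:5]),        # columns 2-5
        "name": line[6:30].strip(),      # columns 7-30
        "start": int(line[31:35]),       # columns 32-35
        "width": int(line[35:39]),       # columns 36-39
        "decimals": int(decimals) if decimals else 0,
        "missing1": int(missing1) if missing1 else None,
    }

# Build the var3 record field by field (padding makes the columns explicit).
record = (
    "T" + "3".rjust(4) + " " + "var3".ljust(24) + " "
    + "6".rjust(4) + "2".rjust(4) + "0" + " " * 4 + "-9".rjust(7)
)
var3 = parse_t_record(record)
print(var3)
```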
Variable 2 is a categorical variable ("okay", "notokay", "between"). So following its variable descriptor record (the 3rd line of the dictionary), I defined the values of this variable. These lines (lines 4-6) show the values:
C   2         1      okay
C   2         2      notokay
C   2         3      between
C in the first column means this line defines a value.
2 in column 5 means this value is for variable 2.
1, 2 and 3 in column 15 tell which labels go with which values. So, as it is listed above:
1 = okay
2 = notokay
3 = between
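One way to check that the dictionary matches the data file is to apply the same starting columns, widths, missing data codes and value labels to a data line yourself. This Python sketch does that for the first case. It is a rough cross-check only, not how WinIdams reads the files, and it assumes the values are right-justified in their 2-column fields.

```python
# Layout copied from the dictionary on this page:
# name -> (1-based starting column, field width, missing data code or None)
FIELDS = {
    "varID": (1, 2, None),
    "var2": (3, 2, None),
    "var3": (6, 2, -9),
    "var4": (9, 2, -9),
    "var5": (12, 2, -9),
}
VAR2_LABELS = {1: "okay", 2: "notokay", 3: "between"}

def read_case(line):
    """Slice one data line into variables; missing codes become None."""
    case = {}
    for name, (start, width, missing) in FIELDS.items():
        value = int(line[start - 1:start - 1 + width])
        case[name] = None if value == missing else value
    return case

# First case of the data file (alignment is an assumption, see above).
case = read_case(" 1 1 -9  3 -9")
print(case)
print(VAR2_LABELS[case["var2"]])
```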
Doing some analysis - TABLES (frequencies) |
**********************
$COMMENT basic freqs of testing data
$RUN TABLES
$FILES
DICTIN = testing.dic
DATAIN = testing.dat
$SETUP
FREQUENCY TABLES
PRINT=(CDICT)
TABLES
ROWVARS=(V2) CELLS=(ROWP,FREQS)
**********************
KEY POINTS about TABLES (frequency analysis):
-The set up file is in this directory:
C:\Program Files\WinIDAMS11-EN\work
testingfreq.set (the set up file)
$RUN tells what statistical procedure (e.g., tables) you are going to run.
$FILES: the following lines tell the names of the data and dictionary files.
$SETUP: this part tells how to do the statistical procedure (e.g., tables).
The first line after $SETUP can be a filter, which is optional. The filter selects a subset of cases. I didn't do that (yet), so I don't have one.
The second line is the LABEL. This is the title of your analysis. I followed the WinIdams manual and just wrote "FREQUENCY TABLES". You have to have a label line. Since I had no filter, this is actually the first line.
The third line is the PARAMETER line. This line is also required. As above, since I don't have a filter line, this is actually the second line; in my set up it is PRINT=(CDICT). Here you can give the maximum number of cases, how to treat non-numeric data, the names of the dictionary and data files, and some other options.
The TABLES line says that, in the next lines, you are going to define the table you want. In this case,
ROWVARS=(V2) CELLS=(ROWP,FREQS)
asks for a simple table with one row variable, variable 2. In the cells, print the row percent (ROWP) and the frequencies (FREQS).
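To have something to compare the TABLES output against, the frequencies for variable 2 can be counted by hand or with a few lines of Python. The sketch below uses the var2 values from the data file above and the labels from the dictionary. It reports each count as a percentage of all 10 cases, which is my assumption about what the percent column means for a one-variable table; it does not try to reproduce the WinIdams output layout.

```python
from collections import Counter

# var2 for cases 1-10, copied from the data file on this page.
var2 = [1, 2, 3, 1, 2, 3, 1, 2, 3, 3]
labels = {1: "okay", 2: "notokay", 3: "between"}

counts = Counter(var2)
total = len(var2)

# label -> (frequency, percent of all cases)
table = {
    labels[code]: (counts[code], 100.0 * counts[code] / total)
    for code in sorted(counts)
}

for label, (freq, pct) in table.items():
    print(f"{label:<8} {freq:>3} {pct:6.1f}%")
```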
Doing some analysis - Correlation |
This is the 'set up' file for correlation
**********************
$COMMENT correlation of testing data
$RUN PEARSON
$FILES
DICTIN = testing.dic
DATAIN = testing.dat
$SETUP
CORRELATION COEFFICIENTS
MATRIX=SQUA ROWV=(V2,V3,V4,V5) -
PRINT=(PAIR,CDICT,CORR) WRITE=CORR -
MDHANDLING=PAIR
**********************
-The set up file is in this directory:
C:\Program Files\WinIDAMS11-EN\work
testingcorr.set (the set up file)
$RUN tells what statistical procedure (e.g., correlation) you are going to run.
$FILES: the following lines tell the names of the data and dictionary files.
$SETUP: this part tells how to do the statistical procedure (e.g., correlation).
MATRIX=SQUA ROWV=(V2,V3,V4,V5): this tells what variables to include. Use the variable names internal to WinIdams (e.g., V1, V2, V3, etc.) and NOT the variable names you gave.
MDHANDLING=PAIR: this means use pairwise deletion of missing data, i.e., for each pair of variables to be correlated, use all cases that have data for those two variables. This is opposed to casewise deletion, which uses only those cases that have valid values for all variables listed in the correlation procedure (e.g., in ROWV=(list of vars)).
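The difference between pairwise and casewise deletion is easy to see on the example data. The Python sketch below is my own illustration, not WinIdams code: it uses var3, var4 and var5 from the data file above, treats -9 as missing, and counts how many cases each approach keeps before computing an ordinary Pearson correlation. The function names are made up for the example.

```python
from math import sqrt

MISSING = -9

def pearson(xs, ys):
    """Plain Pearson correlation of two equally long lists."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    return sxy / sqrt(sxx * syy)

def pairwise_cases(rows, i, j):
    """Keep cases where both variables i and j are present."""
    return [r for r in rows if r[i] != MISSING and r[j] != MISSING]

def casewise_cases(rows):
    """Keep only cases with no missing value in any listed variable."""
    return [r for r in rows if MISSING not in r]

# (var3, var4, var5) for cases 1-10, copied from the data file.
rows = [
    (-9, 3, -9), (5, 4, -9), (3, -9, 6), (4, 0, 5), (7, 8, 4),
    (8, 9, 3), (-9, 6, 3), (6, -9, 2), (5, -9, 1), (0, 1, 0),
]

for i, j in [(0, 1), (0, 2), (1, 2)]:
    kept = pairwise_cases(rows, i, j)
    r = pearson([c[i] for c in kept], [c[j] for c in kept])
    print(f"pairwise var{i + 3}-var{j + 3}: n={len(kept)} r={r:.3f}")

print("casewise n =", len(casewise_cases(rows)))
```

With pairwise deletion each correlation can be based on a different number of cases, while casewise deletion uses the same (usually smaller) set of cases for every pair.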
Command lines: if your command is more than one line, you need a "-" at the end of the line to tell WinIdams that the command continues on the next line. So in the line starting with "MATRIX" there is a "-" at the end, and another "-" at the end of the "PRINT" line, because it's all one command.
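As a way of picturing the continuation rule, here is a short Python sketch that joins lines ending in "-" into one command. It only illustrates the rule as I have described it; it is not how WinIdams itself parses set up files, and the function name is made up.

```python
def join_continued_lines(lines):
    """Merge lines ending in '-' with the line(s) that follow them."""
    commands = []
    current = ""
    for raw in lines:
        text = raw.rstrip()
        if text.endswith("-"):
            # Drop the trailing "-" and keep collecting.
            current += text[:-1].rstrip() + " "
        else:
            joined = (current + text).strip()
            if joined:
                commands.append(joined)
            current = ""
    if current.strip():
        commands.append(current.strip())
    return commands

setup = [
    "MATRIX=SQUA ROWV=(V2,V3,V4,V5) -",
    "PRINT=(PAIR,CDICT,CORR) WRITE=CORR -",
    "MDHANDLING=PAIR",
]
commands = join_continued_lines(setup)
print(commands)
```

The three lines of the correlation set up come out as a single command, which is how they should be read.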