Comparing free statistical software
Handling missing data


This page shows some output from Epi Info, MicrOsiris and WinIDAMS. For comparison, there is output from Excel, using a version of the data set with no missing values. The output covers correlations and regression. I did this in November 2006, using the most recent versions of the software at that time. I used a version of PD-Plus, available on my data page. I updated this in March 2012 to include Instat and PSPP, and in April 2016 to include JASP.

The main finding is that all the programs give the same results, with one exception: in correlation, Instat uses casewise deletion, while the other programs use pairwise deletion (see note 7).


MicrOsiris  http://www.microsiris.com/
Epi Info   http://wwwn.cdc.gov/epiinfo/    
WinIDAMS   http://portal.unesco.org/ci/en/ev.php-URL_ID=2070&URL_DO=DO_TOPIC&URL_SECTION=201.html  
PSPP   http://www.gnu.org/software/pspp/  
JASP  https://jasp-stats.org/  
 


Special case
Instat   http://www.reading.ac.uk/ssc/resourcepage/instat.php    (see note 7)


Data for this analysis.

Using this data set with blanks for missing:
http://gsociology.icaap.org/data/PD_data_cia.csv
listed here   http://gsociology.icaap.org/dataupload.html

and this data set with -999 or -9 for missing (for WinIDAMS)
http://gsociology.icaap.org/methods/PD_data_cia_nine_comma_w3.csv
saved as these WinIDAMS data set and dictionary
http://gsociology.icaap.org/methods/PD_data_cia_nine_comma_w3.dat
http://gsociology.icaap.org/methods/PD_data_cia_nine_comma_w3.dic

and this version, with no missing, and only including the variables used in the regressions below:
http://gsociology.icaap.org/methods/PD_data_cia_stat4u_nomiss.csv

I include this data set because I used it with Excel and Stat4U to compare the results with the other programs.
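
The only difference between the first two files is how a missing value is written: a blank in one, -999 or -9 in the other. As a rough sketch of what that means when reading the data, here is a small Python example. The two miniature files and their values are made up for illustration, and the names (MISSING_CODES, read_rows) are my own, not part of any of the programs.

```python
import csv
import io

# Hypothetical miniatures of the two layouts; the values are made up.
BLANK_CSV = "country,gini,literacy\nA,30.5,99\nB,,85\nC,45.1,\n"
CODED_CSV = "country,gini,literacy\nA,30.5,99\nB,-999,85\nC,45.1,-9\n"

# Blanks plus the two sentinel codes. This assumes -9 and -999 never
# occur as real values in the columns being read.
MISSING_CODES = {"", "-999", "-9"}


def read_rows(text):
    """Parse CSV text, turning blanks and sentinel codes into None."""
    rows = []
    for record in csv.DictReader(io.StringIO(text)):
        row = {"country": record["country"]}
        for name in ("gini", "literacy"):
            raw = record[name].strip()
            row[name] = None if raw in MISSING_CODES else float(raw)
        rows.append(row)
    return rows


blank_rows = read_rows(BLANK_CSV)
coded_rows = read_rows(CODED_CSV)

# Once the codes are declared, both layouts describe the same data.
print(blank_rows == coded_rows)
```

If a code is not declared as missing, it is treated as a real number, which is why WinIDAMS needs the codes defined in the dictionary.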


NOTES

1. MicrOsiris, Instat and Epi Info read files with blanks for missing.

2. For WinIDAMS, each variable has to have a 'missing' indicator. I used -999 or -9, and these have to be clearly defined in the file definition. See the .dic file listed above.

3. MicrOsiris uses a .csv file. Epi Info and Instat can read Excel or .csv files.

4. When using MicrOsiris,
a. import the .csv file, then call up commands.
b. for blanks, MicrOsiris assigns 1.5 and 1.6 billion, but automatically recognises these values as missing.
c. the data dictionary shows 0 decimal places, but if the data actually have decimal places, like 1.23, the number is read as 1.23, with the decimal place. The data dictionary only shows how many decimal places are implied when a value has no decimal point.

5. Epi Info doesn't do correlation (at least in a version I used in 2008). You need to use regression with 2 variables to get the correlation coefficient.

6. Regression: The basic regression output from Epi Info and MicrOsiris seems to match the last step of the stepwise regression in WinIDAMS.

7. An old version of Instat that I used gives the same results, except that it handles missing values differently from the other programs in correlation.
    For correlation, Instat deletes all cases with any missing values (casewise deletion). All the other programs do pairwise deletion; that is, they compute each correlation one pair of variables at a time and only
    exclude cases with missing values for that pair.
    I'm not sure if the same thing applies to the current version.
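
To make the difference in note 7 concrete, here is a minimal Python sketch of the two deletion rules. The data are made up, and the function names (pearson, correlate) are my own; this is not how any of the programs is actually implemented, only an illustration of the rules as described above.

```python
import math


def pearson(xs, ys):
    """Plain Pearson correlation for two equal-length lists of numbers."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    return sxy / math.sqrt(sxx * syy)


def correlate(data, a, b, casewise):
    """Return (r, n) for variables a and b under the chosen deletion rule."""
    if casewise:
        # Drop a case if ANY variable in the data set is missing.
        rows = [r for r in data if all(v is not None for v in r.values())]
    else:
        # Drop a case only if a or b is missing for this pair.
        rows = [r for r in data if r[a] is not None and r[b] is not None]
    return pearson([r[a] for r in rows], [r[b] for r in rows]), len(rows)


# Made-up data: z is missing in two cases, x and y are complete.
data = [
    {"x": 1, "y": 2, "z": 5},
    {"x": 2, "y": 1, "z": None},
    {"x": 3, "y": 4, "z": 7},
    {"x": 4, "y": 3, "z": None},
    {"x": 5, "y": 6, "z": 9},
]

r_pair, n_pair = correlate(data, "x", "y", casewise=False)
r_case, n_case = correlate(data, "x", "y", casewise=True)
print("pairwise:", round(r_pair, 4), "n =", n_pair)
print("casewise:", round(r_case, 4), "n =", n_case)
```

The x-y correlation changes even though neither x nor y has a missing value, because casewise deletion also drops the cases where z is missing. That is why the Instat correlations differ from the others.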





Just Correlations

Using these variables: gini (inequality), phone_kpop (phone lines per 1,000 population), c-arable (land cultivated for crops like wheat, maize, and rice that are replanted after each harvest), gdp per capita, infant mortality rate and literacy rate.

Pairwise deletion of cases.


*****************
MicrOsiris
*****************

                      V10        V15        V16        V19        V36
                     gini phone_kpop   c-arable     gdpcap        IMR
phone_kpop V15    -0.4101
c-arable   V16    -0.4310     0.0236
gdpcap     V19    -0.3597     0.7845     0.0014
IMR        V36     0.3555    -0.6597    -0.1165    -0.5196
literacy   V37    -0.2679     0.5955     0.1046     0.4671    -0.7191
 



*****************
WinIDAMS
*****************
use this setup
http://gsociology.icaap.org/methods/PD_data_cia_nine_comma_w3.set

                           VAR     10       15       16       19       28

 phone_kpop                 15  -0.4101
 c-arable                   16  -0.4310   0.0236
 gdpcap                     19  -0.3597   0.7845   0.0014
 IMR                        28   0.3555  -0.6597  -0.1165  -0.5196
 literacy                   29  -0.2676   0.5951   0.1051   0.4671  -0.7195
 
(V10 is gini)
The correlation coefficients are slightly different because I had to format the data set slightly differently, e.g., a different number of decimal places.


*****************
PSPP
*****************

Correlations
|------------------------------|----|----------|--------|------|----|--------|
|                              |gini|phone_kpop|c_arable|gdpcap|IMR |literacy|
|----------+-------------------|----+----------+--------+------+----+--------|
|gini      |Pearson Correlation|1.00|      -.41|    -.43|  -.36| .36|    -.27|
|          |Sig. (2-tailed)    |    |       .00|     .00|   .00| .00|     .00|
|          |N                  | 122|       121|     122|   122| 122|     120|
|----------+-------------------|----+----------+--------+------+----+--------|
|phone_kpop|Pearson Correlation|-.41|      1.00|     .02|   .78|-.66|     .60|
|          |Sig. (2-tailed)    | .00|          |     .72|   .00| .00|     .00|
|          |N                  | 121|       228|     225|   227| 220|     212|
|----------+-------------------|----+----------+--------+------+----+--------|
|c_arable  |Pearson Correlation|-.43|       .02|    1.00|   .00|-.12|     .10|
|          |Sig. (2-tailed)    | .00|       .72|        |   .98| .08|     .13|
|          |N                  | 122|       225|     232|   227| 221|     215|
|----------+-------------------|----+----------+--------+------+----+--------|
|gdpcap    |Pearson Correlation|-.36|       .78|     .00|  1.00|-.52|     .47|
|          |Sig. (2-tailed)    | .00|       .00|     .98|      | .00|     .00|
|          |N                  | 122|       227|     227|   230| 223|     215|
|----------+-------------------|----+----------+--------+------+----+--------|
|IMR       |Pearson Correlation| .36|      -.66|    -.12|  -.52|1.00|    -.72|
|          |Sig. (2-tailed)    | .00|       .00|     .08|   .00|    |     .00|
|          |N                  | 122|       220|     221|   223| 223|     211|
|----------+-------------------|----+----------+--------+------+----+--------|
|literacy  |Pearson Correlation|-.27|       .60|     .10|   .47|-.72|    1.00|
|          |Sig. (2-tailed)    | .00|       .00|     .13|   .00| .00|        |
|          |N                  | 120|       212|     215|   215| 211|     215|
|----------|-------------------|----|----------|--------|------|----|--------|



*****************
JASP
*****************

Correlation Matrix

Pearson Correlations
             gini  phone_kpop  c-arable  gdpcap     IMR  literacy
gini            —      -0.410    -0.431  -0.360   0.355    -0.268
phone_kpop                  —     0.024   0.785  -0.660     0.596
c-arable                              —   0.001  -0.116     0.105
gdpcap                                        —  -0.520     0.467
IMR                                                   —    -0.719
literacy                                                        —


Casewise deletion of cases.


*****************
Instat
*****************

           gini       phone_k    c_arabl    gdpcap     IMR        literac
gini       1.0000

phone_k   -0.4159     1.0000

c_arabl   -0.4388     0.1867     1.0000

gdpcap    -0.3681     0.9229     0.0961     1.0000

IMR        0.3580    -0.7194    -0.2125    -0.6610    1.0000

literac   -0.2720     0.6118     0.1418     0.5440   -0.7388     1.0000



*****************
Excel
*****************

Excel doesn't do casewise deletion. I just created a comparison data set with no missing values, to compare results with Instat.

            gini           phone_kpop      c-arable        gdpcap         IMR         literacy
gini         1                   
phone_kpop  -0.415920344    1               
c-arable    -0.438828794    0.186734057    1           
gdpcap      -0.368095742    0.922881434    0.096105248     1       
IMR          0.35798348    -0.719421668   -0.212493601    -0.660980617    1   
literacy    -0.271988365    0.611754978    0.141757381     0.543983666    -0.738766709    1




Regressions

Predicting gini (inequality) from phone_kpop (phone lines per 1,000 population), c-arable (land cultivated for crops like wheat, maize, and rice that are replanted after each harvest), climate and North (degrees from the equator).

*****************
Epi Info
*****************

Linear Regression


Variable Coefficient Std Error F-test P-Value
c_arable -0.161 0.057 7.8573 0.006030
climate -1.190 1.149 1.0717 0.302945
North -0.217 0.033 43.7577 0.000000
phone_kpop -0.004 0.004 0.7065 0.402507
CONSTANT 51.095 2.212 533.3821 0.000000


Correlation Coefficient: r^2= 0.52


Source df Sum of Squares Mean Square F-statistic
Regression 4 6447.825 1611.956 28.315
Residuals 106 6034.414 56.928  
Total 110 12482.239    
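
The summary figures can be recomputed from this ANOVA table, which is a handy way to check that the programs agree. A minimal sketch in Python, assuming the usual textbook definitions (R squared as regression sum of squares over total sum of squares, and the adjustment using n - 1 over the residual degrees of freedom). The variable names are my own.

```python
import math

# Sums of squares and degrees of freedom copied from the Epi Info table.
ss_regression = 6447.825
ss_residual = 6034.414
df_regression = 4
df_residual = 106

ss_total = ss_regression + ss_residual
df_total = df_regression + df_residual  # n - 1, with n = 111 cases

# Fraction of explained variance.
r_squared = ss_regression / ss_total

# Adjusted R squared penalises for the number of predictors.
adj_r_squared = 1 - (1 - r_squared) * df_total / df_residual

# F ratio is regression mean square over residual mean square.
f_ratio = (ss_regression / df_regression) / (ss_residual / df_residual)

# Standard error of estimate is the square root of the residual mean square.
std_error_estimate = math.sqrt(ss_residual / df_residual)

print("R squared         ", round(r_squared, 4))
print("adjusted R squared", round(adj_r_squared, 4))
print("F ratio           ", round(f_ratio, 2))
print("std error estimate", round(std_error_estimate, 2))
```

These should line up with the 0.52 and 28.315 that Epi Info prints, allowing for rounding.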



*****************
MicrOsiris
*****************

Total case count:       111
 
STANDARD REGRESSION
 
THE DEPENDENT VARIABLE IS V1: gini
 
     STANDARD ERROR OF ESTIMATE                7.55
     F-RATIO FOR THE REGRESSION              28.315    PROBABILITY  0.00
     MULTIPLE CORRELATION COEFFICIENT        0.7187    ADJUSTED   0.7059
     FRACTION OF EXPLAINED VARIANCE          0.5166    ADJUSTED   0.4983
     DETERMINANT OF THE CORRELATION MATRIX  0.43109
     RESIDUAL DEGREES OF FREEDOM (N-K-1)        106
 
     CONSTANT TERM    51.095                           STD. ERROR   2.21236
 
 VARIABLE     NAME                   B         SIGMA(B)      BETA       SIGMA(BETA)
 
   V15  phone_kpop              -0.37364E-02  0.44452E-02 -0.70633E-01  0.84032E-01
   V16  c-arable                -0.16091      0.57403E-01 -0.21265      0.75862E-01
   V20  climate                  -1.1896       1.1492     -0.88728E-01  0.85710E-01
   V29  North                   -0.21671      0.32760E-01 -0.52881      0.79941E-01
MicrOsiris
Nov 15, 2006                                                                                                    REGRESSION    2
 
 
                               PARTIAL  PART  MARGINAL               COVARIANCE
 VARIABLE     NAME                R       R     RSQD    T-RATIO(PROB)   RATIO
 
   V15  phone_kpop              -0.081  0.057  0.0032   0.8406 (.407)   0.354
   V16  c-arable                -0.263  0.189  0.0358   2.8031 (.006)   0.207
   V20  climate                 -0.100  0.070  0.0049   1.0352 (.303)   0.379
   V29  North                   -0.541  0.447  0.1996   6.6150 (.000)   0.286
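
Epi Info reports an F-test for each coefficient, while MicrOsiris reports a t-ratio. For a single coefficient these are the same test, because F is t squared. A quick Python check, using the numbers copied from the two outputs above (the dictionary name is my own):

```python
# (t-ratio from MicrOsiris, F-test from Epi Info) for each predictor.
reported = {
    "phone_kpop": (0.8406, 0.7065),
    "c-arable": (2.8031, 7.8573),
    "climate": (1.0352, 1.0717),
    "North": (6.6150, 43.7577),
}

for name, (t_ratio, f_test) in reported.items():
    # Squaring the t-ratio should reproduce the F-test, up to rounding.
    print(f"{name:<11} t^2 = {t_ratio ** 2:8.4f}   F = {f_test:8.4f}")

largest_gap = max(abs(t ** 2 - f) for t, f in reported.values())
print("largest difference:", round(largest_gap, 4))
```

The small differences come from the rounding in the printed output, not from the programs disagreeing.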


********************
WinIDAMS
********************

using this setup
http://gsociology.icaap.org/methods/pd_cia_giniregress.set
this is the last step

  Step no   4

      Variable entered     15     phone_kpop             

      F-level            0.706
      T-level            0.841

          Standard error of estimate                 7.545   
          F ratio for the regression                28.315
          Multiple correlation coefficient         0.71872          adjusted        0.70592
          Fraction of explained variance (RSQD)    0.51656          adjusted        0.49832
          Determinant of the correlation matrix    0.43109   
          Residual degrees of freedom (N-p-1)          106
          Constant term                             51.095   


                                                            Partial
  Var. no.        B       Sigma(B)     Beta    Sigma(Beta)   RSQD     Marg RSQD  T-ratio  Cov. ratio  Variable name
    15         -0.0037     0.0044    -0.0706     0.0840     0.0066     0.0032     0.8405     0.3541   phone_kpop             
    16         -0.1609     0.0574    -0.2126     0.0759     0.0690     0.0358     2.8031     0.2075   c-arable               
    20         -1.1896     1.1492    -0.0887     0.0857     0.0100     0.0049     1.0352     0.3792   climate                
    26         -0.2167     0.0328    -0.5288     0.0799     0.2922     0.1996     6.6150     0.2863   North                  

 **************** Listing of marginal R-squares for all potential predictors ***


    Step no.     Var. no.     Variable name              Marg rsqd     Categorical variables (all codes)        Previously in (*)
                                                                             Marg RSQD         T-ratio

        4          15      phone_kpop                      0.0032                                                       *
        4          16      c-arable                        0.0358                                                       *
        4          20      climate                         0.0049                                                       *
        4          26      North                           0.1996                                                       *



********************
Instat
********************

ANOVA for regression of gini
on phone_k c_arable climate North

-------------------------------------------------------------------

Source      df            SS            MS      F value     Prob>F

-------------------------------------------------------------------

Regression   4       6447.83          1612        28.32     0.0000

Residual   106       6034.41        56.928

-------------------------------------------------------------------

Total      110       12482.2

-------------------------------------------------------------------

124 missing or zero-weighted cases

R-squared = 0.5166  (adjusted = 0.4983)


REGRESSION COEFFICIENTS


Y-variate: gini

-----------------------------------------------------------------------------------------

Param.        Estimate           SE           t         Prob>|t|       95% CI

-----------------------------------------------------------------------------------------

Const           51.095        2.212       23.10           0.0000     46.71      55.48

phone_k       -0.00374       0.0044       -0.84           0.4025   -0.0125     0.0051

c_arabl       -0.16091       0.0574       -2.80           0.0060   -0.2747    -0.0471

climate        -1.1896        1.149       -1.04           0.3029    -3.468      1.089

North         -0.21671       0.0328       -6.61           0.0000   -0.2817    -0.1518

---------------------------------------------------------------------------------------


Just to note, I had to search to find these regression coefficients. They are only available after you have run the regression.


********************
PSPP
********************

Model Summary
|---|--------|-----------------|--------------------------|
| R |R Square|Adjusted R Square|Std. Error of the Estimate|
|---|--------|-----------------|--------------------------|
|.72|     .52|              .50|                      7.55|
|---|--------|-----------------|--------------------------|

ANOVA
|----------|--------------|---|-----------|-----|------------|
|          |Sum of Squares| df|Mean Square|  F  |Significance|
|----------|--------------|---|-----------|-----|------------|
|Regression|       6447.83|  4|    1611.96|28.32|         .00|
|Residual  |       6034.41|106|      56.93|     |            |
|Total     |      12482.24|110|           |     |            |
|----------|--------------|---|-----------|-----|------------|

Coefficients
|----------|-----|----------|----|-----|------------|
|          |  B  |Std. Error|Beta|  t  |Significance|
|----------|-----|----------|----|-----|------------|
|(Constant)|51.09|      2.21| .00|23.10|         .00|
|phone_kpop|  .00|       .00|-.07| -.84|         .40|
| c_arable | -.16|       .06|-.21|-2.80|         .01|
|  climate |-1.19|      1.15|-.09|-1.04|         .30|
|   North  | -.22|       .03|-.53|-6.61|         .00|
|----------|-----|----------|----|-----|------------|



********************
JASP
********************

Linear Regression

Model Summary
Model      R     R²   Adjusted R²    RMSE
1      0.719  0.517         0.498   7.545

ANOVA
Model               Sum of Squares    df   Mean Square       F        p
1      Regression             6448     4       1611.96   28.32   < .001
       Residual               6034   106         56.93
       Total                 12482   110

Coefficients
Model               Unstandardized   Standard Error   Standardized   t        p
1      intercept            51.095            2.212                  .   < .001
       phone_kpop           -0.004            0.004         -0.071   .    0.402
       c-arable             -0.161            0.057         -0.213   .    0.006
       climate              -1.190            1.149         -0.089   .    0.303
       North                -0.217            0.033         -0.529   .   < .001


t test

*****************
Instat
*****************

Simple Models - Normal Distribution, Two Samples


TINt 'phone_kpop' 'c_arable';test 0
Normal model, two samples
Column            phone_kpop c_arable
 Sample size       228        232
 Minimum           0.16917    0
 Maximum           1385.1     62.11
 Range             1385       62.11
 Mean              244.91     13.447
 Std. deviation    241.57     13.028

Pooled standard deviation = 170.319

Difference between means  = 231.47  s.e. of difference = 15.883 with 458 d.f.
95% confidence interval for the difference between means
          200.25   to   262.68

t value testing mean difference=0            is 14.57
Significance level is 0.0000 (0.00%) for 2 sided test

*****************
Excel
*****************

t-Test: Two-Sample Assuming Equal Variances       
       
                                phone_kpop    c-arable
Mean                            244.9124484    13.44711207
Variance                        58355.36494    169.7312336
Observations                    228            232
Pooled Variance                 29008.46235   
Hypothesized Mean Difference    0   
df                              458   
t Stat                          14.57323966   
P(T<=t) one-tail                4.25214E-40   
t Critical one-tail             1.648187415   
P(T<=t) two-tail                8.50428E-40   
t Critical two-tail             1.965157018   
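
Both programs are doing the standard equal-variance (pooled) two-sample t test, so the result can be reproduced from the summary statistics alone. A minimal Python sketch, using the means, variances and sample sizes from the Excel output above; the variable names are my own.

```python
import math

# Summary statistics copied from the Excel output.
n1, mean1, var1 = 228, 244.9124484, 58355.36494   # phone_kpop
n2, mean2, var2 = 232, 13.44711207, 169.7312336   # c-arable

# Degrees of freedom for the pooled test.
df = n1 + n2 - 2

# Pooled variance: the two variances weighted by their degrees of freedom.
pooled_var = ((n1 - 1) * var1 + (n2 - 1) * var2) / df

# Standard error of the difference between the two means.
se_diff = math.sqrt(pooled_var * (1 / n1 + 1 / n2))

# t statistic for testing a mean difference of 0.
t_stat = (mean1 - mean2) / se_diff

print("df             ", df)
print("pooled variance", round(pooled_var, 2))
print("pooled std dev ", round(math.sqrt(pooled_var), 3))
print("s.e. of diff   ", round(se_diff, 3))
print("t              ", round(t_stat, 2))
```

Instat reports the pooled standard deviation and Excel the pooled variance, but one is just the square root of the other.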



last validated 12/25/08