Comparing free statistical software
Handling missing data

This page shows some output from Epi Info, MicrOsiris and WinIDAMS. For comparison, there is also output from Excel, using a version of the data set with no missing values. The output covers correlations and regression. I did this in November 2006 using the most recent versions of the software at that time. I used a version of PD-Plus, available on my data page. I updated this in March 2012 to include Instat and PSPP, and in April 2016 to include JASP.

The main finding is that all programs give the same results, with one exception: in correlation, Instat operates differently from the other programs (see note 7). It uses casewise deletion, while the other programs use pairwise deletion.

MicrOsiris http://www.microsiris.com/
Epi Info   http://wwwn.cdc.gov/epiinfo/
WinIDAMS   http://portal.unesco.org/ci/en/ev.php-URL_ID=2070&URL_DO=DO_TOPIC&URL_SECTION=201.html
PSPP   http://www.gnu.org/software/pspp/
JASP https://jasp-stats.org/

Special case
Instat http://www.reading.ac.uk/ssc/resourcepage/instat.php (see note #7)

Data for this analysis.

Using this data set with blanks for missing:
http://gsociology.icaap.org/data/PD_data_cia.csv
listed here   http://gsociology.icaap.org/dataupload.html

and this data set with -999 or -9 for missing (for WinIDAMS)
http://gsociology.icaap.org/methods/PD_data_cia_nine_comma_w3.csv
saved as this WinIDAMS data set and dictionary:
http://gsociology.icaap.org/methods/PD_data_cia_nine_comma_w3.dat
http://gsociology.icaap.org/methods/PD_data_cia_nine_comma_w3.dic

and this version, with no missing, and only including the variables used in the regressions below:
http://gsociology.icaap.org/methods/PD_data_cia_stat4u_nomiss.csv
I include this data set because I used it with Excel and Stat4U to compare the results with the other programs.
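
The two file layouts above (blanks for missing, and -999 or -9 for missing) should describe the same data once the missing codes are declared. The following is a minimal Python sketch of that idea, using only the standard library. The tiny CSV strings, the column names kept in the loop and the `read_numeric` helper are made up for illustration; they are not the real files and not how any of the programs on this page read data.

```python
import csv
import io

# Hypothetical miniature versions of the two layouts; the values are invented.
BLANK_CSV = "country,gini,literacy\nA,40.1,99\nB,,85\nC,33.0,\n"
CODED_CSV = "country,gini,literacy\nA,40.1,99\nB,-999,85\nC,33.0,-9\n"


def read_numeric(text, missing_codes=()):
    """Parse CSV text; blank cells and any listed codes become None."""
    codes = {float(code) for code in missing_codes}
    rows = []
    for record in csv.DictReader(io.StringIO(text)):
        row = {"country": record["country"]}
        for name in ("gini", "literacy"):
            raw = record[name].strip()
            value = float(raw) if raw else None   # blank cell -> missing
            if value is not None and value in codes:
                value = None                      # declared code -> missing
            row[name] = value
        rows.append(row)
    return rows


blank_rows = read_numeric(BLANK_CSV)
coded_rows = read_numeric(CODED_CSV, missing_codes=(-999, -9))
print(blank_rows == coded_rows)
```

If the codes were not declared, -999 and -9 would be treated as real values and would distort every mean and correlation, which is why the WinIDAMS dictionary has to define them explicitly.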

NOTES

1. MicrOsiris, Instat and Epi Info read files with blanks for missing.

2. For WinIDAMS, each variable has to have a 'missing' indicator. I used -999 or -9, and these have to be clearly defined in the file definition. See the .dic file listed above.

3. MicrOsiris uses a .csv file. Epi Info and Instat can read Excel or csv files.

4. When using MicrOsiris,

a. import the .csv file, then call up commands.

b. for blanks, MicrOsiris assigns 1.5 and 1.6 billion, but automatically recognises these values as missing.

c. the data dictionary shows 0 decimal places, but if the data actually contain a decimal point, as in 1.23, the number is read as 1.23. The number of decimal places in the data dictionary is only the implied number, applied when a value has no explicit decimal point.

5. Epi Info doesn't do correlation (at least in a version I used in 2008). You need to use regression with 2 variables to get the correlation coefficient.

6. Regression: The basic regression output from Epi Info and MicrOsiris seems to correspond to the stepwise regression output from WinIDAMS.

7. An old version of Instat that I used gives the same results, except that it handles missing values differently from the other programs.
    For correlation, Instat deletes all cases with any missing values (casewise deletion). All the other programs do pairwise deletion: they compute correlations one pair of variables at a time and exclude only the cases that are missing for that pair.
    I'm not sure whether the same applies to the current version.
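
To make the difference in note 7 concrete, here is a small Python sketch of pairwise versus casewise deletion for a Pearson correlation. It uses only the standard library. The three short variables are invented for illustration, and the function names are hypothetical; this is not the code of any program compared on this page.

```python
import math


def pearson(xs, ys):
    """Pearson correlation of two equal-length sequences of numbers."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    return sxy / math.sqrt(sxx * syy)


def pairwise_corr(data, a, b):
    """Drop a case only if it is missing on variable a or variable b."""
    pairs = [(x, y) for x, y in zip(data[a], data[b])
             if x is not None and y is not None]
    xs, ys = zip(*pairs)
    return pearson(xs, ys), len(pairs)


def casewise_corr(data, a, b):
    """Drop a case if it is missing on ANY variable in the data set."""
    n_cases = len(data[a])
    complete = [i for i in range(n_cases)
                if all(column[i] is not None for column in data.values())]
    xs = [data[a][i] for i in complete]
    ys = [data[b][i] for i in complete]
    return pearson(xs, ys), len(complete)


# Invented example: z has two missing values, x and y have none.
data = {
    "x": [1.0, 2.0, 3.0, 4.0, 5.0, 6.0],
    "y": [2.0, 1.0, 4.0, 3.0, 6.0, 5.0],
    "z": [1.0, None, 2.0, None, 3.0, 4.0],
}

r_pair, n_pair = pairwise_corr(data, "x", "y")
r_case, n_case = casewise_corr(data, "x", "y")
print(f"pairwise: r = {r_pair:.4f} on {n_pair} cases")
print(f"casewise: r = {r_case:.4f} on {n_case} cases")
```

The correlation between x and y changes under casewise deletion even though neither variable has missing values, because the cases missing on z are dropped as well. That is the same effect that makes the Instat matrix differ from the others.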

Just Correlations

Using these variables: gini (inequality), phone_kpop (phone lines per 1,000 population), c-arable (land cultivated for crops like wheat, maize, and rice that are replanted after each harvest), gdp per capita, infant mortality rate and literacy rate.

Pairwise deletion of cases.

*****************
MicrOsiris
*****************

                      V10        V15        V16        V19        V36
                     gini phone_kpop   c-arable     gdpcap        IMR
phone_kpop V15    -0.4101
c-arable   V16    -0.4310     0.0236
gdpcap     V19    -0.3597     0.7845     0.0014
IMR        V36     0.3555    -0.6597    -0.1165    -0.5196
literacy   V37    -0.2679     0.5955     0.1046     0.4671    -0.7191

*****************
WinIDAMS
*****************
use this setup
http://gsociology.icaap.org/methods/PD_data_cia_nine_comma_w3.set

                           VAR      10       15       16       19       28

phone_kpop                 15  -0.4101
c-arable                   16  -0.4310   0.0236
gdpcap                     19  -0.3597   0.7845   0.0014
IMR                        28   0.3555  -0.6597  -0.1165  -0.5196
literacy                   29  -0.2676   0.5951   0.1051   0.4671  -0.7195

(V10 is gini)
The correlation coefficients are slightly different because I had to format the data set slightly differently, e.g., with a different number of decimal places.

*****************
PSPP
*****************

Correlations
|------------------------------|----|----------|--------|------|----|--------|
|                              |gini|phone_kpop|c_arable|gdpcap|IMR |literacy|
|----------+-------------------|----+----------+--------+------+----+--------|
|gini      |Pearson Correlation|1.00|      -.41|    -.43|  -.36| .36|    -.27|
|          |Sig. (2-tailed)    |    |       .00|     .00|   .00| .00|     .00|
|          |N                  | 122|       121|     122|   122| 122|     120|
|----------+-------------------|----+----------+--------+------+----+--------|
|phone_kpop|Pearson Correlation|-.41|      1.00|     .02|   .78|-.66|     .60|
|          |Sig. (2-tailed)    | .00|          |     .72|   .00| .00|     .00|
|          |N                  | 121|       228|     225|   227| 220|     212|
|----------+-------------------|----+----------+--------+------+----+--------|
|c_arable  |Pearson Correlation|-.43|       .02|    1.00|   .00|-.12|     .10|
|          |Sig. (2-tailed)    | .00|       .72|        |   .98| .08|     .13|
|          |N                  | 122|       225|     232|   227| 221|     215|
|----------+-------------------|----+----------+--------+------+----+--------|
|gdpcap    |Pearson Correlation|-.36|       .78|     .00|  1.00|-.52|     .47|
|          |Sig. (2-tailed)    | .00|       .00|     .98|      | .00|     .00|
|          |N                  | 122|       227|     227|   230| 223|     215|
|----------+-------------------|----+----------+--------+------+----+--------|
|IMR       |Pearson Correlation| .36|      -.66|    -.12|  -.52|1.00|    -.72|
|          |Sig. (2-tailed)    | .00|       .00|     .08|   .00|    |     .00|
|          |N                  | 122|       220|     221|   223| 223|     211|
|----------+-------------------|----+----------+--------+------+----+--------|
|literacy  |Pearson Correlation|-.27|       .60|     .10|   .47|-.72|    1.00|
|          |Sig. (2-tailed)    | .00|       .00|     .13|   .00| .00|        |
|          |N                  | 120|       212|     215|   215| 211|     215|
|----------|-------------------|----|----------|--------|------|----|--------|

*****************
JASP
*****************

Correlation Matrix

Pearson Correlations
	gini	phone_kpop	c-arable	gdpcap	IMR	literacy
gini	—	-0.410	-0.431	-0.360	0.355	-0.268
phone_kpop		—	0.024	0.785	-0.660	0.596
c-arable			—	0.001	-0.116	0.105
gdpcap				—	-0.520	0.467
IMR					—	-0.719
literacy						—

Casewise deletion of cases.

*****************
Instat
*****************

           gini       phone_k    c_arabl    gdpcap     IMR        literac
gini       1.0000
phone_k   -0.4159     1.0000
c_arabl   -0.4388     0.1867     1.0000
gdpcap    -0.3681     0.9229     0.0961     1.0000
IMR        0.3580    -0.7194    -0.2125    -0.6610     1.0000
literac   -0.2720     0.6118     0.1418     0.5440    -0.7388     1.0000

*****************
Excel
*****************

Excel doesn't do casewise deletion. I created a comparison data set with no missing values, to compare results with Instat.

              gini          phone_kpop    c-arable      gdpcap        IMR           literacy
gini          1
phone_kpop   -0.415920344   1
c-arable     -0.438828794   0.186734057   1
gdpcap       -0.368095742   0.922881434   0.096105248   1
IMR           0.35798348   -0.719421668  -0.212493601  -0.660980617   1
literacy     -0.271988365   0.611754978   0.141757381   0.543983666  -0.738766709   1

Regressions

Predicting gini (inequality) from phone_kpop (phone lines per 1,000 population), c-arable (land cultivated for crops like wheat, maize, and rice that are replanted after each harvest), climate and North (degrees from the equator).

*****************
Epi Info
*****************

Linear Regression

Variable	Coefficient	Std Error	F-test	P-Value
c_arable	-0.161	0.057	7.8573	0.006030
climate	-1.190	1.149	1.0717	0.302945
North	-0.217	0.033	43.7577	0.000000
phone_kpop	-0.004	0.004	0.7065	0.402507
CONSTANT	51.095	2.212	533.3821	0.000000

Correlation Coefficient: r^2 = 0.52

Source	df	Sum of Squares	Mean Square	F-statistic
Regression	4	6447.825	1611.956	28.315
Residuals	106	6034.414	56.928
Total	110	12482.239
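
As a cross-check, the summary statistics that the programs on this page report (R-squared, adjusted R-squared, F and the standard error of estimate) can all be recomputed from the sums of squares and degrees of freedom in the Epi Info table above. The Python sketch below uses the standard textbook formulas; the variable names are my own, and the input numbers are copied from that table.

```python
import math

# Values copied from the Epi Info ANOVA table above.
ss_regression = 6447.825
ss_residual = 6034.414
df_regression = 4
df_residual = 106

ss_total = ss_regression + ss_residual
df_total = df_regression + df_residual

# Fraction of explained variance and its degrees-of-freedom adjustment.
r_squared = ss_regression / ss_total
adj_r_squared = 1 - (1 - r_squared) * df_total / df_residual

# Mean squares, overall F-statistic and standard error of estimate.
ms_regression = ss_regression / df_regression
ms_residual = ss_residual / df_residual
f_statistic = ms_regression / ms_residual
std_error_estimate = math.sqrt(ms_residual)

print(f"R-squared          {r_squared:.4f}")
print(f"Adjusted R-squared {adj_r_squared:.4f}")
print(f"F-statistic        {f_statistic:.3f}")
print(f"Std error of est.  {std_error_estimate:.3f}")
```

The results should agree, up to rounding in the printed table, with the 0.5166, 0.4983, 28.315 and 7.545 shown in the MicrOsiris, WinIDAMS and Instat output on this page.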

*****************
MicrOsiris
*****************

Total case count:       111

STANDARD REGRESSION

THE DEPENDENT VARIABLE IS V1: gini

     STANDARD ERROR OF ESTIMATE                7.55
     F-RATIO FOR THE REGRESSION              28.315    PROBABILITY 0.00
     MULTIPLE CORRELATION COEFFICIENT        0.7187    ADJUSTED   0.7059
     FRACTION OF EXPLAINED VARIANCE          0.5166    ADJUSTED   0.4983
     DETERMINANT OF THE CORRELATION MATRIX 0.43109
     RESIDUAL DEGREES OF FREEDOM (N-K-1)        106

     CONSTANT TERM    51.095                           STD. ERROR   2.21236

VARIABLE     NAME                   B           SIGMA(B)       BETA        SIGMA(BETA)

   V15 phone_kpop              -0.37364E-02   0.44452E-02  -0.70633E-01   0.84032E-01
   V16 c-arable                -0.16091       0.57403E-01  -0.21265       0.75862E-01
   V20 climate                 -1.1896        1.1492       -0.88728E-01   0.85710E-01
   V29 North                   -0.21671       0.32760E-01  -0.52881       0.79941E-01
MicrOsiris
Nov 15, 2006                                                                                                    REGRESSION    2

                               PARTIAL PART MARGINAL               COVARIANCE
VARIABLE     NAME                R       R     RSQD    T-RATIO(PROB)   RATIO

   V15 phone_kpop              -0.081 0.057 0.0032   0.8406 (.407)   0.354
   V16 c-arable                -0.263 0.189 0.0358   2.8031 (.006)   0.207
   V20 climate                 -0.100 0.070 0.0049   1.0352 (.303)   0.379
   V29 North                   -0.541 0.447 0.1996   6.6150 (.000)   0.286
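
Epi Info reports an F-test for each coefficient, while MicrOsiris reports a T-ratio. For a single coefficient the two are related by F = t squared, so the outputs can be compared directly. The Python sketch below checks this using the numbers printed above; the dictionary names and the 0.1% tolerance (to allow for rounding in the printed output) are my own choices.

```python
import math

# Per-variable F-test values as printed by Epi Info above.
epi_info_f = {
    "phone_kpop": 0.7065,
    "c_arable": 7.8573,
    "climate": 1.0717,
    "North": 43.7577,
}

# T-ratios as printed by MicrOsiris above.
microsiris_t = {
    "phone_kpop": 0.8406,
    "c_arable": 2.8031,
    "climate": 1.0352,
    "North": 6.6150,
}

agreement = {}
for name, t_ratio in microsiris_t.items():
    t_squared = t_ratio ** 2
    # Allow a 0.1% relative difference for rounding in the printed values.
    agreement[name] = math.isclose(t_squared, epi_info_f[name], rel_tol=1e-3)
    print(f"{name:10s} t^2 = {t_squared:8.4f}   F = {epi_info_f[name]:8.4f}")

all_agree = all(agreement.values())
print("all variables agree:", all_agree)
```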

********************
WinIDAMS
********************

using this setup
http://gsociology.icaap.org/methods/pd_cia_giniregress.set
this is the last step

Step no   4

      Variable entered     15     phone_kpop

      F-level            0.706
      T-level            0.841

          Standard error of estimate                 7.545
          F ratio for the regression                28.315
          Multiple correlation coefficient         0.71872          adjusted        0.70592
          Fraction of explained variance (RSQD)    0.51656          adjusted        0.49832
          Determinant of the correlation matrix    0.43109
          Residual degrees of freedom (N-p-1)          106
          Constant term                             51.095

                                                            Partial
Var. no.        B       Sigma(B)     Beta    Sigma(Beta)   RSQD     Marg RSQD T-ratio Cov. ratio Variable name
    15         -0.0037     0.0044    -0.0706     0.0840     0.0066     0.0032     0.8405     0.3541   phone_kpop
    16         -0.1609     0.0574    -0.2126     0.0759     0.0690     0.0358     2.8031     0.2075   c-arable
    20         -1.1896     1.1492    -0.0887     0.0857     0.0100     0.0049     1.0352     0.3792   climate
    26         -0.2167     0.0328    -0.5288     0.0799     0.2922     0.1996     6.6150     0.2863   North

**************** Listing of marginal R-squares for all potential predictors ***

    Step no.     Var. no.     Variable name              Marg rsqd     Categorical variables (all codes)        Previously in (*)
                                                                             Marg RSQD         T-ratio

        4          15      phone_kpop                      0.0032                                                       *
        4          16      c-arable                        0.0358                                                       *
        4          20      climate                         0.0049                                                       *
        4          26      North                           0.1996                                                       *

********************
Instat
********************

ANOVA for regression of gini
on phone_k c_arable climate North
-------------------------------------------------------------------
Source      df            SS            MS      F value     Prob>F
-------------------------------------------------------------------
Regression   4       6447.83          1612        28.32     0.0000
Residual   106       6034.41        56.928
-------------------------------------------------------------------
Total      110       12482.2
-------------------------------------------------------------------
124 missing or zero-weighted cases
R-squared = 0.5166 (adjusted = 0.4983)

REGRESSION COEFFICIENTS

Y-variate: gini
-----------------------------------------------------------------------------------------
Param.        Estimate           SE           t         Prob>|t|       95% CI
-----------------------------------------------------------------------------------------
Const           51.095        2.212       23.10           0.0000     46.71      55.48
phone_k       -0.00374       0.0044       -0.84           0.4025   -0.0125     0.0051
c_arabl       -0.16091       0.0574       -2.80           0.0060   -0.2747    -0.0471
climate        -1.1896        1.149       -1.04           0.3029    -3.468      1.089
North         -0.21671       0.0328       -6.61           0.0000   -0.2817    -0.1518
---------------------------------------------------------------------------------------

Just to note, I had to search to get these regression coefficients: you run the regression first, and only then can you request the coefficients.

********************
PSPP
********************

Model Summary
|---|--------|-----------------|--------------------------|
| R |R Square|Adjusted R Square|Std. Error of the Estimate|
|---|--------|-----------------|--------------------------|
|.72|     .52|              .50|                      7.55|
|---|--------|-----------------|--------------------------|

ANOVA
|----------|--------------|---|-----------|-----|------------|
|          |Sum of Squares| df|Mean Square|  F  |Significance|
|----------|--------------|---|-----------|-----|------------|
|Regression|       6447.83|  4|    1611.96|28.32|         .00|
|Residual  |       6034.41|106|      56.93|     |            |
|Total     |      12482.24|110|           |     |            |
|----------|--------------|---|-----------|-----|------------|

Coefficients
|----------|-----|----------|----|-----|------------|
|          |  B  |Std. Error|Beta|  t  |Significance|
|----------|-----|----------|----|-----|------------|
|(Constant)|51.09|      2.21| .00|23.10|         .00|
|phone_kpop|  .00|       .00|-.07| -.84|         .40|
| c_arable | -.16|       .06|-.21|-2.80|         .01|
| climate  |-1.19|      1.15|-.09|-1.04|         .30|
|  North   | -.22|       .03|-.53|-6.61|         .00|
|----------|-----|----------|----|-----|------------|

********************
JASP
********************

Linear Regression

Model Summary
Model	R	R²	Adjusted R²	RMSE
1	0.719	0.517	0.498	7.545

ANOVA
Model		Sum of Squares	df	Mean Square	F	p
1	Regression	6448	4	1611.96	28.32	< .001
	Residual	6034	106	56.93
	Total	12482	110

Coefficients
Model		Unstandardized	Standard Error	Standardized	t	p
1	intercept	51.095	2.212		.	< .001
	phone_kpop	-0.004	0.004	-0.071	.	0.402
	c-arable	-0.161	0.057	-0.213	.	0.006
	climate	-1.190	1.149	-0.089	.	0.303
	North	-0.217	0.033	-0.529	.	< .001

t test

*****************
Instat
*****************

Simple Models - Normal Distribution, Two Samples

TINt 'phone_kpop' 'c_arable';test 0
Normal model, two samples
Column            phone_kpop c_arable
Sample size       228        232
Minimum           0.16917    0
Maximum           1385.1     62.11
Range             1385       62.11
Mean              244.91     13.447
Std. deviation    241.57     13.028

Pooled standard deviation = 170.319

Difference between means = 231.47 s.e. of difference = 15.883 with 458 d.f.
95% confidence interval for the difference between means
          200.25   to   262.68

t value testing mean difference=0            is 14.57
Significance level is 0.0000 (0.00%) for 2 sided test

*****************
Excel
*****************

t-Test: Two-Sample Assuming Equal Variances

                            phone_kpop    c-arable
Mean                        244.9124484    13.44711207
Variance                  58355.36494    169.7312336
Observations                 228            232
Pooled Variance                 29008.46235
Hypothesized Mean Difference    0
df                            458
t Stat                          14.57323966
P(T<=t) one-tail                4.25214E-40
t Critical one-tail             1.648187415
P(T<=t) two-tail                8.50428E-40
t Critical two-tail             1.965157018
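
The Instat and Excel t test results above can be reproduced from the summary statistics alone. The Python sketch below applies the standard pooled-variance (equal variances) two-sample formula to the means, variances and sample sizes printed in the Excel output; the variable names are my own.

```python
import math

# Summary statistics copied from the Excel output above.
mean_1, var_1, n_1 = 244.9124484, 58355.36494, 228   # phone_kpop
mean_2, var_2, n_2 = 13.44711207, 169.7312336, 232   # c-arable

# Pooled variance: the two sample variances weighted by their degrees of freedom.
df = n_1 + n_2 - 2
pooled_var = ((n_1 - 1) * var_1 + (n_2 - 1) * var_2) / df
pooled_sd = math.sqrt(pooled_var)

# Standard error of the difference between means, and the t statistic.
se_diff = math.sqrt(pooled_var * (1 / n_1 + 1 / n_2))
t_stat = (mean_1 - mean_2) / se_diff

print(f"df              {df}")
print(f"pooled variance {pooled_var:.3f}")
print(f"pooled sd       {pooled_sd:.3f}")
print(f"s.e. of diff    {se_diff:.3f}")
print(f"t               {t_stat:.3f}")
```

These should match the pooled variance and t Stat from Excel and the pooled standard deviation, s.e. of difference and t value from Instat, up to rounding.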

last validated 12/25/08