Free Statistical Tools on the Web

A short version of this article first appeared in the International Statistical Institute newsletter, Vol 26, Number 1 (76), 2002, and is available at http://isi.cbs.nl/NLet/NLet021-04.htm and http://isi.cbs.nl/FreeTools.htm

There is a great deal of information about statistics available
for free on the web. It includes data, general statistical
textbooks, email lists, software, and many sites about special
topics, such as epidemiology, forecasting, data presentation, data
editing, multiple imputation, and propensity score analysis. This
article is a brief review of some useful sites covering these
topics.

One place to start is with general sites that serve as directories
of other statistical sites. Examples are Betty Jung's statsites
http://www.bettycjung.net/Statsites.htm,
statsci http://www.statsci.org/index.html,
the World Wide Web Virtual Library: Statistics http://www.stat.ufl.edu/vlib/statistics.html,
and Dr. Hossein Arsham's list http://home.ubalt.edu/ntsbarsh/Business-stat/R.htm.

A number of statistical organizations are making concerted efforts
to promote statistics by increasing public awareness of how
statistics affects everyday life. A current example is the
**World of Statistics** http://www.worldofstatistics.org/, a
project that brings together several thousand statistical
organizations to promote statistics, both in public awareness and
as a profession. Similar projects are the American Statistical
Association's (ASA) **Statistical Significance** series,
http://www.amstat.org/policy/statsig.cfm, a set of pamphlets, each
showing how statistics informs a particular area such as energy,
health care, or the environment; and the International Statistical
Literacy Project http://iase-web.org/islp/, whose mission is "to
contribute to promoting statistical literacy across the world,
among young and adults." A related project is stats.org
http://www.stats.org/ from George Mason University. It describes
basic statistical terms, but its main focus seems to be discussing
news stories and how to understand the statistics in them. The
Australian Bureau of Statistics
http://www.abs.gov.au/websitedbs/a3121120.nsf/home/Understanding%20statistics
has an online class and a page defining statistical terms.

People can also take free online training classes in statistics,
for example, from the North Carolina Center for Public Health
Preparedness Training Web Site, http://nciph.sph.unc.edu/tws/index.php,
or the Northwest Center for Public Health Practice http://www.nwcphp.org/training.
These classes offer a certificate at the end of the
training. StatTrek http://stattrek.com/ also
has a couple of online tutorials. An open course from
Carnegie Mellon http://oli.cmu.edu/learn-with-oli/see-our-free-open-courses/
presents the material used in the course taught at the
university. A number of online course providers offer
statistics classes and videos. Coursera https://www.coursera.org/
is one example, but unfortunately you have to search its catalog to
find them. One class is Statistics One https://www.coursera.org/course/stats1.
Others include edX https://www.edx.org/course-list/allschools/statistics-data-analysis/allcourses,
Khan Academy https://www.khanacademy.org/math/probability,
and School of Data http://schoolofdata.org/.

Other sites of interest on the web are those of statistical associations. An international association is the International Statistical Institute http://isi-web.org/. Some other associations are the American Statistical Association http://www.amstat.org/, the International Chinese Statistical Association http://www.icsa.org/, and the International Indian Statistical Association http://www.intindstat.org/.

There is also a great deal of free software on the net. The best place to find free statistical software is the

One page reviews which statistical software packages are most popular http://r4stats.com/articles/popularity/. It uses a variety of measures, including job advertisements, scholarly articles mentioning which software was used for analysis, website popularity (the number of sites linking to that software), and blog and discussion forum activity. Most of the programs are commercial, but R (http://cran.r-project.org/), a free program, ranks high on many of the indicators. R is a language for statistical computing, and it has a number of graphical interfaces that make it easier to use, as well as many tutorials and guides.

There are a number of email lists. **Allstat**, at
https://www.jiscmail.ac.uk/cgi-bin/webadmin?A0=allstat,
is a general list, although many of the postings are about jobs
or training courses. Another list, **stat-l**, at http://lists.mcgill.ca/archives/stat-l.html,
focuses more on statistical questions. Another useful list, not on
Allstat, is **Epidemio**, at http://www.listes.umontreal.ca/wws/info/epidemio-l;
this list is about epidemiology. Another form of discussion group
is the forum. TalkStats http://www.talkstats.com/ is one forum,
with discussions ranging from basic to advanced topics and from
homework to theory.

There are a number of comprehensive places to look for
data. One starting point for social, political and economic
data is the **Global Social Change Research Project** http://gsociology.icaap.org/,
which has both links to a very large number of other data link
sites and a page of data sets compiled or created from other data
sets. Many of the data sets listed on this project site are public
domain, and all of the data are free to use. This UN site http://data.un.org/
has data on nearly every topic, from the UN and its various
associated organizations. Recently, the UN also posted a note saying
that all of its data are free to use, copy, duplicate, etc.,
provided the UN is cited http://data.un.org/Host.aspx?Content=UNdataUse.
The World Bank also has a data page http://data.worldbank.org/,
and says that most of its data can be used freely http://go.worldbank.org/OJC02YMLA0.
This UN site http://unstats.un.org/unsd/methods/inter-natlinks/sd_natstat.asp
and this BLS site http://www.bls.gov/bls/other.htm
link to the national statistical centers of most countries of the
world.

There are many statistical journals on the web with free content.
Many of these are listed at the Directory of Open Access Journals
(DOAJ) http://doaj.org/. Some of the journals listed there include
the Electronic Journal of Applied Statistical Analysis
http://siba-ese.unisalento.it/index.php/ejasa/index,
the **Electronic Journal of Statistics** http://projecteuclid.org/DPubS?service=UI&version=1.0&verb=Display&handle=euclid.ejs,
the Journal of Official Statistics http://www.jos.nu/,
and InterStat http://interstat.statjournals.net/.

There are resources about dozens of specific topics on the
web. Some of these topics include epidemiology, graphical
analysis and presentation, missing data, forecasting, gathering
data and meta-analysis.

**Bayesian inference** is a topic that should interest all
statisticians. Broadly, the traditional approach (called
"frequentist") uses previously accumulated data to design a study
and propose a hypothesis, and then applies a test to draw a
conclusion. In contrast, Bayesian inference is a formal process for
learning from evidence as it accumulates. "The Bayesian approach uses
Bayes’ Theorem to formally combine prior information with current
information on a quantity of interest. The Bayesian idea is to
consider the prior information and the test results as part of a
continual data stream, in which inferences are being updated each
time new data become available." (Guidance for the Use of Bayesian
Statistics in Medical Device Clinical Trials http://www.fda.gov/MedicalDevices/DeviceRegulationandGuidance/GuidanceDocuments/ucm071072.htm)
There is much more to Bayesian inference; for example, it makes the
examination of underlying assumptions much more explicit. More
resources include The International Society for Bayesian Analysis
(ISBA) http://bayesian.org/; Bayesian perspectives for
epidemiological research: I. Foundations and basic methods
http://ije.oxfordjournals.org/content/35/3/765.full, by Sander
Greenland; and Bayesian Statistics in Oncology
http://onlinelibrary.wiley.com/doi/10.1002/cncr.24628/full, by
Adamina, Tomlinson and Guller. There is also an interesting video,
Bayesian statistics made (as) simple (as possible)
http://www.youtube.com/watch?v=bobeo5kFz1g, from Allen Downey,
Professor of Computer Science at the Franklin W. Olin College of
Engineering.
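The idea of updating inferences "each time new data become available" can be seen in the simplest conjugate case. The sketch below is a hypothetical illustration, not taken from the FDA guidance or the other resources above: it assumes a Beta prior on an unknown success probability (say, a response rate) and binomial data arriving in batches. The class name `BetaBelief` and all numbers are made up for illustration. Each batch turns the current posterior into the prior for the next batch.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class BetaBelief:
    """Beta(alpha, beta) belief about an unknown success probability."""
    alpha: float
    beta: float

    def update(self, successes: int, failures: int) -> "BetaBelief":
        # Conjugate update: a Beta prior combined with binomial data
        # gives a Beta posterior whose parameters simply add the counts.
        return BetaBelief(self.alpha + successes, self.beta + failures)

    def mean(self) -> float:
        return self.alpha / (self.alpha + self.beta)


# Hypothetical prior: "about 30% success", worth 10 earlier observations.
prior = BetaBelief(3, 7)

# Hypothetical data arriving in three batches of (successes, failures).
batches = [(4, 6), (7, 3), (5, 5)]

belief = prior
for successes, failures in batches:
    # The previous posterior serves as the prior for this batch.
    belief = belief.update(successes, failures)
    print(f"{successes}/{successes + failures} successes -> "
          f"posterior mean {belief.mean():.3f}")

# Updating batch by batch ends where pooling all the data would.
print(belief == prior.update(16, 14))  # True
```

The posterior means printed are 0.350, 0.467 and 0.475: the estimate starts near the prior's 0.3 and moves toward the observed success rate (16 of 30) as evidence accumulates, with the prior's influence shrinking as data are added.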

A very good place to find world epidemiological data, reports, issues and information is the WHO epidemiology page http://www.who.int/topics/epidemiology/en/, which includes, for example, the 10 leading causes of death and the Weekly Epidemiological Record.

**Missing Data**: One site that gives an overview of missing data
is the FAQ page of this missing data project http://www.missingdata.org.uk/.
A couple of publications devoted to missing data are the June 2011
issue of the Journal of Official Statistics http://www.jos.nu/Contents/issue.asp?vol=27&no=2
and the National Academies Press 2013 report, Nonresponse in
Social Science Surveys: A Research Agenda http://www.nap.edu/catalog/18293/nonresponse-in-social-science-surveys-a-research-agenda.
One frequently used method is multiple imputation, which fills in
missing data by using other variables to predict the missing
values. Because those predictions are uncertain, the filling-in is
done several times, producing several completed data sets whose
results are then combined. One software program for multiple
imputation is **AMELIA**, at http://gking.harvard.edu/software/.
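The logic of multiple imputation can be sketched in standard-library Python. This is a simplified, hypothetical illustration and not the algorithm AMELIA uses. It assumes a single incomplete variable `y` that is predicted from a fully observed `x` by a straight-line fit. Each missing value is filled with the prediction plus random noise, this is repeated `m` times, and the `m` estimates of the mean of `y` are combined with Rubin's rules. The function names and data are made up. A full ("proper") implementation would also redraw the regression coefficients from their sampling uncertainty in each round.

```python
import random
import statistics


def fit_line(xs, ys):
    """Least-squares fit of y = a + b*x; returns (a, b, residual_sd)."""
    mx, my = statistics.fmean(xs), statistics.fmean(ys)
    sxx = sum((x - mx) ** 2 for x in xs)
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    b = sxy / sxx
    a = my - b * mx
    resid = [y - (a + b * x) for x, y in zip(xs, ys)]
    sd = (sum(r * r for r in resid) / (len(xs) - 2)) ** 0.5
    return a, b, sd


def impute_once(x, y, rng):
    """Fill each None in y with a regression prediction plus random noise."""
    observed = [(xi, yi) for xi, yi in zip(x, y) if yi is not None]
    a, b, sd = fit_line([p[0] for p in observed], [p[1] for p in observed])
    return [yi if yi is not None else a + b * xi + rng.gauss(0, sd)
            for xi, yi in zip(x, y)]


def multiple_impute_mean(x, y, m=20, seed=0):
    """Estimate the mean of y from m imputed data sets using Rubin's rules."""
    rng = random.Random(seed)
    estimates, within = [], []
    for _ in range(m):
        completed = impute_once(x, y, rng)
        estimates.append(statistics.fmean(completed))
        # Squared standard error of the mean within this completed data set.
        within.append(statistics.variance(completed) / len(completed))
    q_bar = statistics.fmean(estimates)     # pooled point estimate
    w = statistics.fmean(within)            # average within-imputation variance
    b = statistics.variance(estimates)      # between-imputation variance
    total = w + (1 + 1 / m) * b             # Rubin's total variance
    return q_bar, total


# Hypothetical data: x fully observed, y missing for three cases.
x = list(range(1, 11))
y = [2.1, 3.9, None, 8.2, 9.8, None, 14.1, 16.0, None, 20.2]
estimate, variance = multiple_impute_mean(x, y, m=20)
print(f"pooled mean of y: {estimate:.2f} (SE {variance ** 0.5:.2f})")
```

The between-imputation term `b` is the part a single imputation would miss: it reflects how much the answer changes from one plausible filling-in to the next, so the pooled standard error honestly includes the uncertainty caused by the missing values.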

**Forecasting**: Several faculty members have posted lectures about
forecasting, including **Hossein Arsham's Time Series Analysis and
Forecasting Techniques**, at http://home.ubalt.edu/ntsbarsh/Business-stat/stat-data/Forecast.htm,
and Robert Nau's Lecture Notes on Forecasting http://people.duke.edu/~rnau/411home.htm.
Another forecasting site is that of the **Federal
Forecasters Consortium**, at http://www.va.gov/HEALTHPOLICYPLANNING/FFC_2014.asp,
from which conference proceedings can be downloaded.

**Methods of gathering data**: There are a number of
sites on gathering data. Two places to start are **Resources
for Methods in Evaluation and Social Research**,
at http://gsociology.icaap.org/methods/,
and **The World Wide Evaluation Information Gateway**
http://www.policy-evaluation.org/.
These sites link to other sites about both quantitative and
qualitative methods. Some sites are about specific tools in data
gathering. Tom O'Connor's lecture notes, at http://www.drtomoconnor.com/3760/default.htm,
cover various issues such as measurement, validity and
reliability, and scales and indexes.

**Meta-analysis**: A brief overview is available at Study
Design 101 http://himmelfarb.gwu.edu/tutorials/studydesign101/metaanalyses.html.
One of the lectures in this epidemiology methods course is
about meta-analysis http://www.uic.edu/classes/epid/epid401/.
A somewhat strict standard is described by Cleophas and
Zwinderman http://circ.ahajournals.org/content/115/22/2870.full,
and this article by Kattan reviews the strengths and weaknesses of
meta-analysis http://www.ccjm.org/content/75/6/431.full.

**Propensity score analysis** http://www.epa.gov/caddis/da_advanced_5.html
is a method of dealing with self-selection bias or other selection
bias. It estimates each subject's probability of receiving a
treatment given observed characteristics (the propensity score) and
then compares treated and untreated subjects with similar scores,
for example by matching, stratification, or weighting. The
Methodology Center at Penn State has a podcast about propensity
score analysis http://methodology.psu.edu/multimedia/podcast.
Other papers include one by Nicholas and Gulliford http://www.ncbi.nlm.nih.gov/pmc/articles/PMC2553527/
and another by Austin http://www.ncbi.nlm.nih.gov/pmc/articles/PMC3144483/.
Paul Rosenbaum, who with Donald Rubin introduced propensity score
analysis, has a couple of overviews at http://www-stat.wharton.upenn.edu/~rosenbap/index.html,
one about propensity scores and another about observational
studies.
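One common use of the propensity score, inverse probability weighting, can be shown with a small made-up example. This is a hypothetical sketch, not drawn from the papers above. It assumes a single discrete confounder (`severity`), so the propensity score is simply the share of subjects treated at each severity level. Real analyses usually fit a model, such as logistic regression, on many covariates. It also assumes every level contains both treated and untreated subjects; otherwise a weight would divide by zero.

```python
from collections import defaultdict
from statistics import fmean

# Hypothetical observational data: (severity, treated, outcome).
# Severe cases are treated more often and do worse regardless of treatment.
records = (
    [("low", 1, 12.0)] * 2 + [("low", 0, 10.0)] * 6
    + [("high", 1, 6.0)] * 6 + [("high", 0, 4.0)] * 2
)


def propensity_scores(rows):
    """Estimate P(treated | severity) as the treated share in each stratum."""
    counts = defaultdict(lambda: [0, 0])  # severity -> [n_treated, n_total]
    for severity, treated, _ in rows:
        counts[severity][0] += treated
        counts[severity][1] += 1
    return {sev: n_t / n for sev, (n_t, n) in counts.items()}


def ipw_effect(rows):
    """Average treatment effect by normalized inverse probability weighting."""
    e = propensity_scores(rows)
    sum_t = wt_t = sum_c = wt_c = 0.0
    for severity, treated, outcome in rows:
        if treated:
            w = 1 / e[severity]          # treated: weight 1 / e(x)
            sum_t += w * outcome
            wt_t += w
        else:
            w = 1 / (1 - e[severity])    # untreated: weight 1 / (1 - e(x))
            sum_c += w * outcome
            wt_c += w
    return sum_t / wt_t - sum_c / wt_c


naive = (fmean(y for _, t, y in records if t)
         - fmean(y for _, t, y in records if not t))
print(f"naive difference: {naive:+.2f}")                # -1.00
print(f"IPW estimate:     {ipw_effect(records):+.2f}")  # +2.00
```

In these hypothetical data, treatment raises the outcome by 2 at each severity level. However, the treated group is mostly severe cases, so the naive comparison even gets the sign wrong. Weighting each subject by the inverse of the probability of the treatment actually received rebalances the two groups on severity and recovers the +2 effect.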

**Data mining**: StatSoft has an entry on data mining http://www.statsoft.com/textbook/data-mining-techniques/.
Professors Anand Rajaraman and Jeffrey D. Ullman have a book,
Mining of Massive Datasets, http://infolab.stanford.edu/~ullman/,
which includes a chapter on data mining. This YouTube video
http://www.youtube.com/watch?v=R-sGvh6tI04
from NJIT School of Management professor Stephan P. Kudyba
introduces data mining.

last updated 8/30/15

last verified 8/30/2015
