A short version of this article first appeared in the International Statistical Institute newsletter, Vol 26, Number 1 (76), 2002, and is at http://isi.cbs.nl/NLet/NLet021-04.htm and http://isi.cbs.nl/FreeTools.htm
There is a great deal of information about statistics available
for free on the web. Information includes data, general
statistical textbooks, email lists, software, and many sites about
special topics, such as epidemiology, forecasting, data
presentation, data editing, multiple imputation, and propensity
score analysis. This article is a brief review of some
useful sites covering these topics.
One place to start is with sites that collect links to
other statistical sites. These general sites include Betty Jung's statsites
http://www.bettycjung.net/Statsites.htm,
statsci http://www.statsci.org/index.html,
the World Wide Web Virtual Library: Statistics http://www.stat.ufl.edu/vlib/statistics.html,
and Dr. Hossein Arsham's list http://home.ubalt.edu/ntsbarsh/Business-stat/R.htm. These
sites can be used to find other sites.
A number of statistical
organizations are making concerted efforts to promote statistics
by increasing public awareness of how statistics affects
everyday life. A current example is the World of Statistics
http://www.worldofstatistics.org/
This project is made up of several thousand statistical
organizations and promotes statistics both in public awareness and as a
profession. Similar projects are the American Statistical
Association's (ASA) Statistical Significance series, http://www.amstat.org/policy/statsig.cfm,
which is a set of pamphlets each dedicated to showing how
statistics informs some particular area, such as energy, health
care and the environment; and the International Statistical Literacy Project http://iase-web.org/islp/
with a mission "to contribute to promoting statistical literacy
across the world, among young and adults." A related
project is stats.org
http://www.stats.org/
from
George Mason University. This project describes basic statistical
terms, but its main focus seems to be discussing news stories and
how to understand the statistics in them. The Australian Bureau of Statistics
http://www.abs.gov.au/websitedbs/a3121120.nsf/home/Understanding%20statistics
has an online class and a page defining statistical terms.
People can also take free online training classes on statistics,
for example, from the North Carolina Center for Public Health
Preparedness Training Web Site, http://nciph.sph.unc.edu/tws/index.php
or the Northwest Center for Public Health Practice http://www.nwcphp.org/training
These classes offer a certificate at the end of the
training. StatTrek http://stattrek.com/ also
has a couple of online tutorials. An open course from
Carnegie Mellon http://oli.cmu.edu/learn-with-oli/see-our-free-open-courses/
presents material used in the course taught at the
university. A number of online course providers have
statistics classes and videos. Coursera https://www.coursera.org/
is one example, but unfortunately you have to search the site for statistics classes. One class is
Statistics One https://www.coursera.org/course/stats1
Others include edX https://www.edx.org/course-list/allschools/statistics-data-analysis/allcourses
Khan Academy https://www.khanacademy.org/math/probability
and School of Data http://schoolofdata.org/
There are a number of email lists. Allstat, at
https://www.jiscmail.ac.uk/cgi-bin/webadmin?A0=allstat
is a general list, although many of the postings
appear to be about jobs or training
courses. Another list, stat-l, at http://lists.mcgill.ca/archives/stat-l.html
focuses more on statistical questions. Another useful list, not on
Allstat, is Epidemio, at http://www.listes.umontreal.ca/wws/info/epidemio-l
This list is about epidemiology. Another form of discussion group is the forum.
TalkStats http://www.talkstats.com/
is one forum, with discussions about basic to
advanced, homework to theory.
There are a number of comprehensive places to look for
data. One starting point for social, political and economic
data is the Global Social Change Research Project http://gsociology.icaap.org/,
which has both links to a very large number of other data link
sites, and a page of data sets compiled or created from other data
sets. Many of the data sets listed on this project site are public
domain. All of the data are free to use. This UN site http://data.un.org/
has data on nearly every topic, from the UN and its various
associates. Recently, the UN also posted a note saying that all of
their data are free to use, copy, duplicate, etc., provided the UN
is cited http://data.un.org/Host.aspx?Content=UNdataUse
The World Bank also
has a data page http://data.worldbank.org/
and says most of its data can be used freely http://go.worldbank.org/OJC02YMLA0.
This UN site http://unstats.un.org/unsd/methods/inter-natlinks/sd_natstat.asp
and this BLS site http://www.bls.gov/bls/other.htm
link to national statistical centers of most countries of the
world.
There are many statistical journals on the web with free content. Many of these are listed at the Directory of Open Access Journals (DOAJ) http://doaj.org/ . Some of the journals listed there include the Electronic Journal of Applied Statistical Analysis http://siba-ese.unisalento.it/index.php/ejasa/index , the Electronic Journal of Statistics http://projecteuclid.org/DPubS?service=UI&version=1.0&verb=Display&handle=euclid.ejs , the Journal of Official Statistics http://www.jos.nu/ , and InterStat http://interstat.statjournals.net/
There are resources about dozens of specific topics on the
web. Some of these topics include epidemiology, graphical
analysis and presentation, missing data, forecasting, gathering
data and meta-analysis.
Bayesian inference is a topic that should interest all
statisticians. Basically, the traditional approach (called the
"frequentist" approach) is to use previously accumulated data to design a
study and propose a hypothesis. A test is then used to draw a
conclusion. In contrast, Bayesian inference is a formal process for
learning from evidence as it accumulates. "The Bayesian approach uses
Bayes’ Theorem to formally combine prior information with current
information on a quantity of interest. The Bayesian idea is to
consider the prior information and the test results as part of a
continual data stream, in which inferences are being updated each
time new data become available." (Guidance for the Use of Bayesian
Statistics in Medical Device Clinical Trials http://www.fda.gov/MedicalDevices/DeviceRegulationandGuidance/GuidanceDocuments/ucm071072.htm)
There is much more to Bayesian inference; for example, it makes the
examination of underlying assumptions much more explicit. More
resources include The International Society for Bayesian Analysis
(ISBA) http://bayesian.org/,
Bayesian perspectives for epidemiological research:
I. Foundations and basic methods http://ije.oxfordjournals.org/content/35/3/765.full
by Sander Greenland, and Bayesian Statistics in
Oncology http://onlinelibrary.wiley.com/doi/10.1002/cncr.24628/full,
by Adamina, Tomlinson and Guller. There is also an interesting
video, Bayesian statistics made (as) simple (as
possible) http://www.youtube.com/watch?v=bobeo5kFz1g
from Allen Downey, Professor of Computer Science at the Franklin
W. Olin College of Engineering.
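The idea of updating inferences each time new data become available can be illustrated with a small sketch. This is only one simple case, not a general method: it assumes a success/failure outcome and a Beta prior, for which the update is just addition of counts. The prior Beta(2, 2) and the three batches of results are made up for illustration.

```python
from fractions import Fraction


def update_beta(alpha, beta, successes, failures):
    """Conjugate update: a Beta(alpha, beta) prior combined with binomial
    data gives a Beta(alpha + successes, beta + failures) posterior."""
    return alpha + successes, beta + failures


def beta_mean(alpha, beta):
    """Mean of a Beta(alpha, beta) distribution."""
    return Fraction(alpha, alpha + beta)


# Hypothetical prior: Beta(2, 2), a weak belief centred on 0.5.
alpha, beta = 2, 2

# Hypothetical batches of (successes, failures) arriving over time.
batches = [(7, 3), (12, 8), (30, 10)]

for successes, failures in batches:
    # The posterior from the previous batch is the prior for this one.
    alpha, beta = update_beta(alpha, beta, successes, failures)
    mean = float(beta_mean(alpha, beta))
    print(f"Beta({alpha}, {beta}), posterior mean = {mean:.3f}")
```

Updating batch by batch gives the same posterior as analysing all 70 results at once, which is the sense in which the prior information and the new results form one continual data stream.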
Missing Data: One site that gives an overview of
missing data is the FAQ page of this missing data project http://www.missingdata.org.uk/
A couple of publications devoted to missing data are the June 2011
issue of Journal of Official Statistics http://www.jos.nu/Contents/issue.asp?vol=27&no=2
and the National Academies Press 2013 report, Nonresponse in
Social Science Surveys: A Research Agenda http://www.nap.edu/catalog/18293/nonresponse-in-social-science-surveys-a-research-agenda
One frequently used method is multiple imputation, which fills in
each missing value several times, using other variables to predict the missing
values, so that the uncertainty about the filled-in values carries through
to the results. One software program for estimating missing data is
AMELIA, at http://gking.harvard.edu/software/
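As a rough illustration of the idea behind multiple imputation, here is a minimal sketch. It is not how AMELIA works. It assumes one fully observed variable x and one partly missing variable y, predicts the missing y values from a straight-line fit plus random noise, repeats this m times, and combines the m estimates of the mean of y with Rubin's rules. The data, the seed and m = 20 are made up, and a proper implementation would also draw the regression coefficients at random for each imputation.

```python
import random
import statistics


def fit_line(xs, ys):
    """Least-squares fit of y = intercept + slope * x.
    Returns (intercept, slope, residual standard deviation)."""
    mean_x, mean_y = statistics.fmean(xs), statistics.fmean(ys)
    sxx = sum((x - mean_x) ** 2 for x in xs)
    slope = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys)) / sxx
    intercept = mean_y - slope * mean_x
    residuals = [y - (intercept + slope * x) for x, y in zip(xs, ys)]
    resid_sd = (sum(r * r for r in residuals) / (len(xs) - 2)) ** 0.5
    return intercept, slope, resid_sd


def impute_once(xs, ys, rng):
    """Return a copy of ys with each None replaced by prediction + noise."""
    observed = [(x, y) for x, y in zip(xs, ys) if y is not None]
    intercept, slope, resid_sd = fit_line(
        [x for x, _ in observed], [y for _, y in observed]
    )
    return [
        y if y is not None else intercept + slope * x + rng.gauss(0, resid_sd)
        for x, y in zip(xs, ys)
    ]


def pool(estimates, variances):
    """Rubin's rules: pooled estimate and its total variance."""
    m = len(estimates)
    within = statistics.fmean(variances)      # average sampling variance
    between = statistics.variance(estimates)  # variance across imputations
    return statistics.fmean(estimates), within + (1 + 1 / m) * between


# Hypothetical data: None marks a missing value of y.
xs = [1, 2, 3, 4, 5, 6, 7, 8, 9, 10]
ys = [2.1, 3.9, None, 8.2, 9.8, None, 14.1, 16.2, None, 19.9]

rng = random.Random(0)  # fixed seed so the sketch is repeatable
m = 20
completed_sets = [impute_once(xs, ys, rng) for _ in range(m)]
estimates = [statistics.fmean(c) for c in completed_sets]
variances = [statistics.variance(c) / len(c) for c in completed_sets]
pooled_mean, total_variance = pool(estimates, variances)
print(f"pooled mean of y = {pooled_mean:.2f}, variance = {total_variance:.3f}")
```

The total variance is never smaller than the average within-imputation variance, which is how the method reflects the extra uncertainty caused by the missing values.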
Forecasting: Several faculty members have posted lectures about
forecasting, including Hossein Arsham's Time Series Analysis and
Forecasting Techniques, at http://home.ubalt.edu/ntsbarsh/Business-stat/stat-data/Forecast.htm
and Robert Nau's Lecture Notes on Forecasting http://people.duke.edu/~rnau/411home.htm
Another forecasting site is the Federal
Forecasters Consortium, at http://www.va.gov/HEALTHPOLICYPLANNING/FFC_2014.asp
Conference proceedings can be downloaded from this site.
Methods of gathering data: There are a number of sites on gathering data. Two places to start are Resources for Methods in Evaluation and Social Research, at http://gsociology.icaap.org/methods/ and The World Wide Evaluation Information Gateway http://www.policy-evaluation.org/ These sites link to other sites about methods, both quantitative and qualitative. Some sites are about specific tools in data gathering. Tom O'Connor's lecture notes, at http://www.drtomoconnor.com/3760/default.htm cover various issues such as measurement, validity and reliability, and scales and indexes.
Meta-analysis: A brief overview is here, at Study
Design 101 http://himmelfarb.gwu.edu/tutorials/studydesign101/metaanalyses.html
One of the lectures for this methods in epidemiology course is
about meta-analysis http://www.uic.edu/classes/epid/epid401/
A somewhat strict standard is described by Cleophas and
Zwinderman http://circ.ahajournals.org/content/115/22/2870.full
This article by Kattan reviews the strengths and weaknesses of
meta-analysis http://www.ccjm.org/content/75/6/431.full.
Propensity score analysis: http://www.epa.gov/caddis/da_advanced_5.html
Propensity score analysis is a method of dealing with self
selection bias or other selection bias. The methodology center at Penn
State has a podcast about propensity score analysis http://methodology.psu.edu/multimedia/podcast
Other papers include this one by Nicholas and Gulliford http://www.ncbi.nlm.nih.gov/pmc/articles/PMC2553527/
and another by Austin http://www.ncbi.nlm.nih.gov/pmc/articles/PMC3144483/
Paul Rosenbaum, the father of propensity score analysis, has a
couple of overviews, http://www-stat.wharton.upenn.edu/~rosenbap/index.html,
one about propensity scores, and another about observational
studies.
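To show how a propensity score can correct for selection bias, here is a minimal sketch of one common approach, inverse probability weighting. It is deliberately simplified: there is a single yes/no covariate (older), so the propensity score is just the share of people treated within each group, whereas real analyses usually estimate the score from many covariates, often with logistic regression. The records are made up so that the true treatment effect is 2 and being older adds 5 to the outcome.

```python
from collections import defaultdict
from fractions import Fraction

# Hypothetical records: (older, treated, outcome).
# Older people are treated more often, which biases a naive comparison.
records = (
    [(1, 1, 7)] * 8 + [(1, 0, 5)] * 2 +
    [(0, 1, 2)] * 2 + [(0, 0, 0)] * 8
)


def propensity_scores(rows):
    """Estimated probability of treatment for each covariate value."""
    counts = defaultdict(lambda: [0, 0])  # covariate -> [treated, total]
    for older, treated, _ in rows:
        counts[older][0] += treated
        counts[older][1] += 1
    return {older: Fraction(t, n) for older, (t, n) in counts.items()}


def naive_difference(rows):
    """Difference in mean outcome, treated minus untreated, unadjusted."""
    treated = [y for _, t, y in rows if t == 1]
    control = [y for _, t, y in rows if t == 0]
    return Fraction(sum(treated), len(treated)) - Fraction(sum(control), len(control))


def ipw_difference(rows, scores):
    """Weighted difference in means: treated rows get weight 1/score,
    untreated rows get weight 1/(1 - score)."""
    weighted_sum = {0: Fraction(0), 1: Fraction(0)}
    weight_total = {0: Fraction(0), 1: Fraction(0)}
    for older, treated, outcome in rows:
        score = scores[older]
        weight = 1 / score if treated == 1 else 1 / (1 - score)
        weighted_sum[treated] += weight * outcome
        weight_total[treated] += weight
    return weighted_sum[1] / weight_total[1] - weighted_sum[0] / weight_total[0]


scores = propensity_scores(records)
print("naive difference:", naive_difference(records))           # 5
print("weighted difference:", ipw_difference(records, scores))  # 2
```

The naive comparison mixes the effect of treatment with the effect of being older. Weighting by the propensity score balances the two groups on that covariate and recovers the effect built into the made-up data.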
Data mining: StatSoft has an entry on data mining http://www.statsoft.com/textbook/data-mining-techniques/
Professors Anand Rajaraman and Jeffrey D. Ullman have a book,
Mining of Massive Datasets, http://infolab.stanford.edu/~ullman/
which includes a chapter on data mining. This YouTube video
http://www.youtube.com/watch?v=R-sGvh6tI04
from NJIT School of Management professor Stephan P Kudyba
introduces data mining.
last updated 8/30/15
last verified 8/30/2015