Free Statistical Tools on the WEB


A short version of this article first appeared in the International Statistical Institute newsletter, Vol 26, Number 1 (76), 2002.

There is a great deal of information about statistics available for free on the web.  Information includes data, general statistical textbooks, email lists, software, and many sites about special topics, such as epidemiology, forecasting, data presentation, data editing, multiple imputation, and propensity score analysis.  This article is a brief review of some useful sites covering these topics.

One place to start is with sites that collect links to other statistical sites. These include Betty Jung's statsites, statsci, the World Wide Web Virtual Library: Statistics, and Dr. Hossein Arsham's list. These sites can be used to find other sites.

A number of statistical organizations are currently making concerted efforts to promote statistics by increasing public awareness of how statistics affects everyday life. A current example is the World of Statistics. This project comprises several thousand statistical organizations working to promote statistics in public awareness and as a profession. Similar projects are the American Statistical Association's (ASA) Statistical Significance series, which is a set of pamphlets each dedicated to showing how statistics informs some particular area, such as energy, health care and the environment; and the International Statistical Literacy Project, with a mission "to contribute to promoting statistical literacy across the world, among young and adults." A related project comes from George Mason University. This project describes basic statistical terms, but the main focus seems to be discussing news stories and how to understand the statistics in those news stories. The Australian Bureau of Statistics has an online class and a page defining statistical terms.

For students or those who want to learn about statistics, the best place to start is with the various online statistics books. One is HyperStatistics Online. This is a nice statistics book, and it also has a comprehensive list of other online statistics books. Most of these are basic to intermediate. One book, the Statsoft text, has the basics as well as fairly advanced topics. Another approach is Robert Niles' site, Statistics Every Writer Should Know, with plain English explanations for many basic statistical concepts. There is also another list of online statistics books.

People can also take free online training classes on statistics, for example, from the North Carolina Center for Public Health Preparedness Training Web Site or the Northwest Center for Public Health Practice. These classes offer a certificate at the end of the training. StatTrek also has a couple of online tutorials. An open course from Carnegie Mellon presents material used in the course taught at the university. A number of online course providers offer statistics classes and videos. Coursera is one example, but unfortunately you have to search for them. One class is Statistics One. Others include edX, Khan Academy, and School of Data.

On the other side of learning statistics, a couple of sites are about teaching statistics. The American Statistical Association has an online journal, the Journal of Statistics Education, which has free articles about teaching statistics. The Consortium for the Advancement of Undergraduate Statistics Education is also about teaching statistics and has a great many links to texts, notes, journals, data sets, etc., in particular in the resources section. The Web Interface for Statistics Education (WISE) provides internet resources for teaching statistics.

Other statistical sites of interest on the web are those of statistical associations. An international association is the International Statistical Institute. Some other associations are the American Statistical Association, the International Chinese Statistical Association, and the International Indian Statistical Association. Statsci has a list of associations, as does the International Statistical Institute.

There is also a great deal of free software on the net. The best place to find free statistical software is the Free Statistical Software site. This site lists general purpose software, as well as software devoted to specific purposes, such as curve fitting, epidemiology, surveys, and programming. There are also brief descriptions of each package. We also list software packages on our page, along with a list of other sites that list free statistical packages. As far as we can tell, our site is the only one that compares the statistical output from many of the different programs, and we report that all of the programs that we reviewed gave the same results. That's good. One great site about learning how to use statistical software is the Statistical Computing site. They have a large number of links, how-tos and other material, mostly for commercial packages. One review of free statistical software, on Citizendium, briefly describes the history, quality, functions and limits of a number of free packages. Wikipedia practically copied this Citizendium page.

One page reviews which statistical software packages are most popular. A variety of measures are used, including job advertisements, scholarly articles mentioning which software was used for analysis, website popularity (number of sites linking to that software), and blog and discussion forum activity, among other measures. Most of the programs are commercial, but R, a free program, ranks high on many of the indicators. R is a language for statistical computing, but it has a number of graphical interfaces that make it easier to use, as well as many tutorials and guides.

There are a number of email lists. Allstat is a general list, although many of the postings appear to be about jobs or training courses. Another list, stat-l, focuses more on statistical questions. Another useful list, not on Allstat, is Epidemio. This list is about epidemiology. Another form of discussion group is the forum. TalkStats is one forum, with discussions ranging from basic to advanced topics and from homework to theory.

There are a number of comprehensive places to look for data. One starting point for social, political and economic data is the Global Social Change Research Project, which has both links to a very large number of other data link sites and a page of data sets compiled or created from other data sets. Many of the data sets listed on this project site are public domain. All of the data are free to use. This UN site has data on nearly every topic, from the UN and its various associates. Recently, the UN also posted a note saying that all of their data are free to use, copy, duplicate, etc., provided the UN is cited. The World Bank also has a data page, and it also says most of its data can be used freely. This UN site and this BLS site link to national statistical centers of most countries of the world.

There are many statistical journals on the web with free content. Many of these are listed at the Directory of Open Access Journals (DOAJ). Some of the journals listed here include the Electronic Journal of Applied Statistical Analysis, the Electronic Journal of Statistics, the Journal of Official Statistics, and InterStat.


There are resources about dozens of specific topics on the web.  Some of these topics include epidemiology, graphical analysis and presentation, missing data, forecasting, gathering data and meta-analysis.

Bayesian inference is a topic that should interest all statisticians. Basically, the traditional approach (called the "frequentist" approach) is to use previously accumulated data to design a study and propose a hypothesis. Then a test is used to draw a conclusion. In contrast, Bayesian inference is a formal process to learn from evidence as it accumulates. "The Bayesian approach uses Bayes’ Theorem to formally combine prior information with current information on a quantity of interest. The Bayesian idea is to consider the prior information and the test results as part of a continual data stream, in which inferences are being updated each time new data become available." (Guidance for the Use of Bayesian Statistics in Medical Device Clinical Trials). There is much more to Bayesian inference; for example, it makes the examination of underlying assumptions much more explicit. More resources include The International Society for Bayesian Analysis (ISBA); Bayesian perspectives for epidemiological research: I. Foundations and basic methods, by Sander Greenland; and Bayesian Statistics in Oncology, by Adamina, Tomlinson and Guller. There is also an interesting video, Bayesian statistics made (as) simple (as possible), from Allen Downey, Professor of Computer Science at the Franklin W. Olin College of Engineering.
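The idea of updating inferences "each time new data become available" can be illustrated with a small sketch. It assumes a Beta prior on a success rate and binomial data, which is one standard textbook case and is not taken from any of the resources above. The prior parameters, the batches of data, and the function names are all made up for illustration.

```python
from fractions import Fraction

def update_beta(alpha, beta, successes, failures):
    """Return the Beta posterior parameters after observing binomial data."""
    return alpha + successes, beta + failures

def beta_mean(alpha, beta):
    """Mean of a Beta(alpha, beta) distribution."""
    return Fraction(alpha, alpha + beta)

# Hypothetical prior: Beta(2, 2), a mild belief that the rate is near 0.5.
prior = (2, 2)

# Hypothetical batches of (successes, failures) arriving over time.
batches = [(7, 3), (12, 8), (30, 10)]

# Treat the data as a continual stream: each posterior becomes the next prior.
posterior = prior
for successes, failures in batches:
    posterior = update_beta(*posterior, successes, failures)
    print(posterior, float(beta_mean(*posterior)))

# Updating batch by batch gives the same posterior as one update on all data.
total_s = sum(s for s, _ in batches)
total_f = sum(f for _, f in batches)
all_at_once = update_beta(*prior, total_s, total_f)
print(all_at_once == posterior)
```

The last comparison shows why the data can be treated as one stream: in this model, the order and grouping of the batches do not change the final posterior.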

Epidemiology: The two best places to start for epidemiology are EpiMonitor, which has a very comprehensive list of links, and the WWW Virtual Library: Epidemiology, another gateway. Another very good place to start is epidemiolog. This site also has a fairly comprehensive listing of epidemiology sites, as well as an online textbook, and it suggests a starting page for first-time visitors. Another online learning site has learning modules for people to learn epidemiology. Another free online textbook is Epidemiology for the Uninitiated.
A very good place to find world epidemiological data, reports, issues and information is WHO, which includes, for example, the 10 leading causes of death and the Weekly Epidemiological Record.

Presenting Results: After analyzing data, it is very helpful to know how best to present the results. Very good sites are: Communicating Statistics Data Visualization’s Guide to Good Statistical Practice section on presenting results, Stat Canada's Statistics book chapter on presenting graphs, and UNECE's Making Data Meaningful series. For some interesting good and bad examples, see Michael Friendly's Gallery of Data Visualization. More recently, there are sites showing moving charts, like Gapminder, or mapping international data, like Show.

Missing Data: One site that gives an overview of missing data is the FAQ page of this missing data project. A couple of publications devoted to missing data are the June 2011 issue of the Journal of Official Statistics and the National Academies Press 2013 report, Nonresponse in Social Science Surveys: A Research Agenda. One method frequently used is multiple imputation, which fills in missing data by using other variables to predict the missing values. One software program for estimating missing data is AMELIA.
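A rough sketch of the multiple imputation idea follows. It is a simplified illustration and is not the algorithm used by AMELIA or any other package. It predicts each missing value from another variable with a fitted line, adds random noise, repeats this several times, and pools the results with Rubin's rules. A full multiple imputation would also draw the regression parameters at random for each imputation, which is left out here. The data, the seed, and the function names are all made up.

```python
import random
import statistics

def fit_line(xs, ys):
    """Ordinary least squares for y = a + b*x; returns (a, b, residual_sd)."""
    n = len(xs)
    mx, my = statistics.fmean(xs), statistics.fmean(ys)
    sxx = sum((x - mx) ** 2 for x in xs)
    b = sum((x - mx) * (y - my) for x, y in zip(xs, ys)) / sxx
    a = my - b * mx
    rss = sum((y - (a + b * x)) ** 2 for x, y in zip(xs, ys))
    return a, b, (rss / (n - 2)) ** 0.5

def impute_once(xs, ys, rng):
    """Fill each missing y (None) with a prediction from x plus random noise."""
    seen = [(x, y) for x, y in zip(xs, ys) if y is not None]
    a, b, sd = fit_line([x for x, _ in seen], [y for _, y in seen])
    return [y if y is not None else a + b * x + rng.gauss(0, sd)
            for x, y in zip(xs, ys)]

def pool(estimates, variances):
    """Rubin's rules: pooled estimate and its total variance."""
    m = len(estimates)
    q_bar = statistics.fmean(estimates)
    within = statistics.fmean(variances)        # average sampling variance
    between = statistics.variance(estimates)    # variance across imputations
    return q_bar, within + (1 + 1 / m) * between

# Hypothetical data: x is fully observed, every fourth y is missing.
rng = random.Random(0)
xs = [float(i) for i in range(1, 21)]
ys = [2.0 + 0.5 * x + rng.gauss(0, 1) for x in xs]
ys_missing = [None if i % 4 == 0 else y for i, y in enumerate(ys)]

# Create m completed data sets and estimate the mean of y in each one.
m = 20
estimates, variances = [], []
for _ in range(m):
    completed = impute_once(xs, ys_missing, rng)
    estimates.append(statistics.fmean(completed))
    variances.append(statistics.variance(completed) / len(completed))

pooled_mean, total_var = pool(estimates, variances)
print(round(pooled_mean, 3), round(total_var ** 0.5, 3))
```

The pooled variance is larger than the average within-imputation variance because it also counts the disagreement between imputations, which reflects the uncertainty caused by the missing values.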

Forecasting: Several faculty members have lectures about forecasting, including Hossein Arsham's Time Series Analysis and Forecasting Techniques and Robert Nau's Lecture Notes on Forecasting. Another forecasting site is the Federal Forecasters Consortium. Conference proceedings can be downloaded from this site.

Methods of gathering data: There are a number of sites on gathering data. Two places to start are Resources for Methods in Evaluation and Social Research and The World Wide Evaluation Information Gateway. These sites link to other sites about methods, both quantitative and qualitative. Some sites are about specific tools in data gathering. Tom O'Connor's lecture notes cover various issues such as measurement, validity and reliability, and scales and indexes.

Meta-analysis: A brief overview is at Study Design 101. One of the lectures for this methods in epidemiology course is about meta-analysis. A somewhat strict standard is described by Cleophas and Zwinderman. This article by Kattan reviews the strengths and weaknesses of meta-analysis.

Propensity score analysis: Propensity score analysis is a method of dealing with self-selection bias or other selection bias. The methodology center at Penn State has a podcast about propensity score analysis. Other papers include this one by Nicholas and Gulliford and another by Austin. Paul Rosenbaum, the father of propensity score analysis, has a couple of overviews, one about propensity scores and another about observational studies.
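As a minimal sketch of how a propensity score can offset selection bias, the example below uses inverse probability weighting, which is one of several ways to use propensity scores and is not taken from the papers above. It assumes a single binary covariate, so the propensity score is simply the share of people treated within each covariate group. The records and function names are made up, and the data are built so that the true treatment effect is 2.

```python
from collections import defaultdict

# Hypothetical records: (covariate x, treated flag, outcome).
# People with x = 1 are more likely to be treated and have higher outcomes.
records = (
    [(0, 0, 10.0)] * 8 + [(0, 1, 12.0)] * 2 +
    [(1, 0, 20.0)] * 2 + [(1, 1, 22.0)] * 8
)

def propensity_by_group(rows):
    """Estimate P(treated | x) as the share treated within each x group."""
    counts = defaultdict(lambda: [0, 0])  # x -> [n_treated, n_total]
    for x, treated, _ in rows:
        counts[x][0] += treated
        counts[x][1] += 1
    return {x: n_t / n for x, (n_t, n) in counts.items()}

def naive_difference(rows):
    """Unadjusted difference in mean outcome, treated minus untreated."""
    treated = [y for _, t, y in rows if t == 1]
    control = [y for _, t, y in rows if t == 0]
    return sum(treated) / len(treated) - sum(control) / len(control)

def ipw_difference(rows, scores):
    """Weighted difference in means: weight 1/e if treated, 1/(1-e) if not.

    Assumes every score is strictly between 0 and 1.
    """
    sums = {1: [0.0, 0.0], 0: [0.0, 0.0]}  # t -> [weighted outcome, weight]
    for x, t, y in rows:
        e = scores[x]
        w = 1 / e if t == 1 else 1 / (1 - e)
        sums[t][0] += w * y
        sums[t][1] += w
    return sums[1][0] / sums[1][1] - sums[0][0] / sums[0][1]

scores = propensity_by_group(records)
print(scores)
print(naive_difference(records))
print(ipw_difference(records, scores))
```

In this constructed data the unadjusted comparison is far from the built-in effect of 2 because treated people mostly come from the high-outcome group, while the weighted comparison comes out close to 2. With real data the propensity score is usually estimated from many covariates, for example with logistic regression.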

Data mining: Statsoft has an entry on data mining. Professors Anand Rajaraman and Jeffrey D. Ullman have a book, Mining of Massive Datasets, which includes a chapter on data mining. This YouTube video from NJIT School of Management professor Stephan P Kudyba introduces data mining.

I don't necessarily endorse any of the sites listed here, and do not assume responsibility for content of the web sites listed in this article. This article is solely presented for educational purposes.


last updated 8/30/15
last verified  8/30/2015