Free Statistical Tools on the WEB


A short version of this article first appeared in the International Statistical Institute newsletter, Vol 26, Number 1 (76),  2002, and is at   http://isi.cbs.nl/NLet/NLet021-04.htm   and    http://isi.cbs.nl/FreeTools.htm 

There is a great deal of information about statistics available for free on the web.  Information includes data, general statistical textbooks, email lists, software, and many sites about special topics, such as epidemiology, forecasting, data presentation, data editing, multiple imputation, and propensity score analysis.  This article is a brief review of some useful sites covering these topics.

One place to start is to look at sites that are general links to other statistical sites. General sites are Betty Jung's statsites  http://www.bettycjung.net/Statsites.htm, statsci   http://www.statsci.org/index.html,  the World Wide Web Virtual Library: Statistics  http://www.stat.ufl.edu/vlib/statistics.html, and Dr. Hossein Arsham's list  http://home.ubalt.edu/ntsbarsh/Business-stat/R.htm.   These sites can be used to find other sites.

It is interesting to note that a number of statistical organizations are making concerted efforts to promote statistics by increasing public awareness of how statistics affects everyday life. A current example is the World of Statistics http://www.worldofstatistics.org/ This project is made up of several thousand statistical organizations and aims to promote statistics both in public awareness and as a profession. Similar projects are the American Statistical Association's (ASA) Statistical Significance series, http://www.amstat.org/policy/statsig.cfm, which is a set of pamphlets, each dedicated to showing how statistics informs some particular area, such as energy, health care and the environment; and the International Statistical Literacy Project http://iase-web.org/islp/ with a mission "to contribute to promoting statistical literacy across the world, among young and adults." A related project is stats.org http://www.stats.org/ from George Mason University. This project describes basic statistical terms, but the main focus seems to be discussing news stories and how to understand the statistics in them. The Australian Bureau of Statistics http://www.abs.gov.au/websitedbs/a3121120.nsf/home/Understanding%20statistics has an on line class and a page defining statistical terms.

For students or those who want to learn about statistics, the best place to start is with the various on-line statistics books. One is HyperStatistics Online, at http://davidmlane.com/hyperstat/. This is a nice statistics book, and it includes a comprehensive list of other on line statistics books. Most of these are basic to intermediate. One book, the Statsoft text, http://www.statsoft.com/textbook/ has the basics as well as fairly advanced topics. Another approach is Robert Niles' site Statistics Every Writer Should Know http://www.robertniles.com/stats/ with plain English explanations of many basic statistical concepts. Another list of online statistics books is here http://gsociology.icaap.org/methods/stat.htm

People can also take free on line training classes on statistics, for example, from the North Carolina Center for Public Health Preparedness Training Web Site, http://nciph.sph.unc.edu/tws/index.php or the Northwest Center for Public Health Practice http://www.nwcphp.org/training These classes offer a certificate at the end of the training. StatTrek http://stattrek.com/ also has a couple of on line tutorials. An open course from Carnegie Mellon http://oli.cmu.edu/learn-with-oli/see-our-free-open-courses/ presents material used in the course taught at the University. A number of providers of on line classes have statistics classes and videos. Coursera https://www.coursera.org/ is one example, but unfortunately you have to search the site to find them. One class is Statistics One https://www.coursera.org/course/stats1 Others include EdX https://www.edx.org/course-list/allschools/statistics-data-analysis/allcourses Khan Academy https://www.khanacademy.org/math/probability and School of Data http://schoolofdata.org/

On the other side of learning statistics, a couple of sites are about teaching statistics. The American Statistical Association has an on line journal, the Journal of Statistics Education, at http://www.amstat.org/publications/jse/ which has free articles about teaching statistics. The Consortium for the Advancement of Undergraduate Statistics Education at http://www.causeweb.org/ is also about teaching statistics and has a great many links to texts, notes, journals, data sets, etc, in particular in the resources section. The Web Interface for Statistics Education (WISE) http://wise.cgu.edu/ provides internet resources for teaching statistics.

Other sites of interest on the web are those of statistical associations. An international association is the International Statistical Institute http://isi-web.org/ . Some other associations are the American Statistical Association http://www.amstat.org/ the International Chinese Statistical Association http://www.icsa.org/ and the International Indian Statistical Association http://www.intindstat.org/ . Statsci has a list of associations http://www.statsci.org/soc.html as does the International Statistical Institute http://www.isi-web.org/statistical-societies .

There is also tons of free software on the net. The best place to find free statistical software is the Free Statistical Software site at http://statpages.org/javasta2.html. This site lists general purpose software, as well as software devoted to specific purposes, such as curve fitting, epidemiology, surveys, and programming. There are also brief descriptions of each package. We also list software packages on our page http://gsociology.icaap.org/methods/soft.html along with a list of other sites that list free statistical packages. As far as we can tell, our site is the only one that compares the statistical output from many of the different programs, and we report that all of the programs that we reviewed gave the same results. That's good. One great site about learning how to use statistical software is the Statistical Computing site, at http://www.ats.ucla.edu/stat/default.htm. They have a large number of links, how-tos and other material, mostly for commercial packages. One review of free statistical software is here http://en.citizendium.org/wiki/Free_statistical_software which briefly describes the history, quality, functions and limits of a number of free packages. Wikipedia practically copied this Citizendium page.

One page reviews which statistical software packages are most popular  http://r4stats.com/articles/popularity/  A variety of measures are used, including job advertisements, scholarly articles mentioning which software was used for analysis, website popularity (number of sites linking to that software), blogs and discussion forum activity, among other measures. Most of the programs are commercial, but R, http://cran.r-project.org/ a free program, ranks high on many of the indicators. R is a language for statistical computing, but has a number of graphical interfaces to make it easier to use, as well as many tutorials and guides.

There are a number of email lists. Allstat, at https://www.jiscmail.ac.uk/cgi-bin/webadmin?A0=allstat is a general list, although a great many of the postings appear to be about jobs or training courses. Another list, stat-l, at http://lists.mcgill.ca/archives/stat-l.html focuses more on statistical questions. Another useful list, not on Allstat, is Epidemio, at http://www.listes.umontreal.ca/wws/info/epidemio-l This list is about epidemiology. Another form of discussion group is the forum. TalkStats http://www.talkstats.com/ is one forum, with discussions ranging from basic to advanced and from homework to theory.

There are a number of comprehensive places to look for data. One starting point for social, political and economic data is the Global Social Change Research Project http://gsociology.icaap.org/, which has both links to a very large number of other data link sites, and a page of data sets compiled or created from other data sets. Many of the data sets listed on this project site are public domain. All of the data are free to use. This UN site http://data.un.org/ has data on nearly every topic, from the UN and its various associates. Recently, the UN also posted a note saying that all of their data are free to use, copy, duplicate, etc, provided the UN is cited http://data.un.org/Host.aspx?Content=UNdataUse The World Bank also has a data page http://data.worldbank.org/ The World Bank also says most of their data can be used freely http://go.worldbank.org/OJC02YMLA0. This UN site http://unstats.un.org/unsd/methods/inter-natlinks/sd_natstat.asp and this BLS site http://www.bls.gov/bls/other.htm link to national statistical centers of most countries of the world.

There are many statistical journals on the web with free content. Many of these are listed at the Directory of Open Access Journals (DOAJ)  http://doaj.org/ .  Some of the journals listed here include the  Electronic Journal of Applied Statistical Analysis  http://siba-ese.unisalento.it/index.php/ejasa/index , the Electronic Journal of Statistics  http://projecteuclid.org/DPubS?service=UI&version=1.0&verb=Display&handle=euclid.ejs   the Journal of Official Statistics  http://www.jos.nu/   and InterStat  http://interstat.statjournals.net/  


 

There are resources about dozens of specific topics on the web.  Some of these topics include epidemiology, graphical analysis and presentation, missing data, forecasting, gathering data and meta-analysis.

Bayesian inference is a topic that should interest all statisticians. Basically, the traditional approach (called the "frequentist" approach) is to use previously accumulated data to design a study and propose a hypothesis. Then a test is used to draw a conclusion. In contrast, Bayesian inference is a formal process to learn from evidence as it accumulates. "The Bayesian approach uses Bayes’ Theorem to formally combine prior information with current information on a quantity of interest. The Bayesian idea is to consider the prior information and the test results as part of a continual data stream, in which inferences are being updated each time new data become available." (Guidance for the Use of Bayesian Statistics in Medical Device Clinical Trials http://www.fda.gov/MedicalDevices/DeviceRegulationandGuidance/GuidanceDocuments/ucm071072.htm) There is much more to Bayesian inference; for example, it makes the examination of underlying assumptions much more explicit. More resources include The International Society for Bayesian Analysis (ISBA) http://bayesian.org/, Bayesian perspectives for epidemiological research: I. Foundations and basic methods http://ije.oxfordjournals.org/content/35/3/765.full by Sander Greenland, and Bayesian Statistics in Oncology http://onlinelibrary.wiley.com/doi/10.1002/cncr.24628/full, by Adamina, Tomlinson and Guller. There is also an interesting video, Bayesian statistics made (as) simple (as possible) http://www.youtube.com/watch?v=bobeo5kFz1g from Allen Downey, Professor of Computer Science at the Franklin W. Olin College of Engineering.
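
As a small illustration of inferences "being updated each time new data become available," here is a minimal sketch in Python. It assumes the quantity of interest is a success rate with a Beta prior and binomial data, which is one simple textbook case and not the method of the FDA guidance quoted above. The class name BetaPrior, the prior values and the batch counts are all hypothetical, chosen only for illustration.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class BetaPrior:
    """Beta(alpha, beta) belief about an unknown success probability."""

    alpha: float
    beta: float

    def update(self, successes: int, failures: int) -> "BetaPrior":
        # For a Beta prior and binomial data, Bayes' Theorem reduces to
        # adding the observed counts to the prior parameters.
        return BetaPrior(self.alpha + successes, self.beta + failures)

    @property
    def mean(self) -> float:
        # Posterior mean of the success probability.
        return self.alpha / (self.alpha + self.beta)


# Hypothetical prior and data, for illustration only.
belief = BetaPrior(alpha=2.0, beta=2.0)   # weak prior centered on 0.5
batches = [(7, 3), (6, 4), (9, 1)]        # (successes, failures) per batch

for successes, failures in batches:
    belief = belief.update(successes, failures)
    print(f"posterior mean after this batch: {belief.mean:.3f}")

# Pooling all the data into a single update gives the same posterior.
pooled = BetaPrior(alpha=2.0, beta=2.0).update(22, 8)
print(belief == pooled)
```

In this sketch, updating batch by batch ends at the same posterior as one update with all 22 successes and 8 failures pooled, which is the "continual data stream" idea in the quotation.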

Epidemiology: The two best places to start for epidemiology are EpiMonitor, http://www.epimonitor.net/index.htm, which has a very comprehensive list of links, and the WWW Virtual Library: Epidemiology http://www.epibiostat.ucsf.edu/epidem/epidem.html another gateway. Another very good place to start is epidemiolog, at http://www.epidemiolog.net/. This site also has a fairly comprehensive listing of epidemiology sites, as well as an on-line textbook. First time visitors should start at http://www.epidemiolog.net/evolving/ Epiville http://epiville.ccnmtl.columbia.edu/ is another on line learning site, with modules for people to learn epidemiology. Another free on-line textbook is Epidemiology for the Uninitiated, at http://www.bmj.com/about-bmj/resources-readers/publications/epidemiology-uninitiated
A very good place to find world epidemiological data, reports, issues and information is WHO http://www.who.int/topics/epidemiology/en/ which includes, for example, the 10 leading causes of death and the Weekly Epidemiological Record.

Presenting Results: After analyzing data, it is very helpful to know how best to present the results. Very good sites are: Communicating Statistics https://gss.civilservice.gov.uk/statistics/presentation-and-dissemination/, Improving Data Visualization http://www.improving-visualisation.org/, BTS’s Guide to Good Statistical Practice section on presenting results, at http://www.bts.gov/publications/guide_to_good_statistical_practice_in_the_transportation_field/index.html, Stat Canada's Statistics book, chapter on presenting graphs http://www.statcan.gc.ca/edu/power-pouvoir/ch9/5214821-eng.htm and UNECE's Making Data Meaningful series http://www.unece.org/stats/documents/writing/. For some interesting good and bad examples, see Michael Friendly's Gallery of Data Visualization, at http://euclid.psych.yorku.ca/datavis.ca/ More recently, there are sites showing moving charts, like Gapminder http://www.gapminder.org/ or mapping international data, like Show http://show.mappingworlds.com/world/

Missing Data: One site that gives an overview of missing data is the FAQ page of this missing data project http://www.missingdata.org.uk/ A couple of publications devoted to missing data are the June 2011 issue of Journal of Official Statistics http://www.jos.nu/Contents/issue.asp?vol=27&no=2 and the National Academies Press 2013 report, Nonresponse in Social Science Surveys: A Research Agenda http://www.nap.edu/catalog/18293/nonresponse-in-social-science-surveys-a-research-agenda One method frequently used is multiple imputation, which fills in missing data by using other variables to predict the missing values. One software program for estimating missing data is AMELIA, at http://gking.harvard.edu/software/

Forecasting: Several faculty members have lectures about forecasting. Two examples are Hossein Arsham's Time Series Analysis and Forecasting Techniques, at http://home.ubalt.edu/ntsbarsh/Business-stat/stat-data/Forecast.htm and Robert Nau's Lecture Notes on Forecasting http://people.duke.edu/~rnau/411home.htm Another forecasting site is the Federal Forecasters Consortium, at http://www.va.gov/HEALTHPOLICYPLANNING/FFC_2014.asp Conference proceedings can be downloaded from this site.

Methods of gathering data: There are a number of sites on gathering data. Two places to start are Resources for Methods in Evaluation and Social Research, at http://gsociology.icaap.org/methods/ and The World Wide Evaluation Information Gateway http://www.policy-evaluation.org/ These sites link to other sites about methods, quantitative and qualitative. Some sites are about specific tools in data gathering. Tom O'Connor's lecture notes, at http://www.drtomoconnor.com/3760/default.htm cover various issues such as measurement, validity and reliability, and scales and indexes.

Meta-analysis: A brief overview is here, at Study Design 101 http://himmelfarb.gwu.edu/tutorials/studydesign101/metaanalyses.html One of the lectures for this methods in epidemiology course is about meta-analysis http://www.uic.edu/classes/epid/epid401/ A somewhat strict standard is described by Cleophas and Zwinderman http://circ.ahajournals.org/content/115/22/2870.full This article by Kattan reviews the strengths and weaknesses of meta-analysis http://www.ccjm.org/content/75/6/431.full.

Propensity score analysis: One description is here http://www.epa.gov/caddis/da_advanced_5.html Propensity score analysis is a method of dealing with self-selection bias or other selection bias. The Methodology Center at Penn State has a podcast about propensity score analysis http://methodology.psu.edu/multimedia/podcast Other papers include this one by Nicholas and Gulliford http://www.ncbi.nlm.nih.gov/pmc/articles/PMC2553527/ and another by Austin http://www.ncbi.nlm.nih.gov/pmc/articles/PMC3144483/ Paul Rosenbaum, the father of propensity score analysis, has a couple of overviews, http://www-stat.wharton.upenn.edu/~rosenbap/index.html, one about propensity scores, and another about observational studies.

Data mining: Statsoft has an entry on data mining http://www.statsoft.com/textbook/data-mining-techniques/ Professors Anand Rajaraman and Jeffrey D. Ullman have a book, Mining of Massive Datasets, http://infolab.stanford.edu/~ullman/ which includes a chapter on data mining. This YouTube video http://www.youtube.com/watch?v=R-sGvh6tI04 from NJIT School of Management professor Stephan P Kudyba introduces data mining.

I don't necessarily endorse any of the sites listed here, and do not assume responsibility for content of the web sites listed in this article. This article is solely presented for educational purposes.

 

last updated 8/30/15
last verified  8/30/2015