Academic Publications

Statistical Perspectives and Recommendations

Altman, M., Gill, J., & McDonald, M. P. (2004). Sources of inaccuracy in statistical computation. Numerical Issues in Statistical Computing for the Social Scientist, 12-43.

Bakan, D. (1966). The test of significance in psychological research. Psychological Bulletin, 66(6), 423.

Barto, E. K., & Rillig, M. C. (2012). Dissemination biases in ecology: effect sizes matter more than quality. Oikos, 121(2), 228-235.

Basch, C. E., Sliepcevich, E. M., Gold, R. S., Duncan, D. F., & Kolbe, L. J. (1985). Avoiding type III errors in health education program evaluations: a case study. Health Education & Behavior, 12(3), 315-331.

Begley, C. G., & Ioannidis, J. P. (2015). Reproducibility in science: Improving the standard for basic and preclinical research. Circulation Research, 116(1), 116-126.

Berk, R., Brown, L., Buja, A., Zhang, K., & Zhao, L. (2013). Valid post-selection inference. Annals of Statistics, 41(2), 802-837.

Bernau, C., Riester, M., Boulesteix, A. L., Parmigiani, G., Huttenhower, C., Waldron, L., & Trippa, L. (2014). Cross-study validation for the assessment of prediction algorithms. Bioinformatics, 30(12), i105-i112.

Berry, D. (2012). Multiplicities in cancer research: Ubiquitous and necessary evils. Journal of the National Cancer Institute, 104(15), 1125-1133.

Betz, M. A., & Gabriel, K. R. (1978). Type IV errors and analysis of simple effects. Journal of Educational and Behavioral Statistics, 3(2), 121-143.

Bland, J. M. (2009). The tyranny of power: is there a better way to calculate sample size?. BMJ, 339.

Bofinger, E. (1985). Multiple comparisons and type III errors. Journal of the American Statistical Association, 80(390), 433-437.

Boik, R. J. (1979). Interactions, partial interactions, and interaction contrasts in the analysis of variance. Psychological Bulletin, 86(5), 1084-1089.

Button, K. S., Ioannidis, J. P., Mokrysz, C., Nosek, B. A., Flint, J., Robinson, E. S., & Munafò, M. R. (2013). Confidence and precision increase with high statistical power. Nature Reviews Neuroscience, 14(8), 585-585.

Carmer, S. G., & Swanson, M. R. (1973). An evaluation of ten pairwise multiple comparison procedures by Monte Carlo methods. Journal of the American Statistical Association, 68(341), 66-74.

Carver, R. P. (1978). The case against statistical significance testing. Harvard Educational Review, 48, 378-399.

Chan, A. W., Hróbjartsson, A., Haahr, M. T., Gøtzsche, P. C., & Altman, D. G. (2004). Empirical evidence for selective reporting of outcomes in randomized trials: comparison of protocols to published articles. JAMA, 291(20), 2457-2465.

Cohen, J. (1990). Things I have learned (so far). American Psychologist, 45, 1304-1312.

Cohen, J. (1994). The earth is round (p < .05). American Psychologist, 49, 997-1003.

Colquhoun, D. (2014). An investigation of the false discovery rate and the misinterpretation of p-values. Royal Society Open Science, 1(3), 140216.

Cumming, G. (2008). Replication and p intervals: p values predict the future only vaguely, but confidence intervals do much better. Perspectives on Psychological Science, 3(4), 286-300.

Cumming, G. (2014). The new statistics: How and why. Psychological Science, 25, 7-29.

De Long, J. B., & Lang, K. (1992). Are all economic hypotheses false?. Journal of Political Economy, 1257-1272.

Doerfler, L. A., & Chaplin, W. F. (1985). Type III error in research on interpersonal models of depression. Journal of Abnormal Psychology, 94(2), 227.

Dunn, W. N. (2001). Using the method of context validation to mitigate Type III errors in environmental policy analysis. In M. Hisschemoller, R. Hoppe, & J. R. Ravetz (Eds.), Knowledge, Power and Participation in Environmental Policy Analysis (pp. 417-436). New Brunswick, New Jersey: Transaction Publishers.

Djulbegovic, B., Hozo, I., & Ioannidis, J. P. (2014). Improving the drug development process: more not less randomized trials. The Journal of the American Medical Association, 311(4), 355-356.

Dobson, D., & Cook, T. J. (1980). Avoiding type III error in program evaluation: Results from a field experiment. Evaluation and Program Planning, 3(4), 269-276.

Donoho, D., & Jin, J. (2004). Higher criticism for detecting sparse heterogeneous mixtures. Annals of Statistics, 962-994.

Doshi, P., Goodman, S. N., & Ioannidis, J. P. (2013). Raw data from clinical trials: within reach?. Trends in Pharmacological Sciences, 34(12), 645-647.

Dwan, K., Altman, D. G., Arnaiz, J. A., Bloom, J., Chan, A. W., Cronin, E., … & Williamson, P. R. (2008). Systematic review of the empirical evidence of study publication bias and outcome reporting bias. PLoS One, 3(8), e3081.

Eklund, A., Nichols, T. E., & Knutsson, H. (2016). Cluster failure: Why fMRI inferences for spatial extent have inflated false-positive rates. Proceedings of the National Academy of Sciences, 201602413.

Games, P. A. (1978). Nesting, crossing, type IV errors, and the role of statistical models. American Educational Research Journal, 15(2), 253-258.

Gelman, A. (2013). Commentary: P values and statistical practice. Epidemiology, 24(1), 69-72.

Gelman, A. (2013). Ethics and statistics: It’s too hard to publish criticisms and obtain data for republication. Chance, 26(3), 49-52.

Gelman, A. (2014). The connection between varying treatment effects and the crisis of unreplicable research: A Bayesian perspective. Journal of Management, 41(2), 632-643.

Goodman, S. N., Altman, D. G., & George, S. L. (1998). Statistical reviewing policies of medical journals. Journal of General Internal Medicine, 13(11), 753-756.

Gurusamy, K. S., Gluud, C., Nikolova, D., & Davidson, B. R. (2009). Assessment of risk of bias in randomized clinical trials in surgery. British Journal of Surgery, 96(4), 342-349.

Falk, R., & Greenbaum, C. W. (1995). Significance tests die hard: The amazing persistence of a probabilistic misconception. Theory & Psychology, 5(1), 75-98.

Fang, F. C., Steen, R. G., & Casadevall, A. (2012). Misconduct accounts for the majority of retracted scientific publications. Proceedings of the National Academy of Sciences, 109(42), 17028-17033.

Fiedler, K. (2011). Voodoo correlations are everywhere – not only in neuroscience. Perspectives on Psychological Science, 6(2), 163-171.

Fiedler, K., Kutzner, F., & Krueger, J. I. (2012). The long way from α-error control to validity proper: Problems with a short-sighted false-positive debate. Perspectives on Psychological Science, 7(6), 661-669.

Finner, H. (1999). Stepwise multiple test procedures and control of directional errors. The Annals of Statistics, 27, 274-289.

Games, P. A. (1973). Type IV errors revisited. Psychological Bulletin, 80(4), 304-307.

Gelman, A., & Loken, E. (2014). The statistical crisis in science. American Scientist, 102(6), 460.

Gillett, R. (1994). Post hoc power analysis. Journal of Applied Psychology, 79, 783-785.

Goodman, S. N. (1992). A comment on replication, p‐values and evidence. Statistics in Medicine, 11(7), 875-879.

Goodman, S. (2008, July). A dirty dozen: twelve p-value misconceptions. Seminars in Hematology (Vol. 45, No. 3, pp. 135-140). WB Saunders.

Greenland, S. (2008, July). Bayesian interpretation and analysis of research results. Seminars in Hematology (Vol. 45, No. 3, pp. 141-149). WB Saunders.

Greenwald, A. G. (1975). Consequences of prejudice against the null hypothesis. Psychological Bulletin, 82(1), 1-20.

Greenwald, A., Gonzalez, R., Harris, R., & Guthrie, D. (1996). Effect sizes and p values: what should be reported and what should be replicated?. Psychophysiology, 33(2), 175-183.

Gresham, F. M. (1993). Social skills and learning disabilities as a type III error: Rejoinder to Conte and Andrews. Journal of Learning Disabilities, 26(3), 154-158.

Harmon‐Jones, E., Amodio, D. M., & Harmon‐Jones, C. (2009). Action‐based model of dissonance: A review, integration, and expansion of conceptions of cognitive conflict. Advances in Experimental Social Psychology, 41, 119-166.

Heller, R., & Yekutieli, D. (2014). Replicability analysis for genome-wide association studies. The Annals of Applied Statistics, 8(1), 481-498.

Hozo, I., Schell, M. J., & Djulbegovic, B. (2008, July). Decision-making when data and inferences are not conclusive: risk-benefit and acceptable regret approach. Seminars in Hematology (Vol. 45, No. 3, pp. 150-159). WB Saunders.

Hung, J.H.M., O’Neill, R.T., Bauer, P., & Köhne, K. (1997). The behavior of the p-value when the alternative hypothesis is true. Biometrics, 53, 11-22.

Imai, K. (2005). Do get-out-the-vote calls reduce turnout? The importance of statistical methods for field experiments. American Political Science Review, 99(2), 283-300.

IntHout, J., Ioannidis, J. P., & Borm, G. F. (2014). The Hartung-Knapp-Sidik-Jonkman method for random effects meta-analysis is straightforward and considerably outperforms the standard DerSimonian-Laird method. BMC Medical Research Methodology, 14(1), 25.

Ioannidis, J. P., Hozo, I., & Djulbegovic, B. (2013). Optimal type I and type II error pairs when the available sample size is fixed. Journal of Clinical Epidemiology, 66(8), 903-910.

Ioannidis, J. P., & Trikalinos, T. A. (2007). An exploratory test for an excess of significant findings. Clinical Trials, 4(3), 245-253.

Ioannidis, J. P. (2013). Clarifications on the application and interpretation of the test for excess significance and its extensions. Journal of Mathematical Psychology, 57(5), 184-187.

Ioannidis, J. P. (2005). Contradicted and initially stronger effects in highly cited clinical research. JAMA, 294(2), 218-228.

Ioannidis, J. P. (2008). Effectiveness of antidepressants: an evidence myth constructed from a thousand randomized trials?. Philosophy, Ethics, and Humanities in Medicine, 3(1), 14.

Ioannidis, J. P. (2008). Effect of formal statistical significance on the credibility of observational associations. American Journal of Epidemiology, 168(4), 374-383.

Ioannidis, J. P. (2008, July). Interpretation of research results: an indispensable mission impossible?. Seminars in Hematology (Vol. 45, No. 3, pp. 133-134). WB Saunders.

Ioannidis, J. P. (2013). Meta-analyses of hydroxyethyl starch for volume resuscitation. JAMA, 309(21), 2209-2209.

Ioannidis, J. P. (2014). Research accomplishments that are too good to be true. Intensive Care Medicine, 40(1), 99-101.

Ioannidis, J. P. (2012). Scientific communication is down at the moment, please check again later. Psychological Inquiry, 23(3), 267-270.

Ioannidis, J. P. (2008). Why most discovered true associations are inflated. Epidemiology, 19(5), 640-648.

Jager, L. R., & Leek, J. T. (2014). An estimate of the science-wise false discovery rate and application to the top medical literature. Biostatistics, 15(1), 1-12.

Judd, C.M., Westfall, J., & Kenny, D.A. (2012). Treating stimuli as a random factor in social psychology: A new and comprehensive solution to a pervasive but largely ignored problem. Journal of Personality and Social Psychology, 103, 54-69.

Kavvoura, F. K., McQueen, M. B., Khoury, M. J., Tanzi, R. E., Bertram, L., & Ioannidis, J. P. (2008). Evaluation of the potential excess of statistically significant findings in published genetic association studies: application to Alzheimer’s disease. American Journal of Epidemiology, 168(8), 855-865.

Kimball, A.W. (1957). Errors of the third kind in statistical consulting. Journal of the American Statistical Association, 52, 133-142.

Kline, R.B. (2013). Beyond significance testing: Statistical reform in the behavioral sciences (2nd ed.). American Psychological Association, Washington DC.

Kyzas, P. A., Denaxa-Kyza, D., & Ioannidis, J. P. (2007). Almost all articles on cancer prognostic markers report statistically significant results. European Journal of Cancer, 43(17), 2559-2579.

Leggett, N.C., Thomas, N.A., Loetscher, T., & Nicholls, M.E.R. (2013). The life of p: “Just significant” results are on the rise. The Quarterly Journal of Experimental Psychology, 66, 2303-2309.

Levin, J. R., & Marascuilo, L. A. (1972). Type IV errors and interactions. Psychological Bulletin, 78(5), 368-374.

Lenzer, J., Hoffman, J. R., Furberg, C. D., & Ioannidis, J. P. (2013). Ensuring the integrity of clinical practice guidelines: a tool for protecting patients. BMJ, 347.

Liberati, A., Altman, D. G., Tetzlaff, J., Mulrow, C., Gøtzsche, P. C., Ioannidis, J. P., … & Moher, D. (2009). The PRISMA statement for reporting systematic reviews and meta-analyses of studies that evaluate health care interventions: explanation and elaboration. Annals of Internal Medicine, 151(4), W-65.

Lu, T. H. (2001). International comparisons: they do help and are essential for avoiding type III error. Injury Prevention, 7(4), 270-271.

Luce, B. R., Kramer, J. M., Goodman, S. N., Connor, J. T., Tunis, S., Whicher, D., & Schwartz, J. S. (2009). Rethinking randomized clinical trials for comparative effectiveness research: the need for transformational change. Annals of Internal Medicine, 151(3), 206-209.

Lyons, R. (2011). The spread of evidence-poor medicine via flawed social-network analysis. Statistics, Politics, and Policy, 2(1).

Macdonald, P. (1999). Power, Type I, and Type III error rates of parametric and nonparametric statistical tests. The Journal of Experimental Education, 67(4), 367-379.

Macleod, M. R., Michie, S., Roberts, I., Dirnagl, U., Chalmers, I., Ioannidis, J. P., … & Glasziou, P. (2014). Biomedical research: increasing value, reducing waste. The Lancet, 383(9912), 101-104.

Marascuilo, L. A., & Levin, J. R. (1970). Appropriate post hoc comparisons for interaction and nested hypotheses in analysis of variance designs: The elimination of type IV errors. American Educational Research Journal, 397-421.

Masicampo, E.J. & Lalande, D.R. (2012). A peculiar prevalence of p values just below .05. The Quarterly Journal of Experimental Psychology, 65, 2271-2279.

McCullough, B. D., & McWilliams, T. P. (2010). Baseball players with the initial “K” do not strike out more often. Journal of Applied Statistics, 37(6), 881-891.

Meyer, D. L. (1991). Misinterpretation of interaction effects: A reply to Rosnow and Rosenthal. Psychological Bulletin, 110(3), 571-573.

Moonesinghe, R., Khoury, M. J., & Janssens, A. C. J. (2007). Most published research findings are false—but a little replication goes a long way. PLoS Medicine, 4(2), e28.

Motulsky, H. J. (in press). Common misconceptions about data analysis and statistics. British Journal of Pharmacology.

Nickerson, R. S. (2000). Null hypothesis significance testing: a review of an old and continuing controversy. Psychological Methods, 5(2), 241.

Nuzzo, R. (2014). Statistical errors. Nature, 506(7487), 150-152.

Patsopoulos, N. A., Analatos, A. A., & Ioannidis, J. P. (2005). Relative citation impact of various study designs in the health sciences. The Journal of the American Medical Association, 293(19), 2362-2366.

Rabbitt, P. M. (1966). Errors and error correction in choice-response tasks. Journal of Experimental Psychology, 71(2), 264.

Rezmovic, E. L. (1982). Program implementation and evaluation results: A reexamination of type III error in a field experiment. Evaluation and Program Planning, 5(2), 111-118.

Rekdal, O. B. (2014). Academic urban legends. Social Studies of Science, 44(4), 638-654.

Rozeboom, W.W. (1960). The fallacy of null hypothesis testing. Psychological Bulletin, 57, 416-428.

Salanti, G., Higgins, J. P., Ades, A. E., & Ioannidis, J. P. (2008). Evaluation of networks of randomized trials. Statistical Methods in Medical Research, 17(3), 279-301.

Schmidt, F. L. (1996). Statistical significance testing and cumulative knowledge in psychology: Implications for training of researchers. Psychological Methods, 1(2), 115.

Schneider, D., Tahk, A., & Krosnick, J. (2007). Reconsidering the impact of behavior prediction questions on illegal drug use: The importance of using proper analytic methods. Social Influence, 2(3), 178-196.

Shaffer, J. P. (2002). Multiplicity, directional (type III) errors, and the null hypothesis. Psychological Methods, 7(3), 356.

Simonsohn, U., Nelson, L.D., & Simmons, J.P. (2014). P-curve: A key to the file drawer. Journal of Experimental Psychology: General, 143, 534-547.

Sterling, T. D., Rosenbaum, W. L., & Weinkam, J. J. (1995). Publication decisions revisited: The effect of the outcome of statistical tests on the decision to publish and vice versa. The American Statistician, 49(1), 108-112.

Sterne, J. A., & Smith, G. D. (2001). Sifting the evidence—what’s wrong with significance tests?. Physical Therapy, 81(8), 1464-1469.

Stroebe, W., Postmes, T., & Spears, R. (2012). Scientific misconduct and the myth of self-correction in science. Perspectives on Psychological Science, 7(6), 670-688.

Trikalinos, N. A., Evangelou, E., & Ioannidis, J. P. (2008). Falsified papers in high-impact journals were slow to retract and indistinguishable from nonfraudulent papers. Journal of Clinical Epidemiology, 61(5), 464-470.

Tsilidis, K. K., Papatheodorou, S. I., Evangelou, E., & Ioannidis, J. P. (2012). Evaluation of excess statistical significance in meta-analyses of 98 biomarker associations with cancer risk. Journal of the National Cancer Institute, djs437.

Tsilidis, K. K., Panagiotou, O. A., Sena, E. S., Aretouli, E., Evangelou, E., Howells, D. W., … & Ioannidis, J. P. (2013). Evaluation of excess significance bias in animal studies of neurological diseases. PLoS Biology, 11(7), e1001609.

Umesh, U.N., Peterson, R.A., McCann-Nelson, M., & Vaidyanathan, R. (1996). Type IV errors in marketing research: The investigation of ANOVA interactions. Journal of the Academy of Marketing Science, 24, 17-

Vul, E., Harris, C., Winkielman, P., & Pashler, H. (2009). Puzzlingly high correlations in fMRI studies of emotion, personality, and social cognition. Perspectives on Psychological Science, 4, 274-290.

Wacholder, S., Chanock, S., Garcia-Closas, M., & Rothman, N. (2004). Assessing the probability that a positive report is false: an approach for molecular epidemiology studies. Journal of the National Cancer Institute, 96(6), 434-442.

Wade, D. T. (2001). Research into the black box of rehabilitation: the risks of a Type III error. Clinical Rehabilitation, 15(1), 1-4.

Wicherts, J. M., Bakker, M., & Molenaar, D. (2011). Willingness to share research data is related to the strength of the evidence and the quality of reporting of statistical results. PLoS One, 6(11), e26828.

Yadav, S. B., & Korukonda, A. (1985). Management of type III error in problem identification. Interfaces, 15(4), 55-61.

Yarkoni, T. (2009). Big correlations in little studies: Inflated fMRI correlations reflect low statistical power—Commentary on Vul et al. (2009). Perspectives on Psychological Science, 4(3), 294-298.

Yekutieli, D. (2008). Hierarchical false discovery rate–controlling methodology. Journal of the American Statistical Association, 103(481), 309-316.

BPS invites readers to send relevant papers and links to add to this website.