[1] Paul Ammann, Marcio Eduardo Delamaro, and Jeff Offutt. 2014. Establishing theoretical minimal sets of mutants. In Proceedings of the International Conference (ICSM). 549-558. on Software Testing, Verification and Validation (ICST). 21-30. [2] J. H. Andrews, L. C. Briand, Y. Labiche, and A. S. Namin. 2006. Using Muta tion Analysis for Assessing and Comparing Testing Coverage Criteria. IEEE Transactions on Software Engineering (TSE) 32, 8 (Aug 2006), 608-624. [3] Gilbert Becker. 1986. Correcting the point-biserial correlation for attenuation owing to unequal sample size. The Journal of Experimental Education 55, 1 (1986), Engineering (TSE) 29, 3 (2003), 195-209. 5-8. [4] Jennifer Black, Emanuel Melachrinoudis, and David Kaeli. 2004. Bi-criteria models analysis for Java. In Proceedings of the International Symposium on Software Testing for all-uses test suite reduction. In Proceedings of the International Conference on and Analysis (ISSTA). San Jose, CA, USA, 433-436. Software Engineering (ICSE). 106-115. [5] L. C. Briand and D. Pfahl. 2000. Using simulation for assessing the real impact of test-coverage on defect-coverage. IEEE Transactions on Reliability (TR) 49, 1 (2000), 60-70. [6] Xiang Chen, Lijiu Zhang, Qing Gu, Haigang Zhao, Ziyuan Wang, Xiaobing Sun, and Daoxu Chen. 2011. A test suite reduction approach based on pairwise interaction of requirements. In ACM Symposium on Applied Computing (SAC). 1390-1397. [7] Ying Cheng and Haiyan Liu. 2016. A short note on the maximal point-biserial correlation under non-normality. Brit. J. Math. Statist. Psych. 69, 3 (2016), 344-351. redundant mutation operators and test suite prioritization to achieve efficient [8] William S Cleveland. 1979. Robust Locally Weighted Regression and Smoothing Scatterplots. Journal of the American statistical association 74, 368 (Dec. 1979), Software Reliability Engineering (ISSRE). 11-20. 829-836. [9] Jacob Cohen. 2013. Statistical Power Analysis for the Behavioral Sciences. Academic Developer-Provided to User-Provided Tests for Fault Localization and Automated Press. [10] Christoph Csallner and Yannis Smaragdakis. 2004. JCrasher: An Automatic Robustness Tester for Java. Software: Practice and Experience 34, 11 (Sept. 2004), [34] Bob Kurtz, Paul Ammann, Jeff Offutt, Márcio E Delamaro, Mariet Kurtz, and Nida 1025-âĂŞ1050. [11] Donald E Farrar and Robert R Glauber. 1967. Multicollinearity in Regression Analysis: The Problem Revisited. The Review of Economic and Statistics 49, 1 (1967), 92-107. [12] Joseph L Fleiss and Judith M Tanur. 1971. A Note on the Partial Correlation Coefficient. The American Statistician 25, 1 (1971), 43-45. [13] Gordon Fraser and Andrea Arcuri. 2011. EvoSuite: Automatic Test Suite Gen eration for Object-Oriented Software. In Proceedings of the Joint Meeting of the European Software Engineering Conference and the Symposium on the Foundations of Software Engineering (ESEC/FSE). 416-âĂŞ419. [14] Rahul Gopinath, Carlos Jensen, and Alex Groce. 2014. Code Coverage for Suite Evaluation by Developers. In Proceedings of the International Conference on Softconfounding and suppression effect. Prevention Science 1, 4 (Dec. 2000), 173-181. ware Engineering (ICSE). [15] Mark Gradstein. 1986. Maximal Correlation Between Normal and Dichotomous Variables. Journal of Educational Statistics 11, 4 (Dec. 1986), 259-261. [16] Michael Harder, Jeff Mellen, and Michael D Ernst. 2003. Improving test suites via coverage on test suite effectiveness. In Proceedings of the International Symposium operational abstraction. In Proceedings of the International Conference on Software on Software Testing and Analysis (ISSTA). 57-68. Engineering (ICSE). 60-71. [17] Wolfgang Karl Härdle and Léopold Simar. 2015. Applied Multivariate Statistical Analysis. Springer, Berlin, Heidelberg. [18] F. Hariri, A. Shi, V. Fernando, S. Mahmood, and D. Marinov. 2019. Comparing Mutation Testing at the Levels of Source Code and Compiler Intermediate Rep resentation. In Proceedings of the International Conference on Software Testing, Verification and Validation (ICST). 114-124. [19] M Jean Harrold, Rajiv Gupta, and Mary Lou Soffa. 1993. A methodology for controlling the size of a test suite. ACM Transactions on Software Engineering and Conference on Software Engineering (ICSE). 75-84. Methodology (TOSEM) 2, 3 (1993), 270-285. [20] Miguel A Hernán. 2018. The C-Word: Scientific Euphemisms Do Not Improve Causal Inference From Observational Data. American Journal of Public Health 108, 5 (May 2018), 616-619. [21] Miguel A Hernán, Sonia Hernández-Díaz, Martha M Werler, and Allen A Mitchell. 2002. Causal knowledge as a prerequisite for confounding evaluation: an applica tion to birth defects epidemiology. American journal of epidemiology 155, 2 (Jan. 163-171. 2002), 176-184. [22] Hwa-You Hsu and Alessandro Orso. 2009. MINTS: A general framework and tool for supporting test-suite minimization. In Proceedings of the International Conference on Software Engineering (ICSE). 419-429. [23] Laura Inozemtseva and Reid Holmes. 2014. Coverage Is Not Strongly Correlated With Test Suite Effectiveness. In Proceedings of the International Conference on Software Engineering (ICSE). 435-445. [24] Marko Ivanković, Goran Petrović, René Just, and Gordon Fraser. 2019. Code Coverage at Google. In Proceedings of the Joint Meeting of the European Soft ware Engineering Conference and the Symposium on the Foundations of Software Engineering (ESEC/FSE). 955-963. [25] Gareth James, Daniela Witten, Trevor Hastie, and Robert Tibshirani. 2013. An Introduction to Statistical Learning: with Applications in R. Springer, New York, [50] Jane Sachar. 1980. Cautions in the Interpretation of the Partial Correlation NY. [26] Dennis Jeffrey and Neelam Gupta. 2005. Test suite reduction with selective re dundancy. In Proceedings of the International Conference on Software Maintenance [27] Yue Jia and Mark Harman. 2011. An Analysis and Survey of the Development of Mutation Testing. IEEE Transactions on Software Engineering (TSE) 37, 5 (2011), 649-678. [28] James A Jones and Mary Jean Harrold. 2003. Test-suite reduction and prioriti zation for modified condition/decision coverage. IEEE Transactions on Software [29] René Just. 2014. The Major mutation framework: Efficient and scalable mutation [30] René Just, Darioush Jalali, and Michael D. Ernst. 2014. Defects4J: A database of existing faults to enable controlled testing studies for Java programs. In Pro ceedings of the International Symposium on Software Testing and Analysis (ISSTA). 437-440. [31] René Just, Darioush Jalali, Laura Inozemtseva, Michael D. Ernst, Reid Holmes, and Gordon Fraser. 2014. Are mutants a valid substitute for real faults in soft ware testing?. In Proceedings of the Symposium on the Foundations of Software Engineering (FSE). 654-665. [32] René Just, Gregory M Kapfhammer, and Franz Schweiggert. 2012. Using non and scalable mutation analysis. In Proceedings of the International Symposium on [33] René Just, Chris Parnin, Ian Drosos, and Michael D. Ernst. 2018. Comparing Program Repair. In Proceedings of the International Symposium on Software Testing and Analysis (ISSTA). 287-297. Gökçe. 2016. Analyzing the validity of selective mutation with dominator mutants. In Proceedings of the Symposium on the Foundations of Software Engineering (FSE). 571-582. [35] Joseph Lee Rodgers and W Alan Nicewander. 1988. Thirteen Ways to Look at the Correlation Coefficient. The American Statistician 42, 1 (Feb. 1988), 59-66. [36] Jun-Wei Lin, Chin-Yu Huang, and Chu-Ti Lin. 2008. Test suite reduction anal ysis with enhanced tie-breaking techniques. In Proceedings of the International Conference on Management of Innovation and Technology (ICMIT). 1228-1233. [37] David P MacKinnon, Amanda J Fairchild, and Matthew S Fritz. 2007. Mediation analysis. Annual Review of Psycholo