Prior work has shown that the reporting of experimental results in the computing education literature is less than rigorous. In this study, we first examined research standards set forth by four organizations: the American Psychological Association (APA), the American Educational Research Association (AERA), What Works Clearinghouse (WWC), and the CONsolidated Standards of Reporting Trials (CONSORT). We selected the five most important data standards based on their prominence across all four sources and on the most typical study designs in computing education research. We then examined 76 articles designated as quantitative K-12 research studies, published in ten venues between 2012 and 2018, to determine whether the reporting in these articles met these five standards. Findings indicate that only 48% of these articles report effect sizes, and even fewer (11%) report confidence intervals and levels. We found that reported data did not meet the standard that data should be "reported in a way that the reader could construct effect-size estimates and confidence intervals beyond those supplied in the paper". Additionally, authors used existing instruments less than a quarter of the time (24%) and used instruments with evidence of reliability and validity less than half of the time (39%). We conclude with recommendations for the K-12 computing education research community to consider when reporting statistical data in future work, so that the field can raise the rigor of its reporting as it grows.