Does Assessing the Quality of Primary Research Matter? A Meta-Analysis of Analogical Instruction-Induced Changes in Students' Science Concepts
Authors: Jin-Chang Hsieh, Yi-Ching Chiu
Research Article
As evidence-based education evolves, a growing body of work has highlighted the problem of constructing high-quality indicators for the synthesis of quantitative research. However, what constitutes an optimal indicator remains unclear and a topic of debate. Many researchers assess the quality of primary research using a single indicator, such as publication type, pretest performance, reliability and validity, or sample size; this approach can introduce bias. This study therefore proposes alternative quality indicators to replace single-indicator assessment in evidence synthesis. Research on analogy instruction was integrated through a systematic review and meta-analysis, and the effects of analogy instruction were examined through quantitative research synthesis.
For the literature review, document retrieval targeted journal articles, master's theses, and doctoral dissertations identified through the references of relevant retrospective studies. Literature on analogy instruction was collected and located systematically to avoid missing relevant articles and introducing sampling bias. The participants in the included studies ranged from preschoolers to elementary, junior high school, vocational high school, and university students. The analogy instruction topics examined included single analogies, bridging analogies, and multiple analogies. Articles that merely commented on or introduced analogy instruction and scientific concept learning performance were excluded from the analysis, as were articles on development at different stages and other irrelevant topics. Because a quantitative meta-analysis was conducted, qualitative studies were also excluded, as were publications that did not provide the information required to calculate effect sizes.
In the meta-analysis, Hedges' g was used as the measure of effect size for standardized mean differences (Hedges, 1982). Because different studies provided different pretest–posttest data, the effect sizes were broadly classified as either mean raw-score effect sizes or covariate-adjusted mean effect sizes (Morris & DeShon, 2002; WWC, 2020b). When information on adjustments to pretest data was provided, including statistics from an analysis of covariance, adjusted means, and standard deviations, the covariate-adjusted mean effect size was used. Following WWC recommendations (2020b), if this information was not provided but other data were (e.g., pretest and posttest means and standard deviations) and the pretest and posttest assessment tools were consistent, a difference-in-differences adjustment was applied to account for pretest–posttest correlations. When no pretest data were provided, or only posttest statistics (e.g., from an analysis of variance) were available, the effect size was calculated from the raw mean scores.
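For reference, the standard formulas underlying these choices can be sketched as follows (the notation is ours; these are the conventional definitions rather than formulas reproduced from the paper). Hedges' g standardizes the posttest mean difference by the pooled standard deviation and applies a small-sample correction:

\[ g = \left(1 - \frac{3}{4\,\mathrm{df} - 1}\right)\frac{\bar{X}_T - \bar{X}_C}{S_p}, \qquad S_p = \sqrt{\frac{(n_T - 1)S_T^2 + (n_C - 1)S_C^2}{n_T + n_C - 2}}, \qquad \mathrm{df} = n_T + n_C - 2. \]

Under the WWC-style difference-in-differences adjustment, the standardized pretest difference is subtracted from the posttest effect size, \( g_{\mathrm{DiD}} = g_{\mathrm{post}} - g_{\mathrm{pre}} \), which removes preexisting group differences when the same assessment tool is administered at both time points.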
Regarding the overall effect of analogy instruction in promoting students' scientific concept learning performance, a medium immediate effect size of 0.59 was obtained; the delayed effect size decreased to 0.39 (medium).
Regarding the moderators of analogy instruction, the regression for single analogies failed to reach significance, although the p-value of 0.08 approached significance. The effect size of analogy instruction for elementary school students was 0.88, whereas that for students in junior high school and above was 0.52, constituting a significant difference.
To identify more appropriate indicators for assessing the quality of primary research, three indicators are proposed herein: baseline equivalence of the pretest, the reliability and validity of assessment tools, and sample size. For baseline equivalence, the effect size did not differ significantly regardless of whether baseline equivalence of the pretest was established. Regarding the selection of effect-size calculations, studies with an excessively high attrition rate (more than 10%) or a pretest difference greater than 0.5 standard deviations were classified as not meeting this indicator. Under this standard, as long as statistics from a covariance analysis or pretest and posttest data were provided, the effect size was estimated using the covariate-adjusted mean score. This appeared to compensate for both types of possible differences, which in turn led to the nonsignificant result.
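As a concrete illustration, the classification rule just described can be sketched in a few lines of Python (our illustration, not the authors' code; the function name and thresholds are taken from the description above):

def meets_baseline_indicator(attrition_rate: float, pretest_diff_sd: float) -> bool:
    # A study fails the indicator if attrition exceeds 10% or the
    # pretest difference between groups exceeds 0.5 standard deviations.
    return attrition_rate <= 0.10 and abs(pretest_diff_sd) <= 0.5

# Example: 15% attrition fails even with a small pretest difference.
print(meets_baseline_indicator(0.15, 0.2))  # False
print(meets_baseline_indicator(0.05, 0.3))  # True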
Regarding the reliability and validity of assessment tools, the regression coefficient β was −0.57 (pβ < 0.0001), and the effect size of 1.06 was large. This result indicates that obtaining a large effect size is easy if the reliability and validity of assessment tools are not addressed. Regarding sample size, the regression coefficient β was −0.81 (pβ = 0.003), demonstrating that a favorable effect size of 1.36 could easily be obtained in small-scale studies of analogy instruction. As Slavin (2008) noted, small-scale studies are likely to yield extremely positive or extremely negative results.
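Slavin's point can be illustrated with a minimal simulation sketch (ours, not the authors'; it assumes a true standardized effect of 0.4 and normally distributed scores):

import numpy as np

rng = np.random.default_rng(0)
TRUE_EFFECT = 0.4  # assumed true standardized mean difference

def simulated_hedges_g(n_per_group):
    # Simulate one two-group study and return its Hedges' g estimate.
    treat = rng.normal(TRUE_EFFECT, 1.0, n_per_group)
    control = rng.normal(0.0, 1.0, n_per_group)
    df = 2 * n_per_group - 2
    pooled_sd = np.sqrt(((n_per_group - 1) * treat.var(ddof=1)
                         + (n_per_group - 1) * control.var(ddof=1)) / df)
    d = (treat.mean() - control.mean()) / pooled_sd
    return d * (1 - 3 / (4 * df - 1))  # small-sample correction

for n in (10, 200):
    estimates = [simulated_hedges_g(n) for _ in range(5000)]
    print(f"n={n:>3} per group: sd of estimates = {np.std(estimates):.2f}, "
          f"range = [{min(estimates):+.2f}, {max(estimates):+.2f}]")

With 10 participants per group, individual estimates commonly range from strongly negative to well above 1, whereas with 200 per group they cluster tightly around the true value; this is why small-scale studies can produce extreme effect sizes in either direction.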
In terms of meeting the three indicators, primary research that met all of them had a medium effect size of 0.49. Similar results were obtained regardless of the quality assessment method used. The effect sizes of studies that did not meet the quality assessment standards exceeded those of studies that did, indicating that the effects of analogy instruction in primary research may be overestimated when quality assessment is not considered.
Significant effects were observed for both analogy instruction topics and learning stages, with regression coefficients β of 0.27 (pβ = 0.03) and −0.41 (pβ = 0.01), respectively. For elementary school learners, the effect size for single analogies was large (0.77), and the effect size for topics other than single analogies was even larger (1.04). For students in junior high school and above, the effect sizes for single and nonsingle analogies were 0.36 and 0.63, respectively, roughly in the lower and upper ranges of a medium effect size. Therefore, different analogy instruction topics had different effect sizes for students at different learning stages, with better results for topics other than single analogies.
Regarding publication type, a multiple-moderator analysis was performed with analogy instruction topics and learning stages. A tendency toward positive effects was observed. When publication type was controlled and the two moderators were examined simultaneously, publication type had little influence on the effects of analogy instruction.
Regarding the baseline equivalence of the pretest, the regression coefficients β for analogy instruction topics and learning stages were 0.07 (pβ = 0.72) and 0.03 (pβ = 0.89), respectively; neither was significant. According to the study design and the information provided, selecting an appropriate effect-size calculation at the outset can help adjust for the comparability of the baseline equivalence of the pretest.
When all three indicators (i.e., baseline equivalence, reliability and validity of assessment tools, and sample size) were met, these conditions affected the analogy instruction topics. When the conditions were not fulfilled, the influence of other moderators could be misestimated, and the effects of analogy instruction could be overestimated.
The conclusions are as follows. First, analogy instruction had an overall positive effect. Second, when the impacts of the quality indicators were not controlled, single analogies were less effective than other analogy instruction topics, such as multiple and bridging analogies. Furthermore, elementary school students performed better, especially when they received instruction on topics other than single analogies. Finally, publication type is not a suitable indicator for the quality assessment of primary research because its impact is confounded with publication bias. However, alternative indicators can be well defined, both theoretically and empirically, through the examination of baseline equivalence, the reliability and validity of assessment tools, and the establishment of basic sample size requirements.