Meeting Bloom’s 2 sigma challenge with research-validated strategies

Title: Improved Learning in a Large-Enrollment Physics Class (closed access)

Authors: Louis Deslauriers, Ellen Schelew, Carl Wieman.

First Author’s Institution: University of British Columbia

Journal: Science, Vol. 332, Issue 6031, pp. 862-864 (2011).


Physics Education Research and Bloom’s 2 sigma challenge

Discipline-based education research, including physics education research, has identified many pedagogical techniques that outperform traditional lecture methods in terms of gains in student scores on assessments of learning. Even though there are many such techniques, traditional lecturing still dominates. In fact, a recent PerBites post by Emily Kerr discusses efforts by researchers to explore some of the factors that contribute to this state of affairs.

In this blog post we discuss a paper that not only showed that research-validated techniques can improve student learning but also met, and surpassed, a very important challenge in educational psychology: Bloom’s 2 sigma challenge. Educational psychologist Benjamin Bloom published an article in 1984 in which he showed that one-on-one mastery-style tutoring can produce differences in learning at the 2 sigma level compared to traditional methods. He then asked whether it is possible to achieve learning outcomes in group instruction that are at the same level as in one-on-one tutoring. This is the 2 sigma challenge.

Now, the authors of today’s paper didn’t set out to meet the 2 sigma challenge. In fact, their aim was to see whether combining multiple research-validated techniques could lead to improvements in learning compared to traditional lecturing. To this end, the authors conducted an experimental study in which they compared the scores on an assessment taken by two groups: a class based on traditional lectures (the control group) and a class where the instructor combined five research-validated instructional techniques (the experimental group). The fact that the study met the 2 sigma challenge is a testament to the robustness of the methods recommended by physics education researchers.

Participants, study setup and the final test

For their study, the authors selected students who were taking a first-year calculus-based physics class required for all engineering majors. Two of the three sections of this class, each taught by a different instructor, participated in the experiment. One of the sections formed the control group and the other formed the experimental group. The authors carried out the experiment during the 3 hours of lecture time that made up the 12th week (out of 14) of the semester. The topic covered during that week was electromagnetic waves, including standard topics such as plane waves and energy in waves.

Given that the authors wanted to conduct an experiment and not an observational study, they checked how similar or dissimilar the two groups were on various measures of perceptions, behaviors, and knowledge of basic physics. It turns out that the two groups were nearly identical on these measures. The first 5 rows of the table presented in figure 1 (Table 1 in the paper) show this clearly. The number of students enrolled in each section, the scores on the BEMA (Brief Electricity and Magnetism Assessment), the scores on the CLASS (Colorado Learning Attitudes about Science Survey), and the scores on two midterm exams were nearly identical for the two groups.

Figure 1: Table 1 from the paper, showing measures of perceptions, behaviors, and knowledge for students in the control and experimental groups.

So how did the two groups differ? The two groups differed in who taught the 12th week of classes and how they were taught during that week. The control group continued with the same instructor, teaching via PowerPoint slides and using clicker questions for summative assessment (i.e., assessment designed to determine what the students understood after teaching, as opposed to formative assessment, which is used while teaching). The experimental group, on the other hand, saw two major changes. First, a new instructor, namely the first author of the paper, took over the classes, with the second author assisting in teaching them. Second, instead of lecturing, the new instructor used multiple research-validated teaching elements: preclass reading assignments, preclass reading quizzes, in-class clicker questions with student discussions, small-group learning activities, and targeted in-class instructor feedback.

The last element of the study was a test for evaluating student learning in both groups. The test consisted of 12 questions that the instructor of the control group and the first author of the paper agreed upon. It was administered in the first class that followed the 12th week.

More details on the experimental setup

As mentioned earlier, during the week in which the experiment was carried out, the control group continued with the same instructor. The authors mention that the instructor for the control group, who was not among the authors, had many years of teaching experience and had received high student ratings in previous semesters. The instructor continued with the teaching format used in the previous 11 weeks: lectures using PowerPoint slides and clicker questions for summative assessment. The clicker questions were the same in both groups. However, in the control group the instructor used them to gauge how much the students had understood (i.e., as a summative assessment tool), whereas in the experimental group the instructor used them as a tool to actively teach with (i.e., as a formative assessment tool).

The experimental group saw a few changes. The first author of the paper replaced the instructor who had taught the first 11 weeks, and the second author assisted in teaching the classes. In addition, before the first class of the 12th week, the instructor explained to the students why they were going to be taught differently and how research supports such methods (see Emily Kerr’s PerBites post for how the authors have further expanded on this idea). Before each of the 3 lectures that week, the instructor asked the students to complete a reading assignment, 3-4 pages long, and a short true/false quiz related to the reading.

At the beginning of each class, the instructor asked the students to split into groups. The instructor then presented a series of clicker questions; for each question, the students discussed it within their groups and then voted individually on an answer. When voting was complete, the instructor displayed the results and discussed them with the students before moving on to the next clicker question. In addition, the instructor presented a few group tasks that required the students to work on problems based on the topics covered by the reading and the clicker questions. While the students were engaged in the group tasks, the instructor walked around the classroom and gave feedback on what they were discussing. The instructor also took time to answer questions that students raised on their own, both during the clicker-question discussions and during the group tasks.

The authors state that the questions used in the final test had previously been used at another university and were adopted with slight modifications. Two days before the test, students in both groups were given access to all materials used in the experiment, along with answers (except the test questions and answers, of course). The students were informed that their score on the test would not affect their grades, but that it would be good practice for the final exam. The control section had covered material related to all 12 questions, whereas the experimental section had covered material related to only 11 of them. Of the 271 students in the experimental group, 211 took the test, whereas 171 of the 267 students in the control group did. The authors state that this attrition was a continuation of the attendance patterns seen during regular class time.

Results and discussions

Figure 2: Histogram of scores on the final test for students in the two groups (Figure 1 in the paper).

Finally, we can look at the results! It wouldn’t come as a surprise to regular readers of PerBites that the experimental group outperformed the control group on the final test. What might come as a surprise, though, is the degree to which it did: the average score in the control group was 41%, while that in the experimental group was 74% (random guessing would have produced a score of 23%)! Figure 2 shows the actual distributions of scores. The standard deviations for both groups were about 13%, which implies an effect size of 2.5 standard deviations (see this link for more on effect sizes). This is considered a very large effect; for comparison, an effect size of about 0.4 already puts an educational intervention in a good light.
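For readers who want to see where the 2.5 figure comes from, here is the back-of-the-envelope calculation, a rough sketch using the rounded numbers quoted above and taking the effect size as the difference in mean scores divided by the (roughly common) standard deviation of 13%:

\[
d \;\approx\; \frac{\bar{x}_{\text{exp}} - \bar{x}_{\text{ctrl}}}{\sigma} \;\approx\; \frac{74\% - 41\%}{13\%} \;\approx\; 2.5
\]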

The authors offer some explanations for why they might have achieved such a high effect size. To start with, they had deliberately designed the clicker questions and discussions to arouse student interest, not just to engage students intellectually. This increased interest might also explain the fact that both attendance and engagement went up by 20% in the experimental group; see rows 6-9 in figure 1 above. Moreover, this work only looks at one week of student activities, and the final test took place within a few days of the last class. So this work is really measuring the immediate learning gains from the interventions, without any subsequent studying. Whether we would see such a high effect size after, say, two weeks is an open question.

No matter how we interpret the large effect size from this study, a very important takeaway is that instructors can use a collection of research-validated instructional methods to produce enhanced learning, attendance, and engagement in large-enrollment science classes. To make things easier for instructors, the students seem to have liked the interventions! The authors report that they gave the students in the experimental section a survey one week after the experiment. Though only 150 students replied to the survey, 90% of them agreed with the statement “I really enjoyed the interactive teaching technique during the three lectures on E&M waves”.

The large effect size and student acceptance make the intervention reported in this paper something that university instructors can start experimenting with in their classrooms. In fact, Emily Kerr’s PerBites post discusses further work by the authors of this paper that offers concrete steps on how to handle challenges, such as student resistance, that may arise while doing so. The main recommendations are: 1) introduce these methods early in the semester, 2) make students aware that the extra cognitive effort they need to put in is what leads to more learning, 3) give an examination or a quiz as early as possible so that students can gauge their actual learning, 4) use research-based facilitation and explanation strategies throughout the semester, while encouraging students to work hard, and 5) gather frequent feedback from students and respond to their concerns. In light of such scientific support and practical guidance from physics education researchers, there really is no excuse for not using research-validated teaching methods in classrooms.

Figures used under Creative Commons Attribution 4.0 International. The header image is a PerBites original, licensed under Creative Commons Attribution 4.0 International.
