Validating student growth metrics

Mike Preiner
Mar 23, 2021

Our last post analyzed the impact of a math tutoring program we are running at Lowell Elementary. In particular, we found that students in our program have math growth rates of ~2.0 grade levels per year. This is a big change from their historical growth rates of ~0.6 grade levels per year. In this post I’d like to dig a bit deeper into why we care so much about growth rates, and why they can be tricky to measure. Finally, I’ll show why we think we’ve got a solid estimate of the growth rate for our students. Specifically, we’ve recently added another source of data (mastery data from Khan Academy) that is in close agreement with our other diagnostic data.

Why is growth rate THE thing to measure?

The single most important problem in education is figuring out how to increase student learning. More specifically, we want to figure out how to increase how much students learn over a given amount of time. A single snapshot of test scores isn’t enough. If a student does poorly in 5th grade, is it because they fell behind in 5th grade? Or did they actually fall behind in 3rd grade? To answer that question we need to measure their growth rate on a regular basis.

Similarly, whenever we look at academic programs or interventions, we really care about how much they improve student growth. To illustrate this, let’s imagine two different programs that aim to improve learning in 5th graders:

  • Students in Program A start with an average grade level of 5.5 and end at 5.9: a growth rate of 0.4 grade levels per year.
  • Students in Program B start with an average grade level of 4.5 and end at 5.8: a growth rate of 1.3 grade levels per year.

Even though students in Program A ended the year with a higher average score, Program B was clearly the better program: its students grew more than three times as fast.
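To make the comparison concrete, here is a minimal sketch of the growth-rate arithmetic for the two imaginary programs. The function name `growth_rate` is just an illustration, and it assumes both programs were measured over exactly one year.

```python
def growth_rate(start_level: float, end_level: float, years: float = 1.0) -> float:
    """Grade levels gained per year between two measurements."""
    if years <= 0:
        raise ValueError("years must be positive")
    return (end_level - start_level) / years


# Start and end grade levels taken from the two example programs above.
program_a = growth_rate(5.5, 5.9)
program_b = growth_rate(4.5, 5.8)

print(f"Program A: {program_a:.1f} grade levels per year")
print(f"Program B: {program_b:.1f} grade levels per year")
```

Program A finishes with the higher score, but ranking by `growth_rate` rather than by final score puts Program B first.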

Why is it tricky to measure growth rates?

Unfortunately, measuring growth rates is more difficult than taking a single snapshot. There are a few reasons. The most basic is that it requires at least two measurements. Clearly that can be solved, but there is still the issue of noise. Measuring growth rates requires taking a difference between a beginning score and a final score, and differences tend to be sensitive to measurement noise.
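A small simulation can illustrate why differences are so sensitive to noise. This is only a sketch under assumptions I am making for illustration: each diagnostic has independent, normally distributed error with the same standard deviation (0.3 grade levels here), and the "true" start and end levels are made up. Under those assumptions the noise in the difference is about 1.4 times (the square root of 2) the noise in a single measurement.

```python
import math
import random
import statistics


def difference_noise(sigma: float) -> float:
    """Std. dev. of (end - start) when each score has independent noise sigma."""
    return math.sqrt(2) * sigma


SIGMA = 0.3                      # assumed noise in one diagnostic (grade levels)
TRUE_START, TRUE_END = 3.0, 3.5  # hypothetical true grade levels

rng = random.Random(0)  # fixed seed so the run is repeatable
measured_growth = [
    (TRUE_END + rng.gauss(0, SIGMA)) - (TRUE_START + rng.gauss(0, SIGMA))
    for _ in range(20_000)
]

simulated_noise = statistics.pstdev(measured_growth)
expected_noise = difference_noise(SIGMA)

print(f"true growth:             {TRUE_END - TRUE_START:.2f}")
print(f"noise in one score:      {SIGMA:.2f}")
print(f"noise in the difference: {simulated_noise:.2f} (formula: {expected_noise:.2f})")
```

In this made-up case the true growth is 0.5 grade levels and the noise in the difference is roughly 0.42, so a single student's measured growth could easily come out near zero.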

How reliable are our estimates of student growth?

Given the inherent noise in growth estimates, we want to be cautious when reporting growth numbers for our students. There are two basic ways we can reduce the noise in our measurements:

  1. Average our data (both across students and over time)
  2. Increase the time between our initial and final measurements

To highlight the first point, I’ve plotted the diagnosed overall math grade level for one of the students in our first cohort below. We can see that over the last few months, her grade level has actually decreased. Has she really been going backwards? This particular student has the strongest practice record of the entire cohort: she has practiced every day since early January. Based on what I’ve seen from her, I’m very confident that she is making real progress. On the other hand, I’m pretty sure that her first diagnoses were too high, probably by 0.3 to 0.5 points. The steady decrease of scores since then reflects the fact that her initial scores were too high…not that she was forgetting math! At this point her scores are beginning to increase again, and I expect they will continue to increase for the rest of the year. But based on the noise in the initial assessment, it will take a few months before she officially shows positive growth.

Diagnosed overall math grade level for a single student as a function of time.

Some students, on the other hand, probably had an initial diagnosis that was too low. That would give them initial growth rates that were artificially high.

What is the solution to this problem? By averaging all of our students’ scores together, we’ll average out much of the noise, and thus get a better sense of the actual growth our students are typically seeing. The cohort-average growth rate for our students as a function of time is shown below.

Average growth rate (for our entire first cohort) as a function of time. We are currently averaging a growth rate of approximately 2.0 grade levels per year.

We see that even this student-average growth rate is noisy…it goes up and down over time. If we take the time-average of the signal for the last 2 weeks, we end up with a single, fairly stable average estimate of ~2.0 grade levels of growth per year*.
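The two averaging steps can be sketched in a few lines. The student labels and weekly numbers below are invented purely for illustration (they are not our cohort's data), and the helper names `cohort_average` and `trailing_average` are hypothetical. The idea is to average across students at each point in time, then average the most recent points of that series.

```python
from statistics import mean


def cohort_average(rates_by_student: dict[str, float]) -> float:
    """Average growth-rate estimate across all students at one point in time."""
    return mean(rates_by_student.values())


def trailing_average(series: list[float], window: int) -> float:
    """Average of the most recent `window` points of a time series."""
    if window <= 0 or window > len(series):
        raise ValueError("window must be between 1 and len(series)")
    return mean(series[-window:])


# Invented weekly snapshots: student -> estimated growth rate (grade levels/year).
weekly_snapshots = [
    {"s1": 3.1, "s2": 0.4, "s3": 2.2},
    {"s1": 2.6, "s2": 1.0, "s3": 2.1},
    {"s1": 2.3, "s2": 1.5, "s3": 2.5},
    {"s1": 2.2, "s2": 1.6, "s3": 2.2},
]

# Step 1: average across students for each week.
cohort_series = [cohort_average(snapshot) for snapshot in weekly_snapshots]

# Step 2: average the cohort series over the most recent weeks.
stable_estimate = trailing_average(cohort_series, window=2)

print([round(rate, 2) for rate in cohort_series])
print(f"time-averaged estimate: {stable_estimate:.2f} grade levels per year")
```

Individual students in this toy data swing widely from week to week, while the cohort series moves much less, and the trailing average is steadier still.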

Finally, we do have yet another way to increase our accuracy: simply wait. Over time, as students learn, the difference between the “final” and “beginning” scores will increase, thus increasing the amount of signal we have relative to the amount of noise in our measurement. Because of this, we expect our estimates to continue to become less noisy over time. We can see that in the chart above: the data in the last three weeks is much more stable than in the first three weeks! By the end of the year, I expect we’ll have reliable growth estimates even at the individual student level.
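Here is a rough sketch of why waiting helps. It assumes each diagnostic has independent noise with the same standard deviation (0.3 grade levels is an assumed value, not something we have measured). Under those assumptions the noise in the score difference stays fixed, but the growth rate divides that difference by the elapsed time, so the uncertainty in the rate shrinks as the measurements get further apart.

```python
import math


def growth_rate_noise(sigma: float, elapsed_years: float) -> float:
    """Std. dev. of a growth rate computed from two noisy scores.

    Assumes independent noise of size sigma on each score, so the difference
    has noise sqrt(2) * sigma, which is then divided by the elapsed time.
    """
    if elapsed_years <= 0:
        raise ValueError("elapsed_years must be positive")
    return math.sqrt(2) * sigma / elapsed_years


SIGMA = 0.3  # assumed noise in one diagnostic, in grade levels

for months in (1, 3, 6, 12):
    noise = growth_rate_noise(SIGMA, months / 12)
    print(f"{months:>2} months apart: +/- {noise:.2f} grade levels per year")
```

With these assumed numbers, a one-month gap gives an uncertainty larger than the growth rate itself, while a full year brings it well below one grade level per year.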

Do we have any other data sources?

Unfortunately, Seattle Public Schools cancelled all of their formal assessments this year, so we don’t have any standardized tests as a reference. However, we do have another relevant metric we can use. Right now we are using Khan Academy for much of our content and regular practice. The content is broken out by grade level (and topic), and the students build up an overall “mastery score” for all of the content they correctly complete. What does this look like? We show data for one of our students below.

For this student, we’ve been practicing skills that range from 4th grade level to below 2nd grade level. For simplicity, we just show their mastery of 2nd and 3rd grade skills over time. Between November and the end of March we can see that their total mastery of 2nd and 3rd grade skills has increased by over an entire grade level (~65% in 2nd grade skills and ~40% in 3rd grade skills).

Mastery of 2nd and 3rd grade Khan Academy content over time for a single student. The student is currently in 4th grade, but we’ve done a lot of review of 2nd and 3rd grade material.
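The "over an entire grade level" figure comes from adding up the mastery gains across grades. A minimal sketch of that arithmetic is below. It assumes that 100% mastery of one grade's content counts as one grade level, and the function name is hypothetical.

```python
def mastery_growth_in_grade_levels(mastery_gain_by_grade: dict[int, float]) -> float:
    """Sum fractional mastery gains, treating 100% of one grade as one grade level."""
    for grade, gain in mastery_gain_by_grade.items():
        if not -1.0 <= gain <= 1.0:
            raise ValueError(f"gain for grade {grade} must be a fraction between -1 and 1")
    return sum(mastery_gain_by_grade.values())


# Approximate gains for the student above: ~65% of 2nd grade, ~40% of 3rd grade.
gains = {2: 0.65, 3: 0.40}
total_growth = mastery_growth_in_grade_levels(gains)

print(f"total mastery growth: {total_growth:.2f} grade levels")
```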

Of course, simply completing 2nd grade exercises doesn’t necessarily mean a student’s overall skill level is increasing. A simple way to game the system (in the short term) would be to simply assign content that was far below the student’s actual skill level. They could easily complete the work, and their mastery level would go up! This type of behavior would make the Khan Academy growth estimate artificially high. On the other hand, the students aren’t just learning from Khan Academy. They are still getting regular instruction from their teachers, and we do a lot of extra 1:1 work that is not tracked in Khan Academy. This would tend to make the Khan Academy growth estimate artificially low.

So where does the Khan Academy data end up? If we take the average across our entire first cohort since we began our program, we get a mastery growth of 1.95 grade levels per year, which is surprisingly close to the ~2.0 grade levels per year from our IXL diagnostic results.

What does it all mean?

We’ve already discussed the importance of accurately measuring student outcomes. Based on our current data, we seem to be in a very good position to do that. We’re seeing pretty good agreement between two independent evaluations of student learning (Khan Academy mastery data and IXL diagnostic data). Even better, the high-quality data we’re getting means that we have a good signal for guiding changes to our program. We’re already using this data to test several ideas for further increasing our program impact. More on that to come!

*Our growth rate estimate of 2.0 grade levels per year assumes a 12-month year, since Lowell Elementary runs a summer math program. If we assume a 9-month school year we get a growth rate of 1.5 grade levels per year.
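The conversion between the two conventions is simple arithmetic: the same per-month growth is multiplied by however many months count as a "year". A quick sketch, with a hypothetical helper name:

```python
def annualize(growth_per_month: float, months_per_year: int) -> float:
    """Scale a per-month growth rate to a year of the given length."""
    return growth_per_month * months_per_year


# 2.0 grade levels per 12-month year implies this much growth per month.
growth_per_month = 2.0 / 12

twelve_month_rate = annualize(growth_per_month, 12)
nine_month_rate = annualize(growth_per_month, 9)

print(f"12-month year: {twelve_month_rate:.1f} grade levels per year")
print(f" 9-month year: {nine_month_rate:.1f} grade levels per year")
```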



Mike Preiner

PhD in Applied Physics from Stanford. Data scientist and entrepreneur. Working to close education gaps in public schools.