Is it possible to completely close elementary school achievement gaps?

Aug 17, 2020

In light of COVID-19-related school closings, there has been a lot of concern about widening achievement gaps. A McKinsey report from June estimated the long-term cost of lost K-12 education in the US (over $100B, assuming an optimistic return to regular schooling in January 2021) and projected a disproportionate impact on students from low-income and minority families.

This is particularly worrying given that in many places educational gaps were already worsening before the pandemic. In Washington state, for example, 8th grade educational gaps for every single disadvantaged group have actually increased since 2015. That is in marked contrast to a 2010 report from OSPI’s Achievement Gap Oversight and Accountability Committee, which estimated that all of Washington’s gaps would disappear within 7–50 years, even if the state did nothing.

Opportunity Gap chart taken from the 2010 “Closing Opportunity Gaps” report from the OSPI’s Achievement Gap Oversight and Accountability Committee, showing educational gaps from 1998 to 2009. Between 2015 and 2019 (not shown) 8th grade educational gaps increased for every disadvantaged group in Washington.

Given our poor history of closing (or even predicting) these gaps, it seems helpful to step back and see if we even have the tools to close them. This post will look at this question in detail. To keep things simple, we’ll start with elementary schools and focus on standardized tests as our measure of outcomes: in particular, proficiency scores on the math portion of the Smarter Balanced Assessment (SBA).

Throughout this post, we’ll highlight some key points and questions. Let’s start with the first:

There are interventions that seem to repeatably improve learning for disadvantaged students.

It turns out that only a relatively small number of interventions and programs have actually been shown to repeatably improve student outcomes. I won’t go through the list in detail; that will be a topic for another day. For now we’ll focus on one of the interventions that tends to have the largest impact and has been reproduced in several different contexts: small-group tutoring.

If you think of it as the extreme limit of small class sizes, it’s not surprising that intensive tutoring can be effective at boosting student learning. There have been a number of randomized controlled trials (RCTs) on various high-quality programs, and studies often find effects of 0.2 to 0.3 standard deviations (SD) on standardized test scores for math tutoring programs. An effect size of 0.25 SD roughly corresponds to a 16% increase in the number of students testing as proficient on the SBA.
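As a rough check on how an effect size in SD maps onto proficiency rates, here is a minimal sketch. It assumes that test scores are normally distributed and that 60% of students are proficient before tutoring. Both are assumptions of this sketch rather than the method behind the figure quoted above, and the function name is hypothetical.

```python
from statistics import NormalDist


def proficiency_after_shift(baseline_rate: float, effect_sd: float) -> float:
    """Proficiency rate after shifting a normal score distribution up by effect_sd."""
    std_normal = NormalDist()
    # z-score of the proficiency cutoff implied by the baseline pass rate
    cutoff_z = std_normal.inv_cdf(1.0 - baseline_rate)
    # Raising every score by effect_sd is equivalent to lowering the cutoff by effect_sd
    return 1.0 - std_normal.cdf(cutoff_z - effect_sd)


baseline = 0.60  # ASSUMED baseline proficiency rate, for illustration only
new_rate = proficiency_after_shift(baseline, effect_sd=0.25)
relative_gain = new_rate / baseline - 1.0

print(f"proficiency: {baseline:.1%} -> {new_rate:.1%}")
print(f"relative increase in proficient students: {relative_gain:.1%}")
```

Under these assumptions the relative gain comes out at roughly 15%, in the same ballpark as the figure quoted above. The result moves with the assumed baseline, so it should be read as an order-of-magnitude check only.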

Because of this body of research, tutoring has gotten a lot of interest as a way to combat the effects of COVID-19-related learning loss. The Brookings Institution recognized this fairly early, and the Washington State Institute for Public Policy (WSIPP) has recently been running a lot of cost-benefit analyses on tutoring.

However, this brings us to our second point:

There are still a lot of gaps in our understanding of how well most interventions (including tutoring) work.

There are clearly many different ways to run tutoring programs, and some of them will be much more effective than others. Also, almost all of the research involves evaluating single interventions over a single year. If we are interested in fully eliminating educational gaps, we are probably going to need to “stack” multiple programs (for example, tutoring and social-emotional learning programs) together, both within a single year and across several years.

One thing that we do know about academic interventions is that the effects tend to gradually fade out over time. For example, if you run a program that boosts scores of participating kindergarten students by 50%, by the time those same students reach 5th grade their scores will still be higher than non-participants, but not by 50%. It is critical to keep this fadeout effect in mind when thinking about stacking interventions across years.

We recently analyzed 5 years of data for every K-12 school in Washington to measure the fadeout effects for both math and English. We then used the results to estimate the impact on SBA math scores of implementing an intensive math tutoring program. The results are shown below for a particular elementary school in northern Seattle with 3 different implementation years: 3rd, 4th, and 5th grade. There are a couple of things to notice in the chart:

The fraction of students that meet math standards decreases as students get older. This is true for WA as a whole, and also for all states that use the SBA. It probably highlights some fundamental issues in the way we prioritize and teach math.
Due to the fadeout effect, the biggest gain for any tutoring program is in the year that it is implemented. In other words, if you want to boost 5th grade scores and are only going to do one year of tutoring, you’ll want the tutoring to happen in 5th grade.

Fraction of students proficient in the SBA math test in 3rd-5th grade for an elementary school in northern Seattle. The “Actual Scores” line shows scores for the cohort that was in 3rd grade in the 2016–17 academic year. The additional lines show the simulated effects of a 1-year intervention in each grade.

I’ve been speaking to a lot of elementary principals lately, and they often think of running programs for specific grades. But in this post we’re interested in completely eliminating our educational gaps. That brings us to an important question:

What would happen if we ran our interventions across multiple years?

To answer this, we can use our measured fadeout coefficients to simulate continuous interventions (in 3rd, 4th, and 5th grades) for a single cohort of students. What does our model show? Instead of roughly 60% of students meeting math standards in 5th grade, almost 90% would meet them. This is enough to fully eliminate all of the educational gaps within this school!
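The mechanics of stacking with fadeout can be sketched as follows. This is not the model used for the figures in this post: the retention factor, the 3rd and 4th grade baseline rates, the geometric form of the fadeout, and the normal-distribution conversion are all hypothetical placeholders, and the sketch tutors every student rather than only struggling ones. It is not expected to reproduce the numbers quoted above.

```python
from statistics import NormalDist

GRADES = [3, 4, 5]
EFFECT_SD = 0.25  # single-year effect, middle of the 0.2-0.3 SD range from the RCTs
RETENTION = 0.5   # HYPOTHETICAL: fraction of an effect that survives each later year
# HYPOTHETICAL baseline proficiency rates; only the roughly 60% in 5th grade is from the post
BASELINE = {3: 0.70, 4: 0.65, 5: 0.60}


def cumulative_effect(tutored_grades, grade):
    """Sum of the faded effects (in SD) of all tutoring years up to and including `grade`."""
    return sum(
        EFFECT_SD * RETENTION ** (grade - tutored)
        for tutored in tutored_grades
        if tutored <= grade
    )


def shifted_rate(baseline_rate, effect_sd):
    """Proficiency rate after shifting a normal score distribution up by effect_sd."""
    std_normal = NormalDist()
    cutoff_z = std_normal.inv_cdf(1.0 - baseline_rate)
    return 1.0 - std_normal.cdf(cutoff_z - effect_sd)


def simulate(tutored_grades):
    """Simulated proficiency rate in each grade for one cohort."""
    return {
        grade: shifted_rate(BASELINE[grade], cumulative_effect(tutored_grades, grade))
        for grade in GRADES
    }


scenarios = {
    "no tutoring": [],
    "3rd grade only": [3],
    "5th grade only": [5],
    "3rd, 4th, and 5th": [3, 4, 5],
}
for name, tutored in scenarios.items():
    rates = simulate(tutored)
    print(name, {grade: round(rate, 3) for grade, rate in rates.items()})
```

Even with these placeholder values, the sketch shows the two qualitative behaviors discussed here: a single year of tutoring helps 5th grade scores most when it happens in 5th grade, and tutoring in every year beats any single year.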

The same measured school data as in the preceding figure, but here we simulate the results of a continuous intervention, in which struggling students in the cohort receive intensive tutoring in 3rd, 4th, and 5th grades.

To put these effects in context, below is a before/after plot for this particular school, where we also show scores for every other school in Washington. The SBA math scores are plotted against the fraction of low-income students at each school, which is the most important demographic predictor of test scores. Today this school already performs well compared to schools with similar low-income fractions. However, stacking the tutoring program across 3 years would make this school one of the highest performing schools in the state, even though almost 80% of its students come from low-income families. This means that programs like the one we are describing would eliminate not only gaps within schools, but also gaps across most schools.

Plot of 5th grade math SBA results versus the fraction of low-income students for each school in Washington for the 2018–19 academic year. Each school is a separate dot, with the size proportional to the number of 5th grade students. The dashed line shows the result of a fit to the entire dataset. We’ve highlighted the same school shown in previous figures, both with its actual scores (blue) and with the simulated results of 3 years of continuous tutoring (orange).

Given the impressive results from our “stacked” interventions, we should ask what potentially risky assumptions we have made in our model.

At this point, our model is making two significant assumptions.

The first key assumption is that we can reproduce the single-year effects demonstrated in previous RCTs. As we mentioned earlier, the research on small-group tutoring is some of the more robust work in the education space. Nevertheless, reproducing the results of educational interventions can be difficult and always represents a risk.

Our second key assumption is that the 2nd and 3rd years of tutoring will have effects similar to the first. This hasn’t been studied much in the literature, and is definitely an assumption we’d want to investigate further.

That brings us to our final question:

Why aren’t we trying this already?

Before going further, I should mention that there are already schools trying this in various ways, shapes, and forms. However, I haven’t seen much in terms of projects designed to create robust, measurable outcomes. It seems like there are two main barriers.

The first barrier is financial. Programs like the one we’ve described cost money. To gauge how much, let’s walk through two financial scenarios: an optimistic scenario and a pessimistic one.

Optimistic scenario: here we’ll assume program structures similar to those used in the RCTs in terms of tutoring hours spent per student and a tutor cost of $20/hr. We’ll also assume that we can accurately predict which students need tutoring, and that we’ll only tutor those students that would not pass the SBA test if they didn’t receive tutoring (this ends up being a little over 70 students a year across 3rd, 4th, and 5th grades). In this case, the continuous tutoring program would cost about $30k/year.

Pessimistic scenario: let’s assume program costs that are twice those in the published research, either because we spend more money per tutor or because we spend more hours on each student. Furthermore, let’s assume that we can’t accurately predict which students need tutoring, and so we end up tutoring twice as many students as actually needed (i.e. half of them would have passed the SBA anyway). In this case, the full program would cost about $120k/year.

So we see that the cost of a continuous tutoring program that fully closes within-school educational gaps would probably be between $30k and $120k/year. That is a lot of money, but to put it in perspective, the school we’ve analyzed throughout this post gets about $750k/year (from various learning assistance programs, Title I, and levy funds) specifically to help disadvantaged students, so this doesn’t seem like a fundamental barrier.
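The arithmetic behind the two scenarios can be sketched as below. The hours of tutoring per student are not stated in this post, so the sketch backs them out from the optimistic figures (about $30k/year, roughly 70 students, $20/hr); that implied value and the function and parameter names are assumptions for illustration.

```python
TUTOR_RATE = 20.0         # $/hr, from the optimistic scenario
STUDENTS = 70             # "a little over 70 students a year"
OPTIMISTIC_COST = 30_000  # "about $30k/year"
TARGETED_FUNDS = 750_000  # yearly funds earmarked for disadvantaged students

# ASSUMPTION: hours per student implied by the optimistic figures, not a published value
implied_hours = OPTIMISTIC_COST / (STUDENTS * TUTOR_RATE)


def annual_cost(students, hours_per_student, rate,
                cost_multiplier=1.0, overtargeting=1.0):
    """Yearly program cost.

    cost_multiplier scales the per-student cost (pricier tutors or more hours);
    overtargeting scales the number of students tutored relative to those who need it.
    """
    return students * overtargeting * hours_per_student * rate * cost_multiplier


optimistic = annual_cost(STUDENTS, implied_hours, TUTOR_RATE)
pessimistic = annual_cost(STUDENTS, implied_hours, TUTOR_RATE,
                          cost_multiplier=2.0, overtargeting=2.0)

for label, cost in [("optimistic", optimistic), ("pessimistic", pessimistic)]:
    share = cost / TARGETED_FUNDS
    print(f"{label}: ${cost:,.0f}/year ({share:.0%} of targeted funds)")
```

The pessimistic case is simply the optimistic one doubled twice: once for per-student cost and once for the number of students tutored. Against the $750k/year of targeted funding, the two scenarios work out to 4% and 16% of the budget.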

The other barrier is operational. Setting up, implementing, and measuring the effects of a program like the one we’ve described is no mean feat. When I’ve spoken to elementary school principals about tutoring, several have mentioned that they often have trouble getting the tutors to even show up. This could be one reason to assume we need to spend more money on the tutors! Also, the analytics required to accurately target students and measure the program efficacy aren’t trivial, and so it isn’t surprising that there are real challenges to making something like this work at scale. However, I think it’s fair to say that this also shouldn’t be an insurmountable barrier.

Wrap Up

To bring it all together, there are three main takeaways from this analysis. I want to mention that we focused on only one particular intervention, and in theory we should be able to get even better results if we stacked several different types together, for example by combining our tutoring with social-emotional learning. Regardless, it seems safe to say that:

There are known interventions that have been shown to repeatably and significantly improve elementary school test scores.
There is still a lot of work to do to fully understand how to make specific implementations repeatably succeed: it will require a combination of detailed operational and analytical work but doesn’t seem unrealistic from a financial perspective.
Stacking these interventions over time should lead to big gains, enough to close educational gaps both within schools and between schools.

In other words, the good news is that it should be possible to significantly or completely close the educational gaps in our elementary schools with known tools. However, it will require a concerted operational and analytical effort, and like most good things, time and money.