What’s needed to observe educational impact in public datasets?
We recently wrote about the success of one of our school partners in closing educational gaps at their school: their remarkable gains in student learning showed up clearly in public data. However, we worked with three public schools last year, and in this post we’ll take a systematic look at the public data from Washington’s OSPI for all three. We’ll also get more explicit about what it takes to make a large enough impact on students to see it in aggregated datasets like those from OSPI.
At a high level, there are three key criteria we need to satisfy to see a clear student impact via public data (we'll sketch how they combine right after the list):
1. Work with a high fraction of at-risk students.
Washington state OSPI data is aggregated by school and grade. This means that if we want to see a significant increase in the percentage of students meeting math standards, we need to work with a significant percentage of the students who are at risk of not meeting those standards. This requires us to:
- Have enough capacity to serve a significant fraction of the target students.
- Successfully identify and enroll the at-risk students.
2. Work with students for a significant fraction of the year.
Academic learning accumulates over time. For most interventions, the overall impact will be proportional to the length of the intervention. In our program, we’ve generally seen a linear relationship between practice time and learning. This also matches our “common sense” view: it’s difficult to have a big impact on students if you only work with them for a short time.
We see enrollment duration issues coming from two areas. The first is when individual students are added to or dropped from the program throughout the year. The second, more structural issue is that programs often start late (after school has been in session for a month or two) and finish before the end of the school year.
3. Significantly increase academic growth while students are enrolled.
Of course, the time spent in an intervention only matters if the program is effective. We’ve spent a lot of time in previous posts discussing our measurements of program effectiveness: you can read a summary of overall impact in 2021–22 here. For the following analysis, we’ll use the students’ academic growth rates while they are enrolled in our program as a measure of effectiveness.
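Before moving on, it's worth making explicit how these three criteria interact: to a first approximation, the aggregate effect you could hope to see in grade-level data scales with the product of all three. Here's a rough back-of-the-envelope sketch of that idea; the function name and the example numbers are purely illustrative assumptions, not values from our program or from OSPI.

```python
# Rough back-of-the-envelope model: the extra growth you'd expect to see
# in grade-level data scales with the product of three factors. Every
# name and number below is an illustrative assumption, not an OSPI value.

def expected_aggregate_effect(frac_at_risk_served: float,
                              frac_of_year_enrolled: float,
                              growth_boost_while_enrolled: float) -> float:
    """Approximate extra growth for a grade's at-risk students, averaged
    across all of them (enrolled or not)."""
    return (frac_at_risk_served
            * frac_of_year_enrolled
            * growth_boost_while_enrolled)

# Example: serve 60% of the at-risk students, for 70% of the year,
# while boosting their growth rate by 50% during enrollment:
print(f"{expected_aggregate_effect(0.60, 0.70, 0.50):.2f}")
# prints "0.21", i.e. roughly a 21% average growth boost for the grade's at-risk students
```

The multiplication is the important part: falling short on any one factor shrinks the whole product, which is exactly the pattern we'll see in the school-by-school data below.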
With these three factors in mind, we'll look at each partner school individually below. First, though, a big caveat: our program is only one component of a student's total learning. There can clearly be significant increases (or decreases) in student growth independent of our work. In the analyses below we'll be checking whether the public assessment data is consistent with our program's observations; we can't comment on causality using the data presented here!
Finally, we’ll see that student growth rates can vary a lot by grade, even within a single school. This is a topic we’ll address in a future post when we look at what can cause variability in program effectiveness.
With all that in mind, let’s look at the data!
School #1.
The table below shows that we worked with a significant fraction of the students at School #1 over the course of the year. However, the school frequently added and dropped students from our program, which means we generally didn't work with any particular student for very long. In this case we were limited by enrollment duration, and also by program effectiveness in the higher grades. We wouldn't expect our impact to show up as a significant increase in the aggregated assessment data.
The public assessment data is consistent with our program observations:
School #2.
This school had the highest growth rate of all our partners. We suspect this was due to a combination of a low student/coach ratio and potential self-selection among participants. However, the program didn't start until January, so we missed a big fraction of the school year. Additionally, we worked with only a very small number of students; too small to have a visible impact across an entire grade. Summary: in this case we worked with too few students (and over too short a time period) to expect to see an increase in the aggregated assessment data from our impact.
The assessment data for this school shows a significant dip during COVID and then some recovery, though scores are still significantly lower than pre-COVID. Regardless, it is clear that our program wouldn't have influenced these outcomes significantly… well, not by more than 6%, anyway!
School #3.
At our third school, we had a significant number of the 4th and 5th grade classes enrolled for a reasonable portion of the school year. Those grades also showed significant academic growth. This was thus the only school where we hit all three conditions needed to see a material increase in the aggregated assessment data.
The assessment data here also shows a significant drop during COVID. However, the recovery in 2021–22 (for both 4th and 5th grades) is major: it's among the largest in the state.
What is the takeaway?
Across all three schools, our measured impact metrics are consistent with what we observe in the publicly available OSPI data. More importantly, these metrics highlight a perhaps surprising gap in our program: we need to get better at keeping students enrolled for the entire school year. Even in our best case (School #3), students averaged only 70% of the school year with us; since impact scales roughly linearly with time, getting that to 100% would increase our impact by over 40%! While 100% enrollment time may be a bit of a stretch, it is clear that this type of simple “nuts and bolts” focus on making sure our program spans the full school year has a lot of potential for increasing our impact.
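For anyone who wants the arithmetic behind that “over 40%” figure, it follows directly from the linear-with-time assumption above; the numbers here just restate the 70% estimate from the post.

```python
# Assuming impact scales roughly linearly with enrolled time,
# moving from 70% of the school year to 100% is a 1.0 / 0.7 ≈ 1.43x
# multiplier on impact, i.e. an increase of over 40%.
current_fraction = 0.70   # average enrollment at our best-case school
target_fraction = 1.00    # full school year
relative_increase = target_fraction / current_fraction - 1
print(f"{relative_increase:.0%}")  # prints "43%"
```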