The cost-effectiveness of the Math Agency’s 2023–24 program
Given the financial challenges school districts are facing right now, cost-effectiveness is more important than ever when making budgeting decisions. Unfortunately, recent data shows that Washington state is struggling to make sure spending actually delivers for students. In this post we’ll compare our 2023–24 academic coaching program (described here) to cost-effectiveness benchmarks from academic research. The high-level summary is that, compared to standard benchmarks, we had a very large impact on academic growth across nine different cohorts for a moderate cost, which suggests the program was highly cost-effective. If you are interested in the details, keep reading!
The difficult part of measuring cost-effectiveness in education is the effectiveness part; the costs are relatively simple to figure out. The good news is that if student learning is the main goal, standardized tests can provide a solid baseline for measuring the impact of many spending decisions. The general idea is to measure changes in student skill levels to get academic growth, and then compare growth between different cohorts of students to estimate the impact of any specific program.
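To make that concrete, here is a minimal sketch of the idea (the data frame and column names like fall_score and spring_score are made up for illustration; this is not our actual analysis pipeline):

```python
import pandas as pd

# Hypothetical student-level data: one row per student, with fall and
# spring test scores plus a label for program vs. comparison cohort.
scores = pd.DataFrame({
    "student_id":   [1, 2, 3, 4],
    "cohort":       ["program", "program", "comparison", "comparison"],
    "fall_score":   [195, 210, 200, 208],
    "spring_score": [212, 224, 210, 216],
})

# Academic growth = change in skill level from fall to spring.
scores["growth"] = scores["spring_score"] - scores["fall_score"]

# Compare average growth between cohorts to estimate the program's impact.
print(scores.groupby("cohort")["growth"].mean())
```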
Fundamentally, what we’d like to know is: how much did the students in the program learn compared to what they would have learned without the program? Unfortunately, we can’t measure that directly. Students were either in the program or they weren’t. However, with some careful analysis we can often get a good approximation of what we’d like to measure. The key is finding a good comparison set.
Seattle Public Schools (SPS) administers the Measures of Academic Progress (MAP) assessment in both fall and spring. This test provides a good starting point for a comparison set: MAP compares each student’s growth to a benchmark that takes into account both 1) the student’s grade and 2) the student’s starting skill level. Starting skill level is important because research shows that individual student growth is highly correlated from year to year. For example, a student who had low growth in the past will tend to have low growth in the future. This means you don’t want to directly compare students with historically low growth to those with historically high growth when evaluating an intervention program.
Unfortunately, the standard MAP benchmarks aren’t perfect for measuring a program like ours, where we’d like to separate our impact from school-wide effects (like a great principal) or classroom effects (great teachers). However, we can estimate those effects by looking at growth for students who were in the same classrooms as our students but weren’t enrolled in our program. This analysis for the 2023–24 school year is shown below, where we’ve plotted the average “percentage of expected growth” for each cohort. For example, 100% means a student grew exactly as expected based on their grade and starting skill level, and 150% would mean they showed 1.5x the expected growth. It is important to note that the expected growth values are based on what similar students actually learned, as opposed to standards-based approaches that benchmark against what students “ought” to learn.
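Before looking at the figure, here is a rough illustration of how the metric is computed (the expected-growth values come from MAP’s norms and are treated as given; the numbers and column names below are placeholders, not our data):

```python
import pandas as pd

# Hypothetical data: observed fall-to-spring growth for each student, plus
# the benchmark growth expected for a student of the same grade and
# starting skill level.
students = pd.DataFrame({
    "cohort":          ["enrolled", "enrolled", "non-enrolled", "non-enrolled"],
    "observed_growth": [18, 12, 10, 9],
    "expected_growth": [10, 10, 9, 8],
})

# Percentage of expected growth: 100% = exactly as expected, 150% = 1.5x.
students["pct_expected"] = 100 * students["observed_growth"] / students["expected_growth"]

# Cohort averages, as plotted in Figure 1.
print(students.groupby("cohort")["pct_expected"].mean().round(1))
```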
From Figure 1 we can immediately see that:
- In general, both Math Agency and non-Math Agency students averaged more than the expected growth. Specifically, Math Agency students (101 in total) averaged 165% of expected growth! One natural question is why the non-Math Agency students also show higher-than-expected growth. We know that we have great partner schools, and we’d like to think that above-average schools are more likely to want to partner with us :) SPS doesn’t publish its overall results, but there could also be a district-wide trend of higher-than-expected growth. Finally, we suspect our program may generate “spillover” effects, where having more motivated and skilled students makes it easier to teach the entire classroom. This is definitely an area we’d like to explore more in the future.
- In all seven cohorts, students enrolled in the Math Agency showed more academic growth than their non-enrolled classmates. This seems like a good indication that our program is having a consistent, positive impact on student learning.
However, this analysis doesn’t tell us how material these gains are. For example, how unusual is it to see average expected growth over 200% for a group of students? To answer this, we can convert student growth into an effect size, which normalizes the growth difference by the variability of the measurement (its standard deviation). Doing this also lets us compare results from different assessments. For example, Bellevue School District uses the STAR assessment, which is similar to the MAP in that it provides benchmark growth for students in the same grade. However, the STAR assessment doesn’t account for differing initial skill levels, so the methodologies are not identical. We show the results (in standard deviations, and including those for our Bellevue cohorts) below, along with the standard errors based on the differing sample sizes in each cohort.
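As a sketch of the conversion (using a Cohen’s-d-style standardized mean difference and a common large-sample approximation for its standard error; the numbers are placeholders, not our actual data):

```python
import math

def effect_size(mean_enrolled, mean_comparison, pooled_sd, n_enrolled, n_comparison):
    """Standardized mean difference (Cohen's d) with an approximate standard error."""
    d = (mean_enrolled - mean_comparison) / pooled_sd
    # Large-sample approximation for the SE of d; smaller cohorts -> larger SE.
    se = math.sqrt((n_enrolled + n_comparison) / (n_enrolled * n_comparison)
                   + d ** 2 / (2 * (n_enrolled + n_comparison)))
    return d, se

# Placeholder example for a single cohort (growth measured in test-scale points).
d, se = effect_size(mean_enrolled=16.0, mean_comparison=11.0, pooled_sd=10.0,
                    n_enrolled=15, n_comparison=40)
print(f"effect size = {d:.2f} SD +/- {se:.2f}")
```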
This data shows a 0.50 standard deviation (SD) average difference between Math Agency and non-Math Agency students from the same classrooms. Having an estimate for our effect size allows us to compare our 2023–24 results to other research on academic interventions.
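For completeness, one way to combine per-cohort estimates into a single average is an inverse-variance-weighted mean (the weighting below is an illustrative choice, not necessarily the exact calculation behind the 0.50 SD figure, and the numbers are placeholders):

```python
# Per-cohort effect sizes and standard errors (placeholder values).
cohorts = [(0.62, 0.31), (0.45, 0.28), (0.55, 0.35), (0.38, 0.30)]

# Inverse-variance weighting gives more weight to the more precisely
# estimated (typically larger) cohorts.
weights = [1 / se ** 2 for _, se in cohorts]
avg_d = sum(w * d for (d, _), w in zip(cohorts, weights)) / sum(weights)
print(f"average effect size ~ {avg_d:.2f} SD")
```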
The good news is that there have been a lot of studies on the effect sizes of various academic interventions. For this post, we’ll use a nice summary from Matthew Kraft, along with a comparison of intervention costs. For reference, our program typically costs between $2,000 and $3,000 per student to implement. Based on results from over 240 studies, Kraft proposes that any effect over 0.2 SD should be considered “large”, while a cost of $2,500/student falls squarely in the “moderate” cost category. From this we can see that 0.5 SD of extra growth for $2,500/student would be considered very cost-effective. This is consistent with the most recent report from the Education Recovery Scorecard, where the #1 recommendation to districts is to invest their money in “…interventions which have been shown effective, such as tutoring and summer learning”.
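If it helps to see the benchmarking spelled out, here is a tiny sketch. Only the 0.2 SD “large” cutoff and the “moderate” classification of roughly $2,500/student are stated above; the other thresholds are our reading of Kraft’s categories and should be checked against the paper:

```python
def effect_category(d):
    # 0.2 SD and above is "large" per Kraft; the lower cutoff is assumed here.
    if d >= 0.20:
        return "large"
    if d >= 0.05:
        return "medium"
    return "small"

def cost_category(cost_per_student):
    # ~$2,500/student is "moderate"; the exact band edges are assumptions.
    if cost_per_student < 500:
        return "low"
    if cost_per_student <= 4000:
        return "moderate"
    return "high"

print(effect_category(0.50), cost_category(2500))  # -> large moderate
```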
There is at least one really important caveat in this comparison. The most common format for measuring effect sizes is a randomized controlled trial (RCT). Our results come from a simpler observational “A/B” comparison, so we should be cautious with exact comparisons of effect sizes. However, the consistency and magnitude of our impact across all seven cohorts are definitely encouraging.
What is the takeaway?
District standardized tests suggest that students enrolled in our 2023–24 academic coaching program learned a lot: they averaged over 165% of expected growth. Comparing growth for students enrolled in our program to that of students not enrolled suggests an effect size of ~0.5 standard deviations. When we take the cost of the program into account and compare it to research on other academic interventions, the program looks extremely cost-effective. We suspect that cost-effectiveness will become increasingly important in school budgeting decisions in the upcoming years.