How well do weekly math assessments match up to standardized tests? (revisited)
Our mission at the Math Agency is to close educational gaps in public schools. Shameless plug: if you are interested in helping support our partner schools as an academic coach, we’d love to hear from you!
One of our major goals at the Math Agency is to demonstrate that we can reproducibly close educational gaps in a way that is clearly visible in public datasets. With that in mind, we want to better understand how our regular internal skill assessments (using IXL) match up to the Smarter Balanced Assessment (SBA), which is what the state of Washington uses to assess student math proficiency every year. One great thing about the SBA is that the results are publicly available.
We’ve previously seen that our IXL skill assessments correlate well with district and state standardized tests. In other words, the higher a student scores on IXL, the higher they tend to score on the SBA. In this post, we’ll look at how well the assessments agree on an absolute scale. As an example, if IXL indicates a student has 5th grade math skills, what are the odds they will test “proficient” in 5th grade on the SBA?
For this analysis we’ll be looking at over 160 students for whom we have Spring 2024 assessment results from both IXL and the SBA. The data is shown below. The SBA results are reported as an arbitrary scale score, while the IXL results are recorded as a student grade level. For example, an IXL score of 3.0 indicates a student is meeting the early 3rd grade standards according to the Common Core system. We show a linear prediction of SBA scores based on the IXL data as a dashed line. There are several important things we can see from the chart below:
- We see good correlation between the two assessments. In other words, students who score higher on IXL also tend to score higher on the SBA. The correlation coefficient is 0.79, which agrees well with our previous estimate.
- As expected, the assessments don’t agree perfectly. There are a few ways to quantify this; we’ll look at model error: how far the actual SBA scores fall from the linear predictions. The median (50th percentile) model error is about 0.3 grade levels. For a more conservative bound, about 10 percent of the time the linear prediction of a student’s SBA score will be too high by 0.6 grade levels or more. (A sketch of this calculation appears after this list.)
- To estimate the agreement between the two assessments on an absolute scale, we show the official SBA “proficiency” levels for different grades as determined by the Washington Office of Superintendent of Public Instruction (OSPI). Roughly, we’d expect the 5th grade proficiency cut-off (measured at the end of 5th grade) to match the skill level of an “end of year” 5th grader, or equivalently a “beginning of year” 6th grader, which corresponds to an IXL score of 6.0.
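For readers who want to see the mechanics, here is a minimal sketch of the fit and error calculation described above. The paired scores are synthetic stand-ins (our student data isn’t included here), and the slope, intercept, and noise level used to generate them are illustrative assumptions, not our fitted values.

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(0)

# Synthetic stand-in for the ~160 paired Spring 2024 scores (real data not shown).
ixl = rng.uniform(2.0, 7.0, size=160)                # IXL grade-level scores
sba = 2300 + 60 * ixl + rng.normal(0, 25, size=160)  # made-up SBA scale scores

# Linear prediction of SBA scale score from IXL grade level (the dashed line).
fit = stats.linregress(ixl, sba)
predicted = fit.intercept + fit.slope * ixl
print(f"correlation r = {fit.rvalue:.2f}")

# Model error, converted from SBA scale points to grade levels via the slope.
error_grades = (predicted - sba) / fit.slope
print(f"median |error|: {np.median(np.abs(error_grades)):.2f} grade levels")
print(f"90th percentile over-prediction: {np.quantile(error_grades, 0.9):.2f} grade levels")
```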
To get a better feel for how well the assessments agree on the definition of “proficient,” we use our linear model to convert the SBA proficiency cut scores into IXL grade-level equivalents. The results are shown below. We see that the two assessments agree fairly well at the 3rd grade proficiency level, but the SBA becomes a progressively “easier” test in 4th and 5th grade.
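The conversion itself is just an inversion of the fitted line. In the sketch below, the slope and intercept are placeholder values standing in for our regression, and the cut scores are illustrative and should be verified against the current OSPI/SBAC tables.

```python
# Placeholder fit parameters standing in for our regression (illustrative only).
slope, intercept = 60.0, 2300.0

# SBA math Level 3 ("proficient") cut scores by grade -- verify against OSPI tables.
cut_scores = {3: 2436, 4: 2485, 5: 2528}

# Invert the linear model: ixl = (sba - intercept) / slope.
for grade, cut in cut_scores.items():
    print(f"grade {grade}: proficiency cut ~= IXL {(cut - intercept) / slope:.1f}")
```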
What does it all mean?
We started this analysis with the goal of predicting the odds of students meeting state proficiency levels, and we now have all the pieces to do it. For example, from Figure 2 we can see that on average, a 3rd grade student with an IXL score of 3.7 would test as proficient on the SBA. However, we may want more confidence than that! If we factor in the model uncertainty from above, we can estimate that a 3rd grader with an IXL score of 4.3, a 4th grader with an IXL score of 4.9, and a 5th grader with a score of 5.5 each have roughly a 90% chance of scoring proficient on the SBA.
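A sketch of that adjustment is below, assuming the average-proficiency equivalents read off Figure 2 and the 0.6-grade-level margin from the error analysis above (the 4th and 5th grade averages are back-calculated from the 90% thresholds quoted here).

```python
# IXL grade-level equivalents of each proficiency cut (read off Figure 2).
avg_proficient = {3: 3.7, 4: 4.3, 5: 4.9}

# 90th-percentile over-prediction margin from the model-error analysis above.
margin = 0.6

for grade, equiv in avg_proficient.items():
    print(f"grade {grade}: ~90% chance of SBA proficiency at IXL >= {equiv + margin:.1f}")
```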
Our current analysis is limited by a relatively small sample size and covers only a single year of data. It is also worth noting that our focus is always on getting students learning as much as possible, regardless of their current skill level! Still, learning more about how our internal data relates to state assessments helps us better understand our progress towards closing educational gaps.