Shrinking the Zone: What MLB Umpires Teach Us About Grading
- Dr. Chad Lang
- Feb 18
- 6 min read
When you live in the Midwest United States, the moment the Super Bowl ends you start seeing social media posts about Spring Training. Pitchers and catchers report to sunny locations to start yet another grueling season of Major League Baseball, and my social media algorithms started delivering baseball content. Scroll long enough and you’ll see them: heat maps comparing human-called strikes to the video-defined strike zone in Major League Baseball. That led me down a rabbit hole of baseball fanatics and statisticians on Reddit.
The red zones.
The yellow misses.
The debates.
From 2007 to 2024, something remarkable happened: umpire accuracy steadily improved. With better pitch-tracking technology, clearer feedback, and systematic review, performance sharpened year after year.
Then 2025 happened.
Accuracy jumped again.
Why?
Because MLB reduced the allowable buffer from ±2 inches around the strike zone to just ±0.75 inches.
And the umpires recalibrated…fast.
Not because the job became easier. But because the definition became clearer.
Clarity Drives Precision
For years, a two-inch buffer defined “close enough.” When that buffer shrank, the margin for interpretation narrowed. Expectations tightened. Feedback became sharper. Calibration became more deliberate. Professionals rose to the standard.
There is a profound lesson here for education.
In many grading systems, our buffer is enormous.
“Mostly understands.”
“Good effort.”
“Close.”
“Shows improvement.”
“Kind of gets it.”
When success criteria are vague, grading drifts. Two teachers can evaluate the same student work and arrive at different conclusions. Students guess at what matters. Feedback becomes general rather than actionable, if it is given at all.
The strike zone in many classrooms is still two inches wide. Worse, some strikes get “called” even outside that zone, depending on which “umpire” a student happens to draw, and that leaves students befuddled.
Standards-Based Grading: Shrinking the Buffer in Schools
Standards-based grading (SBG) is often framed as a structural change: new scales, new gradebooks, new reporting formats. While debates continue about the value of modernizing grading practices such as SBG, I contend (as do others, like Reibel and Link & Guskey) that it is too often implemented before systems are ready, and we miss the point in the first place. At its core, SBG is about calibration. It is about defining proficiency precisely. It is about answering: What does “proficient” actually look like? How is it different from “developing”? What separates surface understanding from deep transfer?
Robert Marzano’s take on proficiency scales emphasizes that learning progressions must be clearly articulated and distinguishable. Without clarity in performance levels, scales collapse into subjectivity. Whether it’s a learning progression, proficiency scale, or rubric, these tools provide clarity about proficiency beyond vague terms or a percentage grade.
When success criteria are specific, teachers calibrate more consistently.
When performance levels are well-defined, students aim more accurately.
When expectations are transparent, trust increases.
Precision Is Not the Same as Validity
An 87.4% feels precise. An 89.2% feels definitive. An 84.67% feels almost scientific.
But stop for a moment. If I asked you to explain exactly what an 87.4% communicates about a student's understanding — could you? Could the student? Could their parents? If that number appeared on a transcript five years from now, would anyone be able to reconstruct what it actually meant about what that child knew and could do?
Because here's what that number often actually represents: a clinical computation of homework completion, participation points, extra credit, behavior penalties, late deductions, tests, and unevenly weighted assignments — all collapsed into a single percentage. As Ken O'Connor argues in A Repair Kit: 15 Fixes for Broken Grades, too many teachers see their role as calculating grades rather than determining them. The gradebook does the math. But math isn't meaning.
Educational researcher Lee Ann Jung puts it plainly: precision is not the same as validity. The number looks exact. The learning signal is anything but.
A tightly defined strike zone doesn't just tell the umpire where the edges are — it tells everyone watching what a strike actually means. Imagine if our grades did the same.
100 Miles Per Hour And More Accurate?
What stunned me most about the 2024 to 2025 improvement in umpiring wasn’t just the numbers. It was the speed. These pitches are coming in at 98… 99… 100 miles per hour.
The human eye has milliseconds to decide. And yet, when the allowable buffer shrank, accuracy improved.
How?
Because the system sharpened what counted. The definition tightened. The feedback loop intensified. Calibration became more deliberate.
When professionals are trained against a clearer target, their perception sharpens, and naturally so will the batters’. While I haven’t come across heat maps for batters, my sense is that they, too, will adjust, much as our students will.
The classroom parallel is profound.
Teaching is fast.
Questions come rapidly.
Student work piles up.
Decisions about understanding happen in real time.
Just like a 100-mph fastball, instructional judgment often happens in milliseconds.
We see learning differently when we consistently:
- Unpack standards into precise learning targets
- Define what proficiency actually looks like
- Study exemplars
- Calibrate scoring with colleagues
- Share success criteria transparently with students
Clarity changes perception.
We stop asking, “Is this good?”
And start asking, “Does this meet the defined level of proficiency?”
The Villain Narrative
Even as accuracy in MLB has improved year after year, umpires are still cast as the villain. Fans boo. Players argue. Slow-motion replays circulate on social media highlighting every miss. And yet, the data tells a different story.
Teachers know this feeling intimately.
Picture this: you've spent hours reviewing student work, made a careful professional judgment about what the evidence shows, and returned it with a grade. Then comes the email. Or the conference. Or the hallway conversation with an administrator holding a progress report or grade card.
"You graded that too harshly." "That's not fair." "My child worked hard."
And you're standing there, the focal point of frustration, absorbing something that was never really about you. Because the student worked hard, and nobody defined for them what hard work was supposed to produce. The standard was loose, or worse yet, non-existent. The performance levels were never clearly articulated. Nobody calibrated collaboratively across classrooms. So when the grade landed, it felt arbitrary. Because to everyone on the outside, it was.
The umpire isn't the problem when the strike zone is a suggestion. And the teacher isn't the problem when proficiency is a feeling.
When course expectations are loosely articulated, when grade-level standards aren't unpacked, and when performance levels aren't calibrated collaboratively, teachers reach for the most available crutch: the electronic gradebook. It offers the appearance of precision (87.4%, 89.2%, 91.67%) while obscuring whether any real learning occurred.
The villain isn't the person making the call. It's the system that never clearly defined what a strike looks like.
Professionals in the Moment
There is a deep similarity between a teacher and an umpire. Both are asked to determine proficiency in real time.
No one seriously argues that, given unlimited slow motion and infinite review, umpires couldn’t approach perfection. Likewise, if a teacher had unlimited time to analyze every student artifact with extended collaborative review, their calibration would tighten dramatically.
The issue is not professional capability. It is the condition of judgment.
The Difference Is in the Review
MLB invested in feedback systems: replay, pitch tracking, narrowly defined buffers, and performance analytics.
The result? Sharper calibration.
In schools, teachers often operate without systematic calibration: limited collaborative scoring time, limited understanding of standards, and a reliance on electronic gradebooks that compute but do not clarify.
We ask teachers to make high-stakes proficiency judgments in real time, yet rarely provide the systems and professional learning to refine them.
What If We Took Calibration Seriously?
Imagine if schools invested in calibration the way MLB invested in umpire accuracy.
- Clear, tightly defined proficiency scales
- Collaborative scoring sessions
- Anchored exemplars at every performance level
- Ongoing feedback about scoring consistency
- A separation of behavior from academic achievement
No parent. No student. No school leader would object to clearer, more consistent determination of learning proficiency.
The resistance rarely comes from clarity. It comes from inconsistency. If most of America’s grading history has made grading a game, we can’t expect students and parents not to play it. But grading is not a game; it’s communication.
Final Thoughts
MLB has shown that even at 100 miles per hour, performance improves when the strike zone tightens and feedback strengthens.
Teachers, like umpires, are the most qualified professionals to determine proficiency in the moment.
Clarity improves accuracy.
Transparency builds trust.
Calibration strengthens professional judgment.
The problem is not the people.
It is the buffer.
Define the zone.
Tighten the buffer.
Invest in calibration.
Then trust professionals (and “players”) to rise to it.