13 Jan 2026

Inter-Rater Variability in CTCAE Grading

Introduction: The Invisible Source of Noise

Inter-rater variability is one of the least discussed yet most influential factors in oncology safety data. Two experienced clinicians can interpret the same clinical scenario differently, particularly at the boundaries between CTCAE grades.

These differences rarely reflect negligence. They emerge from how symptoms are elicited, documented, and interpreted under real-world constraints.

Where Variability Comes From

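CTCAE definitions often hinge on functional impact, such as interference with activities of daily living (ADL). For many terms, the line between Grade 2 and Grade 3 is whether a symptom limits instrumental ADL (shopping, preparing meals) or self-care ADL (bathing, dressing). Patients describe limitations differently, clinicians probe with varying depth, and documentation styles differ across sites. Small differences early in this chain can produce different grades for the same event.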

Additionally, local practice norms influence grading culture. What is considered "moderate" at one site may be considered "severe" at another.

Why Variability Matters

Inter-rater variability introduces noise into trial safety data. It can exaggerate or mask toxicity signals, complicate cross-site comparisons, and increase the burden of data queries. Importantly, variability is rarely random; it clusters by site, role, or experience level, so it does not simply average out as more patients are enrolled.
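One way to check whether grading differences cluster by site is to compare each site's grades with a reference grade for the same event and average the signed difference per site. The sketch below is illustrative only: the function name, the record layout, and the existence of a reference grade (for example, from central adjudication) are all assumptions, not part of CTCAE or any particular trial system.

```python
from collections import defaultdict
from statistics import mean

def mean_grade_shift_by_site(records):
    """Average signed difference (site grade minus reference grade) per site.

    `records` is an iterable of (site, site_grade, reference_grade) tuples.
    This layout is hypothetical and chosen only for illustration.
    """
    shifts = defaultdict(list)
    for site, site_grade, reference_grade in records:
        shifts[site].append(site_grade - reference_grade)
    # A positive mean suggests the site grades higher than the reference,
    # a negative mean suggests it grades lower.
    return {site: mean(diffs) for site, diffs in shifts.items()}

# Made-up example data: (site, site grade, reference grade)
example = [("A", 2, 2), ("A", 3, 2), ("B", 1, 2), ("B", 2, 2)]
shift = mean_grade_shift_by_site(example)
```

A mean close to zero at every site would be consistent with random disagreement, while a consistent positive or negative shift at particular sites points to a systematic difference in grading practice.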

Measuring Rather Than Ignoring

Treating inter-rater variability as inevitable is a missed opportunity. Agreement metrics, targeted training, and structured review processes can identify where grading diverges most. Visibility is the first step toward improvement.
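As an example of an agreement metric, the sketch below computes a linearly weighted Cohen's kappa for two raters who graded the same events. Weighted kappa is one common choice for ordinal scales because a one-grade disagreement counts less than a two-grade disagreement; it is shown here as an illustration, not as a prescribed method. The function name and the sample grades are hypothetical.

```python
from collections import Counter

def weighted_kappa(rater_a, rater_b, grades=(1, 2, 3, 4, 5)):
    """Linearly weighted Cohen's kappa for two raters on an ordinal scale."""
    if len(rater_a) != len(rater_b) or not rater_a:
        raise ValueError("need two equal-length, non-empty grade lists")
    if any(g not in grades for g in list(rater_a) + list(rater_b)):
        raise ValueError("grade outside the allowed scale")

    n = len(rater_a)
    span = max(grades) - min(grades)
    freq_a = Counter(rater_a)  # how often rater A used each grade
    freq_b = Counter(rater_b)  # how often rater B used each grade

    # Observed disagreement: mean normalized grade distance over paired ratings.
    observed = sum(abs(a - b) for a, b in zip(rater_a, rater_b)) / (n * span)

    # Expected disagreement if the raters graded independently
    # while keeping their own grade frequencies.
    expected = sum(
        freq_a[i] * freq_b[j] * abs(i - j) for i in grades for j in grades
    ) / (n * n * span)

    if expected == 0:
        # Both raters used one identical grade throughout; kappa is
        # formally undefined here, and this sketch reports 1.0 by convention.
        return 1.0
    return 1.0 - observed / expected

# Made-up grades from two raters for the same six events.
kappa = weighted_kappa([1, 2, 2, 3, 1, 2], [1, 2, 3, 3, 2, 2])
```

Values near 1 indicate strong agreement and values near 0 indicate agreement no better than chance. Computing the metric separately by adverse event term or by site can show where grading diverges most.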

The Role of Decision Support

Human-in-the-loop tools can reduce variability by consistently surfacing definitions, evidence, and comparable cases. They do not eliminate judgment, but they create a more standardized starting point. This shifts variability from hidden noise to explicit, reviewable decisions.
