13 Jan 2026
Automation Bias at the Bedside: What CTCAE Workflows Teach Us
When AI says “Grade 2,” will clinicians still argue?
Automation bias is one of the most underappreciated risks in clinical AI. The more often a system gets things “mostly right,” the more likely clinicians are to accept its suggestions without full scrutiny. In CTCAE workflows, that bias can quietly reshape how oncology teams think about toxicity.
How automation bias shows up in CTCAE grading
Consider a clinical decision support tool that reads oncology notes and labs, then suggests CTCAE terms and grades. It highlights: “Diarrhea – Grade 2 (probability 0.91).” The research nurse glances at the note, sees the suggestion, and clicks accept. After dozens of similar encounters, it feels safe and efficient.
But what happens when:
The documentation is incomplete?
The patient’s baseline function is unusual?
The toxicity sits exactly at the boundary between Grade 2 and Grade 3?
If the AI has a strong opinion and the human is rushed, the suggestion may carry the day—even when a fully engaged expert might disagree.
Over time, this drip of deference can change behavior. Instead of using AI as a tool, clinicians begin to work for the AI, curating documentation to match its expectations and hesitating to override it.
Deskilling: losing the tacit knowledge of toxicity
CTCAE grading is partly technical, partly tacit. The technical part is in the tables; the tacit part lives in pattern recognition built over years of practice.
Automation bias threatens that tacit component. If human graders rarely walk through the full reasoning path—read the notes, reconstruct the story, cross-check CTCAE definitions—their ability to do so may erode. New clinicians trained in an AI-heavy environment might never fully develop those skills at all.
The risk is not that humans become unable to click the right dropdown; it is that they become less able to spot when the AI is wrong, incomplete, or blind to context.
Designing CTCAE AI to resist automation bias
We cannot wish automation bias away. We have to design against it.
Several design choices make a difference:
Friction for acceptance, ease for modification
Accepting a suggestion should require at least a glance at the underlying evidence. Editing or downgrading should be at least as easy as accepting, ideally easier. Interfaces that make overrides painful will amplify bias.
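As a rough illustration of that asymmetry, here is a minimal sketch in Python. The `SuggestionState` class and its `evidence_viewed` flag are hypothetical; the assumption is simply that the interface records whether the supporting note excerpts or labs were opened before an accept is allowed, while an override always goes through in one step.

```python
from dataclasses import dataclass

@dataclass
class SuggestionState:
    """Hypothetical record of one AI suggestion and what the clinician has seen."""
    term: str
    grade: int
    evidence_viewed: bool = False  # set by the UI when supporting evidence is opened

def accept(state: SuggestionState) -> str:
    # Acceptance carries the most automation-bias risk, so it is the one
    # action gated on having looked at the evidence.
    if not state.evidence_viewed:
        return "blocked: open the supporting evidence before accepting"
    return f"accepted: {state.term} Grade {state.grade}"

def override(state: SuggestionState, new_grade: int) -> str:
    # Overrides are deliberately frictionless: a single call, no gate.
    return f"overridden: {state.term} Grade {state.grade} -> Grade {new_grade}"

s = SuggestionState(term="Diarrhea", grade=2)
print(accept(s))                 # blocked until evidence is viewed
s.evidence_viewed = True
print(accept(s))                 # now allowed
print(override(s, new_grade=3))  # always one step
```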
Visible uncertainty, not false certainty
Instead of presenting a single grade as “the answer,” systems should show uncertainty ranges or alternate plausible grades (“Grade 2 or 3 – borderline, review required”). This cues clinicians that their input is genuinely needed.
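A minimal sketch of that presentation, assuming the model exposes per-grade probabilities; the 0.15 margin used to call a case borderline is illustrative, not a validated threshold.

```python
def present_grades(probs: dict[int, float], margin: float = 0.15) -> str:
    """Render per-grade probabilities without collapsing them to one 'answer'.

    Assumes at least two candidate grades; 'margin' is an illustrative
    cutoff for flagging borderline cases for mandatory review.
    """
    ranked = sorted(probs.items(), key=lambda kv: kv[1], reverse=True)
    (g1, p1), (g2, p2) = ranked[0], ranked[1]
    if p1 - p2 < margin:
        return f"Grade {g1} or {g2} - borderline (p={p1:.2f} vs {p2:.2f}), review required"
    return f"Grade {g1} (p={p1:.2f}); next most likely: Grade {g2} (p={p2:.2f})"

print(present_grades({1: 0.05, 2: 0.52, 3: 0.41, 4: 0.02}))  # borderline case
print(present_grades({1: 0.04, 2: 0.91, 3: 0.05}))           # clearer case
```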
Periodic “AI-off” sampling
Governance can require that a subset of encounters be graded without AI suggestions, then compared. This both monitors deskilling and reinforces humans’ ability to operate independently.
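One way to implement that sampling is to hash a stable encounter identifier, so assignment to the “AI-off” arm is deterministic, reproducible, and auditable. The `ai_off_arm` helper and the 5% rate below are a sketch, not a prescribed policy.

```python
import hashlib

def ai_off_arm(encounter_id: str, fraction: float = 0.05) -> bool:
    """Route a fixed fraction of encounters to grading without AI suggestions.

    Hashing the encounter ID means the same encounter always lands in the
    same arm, so the sample can be audited after the fact.
    """
    digest = hashlib.sha256(encounter_id.encode()).hexdigest()
    bucket = int(digest[:8], 16) / 0xFFFFFFFF  # map hash to [0, 1]
    return bucket < fraction

encounters = [f"enc-{i:04d}" for i in range(1000)]
blinded = [e for e in encounters if ai_off_arm(e)]
print(f"{len(blinded)} of {len(encounters)} encounters graded without AI suggestions")
```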
Feedback loops that highlight disagreement
When clinicians often disagree with certain types of suggestions, the system should learn from that—not by forcing conformity, but by surfacing patterns for review.
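What that surfacing might look like, assuming a simple audit log of (term, suggested grade, accepted) events: suggestion types with enough volume and a high override rate are flagged for human review rather than fed straight back into training. The minimum-volume and override-rate thresholds are placeholders a governance committee would set.

```python
from collections import defaultdict

def disagreement_report(events, min_n=20, threshold=0.25):
    """Flag (term, grade) suggestion types that clinicians frequently override.

    'events' is an iterable of (term, suggested_grade, accepted) tuples - a
    simplified stand-in for whatever audit log the real system keeps.
    """
    totals = defaultdict(int)
    overrides = defaultdict(int)
    for term, grade, accepted in events:
        totals[(term, grade)] += 1
        if not accepted:
            overrides[(term, grade)] += 1
    flagged = [
        (key, n, overrides[key] / n)
        for key, n in totals.items()
        if n >= min_n and overrides[key] / n >= threshold
    ]
    return sorted(flagged, key=lambda x: x[2], reverse=True)

# Illustrative log: "Diarrhea Grade 2" suggestions are overridden often.
log = (
    [("Diarrhea", 2, i % 3 != 0) for i in range(60)]
    + [("Fatigue", 1, True) for _ in range(40)]
)
for (term, grade), n, rate in disagreement_report(log):
    print(f"{term} Grade {grade}: {rate:.0%} overridden across {n} suggestions")
```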
CTCAE as an anchor for retaining expertise
CTCAE’s structured nature can actually help preserve expertise if used correctly. Because definitions are explicit, they can serve as a shared reference point for calibrating human and AI behavior:
Regular calibration sessions can compare human-only, AI-only, and joint grading.
Disagreements can be mapped back to specific CTCAE wording, improving both training and model design.
Governance committees can track how often humans override AI and for which terms, turning automation bias into a measurable phenomenon.
Rather than silently pulling clinicians into alignment, AI becomes a tool that makes variability visible and tractable.
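To make those calibration sessions and override counts concrete, here is a small sketch that compares hypothetical human-only, AI-only, and joint grading arms and reports exact agreement per CTCAE term. The data shape is assumed for illustration; real sessions would also record the specific CTCAE wording behind each disagreement.

```python
from collections import defaultdict

def agreement_by_term(cases):
    """Per-term exact-agreement rates between grading arms.

    'cases' is an iterable of (term, human_only, ai_only, joint) grades
    from a calibration session - an assumed shape, not a real dataset.
    """
    counts = defaultdict(lambda: {"n": 0, "human_vs_ai": 0, "joint_vs_human": 0})
    for term, human, ai, joint in cases:
        c = counts[term]
        c["n"] += 1
        c["human_vs_ai"] += int(human == ai)
        c["joint_vs_human"] += int(joint == human)
    return {
        term: {
            "n": c["n"],
            "human_vs_ai": round(c["human_vs_ai"] / c["n"], 2),
            "joint_vs_human": round(c["joint_vs_human"] / c["n"], 2),
        }
        for term, c in counts.items()
    }

calibration = [
    ("Diarrhea", 2, 2, 2),
    ("Diarrhea", 3, 2, 2),  # joint grade drifted toward the AI suggestion
    ("Fatigue", 1, 1, 1),
]
for term, stats in agreement_by_term(calibration).items():
    print(term, stats)
```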
The real adversary: unexamined reliance
AI is neither ally nor adversary by nature. The adversary in CTCAE workflows is unexamined reliance—using AI without mechanisms to detect when it is distorting human judgment.
We should expect automation bias and deskilling pressures. Then we should build CTCAE systems, dashboards, and governance policies that blunt those pressures, reinforce human accountability, and continuously test whether expertise is being maintained.
When disagreement is easy, safe, and expected, clinicians are far more likely to keep arguing with “Grade 2” when it feels wrong. That is exactly what patient safety requires.
Marc Saint-jour, MD