Industry · 2026-06-20 · 7 min read

10 to 70 on the Same Paper: MBBS Evaluator Inconsistency and India's Medical Education Accountability Gap

When four evaluators award marks ranging from 10 to 70 on the same MBBS answer paper, averaging conceals a systemic problem. The Karnataka High Court's 2026 ruling on RGUHS reveals why medical education evaluation urgently needs structural reform.

The Protest That Raised the Right Question

MBBS students at Rajiv Gandhi University of Health Sciences (RGUHS) in Bengaluru organised protests outside the university demanding a review of their examination marks. Their core complaint was specific and quantifiable: different evaluators assessing the same answer paper had arrived at starkly different scores. The variation was not a matter of a few marks in either direction. Marks from the four evaluators assigned to a single script ranged from as low as 10 to as high as 70.

The university's position — that the system was working as designed, with marks averaged across all four evaluators to produce the final score — did not satisfy students, and for understandable reasons.

An average of 10 and 70 is 40. An average of 45 and 50 is 47.5. A student who receives 40 marks has passed but will likely not qualify for a post-graduate entrance examination programme that requires 50 percent or above. Whether that student's result is 40 or 47.5 may depend entirely on which two evaluators from the pool of four happen to mark their script. That is not evaluation. It is a lottery with a rubric attached.

The Court's Ruling and Its Limits

In 2026, the Karnataka High Court heard a petition from RGUHS students seeking a third round of evaluation — arguing that the existing two rounds and the resulting average had failed to produce a defensible outcome. The court held that it could not override the statutory rules of RGUHS. The university's evaluation framework permits multi-evaluator averaging. The procedure was followed. No legal violation could be established.

The ruling is legally sound. Courts in India have consistently held that they will not interfere with examination evaluation processes that have followed the institution's own prescribed rules. The judiciary's role is not to substitute its judgment for that of examination authorities on the question of how marks should be assigned.

But the ruling also clarifies the structural trap that students in institutions with inadequate evaluation frameworks face. When a university's rules permit a process that produces marks varying by 60 points on the same script, and those rules are followed, students have no legal remedy. The process was not violated. The process is the problem.

Why the Averaging Model Fails

The four-evaluator averaging model sounds rigorous. In practice, it conceals rather than corrects inconsistency.

Consider what the range 10–70 implies statistically. A range of 60 points suggests that evaluators are not assessing the same evidence against the same standard. They are making profoundly different judgments about what constitutes a passing answer, what deserves credit, and how partial knowledge should be scored.

Averaging four such divergent scores does not resolve the disagreement. It masks it. The resulting number — say, 42 — does not represent a consensus judgment about the quality of the answer. It represents a numerical compromise between four incompatible assessments. The student whose marks were averaged from 10 and 70 received 40 marks not because any evaluator believed their work was worth 40, but because the statistical midpoint of two incompatible evaluations happened to fall there.
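A short Python sketch makes the arithmetic concrete. The two mark sets are hypothetical, chosen only to mirror the figures discussed here. What they show is that the mean alone cannot tell close agreement apart from a 60-point dispute.

```python
from statistics import mean

def summarize(marks):
    """Return the mean and the range (highest minus lowest) of a set of marks."""
    return {"mean": mean(marks), "range": max(marks) - min(marks)}

# Hypothetical scripts: one with evaluators far apart, one with evaluators close.
divergent = summarize([10, 70])
consistent = summarize([38, 42])

# Both scripts report the same final mark, but only the range shows
# that the first one rests on two incompatible judgments.
print(divergent)
print(consistent)
```

Any system that stores only the averaged mark discards the range, which is the one figure that signals something went wrong.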

The specific failure mode is the absence of structured moderation. In well-designed double-evaluation systems, a large divergence between evaluators does not produce an average — it triggers a mandatory review. A senior examiner or moderation panel sees both the original evaluations and the answer script, and sets a defensible final mark based on their own reading of the evidence. The question is resolved rather than arithmetically disguised.

RGUHS's multi-evaluator model does not include this step. When evaluators diverge widely, the average stands.

What Makes Medical Education Different

Evaluation inconsistency is a systemic problem across Indian higher education — not unique to medical colleges. But medical education carries stakes that make the problem more acute.

A student who receives incorrect marks in a B.Com examination faces a real harm: delayed graduation, a missed placement opportunity, an incorrect academic record. These are serious, but the harm is bounded.

A student who receives incorrect marks in an MBBS final examination may be denied a passing grade and thus a medical license. Their career is affected at its most formative stage. That student, if they ultimately pass and practise, has been through an evaluation process whose integrity cannot be verified. And the evaluation system that produced their marks is the same one evaluating every other student who will eventually be licensed to treat patients.

This is not to suggest that incorrect marks are common or that unsafe doctors are being produced. Most MBBS evaluation, even in manual systems, produces broadly accurate results. But the inability of the evaluation process to demonstrate its accuracy — to show, with evidence, that marks were assigned consistently and verifiably — is a structural deficiency that does not belong in the credentialing system for any licensed profession.

The Technical Architecture That Addresses This

The RGUHS situation is a textbook case for what double-blind evaluation with statistical moderation is designed to prevent.

Evaluator isolation is the foundational requirement. When evaluators cannot see each other's marks before submitting their own, each assessment is genuinely independent. The RGUHS system's multi-evaluator approach fails this standard if evaluators at a shared valuation camp can observe each other's scoring.
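One way to picture isolation is a sealed mark sheet: marks can be submitted once per evaluator, and nothing can be read back until every assigned evaluator has submitted. The following is a minimal sketch under that assumption. The class name `SealedMarkSheet` and its methods are hypothetical and do not describe any specific platform.

```python
class SealedMarkSheet:
    """Holds marks for one script; nothing is readable until all evaluators submit."""

    def __init__(self, evaluator_ids):
        self._expected = set(evaluator_ids)
        self._marks = {}

    def submit(self, evaluator_id, marks):
        # Only evaluators assigned to this script may submit, and only once.
        if evaluator_id not in self._expected:
            raise PermissionError("evaluator is not assigned to this script")
        if evaluator_id in self._marks:
            raise ValueError("marks already submitted; resubmission is not allowed")
        self._marks[evaluator_id] = marks

    def reveal(self):
        # Marks stay sealed while any assigned evaluator is still outstanding.
        missing = self._expected - set(self._marks)
        if missing:
            raise PermissionError("marks stay sealed until every evaluator has submitted")
        return dict(self._marks)
```

Blocking resubmission matters as much as blocking early reads: an evaluator who could revise a mark after seeing a colleague's would no longer be independent.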

Threshold-based moderation converts extreme divergence from an averaging problem into a review requirement. If evaluator A gives 15 and evaluator B gives 70 on the same script, the platform does not average these marks and record 42.5. It flags the script as requiring moderation review, suppresses the marks from both evaluators, and routes the script to a senior examiner who reviews it independently. The final mark is the moderation decision, not the average of an irreconcilable pair.
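The rule described above can be sketched in a few lines of Python. The threshold of 15 marks is an assumed value for illustration only, and the names `resolve` and `Outcome` are hypothetical. Averaging the marks when they fall within the threshold is also an assumption, since an institution could define that step differently.

```python
from dataclasses import dataclass
from statistics import mean
from typing import Optional

MODERATION_THRESHOLD = 15  # assumed value; each institution would define its own

@dataclass
class Outcome:
    status: str                   # "final" or "needs_moderation"
    final_mark: Optional[float]   # None until a moderator sets the mark

def resolve(marks, threshold=MODERATION_THRESHOLD):
    """Average marks that agree; withhold the mark and flag the script when they do not."""
    spread = max(marks) - min(marks)
    if spread > threshold:
        # No average is recorded; the script goes to a senior examiner.
        return Outcome(status="needs_moderation", final_mark=None)
    return Outcome(status="final", final_mark=mean(marks))

print(resolve([15, 70]))  # spread of 55 exceeds the threshold
print(resolve([45, 50]))  # spread of 5 is within the threshold
```

The important design choice is that the flagged branch returns no mark at all, so a disputed average can never reach the student's record by default.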

Structured rubrics embedded in the evaluation interface reduce the latitude that drives extreme variance. When evaluators see, alongside the answer, a breakdown of sub-question marks and the expected content for each allocation level, their independent assessments tend to converge significantly more than when they evaluate against an unstated internal standard.
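As an illustration, a rubric can be represented as a per-part allocation that the marking interface checks on entry. The sketch below is hypothetical throughout: the part labels, allocations, and descriptions are placeholders rather than content from any real marking scheme.

```python
# Hypothetical rubric for a single 20-mark question.
RUBRIC = {
    "a": {"max_marks": 4, "expected": "placeholder: content expected for part a"},
    "b": {"max_marks": 6, "expected": "placeholder: content expected for part b"},
    "c": {"max_marks": 10, "expected": "placeholder: content expected for part c"},
}

def total_for_question(awarded, rubric=RUBRIC):
    """Validate part-level marks against the rubric and return their sum."""
    if set(awarded) != set(rubric):
        raise ValueError("marks must be entered for every rubric part, and no others")
    for part, mark in awarded.items():
        limit = rubric[part]["max_marks"]
        if not 0 <= mark <= limit:
            raise ValueError(f"part {part!r}: {mark} is outside the range 0 to {limit}")
    return sum(awarded.values())
```

Because the evaluator must enter a mark for each part, a total such as 10 out of 20 has to be justified part by part instead of being assigned as a single overall impression.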

Complete audit trail means that when marks are challenged, the review is not a reconstruction exercise. The platform shows exactly which evaluator gave which marks at which question level, when the script was viewed, and what moderation decision was applied. This evidence is available instantly.
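A minimal sketch of such a trail, assuming an append-only in-memory log, might look like the following. The names `AuditEvent` and `AuditTrail` and the field layout are hypothetical. A production system would also need durable, tamper-evident storage, which this sketch does not attempt.

```python
from dataclasses import dataclass
from datetime import datetime, timezone
from typing import Optional

@dataclass(frozen=True)
class AuditEvent:
    """One immutable record of something that happened to a script."""
    timestamp: datetime
    script_id: str
    actor_id: str
    action: str                  # e.g. "viewed", "marked", "moderated"
    question: Optional[str]      # question or sub-question label, if relevant
    marks: Optional[float]       # marks awarded, if relevant

class AuditTrail:
    def __init__(self):
        self._events = []

    def record(self, script_id, actor_id, action, question=None, marks=None):
        # Events are only ever appended; there is no update or delete method.
        event = AuditEvent(datetime.now(timezone.utc), script_id, actor_id,
                           action, question, marks)
        self._events.append(event)
        return event

    def history(self, script_id):
        """Return every event for one script, in the order it was recorded."""
        return [e for e in self._events if e.script_id == script_id]
```

With records like these, answering a challenge becomes a query over `history(script_id)` rather than a search through paper files.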

The Broader Pattern Across Indian Medical Colleges

RGUHS is not the only institution where this pattern has emerged, nor is this an isolated episode there: MBBS students in Bengaluru have protested at RGUHS over evaluation discrepancies on multiple occasions. Students at North Bengal Medical College in Siliguri raised allegations of marksheet tampering and preferential treatment in internal assessments, resulting in the resignation of two senior officials. The RGUHS case that reached the Karnataka High Court is one of several that have tested the boundaries of judicial intervention in examination evaluation.

The National Medical Commission (NMC), which succeeded the Medical Council of India in regulating medical education, prescribes curriculum frameworks and competency requirements for MBBS programmes. It does not prescribe digital evaluation requirements for affiliated colleges. The decision to implement structured, technology-supported evaluation rests with individual universities and, in practice, rarely reaches the top of the institutional priority list unless a crisis forces it there.

What Institutions Can Do Now

The changes required are both procedural and technological:

Replace multi-evaluator averaging with structured double-blind evaluation followed by threshold-moderated review for all divergences exceeding a defined range

Embed question-level marking rubrics into evaluator interfaces, whether digital or physical — the rubric structure does not require a digital platform to provide value, but digital platforms enforce it most reliably

Define and publish the moderation procedure so that students, evaluators, and institutional leaders all understand what happens when evaluators disagree significantly

Implement digital evaluation to create the audit trail that makes all of the above verifiable rather than merely asserted

The last step enables the first three to be demonstrated rather than claimed. When a student or a court asks how marks were determined, the answer should be producible from the system in minutes — not reconstructed from paper records over weeks.