Industry · 2026-03-23 · 8 min read

Different Paper Sets, Different Futures: How Digital Moderation Solves the Difficulty Disparity Problem

CBSE Class 12 Physics and Class 10 Maths papers sparked outrage over unequal difficulty across sets. A PIL has been filed. Here's how digital evaluation with real-time moderation can detect and correct scoring anomalies before results are declared.

The Controversy

In March 2026, students taking CBSE Class 12 Physics and Class 10 Mathematics board exams walked out of examination halls with very different experiences — depending on which question paper set they received.

Students who got Set 1 of the Physics paper described it as straightforward and aligned with the syllabus. Students who got Set 2 or Set 3 described questions that felt like JEE Main or JEE Advanced level — significantly harder than what a board exam should contain.

The disparity was not subtle. Social media exploded with comparisons. Students posted specific questions side by side, highlighting the difficulty gap. The hashtag trended nationally.

Educator Prashant Kirad published a viral video calling out the disparity and subsequently filed a Public Interest Litigation (PIL) in court, demanding lenient marking, grace marks, and uniform difficulty standards across paper sets. As of this writing, CBSE has not confirmed any grace marks, and the court's response is awaited.

This is not the first time paper set difficulty has been controversial. But it is the first time it has happened in a year when CBSE is also rolling out On-Screen Marking — a system that, if used properly, has the tools to detect and address exactly this kind of problem.

Why Multiple Paper Sets Exist

Examination boards use multiple paper sets for a valid reason: exam security. If every student in the country receives the same question paper, a leak at any single centre compromises the entire examination. Multiple sets — with the same syllabus coverage, mark distribution, and topic weightage but different specific questions — limit the damage of any individual leak.

The principle is sound. The challenge is ensuring that all sets are equivalent in difficulty.

How Difficulty Calibration Works (In Theory)

Paper setting committees are supposed to ensure equivalence through:

  • Blueprint adherence — All sets follow the same blueprint: same number of easy, moderate, and difficult questions; same topic distribution; same mark allocation
  • Expert review — Senior subject experts review all sets to verify comparable difficulty
  • Statistical moderation — If post-exam analysis shows one set was harder, marks can be moderated (adjusted upward) for students who received that set

In practice, the first two steps are subjective. Two experienced physics professors may genuinely disagree on whether a particular question is "moderate" or "difficult." And the third step — statistical moderation — only works if the examination system has the data infrastructure to detect disparities and the policy framework to act on them.

The Problem With Paper-Based Detection

In traditional paper-based evaluation, detecting difficulty disparity is slow and reactive:

  • Evaluation happens over 3–4 weeks — Marks are entered on physical award lists, which are then manually compiled
  • Statistical analysis happens after compilation — Only once all marks are entered into the result processing system can anyone compare average marks across paper sets
  • By this point, results are nearly ready — Any moderation must be rushed, and the decision to apply grace marks becomes a last-minute political judgment rather than a data-driven correction
  • The analysis is coarse — Paper-based systems track total marks per student, not marks per question. It is difficult to identify exactly which questions caused the disparity

This is why grace marks decisions so often feel arbitrary to students. The data to make precise, fair adjustments is simply not available in time.

How Digital Evaluation Changes This

In a digital evaluation system, every mark is entered per question, in real time, and is immediately available for analysis. This creates a fundamentally different capability for detecting and addressing difficulty disparity.

Real-Time Score Distribution Monitoring

As evaluators mark answer sheets, the system continuously calculates:

  • Average marks per question — across all students, across all paper sets
  • Score distribution per set — Is Set 2 producing a lower average than Set 1?
  • Question-level difficulty analysis — Which specific questions in Set 3 are producing lower scores than their counterparts in Set 1?

This analysis does not happen after evaluation is complete. It happens during evaluation, updating with every mark entry. A chief examiner can see, on day 3 of evaluation, that Set 3 Question 14(b) has an average score of 1.2 out of 5, while the equivalent question in Set 1 averages 3.8 out of 5. That is a signal — either the question is genuinely harder, or the marking scheme needs clarification.
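
As a rough sketch of the computation involved: per-question and per-set averages can be recomputed from mark-entry records as evaluation progresses. The records and column names below (paper_set, question_id, marks_awarded) are illustrative assumptions for this sketch, not the schema of any particular platform.

```python
import pandas as pd

# Illustrative mark entries as they might look partway through evaluation.
# Column names and values are assumptions for this sketch only.
marks = pd.DataFrame([
    {"student_id": "S001", "paper_set": "Set 1", "question_id": "14(b)", "marks_awarded": 4.0},
    {"student_id": "S002", "paper_set": "Set 1", "question_id": "14(b)", "marks_awarded": 3.5},
    {"student_id": "S003", "paper_set": "Set 3", "question_id": "14(b)", "marks_awarded": 1.0},
    {"student_id": "S004", "paper_set": "Set 3", "question_id": "14(b)", "marks_awarded": 1.5},
])

# Average marks per question, per paper set, refreshed with every new entry.
per_question = (
    marks.groupby(["paper_set", "question_id"])["marks_awarded"]
         .agg(avg_marks="mean", scripts_marked="count")
)

# Overall score distribution per set.
per_set = marks.groupby("paper_set")["marks_awarded"].agg(["mean", "std", "count"])

print(per_question)
print(per_set)
```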

Automated Anomaly Alerts

The system can be configured to automatically flag when score distributions across paper sets diverge beyond acceptable thresholds:

  • Set-level alert — "Set 3 Physics average is 12% below Set 1. Investigation recommended."
  • Question-level alert — "Question 22 in Set 2 has a 40% lower average than corresponding questions in other sets."
  • Evaluator-specific alert — "Evaluator X is marking Set 3 papers 15% lower than other evaluators marking the same set." (This distinguishes between genuine difficulty disparity and individual evaluator bias.)

These alerts reach the chief examiner and moderation team while evaluation is still in progress — not after results are compiled.
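
A minimal sketch of how a set-level threshold check might look, assuming running set averages are already being tracked. The 5% threshold and the sample averages are placeholder values; a real platform would expose this as configuration rather than code.

```python
def set_level_alerts(set_averages: dict, threshold: float = 0.05) -> list:
    """Flag paper sets whose running average falls more than `threshold`
    (a fraction, e.g. 0.05 = 5%) below the highest-scoring set."""
    baseline_set = max(set_averages, key=set_averages.get)
    baseline = set_averages[baseline_set]
    alerts = []
    for paper_set, avg in set_averages.items():
        gap = (baseline - avg) / baseline
        if paper_set != baseline_set and gap > threshold:
            alerts.append(f"{paper_set} average is {gap:.0%} below {baseline_set}. "
                          "Investigation recommended.")
    return alerts

# Illustrative running averages partway through evaluation.
print(set_level_alerts({"Set 1": 48.2, "Set 2": 46.9, "Set 3": 42.4}))
```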

Data-Driven Moderation

When disparity is detected, digital evaluation provides the data to moderate precisely:

Question-level moderation: If Question 14(b) in Set 3 is demonstrably harder than its counterpart in Set 1, marks for that specific question can be moderated — not a blanket grace mark for the entire paper, but a targeted adjustment for the specific source of disparity.
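
A sketch of what such a targeted adjustment could look like in code. The uplift value is whatever the moderation committee decides on the evidence; the +1.5 figure and the 5-mark cap here are purely illustrative.

```python
def moderate_question(marks_awarded: float, uplift: float, question_max: float) -> float:
    """Apply a targeted uplift to one question's marks, capped at the question maximum.
    The uplift value comes from the moderation committee's analysis, not from the code."""
    return min(marks_awarded + uplift, question_max)

# Illustrative: +1.5 marks to every Set 3 attempt of Q14(b), which carries 5 marks.
print(moderate_question(1.0, uplift=1.5, question_max=5))   # 2.5
print(moderate_question(4.5, uplift=1.5, question_max=5))   # capped at 5
```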

Set-level normalization: If overall analysis confirms that Set 3 was harder, the system can apply statistical normalization to Set 3 scores — adjusting them upward based on the measured difficulty differential, ensuring that a student who scored 60 on Set 3 is not disadvantaged compared to a student who scored 60 on Set 1.
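
One simple way to compute such an adjustment is linear equating, which maps a harder set's scores onto the reference set's mean and spread. This is a sketch of one possible model, not the method any board prescribes; the sample score lists are made up, and a 70-mark theory paper is assumed.

```python
import statistics

def linear_equate(score: float, harder_set_scores: list, reference_scores: list,
                  paper_max: float = 70.0) -> float:
    """Map a score from the harder set onto the reference set's scale by matching
    means and standard deviations (basic linear equating). One possible model only."""
    mu_h, sd_h = statistics.mean(harder_set_scores), statistics.stdev(harder_set_scores)
    mu_r, sd_r = statistics.mean(reference_scores), statistics.stdev(reference_scores)
    adjusted = mu_r + (score - mu_h) * (sd_r / sd_h)
    return round(min(max(adjusted, 0.0), paper_max), 1)

# Illustrative samples of total scores from each set (made-up numbers).
set3_totals = [35, 42, 44, 48, 51, 55, 60, 63]
set1_totals = [41, 47, 50, 53, 57, 60, 64, 68]

# A student who scored 60 on the harder Set 3 is mapped onto the Set 1 scale.
print(linear_equate(60, set3_totals, set1_totals))
```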

Transparent audit trail: Every moderation decision — why it was made, what data supported it, what adjustment was applied — is logged in the system. If challenged legally (as CBSE is now facing with the PIL), the institution can produce a complete evidence trail showing that the moderation was systematic, data-driven, and fair.
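
A sketch of what a single audit-trail entry might capture, written as an append-only JSON line. The field names and figures are illustrative assumptions, not a real platform's schema.

```python
import json
from dataclasses import dataclass, asdict
from datetime import datetime, timezone

@dataclass
class ModerationRecord:
    # Field names are illustrative, not any platform's actual schema.
    exam: str
    paper_set: str
    scope: str          # "question" or "set"
    target: str         # e.g. "Q14(b)" or "entire paper"
    evidence: str       # data that triggered the decision
    adjustment: str     # what was applied
    approved_by: str
    timestamp: str

record = ModerationRecord(
    exam="Class 12 Physics",
    paper_set="Set 3",
    scope="question",
    target="Q14(b)",
    evidence="Set 3 average 1.2/5 vs Set 1 average 3.8/5",
    adjustment="+1.5 marks to all Set 3 attempts of Q14(b), capped at 5",
    approved_by="Moderation committee (Physics)",
    timestamp=datetime.now(timezone.utc).isoformat(),
)

# One JSON line per decision gives a reviewable, append-only trail.
print(json.dumps(asdict(record)))
```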

What the PIL Is Really Asking For

Prashant Kirad's PIL demands three things:

  • Lenient answer checking for students who received harder paper sets
  • Grace marks to compensate for the difficulty disparity
  • Uniform difficulty standards across paper sets in future examinations

The first two demands are reactive — they address the damage after it has occurred. The third demand is proactive but difficult to enforce perfectly, because difficulty calibration is inherently subjective.

Digital evaluation offers a fourth path that the PIL does not envision: systematic, real-time detection and correction of difficulty disparity as part of the evaluation process itself. Not grace marks applied after public outrage, but statistical moderation applied automatically when the data shows a problem.

This is not a hypothetical capability. Institutions using digital evaluation platforms with moderation workflows can implement this today.

The Moderation Workflow

Here is how a properly configured digital evaluation system handles paper set disparity:

During Evaluation

  • System monitors score distributions across paper sets in real time
  • Alerts fire when set-level or question-level averages diverge beyond the configured threshold (e.g., more than 5% difference in set averages)
  • Chief examiner reviews the flagged data and determines whether the disparity reflects genuine difficulty differences or other factors (e.g., evaluator allocation patterns)

After Evaluation, Before Results

  • Statistical analysis confirms the disparity with full evaluation data
  • Moderation committee reviews the analysis and decides on the appropriate correction method (question-level adjustment, set-level normalization, or a combination)
  • System applies the moderation — automatically adjusting marks for affected students
  • Audit trail records every moderation decision with supporting data

At Result Declaration

  • Results reflect moderated marks — students who received harder paper sets have their scores adjusted to reflect equivalent difficulty
  • Moderation methodology is documented — available for audit, accreditation review, or legal challenge

This entire workflow is impossible in paper-based evaluation — not because the statistical methods do not exist, but because the per-question mark data is not available digitally in time to act on it.

The Deeper Issue: Fairness at Scale

The CBSE paper difficulty controversy is not really about physics questions. It is about a fundamental fairness question that every large-scale examination must answer:

When two students of equal ability receive papers of different difficulty, how do you ensure they receive comparable marks?

Paper-based systems answer this question poorly — with delayed data, coarse adjustments, and politically charged decisions about grace marks. The result is a system where students' futures are affected by the luck of which paper set they receive.

Digital evaluation does not eliminate the possibility of difficulty disparity in paper setting. But it provides the infrastructure to detect disparity early, quantify it precisely, and correct it systematically — before results are declared, before PILs are filed, and before students' futures are compromised.

For Students and Parents

If you are concerned about paper set difficulty affecting your marks:

  • Digital evaluation systems auto-total marks — There will be no totalling errors regardless of which set you received
  • Moderation workflows exist in well-designed digital platforms to detect and correct difficulty disparity
  • Per-question data means adjustments can be precise, not blanket grace marks that do not address the actual problem
  • Ask your board whether they have a documented moderation policy for paper set disparity. If they do not, they should

For Institutions

If your institution uses multiple paper sets (and most should, for security reasons):

  • Ensure your evaluation platform tracks marks per question, not just per paper — This is the foundation of difficulty disparity detection
  • Configure anomaly alerts for set-level and question-level score distributions
  • Establish a documented moderation policy before the evaluation cycle begins — not as a reactive measure after controversy
  • Train your chief examiners on interpreting real-time evaluation analytics
  • Maintain an audit trail of all moderation decisions — this protects the institution legally and demonstrates fairness to accreditation bodies

Related Reading

  • CBSE Introduces On-Screen Marking for Class 12 — The digital evaluation system at the centre of this controversy
  • CBSE's OSM Hit by Technical Glitches — What else went wrong with the March 2026 rollout
  • Understanding Double Valuation — Another moderation mechanism for ensuring fair evaluation

Ready to digitize your evaluation process?

See how MAPLES OSM can transform exam evaluation at your institution.