Industry · 2026-03-23 · 8 min read

Different Paper Sets, Different Futures: How Digital Moderation Solves the Difficulty Disparity Problem

CBSE Class 12 Physics and Class 10 Maths papers sparked outrage over unequal difficulty across sets. A PIL has been filed. Here's how digital evaluation with real-time moderation can detect and correct scoring anomalies before results are declared.

The Controversy

In March 2026, students taking CBSE Class 12 Physics and Class 10 Mathematics board exams walked out of examination halls with very different experiences — depending on which question paper set they received.

Students who got Set 1 of the Physics paper described it as straightforward and aligned with the syllabus. Students who got Set 2 or Set 3 described questions that felt like JEE Main or JEE Advanced level — significantly harder than what a board exam should contain.

The disparity was not subtle. Social media exploded with comparisons. Students posted specific questions side by side, highlighting the difficulty gap. The hashtag trended nationally.

Educator Prashant Kirad published a viral video calling out the disparity and subsequently filed a Public Interest Litigation (PIL) in court, demanding lenient marking, grace marks, and uniform difficulty standards across paper sets. As of this writing, CBSE has not confirmed any grace marks, and the court's response is awaited.

This is not the first time paper set difficulty has been controversial. But it is the first time it has happened in a year when CBSE is also rolling out On-Screen Marking — a system that, if used properly, has the tools to detect and address exactly this kind of problem.

Why Multiple Paper Sets Exist

Examination boards use multiple paper sets for a valid reason: exam security. If every student in the country receives the same question paper, a leak at any single centre compromises the entire examination. Multiple sets — with the same syllabus coverage, mark distribution, and topic weightage but different specific questions — limit the damage of any individual leak.

The principle is sound. The challenge is ensuring that all sets are equivalent in difficulty.

How Difficulty Calibration Works (In Theory)

Paper setting committees are supposed to ensure equivalence through:

  • Blueprint adherence — All sets follow the same blueprint: same number of easy, moderate, and difficult questions; same topic distribution; same mark allocation
  • Expert review — Senior subject experts review all sets to verify comparable difficulty
  • Statistical moderation — If post-exam analysis shows one set was harder, marks can be moderated (adjusted upward) for students who received that set

In practice, the first two steps are subjective. Two experienced physics professors may genuinely disagree on whether a particular question is "moderate" or "difficult." And the third step — statistical moderation — only works if the examination system has the data infrastructure to detect disparities and the policy framework to act on them.

The Problem With Paper-Based Detection

In traditional paper-based evaluation, detecting difficulty disparity is slow and reactive:

  • Evaluation happens over 3–4 weeks — Marks are entered on physical award lists, which are then manually compiled
  • Statistical analysis happens after compilation — Only once all marks are entered into the result processing system can anyone compare average marks across paper sets
  • By this point, results are nearly ready — Any moderation must be rushed, and the decision to apply grace marks becomes a last-minute political judgment rather than a data-driven correction
  • The analysis is coarse — Paper-based systems track total marks per student, not marks per question. It is difficult to identify exactly which questions caused the disparity

This is why grace marks decisions so often feel arbitrary to students. The data to make precise, fair adjustments is simply not available in time.

How Digital Evaluation Changes This

In a digital evaluation system, every mark is entered per question, in real time, and is immediately available for analysis. This creates a fundamentally different capability for detecting and addressing difficulty disparity.

Real-Time Score Distribution Monitoring

As evaluators mark answer sheets, the system continuously calculates:

  • Average marks per question — across all students, across all paper sets
  • Score distribution per set — Is Set 2 producing a lower average than Set 1?
  • Question-level difficulty analysis — Which specific questions in Set 3 are producing lower scores than their counterparts in Set 1?

This analysis does not happen after evaluation is complete. It happens during evaluation, updating with every mark entry. A chief examiner can see, on day 3 of evaluation, that Set 3 Question 14(b) has an average score of 1.2 out of 5, while the equivalent question in Set 1 averages 3.8 out of 5. That is a signal — either the question is genuinely harder, or the marking scheme needs clarification.
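
As a rough sketch of the computation involved: per-question and per-set averages can be recomputed from mark-entry records as evaluation progresses. The records and column names below (paper_set, question_id, marks_awarded) are illustrative assumptions for this sketch, not the schema of any particular platform.

```python
import pandas as pd

# Illustrative mark entries as they might look partway through evaluation.
# Column names and values are assumptions for this sketch only.
marks = pd.DataFrame([
    {"student_id": "S001", "paper_set": "Set 1", "question_id": "14(b)", "marks_awarded": 4.0},
    {"student_id": "S002", "paper_set": "Set 1", "question_id": "14(b)", "marks_awarded": 3.5},
    {"student_id": "S003", "paper_set": "Set 3", "question_id": "14(b)", "marks_awarded": 1.0},
    {"student_id": "S004", "paper_set": "Set 3", "question_id": "14(b)", "marks_awarded": 1.5},
])

# Average marks per question, per paper set, refreshed with every new entry.
per_question = (
    marks.groupby(["paper_set", "question_id"])["marks_awarded"]
         .agg(avg_marks="mean", scripts_marked="count")
)

# Overall score distribution per set.
per_set = marks.groupby("paper_set")["marks_awarded"].agg(["mean", "std", "count"])

print(per_question)
print(per_set)
```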

Automated Anomaly Alerts

The system can be configured to automatically flag when score distributions across paper sets diverge beyond acceptable thresholds:

  • Set-level alert — "Set 3 Physics average is 12% below Set 1. Investigation recommended."
  • Question-level alert — "Question 22 in Set 2 has a 40% lower average than corresponding questions in other sets."
  • Evaluator-specific alert — "Evaluator X is marking Set 3 papers 15% lower than other evaluators marking the same set." (This distinguishes between genuine difficulty disparity and individual evaluator bias.)

These alerts reach the chief examiner and moderation team while evaluation is still in progress — not after results are compiled.
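
A minimal sketch of how a set-level threshold check might look, assuming running set averages are already being tracked. The 5% threshold and the sample averages are placeholder values; a real platform would expose this as configuration rather than code.

```python
def set_level_alerts(set_averages: dict, threshold: float = 0.05) -> list:
    """Flag paper sets whose running average falls more than `threshold`
    (a fraction, e.g. 0.05 = 5%) below the highest-scoring set."""
    baseline_set = max(set_averages, key=set_averages.get)
    baseline = set_averages[baseline_set]
    alerts = []
    for paper_set, avg in set_averages.items():
        gap = (baseline - avg) / baseline
        if paper_set != baseline_set and gap > threshold:
            alerts.append(f"{paper_set} average is {gap:.0%} below {baseline_set}. "
                          "Investigation recommended.")
    return alerts

# Illustrative running averages partway through evaluation.
print(set_level_alerts({"Set 1": 48.2, "Set 2": 46.9, "Set 3": 42.4}))
```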

Data-Driven Moderation

When disparity is detected, digital evaluation provides the data to moderate precisely:

Question-level moderation: If Question 14(b) in Set 3 is demonstrably harder than its counterpart in Set 1, marks for that specific question can be moderated — not a blanket grace mark for the entire paper, but a targeted adjustment for the specific source of disparity.
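
A sketch of what such a targeted adjustment could look like in code. The uplift value is whatever the moderation committee decides on the evidence; the +1.5 figure and the 5-mark cap here are purely illustrative.

```python
def moderate_question(marks_awarded: float, uplift: float, question_max: float) -> float:
    """Apply a targeted uplift to one question's marks, capped at the question maximum.
    The uplift value comes from the moderation committee's analysis, not from the code."""
    return min(marks_awarded + uplift, question_max)

# Illustrative: +1.5 marks to every Set 3 attempt of Q14(b), which carries 5 marks.
print(moderate_question(1.0, uplift=1.5, question_max=5))   # 2.5
print(moderate_question(4.5, uplift=1.5, question_max=5))   # capped at 5
```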

Set-level normalization: If overall analysis confirms that Set 3 was harder, the system can apply statistical normalization to Set 3 scores — adjusting them upward based on the measured difficulty differential, ensuring that a student who scored 60 on Set 3 is not disadvantaged compared to a student who scored 60 on Set 1.
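
One simple way to compute such an adjustment is linear equating, which maps a harder set's scores onto the reference set's mean and spread. This is a sketch of one possible model, not the method any board prescribes; the sample score lists are made up, and a 70-mark theory paper is assumed.

```python
import statistics

def linear_equate(score: float, harder_set_scores: list, reference_scores: list,
                  paper_max: float = 70.0) -> float:
    """Map a score from the harder set onto the reference set's scale by matching
    means and standard deviations (basic linear equating). One possible model only."""
    mu_h, sd_h = statistics.mean(harder_set_scores), statistics.stdev(harder_set_scores)
    mu_r, sd_r = statistics.mean(reference_scores), statistics.stdev(reference_scores)
    adjusted = mu_r + (score - mu_h) * (sd_r / sd_h)
    return round(min(max(adjusted, 0.0), paper_max), 1)

# Illustrative samples of total scores from each set (made-up numbers).
set3_totals = [35, 42, 44, 48, 51, 55, 60, 63]
set1_totals = [41, 47, 50, 53, 57, 60, 64, 68]

# A student who scored 60 on the harder Set 3 is mapped onto the Set 1 scale.
print(linear_equate(60, set3_totals, set1_totals))
```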

Transparent audit trail: Every moderation decision — why it was made, what data supported it, what adjustment was applied — is logged in the system. If challenged legally (as CBSE is now facing with the PIL), the institution can produce a complete evidence trail showing that the moderation was systematic, data-driven, and fair.
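
A sketch of what a single audit-trail entry might capture, written as an append-only JSON line. The field names and figures are illustrative assumptions, not a real platform's schema.

```python
import json
from dataclasses import dataclass, asdict
from datetime import datetime, timezone

@dataclass
class ModerationRecord:
    # Field names are illustrative, not any platform's actual schema.
    exam: str
    paper_set: str
    scope: str          # "question" or "set"
    target: str         # e.g. "Q14(b)" or "entire paper"
    evidence: str       # data that triggered the decision
    adjustment: str     # what was applied
    approved_by: str
    timestamp: str

record = ModerationRecord(
    exam="Class 12 Physics",
    paper_set="Set 3",
    scope="question",
    target="Q14(b)",
    evidence="Set 3 average 1.2/5 vs Set 1 average 3.8/5",
    adjustment="+1.5 marks to all Set 3 attempts of Q14(b), capped at 5",
    approved_by="Moderation committee (Physics)",
    timestamp=datetime.now(timezone.utc).isoformat(),
)

# One JSON line per decision gives a reviewable, append-only trail.
print(json.dumps(asdict(record)))
```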

What the PIL Is Really Asking For

Prashant Kirad's PIL demands three things:

  • Lenient answer checking for students who received harder paper sets
  • Grace marks to compensate for the difficulty disparity
  • Uniform difficulty standards across paper sets in future examinations

The first two demands are reactive — they address the damage after it has occurred. The third demand is proactive but difficult to enforce perfectly, because difficulty calibration is inherently subjective.

Digital evaluation offers a fourth path that the PIL does not envision: systematic, real-time detection and correction of difficulty disparity as part of the evaluation process itself. Not grace marks applied after public outrage, but statistical moderation applied automatically when the data shows a problem.

This is not a hypothetical capability. Institutions using digital evaluation platforms with moderation workflows can implement this today.

The Moderation Workflow

Here is how a properly configured digital evaluation system handles paper set disparity:

During Evaluation

  • System monitors score distributions across paper sets in real time
  • Alerts fire when set-level or question-level averages diverge beyond the configured threshold (e.g., more than 5% difference in set averages)
  • Chief examiner reviews the flagged data and determines whether the disparity reflects genuine difficulty differences or other factors (e.g., evaluator allocation patterns)

After Evaluation, Before Results

  • Statistical analysis confirms the disparity with full evaluation data
  • Moderation committee reviews the analysis and decides on the appropriate correction method (question-level adjustment, set-level normalization, or a combination)
  • System applies the moderation — automatically adjusting marks for affected students
  • Audit trail records every moderation decision with supporting data

At Result Declaration

  • Results reflect moderated marks — students who received harder paper sets have their scores adjusted to reflect equivalent difficulty
  • Moderation methodology is documented — available for audit, accreditation review, or legal challenge

This entire workflow is impossible in paper-based evaluation — not because the statistical methods do not exist, but because the per-question mark data is not available digitally in time to act on it.

The Deeper Issue: Fairness at Scale

The CBSE paper difficulty controversy is not really about physics questions. It is about a fundamental fairness question that every large-scale examination must answer:

When two students of equal ability receive papers of different difficulty, how do you ensure they receive comparable marks?

Paper-based systems answer this question poorly — with delayed data, coarse adjustments, and politically charged decisions about grace marks. The result is a system where students' futures are affected by the luck of which paper set they receive.

Digital evaluation does not eliminate the possibility of difficulty disparity in paper setting. But it provides the infrastructure to detect disparity early, quantify it precisely, and correct it systematically — before results are declared, before PILs are filed, and before students' futures are compromised.

For Students and Parents

If you are concerned about paper set difficulty affecting your marks:

  • Digital evaluation systems auto-total marks — There will be no totalling errors regardless of which set you received
  • Moderation workflows exist in well-designed digital platforms to detect and correct difficulty disparity
  • Per-question data means adjustments can be precise, not blanket grace marks that do not address the actual problem
  • Ask your board whether they have a documented moderation policy for paper set disparity. If they do not, they should

For Institutions

If your institution uses multiple paper sets (and most should, for security reasons):

  • Ensure your evaluation platform tracks marks per question, not just per paper — This is the foundation of difficulty disparity detection
  • Configure anomaly alerts for set-level and question-level score distributions
  • Establish a documented moderation policy before the evaluation cycle begins — not as a reactive measure after controversy
  • Train your chief examiners on interpreting real-time evaluation analytics
  • Maintain an audit trail of all moderation decisions — this protects the institution legally and demonstrates fairness to accreditation bodies

Related Reading

  • CBSE Introduces On-Screen Marking for Class 12 — The digital evaluation system at the centre of this controversy
  • CBSE's OSM Hit by Technical Glitches — What else went wrong with the March 2026 rollout
  • Understanding Double Valuation — Another moderation mechanism for ensuring fair evaluation

Ready to digitize your evaluation process?

See how MAPLES OSM can transform exam evaluation at your institution.