Jodhpur's AI Breakthrough: 70,000 Subjective Answer Sheets Graded in 3 Days
Rajasthan's Jodhpur district completed what is being described as one of India's first large-scale AI evaluations of subjective answers: 70,000 students, more than 1,000 schools, and 5 subjects, in three days. Here is what universities should take from this pilot.

A Pilot That Changes What Is Possible
For most of Indian education's history, the phrase "subjective evaluation" has meant one thing: a qualified human examiner reading a written answer and applying professional judgement to assign marks. The process is inherently slow, variable, and difficult to scale. A university with 3 lakh answer books to evaluate in 30 days has no alternative but to distribute that workload across thousands of evaluators and hope the variability averages out.
An April 2026 pilot from Jodhpur, Rajasthan challenges that assumption in ways that Indian universities cannot afford to ignore.
What Jodhpur Did
Under the second phase of the Competency-Based Census Assessment and School Reporting Pilot Project, Jodhpur district deployed an AI-driven evaluation system across more than 1,000 schools. The pilot covered 70,000 students in Classes 6 through 9, evaluating responses in five core subjects: English, Hindi, Mathematics, Science, and Social Science. The system handled both Hindi-medium and English-medium schools simultaneously.
Teachers scanned student answer sheets using an AI-enabled application. The system then graded subjective responses — not multiple-choice questions, but written answers — and generated detailed report cards in seconds per script.
The result: an evaluation exercise that would ordinarily take weeks was completed in three days.
This is not the first time AI has been used for assessment in India. Automated scoring of objective and structured responses has existed for years. What makes Jodhpur significant is the phrase "subjective answers." The project is being characterised as one of the first large-scale deployments of AI to evaluate open-ended, written responses in India — moving beyond bubble sheets and MCQ formats into the territory that has long been considered the exclusive province of human judgement.
The Three-Day vs Three-Week Gap
The speed improvement deserves examination because it is not merely an operational convenience — it has downstream academic consequences.
When an evaluation cycle takes three weeks, school administration receives aggregate results that reflect average performance at the class or school level. By the time the data is actionable, the academic calendar has often moved on. Teachers design next-term instruction based on memory and intuition rather than fresh evidence.
When the same evaluation completes in three days, the data can be used in near-real time. Jodhpur's system generated reports not just at the student level but at the teacher, principal, and school level — identifying learning gaps topic-by-topic and flagging students who needed remedial intervention before the gaps widened.
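The kind of roll-up described above can be sketched in a few lines. This is a minimal illustration, not the Jodhpur system's actual pipeline: the record layout, the `topic_mastery` and `students_needing_support` helpers, and the 50% remediation threshold are all assumptions made for the example.

```python
from collections import defaultdict

# Hypothetical per-question records:
# (school, student, topic, marks_awarded, marks_possible)
RECORDS = [
    ("School A", "S1", "fractions", 2, 5),
    ("School A", "S2", "fractions", 1, 5),
    ("School A", "S1", "decimals", 4, 5),
    ("School A", "S2", "decimals", 5, 5),
    ("School B", "S3", "fractions", 4, 5),
]

def topic_mastery(records):
    """Share of available marks earned, per (school, topic)."""
    totals = defaultdict(lambda: [0, 0])
    for school, _student, topic, got, possible in records:
        totals[(school, topic)][0] += got
        totals[(school, topic)][1] += possible
    return {key: got / possible for key, (got, possible) in totals.items()}

def students_needing_support(records, threshold=0.5):
    """(school, student, topic) triples scoring below the assumed threshold."""
    flagged = set()
    for school, student, topic, got, possible in records:
        if got / possible < threshold:
            flagged.add((school, student, topic))
    return sorted(flagged)

if __name__ == "__main__":
    for (school, topic), share in sorted(topic_mastery(RECORDS).items()):
        print(f"{school:10s} {topic:10s} {share:.0%}")
    print(students_needing_support(RECORDS))
```

The same records feed both views: the school-level table shows where a topic is weak overall, and the flagged list shows which individual students need attention.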
This is the distinction between evaluation as a gatekeeping function and evaluation as a learning instrument. Gatekeeping asks: who passes? The learning instrument asks: what did every student understand, and what did they not? The second question requires speed that manual evaluation cannot deliver at the scale of 70,000 students.
What the AI Is Actually Doing
It is worth being specific about what AI-based subjective evaluation involves, because the terminology is sometimes used loosely.
In systems like Jodhpur's, AI evaluation of written answers typically involves one or more of the following techniques:
Natural Language Processing (NLP) for keyword and concept extraction. The system identifies whether the student's response contains the key concepts, terminology, or reasoning steps expected for a given question. This works well for factual and explanatory answers at the school level — "list the causes of X" or "explain how Y works" — where correct answers have identifiable components.
Similarity matching against model answers. Student responses are compared against a verified model answer or a set of high-scoring exemplar responses. The system assigns marks proportional to the semantic overlap.
Structured rubric application. For multi-part questions with defined marking schemes, the AI applies the rubric mechanically, checking whether each criterion has been addressed.
Confidence scoring and human escalation. Mature systems attach a confidence score to each AI-assigned mark. Responses where the AI confidence falls below a threshold are flagged for human review. This hybrid model — AI handles clear cases, humans handle ambiguous ones — is the architecture that most credible deployments use.
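To make these techniques concrete, here is a toy sketch that combines a rubric-style concept check, a comparison against a model answer, and a confidence threshold for human escalation. It is an illustration under stated assumptions, not how the Jodhpur system works: the function names, the 0.6 threshold, and the confidence rule are hypothetical, and plain word overlap (Jaccard) stands in for genuine semantic similarity. The tokenizer handles English only.

```python
import re

def tokens(text):
    """Lower-cased set of English words in the text."""
    return set(re.findall(r"[a-z]+", text.lower()))

def concept_coverage(answer, concepts):
    """Fraction of rubric concepts whose keywords all appear in the answer."""
    words = tokens(answer)
    hits = sum(1 for keywords in concepts if set(keywords) <= words)
    return hits / len(concepts)

def overlap_with_model(answer, model_answer):
    """Jaccard word overlap: a crude lexical stand-in for semantic similarity."""
    a, m = tokens(answer), tokens(model_answer)
    return len(a & m) / len(a | m) if a | m else 0.0

def grade(answer, concepts, model_answer, max_marks, min_confidence=0.6):
    coverage = concept_coverage(answer, concepts)
    overlap = overlap_with_model(answer, model_answer)
    # Assumed rule: confidence is high when the two signals agree
    # and low when they diverge.
    confidence = 1.0 - abs(coverage - overlap)
    return {
        "marks": round(coverage * max_marks),
        "confidence": confidence,
        "needs_human_review": confidence < min_confidence,
    }

if __name__ == "__main__":
    concepts = [("evaporation",), ("condensation",), ("precipitation",)]
    model = ("Water rises by evaporation, forms clouds by condensation "
             "and falls as precipitation.")
    print(grade(model, concepts, model, max_marks=3))
    print(grade("evaporation condensation precipitation", concepts, model, 3))
```

In this sketch, an answer that merely lists the three keywords earns full concept coverage but overlaps poorly with the model answer, so the two signals disagree and the script is routed to a human. That is the hybrid pattern in miniature: clear cases are marked automatically, ambiguous ones are escalated.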
For Classes 6-9 in school-level subjects, the current generation of AI evaluation systems is reliable enough to produce actionable results. The system does not need to match human accuracy on every question; it needs to be accurate enough that the aggregate insights (this school has weak conceptual understanding of fractions; this class shows strong vocabulary but poor paragraph structure) are trustworthy.
Implications for Indian Universities
The Jodhpur pilot is a school-level deployment. University examination involves more complex, domain-specific reasoning — graduate-level analysis, legal arguments, clinical case interpretations — that the current generation of commercial AI grading systems handles with far less reliability. A direct lift-and-shift of the Jodhpur model to a university examination centre is not yet feasible for most disciplines.
That said, universities should be paying close attention for several reasons.
The technology trajectory is steep. The same models that are now reliably scoring Class 8 Science responses were, three years ago, credible only for MCQs. Each improvement in large language model capability and training data availability narrows the gap between school-level and university-level complexity.
The hybrid model is viable today. Even if full AI evaluation of university-level subjective answers is premature, AI-assisted evaluation is not. Systems can pre-process answer books — flagging blank pages, detecting scanning anomalies, identifying structural issues, pre-scoring objective sub-sections within a paper — before the human evaluator sees the script. This reduces the cognitive load on evaluators and shortens the total cycle time.
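A pre-processing pass of this kind can be very simple. The sketch below assumes each scanned page arrives as a flat list of grayscale pixel values (0 = black, 255 = white); the function names, the 0.2% ink cutoff, and the page-size check are hypothetical choices for illustration, and a production system would tune them against real scans.

```python
def ink_fraction(pixels, dark_threshold=128):
    """Share of pixels darker than the threshold (0 = black, 255 = white)."""
    if not pixels:
        return 0.0
    dark = sum(1 for p in pixels if p < dark_threshold)
    return dark / len(pixels)

def triage_page(pixels, expected_pixel_count, blank_below=0.002):
    """Route a scanned page before a human evaluator sees it."""
    if len(pixels) != expected_pixel_count:
        return "rescan"    # unexpected size suggests a scanning anomaly
    if ink_fraction(pixels) < blank_below:
        return "blank"     # almost no ink: skip, but record it as blank
    return "evaluate"      # hand the page to the evaluator

if __name__ == "__main__":
    blank_page = [255] * 1000
    written_page = [255] * 990 + [0] * 10
    cropped_page = [255] * 500
    for page in (blank_page, written_page, cropped_page):
        print(triage_page(page, expected_pixel_count=1000))
```

Even this crude triage removes pages an evaluator would otherwise have to open and dismiss by hand, which is where the cycle-time saving comes from.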
Data generation for institutional improvement. The most undervalued output of Jodhpur's pilot is not the speed but the data: topic-level competency maps for 70,000 students across more than 1,000 schools, generated in three days. Universities that adopt evaluation platforms with similar analytics capabilities would have access to programme-level outcome data that NAAC, NIRF, and NBA assessors currently cannot find in most institutional self-study reports.
Under NAAC's revised framework, Criterion 2 — Teaching-Learning and Evaluation — asks for evidence of continuous assessment, feedback mechanisms, and learning outcome measurement. The kind of granular performance data that emerges from AI-assisted evaluation systems is precisely the evidence that moves an institution from assertion to proof.
The Competency-Based Assessment Connection
The Jodhpur pilot is framed explicitly as a competency-based assessment exercise. This is significant in the context of NEP 2020, which calls for a shift from rote memorisation toward competency-based evaluation at all levels of education.
NEP 2020 envisions assessments that test "higher-order cognitive skills such as analysis, critical thinking and conceptual clarity" rather than the ability to reproduce memorised content. Operationalising that vision at scale has been one of the implementation challenges confronting PARAKH, the national assessment body created under NEP.
AI systems that can evaluate subjective responses in terms of concept application and reasoning quality — rather than keyword matching — offer a plausible path toward competency-based assessment at volume. The Jodhpur pilot does not solve this problem completely, but it demonstrates that the direction is technologically viable.
What Universities Should Do Now
Universities watching the Jodhpur pilot should take three concrete steps.
First, pilot AI-assisted evaluation for a bounded subset of their examinations: an internal unit test, a mid-semester assessment, or a specific department's papers. The goal is not to replace human evaluation but to build familiarity with the workflow and the quality of AI outputs before AI-assisted evaluation is applied to high-stakes examinations.
Second, ensure that their digital evaluation infrastructure — scanning, cloud storage, evaluator interface — is capable of integrating with AI-assisted evaluation layers. A platform that treats each answer sheet as a disconnected image file will not support the structured data extraction that AI evaluation requires.
Third, begin documenting performance outcome data today. Institutions that have three years of competency-level assessment data when the next NAAC peer visit arrives will be in a materially different position from those that have only aggregate pass percentages.
The gap between Jodhpur's three-day evaluation cycle and India's typical three-to-four-week university evaluation season is not a technology gap. It is an adoption gap — and it is closing.