Mathematics · 8 min read

How We Clustered JMO Written Problems Into Practice Sets


Published on

31 May 2026

Contributors

George Ionitsa

Quant Developer and Olympiad Coach

We treated JMO preparation as a research problem: historical written questions, manual tagging, clustering, and curated practice sets — not random past papers.

Most JMO preparation still looks like this: print a past paper, hope the right type appears, repeat.

We think that is the wrong unit of study.

At Exact Science we treat competition preparation as a research problem. We start with historical questions, read what they actually demand, cluster them by reasoning habit, and only then build practice sets. This article walks through how we did that for the Junior Mathematical Olympiad — and what it tells us about how students should prepare for a fully written paper from 2025 onwards.

The paper we were modelling

UKMT changed the JMO structure in 2025:

Era | Structure | Total marks
Up to 2024 | Section A: 10 numeric answers (1 mark each). Section B: 6 written questions (10 marks each). | 70
From 2025 | One section: 6 written questions only (10 marks each). No Section A. | 60

For the current exam, every question is a written proof-style task. When coaches still say “Section B”, they usually mean the legacy written half (old questions 11–16). That corpus is still the best historical match for the modern paper — which is why our analysis drew on legacy Section B and the six-question papers from 2025 onward.

Step 1: Define the corpus

We pulled every uploaded JMO written problem from our historical archive on problems.cc:

  • legacy papers: the six proof-style questions at the end of each paper (Section B, years up to 2024)
  • modern papers: all six questions, each worth 10 marks (2025 onward)

That gave us 125 written problems across roughly two decades of papers: far more than any student will sit in one season, and enough to see which kinds of reasoning recur.

We did not include Section A numeric questions. They train a different skill (exact answer under time pressure) and are irrelevant to the 2025 format.
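The inclusion rule above can be sketched in a few lines of Python. This is an illustration only: the `PaperQuestion` type, the section labels "A", "B" and "single", and the sample rows are hypothetical placeholders, not the actual problems.cc schema or real archive entries.

```python
from typing import NamedTuple

FORMAT_CHANGE_YEAR = 2025  # first year of the six-question written paper


class PaperQuestion(NamedTuple):
    year: int
    section: str  # hypothetical labels: "A", "B" (legacy) or "single" (2025 onward)
    number: int


def in_corpus(q: PaperQuestion) -> bool:
    """Keep legacy Section B questions and every question from 2025 onward."""
    if q.year >= FORMAT_CHANGE_YEAR:
        return True
    return q.section == "B"


# Made-up rows used only to exercise the rule.
sample = [
    PaperQuestion(2019, "A", 3),       # numeric question: excluded
    PaperQuestion(2019, "B", 2),       # legacy written question: kept
    PaperQuestion(2025, "single", 5),  # modern written question: kept
]
corpus = [q for q in sample if in_corpus(q)]
```

Keeping the rule as a single predicate makes the corpus definition easy to audit: anyone can read off which questions were in scope.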

Step 2: Extract structured metadata

For each problem we recorded:

  • year and paper edition
  • question number on the paper
  • full statement text
  • whether the task was short-answer numeric or written proof
  • marks available
  • which section it belonged to on legacy papers (A, B, or the modern single section)
  • whether a model solution was available for review

We organised this in a structured catalogue on problems.cc. Clean metadata matters: clustering on miscounted or mislabelled questions would put the wrong problems in front of students.
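One way to picture a catalogue entry is as a small typed record with basic validation. The sketch below is an assumption about shape, not the real catalogue: the class name `ProblemRecord`, the field names, and the allowed values are invented for illustration.

```python
from dataclasses import dataclass

# Hypothetical vocabularies; the real catalogue may label these differently.
TASK_TYPES = {"numeric", "written"}
SECTIONS = {"A", "B", "single"}


@dataclass(frozen=True)
class ProblemRecord:
    year: int
    edition: str
    question_number: int
    statement: str
    task_type: str            # "numeric" or "written"
    marks: int
    section: str              # "A", "B", or "single" for the 2025+ paper
    has_model_solution: bool

    def __post_init__(self) -> None:
        # Reject mislabelled rows early, before they reach any clustering step.
        if self.task_type not in TASK_TYPES:
            raise ValueError(f"unknown task type: {self.task_type!r}")
        if self.section not in SECTIONS:
            raise ValueError(f"unknown section: {self.section!r}")
        if self.marks <= 0:
            raise ValueError("marks must be positive")


# Placeholder values, not a real archive entry.
record = ProblemRecord(
    year=2025,
    edition="placeholder edition",
    question_number=1,
    statement="<statement text>",
    task_type="written",
    marks=10,
    section="single",
    has_model_solution=False,
)
```

Validating at construction time means a mislabelled section or task type fails loudly instead of quietly skewing later counts.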

Step 3: Quality filter

We excluded problems that were not safe to classify:

  • missing or incomplete statements
  • editions where the paper layout did not match the official UKMT format (wrong section labels, questions missing from the published paper)

We kept problems where a student could attempt a written solution from the statement alone. Solution text was not required for inclusion in a practice set, but we noted when official solutions existed for later review.
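The filter can be expressed as a predicate over reviewer-set flags. In this sketch the `Candidate` type and its field names are hypothetical, and whether a statement is complete or a layout matches the official paper is assumed to be a human judgement recorded as a boolean, not something the code infers.

```python
from typing import Optional, TypedDict


class Candidate(TypedDict):
    statement: Optional[str]
    statement_complete: bool       # set by a human reviewer
    layout_matches_official: bool  # set by a human reviewer
    has_official_solution: bool    # recorded for later review, never used to exclude


def is_safe_to_classify(c: Candidate) -> bool:
    """A problem is kept only if it can be attempted from the statement alone."""
    has_text = bool(c["statement"] and c["statement"].strip())
    return has_text and c["statement_complete"] and c["layout_matches_official"]


# Made-up candidates to exercise the rule.
kept = is_safe_to_classify({
    "statement": "<full statement>",
    "statement_complete": True,
    "layout_matches_official": True,
    "has_official_solution": False,  # a missing solution does not exclude a problem
})
dropped = is_safe_to_classify({
    "statement": None,
    "statement_complete": False,
    "layout_matches_official": True,
    "has_official_solution": True,
})
```

Note that `has_official_solution` is deliberately absent from the predicate, mirroring the rule that solution text was not required for inclusion.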

Step 4: Read the task, not just the keywords

Keyword matching is how content farms build “algebra worksheets”. It fails on olympiad problems.

A grid puzzle might need combinatorics, invariants, or case analysis. A geometry diagram might finish with divisibility. So we read each statement as a mathematical task:

  • What must the student produce? (a number, a proof, all possibilities)
  • What reasoning habit is central? (angle chase, digit constraint, recurrence, invariant)
  • How long is the expected written argument?

This step is human mathematical review on top of structured data. We are not claiming an LLM magically sorted the papers. We are claiming that preparation design should be informed by how problems behave, not by topic labels on a school scheme of work.

Step 5: Assign reasoning features

We tagged each problem with a small feature set (a problem can carry several):

Reasoning habit | Typical signal in the statement
Geometry proof | angles, circles, polygons, collinearity
Number theory | divisibility, digits, primes, “find all integers”
Algebra and sequences | ratios, ages, recurrences, word setup
Combinatorics and grids | counting, paths, colourings, grids
Invariants and logic | “prove no matter how…”, symmetry, parity
Longer proof | multi-step, late-paper, combined ideas

Tags were applied manually with consistency checks across years. Where two coaches disagreed, we defaulted to the habit the problem was trying to train, not the syllabus chapter it resembled.
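A consistency check of this kind could look like the following sketch. It only surfaces disagreements between two coaches' tag sets; the resolution itself remains a human decision, as described above. The `Habit` enum mirrors the table of reasoning habits, while the function name `compare_tags` and the example tag sets are hypothetical.

```python
from enum import Enum
from typing import Set, Tuple


class Habit(Enum):
    GEOMETRY_PROOF = "Geometry proof"
    NUMBER_THEORY = "Number theory"
    ALGEBRA_SEQUENCES = "Algebra and sequences"
    COMBINATORICS_GRIDS = "Combinatorics and grids"
    INVARIANTS_LOGIC = "Invariants and logic"
    LONGER_PROOF = "Longer proof"


def compare_tags(coach_a: Set[Habit], coach_b: Set[Habit]) -> Tuple[Set[Habit], Set[Habit]]:
    """Return (agreed, disputed) tags for one problem.

    Disputed tags are those chosen by exactly one coach; they are
    flagged for discussion, not resolved automatically.
    """
    agreed = coach_a & coach_b
    disputed = coach_a ^ coach_b
    return agreed, disputed


# Made-up example: both coaches see a grid problem, but differ on the second habit.
agreed, disputed = compare_tags(
    {Habit.COMBINATORICS_GRIDS, Habit.INVARIANTS_LOGIC},
    {Habit.COMBINATORICS_GRIDS, Habit.LONGER_PROOF},
)
```

Because a problem can carry several tags, sets are a natural fit: intersection gives the agreed habits and symmetric difference gives the ones to discuss.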

Step 6: Cluster into preparation families

Aggregating tags across 125 problems, the same six preparation families kept appearing — stable enough to build curriculum around:

  1. Geometry proofs — diagram discipline, angle chasing, explicit reasons
  2. Number theory — divisibility, digits, systematic case checks
  3. Algebra and sequences — translate words into equations, finish with justification
  4. Combinatorics and logic — organised cases, grids, invariants
  5. Proof push — longer arguments, combined ideas, late-paper stamina
  6. Mixed — deliberate cross-topic sessions for students new to written JMO work

These are not UKMT’s official categories. They are Exact Science preparation clusters: groupings that predict what a student must practise to stop freezing when the route is not obvious.
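To make the aggregation concrete, here is a minimal sketch that rolls tags up into families with a lookup table and counts problems per family. It is not a statistical clustering algorithm, and the tag-to-family mapping is an assumption inferred from the family descriptions above (for example, that invariant problems sit with combinatorics and logic). The Mixed family is curated by hand, so it has no entry. Problem ids and tags in the example are made up.

```python
from collections import Counter
from typing import Dict, Set

# Assumed mapping from reasoning-habit tags to preparation families.
TAG_TO_FAMILY: Dict[str, str] = {
    "Geometry proof": "Geometry proofs",
    "Number theory": "Number theory",
    "Algebra and sequences": "Algebra and sequences",
    "Combinatorics and grids": "Combinatorics and logic",
    "Invariants and logic": "Combinatorics and logic",
    "Longer proof": "Proof push",
}


def family_counts(tags_by_problem: Dict[str, Set[str]]) -> Counter:
    """Count how many problems fall into each family.

    A problem with two tags in the same family is counted once for it.
    """
    counts: Counter = Counter()
    for tags in tags_by_problem.values():
        counts.update({TAG_TO_FAMILY[tag] for tag in tags})
    return counts


# Made-up ids and tags, for illustration only.
example = {
    "P1": {"Geometry proof"},
    "P2": {"Combinatorics and grids", "Invariants and logic"},
    "P3": {"Number theory", "Longer proof"},
}
counts = family_counts(example)
```

A problem tagged with habits from two different families contributes to both counts, which is what later allows it to appear in two practice sets.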

Step 7: Curate balanced practice sets

Clustering is not the end. Students need ordered sets that are ready to publish on problems.cc.

We applied simple curation rules:

  • 8–14 problems per set (enough variety, not overwhelming)
  • Spread of years where possible (avoid drawing on only one edition)
  • Allowed overlap when a problem genuinely trains two habits (same as our JMC practice design)
  • At least one modern (2025+) problem in the mixed set so the format change is visible
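These rules are simple enough to check mechanically. The sketch below takes the list of source years for one set and reports which rules it breaks; the function name `curation_issues` and the example years are hypothetical, and the overlap rule needs no check because overlap is allowed.

```python
from typing import List

MIN_SIZE, MAX_SIZE = 8, 14  # set size limits from the curation rules
MODERN_YEAR = 2025          # first year of the fully written format


def curation_issues(years: List[int], is_mixed: bool = False) -> List[str]:
    """Return a list of rule violations for one practice set (empty if none)."""
    issues = []
    if not MIN_SIZE <= len(years) <= MAX_SIZE:
        issues.append(f"size {len(years)} is outside {MIN_SIZE}-{MAX_SIZE}")
    if len(set(years)) < 2:
        issues.append("all problems come from a single year")
    if is_mixed and not any(y >= MODERN_YEAR for y in years):
        issues.append("mixed set has no problem from 2025 onward")
    return issues


# Made-up year lists, not the real contents of any published set.
ok_years = [2008, 2010, 2012, 2014, 2016, 2018, 2020, 2022, 2024, 2025]
clean = curation_issues(ok_years, is_mixed=True)
broken = curation_issues([2019] * 5, is_mixed=True)
```

Returning a list of issues instead of a single pass or fail flag lets a curator see every rule a draft set breaks in one go.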

The output is six published collections:

Set | Problems | What it trains
JMO Mixed Practice | 10 | Balanced entry across types
JMO Geometry Proofs | 13 | Angles, polygons, diagram proofs
JMO Number Theory | 14 | Divisibility, digits, integers
JMO Algebra & Sequences | 13 | Setup, equations, recurrences
JMO Combinatorics & Logic | 14 | Counting, grids, invariants
JMO Proof Push | 10 | Longer multi-step proofs

Together the six sets hold 74 problem entries; because overlap is intentional, a problem that appears in two sets is counted twice in that total. Full past papers remain on the JMO past papers hub.
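The six set sizes sum to exactly 74, which suggests that figure counts entries, not distinct problems. The short sketch below checks the sum and shows, on a made-up toy example, how entries and distinct problems differ once sets overlap. The set sizes come from the list of published collections; the toy sets and the `distinct_problems` helper are hypothetical.

```python
from typing import Dict, Set

# Sizes of the six published sets.
SET_SIZES = {
    "JMO Mixed Practice": 10,
    "JMO Geometry Proofs": 13,
    "JMO Number Theory": 14,
    "JMO Algebra & Sequences": 13,
    "JMO Combinatorics & Logic": 14,
    "JMO Proof Push": 10,
}
total_entries = sum(SET_SIZES.values())  # 74


def distinct_problems(sets: Dict[str, Set[str]]) -> int:
    """Count each problem once, however many sets it appears in."""
    return len(set().union(*sets.values()))


# Toy example with made-up ids: "p3" sits in both sets.
toy = {"Set X": {"p1", "p2", "p3"}, "Set Y": {"p3", "p4"}}
toy_entries = sum(len(members) for members in toy.values())  # 5
toy_distinct = distinct_problems(toy)                        # 4
```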

What the data said about difficulty

Parents often ask whether JMO is “impossible”. Our read of the corpus is more precise:

  • The syllabus is still junior. Angles, fractions, factors, basic algebra.
  • The demand is proof structure: choosing a route, justifying each step, writing so a marker can follow.

Students who excel on the JMC via recognition often stumble here — not because they lack knowledge, but because they have not practised thinking in public on paper.

A 10-mark question may need eight lines or eighteen. Markers punish unexplained leaps, not hard topics.

How we think students should use this

This is the study sequence we would use in the lab:

  1. One problem, one session — full written solution before reading any model answer.
  2. Start with Mixed Practice if JMO is new.
  3. Move to the cluster that matches repeated mistakes (geometry gaps, weak divisibility arguments, etc.).
  4. Return to a full timed paper for stamina once one cluster feels stable.
  5. Mark your own work on logic gaps, not just the final number.
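Step 3 of this sequence can be made concrete with a simple tally. The sketch assumes a student or coach keeps a log of which preparation family each mistake belonged to; the function name `next_cluster` and the two-repeat threshold are illustrative choices, not part of our published method.

```python
from collections import Counter
from typing import List, Optional


def next_cluster(mistake_log: List[str], min_repeats: int = 2) -> Optional[str]:
    """Suggest the family with the most logged mistakes.

    Returns None when no family has been missed at least `min_repeats`
    times, in which case mixed practice is still the sensible default.
    """
    if not mistake_log:
        return None
    family, count = Counter(mistake_log).most_common(1)[0]
    return family if count >= min_repeats else None


# Made-up log: two geometry slips and one number theory slip.
suggestion = next_cluster(["Geometry proofs", "Number theory", "Geometry proofs"])
```

The threshold guards against switching cluster after a single bad session; only a repeated pattern triggers a move.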

That is a different model from “do every past paper in order”. It is closer to how we would diagnose a musician: isolate the habit, drill it, then perform the whole piece.

Why we publish the method, not just the sets

Exact Science is building preparation as a maths education lab, not a worksheet factory. When we release practice on problems.cc and interpret it here, we want parents and serious students to see:

  • where the problems came from
  • how they were grouped
  • what habit each set is meant to train

The next generation of study, as we see it, is data-informed and diagnostically sequenced — still taught by humans, still judged by markers, but no longer random.

For pathway context see olympiads.co.uk. For how we teach around competitions, see our Junior Mathematical Olympiad page and the route from the Junior Mathematical Challenge.

If your child has a JMO invitation, start with one mixed problem this week. Write the proof. Then pick the cluster that hurts most. That is how the sets were built — and how we think they should be used.