Methodology

How the rankings are built

This page documents how the Top 100 list is constructed, what's in the data, and what's deliberately out.

Data sources

Source | What it gives | Limitations
arXiv (math.NT, math.CO) | Preprint-level: titles, abstracts, authors, dates, co-author graph | Biased toward people who post preprints. Senior figures who publish only in journals are undercounted.
OpenAlex | Author-level: paper count, citations, affiliations, country | Concept tagging is noisy in math; surname-only matching can misidentify.
Math Genealogy Project | Advisor-student trees | Dissertation-era affiliations only; gaps for some non-Western mathematicians.

Pipeline

  1. arXiv pull: 17 search terms (Goldbach, prime gap, twin prime, Vinogradov, Hardy-Littlewood, sum of two primes, etc.) restricted to math.NT and math.CO categories. Co-authorship graph computed; eigenvector centrality is the second factor in an arXiv composite of 0.60 * pr(papers) + 0.40 * pr(eigen). Authors with at least 3 topical papers qualify. Result: 155 names.
  2. OpenAlex pull: same 17 terms, then trimmed to the 13 Tier 2-and-up terms after a per-term audit. The standalone term Goldbach was replaced with three phrase variants (Goldbach conjecture, Goldbach problem, Goldbach's conjecture) to remove surname-collision noise (a Wageningen virologist named Rob Goldbach was dragging his collaboration cluster into the list). Author cap of 10 to remove physics megapapers. Composite: 0.60 * pr(topical_papers) + 0.40 * pr(topical_citations). Result: 1,113 ranked authors.
  3. Merge: surname-deduplicated arXiv 155 plus OpenAlex top 200, joined by surname. Each surname has a sum_rank = arx_rank + oa_rank. Names not in arXiv get arx_rank = 156; names not in OA get oa_rank = 1114. Sort by sum_rank ascending and take the top 100.
  4. Hand-curated edits: an exclusions file removes three known false positives (Smarandache, Carbó-Dorca, Vega). A name-aliases file forces correct MGP id lookups for Robert Vaughan and Harald Helfgott.
  5. Genealogy reseeded: the Top 100 becomes the canonical seed list for the MGP graph. Five close-relations surface from the network analysis.
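
The scoring and merge arithmetic in steps 1 to 3 can be sketched as follows. This is a minimal illustration, not the pipeline's actual code: it assumes pr(...) denotes a percentile rank (the page does not define it), and the function names, the tie-break on surname, and the toy surnames in the usage note are all hypothetical.

```python
from typing import Dict, List, Tuple


def percentile_rank(values: Dict[str, float]) -> Dict[str, float]:
    """Fraction of entries whose value is <= this one (ASSUMED meaning of pr)."""
    n = len(values)
    return {
        key: sum(1 for other in values.values() if other <= value) / n
        for key, value in values.items()
    }


def composite(first: Dict[str, float], second: Dict[str, float]) -> Dict[str, float]:
    """0.60 * pr(first) + 0.40 * pr(second), the weighting quoted in steps 1 and 2.

    Both dicts are expected to share the same author keys.
    """
    pr_first = percentile_rank(first)
    pr_second = percentile_rank(second)
    return {key: 0.60 * pr_first[key] + 0.40 * pr_second[key] for key in first}


def merge_by_surname(
    arx_rank: Dict[str, int],
    oa_rank: Dict[str, int],
    arx_missing: int = 156,   # penalty rank for names absent from arXiv (step 3)
    oa_missing: int = 1114,   # penalty rank for names absent from OpenAlex (step 3)
    top_n: int = 100,
) -> List[Tuple[str, int]]:
    """Return (surname, sum_rank) pairs sorted by sum_rank ascending."""
    surnames = set(arx_rank) | set(oa_rank)
    scored = [
        (s, arx_rank.get(s, arx_missing) + oa_rank.get(s, oa_missing))
        for s in surnames
    ]
    # Tie-break alphabetically; the page does not say how ties are resolved.
    scored.sort(key=lambda pair: (pair[1], pair[0]))
    return scored[:top_n]
```

With toy inputs such as merge_by_surname({"Alpha": 1, "Beta": 2}, {"Beta": 1, "Gamma": 3}), a surname present in both sources (Beta, sum_rank 3) sorts ahead of one present in only one source (Gamma at 156 + 3 = 159, Alpha at 1 + 1114 = 1115). The penalty ranks therefore strongly favour names that both sources agree on.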

Audit decisions

Excluded

Name | Reason
Florentín Smarandache | Self-published paper mill, not real research.
Ramon Carbó-Dorca | Chemistry (quantum similarity), not number theory.
Frank Vega | Crank: 113 OA works at 0.17 cites per paper. Known false positive.
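
As a rough sketch of how an exclusions list like the one above might be applied to a ranked list of surnames: the set literal, the function name, and the case-insensitive matching are assumptions for illustration. The page does not describe the exclusions file's format or whether it is applied before or after the top-100 cut.

```python
from typing import Iterable, List, Set

# Hypothetical in-memory form of the exclusions file (surnames from the table above).
EXCLUDED_SURNAMES: Set[str] = {"Smarandache", "Carbó-Dorca", "Vega"}


def apply_exclusions(ranked: Iterable[str], excluded: Set[str]) -> List[str]:
    """Drop excluded surnames while preserving the original ranking order."""
    blocked = {name.casefold() for name in excluded}  # case-insensitive match (assumption)
    return [name for name in ranked if name.casefold() not in blocked]
```

Matching on surname alone would also remove any legitimate author who shares an excluded surname, which is the same collision risk noted for OpenAlex under Data sources.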

Name aliases (forced MGP ids)

Display name | Forced MGP record
R. C. Vaughan | Robert Charles Vaughan, MGP id 27012 (a surname-only lookup is ambiguous with Charles Vaughan, MGP id 225220)
H. A. Helfgott | Harald Andres Helfgott, MGP id 69999
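
One way to picture the alias override is a lookup that checks the forced ids first and falls back to a surname search otherwise. This is a sketch under assumptions: the dictionary form of the aliases file and the search_by_surname callable are hypothetical stand-ins, not the project's actual lookup code or any real MGP API.

```python
from typing import Callable, Dict, Optional

# Forced ids taken from the table above; the dict layout is an assumption.
MGP_ALIASES: Dict[str, int] = {
    "R. C. Vaughan": 27012,
    "H. A. Helfgott": 69999,
}


def resolve_mgp_id(
    display_name: str,
    search_by_surname: Callable[[str], Optional[int]],  # hypothetical fallback lookup
) -> Optional[int]:
    """Return the forced MGP id if one exists, else defer to a surname search."""
    forced = MGP_ALIASES.get(display_name)
    if forced is not None:
        return forced
    surname = display_name.split()[-1]  # naive: treats the last token as the surname
    return search_by_surname(surname)
```

Checking the alias table first means an ambiguous surname search never runs for the names that are known to collide.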

What's not in this list