Methodology
How the rankings are built
This page documents how the Top 100 list is constructed, what's in the data, and what's deliberately out.
Data sources
| Source | What it gives | Limitations |
|---|---|---|
| arXiv (math.NT, math.CO) | Preprint-level: titles, abstracts, authors, dates, co-author graph | Biased toward people who post preprints. Senior figures who publish only in journals are undercounted. |
| OpenAlex | Author-level: paper count, citations, affiliations, country | Concept tagging is noisy in math; surname-only matching can misidentify authors. |
| Math Genealogy Project | Advisor-student trees | Dissertation-era affiliations only; gaps for some non-Western mathematicians. |
Pipeline
- arXiv pull: 17 search terms (Goldbach, prime gap, twin prime, Vinogradov, Hardy-Littlewood, sum of two primes, etc.) restricted to math.NT and math.CO categories. Co-authorship graph computed; eigenvector centrality is the second factor in an arXiv composite of `0.60 * pr(papers) + 0.40 * pr(eigen)`. Authors with at least 3 topical papers qualify. Result: 155 names.
- OpenAlex pull: same 17 terms, then trimmed to the 13 Tier 2-and-up terms after a per-term audit. The standalone term `Goldbach` was replaced with three phrase variants (`Goldbach conjecture`, `Goldbach problem`, `Goldbach's conjecture`) to remove surname-collision noise (a Wageningen virologist named Rob Goldbach was dragging his collaboration cluster into the list). Author cap of 10 to remove physics megapapers. Composite: `0.60 * pr(topical_papers) + 0.40 * pr(topical_citations)`. Result: 1,113 ranked authors.
- Merge: surname-deduplicated arXiv 155 plus OpenAlex top 200, joined by surname. Each surname has a `sum_rank = arx_rank + oa_rank`. Names not in arXiv get `arx_rank = 156`; names not in OA get `oa_rank = 1114`. Sort by `sum_rank` ascending and take the top 100.
- Hand-curated edits: an exclusions file removes three known false positives (Smarandache, Carbó-Dorca, Vega). A name-aliases file forces correct MGP id lookups for Robert Vaughan and Harald Helfgott.
- Genealogy reseeded: the Top 100 becomes the canonical seed list for the MGP graph. Five close relations surface from the network analysis.
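The scoring and merge steps above can be sketched in a few lines of Python. This is an illustration under stated assumptions, not the project's actual code: it assumes `pr(...)` denotes a percentile rank scaled to [0, 1] (the page does not define it), and the function names `percentile_rank`, `composite`, `to_ranks`, and `merge_by_sum_rank` are hypothetical. Only the weights (0.60 / 0.40), the missing-rank values (156 and 1114), and the top-100 cutoff come from the text.

```python
from typing import Dict, List, Tuple


def percentile_rank(values: Dict[str, float]) -> Dict[str, float]:
    """Assumed meaning of pr(): fraction of authors with a strictly lower value."""
    n = len(values)
    ordered = sorted(values.values())
    out: Dict[str, float] = {}
    for name, v in values.items():
        below = sum(1 for x in ordered if x < v)
        out[name] = below / (n - 1) if n > 1 else 1.0
    return out


def composite(first: Dict[str, float], second: Dict[str, float],
              w_first: float = 0.60, w_second: float = 0.40) -> Dict[str, float]:
    """Weighted sum of two percentile ranks; both dicts must share the same keys."""
    p1, p2 = percentile_rank(first), percentile_rank(second)
    return {name: w_first * p1[name] + w_second * p2[name] for name in first}


def to_ranks(scores: Dict[str, float]) -> Dict[str, int]:
    """Rank 1 is the highest score; ties are broken alphabetically (an assumption)."""
    ordered = sorted(scores, key=lambda k: (-scores[k], k))
    return {name: i + 1 for i, name in enumerate(ordered)}


def merge_by_sum_rank(arx_rank: Dict[str, int], oa_rank: Dict[str, int],
                      arx_missing: int = 156, oa_missing: int = 1114,
                      top_n: int = 100) -> List[Tuple[str, int]]:
    """Join two rank tables by surname and keep the top_n lowest sum_rank values."""
    surnames = set(arx_rank) | set(oa_rank)
    rows = [
        (s, arx_rank.get(s, arx_missing) + oa_rank.get(s, oa_missing))
        for s in surnames
    ]
    rows.sort(key=lambda row: (row[1], row[0]))  # ascending sum_rank, then surname
    return rows[:top_n]
```

With these defaults, a surname that is ranked 2 in OpenAlex but absent from arXiv gets `sum_rank = 156 + 2 = 158`, so it still sorts ahead of any surname that appears only in arXiv (whose `sum_rank` is at least 1 + 1114 = 1115).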
Audit decisions
Excluded
| Name | Reason |
|---|---|
| Florentín Smarandache | Self-published paper mill, not real research. |
| Ramon Carbó-Dorca | Chemistry (quantum similarity), not number theory. |
| Frank Vega | Crank: 113 OA works at 0.17 cites per paper. Known false positive. |
Name aliases (forced MGP ids)
| Display name | Forced MGP record |
|---|---|
| R. C. Vaughan | Robert Charles Vaughan, MGP id 27012 (surname-only matching is ambiguous with Charles Vaughan, id 225220) |
| H. A. Helfgott | Harald Andres Helfgott, MGP id 69999 |
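A forced-id lookup of this kind might look like the following sketch. The two ids are copied from the table above; the function `resolve_mgp_id`, the `surname_index` structure, and the fall-back rule (accept a surname match only when it is unique) are assumptions made for illustration, not a description of the actual aliases file.

```python
from typing import Dict, List, Optional

# Forced MGP ids, copied from the alias table.
NAME_ALIASES: Dict[str, int] = {
    "R. C. Vaughan": 27012,
    "H. A. Helfgott": 69999,
}


def resolve_mgp_id(display_name: str,
                   surname_index: Dict[str, List[int]]) -> Optional[int]:
    """Return a forced id if one exists, else a surname match only when unique."""
    if display_name in NAME_ALIASES:
        return NAME_ALIASES[display_name]
    surname = display_name.split()[-1]
    candidates = surname_index.get(surname, [])
    # Ambiguous or missing surnames are left unresolved rather than guessed.
    return candidates[0] if len(candidates) == 1 else None
```

Checking the alias table first means an ambiguous surname such as Vaughan never reaches the surname-only fall-back for the names that have been pinned by hand.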
What's not in this list
- Researchers without an OpenAlex profile. OA missed several Russian and Chinese number theorists active from the 1960s to the 1990s. The genealogy supplement partially covers this gap, but the list remains biased toward digitally indexed publication output.
- Subjective importance. A theorist whose entire body of Goldbach work is one influential paper may rank lower than a productive researcher with many adjacent papers. We rank by output, not by depth.
- Goldbach-specific topical filtering. OA's concept tagging is too noisy for math, so we filter by phrase match. Some of the 100 work mostly on adjacent topics (twin primes, prime gaps, sieve theory) rather than directly on Goldbach.
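The phrase-match filtering mentioned above can be illustrated with a short Python sketch. The three phrases are the variants named in the pipeline; the helper names `compile_phrases` and `is_topical`, the case-insensitive matching, and the choice to match against titles are assumptions for the example, not details confirmed by this page.

```python
import re
from typing import Iterable, Pattern

# The three phrase variants that replaced the standalone term "Goldbach".
PHRASES = ["Goldbach conjecture", "Goldbach problem", "Goldbach's conjecture"]


def compile_phrases(phrases: Iterable[str]) -> Pattern[str]:
    """Build one case-insensitive regex that matches any whole phrase."""
    alternatives = "|".join(re.escape(p) for p in phrases)
    return re.compile(rf"\b(?:{alternatives})\b", re.IGNORECASE)


PATTERN = compile_phrases(PHRASES)


def is_topical(title: str) -> bool:
    """True if the title contains one of the phrases, not just the bare surname."""
    return PATTERN.search(title) is not None
```

A title that only carries the surname, for example in an author byline, does not match, which is the point of replacing the bare term. One limitation of this sketch: it matches the straight apostrophe in `Goldbach's` only, so titles using a typographic apostrophe would need normalizing first.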