In my last post I mentioned the issue of shorter branches for contemporary Africans in the Y-chromosome phylogenetic tree. This means that starting from the fork that leads on one side to Africans, and the other to non-Africans, the latter contains more mutations than the former, but we are all the same age and equally distant from our common ancestor. So why do the Africans have fewer mutations? Do Eurasians accumulate more mutations? Are the branches built incorrectly? This post will try to shed some light on this matter.
Y-chromosomes and haplogroups
The accepted haplogroup structure for chromosome Y, just like that of mtDNA, is rooted in Africa, where the most basal lineages are found.
Using the phylogenetic tree analogy, all other variants, found outside of Africa, are branches that stem from this African origin. Outlier branches, even closer to the root, include our ancestor-relatives, the Denisovans and Neanderthals.
Back in 2014 I posted about Neanderthal Y chromosomes, and used the following image, which I have updated to add Denisovans.
The Denisovan and Neanderthal Y-chromosomes were studied by Martin Petr et al. (2020) in their paper The evolutionary history of Neanderthal and Denisovan Y chromosomes (Science 369, 1653-1656 (2020). doi:10.1126/science.abb6460 🔒- free access on Biorxiv🔓), which I will comment on in depth in a future post. The authors of this paper mention that their Y-chromosome phylogenetic trees display shorter branch lengths for Africans.
This is interesting! They state "Importantly, we discovered that the branch-lengths in Africans are as much as 13% shorter compared to non-Africans (Figure S7.3), which is consistent with significant branch length variability discovered in previous studies and suggested to be a result of various demographic and selection processes."
Below is Figure S7.3 mentioned above. You can see that all these African samples, except for S_Mbuti_1, have ratios that are less than 1, meaning their branches are shorter than the European ones. Furthermore, the branches of the most diverged samples (A00) are even shorter:
The branch lengths refer to the number of accumulated mutations in the branches of phylogenetic trees. Africans have fewer mutations than non-Africans, so their branches are shorter, yet they are supposedly older! This is an anomaly, because it implies a slower mutation rate in Africa, or a quicker one outside of Africa. The explanation offered by the authors is a classic one: leaving Africa caused population bottlenecks and forced adaptation to new environments, which sped up mutations, or so the theory goes! Below is Fig. S1.7 from this paper.
The values of the branches a, d, e, and f are given in the paper's Table S7.1 and are shown below. (I adapted the image and included a new column, a+d, the branch leading to non-Africans, which, as you can see, has more mutations than the African ones: compare the values of a+d with f.)
The difference seems small but it is significant. Furthermore, since the time of Ust'Ishim, who died 45,000 years ago, non-Africans have added an average of d-e mutations, ~200 of them, while Africans have added ~180-190. Hence the "shorter branch" issue.
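To get a feel for the size of that gap, the rounded counts quoted above can be turned into a percentage. This is only a back-of-the-envelope sketch: the ~200 and ~180-190 figures are the approximate values mentioned in this post, not exact entries from Table S7.1, and the function name is my own.

```python
def relative_shortfall(african_mutations, non_african_mutations):
    """Fraction by which the African branch is shorter than the non-African one."""
    return 1 - african_mutations / non_african_mutations

# Rounded values quoted in the post (assumptions, not exact table entries)
NON_AFRICAN = 200
AFRICAN_LOW, AFRICAN_HIGH = 180, 190

largest_gap = relative_shortfall(AFRICAN_LOW, NON_AFRICAN)
smallest_gap = relative_shortfall(AFRICAN_HIGH, NON_AFRICAN)

print(f"African branches are {smallest_gap:.0%} to {largest_gap:.0%} shorter")
```

With these rounded inputs the shortfall comes out at roughly 5% to 10%, the same order of magnitude as the "as much as 13%" figure quoted from the paper.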
Shorter or Longer?
However, an earlier paper that studied Neanderthal and H. sapiens Y chromosomes, by Mendez F, Poznik G, Castellano S, Bustamante C (2016) (The Divergence of Neandertal and Modern Human Y Chromosomes. The American Journal of Human Genetics, 98, 728-734), showed different branch lengths, but with an opposite skew! This work included two figures (Fig. 1B and Fig. 2), which I have combined and adapted in the image below. (The filters are different regions used to compare the DNA strands; some are more restrictive than others.)
The branch leading to the most divergent Africans, the Mbo people from Cameroon with haplogroup A00, has a length e, which is longer than branch d, the one leading to the Reference (European men). But both share the same root. Why have the Mbo men accumulated more mutations than Europeans during the same time span?
This paper calculates the split age for both Modern Human branches (Mbo and Europeans) at 280 thousand years ago (kya), and dates the Neanderthal split at ∼588 kya. The Neanderthal man who was analyzed died ∼49,000 years ago in El Sidrón, Spain, and is located on branch f. His lineage had 49,000 fewer years to accumulate mutations, because ours kept mutating after he died; yet the total line f contains far more mutations than either modern human line, the A00 (a+e) or the European (a+d), which, by the way, have had roughly 50 ky of extra mutations added to them!
This shows either that the Neanderthal Y chromosome mutated faster than the Homo sapiens Y chromosome, or that the timeline calculated in the paper is inaccurate.
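The expectation behind this argument can be written down explicitly. The sketch below assumes a strict, constant mutation clock and uses only the two dates quoted above (∼588 kya for the split and ∼49 kya for the El Sidrón man); the function and variable names are mine, not the paper's.

```python
NEANDERTHAL_SPLIT_KYA = 588  # split date quoted from the paper
SAMPLE_AGE_KYA = 49          # age of the El Sidrón individual

def expected_length_ratio(split_kya, sample_age_kya):
    """Expected ancient-branch / present-day-branch mutation ratio under a strict clock."""
    ancient_branch_time = split_kya - sample_age_kya  # time available on branch f
    present_branch_time = split_kya                   # time available on a+d or a+e
    return ancient_branch_time / present_branch_time

ratio = expected_length_ratio(NEANDERTHAL_SPLIT_KYA, SAMPLE_AGE_KYA)
print(f"Expected f / (a+d) under a constant rate: {ratio:.3f}")
```

Under these assumptions branch f should be about 8% shorter than either modern human line, so a longer f is what makes the result puzzling.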
Back and Recurring mutations
The paper noted that "The 17 sites that are incompatible with the tree are principally due to recurrent and back mutations". So these are not as infrequent as imagined.
Reference Bias
Janet Kelso, co-author of Petr et al.'s paper, investigated branch lengths and published her research in 2024: Resolving the source of branch length variation in the Y chromosome phylogeny, Yaniv Swiel, Janet Kelso, Stéphane Peyrégne. bioRxiv 2024.07.05.602100; doi: https://doi.org/10.1101/2024.07.05.602100.
This paper admits that population size, reproductive age, and deleterious mutations accumulated due to bottlenecks in the out-of-Africa group may play a role, but argues that the main cause of branch length differences is the reference human Y chromosome used for comparison, which lacks mutations that appear in more diverged haplogroups: "branch length variation amongst human Y chromosomes cannot solely be explained by differences in demographic or biological processes. Instead, reference bias results in mutations being missed on Y chromosomes that are highly diverged from the reference used for alignment."
Reference bias is an error caused by aligning against a particular benchmark (in this case the Homo sapiens (human) genome assembly GRCh37 (hg19) from the Genome Reference Consortium, whose Y chromosome is European), which favors genetic "reads" that match it over those carrying alternative alleles. The reference Y haplogroup is R1b.
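A toy model may help to picture how reference bias can shorten a branch. This is not the method used by Swiel et al.; it is a deliberately simplified sketch in which a "read" is discarded whenever it differs from the reference at too many positions, so variants sitting in densely diverged stretches are never called. The function name, read length, and mismatch threshold are all made-up assumptions.

```python
def called_variants(variant_positions, genome_length, read_length=10, max_mismatches=2):
    """Return the variants seen by at least one read that still 'aligns' to the reference."""
    variants = set(variant_positions)
    called = set()
    # Slide a read over every possible start position (hypothetical perfect coverage)
    for start in range(genome_length - read_length + 1):
        covered = {p for p in range(start, start + read_length) if p in variants}
        # A read with too many differences from the reference is dropped
        if len(covered) <= max_mismatches:
            called |= covered
    return called

# A sample close to the reference: a few isolated differences
near_reference = called_variants([5, 50, 95], genome_length=100)

# A diverged sample: the same differences plus a dense cluster
diverged = called_variants([5, 50, 51, 52, 53, 54, 95], genome_length=100)

print(len(near_reference), "of 3 variants called in the near-reference sample")
print(len(diverged), "of 7 variants called in the diverged sample")
```

In this toy setup every variant of the near-reference sample is recovered, while the middle of the dense cluster in the diverged sample is lost, so the diverged branch looks shorter than it really is.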
Comment on A00, the most ancient Y chromosome
For those interested in the deepest root of Y-chromosomes, the one named A00, you can find the original paper describing it by Mendez F., et al., (2013) (An African American Paternal Lineage Adds an Extremely Ancient Root to the Human Y Chromosome Phylogenetic Tree. AJHG, Vol 92:3, 7 March 2013, pp 454-459, https://doi.org/10.1016/j.ajhg.2013.02.002). An interesting critique of the findings, especially the extreme old age of this "basal" root, can be found in this paper: Elhaik E, Tatarinova TV, Klyosov AA, Graur D., (2013). The 'extremely ancient' chromosome that isn't: a forensic bioinformatic investigation of Albert Perry's X-degenerate portion of the Y chromosome. (Eur J Hum Genet. 2014 Sep;22(9):1111-6. doi: 10.1038/ejhg.2013.303. Epub 2014 Jan 22. PMID: 24448544; PMCID: PMC4135414).
Sometimes the media and websites mention "the oldest" or "the earliest" people, pointing at the Mbo or the Khoisan (San) groups, but in fact nobody alive nowadays is "older" than anyone else. We have all been evolving since the first Homo sapiens appeared. We are all equally distant from him or her; nobody is closer or more similar to those original modern humans.
This is why I dislike phylogenetic trees like the one shown below (source) that imply a direct link from the ancient root to the present for the San people, and a series of steps to a short fork for Asians and Europeans. (Hss: H. sapiens, Hsnn: Neanderthals, Hsnd: Denisovan)
When I read that the Khoisan separated from all other humans 150,000 years ago, I get the impression that it is a false statement. The Khoisan have not been isolated since then; they also carry admixture from other humans. But having lived in isolation in the deep past, and having admixed with other diverse, divergent, isolated groups, they acquired a higher diversity as a population, while humans living outside of Africa lost diversity due to bottlenecks and founder effects. Still, the genes we retained in America, Asia, Oceania and Europe are mostly as old as the ones found in Africans.
Back to differing branch lengths
Hallast P, Batini C, Zadik D, et al. (2015). (The Y-chromosome tree bursts into leaf: 13,000 high-confidence SNPs covering the majority of known clades. Molecular Biology and Evolution. 2015 Mar;32(3):661-673. DOI: 10.1093/molbev/msu327. PMID: 25468874; PMCID: PMC4327154. 🔓) mentioned that "Different clades within the tree show subtle but significant differences in branch lengths to the root." Fig. 3 in this paper (above is part of the figure) gives a clear image on how the branch lengths differ.
The tips of all haplogroups should align, justified on the right side, as all the tips are contemporary; however, they have different lengths. I took R2 as the reference and drew a black vertical line. This makes the shorter branches stand out (haplogroups A, B, H, I1, Q, and R), and also the longer ones like C, G, J, or T, as you can see in the image above. (I recommend visiting Fig. 3 by following the link, because it has far more detail than the simplified version I included above.)
Replication timing
A very thorough analysis of the causes of branch length differences can be found in Qiliang Ding, Ya Hu, Amnon Koren, Andrew G Clark, (2021). Mutation Rate Variability across Human Y-Chromosome Haplogroups. Molecular Biology and Evolution, Vol 38:3, March 2021, pp 1000–1005, https://doi.org/10.1093/molbev/msaa268.🔓.
The paper used data from over 1,700 men and "uncovered substantial variation (up to 83.3%) [in the] mutation rate among haplogroups. This rate positively correlates with phylogenetic branch length, indicating that interhaplogroup mutation rate variation is a likely cause of branch length heterogeneity."
The authors remarked that "Previous studies suggested that branch length heterogeneity might be caused by nongenetic factors, for example, paternal age variation across populations, acting over many generations. Another possibility is variation in mutation rate among Y-chromosome haplogroups.... [but] It was suggested that variation in Y-chromosome mutation rate across haplogroups was unlikely (Jobling and Tyler-Smith 2017)."
They disagree with the nongenetic factors and with Jobling and Tyler-Smith's dismissal of varying mutation rates, and prove that both are mistaken. This paper confirms that something known as replication timing varies across haplogroups, and this difference is linked to higher mutation rates (later replication causing more mutations than early replication timing).
Replication timing is the sequence in which the DNA of a chromosome is duplicated during cellular division. It involves unwinding and unzipping the DNA strand in a specific order, in different places, some of them simultaneously.
Due to these differing mutation rates, branch lengths are different, and this impacts on the timing and dating of haplogroups. The paper's supplementary file states that the divergence time of haplogroups E1b, R1a, and R1b may be underestimated, while that of haplogroup B is overestimated, as the former have shorter branches, and the latter, longer ones. See Fig. 3 C and D in the paper.
The explanation sounds good, but why do different haplogroups have different replication timing? Alas, no answer is provided!
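The effect of an uneven rate on dating can be illustrated with a simple clock calculation. Everything numeric here is a hypothetical assumption chosen for illustration (the rate, the number of callable sites, the "true" age, and the 10% slowdown); none of it comes from Ding et al.

```python
ASSUMED_RATE = 7.6e-10       # mutations per site per year (illustrative assumption)
CALLABLE_SITES = 10_000_000  # hypothetical number of sequenced Y-chromosome sites
TRUE_AGE_YEARS = 100_000     # hypothetical true age of a haplogroup

def expected_mutations(age_years, rate, sites):
    """Mutations expected to accumulate on a lineage of the given age."""
    return age_years * rate * sites

def clock_age_years(mutations, rate, sites):
    """Age implied by a mutation count when a single clock rate is assumed."""
    return mutations / (rate * sites)

# A haplogroup that actually mutates 10% slower than the assumed clock
slow_rate = 0.9 * ASSUMED_RATE
observed = expected_mutations(TRUE_AGE_YEARS, slow_rate, CALLABLE_SITES)

# Dating it with the standard rate underestimates its age
estimated_age = clock_age_years(observed, ASSUMED_RATE, CALLABLE_SITES)
print(f"True age: {TRUE_AGE_YEARS} years, clock estimate: {estimated_age:.0f} years")
```

In this sketch a lineage that mutates 10% slower ends up dated about 10% too young, which is the direction of error described for the short-branched haplogroups.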
Population factors
Nevertheless, Barbieri, C., Hübner, A., Macholdt, E. et al. (2016) (Refining the Y chromosome phylogeny with southern African sequences. Hum Genet 135, 541–553 (2016). https://doi.org/10.1007/s00439-016-1651-0 🔓) attribute branch length variation in Southern African haplogroups to demographic factors, including paternal age: "there is pronounced variation in branch length between major haplogroups; in particular, haplogroups associated with Bantu speakers have significantly longer branches. Technical artifacts cannot explain this branch length variation, which instead likely reflects aspects of the demographic history of Bantu speakers, such as recent population expansion and an older average paternal age. The influence of demographic factors on branch length variation has broader implications both for the human Y phylogeny and for similar analyses of other species." (Sure! It affects the calculation of dates along the branches of phylogenetic trees!)
This paper finds "The shortest branches in the Y chromosome phylogeny are for haplogroups A and B... E1b1a lineages have significantly longer branches than E1b1b or E2 lineages." A look at the mutations marked along the phylogenetic tree shown in the paper's Fig 1 confirms the comment on branch length variability (below is the number of mutations from the tip to the root at the A2-T node).
- A2a: 17
- A2b: 7
- A2c: 22
- A3b1b: 21
- B2B1: 113
- E1b1a: 208
- E1b1b: 138
- E2: 105
These people, all living today, show an extremely wide variation in the number of mutations between their common ancestor at the A2-T root and themselves: 7 to 208 mutations!! They are all Africans, and should be equally distant from the R1b reference genome, meaning that Kelso's reference bias does not apply in this case. This could be due to paternal age (older men have more mutations in their sperm as they sire children and pass on mutations in their Y chromosomes to their sons), or to the different replication timing of different haplogroups.
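The spread in the counts listed above can be summarized with a few lines of code. The numbers are the tip-to-root counts as I read them from the paper's Fig 1 and listed in this post; the variable names are my own.

```python
from statistics import mean

# Tip-to-root mutation counts at the A2-T node, as listed in this post
tip_to_root_mutations = {
    "A2a": 17, "A2b": 7, "A2c": 22, "A3b1b": 21,
    "B2B1": 113, "E1b1a": 208, "E1b1b": 138, "E2": 105,
}

shortest = min(tip_to_root_mutations, key=tip_to_root_mutations.get)
longest = max(tip_to_root_mutations, key=tip_to_root_mutations.get)
fold_difference = tip_to_root_mutations[longest] / tip_to_root_mutations[shortest]
average = mean(tip_to_root_mutations.values())

print(f"Shortest: {shortest} ({tip_to_root_mutations[shortest]} mutations)")
print(f"Longest: {longest} ({tip_to_root_mutations[longest]} mutations)")
print(f"Fold difference: {fold_difference:.1f}x, average: {average:.1f}")
```

The longest branch (E1b1a) carries almost thirty times as many mutations as the shortest (A2b), even though both tips are contemporary.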
T Naidoo et al., (2020) in their analysis of Khoe-San men in South Africa also found the branch issue. Under the heading "Branch Length Heterogeneity" they write: "Several earlier studies (Scozzari et al. 2014; Hallast et al. 2015; Barbieri et al. 2016) found evidence of branch length heterogeneity among Y-chromosome haplogroups, and provided possible reasons for its occurrence. We also noted significant differences in branch length heterogeneity among the major African haplogroups (supplementary tables S2 and S3, Supplementary Material online). A reduced mean branch length for haplogroup A, noted previously by Scozzari et al. (2014), was again apparent from our data. Although most major haplogroups differed significantly (with the exception of the E1b1a subclades), we found that haplogroup B did not appear to have as reduced a mean branch length, relative to haplogroup E, as found previously (Hallast et al. 2015; Barbieri et al. 2016). Within haplogroup E, E1b1b1 was found to have the highest mean branch length; though this may have been due to a lower sample size compared with haplogroup E1b1a." It seems to me, as a layman, that the branch length issue perplexes even the smartest scholars.
Closing comments
This post shows that scholars don't agree on why the African branches, the most diverged and "archaic" ones, leading to the root and origin of our H. sapiens species, contain fewer mutations than those found in Eurasian people. The basis for calculating the splits between modern humans and archaic relatives like Neanderthals and Denisovans is the assumption that there is a "mutation clock" that ticks at a regular pace: if we know the ticking rate and the number of mutations, we can calculate when species split from others, and when people diverged from others. Short branches on supposedly ancient lineages are incongruent with that assumption.
We are all equally ancient, Africans, Eurasians, and Americans, yet we have accumulated mutations in our Y chromosome at different rates. This is something that should be clearly analyzed. Software issues, methodology, sampling, reference bias, replication times, older reproductive ages, larger population sizes, bottlenecks, etc. have been put forward to explain this anomaly. None of these answers seems satisfactory. Chromosome Y is peculiar, it is small, and critical; any mutations here can have disruptive effects. We are overlooking something. When we find it, we will know why some branches are longer than others.
Patagonian Monsters - Cryptozoology, Myths & legends in Patagonia Copyright 2009-2026 by Austin Whittall ©






















