Patagonian monsters: Y Chromosome mutation rates

In my previous post I pointed out the differences found between the ages of different branches of the Y chromosome's Q haplogroup, and how papers tend to date Native American lineages so that they coincide with the date that mainstream science considers the correct one for peopling America: ca. 15 kya.

This made me wonder how much certainty we have about their accuracy, or how precise these dating methods are. The answer is surprising: not very precise.

The complexity encountered in dating Y chromosome lineages is summarized very well by Chuan-Chao Wang and Li Hui (2014): "... Different time estimation methods use different algorithms and assumptions, thus alternative methods probably fit more or less well with sequence data in time estimations. In addition, the best-fit mutation model might vary for different STRs... some specific lineages might have their own unique best-fit STR mutation rates for time estimation." [4]

In other words, it is extremely fuzzy. Today's post will look into the issue of "dates", the calculation of haplogroup ages and of the TMRCA (time to the most recent common ancestor), and the fallacy of a "clock" behind Y chromosome mutations.

The complexities behind the Y chromosome

The Y chromosome is particular because it is passed along from father to son, basically unchanged -excepting random mutations- in a long line that links all modern men to an "ancestral Adam" who lived in Africa in the distant past of mankind and from which all Y chromosome haplogroups derive.

Y is one of the two sex chromosomes found in mammals, humans included; the other is the X chromosome. An X chromosome and a Y chromosome ("XY") pair determines a male; a double X ("XX"), a female.

Just like all other chromosomes, the Y chromosome also mutates: chance mutations and natural selection act upon it and create small differences that, if not negative (that is, killing its carrier), are passed on to the next generations.

Y chromosome's high mutation rates

The Y chromosome mutation rate is much higher than that of autosomes because it is restricted to the male germ line, where more cell divisions occur [1]. Sperm is formed inside the testis in a process of cell division known as gametogenesis (spermatogenesis), and it is during this process that mutations may arise that can affect future generations (if a Y chromosome mutates in any other cell of the body, e.g. a liver cell, it will have no impact on the offspring of the bearer of the mutation).

Men produce sperm from puberty until death, while women are born with a fixed number of ova, one of which matures monthly from puberty until menopause. This means that sperm are subjected to many more rounds of cell division and may accumulate more chance mutations. The older the man, the higher the chances.

Unlike the X chromosome, and except for small regions at the telomeres (tips), the Y chromosome cannot undergo recombination (where mutated parts are replaced with other "healthier" ones). This means that most of the Y chromosome (95% of it) forms a non-recombining region where Single Nucleotide Polymorphism (SNP) mutations accumulate without being "repaired".

This non-recombining situation arose because X and Y chromosomes do not recombine with each other (as the X chromosome pairs do in women), which preserves them from gaining harmful genes from the opposite sex and allows the Y chromosome to preserve male-specific genes.

These mutations allow geneticists to trace lineages and paternity by comparison. They would also allow calculating the age of lineages by comparing differences that accumulated in each line and the rate at which they accumulate. But this is easier said than done.

Calculating ages of Y-chromosome lineages

The key element in dating lineages or haplogroups is to know the mutation rate, and there are basically two methods for calculating it:

I. Direct Measurement or Pedigree estimates: Take two individuals related by descent and identify the mutations in their Y chromosomes. As the time span that separates them is known (either in years or in generations), the mutation rate can be calculated directly (the value is given in mutations per nucleotide per generation).

The direct measurement (Yali Xue et al., 2009) [1] of the substitutions in the Y chromosomes of two related men separated by 13 generations gave a "mutation-rate measurement of 3.0 × 10^-8 mutations/nucleotide/generation... 1.0 × 10^-9 mutations/nucleotide/year" [1].

The published human-chimpanzee comparisons are "2.3 × 10^-8 – 6.3 × 10^-8 mutations/nucleotide/generation... depending on the generation and split times assumed" [1].

The uncertainty is highlighted by the very wide 95% confidence interval (CI) obtained: 8.9 × 10^-9 – 7.0 × 10^-8 mutations/nucleotide/generation.
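As a rough check on how the quoted figures relate to each other, the sketch below converts the per-generation rate into a per-year rate and measures how wide the confidence interval is. The 30-year generation length is an assumption inferred from the ratio of the two quoted numbers, and the function name is purely illustrative.

```python
# Sketch: convert a per-generation mutation rate to a per-year rate.
# ASSUMPTION: a 30-year generation, inferred from the ratio of the two
# figures quoted from [1]; it is not a value stated in this post.

def per_year_rate(rate_per_generation: float, generation_years: float) -> float:
    """Mutations/nucleotide/generation -> mutations/nucleotide/year."""
    return rate_per_generation / generation_years

point_estimate = 3.0e-8               # mutations/nucleotide/generation [1]
ci_low, ci_high = 8.9e-9, 7.0e-8      # 95% CI quoted above

yearly = per_year_rate(point_estimate, 30.0)
ci_fold_range = ci_high / ci_low      # upper bound divided by lower bound

print(f"per-year rate: {yearly:.1e}")
print(f"CI spans a factor of {ci_fold_range:.1f}")
```

The upper bound of the interval is almost eight times the lower one, so any date computed from this rate inherits an uncertainty of the same order.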

II. Evolutionary estimates: these use STR polymorphisms (microsatellites) within lineages defined by SNPs. STRs can be easily genotyped. So, taking the microsatellite variation within Y chromosome lineages and knowing the historical dates of certain key events in these lineages' history, a mutation rate can be calculated.

As an example, I will follow the widely cited paper by Zhivotovsky et al. (2004) [2], which has a lot of assumptions and plenty of formulae and maths. I am an engineer and love maths, but I will spare you the details. Those interested can check the paper (see Statistical Analysis in [2]).

The calculated average "effective mutation rate" (w) was between 0.000312 and 0.000454 per 25 years for Polynesians and Gypsies respectively. However (and these are the things that surprise me!), these values were "adjusted" because they were considered underestimates. The correcting factor ASD₀ or average squared difference was applied and voilà, a mutation rate w of 0.000705±0.000332 and 0.000725±0.000187 is obtained for Maori - Cook islanders and Bulgarian Gypsies respectively.

As can be seen, the "adjustment" roughly doubled w (it increased 2.26 times in Polynesians and 1.60 times in Gypsies). Furthermore, the error bars are enormous (47% and 26% of the estimate for each population).
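These ratios are easy to re-derive from the four numbers quoted from [2]. The sketch below does only that; the dictionary layout and variable names are my own.

```python
# Sketch: re-derive the adjustment factors and relative errors from the
# effective mutation rates (per 25 years) quoted from [2].

unadjusted = {"Polynesians": 0.000312, "Gypsies": 0.000454}
adjusted = {                      # (rate, +/- error)
    "Polynesians": (0.000705, 0.000332),
    "Gypsies": (0.000725, 0.000187),
}

results = {}
for population, (rate, error) in adjusted.items():
    fold_increase = rate / unadjusted[population]   # effect of the adjustment
    relative_error = error / rate                   # size of the error bar
    results[population] = (fold_increase, relative_error)
    print(f"{population}: x{fold_increase:.2f}, error {relative_error:.0%}")
```

The adjustment multiplies the rate by roughly 2.3 and 1.6, and the error bars amount to about half and a quarter of the adjusted values.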

These two values and another one estimated for "global" loci were then averaged, resulting in the "magic number" most quoted, cited and used in current genetics papers: "an effective mutation rate at an average Y chromosome short-tandem repeat locus as 6.9×10^-4 per 25 years" [2].

Different mutation rates

As we can see the values calculated with each method (Pedigree and Evolutionary) are very different, and applying them to calculate ages of lineages will give very differing results.

Being an engineer with a scientific point of view, I believe that the real values are those that are measured, and that the theory should provide a good model that explains reality and sets of equations or formulae that can be applied with some simple parameters to obtain results that are very similar to reality.

The "laws" of mechanics are used because they are a reasonable model that fits the everyday world and gives accurate predictions and practical results (you can design a car or a plane to withstand stress and accelerations, calculate the trajectory of a missile with precision, etc). But when it comes to "laws" in genetics, it seems things are much more blurred and lack precision.

Let's look at possible factors that may explain these differences in mutation rates:

Frequent mutations might occur within the few generations used in pedigree studies, while slowly mutating loci only become significant over a longer time interval. [2]
Evolutionary calculations use statistics of current variation, which include reverse mutation to old alleles as well as forward mutation to new alleles; these reverse mutations would reduce the number of alleles. Pedigree estimates, on the other hand, count mutations on a per-meiosis basis, so reversals are counted as new alleles. [2]
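The second point can be illustrated with a toy simulation. It is only a sketch: it assumes a symmetric single-step model in which each mutation adds or removes exactly one repeat, which is a simplification and not the model fitted in [2]. It compares the number of mutation events (what a pedigree study counts) with the net change in repeat number (what is left to see when only the end points are compared).

```python
# Sketch: under a symmetric single-step model (an ASSUMPTION), every
# mutation adds or removes one repeat, so reverse mutations can cancel
# earlier ones and hide them from a comparison of start and end alleles.
import random

def simulate_lineage(n_events: int, rng: random.Random) -> int:
    """Return the net change in repeat count after n_events +/-1 steps."""
    return sum(rng.choice((-1, 1)) for _ in range(n_events))

rng = random.Random(42)          # fixed seed so the run is repeatable
n_events = 20                    # mutations a pedigree study would count
net_changes = [abs(simulate_lineage(n_events, rng)) for _ in range(1000)]
mean_visible = sum(net_changes) / len(net_changes)

print(f"mutation events per lineage: {n_events}")
print(f"mean net repeat difference:  {mean_visible:.2f}")
```

The net difference can never exceed the number of events and is usually far smaller, which is one way the two approaches can disagree even when the underlying process is identical.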

I would add that the mechanisms working here are not clearly understood so the model fails to replicate reality.

Factors that distort the estimations

Software and assumptions

Another factor to take into account when calculating the age of different haplotypes is the set of assumptions behind the calculations.

Modern geneticists employ software that runs simulations (e.g. rho statistics with Network, Bayesian analysis with Batwing), which are fed with these assumptions: weight assigned to different STR variants, exclusion of certain loci (those considered ambiguous or with multi-nucleotide repeats), generation time, population sizes, mutation rates (which, as seen above, are also shrouded in uncertainties), and "others". [3]

Among these "others" are the assumptions that, after populations split, no further migration occurs between them [3] or, for instance, that there is exponential growth from an initial population of constant size "N" [4]. These may not be true, as we will see below, together with other causes.

Evolutionary Rate and Repeat unit size

Evolution rate is lower for STRs that have a larger repeat unit size (that is, each repeat unit has more nucleotides). [6][7] In other words, penta- or hexanucleotide markers mutate more slowly (3.45 × 10^-4 per 25-year generation) than tri- or tetranucleotide markers (6.9 × 10^-4 per 25 years, the figure given by Zhivotovsky et al. (2004) [2]). [7]

This is because there are "relatively more gains in short alleles and more losses in long alleles" (Dupuy et al., 2004). [8]

These mutation rates yield different coalescence dates for haplogroups; for instance, the age estimate for the haplogroup CF clade based on tri/tetra marker results is 42.2 ky, which is much lower than the 64.7 ky estimated with penta/hexa markers. [7]

The fact that mutation rate depends on allele size means that the different haplogroups (which are characterized by different and specific STRs) will mutate at different rates when compared to each other [8], yielding incorrect coalescence dates when they are compared.
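To see how strongly the assumed rate drives the date, the sketch below dates one and the same amount of STR variation with both rates. It assumes the simplified relation age = ASD / w generations (accumulated variance divided by the effective rate), and the ASD value of 1.0 is a made-up illustrative number, not data from [7].

```python
# Sketch: the same observed STR variation dated with two different rates.
# ASSUMPTIONS: age (generations) = ASD / w, a simplified linear model;
# asd = 1.0 is a hypothetical value chosen only for illustration.

GENERATION_YEARS = 25

def age_in_years(asd: float, rate_per_generation: float) -> float:
    """Years needed to accumulate `asd` at the given effective rate."""
    return asd / rate_per_generation * GENERATION_YEARS

asd = 1.0                                   # hypothetical accumulated variance
rate_tri_tetra = 6.9e-4                     # per 25 years [2]
rate_penta_hexa = 3.45e-4                   # per 25 years [7]

age_fast = age_in_years(asd, rate_tri_tetra)
age_slow = age_in_years(asd, rate_penta_hexa)
print(f"tri/tetra rate:  {age_fast / 1000:.1f} ky")
print(f"penta/hexa rate: {age_slow / 1000:.1f} ky")
```

For a fixed amount of variation, halving the rate doubles the age. Under this simplified model, the fact that the 42.2 ky and 64.7 ky estimates differ by less than a factor of two would mean that the two marker sets also show different amounts of variation.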

More Factors that influence Y chromosome estimates

When comparing mtDNA timelines (these are based on women) and the male Y chromosome datings, differing patterns appear. These are due to:

1. Genetic drift. It acts strongly upon Y chromosomes: many males don't have sons (they may have only daughters, or die before reproducing) so their Y chromosome is not passed on, and is lost from the gene pool, reducing diversity. [9]

2. Polygyny (having more than one wife at a time). This custom would lead a small number of males to spread their genes (including their Y chromosome) among a disproportionately large number of children, while others are excluded from the reproductive cycle and their Y chromosomes are lost. [9]

In our recent evolutionary past, humans lived in polygynous, extended families, where male longevity (>50) would allow men to reproduce at advanced ages with younger women, a situation not found in monogamous societies, where the wife's menopause effectively cuts off an older man's reproductive cycle. Older males' sperm may also accumulate more mutations than younger males' sperm, adding more diversity to the gene pool.

3. Lower effective Male Population Size. The higher male mortality rate and the uneven reproductive success of males (e.g. due to polygyny) are factors that reduce Y chromosome diversity in populations compared to mtDNA and autosomes. [9] This is seen in the higher level of X chromosome (females) variability compared to that of Y chromosome (males). [10]

In their estimate, Zhivotovsky et al. (2004) [2] consider the male and female populations to be equal in size, but they are not. This influences the data on the ratio of variance at Y chromosome STRs to that of autosomal STR loci. The ratio varies from 1.14 in "sub-Saharan African hunters" to 0.51 among "American farmers" (the global average is close to 1), and this is due to there being fewer males per female in the latter population. This lower ratio leads to a lower mutation rate.

4. Migration. It is an important cause of gene flow within a population. It will lead to overestimation of the accumulated STR variance used in evolutionary calculations.

If migrants admixing with a population are of the same haplogroup they cannot be told apart from the original population, so mutation rates would be overestimated for the admixed population.

The gender mix is also important: if more men migrate than women, this will influence the Y to autosomal STR variance as discussed above. [2] Patrilocality (the residence of a newly married couple with the husband's family or tribe) and Matrilocality (the opposite situation) also alter the mtDNA to Y chromosome variance.

5. Generation times. "In present-day hunter-gatherer societies generation time is estimated to be approximately 32 and 26 years for males and females, respectively" [11], which is different from the 25 years postulated by Zhivotovsky et al. (2004) [2]. It may seem trivial, but if a generation is 32 years instead of 25, the estimates will vary considerably: 10 ky can actually mean 12.8 ky. Historical generation times as calculated by pedigree estimates may be very different from those of our evolutionary past.

My next post "Generation time is not 25 years", gives some sources and data to prove it is at least 30 years for males.

6. Variation in founding populations. The Y-STR variation of the founding population at the time of arrival in a geographic region is taken into account in evolutionary estimates [2]: if the variation is lower, the mutation rate will increase and, for higher variation, the mutation rate will be lower. So if the founding male population has substantial diversity, it will lead to an incorrect (lower) divergence time calculation. [2]

7. Positive Selection. Natural selection also acts upon men, and will increase the frequency of a given lineage if it is more beneficial for those carrying it. [9] Or, may I add, it will also benefit Y chromosomes piggybacking on individuals with some other allele favored by selection.

8. Expansion and bottlenecks. Genetic diversity between two populations that shared the same original genetic structure may be due to the expansion of one of them: random mutations will arise more frequently in a larger population simply because there are more sperm cells in which they can arise. This will increase the diversity of the larger population. [9]

A bottleneck will have exactly the opposite effect: a paucity in genetic diversity of the decreasing population as lineages become extinct. [11]
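Of the factors listed above, generation time (point 5) is the easiest to put into numbers. The sketch below keeps the number of generations fixed and changes only the years assigned to each one; the function name and the set of generation lengths tried are illustrative choices.

```python
# Sketch: rescale a date that was computed with one generation length
# to another generation length (same number of generations).

def rescale_age(age_ky: float, assumed_gen_years: float, actual_gen_years: float) -> float:
    """Keep the generation count fixed and change the years per generation."""
    generations = age_ky * 1000 / assumed_gen_years
    return generations * actual_gen_years / 1000

for gen_years in (25, 26, 30, 32):
    print(f"{gen_years} y/generation -> {rescale_age(10.0, 25, gen_years):.1f} ky")
```

A date of 10 ky obtained with 25-year generations becomes 12.8 ky with 32-year generations, a 28% shift caused by a single parameter.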

Amerindians

When considering Native Americans we must look back towards their Paleo-Indian ancestors and see how some of the assumptions mentioned above apply to them:

They were not small isolated groups with a closely shared ancestry. Instead they were dynamic groups that had fluid contacts and exchanges with each other and with their ancestral populations back in Asia. [12]

They were not a "neutral" system where mutations accrete regularly. They were instead subject to positive selection, war, disease and famine, which modified the clock's rate of ticking. [12]

Last but not least is the sampling bias when studying populations. Most samples are not drawn in a random manner from large populations. Instead they are tiny samples from small villages where the groups are mostly composed of relatives with shared ancestry. This of course modifies the basic premises of coalescent methods and leads to shorter coalescence times than the actual ones.

Another factor is that the current genes found in a population may not actually represent the historic or even the prehistoric mix of that population [12]. Amerindians suffered a severe bottleneck after the discovery and conquest of America (after 1492 CE) which wiped out many lineages (who knows how many Y chromosome or mtDNA haplogroups disappeared during this period?).

Anzick-1 remains

The remains of a Clovis youth from Montana, US (Anzick-1), which are 12.6 ky old, were typed (Rasmussen et al., 2014) [5] and found to belong to Q-L54*(xM3).

The paper indicates that they then calculated the date of divergence between haplogroups Q-L54*(xM3) (Anzick-1) and Q-M3 of contemporary Native Americans. It is a simple rule of three calculation:

They notice that Anzick-1 had 12 transversions (mutations) while modern individuals have on average 48.7; these 36.7 additional transversions must then have arisen during the 12,600 years that elapsed between Anzick-1's death and today. So 12.6 × 48.7 ⁄ 36.7 = 16.72, that is, the divergence happened 16.72 kya.
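The rule of three can be written out as a few lines of code. This is a sketch of the proportional reasoning described above, not the authors' own code; it assumes transversions accumulate at the same constant rate on every branch, and the function name is illustrative.

```python
# Sketch of the proportional ("rule of three") estimate described above.
# ASSUMPTION: transversions accumulate at a constant rate on every branch.

def divergence_age_ky(sample_age_ky: float, ancient_count: float, modern_count: float) -> float:
    """Date of the split, in ky, from transversion counts since the split."""
    extra = modern_count - ancient_count        # gained after the ancient sample died
    rate_per_ky = extra / sample_age_ky         # transversions per thousand years
    return modern_count / rate_per_ky

age = divergence_age_ky(sample_age_ky=12.6, ancient_count=12, modern_count=48.7)
print(f"divergence: {age:.2f} kya")
```

Note that the whole estimate hangs on the difference between the two counts: the smaller that difference, the less stable the result.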

Of course, to make it statistically neater for the paper, they then "implemented a Poisson process model for mutations on the tree and used the constrOptim() function in R to compute a maximum likelihood TMRCA estimate of 16.9 ky. We then repeated this for 100,000 bootstrap simulations to yield a 95% confidence interval of 13.0–19.7 ky." [5]. The outcome confirmed their previous simple calculation.

Below is part B of their Extended Data Figure 2: [5]

Fig 2. Adapted from [5]

The figure's original caption reads: "Each branch is labelled by an index and the number of transversion SNPs assigned to the branch (in brackets). Terminal taxa (individuals) are also labelled by population, ID and haplogroup. Branches 21 and 25 represent the most recent shared ancestry between Anzick-1 and other members of the sample. Branch 19 is considerably shorter than neighbouring branches, which have had an additional ~12,600 years to accumulate mutations."

Cross-check and doubts

I checked this value using the transversions indicated in their figure.

So I took the values in brackets and added them up for each individual; the sum is shown on the far right in green (Q-M3 individuals) and red (Q-L54 ones). At the top is an example of the calculation. The sums are measured from the split that takes place at branch 26 (marked with the vertical green line).

As an example, the individual at branch 0, MXL NA19682, has 8 + 2 + 8 + 3 + 21 = 42 transversions.

For Q-M3 individuals I calculate an average of 40.33 transversions in moderns (vs. 12 in Anzick-1) and an age of 17.96 ky. Using only the Q-L54 individuals' values, the average is 44.3 transversions and the age is 17.29 ky; using all modern values, the figures are 41.54 transversions and 17.72 ky. They differ slightly from the 16.9 ky calculated in [5].
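The same proportional calculation can be rerun for the three averages. The sketch below assumes that each average is the modern transversion count measured from the same baseline as the 12 transversions of Anzick-1; on that reading it reproduces the ages above to within a few hundredths of a ky.

```python
# Sketch: proportional age estimate for each average quoted above.
# ASSUMPTION: each average is the modern transversion count measured from
# the same baseline as the 12 transversions of the ancient sample.

SAMPLE_AGE_KY = 12.6
ANCIENT_COUNT = 12

def divergence_age_ky(modern_count: float) -> float:
    """Sample age scaled by total count over count gained since the sample."""
    extra = modern_count - ANCIENT_COUNT
    return SAMPLE_AGE_KY * modern_count / extra

averages = {"Q-M3": 40.33, "Q-L54": 44.3, "all moderns": 41.54}
ages = {group: divergence_age_ky(count) for group, count in averages.items()}
for group, age in ages.items():
    print(f"{group}: {age:.2f} ky")
```

A change of four transversions in the modern average (40.33 versus 44.3) already moves the date by more than 600 years.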

Weird Maths or incorrect assumptions

The odd thing is that when the same methodology is applied to the Saqqaq remains (Branch 27), the age estimation goes awry:

The paper mentions the Palaeo-Eskimo Saqqaq "sequence had a relatively high missing rate of 0.24 and is divergent with respect to the other hgQ lineages in the sample, its singleton branch should more properly be considered to be of length 71 (54 / 0.76) transversions" [5].

So when we take the age of Saqqaq (4 ky), its transversions from the root at the split of branch 28 (which are 71), and calculate the number of transversions for modern samples (by adding the 31 that correspond to branch 26 to the previously calculated figures), we obtain an average for all moderns of 72.93 transversions. The difference that accumulated over 4,000 years is therefore only 1.93 transversions, which leads to 4.0 × 72.93 ⁄ 1.93 ≈ 151, that is, the divergence happened about 151 kya! Yes, roughly one hundred and fifty thousand years ago.
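The sketch below repeats this calculation with the figures given above and then nudges the modern average to show how unstable the result is. The alternative averages of 75 and 80 are hypothetical values chosen only to probe the sensitivity.

```python
# Sketch: the same proportional estimate applied to the Saqqaq figures,
# plus a look at how sensitive it is to the modern average.

def divergence_age_ky(sample_age_ky: float, ancient_count: float, modern_count: float) -> float:
    """Sample age scaled by total count over count gained since the sample."""
    return sample_age_ky * modern_count / (modern_count - ancient_count)

corrected_length = 54 / 0.76          # missing-rate correction quoted from [5]
saqqaq_count = 71                     # rounded value used in the text
estimate = divergence_age_ky(4.0, saqqaq_count, 72.93)
print(f"corrected branch length: {corrected_length:.1f}")
print(f"divergence: {estimate:.0f} kya")

# Shift the modern average by a few transversions and watch the date move.
for modern in (72.93, 75.0, 80.0):
    print(f"modern = {modern}: {divergence_age_ky(4.0, saqqaq_count, modern):.0f} kya")
```

With a denominator of fewer than two transversions the estimate lands at roughly 150 kya, while a modern average only two transversions higher would halve it. A method this sensitive to its inputs cannot be treated as a clock.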

Furthermore, the distance in transversions from the baseline (the green line in the figure above) ranges from 26 (on branch 3) to 52 (on branch 12), that is, twice as many. But all belong to modern human populations, so why would one group accumulate twice as many transversions as another? The difference of 26 is 26/36.7 = 70.8% of the 36.7 accumulated since Anzick-1's time, and if we apply the same criterion, 26/36.7 × 12,600 y ≈ 8,926 years should separate these populations. But no, they are contemporary. In other words, the number of transversions does not reflect age as a direct proportion.
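For completeness, here is that last conversion as a short sketch. It uses only the figures quoted above (48.7 and 12 transversions, 12,600 years, and the 26 to 52 range read from the figure); the variable names are mine.

```python
# Sketch: convert the spread in transversion counts among modern samples
# into the time it would represent under strict proportionality.

extra_in_moderns = 48.7 - 12          # transversions gained over 12,600 years
years_elapsed = 12_600
lowest, highest = 26, 52              # range read from the figure

spread = highest - lowest
fraction = spread / extra_in_moderns
implied_years = fraction * years_elapsed
print(f"spread is {fraction:.1%} of the reference, i.e. {implied_years:,.0f} years")
```

Under strict proportionality, two living men would be separated by almost nine thousand years, which is absurd.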

This clearly indicates that better calculation methods for Y chromosome lineage dating are necessary.

Subjects for future posts: no Y chromosome from Neanderthals is found in modern humans. Did the Q haplogroup originate in America? Where did the Q hg found in the UK and Scandinavia come from?

Sources

[1] Yali Xue et al., (2009). Human Y Chromosome Base-Substitution Mutation Rate Measured by Direct Sequencing in a Deep-Rooting Pedigree. Curr Biol. Sep 15, 2009; 19(17): 1453–1457, doi: 10.1016/j.cub.2009.07.032
[2] Lev A. Zhivotovsky, et al., (2004). The Effective Mutation Rate at Y Chromosome Short Tandem Repeats, with Application to Human Population-Divergence Time. Am J Hum Genet. Jan 2004; 74(1): 50–61. doi: 10.1086/380911
[3] Matthew C. Dulik, et al., (2012). Mitochondrial DNA and Y Chromosome Variation Provides Evidence for a Recent Common Ancestry between Native Americans and Indigenous Altaians. Am J Hum Genet. Mar 9, 2012; 90(3): 573. doi: 10.1016/j.ajhg.2012.02.003
[4] Chuan-Chao Wang and Li Hui, (2014). Comparison of Y-chromosomal lineage dating using either evolutionary or genealogical Y-STR mutation rates. bioRxiv posted online May 3, 2014. doi: http://dx.doi.org/10.1101/004705
[5] Morten Rasmussen, et al., (2014). The genome of a Late Pleistocene human from a Clovis burial site in western Montana. Nature 506, 225–229 (13 February 2014) doi:10.1038/nature13025
[6] Mari Järve, Lev A. Zhivotovsky, et al., (2009). Decreased Rate of Evolution in Y Chromosome STR Loci of Increased Size of the Repeat Unit. PLoS One. 2009; 4(9): e7276. doi: 10.1371/journal.pone.0007276
[7] Järve M, Zhivotovsky LA, Rootsi S, Help H, Rogaev EI, et al. (2009). Decreased Rate of Evolution in Y Chromosome STR Loci of Increased Size of the Repeat Unit. PLoS ONE 4(9): e7276. doi:10.1371/journal.pone.0007276
[8] B. Myhre Dupuy, M. Stenersen, A.G. Flønes, T. Egeland and B. Olaisen, (2004). Y-chromosomal microsatellite mutation rates: differences in mutation rate between and within loci. International Congress Series 1261 (2004) 76–78. doi:10.1016/S0531-5131(03)01791-6
[9] Chuan-Chao Wang, Li Jin, Hui Li, Natural selection on human Y chromosomes. arxiv.org
[10] Michael F. Hammer, Fernando L. Mendez, Murray P. Cox, August E. Woerner, Jeffrey D. Wall, (2008). Sex-Biased Evolutionary Forces Shape Genomic Patterns of Human Diversity. PLoS Genetics doi:10.1371/journal.pgen.1000202
[11] Labuda D, Yotova V, Lefebvre J-F, Moreau C, Utermann G, et al., (2013). X-Linked MTMR8 Diversity and Evolutionary History of Sub-Saharan Populations. PLoS ONE 8(11): e80710. doi:10.1371/journal.pone.0080710
[12] Peter N. Jones, American Indian mtDNA, Y Chromosome genetic data and the peopling of North America, Bauu Institute, 2004.


Thursday, May 22, 2014
