Patagonian monsters: Alcohol, genes and human migrations... Part 1

I am always on the lookout for interesting papers that identify crucial differences between Native Americans and Asians. Since Asia is the alleged homeland of Amerindians, we would expect Asians and Americans to be similar, not different, so discrepancies between them are interesting (because they must be explained to justify the theory of a Beringian migration of Asians into America).

So, the other day, when I came across a map which depicted the global distribution of the ADH1B*47His allele (more on it later), I was delighted. The map, shown below, depicts a sharp contrast: America is shaded white and East Asia is shaded dark.

Furthermore, America is similar to Subequatorial Africa, Western and Northern Europe and the Arctic region of Central Siberia, and very different from Asia, PNG and Australia. [1]

Global contour plot of ADH1B*47His Allele. From Fig. 2 in [1]

So, as clear as black and white, America (white) is different from Asia (black). Something is going on. Apparently America, Europe and Africa share a common trait not found in the rest of Eurasia and Oceania. This is indeed another case of an Amerindian gene not shared by their alleged Siberian or Asian "relatives".

I decided to take a deep look into the matter and, first of all, to find out what this ADH1B*47His allele is all about... and the story is quite interesting.

On food, alcohol and genes

Fruit is a primary source of energy for many insects and animals, including our primate ancestors. Overripe fruits in a warm and humid tropical environment can ferment and attain a considerable level of alcohol (even as high as 8.1%, about halfway between beer and wine). This is quite intoxicating and has evolutionary implications.

Monkeys eating this kind of fruit would get drunk and, unless they developed some mechanism to get rid of the alcohol, would face serious problems in the wilderness: a tottering, drunk or even hung-over ape would be easy prey for a sober lion or leopard.

Since our hominid ancestors also ate a considerable quantity of fruit, they too would have had to deal with alcohol in their systems. Somehow they would have had to reconcile using fruit as a food resource with the alcoholic consequences of its ingestion: alcohol metabolism is something that, over the course of thousands of years, would be selected for, or against, by the forces of natural selection.

It appears that the common ancestor of both humans and chimpanzees developed the ability to metabolize alcohol about 10 mya [2]. We still carry this adaptation in our genes. But not all humans have the same set of variants; there are differences, and their effects are noticeable.

About metabolizing alcohol

The alcohol we ingest is absorbed into our blood stream and besides giving us an "alcoholic high", it goes through our liver which breaks it down into other compounds.

There are two kinds of enzymes that metabolize alcohol: ADH, or Alcohol Dehydrogenase and ALDH or Aldehyde Dehydrogenase. They work in tandem to rid us of alcohol:

ADH turns ethanol (ethyl alcohol) into acetaldehyde, a nasty toxic substance which provokes nausea, headaches, hangover and flushing. The acetaldehyde in turn is metabolized by ALDH into acetic acid (actually, acetate, the ion) which is finally eliminated as waste.

The reactions are the following:

H₃C - CH₂-OH (ethanol) -- ( ADH ) --> H₃C - CH=O (acetaldehyde)

H₃C - CH=O (acetaldehyde) -- ( ALDH ) --> H₃C - COOH (acetic acid)

The interesting part of this is that there are different alleles of the genes that code for these enzymes, and that these are found at different frequencies among human populations around the world (as shown in the maps above and below).

ALDH alleles and aldehyde

It appears that people who carry a mutated allele of the ALDH2 gene, ALDH2*2 (also written ALDH2*487Lys), are less likely to become alcoholics (Luo et al., 2009) [3], and the reason is very straightforward: the mutation reduces the enzyme's ability to convert acetaldehyde into acetic acid, so acetaldehyde accumulates in the body, dilating capillaries (which provokes a flush) and causing other unpleasant hangover-like effects such as headaches and nausea. Since drinking becomes unpleasant, the carriers of this mutation tend to avoid drinking and remain sober.

Other people, lacking this mutation, turn aldehyde into acetic acid more efficiently, so drinking, for them, is pleasant, which increases the risk of alcoholism.
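The reasoning above (a slower ALDH step lets acetaldehyde pile up) can be illustrated with a toy two-step model. This is only a sketch under loud assumptions: both steps are treated as simple first-order reactions, and the rate constants `k_adh` and `k_aldh` are made-up illustrative numbers, not measured enzyme activities from any of the cited papers.

```python
def simulate_clearance(k_adh, k_aldh, ethanol0=1.0, dt=0.01, steps=2000):
    """Toy model: ethanol -> acetaldehyde -> acetate, integrated with Euler steps.

    k_adh and k_aldh are hypothetical first-order rate constants.
    Returns the peak acetaldehyde level and the final (ethanol, acetaldehyde, acetate).
    """
    ethanol, acetaldehyde, acetate = ethanol0, 0.0, 0.0
    peak = 0.0
    for _ in range(steps):
        to_acetaldehyde = k_adh * ethanol * dt   # amount converted by ADH in this step
        to_acetate = k_aldh * acetaldehyde * dt  # amount converted by ALDH in this step
        ethanol -= to_acetaldehyde
        acetaldehyde += to_acetaldehyde - to_acetate
        acetate += to_acetate
        peak = max(peak, acetaldehyde)
    return peak, (ethanol, acetaldehyde, acetate)


# Same ADH step, but an efficient versus a sluggish ALDH step (illustrative values).
peak_efficient, _ = simulate_clearance(k_adh=1.0, k_aldh=5.0)
peak_sluggish, _ = simulate_clearance(k_adh=1.0, k_aldh=0.5)
print(f"peak acetaldehyde, efficient ALDH: {peak_efficient:.3f}")
print(f"peak acetaldehyde, sluggish ALDH:  {peak_sluggish:.3f}")
```

In this toy setting the sluggish ALDH run reaches a clearly higher acetaldehyde peak, which is the qualitative point made above; real alcohol kinetics are more complicated than a first-order chain.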

It appears too, that the "deficiency allele is of interest because natural selection in the form of conferring resistance to parasite infection may have preserved this allele in Asia" [4], so besides keeping its carriers sober, it kept them healthy too. This would also be something that natural selection processes could act upon, reinforcing the presence of this allele.

This ALDH2*2 mutation is found mainly among people of East Asian descent (between 26 and 46% of Chinese, Japanese and Koreans carry it). It falls to 6-13% in the surrounding areas and to less than 3% in India, Europe and Papua New Guinea; it is absent among Native Americans. So we can guess that the latter populations will tend to be more prone to alcoholism than the former.

Map showing the global distribution of ALDH2* alleles. The red segments indicate the "Asian" mutation. Adapted from [3]

The paper (Luo et al., 2009) [3] contends that the "limited distribution of atypical allele ALDH2*487Lys indicated a recent expansion event", and its authors believe that it originated recently, 2 to 3 kya, in the Pai-Yuei tribe in Southern China, maybe because they cultivated rice. That may be the case. They do not present any proof regarding their dating, so I will take it as an educated guess.

The interesting part is the following:

The weight of the "other" alleles in America also differs from that in Asia:

There is no GCCTA or "Asian allele" (shaded red in the map) in America (or elsewhere). This is exclusively Southern and Eastern Asian.

The ATCTG type (shaded pale violet in the map) is predominant in Central and South America (45 - 90%), where its average is the same as that of Europe (66.8%). It drops off in the rest of the world: South West Asia (47.3%) and PNG (45%), and is lowest in Africa (3%) [5].

It is also found at low frequencies in North America (avg. 33%) and Northern Asia (less than 25%; in Siberia, 13%). It is even lower in East Asia (4.9%) [5]. The fact that it has a global distribution points to an ancient origin, and its presence in Africa at such low frequencies indicates, in my opinion, that it probably originated somewhere outside Africa (the Middle East?), dispersed globally from there, and back-migrated into Africa.

However, according to Oota et al. (2004) [5], despite being the most common haplotype globally, it is recent. They give two reasons:

"The HaeIIIc site [which defines this allele ATCTG] is not polymorphic in five sub-Saharan Africans [...] The results indicate the HaeIIIc site [is] relatively young polymorphism." This is so, despite the fact that "the ages of the other polymorphic sites we examined are as old as modern humans’ expansion" [5].
The inferred evolution of this allele (ancestral --> ACCTG --> ATCTG), or sequential pattern of mutations, also indicates that the HaeIIIc polymorphism is relatively young. Oota et al. argue that it is two mutations away from the ancestral form and therefore young. But so is the GCTCG allele (pale blue on map), yet they believe that it is as old as the modern human expansion.

There is also the ancestral lineage, which in Asia and Africa accounts for 33 - 55% of alleles but is virtually absent in Europe and Southwest Asia (1.4% and 7.2%, respectively), yet it is present in America at frequencies of 5 - 33% (lower in South America, higher in North America). It is, in general, quite common (10.0% to 53.9%) elsewhere. [3][5]

The GCTCG (pale blue in the map) haplotype is "ubiquitously distributed, it would appear to have arisen in Africa and drifted to an appreciable frequency prior to the expansion of modern humans out of Africa" [5]. This too is two mutations away from the ancestral version, and there are two possible pathways, each involving a rare intermediate allele:
ancestral --> GCTTG (found among the Han Chinese) --> GCTCG
ancestral --> GCCCG (found among the Uyghurs) --> GCTCG

Despite being two mutations away from the ancestral line, it is deemed ancient! Meanwhile the "Asian" allele GCCTA (red in map), supposedly only one mutation away from the ancestral allele, is deemed recent!

STRP D12S1344 and some genetic theory

Oota et al. (2004) [5] dug deeper into the geographic variation of these alleles. They included the short tandem repeat polymorphism (STRP) D12S1344 and graphed the distribution of the 5-SNP haplotypes (the ones mentioned above: GCCTG, GCTCG, ACCTG, ATCTG, GCCTA) according to the STRP's alleles.

Let me explain this first (it took me some time to understand what they did and its implications); below is some theory.

STRP stands for Short Tandem Repeat Polymorphism.

The DNA sequence of different people varies in some parts of our chromosomes. These variations are known as "Polymorphisms". There are some special kinds of polymorphisms known as "Short Tandem Repeats", which are interesting for the study of genetics, so the study of these STRPs is important.

Four nucleotides: guanine (G), adenine (A), thymine (T), and cytosine (C) make up the nucleic acid of DNA.

STRPs are relatively short sequences of DNA (hence the "Short" part of the name), comprising between 2 and 5 base pairs that are repeated (hence the "Repeat" part of the name) one after the other in "Tandem". The result is a sequence of the repeated unit. In the case of STRP D12S1344, the repeat unit is two base pairs long (therefore, a dinucleotide): "CA" (cytosine and adenine), which is repeated n times. For instance, "CACACACACA" is a 10 bp sequence made of 5 tandem repeats of the dinucleotide "CA".

Since different individuals carry different repeat numbers, these are "polymorphisms" that differentiate one person from the next. For instance, one individual may have 5 repeats, another 6, and yet another 8. These repeat numbers arose from mutations in their ancestors.
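To make the idea of "counting repeats" concrete, here is a minimal sketch that finds the longest uninterrupted run of a repeat unit in a DNA string. The function name `longest_tandem_run` is my own illustrative choice, and this is not how a genotyping lab actually measures STRPs; it only mirrors the definition given above.

```python
import re


def longest_tandem_run(sequence: str, unit: str = "CA") -> int:
    """Return how many back-to-back copies of `unit` form the longest run in `sequence`."""
    pattern = re.compile(f"(?:{re.escape(unit)})+")
    runs = [len(match.group(0)) // len(unit) for match in pattern.finditer(sequence.upper())]
    return max(runs, default=0)


# The 10 bp example from the text: five tandem repeats of "CA".
print(longest_tandem_run("CACACACACA"))
```

Three hypothetical individuals carrying 5, 6 and 8 repeats would simply give three different counts when passed through the same function.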

STRs are usually considered “junk DNA” because most of them lie in non-coding regions (such as introns) and do not code for protein, so changes in them generally do not affect the people carrying them. They are usually assumed to be neutral to the forces of natural selection.

In the case of our STRP D12S1344, it was chosen because it is downstream of the ALDH2 gene locus. This STRP was "typed" (that is, its repeat length was determined) and its repeats were found to vary between 11 and 25 among all humans. Each different repeat value is a different allele of this polymorphism. These were given names: for n=11 repeats the allele was named "allele 222", n=12 was named "allele 224" and so on, going up by two for each extra repeat, until n=25 which was named "allele 250". [7]

By "repeat" we mean exactly that; so in allele 222, the "CA" dinucleotide is repeated eleven times. Below we see the eleven repeats flanked by the remaining sequence common to all humans:

... TCGTTTTCTGGGATACACACACACACACACACACACA TTCTGTCCTTCTTTT...
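The naming scheme and the example sequence can be sketched in a few lines. Assumptions to keep in mind: the linear rule (two units of name per extra repeat) is inferred from the three values quoted above (11 = 222, 12 = 224, 25 = 250), the flanking strings are copied from the fragment shown, and all function names are hypothetical helpers of mine, not ALFRED's.

```python
# Flanking sequence as read from the fragment shown in the text (assumption).
LEFT_FLANK = "TCGTTTTCTGGGATA"
RIGHT_FLANK = "TTCTGTCCTTCTTTT"


def allele_name(repeats: int) -> int:
    """Map a CA repeat count (11..25) to its allele name (222..250)."""
    if not 11 <= repeats <= 25:
        raise ValueError("repeat count outside the 11..25 range quoted in the text")
    return 222 + 2 * (repeats - 11)


def repeats_from_name(name: int) -> int:
    """Inverse mapping: allele name (even number, 222..250) back to the repeat count."""
    if not 222 <= name <= 250 or name % 2 != 0:
        raise ValueError("allele name outside the 222..250 range quoted in the text")
    return 11 + (name - 222) // 2


def build_allele(repeats: int) -> str:
    """Assemble flank + CA repeats + flank for a given repeat count."""
    return LEFT_FLANK + "CA" * repeats + RIGHT_FLANK


for n in (11, 12, 25):
    print(n, "repeats -> allele", allele_name(n))
```

With this rule, the alleles that come up later in the post translate back to repeat counts: for example, `repeats_from_name(240)` gives 20 and `repeats_from_name(226)` gives 13.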

What causes the appearance of different polymorphisms in a given population? (Why does Joe have n=14, Jane have n=22 and Wang have n=15?).

They arise due to chance mutations that happen at very low rates: between 1 in 100 and 1 in 1,000,000 per generation. [6]

They are mostly formed due to "replication slippage", whereby the DNA strands are mismatched during DNA replication and a repeat is added to (or removed from) the resulting duplicate.

Since slippage is a symmetrical process, repeats are added and also removed. The outcome is that new alleles with a higher number of repeats are added and others are lost by removal.

Nevertheless, they are, in general, conserved over long evolutionary time spans and it seems that long repeats tend to shorten while short ones lengthen [6] even though insertions might tend to be self-accelerating and grow as the probability of a future mispairing increases.
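A minimal sketch of symmetric slippage along a single lineage follows. It assumes the simplest possible model (one repeat gained or lost with equal probability whenever a mutation occurs) and ignores the length-dependent effects mentioned above; the mutation rate, starting length and seed are illustrative values, with the rate picked from within the range quoted from [6].

```python
import random


def simulate_slippage(start_repeats, generations, mutation_rate, seed=0, min_repeats=1):
    """Follow one lineage; each generation may gain or lose a single repeat.

    Returns the repeat count at every generation (including the start).
    """
    rng = random.Random(seed)  # seeded so the run is reproducible
    repeats = start_repeats
    history = [repeats]
    for _ in range(generations):
        if rng.random() < mutation_rate:
            step = rng.choice((-1, 1))  # symmetric: add or remove one repeat
            repeats = max(min_repeats, repeats + step)
        history.append(repeats)
    return history


# Illustrative run: 20 repeats (allele 240), 10,000 generations, 1 mutation per 1,000 generations.
history = simulate_slippage(start_repeats=20, generations=10_000, mutation_rate=1e-3, seed=42)
print("start:", history[0], "end:", history[-1], "range:", min(history), "to", max(history))
```

Running many such lineages from the same starting length would spread the repeat counts into a histogram around the original value, which is the kind of distribution discussed further down.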

Since natural selection operates on other parts of our DNA (those that code proteins), but not on the "junk DNA" of STRPs, the origin and evolution of different repeat frequencies is due to chance (which adds or removes repeats), to selection operating on the coding part of the DNA, and finally to admixture with other populations having a different repeat mix in their genes.

We can imagine a group of people sharing a certain coding DNA due to their common ancestry and also the same repeat in an STRP. Should selection act for or against the trait coded by that DNA, then the STRP allele would be favored or disadvantaged along with it, because it shares the fate of the DNA it is piggy-backing on.

So there are three factors that interplay in the repeat alleles: chance, selection and admixture.

STRP D12S1344 continued

When comparing the STRP among different human groups, Oota et al. (2004) [5] found the following distribution of alleles (remember, the x axis indicates the repeat allele: 222 is n=11 and 250 is n=25). The color depicts the relative frequencies of the 5-SNP haplotypes (the ones mentioned above: GCCTG, GCTCG, ACCTG, ATCTG, GCCTA) for each repeat allele. The number beside each region is the number of populations sampled (yes, South America is always under-sampled; only three populations were considered: Karitiana, Surui and Ticuna [7]):

ALDH-2 5-SNP haplotypes of each repeat Allele at STRP D12S1344. Adapted from Fig. 4 Oota et al., [5]

So, what does it mean?

Let's take a look at it with a regional perspective:

Africa, as expected, has a predominance of the ancestral alleles (in black). The frequency histogram has a bimodal distribution with two clusters, one between 226 and 230, the other from 234 to 250, with the highest values at 238 and 240. In total, 12 alleles.
There is a touch of GCTCG (pale blue) and a similar amount of ACCTG (green), both widely spread.
No "Asian" GCCTA (red), a clear indication of its East Asian origin.
Very little ATCTG (violet), at allele 236.
Following the imagined tracks of an Out of Africa migration, we can see the shared traits of both South Western Asia (11 alleles) and Europe (10 alleles), which are the closest to Africa.
We see, with surprise, how the ancestral (black) lineage is virtually gone.
ATCTG (violet) is prevalent, in contrast with Africa, while the content of GCTCG (pale blue) is similar. ACCTG (green) is also similar to Africa.
The prevalence of repeat 240 has decreased and repeat 236 has grown to a similar frequency as 240, while the second "peak" at 226-230 is still there.
East Asia, with its peculiar mutation, maintains a very high proportion of the ancestral lineage, though it has lost some longer repeats (246 to 250). It kept the second peak at 226-230.
It also maintains a predominance of the 240 and 238 repeats, like Africa, and it is there that the GCCTA Asian mutation (in red) appears.
Repeat 240 reaches a frequency above 50%, well above the roughly 30% found for this repeat in other regions. Nine alleles are found in this region.
Siberia. Not graphed by Oota. The Siberian Yakut [7] are quite unlike the North and South American natives: they have a maximum of 42.2% at repeat 226, the highest in the world for this allele. Are these the ancestors of the Amerindians? It seems unlikely.
North America. The histogram shifts to the right: the 242 and 244 repeats grow considerably (15% frequency at 244). The second "peak" at 226-230 still appears.
The ATCTG (violet) allele is found at much higher frequencies than in East Asia or Africa, but lower than in S.W. Asia and Europe. The ancestral allele is also found at high frequencies, higher than in S.W. Asia and Europe, but lower than in East Asia or Africa. There are 9 alleles in this region.
South America. The "second peak" (226-230 repeats) has disappeared. There is also an increase in the frequency of the higher repeats, even more than in North America: there is a peak of 25% at repeat 244, the highest globally (mostly ATCTG, violet), and another of 20% at 238, also a global high.
The ancestral (black) allele is lower than in N. America, East Asia and Africa. Only 7 alleles are present in this region.
South America has the "two pronged" high frequencies on the right side of the histogram, just like Europe and S.W. Asia. All other regions have only one maximum on the right side.

Neanderthal admixture?

Since non-Africans admixed with Neanderthals, you might expect that the characteristic alleles of Neanderthals would appear in humans. I have not found any paper regarding this to be able to quantify it.

Nevertheless, the signal should be there. The sharp difference between European and S.W. Asian alleles on one side and the Asian, African and North American ones on the other is noticeable. The former have a two-pronged maximum in the 236 to 240 range, while the latter have a single maximum in that area.

To me this spells "Admixture": mixing of two populations makes certain frequencies grow: those in which both populations overlap will grow, where no overlap exists, the frequency will decline. The exact amounts depend on the histograms of each population and the admixture ratio. I will do some simple simulations in part 2 of this post.
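The basic arithmetic of admixture can be sketched as follows. This is a minimal illustration, not the simulation promised for part 2: it assumes the admixed population's allele frequencies are simply the weighted average of the two source populations, and the two histograms are invented numbers for illustration, not the frequencies reported by Oota et al. or ALFRED.

```python
def admix(pop_a: dict, pop_b: dict, fraction_a: float) -> dict:
    """Mix two allele-frequency histograms; fraction_a is population A's share."""
    if not 0.0 <= fraction_a <= 1.0:
        raise ValueError("fraction_a must be between 0 and 1")
    alleles = sorted(set(pop_a) | set(pop_b))
    return {
        allele: fraction_a * pop_a.get(allele, 0.0)
        + (1.0 - fraction_a) * pop_b.get(allele, 0.0)
        for allele in alleles
    }


# Hypothetical histograms keyed by repeat allele name (made-up frequencies).
pop_a = {236: 0.5, 240: 0.5}
pop_b = {240: 0.6, 244: 0.4}

mixed = admix(pop_a, pop_b, fraction_a=0.5)
for allele, freq in mixed.items():
    print(allele, round(freq, 3))
```

In this example allele 240, present in both sources, stays the most common one in the mix, while 236 and 244, each present in only one source, are diluted in proportion to the admixture ratio.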

The double-pronged maximum is basically made up of ATCTG, the violet allele. Could this be a Neanderthal allele?

The Native Americans in South America have lost part of their alleles (the second peak centered on 226): perhaps due to a bottleneck in the population as it moved into South America, or maybe even later, 500 years BP, during the conquest, when millions of natives died due to disease brought by the European conquerors.

The alleles that did survive were those shared with Europeans (ATCTG, violet), which may even be of Neanderthal origin, perhaps because they were linked to coding areas common to Europeans and therefore granted protection against the diseases the Europeans brought to America.

At repeat 240, compared to North America, the ancestral allele decreased while the ATCTG (violet) one grew.

ACCTG (green) also survived, at lower frequencies than in North America and, again, at those repeats common to Europeans: 236 to 242.

I wonder what the DNA sequences of pre-Hispanic natives would show. Was the repeat diversity richer? Did they (and in this I include North American Natives) have a wider dispersion? Other haplotypes now extinct?

Continued in Part 2...

Sources

[1] Hui Li, et al., (2007). Geographically Separate Increases in the Frequency of the Derived ADH1B*47His Allele in Eastern and Western Asia. Am. J. Hum. Genet. 2007;81:842–846. DOI: 10.1086/521201
[2] Carrigan, M. A., et al., (2012). The Natural History of Class I Primate Alcohol Dehydrogenases Includes Gene Duplication, Gene Loss, and Gene Conversion. PLoS ONE, 2012, Vol. 7, No. 7.
[3] Huai-Rong Luo, et al., (2009). Origin and dispersal of atypical aldehyde dehydrogenase ALDH2*487Lys. Gene 435 (2009) 96–103
[4] Raymond J. Peterson, David Goldman and Jeffrey C. Long, (1999). Effects of Worldwide Population Subdivision on ALDH2 Linkage Disequilibrium. doi:10.1101/gr.9.9.844 Genome Res. 1999. 9: 844-852
[5] Oota,H., Pakstis, A.J., Bonne-Tamir, B.,Goldman,D., Grigorenko, E., Kajuna, S.L., Karoma,N.J., Kungulilo, S., Lu, R.B.,Odunsi, K.,Okonofua, F., Zhukova,O.V., Kidd, J.R., Kidd, K.K., (2004). The evolution and population genetics of the ALDH2 locus: random genetic drift, selection, and low levels of recombination. Ann. Hum. Genet. 68, 93–109. doi: 10.1046/j.1529-8817.2003.00060.x
[6] Christian Schlötterer, (2000). Evolutionary dynamics of microsatellite DNA. Chromosoma, September 2000, Volume 109, Issue 6, pp 365-371
[7] The allele frequency database ALFRED

Monday, April 28, 2014