Y chromosome haplogroup C Part 1

C hg. in America - seeking a link with Homo erectus

The predominant Y chromosome haplogroup among Amerindians is haplogroup Q (92.9% frequency). But it is not the only one found among Native Americans: another, haplogroup C, is found at a much lower frequency, 7.1%, among indigenous American men. [6]

Haplogroup C was initially detected in America in six populations: the North American Tanana, Navajo, Apache, Cheyenne and Sioux, and the South American Wayuu of Colombia (n = 2) [6]. Interestingly, the haplotypes of these groups reflect their peculiarly "patchy" geographical distribution and fall into two very distinct clades (patchiness appears to be characteristic of hg. C; its distribution in Indonesia, New Guinea and Melanesia is also patchy):

  • C3b is the sub-clade defined by SNP P-39. It is found only in North America, which means that it did not arrive "early" (otherwise it would be uniformly distributed across the New World, like hg. Q); this is supported by the fact that it is found among the "latecomer" Na-Dene speakers. Nevertheless, these North American haplotypes differ among the Cheyenne, Apache and Navajo, suggesting that they had plenty of time to evolve. They have been dated to a mean age of 13,900 years [6].
  • C3* (the paragroup C-M217, which lacks P-39) is the other sub-clade. It appears in Northwestern South America and in one individual of Tlingit origin in Southwestern Alaska (presumed to be of native ancestry).

C3* is not found elsewhere (i.e. in the rest of North America, Central America or southern South America) and has many mutational differences separating it from the C3b haplotype, "reflecting its marked divergence". [6]

The Odd South American distribution of C3*

C3* has only been detected in three populations in South America, [2] and at low frequencies: Gepper (2011) [3] detected only 6.2% (4 out of n=65):

  • Wayuu, Colombia (n = 2) [6].
  • Waorani tribe, Ecuador (n = 3): identical haplotypes, but the men were from different families.
  • Kichwa speakers from Pastaza province, close to the Waorani villages (n = 11).

Close but distant. Despite their proximity to their neighbours, the Waorani were a ferocious people and did not mix with them until the late 1950s. The haplotypes of the Kichwa and Waorani differ by many mutations, which shows that both groups "survived for a long time in isolation from each other." [2]

The patchy distribution of the C3* paragroup among isolated tribal groups and the differences that exist within these groups is suggestive. Below is a table showing the C3* among the populations mentioned above, from [2]:

Wayuu, Tlingit and Waorani differ. Kichwa and Waorani are similar except for the last three individuals (whose divergence in repeat numbers I shaded darker).
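
To make "differ by many mutations" a little more concrete, here is a minimal sketch of how the repeat numbers of two Y-STR haplotypes can be compared locus by locus. It assumes a simple stepwise model in which each one-repeat change counts as one step. The function name is my own, and the repeat counts are invented for illustration; they are not the values from the table in [2].

```python
def str_distance(hap_a, hap_b):
    """Sum of absolute repeat-count differences over the loci both haplotypes share."""
    shared_loci = hap_a.keys() & hap_b.keys()
    return sum(abs(hap_a[locus] - hap_b[locus]) for locus in shared_loci)

# Invented repeat counts at four Y-STR loci, for illustration only.
hap_1 = {"DYS19": 15, "DYS390": 24, "DYS391": 10, "DYS393": 13}
hap_2 = {"DYS19": 16, "DYS390": 22, "DYS391": 10, "DYS393": 14}

# Differences per locus: 1 + 2 + 0 + 1
print(str_distance(hap_1, hap_2))
```

Two identical haplotypes score zero, and the score grows as the repeat numbers drift apart, which is the kind of divergence the shading in the table highlights.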

The Asian link

C3* is found at relatively high frequencies all across East Asia: Koryaks of Kamchatka (38%), Mongolia (36-38%) and it drops to 10% in Korea and 3% in Japan, yet is high among the aboriginal Ainu of Hokkaido (15%). [2]

But the Asian and Amerindian C3* are not identical: a median-joining network analysis of the Y-STR haplotypes showed that the South American C3* carriers "belonged to separate and rather distant clusters at the periphery of the network [which included Asians], suggesting that the time of the last contact between these two groups predated the time of the initial colonization of the Americas". In other words: an ancient common ancestor back in Asia. [2] (Also see Fig. 4 in [6]).

Furthermore, the Tlingit C3* from Alaska belonged to haplotype H166. The Ecuadorian Kichwa had H7 as the most frequent haplotype, followed by H162, H163 and H22, but these had a "substantial distance to common Asian types". The Colombian Wayuu belonged to yet another haplotype, H165, "only distantly related to the Ecuadorians" [2].

Comment. C3* is a paragroup, which means that it clumps together all C3 haplotypes that do not belong to known haplotypes for which specific markers have been identified. Since we cannot tell them apart because the chips that analyze haplotypes do not recognize them as distinct, they are all clumped together as C3*.
In other words an individual belonging to C3* in Siberia may one day be assigned to a currently unknown C3x haplotype while the Amerindians may belong to completely different yet still unknown haplotypes C3y and C3z.
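
The paragroup idea can be sketched in a few lines of Python. This is only an illustration under simplifying assumptions: M217 defines C3 and P39 defines C3b, as described above, while the function name and the one-entry marker table are hypothetical and not a real genotyping API.

```python
# Hypothetical, simplified marker table: only the sub-clade named in this post.
KNOWN_C3_SUBCLADES = {"P39": "C3b"}

def classify_c3(derived_markers):
    """Return a C3 label for a set of derived (mutated) SNP markers.

    A sample that carries M217 but none of the known downstream
    markers falls into the paragroup C3*.
    """
    if "M217" not in derived_markers:
        return "not C3"
    for marker, subclade in KNOWN_C3_SUBCLADES.items():
        if marker in derived_markers:
            return subclade
    return "C3*"

print(classify_c3({"M217"}))         # C3*
print(classify_c3({"M217", "P39"}))  # C3b
```

If a new marker were added to the table, some samples now labelled C3* would be reassigned to the new sub-clade, which is exactly the point made above about the hypothetical C3x, C3y and C3z.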

Nevertheless, Roewer et al. (2013) [2] propose that the C3* found in South America is of a recent Asian origin. They believe that its limited range within America is due to a late migratory event that crossed the Pacific in watercraft and reached the Ecuadorian coast (a boat full of shipwrecked Japanese in Ecuador who marry into the Waorani and Kichwa...).

But first, let's look at the phylogenetic tree from Roewer et al. [2], which is very interesting:

C3* phylogenetic tree
Phylogenetic tree for hg. C3*. From [2]

Some interesting points: This is a paragroup (C3*), so it encompasses everything that does not fit into the known haplotypes. In other words, it may include as yet undiscovered haplotypes. Having said this, some relationships are clear:

  • The Japanese - Korean cluster (at the bottom of the tree, in shades of blue) is quite diverse and definitely separate from the American cluster (red and pink dots). Within the Japanese - Korean cluster we find Chinese (orange), Tibetans (brown), Indonesians (black) and Mongolians (yellow), but no Siberians or Northern Asians. This may indicate a relationship between Southern and Eastern Asian clades concealed within the C3* paragroup.
  • The Mongolians appear in two distinct groups: one at the bottom, under the Japanese - Korean cluster, and the other rising from the central part on the right side. The latter is mixed with Altaian, Siberian, Chinese (Anatolians - white dot) and includes the Colombian natives.
  • The Amerindian (and Alaskan) line appears on the left, in several branches born in the central cluster; it appears related to the Siberians.

It seems to me that there are several different haplotypes hidden in the C3* paragroup that have yet to be identified, but the tree clearly indicates no relationship between the Japanese or Koreans and the Amerindians, contrary to what Roewer's team suggests below.

An improbable Transpacific origin

Let's follow their arguments:

As usual, the methods employed are based on the expected orthodox assumptions: a 15 kya migratory event into America and a 12 kya entry into South America. [2]

Limited Distribution. "If haplogroups Q and C3* both entered the American continent from Asia at the same time 15,000 YBP, then C3* would have been expected to be more widespread than has been reported so far." [2]. My counter argument is: it reached America in a very ancient peopling wave, long before the major 15 kya event, and was overlaid by hg. Q, which proved more successful and replaced it. Another option: as it is so ancient, it is found at extremely low frequencies, and that is why it has been missed by the samplings performed so far in other parts of America, a clear bias (see my post on Biases in genetic models).
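
The sampling argument can be put into numbers with a back-of-the-envelope calculation. Assuming that each sampled man independently carries the lineage with probability f, the chance that a sample of n men contains no carrier at all is (1 - f)^n. The 1% frequency and the sample sizes below are hypothetical values chosen for illustration, not figures from the cited papers.

```python
def prob_missed(freq, sample_size):
    """Probability that none of `sample_size` independent draws carries
    a lineage whose true population frequency is `freq`."""
    return (1.0 - freq) ** sample_size

# Hypothetical lineage present in 1% of men, sampled at three sizes.
for n in (20, 50, 100):
    print(n, round(prob_missed(0.01, n), 3))
```

Under these assumptions, a lineage present in 1% of men would go undetected in roughly 37% of surveys that sample 100 men, so a rare haplogroup can easily be absent from published data sets.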

However, disagreeing with Roewer's team, Zegura et al. [6] do not support the notion of two separate founding events. Instead they attribute the lower frequency of hg. C to it being a "minor component of the Y chromosomes in the single founding population", and argue that its frequency and patchy distribution are due to "successive episodes of intragenerational and intergenerational genetic drift" [6]. In other words, they were few to start with and became fewer as time passed (more on this in my post on genetic models).
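
The drift explanation can be illustrated with a toy Wright-Fisher simulation. This is only a sketch under strong assumptions: a constant male population, non-overlapping generations, and no selection or migration. The population size, number of generations, number of runs and random seed are arbitrary choices of mine; only the 7.1% starting frequency comes from the text above.

```python
import random

def drift(start_freq, pop_size, generations, rng):
    """Track the frequency of a lineage under pure genetic drift.

    Each generation, every one of `pop_size` sons picks a father at
    random, so the number of carriers is a binomial draw.
    """
    freq = start_freq
    history = [freq]
    for _ in range(generations):
        carriers = sum(rng.random() < freq for _ in range(pop_size))
        freq = carriers / pop_size
        history.append(freq)
        if freq in (0.0, 1.0):  # lost or fixed: nothing more can change
            break
    return history

rng = random.Random(1)  # fixed seed so the run is repeatable
runs = [drift(0.071, 100, 300, rng) for _ in range(50)]
lost = sum(run[-1] == 0.0 for run in runs)
print(f"lineage lost in {lost} of {len(runs)} simulated populations")
```

Each simulated population stands for one isolated group. Counting how many of them lose the lineage while others keep it gives a rough feel for how drift alone could produce a patchy distribution from a minor founding component.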

Isolated range. Since the population that peopled America came from Asia and marched in a North to South direction, any C3* present in that founding group would also have to appear in North America and Central America; it would also be expected to have spread across South America, like the other mtDNA and NRY hgs.

Since it is not found in Mesoamerica or the rest of South America, and only one case has been found in North America, yet it is very frequent in East and Northeast Asia, they conclude that it is unlikely that the initial peopling wave of America carried C3* with it. They believe that, if genetic drift were the cause of its loss, it would not be lost in the North yet survive in the South. They also consider it unlikely that the wave carrying C3* marched quickly across North America (leaving no trace there) to settle in the south.

They summarize it as follows: the very low frequencies observed in South America "are hardly compatible with a long period of joint immigration from Asia" [2]. See above for a counter argument: two separate migratory events, where the newer one overlays the older one and replaces it. This is not unknown and has been pointed out in Melanesia, precisely for another C paragroup found in Indonesia and Melanesia, whose ancient roots and lower frequency are attributed to "... later waves of (partial) replacement." [4].

So, as an explanation for C3* being found in Northwestern South America, they propose a transpacific nautical event c. 6,000 ya, but ignore the Tlingit C3* individual in Alaska... Did the Asians cross the ocean to Peru and then go up to Alaska? Or was it the other way round? Why didn't they touch land in other places and leave their imprint there too?

The 6 ky date is based on a "comparatively recent coalescence of the C3* haplotypes from the present study". [2] As expressed in my critical post on genetic models, the whole issue of dating is, in my opinion, pretty feeble, so I would place a big question mark on that date estimate.

They do, however, leave a window open when they note that the Tlingit C3* may "mean that a North American origin of the Ecuadorian C3* haplotypes, albeit less likely prima facie, cannot be ruled out" [2]. This fits in with the scenario that I propose: the Tlingit C3* is an isolated relict of the once Pan-American C3* population, later overlaid by hg. Q males of a more recent migration.

They support the transpacific contacts with similarities between the ceramics of Japan and those of Valdivia in Ecuador, and with "the close proximity of the spotty C3* cluster to the Valdivia site". (I edited the comment that follows based on input from a reader: I had mistaken Valdivia in Ecuador for Valdivia in Chile, and so had pointed out that the 5,200 km, or 3,200 mi, that separate the C3* in Ecuador from Valdivia in Chile are, in my opinion, not a close proximity.) By the way, how did the ancient Japanese mariners get to Colombia and admix with the Wayuu? That is not explained either.

New (02 July 2014), I posted on these Jomon and Valdivian contacts today so that you can make up your own minds.

Finally, to overcome the notable differences between Amerindian and Asian haplotypes, they argue that these were provoked by limited gene flow between populations, and also point out that "The striking differences observed between the Y-STR haplotypes of Ecuadorian and Asian C-M217 (C3*) carriers would be explicable in terms of a long divergence time after the arrival." [in America] [5], which to me seems incompatible with the 6 ky date they mention: it is not enough time to justify such diversity.

Summary: the Waorani and Kichwa in Ecuador and the Wayuu in Colombia carry a C3* paragroup not found anywhere else in America. This is an oddity that requires an explanation other than a one-way expedition across the Pacific in a boat that set off from Japan. Let's see what the Waorani and Wayuu can tell us:

C3* haplogroup map America
Map showing where C3* hg. is found in South America. Copyright © 20143 by Austin Whittall

The Waorani people

The Waorani or Wao people are a small hunter-gatherer and horticultural tribe of about 1,750 individuals who live in the Amazon jungle in Central Ecuador, South America. They are not a coastal group (which means the Japanese sailors would have had to trek inland, across the Andes into the Amazon, to mate with them and get their C3* into the genome of the Wao people).

Their habitat is particular (a Pleistocene forest refugium) and is protected as a national forest. The Waorani are a highly inbred group due to their social customs: marriage with cousins is an accepted practice [6], and social violence is the major cause of adult death (reducing the number of males available for mating).

Their language, Wao Tiriro, seems to be unrelated to other regional languages, which also favors their isolation.

Besides this unique C3* NRY paragroup, they also carry a "Waorani-specific" mtDNA: A2s [6]. By the way, A2 mtDNA hg. has a strong North to South cline, similar to what would be expected for a Beringian entry and dispersal together with the NRY C3* hg. It is therefore likely that both maternal and paternal lineages are local and not recent arrivals from Asia.

Their isolation led them to retain this very rare paragroup which was lost elsewhere.


The Kichwa

Don't mistake them for the Quechua or Quichua of Peru and Bolivia. These people are a native group of the Ecuadorian Amazonian jungles, and over 100,000 survive today. They learned the Quechua language after contact with the Inca empire and used it for trading purposes. The Spaniards dominated them, and the missionaries imposed the Quechua language used in Peru on their new subjects.

They too live on the eastern side of the Andes, in the jungle.


The Wayú or Wayuu people

They are an Arawakan-speaking group. The Arawakans once peopled the Northern area of South America from the Orinoco River to the Caribbean Islands (Taino).

The Arawakans were the first group of Amerindians to meet the Europeans (1492) and died massively as a result. Fortunately, the Wayuu lived in the arid Guajira Peninsula shared by Colombia and Venezuela, on the Caribbean Sea; the harsh territory and the bellicose nature of the natives kept the Spaniards out. It was not until the mid 1800s that they were subdued and incorporated by both countries. This allowed them to survive without becoming extinct like many other native groups did.

They number over 300,000 individuals. The males can form polygamous families.



All three populations have remained relatively isolated: the Ecuadorians due to their fierceness and jungle habitat, the Colombians due to their geographical isolation on an arid peninsula.

This allowed them to survive the massive deaths that followed contact with Europeans after the discovery of America and maintain rare C3* paragroup haplotypes in their genomes, haplotypes which have become extinct elsewhere. (Perhaps the sequencing of Caribbean Taino remains may yield more C3* sequences).

Their location seems to rule out admixture with transpacific navigators: one is beyond the Andes in the Amazonian rainforest, the other is on the Caribbean Sea, and both are over 500 miles from the Pacific Ocean's shores.

It is extremely likely that the current patchy distribution and the extremely low frequencies of the C3* paragroup in South America reflect the remains of a once widespread lineage, later overlaid by more recent migrants from Asia and seriously reduced by the bottleneck provoked by the Conquest of America that began in the Sixteenth Century.

The few samples detected may preclude identifying special markers that would allow the definition of a new haplogroup within C3 and the removal of these samples from the C3* paragroup. But there is always the chance that wider sampling may yield more individuals and so allow a better typing.

In the meantime, we have the phylogenetic tree shown above, which clearly indicates a very ancient origin of C. Notice the deep mixture of people from very distant geographic regions sharing the same haplotype: Indonesian and Mongol at H37, Ecuadorian Kichwa and Siberian Koryak at H7, Korean and Indonesian at H158 (arising from Chinese H64), and Siberian and Indonesian (H1+H4+H5+H769). This is, in my opinion, a clear indicator of antiquity.

Let's remember that Homo erectus survived until quite recently in China and Indonesia, so it is not improbable that C hg. and H. erectus are related, especially since the root of the Y chromosome tree has been proved to be older than H. sapiens at 271 - 581 ky. (Mendez et al., 2013) [7]. Of course this date is probably too recent to mark the H. erectus input into our NRY lineages, but given my doubts regarding the calculations of divergence dates, it may be 4 times that value and therefore fit with the H. erectus OoA event.

As we will see in the next post, haplogroup C also has very deep roots in Asia, and C* also contains haplotypes that have yet to be identified.


  1. "Valdivia" is NOT the city in Chile, but the famous archeological site in Ecuador where the "Valdivia culture" originates (for a popular description see Wikipedia). This site near Guayaquil is very near to the locations where the C3* chromosomes were found.

  2. Dear Lutz, thanks for your comment, I was indeed mistaken and have corrected my error above in the text and in the next post today July 2, where I give some more details on the Valdivian Culture of Ecuador and the possible Jomon contact.
    Thanks for pointing out the mistake.


