On the direction and root of phylogenetic trees

When I see a phylogenetic tree (also known as an evolutionary tree), I always wonder why we believe that its branches, its trunk and the root that anchors it are correct. I ask myself why it is assumed that a mutation took place in one direction and not the other. This seemingly trivial question is fundamental, because branches split off from other branches based on the differences in the DNA as you move along them.

Below is a very simple example of what I mean. Imagine we reach a planet and come across a species whose DNA, once sequenced, reveals the following genes: A, B and C.

We then take a sample of individuals, and sequence their genome. The nine individuals in our sample come from different continents and the "order" of the genes is different in each individual:


We assume that mutations take place at random so a B can spontaneously mutate into C or A, an A into B or C and a C into A or B. So, a group of scientists after looking at the genomes assumes that AAAAA is the oldest group of that species and that the other populations are the result of mutations that modified the original genome. They build tree (1) shown below. The most distant population is the one with the BABAB genome.

The red arrow marks the "founding" population and the green arrow the "newest" group, descended from them.

But another group of scientists, based on some ancient remains and other assumptions, says "No, the original population is not AAAAA, it is the people carrying the BABAB genes" (exactly the opposite of what the first group of scientists proposed and "proved" with Tree 1).

The second group builds Tree 2, where, as we can see, the green arrow shows the original population and the red one shows where they place the AAAAA population. For this second group of scholars, populations AACAB and CACAA are the "most recent" populations. The tree below shows the mutated gene in red:

Two different trees built from the same genome samples. Copyright © 2018 by Austin Whittall
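The symmetry behind the two trees can be checked with a minimal Python sketch. It assumes a single chain of populations running between AAAAA and BABAB; the two intermediate sequences (BAAAA and BABAA) are hypothetical, made up for illustration, since the nine sampled genomes are not listed here. The function names are also illustrative. The point is only that counting differences between sequences gives the same number whichever end is called the root.

```python
def hamming(a, b):
    """Count the positions at which two equal-length sequences differ."""
    if len(a) != len(b):
        raise ValueError("sequences must have the same length")
    return sum(x != y for x, y in zip(a, b))


def steps_from_root(chain, root_first=True):
    """List (parent, child, mutations) along the chain, read from the chosen root."""
    ordered = list(chain) if root_first else list(reversed(chain))
    return [(p, c, hamming(p, c)) for p, c in zip(ordered, ordered[1:])]


# Hypothetical chain: only the two end points come from the example in the text.
chain = ["AAAAA", "BAAAA", "BABAA", "BABAB"]

tree_1 = steps_from_root(chain, root_first=True)   # AAAAA assumed oldest
tree_2 = steps_from_root(chain, root_first=False)  # BABAB assumed oldest

total_1 = sum(n for _, _, n in tree_1)
total_2 = sum(n for _, _, n in tree_2)

print(total_1, total_2)  # 3 3
```

Both readings need three mutations, so the difference count alone cannot say which population came first; the direction has to come from some extra assumption or evidence.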

The scholars could then identify haplogroups, where the A to C or the B to A mutation marks a haplogroup, and theorize on how these haplogroups evolved one from the other... Does this sound familiar? Yes, it is how the mtDNA and the Y chromosome DNA haplogroups were created: by adopting certain mutations as key indicators for branches and defining that each one took place in a certain direction. In our DNA, for instance, an adenine (A) switched for a cytosine (C) may mark a haplogroup. A for C; but we could also, as on our theoretical planet, imagine that the C switched for an A and that the supposed parent genome is actually the child and not the other way round.

Thus the "new" American genomes could actually be the oldest and the African ones the youngest (like switching from tree 1 to 2 above).

This is of course an oversimplification, and we do have the DNA of Neanderthals, Denisovans and Homo sapiens from different sites around the world, and anchors from our ape relatives, the chimps. But often, when I look at the sequences (CGACGGAATACG... and so on; see the image below from Nature, where a standard human sequence, the top row "Reference", is compared to Neandertal sequences in the bottom two rows), I wonder how true and accurate our "reconstructions" are. Which base mutated first, which later?...

Also see this image, which compares Denisovan, Neandertal and some apes and monkeys.

Trees are created by computer programs that use "assumptions" and theoretical considerations built into them by the scientists who programmed them. They supposedly work using statistically sound calculations, which are so complex that I doubt anyone can verify them without the help of computer software... so maybe some bias is built into them, for instance assuming that AAAAA is the "original" genome on our distant planet, or here, assuming that the DNA of an African is "older" than that of an Amerindian...
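To make the role of such assumptions concrete, here is a deliberately simplified sketch of one common way of fixing a direction: comparing the candidates with an "outgroup", a more distant relative (like the chimps mentioned above). Real programs are far more elaborate; this toy version just picks the candidate that differs least from the outgroup. Both outgroup sequences are hypothetical, invented for the distant planet, and `pick_root` is an illustrative name, not a function from any real package.

```python
def hamming(a, b):
    """Count the positions at which two equal-length sequences differ."""
    if len(a) != len(b):
        raise ValueError("sequences must have the same length")
    return sum(x != y for x, y in zip(a, b))


def pick_root(candidates, outgroup):
    """Return the candidate with the fewest differences from the outgroup."""
    return min(candidates, key=lambda seq: hamming(seq, outgroup))


candidates = ["AAAAA", "BABAB"]

# Two hypothetical outgroups: the data are the same, only the anchor changes.
root_a = pick_root(candidates, outgroup="AAAAC")
root_b = pick_root(candidates, outgroup="BABCB")

print(root_a)  # AAAAA
print(root_b)  # BABAB
```

The nine genomes never change in this sketch; the "oldest" population flips only because a different anchor was supplied, which is why the choice of anchor deserves as much scrutiny as the tree itself.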

I don't believe in snake oil, but I do believe that we should look at facts with open eyes (like the two trees that can be built using those nine sequences on our distant planet: same data, different conclusions) and not be biased by prejudice (prejudice = pre-judgement: we use the data to prove what we believe to be true, not to prove the facts...).

