Friday, July 3, 2015

Admixture Analysis Of Global and Ancient Genomes

What Is Admixture Analysis?

A computer program called Admixture applies a mathematical algorithm to genome data, in a process that involves a little art as well as a lot of science, to fit data from large numbers of genomes as well as possible into a model.  This model assumes that each person's genome is a mix, in different proportions, of a preset number of hypothetical ancestral populations, which are determined in a manner that maximizes the quality of the fit using linear algebra.  It is also possible to tweak the program, for example, by designating some real individual as the exemplar of an ancestry component, rather than having the computer derive its clusters entirely without outside input.

I don't know precisely which choices were used to generate the latest and greatest result from Eurogenes, which analyzes a sample of 2059 modern and ancient genomes, chosen to capture as much of the variety of modern human genetic variation as possible, in an Admixture run at K=10 (i.e. requiring the program to fit the individuals in the sample into percentage contributions from ten ancestral populations generated by the computer).  This sample includes almost every available complete ancient genome (which number in the hundreds) and some global databases of genomes that are widely used in the professional literature (such as the 1000 Genomes Project data) to represent the rest of the world.
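To make the model a little more concrete, here is a minimal sketch of the kind of likelihood that an Admixture-style fit tries to maximize.  It is not the Eurogenes run or the program's actual code: the function names and the toy numbers are assumptions made up for illustration, and it only scores a proposed answer rather than searching for the best one.  Each person's expected allele frequency at a SNP is a weighted average of the ancestral populations' frequencies, weighted by that person's ancestry fractions.

```python
import math

def expected_allele_freq(q_i, p_j):
    """Allele frequency implied for one person at one SNP.

    q_i: the person's ancestry fractions, one per ancestral population (sum to 1).
    p_j: frequency of the counted allele at this SNP in each ancestral population.
    """
    return sum(q * p for q, p in zip(q_i, p_j))

def log_likelihood(genotypes, Q, P):
    """Binomial log-likelihood of 0/1/2 allele counts given Q and P.

    genotypes[i][j]: copies of the counted allele (0, 1 or 2), person i, SNP j.
    Q[i][k]: ancestry fraction of person i from ancestral population k.
    P[j][k]: allele frequency at SNP j in ancestral population k
             (assumed strictly between 0 and 1 so the logs are defined).
    """
    total = 0.0
    for i, row in enumerate(genotypes):
        for j, g in enumerate(row):
            f = expected_allele_freq(Q[i], P[j])
            total += (math.log(math.comb(2, g))
                      + g * math.log(f)
                      + (2 - g) * math.log(1.0 - f))
    return total

# Toy data (hypothetical): two people, two SNPs, K=2 ancestral populations.
P = [[0.9, 0.1],
     [0.2, 0.8]]
genotypes = [[2, 0],
             [0, 2]]
Q_unmixed = [[1.0, 0.0], [0.0, 1.0]]   # each person from a single population
Q_half = [[0.5, 0.5], [0.5, 0.5]]      # everyone an even mix

print(log_likelihood(genotypes, Q_unmixed, P))
print(log_likelihood(genotypes, Q_half, P))
```

In this toy case the "unmixed" assignment scores a higher log-likelihood than the "even mix" assignment, which is the sense in which one set of ancestry percentages fits the data better than another.  A real run at K=10 searches over both the ancestry fractions and the ancestral allele frequencies at once.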

An Example Of Admixture Analysis

For example, the first ten samples in his analysis are African-Americans from Denver (because AA for African-American comes first in the alphabetical listing).  For each individual, the analysis reports a percentage of ancestry from each of ten groups, which have been labeled for convenience after the fact to give a sense of where each component is most often found.  These categories are (with abbreviations spelled out):

1. Middle Eastern
2. San Bushman
3. American Indian
4. Northern Siberian
5. East Asian
6. Hindu Kush
7. Sub-Saharan African
8. European Hunter-Gatherer
9. Oceanian
10. East Siberian

For example, individual number 15 from the African-Americans from Denver sample included in his Admixture run is determined to be:

88.6% Sub-Saharan African
8.5% European Hunter-Gatherer
1.6% Middle Eastern
0.5% San Bushman
0.4% American Indian
0.4% Hindu Kush

The proportion of the other four ancestral components is negligible.

In terms an average person would understand, this individual is 89% black, 10.5% white, and 0.4% American Indian.  These proportions reflect the American reality that African-Americans typically have higher proportions of African ancestry, and lower proportions of non-African ancestry, than is typical of people with some African ancestry in Latin America and the Caribbean.
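The arithmetic behind that summary is just a matter of adding up components.  A small sketch follows; the grouping of components into everyday labels is the one implied by the totals quoted above (Sub-Saharan African plus San Bushman as "black"; European Hunter-Gatherer, Middle Eastern and Hindu Kush as "white"), and the function and variable names are made up for illustration.

```python
# Individual 15's reported percentages (the other four components were negligible).
components = {
    "Sub-Saharan African": 88.6,
    "European Hunter-Gatherer": 8.5,
    "Middle Eastern": 1.6,
    "San Bushman": 0.5,
    "American Indian": 0.4,
    "Hindu Kush": 0.4,
}

# Assumed mapping from Admixture components to colloquial categories.
grouping = {
    "Sub-Saharan African": "black",
    "San Bushman": "black",
    "European Hunter-Gatherer": "white",
    "Middle Eastern": "white",
    "Hindu Kush": "white",
    "American Indian": "American Indian",
}

def collapse(components, grouping):
    """Sum component percentages within each colloquial category."""
    totals = {}
    for name, pct in components.items():
        label = grouping[name]
        totals[label] = totals.get(label, 0.0) + pct
    # Round to one decimal place to hide floating-point noise.
    return {label: round(total, 1) for label, total in totals.items()}

summary = collapse(components, grouping)
print(summary)
```

The "89% black" figure in the text is the 89.1% total rounded to a whole number.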

This individual's African ancestry is overwhelmingly from populations more like typical Niger-Congo language speaking West Africans and much less like "Paleo-African" populations like the Khoi-San bushmen of the Kalahari desert and the Pygmies of the Congo jungle.  This reflects the typical sources of individuals in the American slave trade.

The mix of "European hunter-gatherer" ancestry, "Middle Eastern" ancestry, and "Hindu Kush" ancestry in the "white" component of his ancestry is roughly in line with what you would expect in someone of Scottish origins (which is typical of Southern whites in the U.S., many of whom were Scotch-Irish).

Small but measurable amounts of Native American ancestry are common in African Americans.

All of this is exactly what one would expect, based on other data, in a typical African-American from Denver.  One of the African-Americans from Denver in the sample is an exception, however: this individual is almost half-white and not quite half-black, and is probably light skinned relative to a typical African-American in Denver.

Similar breakdowns are available for all 2059 people, modern and ancient alike, in the database, although it takes a certain amount of familiarity with how the individuals are identified to know which are modern, which are ancient, and what modern ethnic groups or archaeological cultures are represented by the label given to an individual in the spreadsheet.

Insights: Genetic Variation Is Highly Structured And Far From Maximal

The fact that a reasonably accurate description of someone's ancestry, relative to seven billion or so living people and untold numbers of deceased individuals who preceded us, can be summed up with a fair degree of specificity with percentages of ten ancestral components, is itself remarkable.

The reality of human genetic variation observed in the real world is dramatically narrower than the default assumption that each SNP is random relative to the entire human population, in which each individual would be their own "special snowflake".  Each individual is unique, but the differences within ethnic communities often colloquially described in terms of race, linguistic affiliation and ancestral religious identification, are often quite subtle.

Indeed, vast areas of the human genome are totally ignored by people interested in genealogy, forensic applications, or ancient DNA research, because all modern humans are identical in that part of the genome.  A significant component of the part of the genome that has reached fixation in modern humans has also reached fixation in archaic hominins (like Neanderthals and Denisovans, for which we have ancient DNA to compare to), in primates, in mammals, and in vertebrates.  All multi-celled animals, no matter how primitive, share more than 40% of their DNA at locations that are so functionally important that they have reached fixation.  The more that parts of a person's genome are ancestry informative and variable within modern humans, the more likely it is that those parts of a person's genome are not important to evolutionary fitness.

Every Ethnicity Has At Least One Distinctive Genetic Profile

Still, a person's genome is ancestry informative and can often pin down a person's likely self-identified ethnicity, race, ancestral religious affiliation and familial place of origin with great specificity.  In Europe, for example, it can pin down the likely place of origin of someone whose ancestors are all from the same region to a location within a hundred miles or so.

For example, when I compared the mix of "white" ancestral components of the African-American individual from Denver described above, it was possible to rule out a white ancestor from the Near East, Southern Europe or Iceland (because the Middle Eastern component was proportionately too small compared to the other ancestral components), or from Russia (because Russians generally have a significant Northern Siberian component).
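One way to make that kind of comparison explicit is to rescale the "white" components so that they add up to 100% on their own, which shows their proportions relative to each other.  This is only a sketch of the idea, not necessarily how the comparison above was carried out, and the function name is hypothetical.

```python
# The three "white" components reported for individual 15, as percentages
# of the whole genome.
white_components = {
    "European Hunter-Gatherer": 8.5,
    "Middle Eastern": 1.6,
    "Hindu Kush": 0.4,
}

def renormalize(components):
    """Rescale a subset of components so that they sum to 100%."""
    total = sum(components.values())
    return {name: 100.0 * pct / total for name, pct in components.items()}

shares = renormalize(white_components)
for name, pct in shares.items():
    print(f"{name}: {pct:.1f}% of the non-African, non-American Indian ancestry")
```

Within this subset the Middle Eastern share works out to roughly 15%, which is the kind of figure that can then be set against the typical Middle Eastern share in a candidate source population.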

Similarly, a recent African immigrant from Somalia would have about ten times or more as much of the San Bushman ancestry component as someone descended from slaves from the American Southeast, as is typically the case with African-Americans, even though a recent Somali immigrant would not be unheard of in Denver's African-American population.

There are some cases where a population that culturally is just one ethnicity, such as "African-Americans" in the United States, can actually have several distinct genetic profiles (e.g. Ethiopian-Americans and other African-Americans in Denver might both be classified socially as African-American, but would have different genetic profiles).

This reflects the fact that the history of human migration and diversification through mutations, isolation into separate reproductive populations, adaptation to new environments, and admixture, has been highly structured and has involved a finite number of populations, that these populations had enough time to homogenize while isolated from other populations, and that there have been a modest and finite number of significant admixture events in modern human history.

Few People Are Pure Types

Another descriptive observation of the data set is that at the K=10 level, few individuals are "pure types" with 99%+ ancestry from a single ancestry category.

There is no one in the sample with more than 83.1% Middle Eastern ancestry. There is no one with more than 86.6% Hindu Kush ancestry (a component that would be more accurately described as Kalash).

There are 2 of 2059 individuals with more than 99% European hunter-gatherer ancestry from many thousands of years ago (both of whom are ancient DNA samples with sequences released within the last couple of years).

There are 8 of 2059 individuals with more than 99% San Bushman ancestry.  There are 35 of 2059 individuals with more than 99% American Indian ancestry.  There are 6 of 2059 individuals with more than 99% Northern Siberian ancestry.  There are 19 of 2059 individuals with more than 99% East Asian ancestry.  There are 79 of 2059 individuals with more than 99% Sub-Saharan African ancestry.  There are 14 of 2059 individuals with more than 99% Oceanian ancestry (a component that would be more accurately described as Papuan).  There are 2 of 2059 individuals with more than 99% East Siberian ancestry.

Thus, in a sample of 2059 modern and ancient individuals, only 163 are "pure type" individuals (less than 8% of the sample), while the rest have at least two measurable ancestral components in their genomes.  Two of the ten ancestral populations have no "pure type" representative (Middle Eastern and Hindu Kush), and a third has no modern "pure type" representatives.
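The tally can be checked with a few lines of code.  Taken at face value, the per-component counts quoted above add up to 165 rather than 163.  One possible reading, offered here only as an assumption, is that the 163 figure leaves out the two ancient European hunter-gatherer samples and so counts modern "pure types" only.  The sketch below computes both versions; the variable names are made up for illustration.

```python
SAMPLE_SIZE = 2059

# Number of individuals with more than 99% ancestry from each component,
# as quoted in the text.
pure_type_counts = {
    "Middle Eastern": 0,
    "San Bushman": 8,
    "American Indian": 35,
    "Northern Siberian": 6,
    "East Asian": 19,
    "Hindu Kush": 0,
    "Sub-Saharan African": 79,
    "European Hunter-Gatherer": 2,   # both are ancient DNA samples
    "Oceanian": 14,
    "East Siberian": 2,
}

all_pure = sum(pure_type_counts.values())
# Assumption: only the European Hunter-Gatherer pure types are ancient.
modern_pure = all_pure - pure_type_counts["European Hunter-Gatherer"]

for label, count in (("all pure types", all_pure),
                     ("modern pure types", modern_pure)):
    share = 100.0 * count / SAMPLE_SIZE
    print(f"{label}: {count} of {SAMPLE_SIZE} ({share:.1f}%)")

components_without_pure_types = [
    name for name, count in pure_type_counts.items() if count == 0
]
print(components_without_pure_types)
```

Either way, "pure type" individuals come to about 8% of this deliberately diverse sample.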

Also, it is worth recognizing that representation in the sample is not proportionate to modern population size, and indeed, is deliberately chosen to overrepresent genetically distinct populations.  This is a maximally diverse sample, rather than a representative sample of human genetic diversity.

The populations that include pure types for the San Bushman, Northern Siberian, and East Siberian components (three of the seven ancestral types with modern representatives) are tiny relict populations that subsist in large part on hunting and gathering.

Pure type Papuans are present only on an island between Australia and China that has little contact with the outside world and uses traditional indigenous non-mechanized agriculture.  Only a very small percentage of Native Americans are "pure blooded," and those who are generally live on economically marginal reservations or in remote jungles or mountain villages.  The "pure type" individuals in all of these populations combined in the entire world alive today make up considerably less than 1% of the world's entire population.

Only the Sub-Saharan African and East Asian components have pure type individuals who are present in modern populations that are not tiny and marginalized.

No One Has Measurable Amounts Of All Components

While few individuals are "pure types," no one in the sample appears to have measurable contributions to their genome (defined as more than one part per 100,000) from all ten of the globally determined ancestral components.

I was able to identify only six individuals who had measurable amounts of nine of the ten ancestral components in the sample: Turkish4BA57, SaudiA7, HGDP00148 (Makrani, a South Asian Muslim ethnicity), Jordan646, usb25 (an Uzbek) and Yemenese1529.  Given that all of these individuals are from predominantly Muslim areas, it is plausible to infer that global, religiously mandated pilgrimages to Mecca have led to trace admixture in many Muslim populations from all over the Muslim world, and indirectly from almost everywhere around the globe.

Everyone in the sample of 2059 individuals (modern and ancient), except the 163 pure types and the six individuals with nine components, had two to eight ancestral components, and the lion's share have fewer than eight.

With a cutoff that excludes negligible contribution from an ancestral component (say, e.g., less than 0.1%), there would be no individuals with nine ancestral components, and the average number of ancestral components per person would be much smaller.
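The effect of the cutoff is easy to see in a small sketch.  The profile below is hypothetical (it is not one of the six individuals named above, and the numbers are invented purely to illustrate the point); the only figures taken from the text are the two cutoffs, one part per 100,000 (0.001%) and 0.1%.

```python
# Hypothetical ancestry profile, in percent; the values sum to 100.
profile = {
    "Middle Eastern": 62.0,
    "Hindu Kush": 20.0,
    "European Hunter-Gatherer": 12.0,
    "Sub-Saharan African": 5.0,
    "East Asian": 0.9,
    "San Bushman": 0.05,
    "American Indian": 0.03,
    "Northern Siberian": 0.015,
    "Oceanian": 0.005,
    "East Siberian": 0.0,
}

def count_components(profile, cutoff_pct):
    """Count components whose share exceeds the cutoff (in percent)."""
    return sum(1 for pct in profile.values() if pct > cutoff_pct)

MEASURABLE = 0.001      # one part per 100,000, expressed as a percentage
NON_NEGLIGIBLE = 0.1    # the stricter example cutoff

print(count_components(profile, MEASURABLE))      # nine components
print(count_components(profile, NON_NEGLIGIBLE))  # five components
```

The same person counts as having nine components under the loose definition and only five under the stricter one, so the number of components attributed to an individual depends heavily on where the line is drawn.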

In any given region or ethnicity, individuals typically have slightly varying percentages of just a few components.

For example, of the seven Finnish people in the sample, all have generally similar percentages of four of the ten ancestral components: Middle Eastern (9.1%-14.8%), Northern Siberian (4.6%-9.7%), Hindu Kush (4.8%-13.0%), and European Hunter-Gatherer (66.6%-76.2%).  Six of the seven had small amounts of East Siberian ancestry (0.7%-2.5%), and four also had trace amounts of American Indian ancestry (0.2%-1.2%), including the one with no East Siberian ancestry.  None of the Finnish individuals had any San Bushman, East Asian, Sub-Saharan African, or Oceanian ancestry.
