Scientists have created the first-ever worldwide family tree

They used DNA from 3,600 people to figure out how we're all related.
Grant Currin
The map visualizes ancestral lineages that stretch across the world. Wohns et al., iStock/adam kaz

Researchers have solved a tremendously complex puzzle that explains a massive chunk of the human story.

They describe their findings in a paper published on Thursday in the journal Science.

"It's a single genealogy that traces the ancestry of all of humanity," Anthony Wilder Wohns, a postdoctoral researcher at the Broad Institute of MIT and Harvard and first author, tells IE.

His team used cutting-edge genomics methods and examined thousands of human genomes to explain a tremendous amount of the genetic variation across our extensive human family.

"It shows how we're all related to one another and who our common ancestors are back in time," he says.

Anthropologist Agustín Fuentes tells IE it's "a really cool way to visualize these data and to analyze them." Still, he says calling it a genealogy of all humanity is going too far.

"They're using [the term] genealogy when they're talking about particular patterns of genetic ancestry," he says. For anthropologists, those aren't the same thing.

Genomics shows how closely all life is related

This remarkable research would not have been possible without an explosion of progress in genomics and related disciplines over the last couple of decades, and especially the last ten years. Scientists had never seen a (more or less) complete human genome until 2003 when, after 13 years and $3 billion, an international team completed the painstaking task of unraveling a person's DNA and recording the exact sequence of the four chemical bases (A, C, G, and T) that, joined into base pairs, encoded most of that person's genetic information. (The sequence elucidated in 2003 was not truly complete - around 15% was missing due to technological limitations at the time that made it difficult to work out stretches of DNA with a large number of repeating base pairs.)

In humans, those four bases spell out the instructions our cells call on for the know-how it takes to build and maintain our bodies. Most processes only use a small section of our DNA for the instructions relevant to that particular process at any particular moment, but the genome itself is enormous, with a total length of around 3 billion base pairs.

While the first genome sequenced came from an anonymous donor, that sequence of base pairs is 99.9 percent identical to every other person's genome. That's because "we're ridiculously [closely related] compared to almost all other mammals," Fuentes says.

"Humans are like the same damn thing everywhere."

Some of that overlap enables fundamental biological processes that are shared widely across the entire tree of life. That's why we share about 35 percent of our genome with daffodils. For chimpanzees, whose ancestors split from our own only about six to seven million years ago, the overlap is a stunning 98.8 percent.

Sequencing an entire human genome and making sense of the data still isn't easy, but a handful of well-funded labs with advanced, high-throughput equipment have become good at extracting and handling that kind of data.

Being "related" is sort of complicated

Wohns's team started with digital files containing the genomes for 3,609 people.

To figure out how all those people could be related, the researchers used algorithms to make sense of the specific differences in the 0.1 percent of the genomes that vary from human to human.

"It all starts with that variation in the data," Wohns says.
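To get a feel for how much variation that is, here is a back-of-the-envelope calculation using only the article's round numbers (a genome of about 3 billion base pairs, of which about 0.1 percent varies between people). The variable names are made up for illustration, and the result is a rough order of magnitude, not a figure from the paper.

```python
# Rough arithmetic with the article's round numbers (not values from the paper).
GENOME_LENGTH_BP = 3_000_000_000  # "around 3 billion base pairs"
VARIABLE_PER_THOUSAND = 1         # 0.1 percent is 1 position in every 1,000

# Integer arithmetic keeps the result exact.
variable_positions = GENOME_LENGTH_BP * VARIABLE_PER_THOUSAND // 1000

print(f"{variable_positions:,} positions differ, give or take")
```

In other words, even a 0.1 percent difference leaves on the order of three million positions for the algorithms to work with.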

The vast majority of the samples came from modern humans living worldwide. Four are from ancient humans who lived on the Eurasian steppe about 4,000 years ago. Three are from Neanderthals. One comes from a Denisovan, a now-extinct human relative that researchers only recently discovered.

"Theoretically, we [knew] that a genealogy exists that explains all of human history, but it's been very difficult to estimate what that looks like," Wohns says.

That's because good data has been scarce, and it's not apparent how to go about building a family tree that can account for all the variation across many genomes.

One of the biggest challenges is the scale.

If you laid out the genomes of two siblings side by side, it would be apparent exactly where their DNA differs.

For identical twins, the sequences would be the same, minus roughly five early developmental mutations that likely occurred in the womb, according to a study published last year.

For other siblings, the genomes would be similar, with some noticeable differences. That's because siblings "inherit some of the same genes and some different ones," Wohns says.

"If you went along the genome[s], potentially even at each base pair, you could see exactly how you're related to your sibling," Wohns says. By "related," he means genetically related. Do you have the same genes that code for eye color? For the ability to digest lactose? In many cases, full siblings would almost certainly share genes like these. In others, it's up to chance whether they do or not.

No matter how different two siblings might be, side-by-side analysis of their genomes would show that it takes just two ancestors — their parents — to account for the differences between the two samples.
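The side-by-side comparison described above can be sketched in a few lines. This is a toy illustration, not the researchers' method: the two 12-letter sequences are invented, the function name is hypothetical, and the sketch assumes the sequences are already aligned. Real genomes are billions of letters long and come in two copies per person.

```python
def differing_positions(seq_a: str, seq_b: str) -> list[int]:
    """Return the 0-based positions where two aligned sequences differ."""
    if len(seq_a) != len(seq_b):
        raise ValueError("sequences must be aligned to the same length")
    return [i for i, (a, b) in enumerate(zip(seq_a, seq_b)) if a != b]


# Invented 12-base fragments, not real data.
sibling_1 = "ACGTTAGCCATG"
sibling_2 = "ACGTCAGCCTTG"

diffs = differing_positions(sibling_1, sibling_2)
print(diffs)  # [4, 9]
```

Everything outside positions 4 and 9 is shared in this toy case; the differences are the part that carries information about ancestry.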

The researchers needed a lot of data — and a ton of computing power

In the new paper, researchers did something similar, but for more than 3,600 people.

"We put them together [and] looked at every position in the genome," Wohns says.

"We have algorithms that look at those patterns of genetic variation and reconstruct who the ancestors [must've been] that generated that pattern of variation," he says.

For a trait like blue eyes, for example, "we can see exactly who descended from the first person who had the mutation for blue eyes, and how everybody in our dataset is related to one another [on that part of the genome]."
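As a rough illustration of that idea, the sketch below uses a tiny family tree for a single stretch of genome and asks which sampled people descend from the ancestor in whom a mutation first appeared. The node names, the tree, and the helper functions are all hypothetical. A real analysis involves a different tree for each stretch of genome and millions of inferred ancestors.

```python
# Toy tree for one stretch of genome, stored as child -> parent.
# "s*" nodes are sampled people, "a*" nodes are inferred ancestors (all made up).
parent = {
    "s1": "a1", "s2": "a1",
    "s3": "a2", "s4": "a2",
    "a1": "a3", "a2": "a3",
}


def descends_from(node, ancestor):
    """Walk up the tree from node; True if we reach ancestor on the way."""
    while node is not None:
        if node == ancestor:
            return True
        node = parent.get(node)  # None once we pass the root
    return False


def carriers(mutation_node, samples):
    """Samples that inherit a mutation that first appeared in mutation_node."""
    return [s for s in samples if descends_from(s, mutation_node)]


samples = ["s1", "s2", "s3", "s4"]
print(carriers("a2", samples))  # ['s3', 's4']
```

A mutation that arose in ancestor "a2" is carried by "s3" and "s4" only, while one that arose in the root "a3" would be carried by everyone in this toy dataset.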

Those conclusions made it possible for the researchers to infer how millions and millions of human ancestors were related.

The method allowed the investigators to trace lines of descent through the generations, tracking how mutations emerged and moved, to "create this single genealogy that describes all the human genetic variation in our dataset," he says.

The team turned to data about where and when the samples had come from to estimate where and when specific ancestors with particular genomes had lived. For some, it might have been 500 years ago. For others, it was more like 500,000 years.
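The article does not spell out how locations were estimated, so the following is only a sketch of one very simple possibility: place each inferred ancestor at the average position of its children, working upward from the sampled people. The tree, the coordinates, and the function name are invented, and averaging latitude and longitude directly ignores the curvature of the Earth.

```python
# Hypothetical tree: ancestor -> list of children.
children = {
    "a1": ["s1", "s2"],
    "a2": ["s3", "a1"],
}

# Invented (latitude, longitude) pairs for the sampled people.
sample_location = {
    "s1": (10.0, 20.0),
    "s2": (30.0, 40.0),
    "s3": (0.0, 0.0),
}


def estimate_location(node):
    """Average the children's locations, recursing down to the samples."""
    if node in sample_location:
        return sample_location[node]
    locs = [estimate_location(child) for child in children[node]]
    lat = sum(p[0] for p in locs) / len(locs)
    lon = sum(p[1] for p in locs) / len(locs)
    return (lat, lon)


print(estimate_location("a1"))  # (20.0, 30.0)
print(estimate_location("a2"))  # (10.0, 15.0)
```

Under this assumption, older ancestors drift toward the middle of wherever their descendants were sampled, which is why where the samples come from matters so much.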

In the end, the researchers needed to infer the existence of 27 million people to explain the patterns of genetic variation in their 3,609 samples.

Telling the human story is a complicated undertaking

Agustín Fuentes, a researcher at Princeton whose work spans biological and cultural anthropology, says the paper, while very interesting, doesn't offer much new information about the history of our species.

"[For] those of us who study human evolutionary processes… not just the genetics, but the archeology, the paleoanthropology, and evolutionary modeling… it's just reinforcing what we already know," he says.

It's helpful to put all of this data into one giant "explanatory frame," but doing so also highlights some of the problems with the academic study of the past, he says.

For one thing, the genome is just part of the story, but genomic approaches like this paper get the lion's share of attention — and funding.

He says that while labs capable of "industrial-grade production of DNA data" produce "amazing" results, the handful of scholars who control those resources end up influencing which questions about our collective past are taken most seriously.

They're also forced to work within a set of institutions and incentives that structure the production of new knowledge in a particular way. This causes them to "[produce results], get published as fast as possible, get the preprint up, get the major journal article out, get the next funding, [and] move through higher throughput [and] larger analytic capacities," Fuentes says.

That's not a problem in and of itself. Still, it does tend to push researchers outside of those institutions — particularly those in formerly colonized countries — toward the outskirts of the discipline.

"What questions do different scientific communities around the world have? Do they always map to the ones that are coming out of Harvard, for example? I'm going to say probably not," he says.

Fuentes also raises some questions about how the current study's authors present their work.

"Any individual person today only has genetic material [from] a small subset of all of their actual genealogical ancestors," he says. "Genetic ancestry is a small subset of our overall genealogy, and they use those two words interchangeably. That's a problem."

He says it's more accurate to say the researchers constructed an ancestral recombination graph.

"Many of our very critical and important ancestors contributed no DNA to us whatsoever, and yet, they're still our ancestors," he says.

But there's a big takeaway

What's not in dispute is that all humans alive today are very, very closely related to each other.

"In the same way that there's a family tree that shows how I'm related to my siblings [and] my parents," this new research is the "first draft" of a tree that shows "how I'm related to you and everybody else who ever lived," Wohns says.

According to Fuentes, we don't have to look very far into the past to see just how close those relations are.

"Most people share common ancestors in the last couple thousand years," he says.

If you cast your gaze even deeper into the past — say 15,000 to 25,000 years — it becomes clear that we humans are a big family, though not always a happy one.


Study abstract

The sequencing of modern and ancient genomes from around the world has revolutionized our understanding of human history and evolution. However, the problem of how best to characterize ancestral relationships from the totality of human genomic variation remains unsolved. Here, we address this challenge with nonparametric methods that enable us to infer a unified genealogy of modern and ancient humans. This compact representation of multiple datasets explores the challenges of missing and erroneous data and uses ancient samples to constrain and date relationships. We demonstrate the power of the method to recover relationships between individuals and populations as well as to identify descendants of ancient samples. Finally, we introduce a simple nonparametric estimator of the geographical location of ancestors that recapitulates key events in human history.