The version I know: phylogenetic analysis for the Decoding Hidden Heritages project

Monica Marion is a graduate student at Indiana University, studying Folklore and Complex Networks and Systems, under the supervision of Dr Barbara Hillers. She studied Folklore as an undergraduate at Harvard College, where she also worked in Archaeology and Human Evolutionary Biology. Her research utilises biological and computational techniques to study storytelling, from traditional folktales to Internet communities.

To discuss stories as tradition, or heritage, is to imagine generations of storytellers polishing and adapting tales over time, giving stories their own lineage. When I talk about comparing global folktales, I’m often asked if I believe that the stories are truly “related” and can be traced back to a “common ancestor,” or if they might just be similar stories reinvented in different places. It’s true that certain themes are common to the human experience, but when one examines the details of an international folktale, it’s difficult to argue that its versions were independently invented.

As an example, take this short story about a hunchback, collected in 1937 by Willie Toohil as part of the Irish Schools’ Collection:

The Schools’ Collection, Volume 0146, Page 545

Compare that with the story entitled “The Man with the Wen”, collected in the early 20th century in Japan.

The two can be summarized by the same description: a deformed man comes across a group of supernatural performers. He ends up performing for them himself, and as a reward, they remove his deformity. A second man seeks to have his deformity removed the same way, but he performs poorly, and the supernaturals instead leave him twice deformed.

Of course, the stories also include important differences: in the Japanese story, the man’s deformity is a facial lump, while in the Irish version it’s a hunchback. In some cases, the different details work to “localize” the story to its cultural context: the supernatural performers are fairies in the Irish version, goblins in the Japanese version.

Stories as DNA:

One way to find the “lineages” of stories and get at their history is by tracing these differences over time and space. Like language or genetic code, a narrative is constructed out of smaller parts that change over time, and some of the changes persist, adapting into new story variants. So the same phylogenetic methods that trace lineages of species or linguistic change can also be used on the more complicated data of stories.

Some of the “adaptations” are obvious, such as the changing supernatural creature, but in most cases it’s difficult to distinguish regional variation from the idiosyncrasies of an individual storyteller, or to tell when and where the changes might have occurred. When we look at a significant number of stories, however, the broad patterns of overlapping change begin to emerge.

The idea behind the Decoding Hidden Heritages project is to understand the histories and connections of Gaelic traditional narratives. With the rich folklore archives in Ireland and Scotland, we have enough versions of the “same” story to perform useful phylogenetic analysis.

For this part of the project, we’ve chosen to focus on the hunchback story I linked above. The story seems simple, and is widely told (and still shared today). However, even a short story contains a large amount of complexity.

Building Tales:

In order to compare versions, we need to understand how the story is constructed. What are its “genes”, the parts that vary across versions? The breakdown we have assembled has 62 traits and over 300 possible trait states. In the breakdown spreadsheet, the first part of the story looks like this:

Every plot point offers possible variation to the tale teller, and so we need to focus on the traits that would be passed down, or that have some specific relevance. This includes major and minor plot points, like whether or not the hunchback sings a certain song, or if the teller includes a “fairy queen” in charge of the hump removal.

Published literary versions of the story often include a pedagogical moral that’s much rarer in oral versions. To test for this, and to trace literary influence in the oral tradition, we are noting with each version whether the first hunchback is described in positive terms (Crofton Croker’s version lauds him as “ever a good-natured little fellow”) and the second hunchback negatively (“a peevish and cunning creature from his birth”).

Another feature of this story is that it’s sometimes told as a legend, a short story that could possibly be true, and sometimes as a folktale, a more fantastical story set in another world. To look at this distribution, we’re marking whether the version includes specific place names, such as the name of a nearby rath.

The work of this project begins by assembling all the available Gaelic versions of the story. For this step, we are helped by the work of earlier collectors and archivists who sorted the incoming stories by Tale Type. Invaluable indices point to all the available versions: some published, others in the digitized archives, and some still available only as handwritten manuscripts. One of my duties this summer was to spend a week in the Irish National Folklore Archives reading fairy tales, an experience I would highly recommend.

Many members of this project are working to help process each of the 312 Irish and Scottish versions of the hunchback story, “coding” each one by manually filling out the spreadsheet for each of the traits.
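To make the idea of “coding” concrete, here is a minimal sketch in Python of how one version’s traits might be turned into numbers. The trait names, the states listed, and the two example rows are my own illustrative assumptions, not the project’s actual spreadsheet, which has 62 traits. A trait that a version leaves unrecorded is kept as missing rather than guessed.

```python
# Hypothetical codebook: each trait name maps to its ordered list of states.
# The real breakdown has 62 traits; these three are invented for illustration.
CODEBOOK = {
    "deformity": ["hunchback", "facial lump"],
    "supernaturals": ["fairies", "goblins"],
    "place_name_given": ["no", "yes"],
}


def encode(version):
    """Turn a dict of trait -> state into a tuple of state numbers.

    A trait that is absent or has an unlisted state becomes None (missing).
    """
    coded = []
    for trait, states in CODEBOOK.items():
        value = version.get(trait)
        coded.append(states.index(value) if value in states else None)
    return tuple(coded)


# Two made-up rows, loosely modelled on the Irish and Japanese examples above.
irish_example = {"deformity": "hunchback", "supernaturals": "fairies"}
japanese_example = {
    "deformity": "facial lump",
    "supernaturals": "goblins",
    "place_name_given": "no",
}

print(encode(irish_example))
print(encode(japanese_example))
```

Keeping missing values explicit matters later, because a comparison can then skip traits that were never recorded instead of counting them as differences.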


The statistics and computation come in once all the stories have been transformed into strings of numbers that represent the trait states. With this data, we can do simple evaluations, such as crafting a basic family tree, or highlighting different traits on a map of the stories to see which have regional variation. This dataset will allow folklorists to visualize the stories in a different, comparative way that makes patterns easier to see.
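As a rough illustration of what a “basic family tree” could mean here, the sketch below compares coded versions by the share of traits on which they differ, then repeatedly joins the two closest groups (average-linkage clustering). This is one simple stand-in I have chosen for illustration, not necessarily the method the project uses, and the four coded versions are invented.

```python
from itertools import combinations


def trait_distance(a, b):
    """Share of traits coded in both versions on which the versions differ."""
    shared = [(x, y) for x, y in zip(a, b) if x is not None and y is not None]
    if not shared:
        return 1.0  # assumption: versions with nothing comparable are far apart
    return sum(x != y for x, y in shared) / len(shared)


def average_linkage_tree(coded):
    """Merge the two closest clusters until one remains; return nested tuples."""
    # Each cluster is a pair: (tree built so far, names of the versions in it).
    clusters = [(name, [name]) for name in coded]

    def cluster_distance(c1, c2):
        pairs = [(m, n) for m in c1[1] for n in c2[1]]
        total = sum(trait_distance(coded[m], coded[n]) for m, n in pairs)
        return total / len(pairs)

    while len(clusters) > 1:
        i, j = min(
            combinations(range(len(clusters)), 2),
            key=lambda ij: cluster_distance(clusters[ij[0]], clusters[ij[1]]),
        )
        merged = ((clusters[i][0], clusters[j][0]), clusters[i][1] + clusters[j][1])
        clusters = [c for k, c in enumerate(clusters) if k not in (i, j)]
        clusters.append(merged)
    return clusters[0][0]


# Invented example: four versions coded on four traits (None = not recorded).
coded_versions = {
    "A": (0, 0, 1, 2),
    "B": (0, 0, 1, 1),
    "C": (1, 1, 0, 2),
    "D": (1, 1, 0, None),
}

tree = average_linkage_tree(coded_versions)
print(tree)
```

The nested tuples group the versions that share the most trait states. A tree like this only summarizes similarity; it says nothing yet about which version came first.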

More complex phylogenetic analysis will be able to incorporate different rates of trait variation, reveal more about the inheritance process, and construct likely relationships between the story variants. Hopefully we’ll see distinct relationships emerge between Scottish and Irish versions, as well as more localized branches. The analysis will provide a better understanding of which changes matter, and of the unconscious processes by which a storyteller changes and preserves particular details.
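One simple way to get a feel for “different rates of trait variation” is to count, for a single trait, the fewest changes needed to explain its states on a proposed tree. The sketch below uses Fitch’s parsimony count for that purpose. It is far simpler than a full model-based analysis, the tree and trait values are invented, and it assumes every version has a recorded state for the trait.

```python
def fitch_changes(tree, states):
    """Return the minimum number of state changes for one trait on a binary tree.

    tree   -- nested 2-tuples of version names, e.g. (("A", "B"), ("C", "D"))
    states -- dict mapping each version name to its state for this trait
    """
    changes = 0

    def possible_states(node):
        nonlocal changes
        if isinstance(node, str):  # a leaf: a single coded version
            return {states[node]}
        left, right = (possible_states(child) for child in node)
        common = left & right
        if common:
            return common
        changes += 1  # the two branches disagree, so one change is required
        return left | right

    possible_states(tree)
    return changes


# Invented tree and trait values, for illustration only.
example_tree = (("A", "B"), ("C", "D"))
supernaturals = {"A": "fairies", "B": "fairies", "C": "goblins", "D": "goblins"}
sings_song = {"A": "yes", "B": "no", "C": "yes", "D": "no"}

print(fitch_changes(example_tree, supernaturals))
print(fitch_changes(example_tree, sings_song))
```

A trait that needs few changes follows the branches of the tree closely, while one that needs many is changing more freely, perhaps at the whim of individual tellers.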

After tackling this “simple” hunchback story, the next step will be looking at some of the more complex stories in the Gaelic repertoire. This is (for now) the last step for these stories, which have passed from spoken word to recording to manuscript to digital image to digital text to string of code. Each representation of the story gives us something different, so we can return to the storyteller’s performance with better understanding.

Author: Monica Marion


