Decoding Hidden Heritages in Gaelic Traditional Narrative with Text-Mining and Phylogenetics

This project was funded by UKRI-AHRC and the Irish Research Council under the ‘UK-Ireland Collaboration in the Digital Humanities Research Grants Call’ (grant numbers AH/W001934/1 and IRC/W001934/1).

August 2021–July 2024


This project will fuse deep qualitative analysis with cutting-edge computational methodologies to decode, interpret and curate the hidden heritages of Gaelic traditional narrative. In doing so, it will provide the most detailed account to date of convergence and divergence in the narrative traditions of Scotland and Ireland and, by extension, a novel understanding of their joint cultural history. Leveraging recent advances in Natural Language Processing, the consortium will digitise, convert and help to disseminate a vast corpus of folklore manuscripts in Irish and Scottish Gaelic.

The project team will create, analyse and disseminate a large text corpus of folktales from the Tale Archive of the School of Scottish Studies Archives and from the Main Manuscript Collection of the Irish National Folklore Collection. The creation of this corpus will involve the scanning of *c.*80k manuscript pages (and will also include pages scanned by the Dúchas digitisation project), the recognition of handwritten text on these pages (as well as some audio material in Scotland), the normalisation of non-standard text, and the machine translation of Scottish Gaelic into Irish. The corpus will then be annotated with document-level and motif-level metadata.

Analysis of the corpus will be carried out using data mining and phylogenetic techniques. Both workstreams will encompass the entire corpus, but the phylogenetic workstream will also focus on three folktale types as case studies: Aarne–Thompson–Uther (ATU) 400 ‘The Search for the Lost Wife’, ATU 425 ‘The Search for the Lost Husband’, and ATU 503 ‘The Gifts of the Little People’. The results of these analyses will be published in a series of articles and in a book entitled Digital Folkloristics. The corpus will be disseminated via Dúchas and Tobar an Dualchais, and via a new aggregator website (under construction) that will include map and graph visualisations of the corpus data and of the results of our analysis.

Project team


  • Principal Investigator: Prof. William Lamb, The University of Edinburgh (School of Literatures, Languages and Cultures).
  • Co-Investigator: Prof. Jamshid Tehrani, Durham University (Department of Anthropology).
  • Co-Investigator: Dr Beatrice Alex, The University of Edinburgh (School of Literatures, Languages and Cultures).
  • Co-Investigator: Dr Barbara Hillers, Indiana University (Folklore and Ethnomusicology).

The University of Edinburgh

  • Postdoctoral Researcher: Julie-Anne Meaney.
  • Technical Supervisor: Gavin Willshaw.
  • Scottish and University Collections Archivist: Kirsty Stewart.
  • Language Technician: Michael Bauer.
  • Digitisation and Data Entry Technicians: Cristina Horvath, Catherine Banks.
  • Copyright Administrator: Louise Scollay.


  • Co-Principal Investigator: Dr Brian Ó Raghallaigh, Dublin City University (Fiontar & Scoil na Gaeilge).
  • Co-Investigator: Dr Críostóir Mac Cárthaigh, University College Dublin (National Folklore Collection).
  • Co-Investigator: Dr Tiber Falzett, University College Dublin (School of Irish, Celtic Studies and Folklore).

Dublin City University

  • Postdoctoral Researcher: Dr Andrea Palandri.
  • Research Assistant: Kate Ní Ghallchóir.
  • Research Assistant: Tiernan Gaffney.
  • Research Assistant: Monica Marion.

Other collaborators

  • Postgraduate Research Fellow: Monica Marion.
  • Academic Advisors: Úna Bhreathnach, Kevin Scannell.
  • Other members of the project Steering Group: Melissa Terras (Chairperson), Rachel Hosker, Floraidh Forrest (Tobar an Dualchais).



Publications (alphabetical order)

  • Junfan Huang, Beatrice Alex, Michael Bauer, David Salvador-Jasin, Yuchao Liang, Robert Thomas, William Lamb. 2023. ‘A transformer-based standardisation system for Scottish Gaelic’. In Proceedings of SIGUL 2023, Special Session on Celtic Languages.
  • Will Lamb, Natasha Sumner, Gordon Wells. 2024. ‘Digital Developments in Scottish Studies’, Scottish Studies, 40, 65–82. DOI: 10.2218/ss.v40.9290.
  • Brian Ó Raghallaigh, Andrea Palandri, Críostóir Mac Cárthaigh. 2022. ‘Handwritten Text Recognition (HTR) for Irish-Language Folklore’. In Proceedings of the CLTW 4 @ LREC2022, 121–126.
  • Mark Sinclair, William Lamb, Beatrice Alex. 2022. ‘Handwriting Recognition for Scottish Gaelic’. In Proceedings of the CLTW 4 @ LREC2022, 60–70.

Conference papers (chronological order)

  • William Lamb & Brian Ó Raghallaigh, ‘Decoding Hidden Heritages in Gaelic Traditional Narrative with Text-Mining and Phylogenetics’, Scottish-Irish Cultural Diplomacy and Cultural Relations: Arts and Humanities in Focus, Edinburgh, 11 October 2022.
  • Brian Ó Raghallaigh, Tiber Falzett et al., ‘Providing full-text access to Scottish and Irish folklore archives: Decoding Hidden Heritages’, ICA-SUV, Dublin, 29–31 May 2023.
  • Tiber Falzett, Monica Marion et al., ‘Enhancing access to Scottish and Irish traditional narrative: Decoding Hidden Heritages’, SIEF2023, Brno, 7–10 June 2023.
  • Barbara Hillers, Monica Marion et al., ‘Mining the Celtic Folklore Archives: Decoding Hidden Heritages in Gaelic Traditional Narrative’, ICCS, Utrecht, 24–28 July 2023.
  • Beatrice Alex, ‘AI-driven language technologies and digital collections: the need for interdisciplinary communication, co-design and training’, Keynote, The 27th International Conference on Theory and Practice of Digital Libraries, Zadar, Croatia, 28 September 2023.
  • Barbara Hillers, Monica Marion et al., ‘Decoding Hidden Heritages in Gaelic Traditional Narrative’, American Folklore Society Annual Meeting 2023, Portland, Oregon, 1–4 November 2023.
  • Beatrice Alex, ‘How to run a successful (Digital Humanities) research project: Mining Celtic Folklore Archives’, Invited Lecture to the MSc course on Research Methods and Problems in English Literature, University of Edinburgh, Edinburgh, 29 January 2024.