Séminaires du CENTAL: Quantitative approaches to historical texts: some (non-)issues and how to tackle them


Quantitative methods for historical text analysis offer exciting opportunities for researchers seeking new insights into long-studied texts. However, the methodological underpinnings of these methods remain under-explored. In the first part of the talk, I will use a case study to show and discuss the (non-)effect that the OCR process has on a range of quantitative text analyses. In the second part, I will present a novel, fully unsupervised OCR post-correction method applied to the same dataset, as well as its most recent extension to a highly inflected language, Finnish.

Mar 8, 2023 1:00 PM — 3:00 PM
Louvain-la-Neuve, Belgium