Large-scale computational approaches to evolution and change: prospects and pitfalls

Abstract

A first workshop on Large-scale computational approaches to evolution and change: prospects and pitfalls will be held at Evolang XV in Madison, US. We aim to bring together language evolution, cutting-edge NLP, and LLM-driven approaches, and to critically discuss the novel opportunities offered by large-scale empirical approaches to language evolution and change.

Date
May 18, 2024, 9:00 AM – 4:00 PM
Event
Evolang Workshop
Location
Evolang XV, Madison, US

To understand how and why a complex system like language works, understanding how it changes is key. Among the multitude of possible approaches to studying language change dynamics, this workshop focuses on the empirical study of large collections of linguistic data, such as corpora and lexical databases. Scale necessitates computation: no human alone could read through billions of words fast enough. But machines can.

There has never been a better time to apply machine learning to language. Advances in the NLP fields of semantic shift and lexical semantic change detection yield increasingly accurate automated inferences (Schlechtweg et al. 2020; Montanelli & Periti 2023; Tahmasebi et al. 2021), primarily driven by various (large) language models. Machine-readable diachronic data at both short and long time scales has become abundant thanks to corpus-building and digitization efforts, and gold-standard test sets are available for objectively evaluating different approaches (e.g. Schlechtweg et al. 2021). Developments in generative LLMs have also reached a point where previously complex NLP pipelines and costly supervised learning architectures can often simply be replaced with zero-shot LLM queries without loss in performance (Ziems et al. 2023; Karjus 2023).
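As an illustration of the zero-shot idea, consider the pairwise usage judgments that human annotators produced for resources like the DWUG graphs (Schlechtweg et al. 2021): given two attestations of a word, decide whether it expresses the same sense. A minimal sketch of delegating this judgment to an LLM follows; the client library, model name, and prompt wording are illustrative assumptions, not the setup of any cited study.

```python
# Zero-shot semantic change probe: ask an LLM whether a target word is used
# in the same sense in two attestations from different periods.
# Assumes the OpenAI Python client (pip install openai) and an API key in the
# OPENAI_API_KEY environment variable; model name and prompt are placeholders.
from openai import OpenAI

client = OpenAI()

def same_sense(word: str, usage_old: str, usage_new: str) -> str:
    prompt = (
        f"Does the word '{word}' express the same sense in both sentences? "
        "Answer with one word, 'same' or 'different'.\n"
        f"1. {usage_old}\n"
        f"2. {usage_new}"
    )
    response = client.chat.completions.create(
        model="gpt-4o-mini",  # placeholder model name
        messages=[{"role": "user", "content": prompt}],
        temperature=0,  # keep annotation-style output as deterministic as possible
    )
    return response.choices[0].message.content.strip().lower()

print(same_sense(
    "awful",
    "The cathedral filled him with an awful reverence.",  # older 'awe-inspiring' sense
    "The traffic this morning was awful.",                # modern 'very bad' sense
))
```

Aggregating such judgments over sampled usage pairs per time period is, in essence, how the human-annotated gold standards were built; the open question is when, and for which languages and periods, a model can stand in for the annotator.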

These approaches, however, do not come without limitations. While the availability of pretrained LLMs opens up new research avenues and simplifies previously complex ones, it is important to take into account and mitigate their biases, which are inherent to any pretrained model (Dubossarsky et al. 2017). Research rooted in NLP often focuses on the what (has changed), but the what can inform the how and why. It is therefore crucial to embed these methods in theoretically meaningful frameworks and, furthermore, to delineate which detected changes may be driven by inherent linguistic mechanisms like selection and drift (Montero et al. 2023) or by general cognitive factors like optimal encoding strategies in our mental categories (Dubossarsky et al. 2015), in contrast to those reflecting changes in the socio-cultural circumstances of language communities and their communicative needs (Kemp et al. 2018; Karjus et al. 2020; De Pascale & Marzo 2023).
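To make the what concrete: much large-scale change detection reduces to comparing a word's distributional representations across time slices. A minimal sketch of one classic recipe follows (a per-period word2vec space aligned via orthogonal Procrustes, of the family surveyed in Tahmasebi et al. 2021); the corpus file names and hyperparameters are placeholders.

```python
# Rank words by semantic change between two time slices: train one embedding
# space per period, align the spaces, and measure how far each word moved.
# Assumes gensim 4.x and numpy; "corpus_old.txt" and "corpus_new.txt" are
# placeholder files with one whitespace-tokenized sentence per line.
import numpy as np
from gensim.models import Word2Vec

corpus_old = [line.split() for line in open("corpus_old.txt", encoding="utf-8")]
corpus_new = [line.split() for line in open("corpus_new.txt", encoding="utf-8")]

m1 = Word2Vec(corpus_old, vector_size=100, min_count=5, seed=1)
m2 = Word2Vec(corpus_new, vector_size=100, min_count=5, seed=1)

shared = [w for w in m1.wv.index_to_key if w in m2.wv.key_to_index]
A = np.array([m1.wv[w] for w in shared])  # period-1 vectors
B = np.array([m2.wv[w] for w in shared])  # period-2 vectors

# Orthogonal Procrustes: rotate the period-2 space onto the period-1 space
# so that vector comparisons across periods become meaningful.
u, _, vt = np.linalg.svd(B.T @ A)
B_aligned = B @ (u @ vt)

def cosine_change(i: int) -> float:
    a, b = A[i], B_aligned[i]
    return 1.0 - float(a @ b) / (np.linalg.norm(a) * np.linalg.norm(b))

ranked = sorted(((cosine_change(i), w) for i, w in enumerate(shared)), reverse=True)
print(ranked[:10])  # top candidates for semantic change

# Caveat (Dubossarsky et al. 2017): change scores like these correlate with
# word frequency, so control for frequency before interpreting the ranking.
```

A ranking like this answers only the what; attributing a high-scoring word to selection, drift, cognitive optimization, or shifting communicative needs requires exactly the inference frameworks discussed above.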

This workshop seeks to bring together language evolution, cutting-edge NLP, and LLM-driven approaches, to critically discuss the novel opportunities of large-scale empirical approaches to language evolution and change (cf. Hartmann 2020), as well as the issues outlined above. Submissions will be short abstracts, assessed for rigor and relevance to the following questions and themes:

  • How to combine large-scale computational language change detection with meaningful inference frameworks, and cognitively plausible theories of language evolution mechanisms?
  • To what extent are LLMs as zero-shot classifiers and inference engines applicable to the study of language change?
  • How to tease apart linguistic and socio-cultural drivers of change at scale?
  • Can applied NLP benefit from evolutionary thinking, and if so, how?
  • How to evaluate and mitigate bias in pretrained language models? This includes issues of applying models trained on modern data to historical material, as well as potentially harmful social and other biases that a model may propagate from (at times unknown) training data.
  • How can experimental methods from cognitive science, psychology, or the social sciences be used to study machine bias or “behavior”? What are possible pitfalls of such method transfer? (A minimal probing sketch follows this list.)
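As one concrete instance of such method transfer, a minimal-pair stimulus of the kind used in psycholinguistic experiments can be pointed at a masked language model. The sketch below uses the Hugging Face transformers fill-mask pipeline; the model choice and sentence templates are illustrative placeholders.

```python
# Probe association bias in a pretrained masked language model with a
# minimal pair of prompts, in the spirit of experimental stimulus design.
# Assumes the Hugging Face transformers library; the model and the sentence
# templates are illustrative placeholders.
from transformers import pipeline

unmasker = pipeline("fill-mask", model="bert-base-uncased")

for sentence in [
    "The doctor said that [MASK] would arrive soon.",
    "The nurse said that [MASK] would arrive soon.",
]:
    print(sentence)
    for candidate in unmasker(sentence, top_k=3):
        # token_str is the predicted filler; score is its model probability.
        print(f"  {candidate['token_str']!r}: {candidate['score']:.3f}")
```

Such probes inherit the very pitfalls the question raises: template wording, tokenization, and the choice of candidate fillers can all swing the outcome, so results need the same controls an experiment on human participants would.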

Schedule

Time | Event | Authors
09:15-09:30 | Introduction | Andres Karjus & Nina Tahmasebi
09:30-10:15 | Plenary talk: What's the "language" in Large Language Models? | Claire Bowern
10:15-10:30 | Coffee break
10:30-11:00 | Modeling semasiological mechanisms of change by means of onomasiological comparisons | Stefano De Pascale & Nina Tahmasebi
11:00-11:30 | Evolution in morphological complexity and word order rigidity | Julie Nijs, Freek Van de Velde & Hubert Cuyckens
11:30-12:00 | Using large-scale computational approaches to reconstruct the evolutionary dynamics of lexical meaning and gender | Gerd Carling, Noor Efrat-Kowalsky, Marc Allassonnière-Tang, Lev Michael, Filip Larsson & Niklas Erben Johansson
12:00-13:15 | Lunch break
13:15-14:30 | Plenary talk: Computational Modeling of Linguistic Leadership | Sandeep Soni
14:30-15:00 | Instructable LLMs for scaling data-driven language and culture research | Andres Karjus
15:00-15:30 | Information-theoretic measures to study change in language use: modeling socio-cultural up to local linguistic context | Stefania Degaetano-Ortlieb
15:30-16:00 | General discussion & closing

References

  • De Pascale, S., Marzo, S., 2023. Lexical coherence in contemporary Italian: a lectometric analysis. Sociolinguistica 37, 145–166. https://doi.org/10.1515/soci-2022-0027
  • Dubossarsky, H., Tsvetkov, Y., Dyer, C., Grossman, E., 2015. A bottom up approach to category mapping and meaning change. In: Proceedings of NetWordS, pp. 66–70.
  • Dubossarsky, H., Weinshall, D., Grossman, E., 2017. Outta control: Laws of semantic change and inherent biases in word representation models. In: Proceedings of the 2017 Conference on Empirical Methods in Natural Language Processing.
  • Hartmann, S., 2020. Language change and language evolution: Cousins, siblings, twins? Glottotheory 11, 15–39. https://doi.org/10.1515/glot-2020-2003
  • Karjus, A., 2023. Machine-assisted mixed methods: augmenting humanities and social sciences with artificial intelligence. arXiv:2309.14379
  • Karjus, A., Blythe, R.A., Kirby, S., Smith, K., 2020. Quantifying the dynamics of topical fluctuations in language. Language Dynamics and Change 10, 86–125. https://doi.org/10.1163/22105832-01001200
  • Kemp, C., Xu, Y., Regier, T., 2018. Semantic Typology and Efficient Communication. Annual Review of Linguistics 4, 109–128. https://doi.org/10.1146/annurev-linguistics-011817-045406
  • Montanelli, S., Periti, F., 2023. A Survey on Contextualised Semantic Shift Detection. arXiv:2304.01666
  • Montero, J.G., Karjus, A., Smith, K., Blythe, R.A., 2023. Reliable Detection and Quantification of Selective Forces in Language Change. arXiv:2305.15914
  • Periti, F., Picascia, S., Montanelli, S., Ferrara, A., Tahmasebi, N., 2023. Studying Word Meaning Evolution through Incremental Semantic Shift Detection: A Case Study of Italian Parliamentary Speeches. TechRxiv.
  • Periti, F., Tahmasebi, N., 2024. A Systematic Comparison of Contextualized Word Embeddings for Lexical Semantic Change. arXiv:2402.12011
  • Schlechtweg, D., McGillivray, B., Hengchen, S., Dubossarsky, H., Tahmasebi, N., 2020. SemEval-2020 Task 1: Unsupervised Lexical Semantic Change Detection. In: Proceedings of the Fourteenth Workshop on Semantic Evaluation, International Committee for Computational Linguistics, Barcelona (online), pp. 1–23. https://doi.org/10.18653/v1/2020.semeval-1.1
  • Schlechtweg, D., Tahmasebi, N., Hengchen, S., Dubossarsky, H., McGillivray, B., 2021. DWUG: A large Resource of Diachronic Word Usage Graphs in Four Languages. In: Proceedings of the 2021 Conference on Empirical Methods in Natural Language Processing, Association for Computational Linguistics, Online and Punta Cana, Dominican Republic, pp. 7079–7091. https://doi.org/10.18653/v1/2021.emnlp-main.567
  • Tahmasebi, N., Borin, L., Jatowt, A., Xu, Y., Hengchen, S., 2021. Computational approaches to semantic change. Language Science Press, Berlin. https://doi.org/10.5281/zenodo.5040241
  • Ziems, C., Held, W., Shaikh, O., Chen, J., Zhang, Z., Yang, D., 2023. Can Large Language Models Transform Computational Social Science? arXiv:2305.03514