A first workshop on Large-scale computational approaches to evolution and change will be held at Evolang XV in Madison, USA. We aim to bring together language evolution research, cutting-edge NLP, and LLM-driven approaches to critically discuss the novel opportunities offered by large-scale empirical approaches to language evolution and change.
To understand how and why a complex system like language works, it is key to understand how it changes. Among the multitude of possible approaches to studying the dynamics of language change, this workshop focuses on the empirical study of large collections of linguistic data such as corpora and lexical databases. Scale necessitates computation: no human alone could read through billions of words fast enough, but machines can.
There is no time like now to apply machine learning to language. Advances in the NLP subfields of semantic shift and lexical semantic change detection yield increasingly accurate automated inferences (Schlechtweg et al. 2020; Montanelli & Periti 2023; Tahmasebi et al. 2021), primarily driven by various (large) language models. Machine-readable diachronic data at both short and long time scales has become abundant thanks to corpus building and digitization efforts, and gold standard test sets are available to objectively evaluate different approaches (e.g. Schlechtweg et al. 2021). Developments in generative LLMs have also reached a point where previously complex NLP pipelines and costly supervised learning architectures can often simply be replaced with zero-shot LLM queries without loss in performance (Ziems et al. 2023; Karjus 2023).
These approaches, however, do not come without limitations. While the availability of pretrained LLMs opens up new research avenues and simplifies previously complex ones, their biases, which are inherent to any pretrained model, must be taken into account and mitigated (Dubossarsky et al. 2017). And while research rooted in NLP often focuses on the what (has changed), the what can inform the how and why. It is therefore crucial to embed these methods in theoretically meaningful frameworks, and furthermore, to delineate which detected changes may be driven by inherent linguistic mechanisms like selection and drift (Montero et al. 2023) or general cognitive factors like optimal encoding strategies in our mental categories (Dubossarsky et al. 2015), in contrast to those reflecting changes in the socio-cultural makeup of language communities and their communicative needs (Kemp et al. 2018; Karjus et al. 2020; De Pascale & Marzo 2023).
This workshop seeks to bring together language evolution research, cutting-edge NLP, and LLM-driven approaches to critically discuss the novel opportunities of large-scale empirical approaches to language evolution and change (cf. Hartmann 2020), but also the aforementioned issues. Submissions will be short abstracts, assessed for rigor and relevance to the following questions and themes:
- How to combine large-scale computational language change detection with meaningful inference frameworks and cognitively plausible theories of language evolution mechanisms?
- To what extent are LLMs as zero-shot classifiers and inference engines applicable to the study of language change?
- How to tease apart linguistic and socio-cultural drivers of change at scale?
- Can applied NLP benefit from evolutionary thinking, and if so, how?
- How to evaluate and mitigate bias in pretrained language models? This includes issues of applying models trained on modern data to historical material, as well as potentially harmful social and other biases that a model may propagate from (at times unknown) training data.
- How can experimental methods from cognitive science, psychology or social sciences be used to study machine bias or “behavior”? What are possible pitfalls of such method transfer?
A tentative schedule
The workshop will be organized into 20+10 minute talk slots, with talks by a few select invited speakers and the organizers interleaved with open-call submissions. We anticipate the following schedule, which leaves room for a total of 14 talks.
- 08.50-11.00 Opening, talks block 1
- 11.00-11.30 Coffee break
- 11.30-13.00 Talks block 2
- 13.00-14.00 Lunch
- 14.00-16.00 Talks block 3
- 16.00-16.15 Short coffee break
- 16.15-17.45 Talks block 4
- 17.45-18.00 Closing discussion
References
De Pascale, S.D., Marzo, S., 2023. Lexical coherence in contemporary Italian: a lectometric analysis. Sociolinguistica 37, 145–166. https://doi.org/10.1515/soci-2022-0027
Dubossarsky, H., Weinshall, D., Grossman, E., 2017. Outta control: Laws of semantic change and inherent biases in word representation models. In: Proceedings of the 2017 Conference on Empirical Methods in Natural Language Processing.
Dubossarsky, H., Tsvetkov, Y., Dyer, C., Grossman, E., 2015. A bottom up approach to category mapping and meaning change. In: NetWordS, pp. 66–70.
Hartmann, S., 2020. Language change and language evolution: Cousins, siblings, twins? Glottotheory 11, 15–39. https://doi.org/10.1515/glot-2020-2003
Karjus, A., 2023. Machine-assisted mixed methods: augmenting humanities and social sciences with artificial intelligence. arXiv:2309.14379
Karjus, A., Blythe, R.A., Kirby, S., Smith, K., 2020. Quantifying the dynamics of topical fluctuations in language. Language Dynamics and Change 10, 86–125. https://doi.org/10.1163/22105832-01001200
Kemp, C., Xu, Y., Regier, T., 2018. Semantic Typology and Efficient Communication. Annual Review of Linguistics 4, 109–128. https://doi.org/10.1146/annurev-linguistics-011817-045406
Montanelli, S., Periti, F., 2023. A Survey on Contextualised Semantic Shift Detection. arXiv:2304.01666
Montero, J.G., Karjus, A., Smith, K., Blythe, R.A., 2023. Reliable Detection and Quantification of Selective Forces in Language Change. arXiv:2305.15914
Schlechtweg, D., McGillivray, B., Hengchen, S., Dubossarsky, H., Tahmasebi, N., 2020. SemEval-2020 Task 1: Unsupervised Lexical Semantic Change Detection. In: Proceedings of the Fourteenth Workshop on Semantic Evaluation (SemEval 2020), International Committee for Computational Linguistics, Barcelona (online), pp. 1–23. https://doi.org/10.18653/v1/2020.semeval-1.1
Schlechtweg, D., Tahmasebi, N., Hengchen, S., Dubossarsky, H., McGillivray, B., 2021. DWUG: A large Resource of Diachronic Word Usage Graphs in Four Languages. In: Proceedings of the 2021 Conference on Empirical Methods in Natural Language Processing (EMNLP 2021), Association for Computational Linguistics, Online and Punta Cana, Dominican Republic, pp. 7079–7091. https://doi.org/10.18653/v1/2021.emnlp-main.567
Tahmasebi, N., Borin, L., Jatowt, A., Xu, Y., Hengchen, S., 2021. Computational approaches to semantic change. Language Science Press, Berlin. https://doi.org/10.5281/zenodo.5040241
Ziems, C., Held, W., Shaikh, O., Chen, J., Zhang, Z., Yang, D., 2023. Can Large Language Models Transform Computational Social Science? arXiv:2305.03514