A first workshop on Large-scale computational approaches to evolution and change will be held at Evolang XV in Madison, US. We aim to bring together language evolution, cutting-edge NLP, and LLM-driven approaches to critically discuss the novel opportunities that large-scale empirical approaches offer for the study of language evolution and change.
To understand how and why a complex system like language works, understanding how it changes is key. Among the multitude of possible approaches to studying language change dynamics, the focus in this workshop is on the empirical study of large collections of linguistic data such as corpora and lexical databases. Scale necessitates computation: no human alone could read through billions of words fast enough. But machines can.
There is no time like now to apply machine learning to language. Advances in the NLP fields of semantic shift and lexical semantic change detection yield increasingly accurate automated inferences (Schlechtweg et al. 2020; Tahmasebi et al. 2021; Montanelli & Periti 2023), primarily driven by various (large) language models. Machine-readable diachronic data at both short and long time scales has become abundant thanks to corpus building and digitization efforts, and gold standard test sets are available to objectively evaluate different approaches (e.g. Schlechtweg et al. 2021). Developments in generative LLMs have also reached a point where previously complex NLP pipelines and costly supervised learning architectures can often simply be replaced with zero-shot LLM queries without loss in performance (Ziems et al. 2023; Karjus 2023).
These approaches, however, do not come without limitations. While the availability of pretrained LLMs opens up new research avenues and simplifies previously complex ones, it is important to take into account and mitigate their biases, which are inherent to any pretrained model (Dubossarsky et al. 2017). Research rooted in NLP often focuses on the what (has changed), but the what can inform the how and why. It is therefore crucial to embed such findings in theoretically meaningful frameworks, and furthermore to delineate which detected changes may be driven by inherent linguistic mechanisms like selection and drift (Montero et al. 2023) or general cognitive factors like optimal encoding strategies in our mental categories (Dubossarsky et al. 2015), in contrast to those reflecting changes in the socio-cultural aspects of the language communities and their communicative needs (Kemp et al. 2018; Karjus et al. 2020; De Pascale & Marzo 2023).
This workshop seeks to bring together language evolution, cutting-edge NLP, and LLM-driven approaches to critically discuss both the novel opportunities of large-scale empirical approaches to language evolution and change (cf. Hartmann 2020) and the aforementioned issues. Submissions will be short abstracts, assessed for rigor and relevance to the following questions and themes:
Schedule
TIME | EVENT | AUTHORS | ABSTRACTS |
---|---|---|---|
09:15-09:30 | Introduction | Andres Karjus & Nina Tahmasebi | |
09:30-10:15 | Plenary talk: What's the "language" in Large Language Models? | Claire Bowern | |
10:15-10:30 | Coffee break | ||
10:30-11:00 | Modeling semasiological mechanisms of change by means of onomasiological comparisons | Stefano De Pascale & Nina Tahmasebi | |
11:00-11:30 | Evolution in morphological complexity and word order rigidity | Julie Nijs, Freek Van de Velde & Huybert Cuyckens | |
11:30-12:00 | Using large-scale computational approaches to reconstruct the evolutionary dynamics of lexical meaning and gender | Gerd Carling, Noor Efrat-Kowalsky, Marc Allassonière Tang, Lev Michael, Filip Larsson, & Niklas Erben Johansson | |
12:00-13:15 | Lunch break | ||
13:15-14:30 | Plenary talk: Computational Modeling of Linguistic Leadership | Sandeep Soni | |
14:30-15:00 | Instructable LLMs for scaling data-driven language and culture research | Andres Karjus | |
15:00-15:30 | Information-theoretic measures to study change in language use: modeling socio-cultural up to local linguistic context | Stefania Degaetano-Ortlieb | |
15:30-16:00 | General discussion & closure | ||
References