Over the past decade, the study of lexical and semantic change has attracted attention from research groups across disciplines. Historical linguists, computational semanticists, corpus linguists, lexical typologists, and cognitive scientists increasingly share data, methods, and theoretical frameworks that have contributed to vibrant progress in this domain. The aim of this workshop is to bring together the latest research of these communities on data-driven approaches to lexical and semantic change, with a particular focus on tackling theoretically grounded research questions.
We observe three methodological developments that are already driving convergence and cooperation among research traditions and are likely to make a lasting impact on the study of lexical and semantic change. First, the large-scale digitization of historical archives has dramatically expanded access to textual data, with projects such as Impresso (2025) for Swiss and Luxembourgish newspapers, Delpher for Dutch, and Gallica for French. Second, advances in distributional semantics and NLP (Gemma Team et al. 2024; Grattafiori et al. 2024) have enabled scholars to model meaning variation across diverse historical corpora. Third, large-scale lexical databases (Rzymski et al. 2020; Carling et al. 2023; Dehouck et al. 2023; Bocklage et al. 2024) provide shared infrastructures for crosslinguistic and diachronic research.
In comparative lexical semantics, exemplified by the work on colexification (François 2008), the wealth of databases and machine learning tools has already been exploited for large-scale and crosslinguistic studies in semantic change (Brochhagen et al. 2023; Xu, Malt & Srinivasan 2017; Xu et al. 2020). In this interdisciplinary strand of work, the explicit goal is to find higher-order patterns of language change. Synchronic and diachronic datasets are linked in complex and innovative ways, and analyzed through advanced computational modelling. This has brought to light, among other things, commonalities between the world-wide diversity of lexicalization strategies for concepts and both children’s semantic extension in language development and salient cognitive mechanisms such as affect and conceptual associativity.
In historical linguistics, on the other hand, an extensive literature exists on the importance of prototypes, analogy, and metonymical and metaphorical relations as drivers of semantic shifts (Traugott & Dasher 2001; Juvonen & Koptjevskaja-Tamm 2016). Corpus-driven historical semantics has often favored individual but more in-depth case studies (Fonteyn & Manjavacas 2021), where token-level annotation of individual, contextualized occurrences is much more common (Geeraerts et al. 2024).
Topics and research questions
Building on these points, this workshop seeks to connect data-driven methodologies with foundational questions in lexical and semantic change.
We invite contributions that address the following interconnected issues, which together define a shared research agenda across historical linguistics, typology, corpus studies, cognitive semantics, and computational modeling. The following are example questions, and by no means an exhaustive list:
1. From data points to meaning dynamics
The growing availability of multilingual corpora and lexical databases enables us to trace meaning change across thousands of words and several languages. Yet large-scale methods often overlook the fine-grained distinctions that emerge in specific contexts, while qualitative studies rarely scale beyond individual case studies. How can corpus-based and computational approaches be combined to bridge these levels of analysis? What token-level properties of meaning, such as prototypical versus peripheral uses, can be captured using broad datasets? Conversely, how can higher-level semantic categories such as affect or aspect be made operational for annotating individual word uses?
2. Rethinking the mechanisms of semantic change
Traditional classifications (specialization, generalization, metaphor, metonymy, pejoration, amelioration) remain the foundation of historical semantics. These classifications long went unchallenged, but recent corpus-based research (Ceuppens & De Smet 2025) and computational studies (Cassotti, De Pascale & Tahmasebi 2024) have begun to reassess these mechanisms and their interactions. Do these classical distinctions still capture the main forces of lexical change, or should we redefine them in light of quantitative evidence? Are there mechanisms overlooked by earlier theory, such as (literal) similarity-based change? And how can NLP tools help evaluate or refine the descriptive framework inherited from traditional semantics?
3. Connecting the lexicon’s two perspectives on change
Meaning change operates both semasiologically (a word’s senses diversify or narrow) and onomasiologically (concepts are re-labeled or reorganized). The corpus-based integration of these two perspectives is now enjoying active exploration (De Smet 2019; Cai & De Smet 2024; Geeraerts et al. 2024), but the full breadth of this interaction still awaits thorough examination. Can large-scale data link changes in word meaning with changes in lexical organization across languages? Do processes like borrowing or word formation correlate systematically with particular semantic drifts, and how might typological diversity shape these interactions?
4. Making meaning change measurable
Annotation frameworks for syntax and morphology are well developed, but semantic change still lacks standardized, reproducible measures (Van de Velde & Petré 2020). Advances in distributional semantics and large language models create new opportunities, but also raise issues of interpretability and reproducibility (Karjus 2025). How can historical linguistics and NLP jointly develop transparent procedures for quantifying semantic change? Can LLM-based annotation complement manual approaches without obscuring linguistic interpretability?
5. Language systems and their cultural environments
Historical corpora encode not just linguistic structure but also the sociocultural contexts of communication: ideologies, practices, and topical interests. These contexts influence how meanings shift, yet teasing apart linguistic from social dynamics remains difficult. Which interdisciplinary strategies (quantitative baselines, contextual annotation, or comparative modeling) best help separate system-internal trends from socially contingent variation? How might sociolinguistic modeling clarify the balance between semantic innovation and pragmatic adaptation?
6. The time depth of modern models
Transformer-based language models trained on contemporary text are increasingly used for diachronic analysis, yet they may project present-day biases onto historical materials (Manjavacas & Fonteyn 2022). How can computational linguists and digital humanists quantify and mitigate such temporal bias? Does it distort certain semantic domains more than others, and how can domain adaptation or calibration ensure that models trained today remain faithful to past language states?
References