5th International Workshop on Computational Approaches to Historical Language Change 2024 (LChange'24)

Abstract

LChange'24 is the fifth workshop on computational approaches to historical language change, with a focus on digital text corpora. Come join us for this exciting adventure!

Date
Aug 15, 2024 10:00 AM — 6:00 PM
Location
Bangkok

The workshop builds on its first edition in 2019 and the subsequent editions in 2021, 2022 and 2023. It will be co-located with ACL 2024 in Bangkok, Thailand, as a hybrid event, and will take place on Thursday, 15 August 2024.

Programme:

9.15-9.30: INTRODUCTION
9.30-10.30: KEYNOTE, Antske Fokkens (Moderator: Nina Tahmasebi)
10.30-11.00: COFFEE BREAK

SESSION 1 (Chair: Francesco Periti)
11.00-11.20: A Semantic Distance Metric Learning approach for Lexical Semantic Change Detection -- Taichi Aida, Danushka Bollegala
11.20-11.40: Towards a GoldenHymns Dataset for Studying Diachronic Trends in 19th Century Danish Religious Hymns -- Ea Lindhardt Overgaard, Pascale Feldkamp, Yuri Bizzoni
11.40-12.00: Definition generation for lexical semantic change detection -- Mariia Fedorova, Andrey Kutuzov, Yves Scherrer
12.00-13.00: LUNCH BREAK
13.00-13.45: KEYNOTE, Johann-Mattis List (Moderator: Andrey Kutuzov)

SESSION 2 (Chair: Pierluigi Cassotti)
13.45-14.05: Towards an Onomasiological Study of Lexical Semantic Change Through the Induction of Concepts -- Bastien Liétard, Mikaela Keller, Pascal Denis
14.05-14.25: Towards a Complete Solution to Lexical Semantic Change: an Extension to Multiple Time Periods and Diachronic Word Sense Induction -- Francesco Periti, Nina Tahmasebi
14.25-14.45: AXOLOTL’24 Shared Task on Multilingual Explainable Semantic Change Modeling -- Mariia Fedorova, Timothee Mickus, Niko Tapio Partanen, Janine Siewert, Elena Spaziani, Andrey Kutuzov
POSTER PITCH
15.30-16.30: POSTER SESSION
16.30-17.30: ROUND TABLE (Moderator: Andrey Kutuzov)
17.30-17.45: CLOSING REMARKS

Poster presentations:

  • TartuNLP @ AXOLOTL-24: Leveraging Classifier Output for New Sense Detection in Lexical Semantics -- Aleksei Dorkin, Kairit Sirts
  • Deep-change at AXOLOTL-24: Orchestrating WSD and WSI Models for Semantic Change Modeling -- Denis Kokosinskii, Mikhail Kuklin, Nikolay Arefyev
  • Can political dogwhistles be predicted by distributional methods for analysis of lexical semantic change? -- Max Boholm, Björn Rönnerstrand, Ellen Breitholtz, Robin Cooper, Elina Lindgren, Gregor Rettenegger, Asad B. Sayeed
  • EtymoLink: A Structured English Etymology Dataset -- Yuan Gao, Weiwei Sun
  • Similarity-Based Cluster Merging for Semantic Change Modeling -- Christopher Brückner, Leixin Zhang, Pavel Pecina
  • Historical Ink: Semantic Shift Detection for 19th Century Spanish -- Tony Montes, Laura Manrique-Gómez, Ruben Manrique
  • Complexity and Indecision: A Proof-of-Concept Exploration of Lexical Complexity and Lexical Semantic Change -- David Alfter
  • Exploring Sound Change Over Time: A Review of Computational and Human Perception -- Siqi He, Wei Zhao
  • A Few-shot Learning Approach for Lexical Semantic Change Detection Using GPT-4 -- Zhengfei Ren, Annalina Caputo, Gareth J. F. Jones
  • A Feature-Based Approach to Annotate the Syntax of Ancient Chinese -- Chenrong Zhao
  • Exploring Diachronic and Diatopic Changes in Dialect Continua: Tasks, Datasets and Challenges -- Melis Çelikkol, Lydia Körber, Wei Zhao
  • Improving Word Usage Graphs with Edge Induction -- Bill Noble, Francesco Periti, Nina Tahmasebi
  • Presence or Absence: Are Unknown Word Usages in Dictionaries? -- Xianghe Ma, Dominik Schlechtweg, Wei Zhao

Keynote Talks

This year we are happy to welcome Antske Fokkens and Johann-Mattis List as keynote speakers.

Antske Fokkens (Computational Linguistics & Text Mining Lab, Vrije Universiteit Amsterdam - Algorithms, Geometry & Applications, Eindhoven University of Technology)

Title of talk: What Changes in Language Modeling mean for Modeling Language Change

Language change detection has emerged as a subdomain that has caught the interest of (computational) linguists, historians, social scientists and computer scientists. Despite this enthusiasm and the sustained attention of the NLP community over multiple years, our methods continue to have difficulty distinguishing valid signals of change from noise. This holds both for methods using static word embeddings and for more recent explorations with methods that use contextualized embeddings. The question of how to distinguish true signal from noise has received substantial attention from the field, with the design of benchmarks, control tests, and artificially created samples and data. An aspect that has, to my knowledge, received less attention is the fundamental difference between most methods using static embeddings on the one hand and most methods using contextualized embeddings on the other. Most importantly, methods that use static embeddings involve creating new embeddings for the full vocabulary, which creates general shifts in the embedding space. Methods using contextualized embeddings, on the other hand, mostly rely on pretrained language models, used either as is or with some continual training on the target corpus. Change is then studied by comparing instances containing the target terms from different corpora. In this talk, I will explore what these fundamental differences mean when carrying out methodological checks and balances for studying language change, with the aim of answering the question: how can we find meaningful change and know that it is meaningful?

Johann-Mattis List (Chair of Multilingual Computational Linguistics, University of Passau)

Title of talk: New Approaches in Computer-Assisted Language Comparison

The field of computer-assisted language comparison seeks to develop interactive computational workflows that facilitate the tasks that linguists working in historical or typological language comparison usually carry out manually. While the field has grown substantially over the past decade, with new tools and workflows that support computer-assisted analyses, many challenges have not yet been addressed by computer-assisted approaches. In this talk, three new approaches that facilitate detailed comparative analysis will be presented. The first approach allows for efficient manual labeling of correspondence patterns in comparative wordlists; the second allows users to group sounds in phonetically transcribed wordlists and to segment words into morphemes. The third approach makes it possible to correct individual word forms in comparative wordlists by contrasting the reflexes of a proto-form that one would expect under the assumption of regular sound change with the reflexes that are attested in the data. All approaches are implemented in an interactive web-based tool that is freely available and integrated with previous computer-assisted tools and workflows.

We hope to make this fifth edition another resounding success!

The main topic of the workshop remains the same: all aspects of computational approaches to historical language change, with a focus on digital text corpora. LChange'19 resulted in a book, Computational Approaches to Semantic Change.

Important Dates

  • May 17, 2024: Paper submission
  • June 26, 2024: Notification of acceptance
  • July 5, 2024: Camera-ready papers due
  • August 15, 2024: Workshop date

Workshop Topics

This workshop explores state-of-the-art computational methodologies, theories and digital text resources for exploring the time-varying nature of human language.

The aim of this workshop is three-fold. First, we want to provide pioneering researchers who work on computational methods, evaluation, and large-scale modelling of language change an outlet for disseminating cutting-edge research on topics concerning language change. We want to utilize this workshop as a platform for sharing state-of-the-art research progress in this fundamental domain of natural language research.

Second, in doing so we want to bring together domain experts across disciplines by connecting researchers in historical linguistics with those who develop and test computational methods for detecting semantic change and laws of semantic change, and with those who need knowledge (of the occurrence and shape) of language change, for example in digital humanities and computational social sciences, where text mining is applied to diachronic corpora subject to, e.g., lexical semantic change.

Third, the detection and modelling of language change using diachronic text and text mining raise fundamental theoretical and methodological challenges for future research.

Besides these goals, this workshop will also support discussion on the evaluation of computational methodologies for uncovering language change. SemEval-2020 Task 1 on unsupervised detection of lexical semantic change attracted submissions numbering in the three figures and a total of 21 system papers. Since then, three further shared tasks have been completed for Italian, Russian, and Spanish.

We invite original research papers on a wide range of topics, including but not limited to:

  • Novel methods for detecting diachronic semantic change and lexical replacement
  • Automatic discovery and quantitative evaluation of laws of language change
  • Computational theories and generative models of language change
  • Sense-aware (semantic) change analysis
  • Diachronic word sense disambiguation
  • Novel methods for diachronic analysis of low-resource languages
  • Novel methods for diachronic linguistic data visualization
  • Novel applications and implications of language change detection
  • Quantification of sociocultural influences on language change
  • Cross-linguistic, phylogenetic, and developmental approaches to language change
  • Novel datasets for cross-linguistic and diachronic analyses of language

Submissions

URL for submissions: https://openreview.net/group?id=aclweb.org/ACL/2024/Workshop/LChange

We accept two types of submissions, long and short papers, following the ACL 2024 style (see, e.g., the Overleaf template) and the ACL submission policy.

Long and short papers may consist of up to eight (8) and four (4) pages of content, respectively, plus unlimited references; final versions will be given one additional page of content so that reviewers' comments can be taken into account.

LChange’24 also welcomes papers focusing on releasing a dataset or a model; these papers fall into the short paper category. To encourage model and dataset sharing at the reviewing phase, model and dataset papers do not need to be anonymous.

Accepted papers will be presented orally or as posters and included in the workshop proceedings. Submissions are open to all, and are to be submitted anonymously. All papers will be refereed through a double-blind peer review process by at least three reviewers with final acceptance decisions made by the workshop organizers.

Shared Task

This year, echoing LChange'22 in Dublin, we are happy to host a shared task within LChange: the AXOLOTL-24 Shared Task on Explainable Semantic Change Modeling. AXOLOTL-24 stands for “Ascertain and eXplain Overhauls of the Lexicon Over Time at LChange'24” and is organised by Mariia Fedorova and Andrey Kutuzov (University of Oslo), Timothee Mickus, Niko Partanen and Janine Siewert (University of Helsinki), and Elena Spaziani (Sapienza University of Rome).

The shared task presents two subtasks:

  • Finding the target word usages associated with new, gained senses
  • Describing these senses in a way that facilitates understanding and lexicographical research.

More information, including the timeline and instructions, is available at https://github.com/ltgoslo/axolotl24_shared_task/.

Contact

Contact us if you have any questions.

If you have published in the field previously and are interested in helping out on the programme committee (PC) by reviewing papers, send us an email.

Organisers: Nina Tahmasebi, Syrielle Montariol, Andrey Kutuzov, David Alfter, Francesco Periti, and Pierluigi Cassotti.

Anti-Harassment Policy

Our workshop highly values the open exchange of ideas, the freedom of thought and expression, and respectful scientific debate. We support and uphold the ACL Anti-Harassment policy, and any workshop participant should feel free to contact any of the workshop organisers or ACL (acl@aclweb.org), in case of any issues.

References:

  • Simon Hengchen, Nina Tahmasebi, Dominik Schlechtweg, Haim Dubossarsky. 2021. Challenges for Computational Lexical Semantic Change. In Nina Tahmasebi, Lars Borin, Adam Jatowt, Yang Xu, Simon Hengchen (eds), Computational Approaches to Semantic Change. Berlin: Language Science Press.
  • Nina Tahmasebi, Adam Jatowt, Lars Borin. 2021. Survey of Computational Approaches to Lexical Semantic Change Detection. In Nina Tahmasebi, Lars Borin, Adam Jatowt, Yang Xu, Simon Hengchen (eds), Computational Approaches to Semantic Change. Berlin: Language Science Press.
  • Andrey Kutuzov, Lilja Øvrelid, Terrence Szymanski, and Erik Velldal. 2018. Diachronic word embeddings and semantic shifts: a survey. In Proceedings of the 27th International Conference on Computational Linguistics, pages 1384–1397, Santa Fe, New Mexico, USA. Association for Computational Linguistics.
  • Francesco Periti and Nina Tahmasebi. A Systematic Comparison of Contextualized Word Embeddings for Lexical Semantic Change. (2024) arXiv:2402.12011.
  • Stefano Montanelli and Francesco Periti, A Survey on Contextualised Semantic Shift Detection. (2023) arXiv:2304.01666.
  • Pierluigi Cassotti, Lucia Siciliani, Marco de Gemmis, Giovanni Semeraro, Pierpaolo Basile. XL-LEXEME: WiC Pretrained Model for Cross-Lingual LEXical sEMantic changE. (2023) In Proceedings of ACL 2023.
  • Francesco Periti, Sergio Picascia, Stefano Montanelli, Alfio Ferrara, and Nina Tahmasebi, Studying Word Meaning Evolution through Incremental Semantic Shift Detection: A Case Study of Italian Parliamentary Speeches. (2023) TechRxiv.
  • Baayen, R. Harald. 2001. Word Frequency Distributions. Dordrecht: Kluwer Academic Publishers.
  • Koplenig, Alexander. 2015. The Impact of Lacking Metadata for the Measurement of Cultural and Linguistic Change Using the Google Ngram Data Sets—Reconstructing the Composition of the German Corpus in Times of WWII. Digital Scholarship in the Humanities fqv037. https://doi.org/10.1093/llc/fqv037.
  • Koplenig, Alexander, Sascha Wolfer & Carolin Müller-Spitzer. 2019. Studying Lexical Dynamics and Language Change via Generalized Entropies: The Problem of Sample Size. Entropy 21(5). https://doi.org/10.3390/e21050464. http://www.mdpi.com/1099-4300/21/5/464.
  • Labov, William. 1994. Principles of linguistic change (Language in Society 20). Oxford, UK; Cambridge, MA: Blackwell.
  • Michel, Jean-Baptiste, Yuan Kui Shen, Aviva Presser Aiden, Adrian Veres, Matthew K. Gray, The Google Books Team, Joseph P. Pickett, et al. 2010. Quantitative Analysis of Culture Using Millions of Digitized Books (Supporting Online Material II). Science 331(6014). http://www.sciencemag.org/content/331/6014/176/suppl/DC1 (5 March, 2014).
  • Pechenick, Eitan Adam, Christopher M. Danforth & Peter Sheridan Dodds. 2015. Characterizing the Google Books Corpus: Strong Limits to Inferences of Socio-Cultural and Linguistic Evolution. (Ed.) Alain Barrat. PLOS ONE 10(10). e0137041. https://doi.org/10.1371/journal.pone.0137041.
  • Szmrecsanyi, Benedikt. 2016. About text frequencies in historical linguistics: Disentangling environmental and grammatical change. Corpus Linguistics and Linguistic Theory 12(1). 153–171. https://doi.org/10.1515/cllt-2015-0068.