Change is Key! is a research program in which we aim to create computational tools that turn text into a story of our language, our societies, and our culture, and of how these have changed over time.
Firstly, we will develop corpus-based methods for detecting semantic change (over time) and variation (across social groups and media types). This will create general tools for the large-scale study and detection of language change and will directly benefit historical linguistics and lexicography. Secondly, in collaboration with researchers from each field, we aim to answer research questions in the social sciences, gender studies, and literary studies.
The program spans six years (2022-2027) with a total of 11 researchers, one research engineer, and six partner universities.
In 2023, we will co-organize the fourth edition of the International Workshop on Computational Approaches to Historical Language Change (LChange'23), which will be co-located with EMNLP'23. In 2024, we will co-organize the first edition of the workshop Large-scale computational approaches to evolution and change at Evolang XV.
This research program is funded by Riksbankens Jubileumsfond under reference number M21-0021 for a total of 33.5 million SEK.
We are giving a course at the LOT Winter School 2024: a hands-on advanced course on computational modeling of semantic change, open to second-year RM students and PhD students.
The winter school programme offers a wide variety of relevant topics in linguistics, taught by national and international researchers. The levels of the courses range from introductory (RM1 courses) to intermediate and advanced (regular courses). More details are available on the winter school webpage: https://lotschool.nl/events/lot-winter-school-2024/
Language changes over time in processes that often span long periods. However, recent events such as the situation around Covid-19 have shown that the cultural aspects of words and their meanings can also change radically over short periods of time: isolation today carries a stronger sense of hopelessness and an extremely negative connotation. Vaccine, which previously carried both positive and negative connotations, today carries a sense of hope: once the vaccine is in place, life will go back to normal. Our linguistic resources and our cultural existence are intertwined and must be studied as a single whole. To understand our contemporary and historical societies, we must understand the language used to describe them.
Researchers in the text-based humanities and social sciences regularly face hurdles caused by semantic change and linguistic variation (words acquire subtle new meanings or are replaced by other, more prominent words). Despite technological breakthroughs aimed at alleviating these problems, researchers are still left to handle such changes on their own, using resources like dictionaries, which are slow to update and cover little of our language and its actual use. They risk missing important textual clues and are limited to small-scale manual analysis. In addition, many humanities and social science researchers are interested in changing phenomena as portrayed in language.
We acknowledge that textual resources do not represent all parts of society, with the socio-economically weak being significantly less represented. Nevertheless, these resources cover an impressive and unprecedented part of our society. By opening up modern and historical Swedish textual resources, social media included, we gain enormous possibilities to study our world with reasonable effort in data collection and minimal interference with the objects we study.
The program will run over six years (2022-2027) and is funded by Riksbankens Jubileumsfond for a total of 33.5 million SEK. Its core is a language technology research team. In addition, we have researchers from (historical) linguistics, lexicography, analytical sociology, gender studies, and literary studies. The program comprises five subprojects: three are core language technology and NLP, one re-evaluates existing change hypotheses proposed by historical linguists, and the final one consists of four humanities and social science projects, including lexicography.
We will identify and eliminate linguistic barriers caused by language change to open up our textual accounts of the world to researchers from a wide range of fields: sociology, cultural studies, history, literature, journalism, and religion. We will also apply our change detection methodology directly to answer humanities and social science (HSS) research questions in our application projects.
Historical Linguistics: There are many open questions in the burgeoning field of quantitative semantics that we cannot currently answer with existing computational methods: How do lexical change and semantic change interact? Why do different parts of the vocabulary change at different speeds? How does change spread throughout the lexicon? High-quality case studies of change often produce hypotheses, and we will provide tools to test and quantify these hypotheses using large-scale methods developed within this program. Our corpus-based studies will feed insights into our models, thus improving both the modeling and the theory of meaning, senses, and language change.
Lexicography: Using computational methods, we will advance our understanding of the semantic structures underlying textual data. We will integrate recent advances in computational linguistics into the lexicographic process, transforming it from manual lexicography into a semi-automatic, empirically based workflow. This work will be done together with the lexicographic group at the University of Gothenburg that develops the dictionary Svensk ordbok utgiven av Svenska Akademien (“The Contemporary Dictionary of the Swedish Academy”), and will directly improve their workflow.
Advancing Natural Language Processing and Machine Learning: We will extend the state-of-the-art in lexical semantic change with respect to both theoretical and methodological aspects. In addition, we will adapt our methods to be applicable to the needs of HSS. A large focus will be on synchronic variation as a complement to diachronic change, stemming from our work on sense-aware models, and in part on the comparison across contemporary corpora. We will advance the state-of-the-art in NLP in several ways:
We will have four HSS projects within the program. Each collaboration partner brings research questions and data, and we collaborate on methods that incorporate expert knowledge and provide answers. These application projects give us a chance to conduct high-quality research that benefits both parties and to leave behind tools and methodology that are useful beyond the scope of the program.
Firstly, we believe that bringing NLP experts into close collaboration with HSS experts in the field of semantic change will lead to research results achievable only through collaboration across fields. Secondly, method development is radically improved by close collaboration with the fields in which the methods are needed. Thirdly, our research results and methodology are disseminated to relevant fields and serve to set the state of the art in terms of research results and, perhaps more importantly, to further research methodology.
Our HSS projects investigate (1) radicalization of groups (focused on synchronic variation because of its rapid speed), (2) cultural differences over time (sense-aware diachronic change), (3) how rights, acknowledgement, and justice have changed over time in media, legislation, and politics (sense-aware diachronic change), and (4) how the phone, the steamer, and electricity changed society, and how this was reflected in literature (diachronic change and synchronic variation).
The development of methods in NLP is essential to reach our main goal: to integrate our research with, and contribute to, state-of-the-art HSS research. The application projects are at least as important as the theoretical and methodological development, and they play a crucial part in driving the methodological development. All parties bring cutting-edge research questions that we can answer under the umbrella of this program.