Words change their meanings over time, but tracking these changes has traditionally required painstaking manual analysis by linguists. In recent years, researchers have been using computational models to automatically detect when semantic change happens, and how much of a change has occurred. Recent research led by Associate Professor Nina Tahmasebi and her colleagues in the Change is Key! program introduces innovative computational methods for detecting qualitative features of semantic change, opening new possibilities for understanding language evolution at scale.
Language is constantly evolving. Words can acquire new meanings, lose old ones, or shift their meaning entirely over time. For instance, the word ‘nice’ originally meant ‘foolish’ but has completely lost this meaning in modern English. Similarly, the word ‘mouse’ has expanded from referring to a small rodent to also describing a computer input device. These changes in word meanings, known as semantic change, reflect the dynamic nature of language as it adapts to new developments.
Traditionally, tracking these semantic changes required extensive manual work by linguists, who would carefully examine texts to document how word meanings evolved. This approach, while thorough, was limited by the amount of text a person could reasonably analyse and was often restricted to studying a small number of words or specific time periods.
Since 2008–2009, researchers have been developing computational methods to automatically detect semantic change by analysing large collections of digital texts, known as corpora. These methods leverage advanced artificial intelligence techniques to process vast historical corpora, allowing researchers to track meaning changes across thousands of words simultaneously.
A significant challenge in this computational approach has been finding effective ways to represent and compare word meanings across different time periods. Early methods relied on analysing how words appeared together in texts, assuming that words with similar meanings would appear in similar contexts. For example, they might notice that ‘gay’ appeared near words such as ‘happy’ and ‘cheerful’ in older texts, but near different words in modern texts. While these approaches showed promise, their effectiveness was limited: raw co-occurrence counts are sparse and noisy, making reliable comparisons across time periods difficult.
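To make the count-based idea concrete, here is a minimal sketch in Python, with toy one-sentence corpora standing in for real historical texts. It illustrates the general technique – comparing a word’s co-occurrence profiles across periods – rather than the exact method of any particular study:

```python
from collections import Counter
import math

def context_counts(corpus, target, window=2):
    """Count words appearing within `window` positions of `target`."""
    counts = Counter()
    for sentence in corpus:
        tokens = sentence.lower().split()
        for i, tok in enumerate(tokens):
            if tok == target:
                lo, hi = max(0, i - window), min(len(tokens), i + window + 1)
                counts.update(t for j, t in enumerate(tokens[lo:hi], lo) if j != i)
    return counts

def cosine(a, b):
    """Cosine similarity between two sparse count vectors."""
    shared = set(a) & set(b)
    dot = sum(a[w] * b[w] for w in shared)
    norm = math.sqrt(sum(v * v for v in a.values())) * math.sqrt(sum(v * v for v in b.values()))
    return dot / norm if norm else 0.0

# Toy corpora standing in for texts from two periods (illustrative only)
corpus_1900 = ["the gay and cheerful crowd sang happy songs"]
corpus_2000 = ["the gay rights movement campaigned for equal marriage"]

old = context_counts(corpus_1900, "gay")
new = context_counts(corpus_2000, "gay")
print(f"context similarity across periods: {cosine(old, new):.2f}")
```

With realistic corpora the counts become enormous, which is precisely where the sparsity and noise problems of this approach appear.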
As modern text-based AI methods were developed, words were instead represented by vectors created using static embedding models. Despite being more effective, the majority of these methods were tested only on their ability to say whether a word had changed or not, conflating all kinds of change into one. They struggled to capture the nuanced ways in which words can change their meanings over time, missed subtle meaning changes, and couldn’t distinguish between different types of semantic change.
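A common recipe in this family of methods – sketched below under stated assumptions, with tiny repeated placeholder corpora rather than real data – is to train a static embedding model (here gensim’s Word2Vec) on each period separately, align the two spaces with orthogonal Procrustes over the shared vocabulary, and then compare the target word’s vectors:

```python
import numpy as np
from gensim.models import Word2Vec
from scipy.linalg import orthogonal_procrustes

# Tiny placeholder corpora; real studies use millions of sentences per period.
sents_old = [["the", "gay", "crowd", "was", "happy", "and", "cheerful"]] * 100
sents_new = [["the", "gay", "rights", "movement", "was", "growing"]] * 100

m_old = Word2Vec(sents_old, vector_size=50, min_count=1, seed=0)
m_new = Word2Vec(sents_new, vector_size=50, min_count=1, seed=0)

# Independently trained spaces are arbitrarily rotated, so align them on the
# shared vocabulary with orthogonal Procrustes before comparing vectors.
shared = sorted(set(m_old.wv.index_to_key) & set(m_new.wv.index_to_key))
A = np.vstack([m_old.wv[w] for w in shared])
B = np.vstack([m_new.wv[w] for w in shared])
R, _ = orthogonal_procrustes(A, B)

v_old = m_old.wv["gay"] @ R                      # map old vector into new space
v_new = m_new.wv["gay"]
cos = v_old @ v_new / (np.linalg.norm(v_old) * np.linalg.norm(v_new))
print(f"cross-period similarity for 'gay': {cos:.2f}")   # low values suggest change
```

Note that this yields a single number per word: it can flag that something changed, but not what kind of change it was – exactly the limitation described above.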
Since 2018 and the introduction of contextual models such as BERT and RoBERTa, researchers have been able to model fine-grained word meanings much more effectively. And with the introduction of large language models – sophisticated AI systems trained on vast amounts of text data that can also generate text – the field has taken yet a further step.
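The sketch below shows the general contextual-embedding idea – not the researchers’ specific pipeline – using an off-the-shelf BERT model from the Hugging Face transformers library. Unlike a static model, it produces a different vector for each use of a word, so two senses of ‘mouse’ can be told apart:

```python
import torch
from transformers import AutoTokenizer, AutoModel

tok = AutoTokenizer.from_pretrained("bert-base-uncased")
model = AutoModel.from_pretrained("bert-base-uncased")

def word_vector(sentence, word):
    """Average BERT's final-layer vectors for the subword pieces of `word`."""
    enc = tok(sentence, return_tensors="pt")
    with torch.no_grad():
        hidden = model(**enc).last_hidden_state[0]
    ids = tok(word, add_special_tokens=False)["input_ids"]
    seq = enc["input_ids"][0].tolist()
    # Locate the word's subword span in the encoded sentence (first match).
    for i in range(len(seq) - len(ids) + 1):
        if seq[i:i + len(ids)] == ids:
            return hidden[i:i + len(ids)].mean(dim=0)
    raise ValueError(f"{word!r} not found in sentence")

v1 = word_vector("a mouse ran across the kitchen floor", "mouse")
v2 = word_vector("click the left mouse button to select", "mouse")
print(torch.cosine_similarity(v1, v2, dim=0).item())  # lower than for same-sense uses
```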
Recent work by Professor Tahmasebi and her colleagues in the Change is Key! research program builds on an innovative solution: using state-of-the-art artificial intelligence to generate dictionary-like definitions for words based on how they are used in context. This approach bridges the gap between traditional lexicography (or dictionary writing) and modern computational methods, combining the interpretability of traditional dictionary definitions with the large-scale analysis of artificial intelligence. Their system can process millions of word uses across different time periods, generating clear, human-readable definitions for each instance.
The researchers achieved this by fine-tuning a large language model. The resulting model, which they named LlamaDictionary, can look at a word in a sentence and generate a precise, dictionary-style definition that captures its meaning in that specific context. For example, when given the sentence “This food revitalised the patient,” the system can generate the definition “give new life or energy to” for the word “revitalise.” When shown “The computer virus infected the system,” it correctly defines ‘virus’ as a harmful computer program, rather than a biological pathogen.
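A rough sketch of how such a definition generator might be called is given below. The checkpoint path and prompt format are illustrative assumptions, not the released LlamaDictionary artefacts:

```python
from transformers import AutoModelForCausalLM, AutoTokenizer

# Hypothetical checkpoint path; the actual LlamaDictionary release may differ.
CHECKPOINT = "path/to/llama-dictionary"

tok = AutoTokenizer.from_pretrained(CHECKPOINT)
model = AutoModelForCausalLM.from_pretrained(CHECKPOINT)

def define_in_context(word, sentence):
    """Ask the fine-tuned model for a context-specific definition (assumed prompt format)."""
    prompt = (f'Sentence: "{sentence}"\n'
              f'Give a dictionary-style definition of "{word}" as used above.\n'
              f'Definition:')
    inputs = tok(prompt, return_tensors="pt")
    out = model.generate(**inputs, max_new_tokens=40, do_sample=False)
    # Decode only the newly generated tokens after the prompt.
    return tok.decode(out[0][inputs["input_ids"].shape[1]:], skip_special_tokens=True).strip()

print(define_in_context("virus", "The computer virus infected the system."))
# expected style of output: "a harmful program that spreads between computers"
```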
What makes this new approach particularly powerful is its ability to generate definitions that capture the specific sense in which a word is being used, rather than just providing all possible meanings. This is crucial for studying semantic change, as words often maintain multiple meanings simultaneously, with different senses becoming more or less prominent over time. The researchers demonstrated this capability through extensive testing, showing that their system could correctly identify and define different senses of words such as ‘bank’ (financial institution vs. river edge) with remarkable accuracy.
The researchers demonstrated two distinct ways to use generated definitions to study semantic change. The first approach directly compares definitions across time periods using embedding models – AI systems that can measure how similar two pieces of text are to each other. By converting definitions into mathematical representations called embeddings, the system can automatically measure how similar a word’s definitions are between different time periods. If the definitions become very different, this suggests semantic change has occurred. For instance, the system can track how the word ‘web’ evolved from primarily meaning a spider’s creation to also referring to the internet.
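A minimal sketch of this comparison step, using an off-the-shelf sentence encoder and example definitions (illustrative, not the study’s actual outputs):

```python
from sentence_transformers import SentenceTransformer, util

encoder = SentenceTransformer("all-MiniLM-L6-v2")  # general-purpose sentence encoder

# Example definitions generated for 'web' in two periods (illustrative only)
def_old = "a structure of fine threads spun by a spider"
def_new = "the worldwide system of linked documents accessed over the internet"

e_old = encoder.encode(def_old, convert_to_tensor=True)
e_new = encoder.encode(def_new, convert_to_tensor=True)

# Low similarity between a word's definitions across periods signals change.
print(f"definition similarity: {util.cos_sim(e_old, e_new).item():.2f}")
```

Because the things being compared are human-readable definitions rather than opaque vectors, a researcher can always inspect why two periods were judged similar or different.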
The researchers then used the generated definitions to determine the type of semantic change as well, classifying changes according to established linguistic categories. These categories, first outlined by linguist Andreas Blank in 1997, include several distinct types of meaning change. “Generalisation” occurs when a word’s meaning becomes broader over time – for instance, the word “paper” originally meant specifically the papyrus plant but now refers to any material used for writing. “Specialisation” is the opposite process, where meaning becomes more specific – like how “meat” once meant any kind of food but now refers specifically to animal flesh.
Another category is “co-hyponymous transfer,” where a word takes on the meaning of a related concept – for example, in some languages, the word for “rat” has come to also mean “mouse.” Perhaps most intriguingly, some words develop opposite meanings over time, a phenomenon known as “auto-antonymy.” The English word “fast” demonstrates this – it can mean both “moving quickly” and “fixed firmly in place.”
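One way to operationalise these categories – an illustrative heuristic, not necessarily the authors’ classifier – is to test entailment between the old and new definitions with an off-the-shelf natural language inference model. A narrower definition entails a broader one but not vice versa, so the direction of entailment hints at the change type:

```python
from transformers import pipeline

# An off-the-shelf NLI model; the study's actual classifier may differ.
nli = pipeline("text-classification", model="roberta-large-mnli")

def entails(premise, hypothesis):
    """True if the NLI model judges that `premise` entails `hypothesis`."""
    out = nli([{"text": premise, "text_pair": hypothesis}])[0]
    return out["label"] == "ENTAILMENT"

def change_type(old_def, new_def):
    """Heuristic: a narrower definition entails a broader one, not vice versa."""
    if entails(old_def, new_def) and not entails(new_def, old_def):
        return "generalisation"   # old (narrower) sense implies new (broader) sense
    if entails(new_def, old_def) and not entails(old_def, new_def):
        return "specialisation"   # new (narrower) sense implies old (broader) sense
    return "other (e.g. transfer or auto-antonymy)"

# For 'paper', a broadening of meaning – ideally classified as generalisation.
print(change_type("material made from the papyrus plant",
                  "any thin material used for writing or printing"))
```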
The researchers’ computational system can automatically classify these different types of semantic change by analysing the relationships between generated definitions. In their experiments, they tested the system on words with known historical changes. This represents a significant advance over previous computational methods, which could typically only detect that a change had occurred without identifying its specific type. For example, when given the word “arrive” in texts from different time periods, the system not only detected the change in meaning but correctly classified it as an instance of generalisation.
Also innovative is the system’s use of “synchronic” relationships – relationships between word senses that exist at the same time – to understand “diachronic” change, or how words change over time. The system learns to recognise different types of meaning relationships by studying modern dictionary definitions, then applies this knowledge to classify historical meaning changes. For instance, by understanding how modern words with broader and narrower meanings relate to each other (such as “animal” and “dog”), the system can identify similar patterns in historical data.
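The kind of synchronic resource involved can be illustrated with WordNet, a modern lexical database that records broader and narrower sense relations (whether the researchers trained on WordNet specifically is an assumption here):

```python
import nltk
from nltk.corpus import wordnet as wn

nltk.download("wordnet", quiet=True)  # fetch the lexicon on first run

# Synchronic broader/narrower relations recorded in a modern lexicon – the
# kind of pattern a system can learn before classifying diachronic changes.
dog = wn.synset("dog.n.01")
print([s.name() for s in dog.hypernyms()])     # broader terms, e.g. canine.n.02
print([s.name() for s in dog.hyponyms()][:5])  # narrower terms, e.g. puppy.n.01
```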
A particularly impressive aspect of this work is the ability to explain the models’ decisions. Unlike previous “black box” approaches that couldn’t clarify why they classified a change in a certain way, these systems provide clear definitions and explanations for each classification. When identifying a case of specialisation, for instance, the system can show how the definitions became more specific over time, making the reasoning process transparent to researchers studying meaning change.
This research opens new possibilities for studying language change at unprecedented scales. These systems have the potential to enable analysis of thousands of words across multiple languages and time periods, identifying patterns of change that might be invisible to human researchers.
This groundbreaking research represents a significant step forward in our ability to understand how language evolves over time. Professor Tahmasebi and her colleagues have created powerful systems that can detect semantic change automatically and explain the ways in which words change their meanings. As our language continues to evolve rapidly, tools like these will become increasingly valuable for tracking and understanding these transformations.