Data involving angles can be found across a diverse array of scientific fields, but so far, the mathematical tools used to study them have often proved insufficient to detect the complex relationships between different angles within large datasets. A team consisting of Professor Christophe Ley and Sophia Loizidou from the University of Luxembourg, Professor Shogo Kato from the Institute of Statistical Mathematics in Tokyo, and Professor Kanti Mardia from the University of Leeds has developed a new model which overcomes many of these challenges, allowing researchers to study relationships between three angles at once, as well as mixtures of angles and classical measurements on the line.
From biochemistry to climate science, many different fields of research work with data involving angles, and the relationships between those angles. For example, while a biochemist may need to consider how the deeply complex, 3D shapes of protein molecules are affected by atomic-scale bond angles, climate scientists must constantly keep track of the interacting directions of wind and waves across the Earth’s oceans.
This type of ‘angular data’ behaves differently from regular data. When measuring angles, it is crucial for researchers to consider how they wrap around – meaning any two points separated by an angle of 360° are essentially the same point.
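To make the wrap-around concrete, here is a minimal Python sketch. The function names and the two example directions are illustrative assumptions, not taken from the team's work. It shows why an ordinary average can mislead for angles: two directions just either side of north average to a direction pointing almost due south, whereas averaging them as points on a circle gives the sensible answer.

```python
import math

def wrap_degrees(angle):
    """Fold any angle in degrees back onto a single turn (370 becomes 10, -90 becomes 270)."""
    return angle % 360.0

def circular_mean_degrees(angles):
    """Average angles by summing their unit vectors and reading off the resulting direction."""
    s = sum(math.sin(math.radians(a)) for a in angles)
    c = sum(math.cos(math.radians(a)) for a in angles)
    return wrap_degrees(math.degrees(math.atan2(s, c)))

# Two hypothetical compass directions, 30 degrees apart across the 0/360 boundary.
directions = [350.0, 20.0]

naive_mean = sum(directions) / len(directions)   # 185.0, pointing the wrong way
circ_mean = circular_mean_degrees(directions)    # approximately 5, halfway between the two
```

The same idea, treating each angle as a point on a circle rather than a number on a line, underlies the more sophisticated models described here.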
Compared with more traditional statistics, the circular nature of this type of data adds a layer of complexity to scientific analysis. The complexity deepens when considering how some measurements involve a mixture of angles and straight-line measurements, leading to what researchers call ‘cylindrical data’. More challenging still are situations where scientists need to track relationships between three angles, all at once.
While mathematical techniques now exist to handle data involving one or two angles, three-angle datasets have proven to be more difficult to model accurately. So far, this has meant that existing models are challenging to apply to real angular data, and offer results that are difficult for researchers to interpret.
Through their research, Ley and colleagues have developed a new statistical model which could make it easier to study three-angle data. Named the ‘trivariate wrapped Cauchy copula’, the team’s model offers researchers a useful new way to analyse and predict complex, multi-angle data with greater accuracy and flexibility. Their approach is simple yet powerful, offering features that solve many of the common issues faced by researchers when dealing with angular datasets.
The team’s model is built on a mathematical technique known as a ‘copula’. In basic terms, a copula is a way to understand relationships between different variables – in this case, three different angles. By designing a model specifically for three-angle data, the researchers addressed a critical gap in existing methods, which can mostly handle one or two angles.
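The article does not give the formula for the team's three-angle model, so the Python sketch below illustrates the copula idea with a much simpler two-angle toy instead. Everything in it is an assumption made for illustration: the first angle is uniform on the circle, and the second is the first plus a random offset drawn from a wrapped Cauchy distribution, the classical circular distribution that gives the model its name. Viewed alone, each angle is uniformly spread around the circle, so all of the structure lies in how the two move together, which is what a copula describes.

```python
import math
import random

TWO_PI = 2.0 * math.pi

def sample_wrapped_cauchy(rho, rng):
    """Draw one angle from a wrapped Cauchy distribution centred at 0.

    rho in (0, 1) is the concentration: near 0 is almost uniform, near 1 is tightly peaked.
    """
    scale = -math.log(rho)  # Cauchy scale whose wrapping has concentration rho
    x = scale * math.tan(math.pi * (rng.random() - 0.5))
    return x % TWO_PI

def sample_toy_pair(rho, rng):
    """Two dependent angles, each uniform on the circle when viewed alone (toy example)."""
    theta1 = rng.uniform(0.0, TWO_PI)
    theta2 = (theta1 + sample_wrapped_cauchy(rho, rng)) % TWO_PI
    return theta1, theta2

def mean_resultant_length(angles):
    """Near 0 for angles spread evenly around the circle, 1 for identical angles."""
    n = len(angles)
    c = sum(math.cos(a) for a in angles) / n
    s = sum(math.sin(a) for a in angles) / n
    return math.hypot(c, s)

rng = random.Random(0)
pairs = [sample_toy_pair(0.8, rng) for _ in range(20000)]

# Alone, the second angle shows no preferred direction...
spread_theta2 = mean_resultant_length([t2 for _, t2 in pairs])
# ...but the difference between the two angles is tightly concentrated.
spread_diff = mean_resultant_length([t2 - t1 for t1, t2 in pairs])
```

In this toy, `spread_theta2` should come out close to 0 and `spread_diff` close to the chosen concentration of 0.8. The team's model extends this kind of thinking to three angles at once.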
One of the biggest advantages of the model is its flexibility: it allows researchers to adjust the degree of connection between each pair of angles. With this level of control, the researchers were able to show how intricate relationships can be detected far more easily within complex angular datasets.
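One way to picture 'the degree of connection between each pair of angles' is to measure it directly. The Python sketch below computes a standard descriptive statistic, the Jammalamadaka-SenGupta circular correlation coefficient, for every pair among three angle series. This is not the team's model, and the series names and values are made up for illustration; it simply shows what a pairwise summary of three angles looks like.

```python
import math
from itertools import combinations

def circular_mean(angles):
    """Mean direction of a list of angles in radians."""
    s = sum(math.sin(a) for a in angles)
    c = sum(math.cos(a) for a in angles)
    return math.atan2(s, c)

def circular_correlation(a, b):
    """Jammalamadaka-SenGupta circular correlation between two angle series (radians)."""
    a_bar, b_bar = circular_mean(a), circular_mean(b)
    sa = [math.sin(x - a_bar) for x in a]
    sb = [math.sin(y - b_bar) for y in b]
    num = sum(x * y for x, y in zip(sa, sb))
    den = math.sqrt(sum(x * x for x in sa) * sum(y * y for y in sb))
    return num / den

# Hypothetical toy data: three angle series recorded on the same four observations.
angles = {
    "first": [0.1, 0.5, 1.0, 1.4],
    "second": [0.2, 0.6, 1.1, 1.5],     # "first" shifted by a constant
    "third": [-0.1, -0.5, -1.0, -1.4],  # "first" mirrored
}

# One coefficient per pair of angles, three pairs in total.
pairwise = {
    (p, q): circular_correlation(angles[p], angles[q])
    for p, q in combinations(angles, 2)
}
```

Here the shifted series is perfectly positively associated with the original and the mirrored one perfectly negatively associated. A model that lets each of these three pairwise links be tuned separately can represent datasets in which some angles are tightly coupled and others barely related.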
To test their model, Ley and his colleagues applied it to two specific datasets, both containing extensive quantities of real angular data. The first dataset consisted of sequences of dihedral and torsion angles of proteins. The building blocks of proteins are amino acids, which consist of a backbone and sidechains. The backbone consists of chemical bonds which can rotate around their axes, and the angles formed by these rotations are referred to as the dihedral angles.
An important part of the protein structure prediction problem consists of understanding the behaviour of the dihedral angles. The goal of this analysis is to complement the work of AlphaFold, the artificial intelligence-based software for protein structure prediction whose developers received the 2024 Nobel Prize in Chemistry.
For their next test, the researchers used their model to analyse an extensive series of measurements of ocean waves. Recorded by a buoy off the coast of Italy, these half-hourly measurements included both the directions and heights of passing waves – a clear example of a cylindrical dataset.
But to fully model the impacts waves can have on their surrounding environments, it is often crucial for researchers to understand how they are affected by the local wind direction. This adds a new angle to the equation, creating a ‘hyper-cylinder’ which can’t be easily accounted for in existing models.
Again, the team’s approach could account for these interacting angles, and model them accurately. For environmental and climate scientists, this level of insight could help to deepen their understanding of many different environmental factors: including the drift of oil spills, coastal erosion, and the design of offshore structures.
In the deeply complex world of data analysis, models are only useful if they are both accurate and easy to use. Based on their results, the researchers are now confident that their trivariate wrapped Cauchy copula model strikes just the right balance between these two qualities: offering a user-friendly approach to understanding complicated angular data.
Their work so far has only scratched the surface of the possibilities their model presents. Complex angular data can be found across numerous other areas – as wide-ranging as the movements of animals, the circadian rhythms of the body, and the times of day when crimes are most likely to be committed.
Kato, Ley, Loizidou and Mardia are now hopeful that their model could soon see widespread application across all of these areas and more. If the results of these studies are as successful as the tests they have made so far, they could ultimately help researchers to spot patterns within their angular datasets which would have once been invisible.