There is no doubt that libraries and museums are of incalculable value in safeguarding the world's historical finds, among them texts and manuscripts that have only been photographed so that they can be enjoyed and exhibited, either on site or on web portals. Even greater, however, is the number of documents that have never been read or interpreted. Now a new technology is being developed to help unlock the history they contain.
Millions of these ancient texts are stored all over the world, including in monasteries. Such is the scale that the Library of the Abbey of St. Gallen, in Switzerland, alone houses about 160,000 volumes of literary and historical manuscripts dating back to the eighth century, all handwritten, on parchment, in languages rarely spoken today.
Today, we are one step closer to interpreting these ancient documents and learning their secrets, thanks to researchers at the University of Notre Dame. They have presented a system, still under development, that is based on an artificial neural network and is capable of reading complex ancient handwriting, relying on human perception to improve the transcription capabilities of deep learning.
One of those responsible for the project is Walter Scheirer, the Dennis O. Doty Collegiate Associate Professor in the Department of Computer Science and Engineering at Notre Dame. "We are dealing with historical documents written in styles that have long gone out of fashion, going back many centuries, and in languages such as Latin, which are hardly used anymore," he said, which is why this system goes beyond digitizing ancient manuscripts. "We can get beautiful pictures of these materials, but what we have proposed is to automate the transcription in a way that mimics the perception of the page through the eyes of the expert reader and provides a quick and searchable reading of the text."
The artificial intelligence system combines traditional machine learning methods with visual psychophysics, which measures the connections between physical stimuli and mental phenomena, such as the time it takes an expert reader to recognize a particular character, gauge the quality of handwriting or identify the use of certain abbreviations, Scheirer explained in the Institute of Electrical and Electronics Engineers' journal Transactions on Pattern Analysis and Machine Intelligence.
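To make the idea of feeding reading times into a learning system more concrete, here is a minimal sketch in Python. It is not the Notre Dame team's code. It assumes, purely for illustration, that each training sample comes with a reader's reaction time and that mistakes on samples humans read quickly should count for more. The function names, the weight range and the direction of the weighting are all hypothetical.

```python
import math


def reaction_time_weights(reaction_times, floor=0.5, ceiling=1.5):
    """Map reaction times (in seconds) to loss weights in [floor, ceiling].

    Assumption for this sketch: the sample that was read fastest gets the
    largest weight and the slowest gets the smallest. The opposite choice
    would be just as easy to encode.
    """
    fastest, slowest = min(reaction_times), max(reaction_times)
    if slowest == fastest:
        # No variation in reading times: treat every sample equally.
        return [1.0 for _ in reaction_times]
    span = slowest - fastest
    return [ceiling - (ceiling - floor) * (rt - fastest) / span
            for rt in reaction_times]


def weighted_cross_entropy(probs, targets, weights):
    """Weighted mean of -log p(correct character) over all samples."""
    total = 0.0
    for p, t, w in zip(probs, targets, weights):
        # Clamp the probability to avoid log(0).
        total += -w * math.log(max(p[t], 1e-12))
    return total / sum(weights)


# Hypothetical data: two character images, two possible characters each.
probs = [[0.9, 0.1], [0.4, 0.6]]   # model's predicted probabilities
targets = [0, 0]                   # index of the correct character
times = [0.4, 1.2]                 # seconds the reader needed

weights = reaction_time_weights(times)
print(weights, weighted_cross_entropy(probs, targets, weights))
```

In this toy setup the second sample, which the reader needed longer to recognize, contributes less to the training loss than the first. A real system would apply the same kind of weighting inside a deep learning framework's loss function.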
Scheirer's team relied on digitized Latin manuscripts written in the Cloister of St. Gallen in the 9th century. Expert readers entered their manual transcriptions into a specially designed software interface, and the team measured reaction times during transcription to learn which words, characters and passages were easy or difficult. Scheirer explained that including this kind of data produced a network more consistent with human behavior and reduced errors, providing a more accurate and realistic reading of the text.
"It's a strategy not typically used in machine learning," Scheirer explained. "We're labeling the data through these psychophysical measurements, which come directly from psychological studies of perception, taking behavioral measurements. We then inform the network of common difficulties in the perception of these characters and can make corrections based on those measurements."
"There's a difference between just taking the pictures and reading them, and having a program that provides a searchable reading," said Hildegund Müller, associate professor in Notre Dame's Department of Classics. These are texts that have so far not been made available to the public.
The team is still overcoming challenges, Scheirer said. The system is constantly being improved to increase the accuracy of its transcriptions, particularly when a document is damaged or incomplete, or when it is accompanied by images that can confuse the system.
It is a race against time. Unfortunately, languages disappear every day, but the team is working non-stop to incorporate translations into the culture of our times.
September 7, 2021