Event
Last Fridays Talks: Speech and Language
Organizer
Last Fridays Talks
On the last Friday of each month, we host the Last Fridays Talks, where one of our seven Collaboratories presents insights from its current work. Join us for a discussion of results and recent papers, followed by socializing for everyone who wishes to attend.
Talk 1
Lexicographic data in Wikidata
Abstract
Wikidata has a lexicographic part that currently describes over 1.3 million lexemes across close to 1,300 languages. It records both lexical forms and senses, and it links to the rest of Wikidata as well as to Wikimedia Commons for image and audio media files. In this talk I will present work on Wikidata lexemes and the tools I have developed to aggregate and present information from the wiki using live SPARQL queries. The tools include lexeme linking and simple games.
Speaker
Bio
Finn Årup Nielsen is an Associate Professor at DTU Compute, Technical University of Denmark (DTU). He holds a PhD from DTU in neuroinformatics and has done a postdoc at the Neurobiology Research Unit, Rigshospitalet. His research areas are knowledge graphs and natural language processing.
Talk 2
AI ‘News’ Content Farms Are Easy to Make and Hard to Detect
Abstract
Large Language Models (LLMs) are increasingly used as "content farm" models (CFMs) to generate synthetic text that could pass for real news articles. This is already happening even for languages that do not have high-quality monolingual LLMs. I present the results of a case study in Italian, showing that with only 40K Italian news texts from a public dataset and the first-generation Llama model, it is possible to produce news-like texts that native speakers of Italian struggle to identify as synthetic. At the same time, detecting such texts in the wild is nearly impossible, whether with methods that rely on token likelihood information or with supervised classification. This talk highlights the need for more research on synthetic text detection and on the ongoing changes in the information ecosphere of the open web.
Speaker
Bio
Anna Rogers is an Associate Professor in the Computer Science Department at the IT University of Copenhagen. She holds a PhD in Computational Linguistics from the University of Tokyo, followed by postdocs in machine learning for natural language processing (University of Massachusetts) and in social data science (University of Copenhagen). Her main research area is the analysis and evaluation of pre-trained language models. She currently serves as an editor-in-chief of ACL Rolling Review, the peer review platform for all major NLP conferences of the Association for Computational Linguistics.