Event

Last Fridays Talks: Speech and Language

Organizer

Last Fridays Talks 

On the last Friday of each month, we host the Last Fridays Talks, where one of our seven Collaboratories presents insights from its current work. Join us for a discussion of results and recent papers, followed by socializing for everyone who wishes to attend.

 

 

Talk 1

TBA

 

Speaker 

Finn Årup Nielsen

 

Talk 2

AI ‘News’ Content Farms Are Easy to Make and Hard to Detect

 

Abstract

Large Language Models (LLMs) are increasingly used as “content farm” models (CFMs) to generate synthetic text that could pass for real news articles. This is already happening even for languages that do not have high-quality monolingual LLMs. I present the results of a case study in Italian, showing that it is possible to produce news-like texts that native speakers of Italian struggle to identify as synthetic, using only 40K Italian news texts from a public dataset and the first-generation Llama model. At the same time, detecting such texts in the wild is nearly impossible with methods that rely on either token likelihood information or supervised classification. This talk highlights the need for more research on the problem of synthetic text detection and on the ongoing changes in the information ecosphere of the open web.

 

Speaker 

Anna Rogers

 

Bio

Anna Rogers is an Associate Professor in the Computer Science Department at the IT University of Copenhagen. She holds a PhD in Computational Linguistics from the University of Tokyo, followed by postdocs in Machine Learning for Natural Language Processing (University of Massachusetts) and in social data science (University of Copenhagen). Her main research area is the analysis and evaluation of pre-trained language models. She currently serves as an editor-in-chief of ACL Rolling Review, the peer review platform for all major NLP conferences of the Association for Computational Linguistics.