Event

Last Fridays Talks: Speech and Language

Location

Seminar Room, Pioneer Centre for AI, Østre Voldgade 3, 1350 København K

Date

25 Oct 2024

14:00 - 15:00

Type

Talk

Organizer

The Collaboratory of Speech and Language

Last Fridays Talks

On the last Friday of each month, we host the Last Fridays Talks, where one of our seven Collaboratories presents insights from its current work. Join us for a discussion of results and recent papers, followed by socializing for everyone who wishes to attend.

Talk 1

Lexicographic data in Wikidata

Abstract

Wikidata has a lexicographic part that currently describes over 1.3 million lexemes across close to 1,300 languages. It records both lexical forms and senses, and links to the rest of Wikidata as well as to Wikimedia Commons for image and audio media files. In this talk I will present work on Wikidata lexemes and tools that I have developed to aggregate and present information from the wiki using live SPARQL queries. The tools include lexeme linking and simple games.

Speaker

Finn Årup Nielsen

Bio

Finn Årup Nielsen is an Associate Professor at DTU Compute, Technical University of Denmark (DTU). He holds a PhD from DTU, where he worked with neuroinformatics, and did a postdoc at the Neurobiology Research Unit, Rigshospitalet. His areas of research are knowledge graphs and natural language processing.

Talk 2

AI ‘News’ Content Farms Are Easy to Make and Hard to Detect

Abstract

Large Language Models (LLMs) are increasingly used as “content farm” models (CFMs) to generate synthetic text that could pass for real news articles. This is already happening even for languages that do not have high-quality monolingual LLMs. I present the results of a case study in Italian, showing that it is possible to produce news-like texts that native speakers of Italian struggle to identify as synthetic, with only 40K Italian news texts from a public dataset and the first-generation Llama model. At the same time, detecting such texts in the wild is nearly impossible with methods that rely on either token likelihood information or supervised classification. This talk highlights the need for more research on the problem of synthetic text detection and on the ongoing changes in the information ecosphere of the open web.

Speaker

Anna Rogers

Bio

Anna Rogers is an Associate Professor in the Computer Science Department at the IT University of Copenhagen. She holds a PhD in Computational Linguistics from the University of Tokyo, which was followed by postdocs in Machine Learning for Natural Language Processing (University of Massachusetts) and in social data science (University of Copenhagen). Her main research area is the analysis and evaluation of pre-trained language models. She currently serves as an editor-in-chief of ACL Rolling Review, the peer review platform for all major NLP conferences of the Association for Computational Linguistics.

Collaboratories

SL
Speech and Language
Led by Isabelle Augenstein and Christian Hardmeier
