Programme and Accepted Papers

The 9th edition of the Slavic NLP Workshop

Registering for Slavic NLP 2023

Invited Talk

Presenters: Nikola Ljubešić, Tanja Samardžić

Title: Together we are stronger: Collaborative development of language resources and technologies for South Slavic languages

Abstract: Developing language resources and technologies for South Slavic languages with a small number of speakers and suboptimal socio-economic conditions is a formidable challenge. In this talk, we will share lessons learned during more than a decade of efforts to enhance Croatian and Serbian language resources and technologies. Our success stems from four key factors: 1) prioritising a bottom-up approach over relying on top-down institutional support, 2) using the Web as the primary source of linguistic data, 3) fostering collaboration among researchers working on different South Slavic languages, and 4) keeping abreast of technological advancements that benefit our under-resourced development scenario. To conclude, we will discuss future prospects given the latest technological breakthroughs in language modeling.

The programme is also available in PDF format.

Time Schedule

9:00 - 9:10 Introduction
9:10 - 10:30 Regular papers I
9:10 - 9:30 - Resources and Few-shot Learners for In-context Learning in Slavic Languages
Michal Štefánik, Marek Kadlčík, Piotr Gramacki and Petr Sojka
9:30 - 9:50 - Information Extraction from Polish Radiology Reports Using Language Models
Aleksander Obuchowski, Barbara Klaudel and Patryk Jasik
9:50 - 10:10 - Dispersing the clouds of doubt: can cosine similarity of word embeddings help identify relation-level metaphors in Slovene?
Mojca Brglez
10:10 - 10:30 - Named Entity Recognition for Low-Resource Languages - Profiting from Language Families
Sunna Torge, Andrei Politov, Christoph Lehmann, Bochra Saffar and Ziyan Tao
10:30 - 11:15 Coffee break
11:15 - 12:45 Regular papers II
11:15 - 11:35 - Too Many Cooks Spoil the Model: Are Bilingual Models for Slovene Better than a Large Multilingual Model?
Pranaydeep Singh, Aaron Maladry and Els Lefever
11:35 - 11:55 - On Experiments of Detecting Persuasion Techniques in Polish and Russian Online News: Preliminary Study [Online]
Nikolaos Nikolaidis, Nicolas Stefanovitch and Jakub Piskorski
12:05 - 12:25 - Can BERT eat RuCoLA? Topological Data Analysis to Explain [Online]
Irina Proskurina, Ekaterina Artemova and Irina Piontkovskaya
12:25 - 12:45 - Automatic text simplification of Russian texts using control tokens
Anna Dmitrieva
12:45 - 14:15 Lunch break
14:15 - 15:15 Invited Talk
Together we are stronger: Collaborative development of language resources and technologies for South Slavic languages
Nikola Ljubešić, Tanja Samardžić
15:15 - 15:20 Shared Task overview
Slav-NER: the 4th Cross-lingual Challenge on Recognition, Normalization, Classification, and Linking of Named Entities across Slavic languages
Roman Yangarber, Jakub Piskorski, Anna Dmitrieva, Michał Marcińczuk, Pavel Přibáň, Piotr Rybak and Josef Steinberger
15:20 - 15:45 Pitch presentations: short papers and shared task papers
MAUPQA: Massive Automatically-created Polish Question Answering Dataset [Online]
Piotr Rybak
TrelBERT: A pre-trained encoder for Polish Twitter [Online]
Wojciech Szmyd, Alicja Kotyla, Michał Zobniów, Piotr Falkiewicz, Jakub Bartczuk and Artur Zygadło
Croatian Film Review Dataset (Cro-FiReDa): A Sentiment Annotated Dataset of Film Reviews
Gaurish Thakkar, Nives Mikelic Preradovic and Marko Tadić
Machine-translated texts from English to Polish show a potential for typological explanations in Source Language Identification
Damiaan Reijnaers and Elize Herrewijnen
Target Two Birds With One SToNe: Entity-Level Sentiment and Tone Analysis in Croatian News Headlines
Ana Barić, Laura Majer, David Dukić, Marijana Grbeša-Zenzerović and Jan Snajder
Is German secretly a Slavic language? What BERT probing can tell us about language groups
Aleksandra Mysiak and Jacek Cyranka
Analysis of Transfer Learning for Named Entity Recognition in South-Slavic Languages
Nikola Ivačič, Thi Hong Hanh Tran, Boshko Koloski, Senja Pollak and Matthew Purver
WikiGoldSK: Annotated Dataset, Baselines and Few-Shot Learning Experiments for Slovak Named Entity Recognition
David Suba, Marek Suppa, Jozef Kubik, Endre Hamerlik and Martin Takac
Measuring Gender Bias in West Slavic Language Models
Sandra Martinková, Karolina Stanczak and Isabelle Augenstein
Exploring the Use of Foundation Models for Named Entity Recognition and Lemmatization Tasks in Slavic Languages
Gabriela Pałka and Artur Nowakowski
Large Language Models for Multilingual Slavic Named Entity Linking
Rinalds Vīksna, Inguna Skadiņa, Daiga Deksne and Roberts Rozis
15:45 - 16:30 Poster session open & Coffee break
16:30 - 17:00 Poster session - part II
17:00 - 18:00 Findings papers
17:00 - 17:20 - Going beyond research datasets: Novel intent discovery in the industry setting [Online]
Aleksandra Chrabrowa, Tsimur Hadeliya, Dariusz Kajtoch, Robert Mroczkowski and Piotr Rybak
17:20 - 17:40 - MLASK: Multimodal Summarization of Video-based News Articles
Mateusz Krubiński and Pavel Pecina
Regular papers III
17:40 - 18:00 - Comparing domain-specific and domain-general BERT variants for inferred real-world knowledge through rare grammatical features in Serbian
Sofia Lee and Jelke Bloem
18:00 End of the workshop