UNCE Workshop May 27, 2026

ABSTRACTS

Czech of native Chinese speakers through a multidimensional lens

Adrian Zasina

Czech and Chinese are genetically and typologically distant languages: Czech is an inflectional language, whereas Chinese is an isolating one. These typological differences pose challenges in the acquisition process for both Chinese learners of Czech and Czech learners of Chinese. For the former, mastering grammatical case and the rich morphological system of Czech is particularly demanding. As a result, acquisition among non-Slavic learners tends to progress more slowly at beginner levels, as evidenced by recent empirical research (Nogolová et al., 2023). In this context, it is essential to investigate the non-native Czech of Chinese learners in order to identify its most salient characteristics. Multidimensional Analysis (MDA) provides a methodological framework for this purpose. Originally developed for English and later successfully applied to Czech (Cvrček et al., 2018), MDA examines texts in terms of co-occurring linguistic features associated with particular communicative functions. This approach enables a more comprehensive analysis of learner language than studies focusing on isolated linguistic phenomena.

This study aims to describe the characteristics of written Czech produced by native Chinese speakers from mainland China, using the Czech model of multidimensional analysis. The learner data were projected onto an eight-dimensional space and interpreted based on median scores within individual dimensions of variation. To ensure comparability, they were also compared with texts written by native speakers on the same topic.
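As a rough illustration of interpreting texts by median scores per dimension, the sketch below computes a median profile for two groups of texts. The function name `median_profile` and all score values are made up for illustration (and use two dimensions rather than eight); they are not taken from the study or from the Czech MDA model.

```python
from statistics import median

def median_profile(texts):
    """Return the median score on each dimension.

    `texts` is a list of score vectors, one per text, each holding
    one value per dimension of variation (hypothetical layout).
    """
    n_dims = len(texts[0])
    return [median(t[d] for t in texts) for d in range(n_dims)]

# Toy, invented scores: three texts per group, two dimensions each.
learner_scores = [[0.2, -1.0], [0.4, -0.5], [0.9, -0.2]]
native_scores = [[1.0, 0.1], [1.2, 0.3], [0.8, 0.5]]

learner_profile = median_profile(learner_scores)
native_profile = median_profile(native_scores)
```

Comparing the two profiles dimension by dimension then shows where the groups diverge; the median is less sensitive than the mean to a few atypical texts.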

References

Cvrček, V., Komrsková, Z., Lukeš, D., Poukarová, P., Řehořková, A., & Zasina, A. J. (2018). From extra- to intratextual characteristics: Charting the space of variation in Czech through MDA. Corpus Linguistics and Linguistic Theory. https://doi.org/10.1515/cllt-2018-0020

Nogolová, M., Hanušková, M., Kubát, M., & Čech, R. (2023). Linear Dependency Segments in Foreign Language Acquisition: Syntactic Complexity Analysis in Czech Learners’ Texts. Journal of Linguistics, 74(1), 193–203. https://doi.org/10.2478/jazcas-2023-0037

Cross-Linguistic Influence in Learners’ Plurilingual Repertoires

Silvie Převrátilová

Multilingualism is a defining feature of contemporary higher education, yet little research explores how learners perceive the interactions between their languages. This study, grounded in the frameworks of cross-linguistic influence (CLI), intercomprehension, and perceived positive language interaction, examines how international students in the Czech Republic navigate their plurilingual repertoires. Data were collected from 106 international students across diverse linguistic backgrounds via an online survey. Findings reveal that 75% of participants reported positive cross-linguistic interactions, particularly within typologically related languages (e.g., Romance, Slavic, Germanic). However, some learners also identified beneficial transfer between typologically distant languages, notably Czech and German, due to shared grammatical structures such as case systems. Participants further reported interactions involving three or more languages, demonstrating that plurilingual speakers do not rely solely on binary language comparisons but instead engage in complex, multi-directional linguistic transfer across their entire repertoire. These results highlight the cognitive flexibility of plurilingual learners and underscore the need for language education strategies that explicitly leverage cross-linguistic awareness.

CORVIS ELE: Visegrad Corpus of Spanish as a Foreign Language

Adéla Smažíková

This project seeks to develop the first learner oral corpus of Spanish as a foreign language produced by speakers of Slavic languages. The corpus is based on recordings of spontaneous speech collected from students of Spanish Philology in Poland, Slovakia, and Czechia. These recordings will subsequently be transcribed and processed using a unified methodological framework. The resulting corpus will provide valuable material for investigating the grammatical and pragmatic challenges characteristic of learners of Spanish in the Central European context. We have now completed the recording phase with the participants, and we are currently transcribing the recordings and preparing them for detailed annotation. On this occasion, I will therefore outline the recording process and present the initial stage of the transcription procedure. The project is led by Adam Mickiewicz University in Poznań and carried out in cooperation with Charles University and Comenius University in Bratislava.

Views on Authorship in the Context of AI across Domains: Piloting Semi-structured Interviews

Rudolf Rosa

We present our ongoing research into how authorship is changing as Artificial Intelligence tools are increasingly employed in the creative process.
The current phase of the research consists of conducting semi-structured interviews with authors across various fields, focusing on their experiences with and views on AI.
Unlike previous similar projects, we aim to cover a broad range of creative fields, including literature, translation, research, and game development.
We present our research question and initial hypotheses, our methodology and design of the interviews, and preliminary findings from a pilot phase of the interviews.

Strategies for Span Labeling with Large Language Models

Zdeněk Kasner

How can large language models (LLMs) "highlight" the errors in a text? The question is trickier than it sounds because, unlike encoder-based models such as BERT, decoder-based models have no built-in mechanism to classify the input tokens. We survey existing approaches for span labeling with LLMs and show that most of them are ad hoc strategies that often deliver unsatisfactory performance on downstream tasks. We group these approaches into three families: tagging, indexing, and matching. We also introduce a constrained decoding method, LogitMatch, that fixes a weakness of matching approaches by guaranteeing that every generated span is a verbatim substring of the input. We experimentally evaluate these methods on four diverse tasks: named entity recognition, grammatical error correction, detecting errors in machine translation, and a synthetic pattern matching task. Tagging is the most consistent baseline, but LogitMatch delivers competitive performance while producing shorter outputs. Overall, we find that the methods involve trade-offs and that the choice between them is best guided by the specific task.
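The three families can be pictured by how the model's output is turned back into character offsets. The sketch below is only one possible reading: the `<err>` tag format, the use of character offsets for indexing, and all function names are assumptions for illustration, and LogitMatch itself (a constrained decoding method) is not implemented here.

```python
import re

def spans_from_tagging(tagged, tag="err"):
    """Tagging: the model copies the input and wraps spans in inline tags.
    Returns (start, end) offsets relative to the untagged text."""
    open_t, close_t = f"<{tag}>", f"</{tag}>"
    pattern = f"{re.escape(open_t)}|{re.escape(close_t)}"
    spans, plain_len, pos, start = [], 0, 0, None
    for m in re.finditer(pattern, tagged):
        plain_len += m.start() - pos  # count only non-tag characters
        pos = m.end()
        if m.group() == open_t:
            start = plain_len
        elif start is not None:
            spans.append((start, plain_len))
            start = None
    return spans

def spans_from_indexing(text, pairs):
    """Indexing: the model outputs offsets directly; drop invalid ones."""
    return [(s, e) for s, e in pairs if 0 <= s < e <= len(text)]

def spans_from_matching(text, snippets):
    """Matching: the model outputs span text, which is located in the input.
    Returns None for a snippet that is not a verbatim substring."""
    spans = []
    for snippet in snippets:
        i = text.find(snippet)  # first occurrence only
        spans.append((i, i + len(snippet)) if i != -1 else None)
    return spans

text = "She go to school yesterday."
tagging = spans_from_tagging("She <err>go</err> to school yesterday.")
indexing = spans_from_indexing(text, [(4, 6), (40, 50)])
matching = spans_from_matching(text, ["go", "went"])
```

In this toy run, the snippet "went" cannot be located because it does not occur in the input, which is the kind of failure that a verbatim-substring guarantee rules out. Plain string search also returns only the first occurrence, so repeated words remain ambiguous in this simple version.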

Aspectual Constraints on the Use of the Accusativus cum Infinitivo in Late Latin

Martina Vaníková

This paper investigates the distribution of present versus perfect infinitives in Accusativus cum Infinitivo constructions in Late Latin, with a focus on the role of verbal telicity.

In Classical Latin, perfect infinitives of telic verbs do not pose any complications, while present infinitives of telic verbs are rare in AcI constructions; when they occur, they do not express simple simultaneity, but rather render new meanings. This pattern aligns with findings from my previous research on telic verbs, which typically receive iterative or conative reinterpretations when combined with imperfective forms. The paper examines how these patterns appear in Late Latin, a period marked by significant changes in infinitival complementation.

Based on preliminary corpus observations, the study explores the hypothesis that telic present infinitives remain disfavoured in AcI constructions, whereas telic perfect infinitives are more stable due to their transparent aspectual value. The research offers a text-based study that provides insights into the interaction of telicity, infinitive forms, and complementation strategies.

SEEM-CZ: Annotation and Classification of Epistemic Markers in Czech

Michal Novák

We present a project focused on the linguistic description, annotation, and automatic classification of so-called epistemic markers in Czech. These expressions, such as pravděpodobně ‘probably’, zřejmě ‘apparently’, and určitě ‘certainly’, typically operate within the pragmatic domain of language. We introduce a dataset containing manual annotations of the 40 most frequent epistemic markers in Czech, totalling almost 4,000 uses. This annotation was created using parallel InterCorp data (in Czech and English) and the TEITOK tool. We describe the annotation scheme used, the annotation process, and data handling.
The dataset forms the core of the emerging lexical database of these expressions (SEEMLex). Thanks to the comprehensive manual annotation, the dataset can also serve as a source of further pragmatic information and can be used as a basis for further linguistic research. The proposed annotation scheme can also be used for other languages.
To demonstrate the dataset’s utility for automatic classification, we trained XLM-RoBERTa classifiers using 10-fold cross-validation, achieving 72.6% accuracy for type-of-use classification (6 classes) and 54.2% accuracy for degree-of-certainty classification (4 classes).
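For readers unfamiliar with the evaluation setup, the following is a minimal sketch of k-fold cross-validation. It assumes that accuracy is averaged over folds, which the abstract does not specify, and it uses a trivial majority-class predictor as a stand-in for the XLM-RoBERTa classifier. All names and the toy data are hypothetical.

```python
import random
from collections import Counter

def k_fold_indices(n, k, seed=0):
    """Shuffle indices 0..n-1 and deal them into k disjoint folds."""
    idx = list(range(n))
    random.Random(seed).shuffle(idx)
    return [idx[i::k] for i in range(k)]

def cross_validate(features, labels, train_fn, k=10, seed=0):
    """Train on k-1 folds, test on the held-out fold, average accuracy."""
    folds = k_fold_indices(len(labels), k, seed)
    accuracies = []
    for held_out in folds:
        held = set(held_out)
        train = [i for i in range(len(labels)) if i not in held]
        predict = train_fn([features[i] for i in train],
                           [labels[i] for i in train])
        correct = sum(predict(features[i]) == labels[i] for i in held_out)
        accuracies.append(correct / len(held_out))
    return sum(accuracies) / k

def majority_baseline(train_x, train_y):
    """Stand-in model: always predict the most frequent training label."""
    most_common = Counter(train_y).most_common(1)[0][0]
    return lambda x: most_common

# Toy data: 100 items, 80 labelled "a" and 20 labelled "b".
toy_labels = ["a"] * 80 + ["b"] * 20
toy_features = list(range(100))
mean_accuracy = cross_validate(toy_features, toy_labels, majority_baseline)
```

A majority-class baseline of this kind gives a useful floor against which multi-class accuracies can be read, since chance level depends on how unevenly the classes are distributed.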