- Development of a methodology for the lexico-grammatical description of valency
- Computational analysis of valency in modern standard Slovene
Methodology for a machine-readable description of valency in modern Slovene
As part of the project, we developed a methodology for a machine-readable description of valency in modern standard Slovene. The methodology draws on automatically annotated syntactic relations in the Gigafida 2.1 corpus and on automatically labelled semantic roles within the argument structure of a sentence. For this purpose, we used a set of participant roles and semantic labels modelled on the Czech valency lexicon Vallex (http://ufal.mff.cuni.cz/vallex) and on the semantic-syntactic description of English verbs in the FrameNet project (https://framenet.icsi.berkeley.edu/). The methodology allows sentence patterns to be extracted from a verb phrase. Each pattern is defined by syntactic relations following the JOS model and is represented by the semantic roles that the participants play within it.
- Krek, Simon; Gantar, Polona (2021, in press). Analiza vezljivostnih vzorcev v sodobni standardni slovenščini.
- Gantar, Polona; Štrkalj Despot, Kristina; Krek, Simon; Ljubešić, Nikola (2018). "Towards semantic role labeling in Slovene and Croatian". In: Darja Fišer, Andrej Pančur (eds.): Language Technologies & Digital Humanities: conference proceedings, pp. 93–98. Ljubljana: Ljubljana University Press, Faculty of Arts. http://www.sdjt.si/wp/wp-content/uploads/2018/09/JTDH-2018_Gantar-et-al_Towards-Semantic-Role-Labeling-in-Slovene-and-Croatian.pdf.
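The idea of a sentence pattern as a set of syntactic relations paired with semantic roles can be illustrated with a minimal Python sketch. It is not the project's implementation: the class and function names, the relation labels (`Sb`, `Obj`, `Conj`) and the role labels (`ACT`, `PAT`, `REC`) are illustrative placeholders, and the input is assumed to be a verb's dependents that have already been parsed and role-labelled.

```python
from dataclasses import dataclass
from typing import Iterable, Optional, Tuple


@dataclass(frozen=True)
class Slot:
    relation: str  # syntactic relation label (placeholder values, not the official tagset)
    role: str      # semantic role label (placeholder values, not the project's inventory)


@dataclass(frozen=True)
class SentencePattern:
    verb: str
    slots: Tuple[Slot, ...]

    def signature(self) -> str:
        """Return an order-independent key that identifies the pattern."""
        parts = sorted(f"{s.relation}:{s.role}" for s in self.slots)
        return f"{self.verb}[" + " ".join(parts) + "]"


def extract_pattern(
    verb: str, dependents: Iterable[Tuple[str, Optional[str]]]
) -> SentencePattern:
    """Build a pattern from (relation, role) pairs attached to a verb.

    Dependents without a semantic role (role is None) are assumed not to be
    participants and are left out of the pattern.
    """
    slots = tuple(Slot(rel, role) for rel, role in dependents if role is not None)
    return SentencePattern(verb, slots)


# Hypothetical analysis of one clause headed by the verb "dati" (to give).
pattern = extract_pattern(
    "dati",
    [("Sb", "ACT"), ("Obj", "PAT"), ("Obj", "REC"), ("Conj", None)],
)
print(pattern.signature())  # dati[Obj:PAT Obj:REC Sb:ACT]
```

Sorting the slots in the signature means that two clauses with the same participants in a different word order map to the same pattern, which is one plausible design choice for a language with relatively free word order.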
The methodology for the machine description of valency patterns developed in the project was used to create a machine-readable valency lexicon database. The database is based on data from the Gigafida 2.1 corpus, automatically annotated at the level of syntactic relations and participant roles. The lexicon contains valency patterns for 14,595 verbs, based on the JOS syntactic dependency model and on 25 semantic roles (agents, circumstances and participants within the verb phrase) identified in comparable valency lexicons for other languages (Vallex, FrameNet, etc.). For every valency pattern in which an individual verb occurs, and for each participant role that appears in the pattern, statistics are given on its representation in both the ssj500k training corpus and the Gigafida 2.1 corpus. Each sentence pattern in which a particular verb occurs is also assigned at least one example from the Gigafida 2.1 corpus and all examples from the ssj500k training corpus. The lexicon is available in the CLARIN.SI repository, together with the formal description of the syntactic structures that appear in sentence patterns, the list of participant roles and the list of identified sentence patterns. The automatically created valency lexicon provides a good foundation for quantitative and qualitative analyses of valency in modern Slovene, for detecting problematic points in automatic language processing, and for further improving the training set.
- Krek, Simon; Gantar, Polona; Krsnik, Luka; Laskowski, Cyprian; Dobrovoljc, Kaja; Arhar Holdt, Špela; Čibej, Jaka; Kosem, Iztok; Klemenc, Bojan; Robnik Šikonja, Marko; Gorjanc, Vojko (2021). Valency lexicon extracted from the Gigafida 2.1 corpus, Slovenian language resource repository CLARIN.SI. http://hdl.handle.net/11356/1418.
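As a rough illustration of how per-pattern statistics for two corpora might be computed, the following Python sketch counts pattern occurrences per verb and corpus and derives each pattern's share among that verb's occurrences. It does not reflect the actual format of the published lexicon: the function name, the pattern strings and the toy observations are assumptions made for the example.

```python
from collections import Counter, defaultdict


def pattern_shares(observations):
    """Aggregate (verb, pattern, corpus) observations.

    Returns a dict mapping (verb, pattern) to a per-corpus dict with the raw
    count and the pattern's share among all occurrences of that verb in
    that corpus.
    """
    counts = Counter(observations)  # (verb, pattern, corpus) -> count
    totals = Counter()              # (verb, corpus) -> count
    for (verb, _pattern, corpus), n in counts.items():
        totals[(verb, corpus)] += n

    stats = defaultdict(dict)
    for (verb, pattern, corpus), n in counts.items():
        stats[(verb, pattern)][corpus] = {
            "count": n,
            "share": n / totals[(verb, corpus)],
        }
    return dict(stats)


# Toy observations (invented for illustration, not real corpus data).
observations = [
    ("dati", "Sb:ACT Obj:PAT", "ssj500k"),
    ("dati", "Sb:ACT Obj:PAT", "Gigafida 2.1"),
    ("dati", "Sb:ACT Obj:PAT", "Gigafida 2.1"),
    ("dati", "Sb:ACT Obj:PAT", "Gigafida 2.1"),
    ("dati", "Sb:ACT", "Gigafida 2.1"),
]

stats = pattern_shares(observations)
print(stats[("dati", "Sb:ACT Obj:PAT")]["Gigafida 2.1"])
# {'count': 3, 'share': 0.75}
```

Comparing the shares of a pattern in the manually checked training corpus and in the automatically annotated reference corpus is one simple way to spot patterns that the automatic annotation may over- or under-produce.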