Collocations

Goals

  • Description of phrase-level collocation patterns for a selected set of grammatical features
  • Computational analysis of collocations and colligations in modern standard Slovene

Methodology for describing collocability in modern standard Slovene

Building on earlier evaluations carried out with the Sketch Engine, the project upgraded the methodology of automatic collocation extraction so that collocation data can be extracted from a syntactically parsed corpus. The main advantage of this method is that extraction takes into account not only structural data on collocations but also the syntactic relations within collocation structures, based on the dependency relations annotated in the Gigafida 2.1 corpus, and the morphosyntactic properties of tokens within source and target structures. The methodology can also be applied to extract collocations from multi-level annotated corpora, which allows changes in the language to be monitored continuously and incorporated into language resources.
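The idea of matching a collocation structure against dependency annotation can be illustrated with a minimal sketch. It is not the project's extraction tool: the `Token` and `Structure` classes, the `extract` function, and the Universal Dependencies-style labels (`ADJ`, `NOUN`, `amod`) are assumptions made for illustration and need not match the annotation scheme actually used in Gigafida 2.1.

```python
from collections import Counter
from dataclasses import dataclass


@dataclass(frozen=True)
class Token:
    idx: int      # 1-based position in the sentence
    lemma: str
    pos: str      # coarse part-of-speech tag (illustrative labels)
    head: int     # idx of the syntactic head, 0 for the root
    deprel: str   # label of the dependency relation to the head


@dataclass(frozen=True)
class Structure:
    """Hypothetical two-element collocation structure."""
    name: str
    head_pos: str   # required part of speech of the head
    dep_pos: str    # required part of speech of the dependent
    deprel: str     # required dependency relation between them


def extract(sentence, structure):
    """Return (dependent lemma, head lemma) pairs that match the structure."""
    by_idx = {t.idx: t for t in sentence}
    pairs = []
    for tok in sentence:
        head = by_idx.get(tok.head)
        if head is None:          # the root has no head token
            continue
        if (tok.deprel == structure.deprel
                and tok.pos == structure.dep_pos
                and head.pos == structure.head_pos):
            pairs.append((tok.lemma, head.lemma))
    return pairs


# Assumed structure: an adjective modifying a noun.
ADJ_NOUN = Structure("adjective + noun", head_pos="NOUN", dep_pos="ADJ", deprel="amod")

# Toy parse of "Sodobna slovenščina ima bogato besedišče."
sentence = [
    Token(1, "sodoben", "ADJ", 2, "amod"),
    Token(2, "slovenščina", "NOUN", 3, "nsubj"),
    Token(3, "imeti", "VERB", 0, "root"),
    Token(4, "bogat", "ADJ", 5, "amod"),
    Token(5, "besedišče", "NOUN", 3, "obj"),
]

counts = Counter(extract(sentence, ADJ_NOUN))
```

Because the match is made on the dependency relation rather than on adjacency, the same check would still pair an adjective with its noun when other words intervene between them.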

 

Machine-readable dataset with collocation and colligation data

Using this methodology, we created the Frequency lists of collocations from the Gigafida 2.1 corpus. The lists contain collocations with an absolute frequency of 10 or more, divided into files corresponding to 81 predefined syntactic structures. The dataset also includes a formal description of the syntactic structures, with information on constraints and typical representations at the level of syntactic tags and syntactic relations in the corpus. The dataset offers collocation data on modern Slovene and is a good starting point for grammatical analyses at the phrase level, while also allowing for machine processing and use in language-technology tasks.

  • Krek, Simon; Kosem, Iztok; Gantar, Polona; Arhar Holdt, Špela; Robnik Šikonja, Marko; Klemenc, Bojan; Dobrovoljc, Kaja; Čibej, Jaka; Laskowski, Cyprian; Krsnik, Luka; Gorjanc, Vojko (2021). Frequency lists of collocations from the Gigafida 2.1 corpus, Slovenian language resource repository CLARIN.SI, http://hdl.handle.net/11356/1415.
  • Kosem, Iztok; Gantar, Polona; Krek, Simon; Arhar Holdt, Špela; Čibej, Jaka; Laskowski, Cyprian; Pori, Eva; Klemenc, Bojan; Dobrovoljc, Kaja; Gorjanc, Vojko; Ljubešić, Nikola (2019). Collocations Dictionary of Modern Slovene KSSS 1.0, Slovenian language resource repository CLARIN.SI, http://hdl.handle.net/11356/1250.
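The frequency threshold and the per-structure split described for the frequency lists can be sketched as follows. This is an illustration under stated assumptions, not the procedure used to build the published dataset: the function `split_by_structure`, the structure identifiers `S1` and `S2`, and all frequencies are invented toy values. Only the threshold of 10 comes from the description above.

```python
from collections import defaultdict

MIN_FREQ = 10  # absolute-frequency threshold stated in the dataset description


def split_by_structure(counts, min_freq=MIN_FREQ):
    """Group collocations by structure, keeping those at or above min_freq.

    counts maps (structure_id, collocation) to an absolute frequency.
    Returns a dict mapping structure_id to a list of (collocation, frequency)
    rows sorted by descending frequency, then alphabetically.
    """
    per_structure = defaultdict(list)
    for (structure_id, collocation), freq in counts.items():
        if freq >= min_freq:
            per_structure[structure_id].append((collocation, freq))
    for rows in per_structure.values():
        rows.sort(key=lambda row: (-row[1], row[0]))
    return dict(per_structure)


# Toy input: structure ids and frequencies are hypothetical.
counts = {
    ("S1", "sodoben slovenščina"): 42,
    ("S1", "redek primer"): 9,      # below the threshold, dropped
    ("S2", "igrati vloga"): 10,     # exactly at the threshold, kept
}

result = split_by_structure(counts)
```

Each key of the returned dictionary corresponds to one output file in this sketch, mirroring the one-file-per-structure layout of the dataset.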

 

Analysis of collocability in modern standard Slovene

The dataset was used for quantitative and qualitative linguistic analysis of collocability in modern Slovene. We identified the most productive collocation structures and analyzed the distribution of lexical elements in the most common collocations according to 1) the type of syntactic relation, 2) the morphosyntactic means used to express relations, and 3) the grammatical information of collocation elements. The results of the analysis of grammatical aspects of collocability also made it possible to identify new standardization trends in modern Slovene. They were also used in the analysis of specialized vocabulary from corpora of computer-mediated communication and school writing.
