Enhancing Multilingual Language Resources with Derivationally Linked Multiword Expressions

Department of Computational Linguistics

Period: 2018 – June 2021 (extended by six months under Protocol No. 26 of the meeting of the Scientific Council of 04.12.2020)

Type of Project: collective, international

Partner: Research Institute for Artificial Intelligence, Romanian Academy

Funding: international exchange under the bilateral agreements of the Bulgarian Academy of Sciences (BAS)

Principal Investigator (Bulgarian side): Assist. Prof. Svetlozara Leseva, PhD

Participants: Bulgarian side: Assist. Prof. Svetlozara Leseva, PhD; Assist. Prof. Tsvetana Dimitrova, PhD; Assist. Prof. Maria Todorova, PhD; Assist. Prof. Valentina Stefanova, PhD; Ivelina Stoyanova, PhD. Romanian side: Senior Researcher (Grade II) Verginica Mititelu (Principal Investigator); Radu Ion, PhD; Elena Irimia, PhD; Tiberiu Boros, PhD; Sonia Pipa, PhD; Maria Mitrofan, PhD


The goal of the project is to identify and describe derivationally linked multiword expressions, mainly nouns and their derivationally linked multiword phrases (break the heart > breaking the heart; heartbreaker), including nominal and adjectival phrases with participles, etc. More specifically, the project includes the following tasks: describing derivationally linked multiword expressions, their derivatives and the relations between them in language resources (the Bulgarian and Romanian wordnets); and automatically identifying such units in corpora for use in various applications. Describing derivationally linked multiword expressions gives a more accurate picture of their usage and distribution. It also makes it possible to establish derivational and semantic links and to automatically recognize structures that differ in form but are semantically close and syntactically comparable.

The project is carried out as part of the priority research area of the Institute for Bulgarian Language, “Electronic Language Resources and Tools for Their Processing”.

Results: enriched content of the Bulgarian wordnet (through the description of derivationally linked multiword expressions); corpora annotated with derivationally linked multiword expressions; a system for the automatic recognition and annotation of derivationally linked multiword expressions; thematically related research papers and articles.