Multilingual Resources for CEF.AT in the Legal Domain

Department of Computational Linguistics

Period: 01.10.2018 – 31.03.2021 (extended by six months, per the minutes of the Science Council meeting of 21.12.2020)

Type of project: collective, international

Partners: Section of Linguistics and Literary Scholarship, Hungarian Academy of Sciences; Faculty of Humanities and Social Sciences, University of Zagreb, Croatia; Institute of Computer Science, Polish Academy of Sciences; Research Institute for Artificial Intelligence, Romanian Academy; Ľ. Štúr Institute of Linguistics of the Slovak Academy of Sciences; Institut Jožef Stefan, Slovenia

Funding: Innovation and Networks Executive Agency (INEA), Connecting Europe Facility (CEF)

Participants: prof. Svetla Koeva PhD (until 30.06.2020), prof. Tinko Tinchev PhD, assist. prof. Valentina Stefanova PhD, assist. prof. Tsvetana Dimitrova PhD, assist. prof. Dimitar Georgiev PhD (until 18.02.2019), Martin Yalamov, Valeri Kostov (from 25.02.2019 to August 2019), Nikola Obreshkov (from September 2019)

Abstract:

The main goal of the project (marcell-project.eu) is to develop a sustainable infrastructure for collecting and semantically processing national legislation (laws, decrees, regulations, etc.) from Bulgaria, Poland, Romania, Slovakia, Slovenia, Hungary and Croatia, in order to support the training of current machine translation systems. The specific tasks of the team from the Institute for Bulgarian Language include: creating an infrastructure for the automatic collection, preprocessing and linguistic annotation of Bulgarian national legislation; semantic segmentation of the data; and cross-lingual semantic comparison between text units of different sizes (words, phrases, sentences, paragraphs).
The project results are intended for the systems of the Connecting Europe Facility's automated translation platform (CEF.AT). The quality of machine translation depends on training the translation systems on large quantities of translated documents from a specific domain. The value of machine translation keeps growing as economic, political and cultural relations between (European) countries intensify.
The project is part of the Institute for Bulgarian Language's priority area "Electronic Language Resources and Tools for their Processing".

Results: an infrastructure for the automatic collection and semantic processing of multilingual documents; linguistically processed and semantically linked legislative data in seven European languages.