The Bulgarian National Corpus

Department of Computational Linguistics

Department of Bulgarian Lexicology and Lexicography

Period: 2017 – 2019

Type of project: collective, long-term

Funding: budgetary (BAS)

Principal Investigator: Prof. Svetla Koeva

Participants: Prof. Svetla Koeva, Assist. Prof. T. Dimitrova, Assist. Prof. S. Leseva, Assist. Prof. M. Todorova, Ivelina Stoyanova, B. Rizov, L. Dzhakov, M. Yalamov.

Abstract:

The project aims to further develop the Bulgarian National Corpus (BulNC) by expanding its contents and improving its representativeness, balance and accessibility for linguistic research and lexicographic work on the vocabulary of the Bulgarian language. To expand BulNC further (including the parallel multilingual corpora that are part of the corpus), the project relies on automatic identification and collection of relevant documents. An important direction in improving BulNC is the construction of a taxonomic classification model for organising the documents, so that new types of texts can be collected and classified and the corpus can be restructured easily. The automatic linguistic annotation of BulNC is an ongoing task. For the purposes of lexicographic research, the project employs a methodology for selecting corpus samples to be used in the top