The Bulgarian National Corpus
Department of Computational Linguistics
Department of Bulgarian Lexicology and Lexicography
Period: 2017 – 2019
Type of project: collective, long-term
Funding: budgetary (BAS)
Principal Investigator: Prof. Svetla Koeva
Participants: Prof. Svetla Koeva, Assist. Prof. T. Dimitrova, Assist. Prof. S. Leseva, Assist. Prof. M. Todorova, Ivelina Stoyanova, B. Rizov, L. Dzhakov, M. Yalamov.
Abstract:
The project aims to further develop the Bulgarian National Corpus (BulNC) by expanding its contents and improving its representativeness, balance and accessibility for linguistic research and lexicographic work on the vocabulary of the Bulgarian language. To expand the BulNC further (including the parallel multilingual corpora that are part of the corpus), relevant documents are identified and collected automatically. An important direction in improving the BulNC is the construction of a taxonomic classification model for organising the documents, which allows new types of texts to be collected and classified and the corpus to be restructured easily. The automatic linguistic annotation of the BulNC is an ongoing task. For the purposes of lexicographic research, the project employs a methodology for selecting corpus samples to be used in the top