Project “Infrastructure for Fine-tuning Pre-trained Large Language Models”
ANNOUNCEMENT
The Institute for Bulgarian Language signed Grant Agreement No. ПВУ – 55 from 12.12.2024 /BG-RRP-2.017-0030-C01/ under Procedure BG-RRP-2.017, “Funding of research projects in the field of green and digital technologies”, of the Recovery and Resilience Facility. The procedure implements Investment C2I2, “Increasing the Innovation Capacity of the Bulgarian Academy of Sciences (BAS) in the fields of Green and Digital Technologies”, as part of the Recovery and Resilience Plan.
Title of the project
Infrastructure for Fine-tuning Pre-trained Large Language Models
Grant Agreement
ПВУ – 55 from 12.12.2024 /BG-RRP-2.017-0030-C01/
Start date
12/12/2024
End date
30/05/2026
Duration
17.5 months
Place
Sofia, Bulgaria
Total budget
BGN 437,446.38
European funding
BGN 437,446.38
National funding
BGN 0.00
Percentage of EU funding
100%
Summary
The project aims to develop a freely accessible infrastructure for selecting and pre-processing large Bulgarian-language datasets, as well as data tailored to specific industries, and for fine-tuning suitable freely available large language models for specific purposes.
Objectives of the project
- To provide a detailed description of the characteristics of large language models and a specification of the criteria for their evaluation, comparison and selection.
- To develop an infrastructure component for the collection, filtering, anonymisation and deduplication of large, diverse and high-quality text data for Bulgarian.
- To develop an infrastructure component for the fine-tuning of pre-trained large language models for Bulgarian.
Expected results of the project
- Technologies for fine-tuning large language models capable of integrating long-term general knowledge and inferring meaning.
- Technologies for the collection of clean, non-toxic data that is free of duplicates and personally identifiable information.
- Adoption of innovative language technology solutions that provide artificial intelligence applications and services for businesses in Bulgaria.
Project web page
This announcement is part of the planned communication and dissemination measures within the project “Infrastructure for Fine-tuning Pre-trained Large Language Models”, Grant Agreement No. ПВУ – 55 from 12.12.2024 /BG-RRP-2.017-0030-C01/.