Information on scientific and (or) scientific-technical projects under grant financing by the Ministry of Science and Higher Education of the Republic of Kazakhstan

1) Grant financing of scientific and (or) scientific-technical projects for 2024-2026 (implementation period 36 months)

Project summary

Project title

IRN AP22686434 «Building a multilingual text corpus that establishes refrain relationships with name groups»

Project manager

Kalman Gulzhan

Implementation period

2024-2026

Amount of funding

25 272 081 tenge.

Topicality

In computational linguistics, the relevance of the project stems from the fact that modern artificial intelligence and neural network methods for natural language processing and automatic text analysis depend on large amounts of specially prepared data in various languages, including Kazakh. Most such resources, however, are created for English, and very few corpora carry reference annotation.

Such resources are needed first of all by researchers who build national language corpora and national thematic dictionaries, and who develop and apply the methods required for linguistic research. The project provides this kind of data for the Kazakh language.

The methods developed for identifying designated objects and establishing referential relations may have independent value and contribute to world science, since they rely on parallel corpora and integrate various machine learning methods with linguistic approaches. In particular, it is planned to compare the typologies of reference (i.e., the ways of referring to previously mentioned objects in a text) across multiple languages.

The distinctive feature of the proposed approach is twofold. First, the project develops, for the first time, a unique multilingual resource for automatic text processing aimed at solving important problems of reference. Second, it carries out a comparative analysis of solution methods based on classical linguistic approaches, on machine learning methods, and on their combination, with particular attention to the Kazakh language.

Objective

The goal of this project is to create a multilingual resource to support national research in computational linguistics and automatic text processing.

Project outcomes

The main output of the project will be a multilingual text corpus annotated with designated objects and the references to them.