The 4th IKCEST “the Belt and Road” International Big Data Competition and the 8th Baidu & Xi’an Jiaotong University Big Data Competition

Won the International Third Prize, 2022

The competition is jointly organized by the International Knowledge Centre for Engineering Sciences and Technology under the auspices of UNESCO (IKCEST), the China Knowledge Centre for Engineering Sciences and Technology (CKCEST), Baidu, and Xi’an Jiaotong University. The Chinese Academy of Engineering, the Ministry of Education, and the Universities Alliance of the Silk Road are also involved. Its aim is to identify top talent in big data and artificial intelligence worldwide, particularly from Belt and Road countries, and to help government, industry, and higher education institutions jointly drive research, application, and development in the big data industry and cultivate innovative AI talent.

In recent years, the Belt and Road Initiative has increased the demand for translation. To meet this national demand, it is important to improve the quality of machine translation for the major languages of the Belt and Road Initiative. However, most of the languages involved are low-resource languages, and translating them is an internationally recognized challenge and an active research frontier. This competition focuses on translation between Chinese and each of French, Russian, Thai, and Arabic. It encourages participants to explore data, model structure, and training methods, with the aim of advancing the technology and serving current needs.

Machine translation systems are often trained on sentence-aligned parallel corpora, from which the translation model learns the mappings between languages as well as knowledge of each language, enabling it to translate between them. The training data provided in this competition consists of four sets of parallel sentence pairs (Chinese-French, Chinese-Russian, Chinese-Thai, and Chinese-Arabic), and participants are free to use external bilingual or monolingual data. The test set covers eight directions: Chinese to French, French to Chinese, Chinese to Russian, Russian to Chinese, Chinese to Thai, Thai to Chinese, Chinese to Arabic, and Arabic to Chinese.
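
As a rough illustration of how four parallel corpora yield eight test directions, the sketch below enumerates the directions and expands one sentence-aligned pair into a training example for each direction. The language codes, function names, and record fields are hypothetical choices made for this example. They do not come from the competition materials or from any submitted system.

```python
from itertools import chain

# Illustrative language labels (assumed here, not taken from the competition data).
PIVOT = "zh"
OTHERS = ["fr", "ru", "th", "ar"]


def translation_directions(pivot, others):
    """Return (source, target) pairs in both directions for each language paired with the pivot."""
    return list(chain.from_iterable(((pivot, o), (o, pivot)) for o in others))


def both_directions(pairs, pivot, other):
    """Expand sentence-aligned (pivot, other) pairs into examples for both directions."""
    examples = []
    for pivot_sentence, other_sentence in pairs:
        # Pivot language as source.
        examples.append({"src_lang": pivot, "tgt_lang": other,
                         "src": pivot_sentence, "tgt": other_sentence})
        # Same pair reversed, so the other language is the source.
        examples.append({"src_lang": other, "tgt_lang": pivot,
                         "src": other_sentence, "tgt": pivot_sentence})
    return examples


DIRECTIONS = translation_directions(PIVOT, OTHERS)
EXAMPLES = both_directions([("你好", "Bonjour")], "zh", "fr")
```

Here `DIRECTIONS` holds one entry per test direction, and `EXAMPLES` shows that a single Chinese-French sentence pair can serve both the Chinese-to-French and French-to-Chinese directions.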