Tajik-to-Persian writing systems converter
 
   
 
       
 

Good day, dear visitors!

You are on the page of the developers of the Tajik-to-Persian writing systems conversion project. It gives basic information about the aims and tasks of the system under development.

You can send us your questions, wishes and suggestions by e-mail, or leave a message in the guest book.

 
       

Development team:

 
Scientific supervision (mathematical sciences):
Usmanov Zafar Juraevich, Academician of the Republic of Tajikistan, Doctor of Physical and Mathematical Sciences, Professor

Scientific supervision (philological sciences):
Iskandarova Diloro Mukaddasovna, Doctor of Philological Sciences, Professor

Mathematical and algorithmic research:
Graschenko Leonid Alexandrovich, applicant at the Institute of Mathematics, Academy of Sciences of the Republic of Tajikistan

Linguistic and information research:
Fomin Alexey Yurievich, applicant at the Institute of Mathematics, Academy of Sciences of the Republic of Tajikistan

 
Historical background:
 
The historical events of the 19th and 20th centuries in Central Asia and the Middle East left a large group of Persian-speaking peoples divided among new state formations: Iran, Afghanistan and Tajikistan. Linguistically, the once unified Persian language was likewise divided into three languages: Persian (Farsi) in Iran, Dari in Afghanistan, and Tajik, first on part of the territory of the former Russian Empire, then in the USSR, and now in the sovereign state of Tajikistan, in Uzbekistan, and in some areas of Kyrgyzstan.
Over this historical period, Persian proved relatively resilient to external influences, while Dari was influenced by English, from which it borrowed many words and terms. However, both languages have preserved one of the main components of their identity: writing based on the Arabic alphabet.
The formation of the modern Tajik language was more complex and took place in several stages. The first stage was determined by the incorporation of Central Asian territories, including those densely populated by Tajiks, into the Russian Empire. At that stage the Tajik people continued to speak a language entirely similar to Persian and to use the Arabic script. Until the end of the 19th century, Tajik-Persian served as the official language of correspondence, records and proceedings in the Muslim principalities.
The second stage came in 1929, when the Tajik language experienced its first shock: the transition to a Latin-based script. The attempted reform did not yield satisfactory results, but it gave impetus to further divergence from Persian.
In 1940 another transition began, this time to a Cyrillic-based script. In some sense it was objectively motivated by the high rate of socialist construction and the need to expand education, science, literature and art. This stage is characterized by the introduction of Tajiks to Russian culture and, through it, to the culture of the Soviet peoples and world civilization. Awakened from centuries of torpor, the Tajik people successfully joined scientific and technical progress, while at the same time becoming more and more divorced from their great historical heritage and isolated from their Persian-speaking counterparts. The fundamental difference between the Arabic and Cyrillic scripts meant that neither the historical experience of the Tajiks nor their achievements during the Soviet period were accessible to the general public of Iran and Afghanistan. In turn, the mass of Tajiks lost the opportunity to read the printed materials of these countries.
For these reasons, the attempt of the Tajik public to revive the uniqueness of its language by bringing it back toward the ancient Iranian language tree seems quite understandable. Such efforts are strongly supported by the 1999 Act of the Supreme Council of the Republic "On Language", which assigned state status to the Tajik language. Thus the fourth stage in the evolution of the Tajik language, which started with the proclamation of Tajikistan's independence in 1991, is aimed at rapprochement with the languages of the fraternal peoples of Iran and Afghanistan, above all through careful reform of the script currently in use. Mindful of the grave consequences of past historical experiments, the leadership of the Republic carries out changes in this area with the utmost care. This applies in particular to the recent revision of the Cyrillic alphabet, from which 4 letters whose pronunciation is not characteristic of the Tajik language were removed. Attempts are also being made to replace some Russian terms with native ones or with borrowings from Persian.
The convergence of the Persian-speaking nations, inevitable in historical perspective, will require an intensified exchange of documents that support economic, cultural and scientific cooperation: regulatory frameworks, official correspondence, and scientific and technical exchange. With the development of the international telecommunications space, in particular the Internet, differences in script can become a serious constraint on electronic communication between the citizens of these countries. This fuels the popularity of ideas of a transition to the Persian script in Tajikistan and, perhaps, to a Latin-based script in Iran.

 

Aims and tasks of the project:
 

Purpose: to raise the level of Tajik-Persian intercultural interaction by introducing effective means of text conversion, and to create the preconditions for developing Persian-Tajik-Russian machine translation systems.

The scientific task: to develop a mathematical and algorithmic basis for promising automated conversion of texts from Tajik to Persian.

 

Current achievements:
 

To date, a prototype automated system for Tajik-to-Persian writing system conversion has been developed; it achieves 95% accuracy in text conversion.

The main functions implemented in the system:

  • pre-correction of the input text to match the target language, Farsi or Dari;
  • automatic recognition of proper names and abbreviations, and transliteration of foreign names;
  • optional expansion of abbreviations;
  • marking of words whose conversion is uncertain;
  • optional conversion of numbers and service symbols;
  • optional output of diacritical marks;
  • analysis and separate conversion of borrowings from Arabic and European languages.
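
Two of the functions above, dictionary-based conversion and marking of uncertain words, can be illustrated with a minimal sketch. This is not the project's algorithm: the names `LEXICON`, `CHAR_MAP` and `convert_text` are hypothetical, the three-word dictionary is only a sample, and the letter-by-letter fallback is deliberately naive (it drops short vowels and ignores the several Persian letters that can correspond to one Tajik letter), which is exactly why its output is reported as uncertain.

```python
import re

# Hypothetical sample dictionary: Tajik word form -> Persian spelling.
LEXICON = {
    "китоб": "کتاب",
    "забон": "زبان",
    "тоҷик": "تاجیک",
}

# Naive fallback letter map (an assumption for illustration only).
# Short vowels map to "" because Persian script usually leaves them unwritten.
# Letters missing from the map are passed through unchanged.
CHAR_MAP = {
    "б": "ب", "в": "و", "г": "گ", "ғ": "غ", "д": "د", "ж": "ژ",
    "з": "ز", "й": "ی", "к": "ک", "қ": "ق", "л": "ل", "м": "م",
    "н": "ن", "п": "پ", "р": "ر", "с": "س", "т": "ت", "ф": "ف",
    "х": "خ", "ҳ": "ه", "ч": "چ", "ҷ": "ج", "ш": "ش",
    "о": "ا", "у": "و", "ӯ": "و", "ӣ": "ی",
    "а": "", "и": "", "е": "", "э": "",
}


def convert_text(text):
    """Return (converted_text, uncertain_words) for a Tajik Cyrillic string."""
    converted, uncertain = [], []
    # Split into runs of word characters and runs of everything else,
    # so that spaces and punctuation are preserved as they are.
    for token in re.findall(r"\w+|\W+", text):
        key = token.lower()
        if key in LEXICON:
            # Known word form: take the dictionary spelling.
            converted.append(LEXICON[key])
        elif key.isalpha():
            # Unknown word: letter-by-letter fallback, reported as uncertain.
            converted.append("".join(CHAR_MAP.get(ch, ch) for ch in key))
            uncertain.append(token)
        else:
            # Spaces, punctuation, digits: left unchanged.
            converted.append(token)
    return "".join(converted), uncertain
```

Calling `convert_text("Китоб нав")` returns the dictionary spelling for the first word and the fallback spelling for the second, together with the list `["нав"]` of words a human should check. A real system would need a far larger dictionary and context-aware rules before any accuracy figure could be claimed.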

[Figure: screenshot of the developed system]

The software product has been registered by the National Patent Information Center of the Ministry of Economic Development and Trade of the Republic of Tajikistan as intellectual product No. 091TJ, dated 16.03.2009.

[Figure: certificate of registration of the intellectual product]

To verify and refine the product, we invite anyone interested to send texts in the Tajik language to our mailbox. Texts should be up to 10,000 characters long and contain no tables or figures. Conversion of your files is free of charge; in return, we ask you to send us a report on the errors you find.
 
Publications on the topic:
 

1. Usmanov Z.J., Graschenko L.A., Fomin A.Y. Information basis of the automated Tajik-to-Persian transliteration // Izvestya AN RT – №1(130) – 2008. – P. 20-26.
2. Graschenko L.A. The conformity dictionary formation algorithm of the Tajik and Persian word forms // Doklady AN RT – vol. 51, №5 – 2008. – P. 339-345.
3. Graschenko L.A., Fomin A.Y. Experience of Tools Implementation of the Tajik-Persian Writing Systems Conversion // Doklady AN RT – vol. 51, №8 – 2008. – P. 580-583.
4. Graschenko L.A. Conceptual Model of the Tajik-to-Persian Conversion of Graphical Letter Systems // Doklady AN RT – vol. 52, №2 – 2009. – P. 111-115.

Our project in the mass media:

  1. November 25 - article in "Khovar".
  2. December 11, 2008 - article in the "Digest" newspaper.
  3. December 12 - article in IA "Asia-plus".
  4. December 15 - report on "MIR". The video clip can be downloaded here (3.22 MB)
 
 

Project start: 06.11.2008

Last update: 11.07.2010