Transliteration/Romanization for Urdu Processing by Rashida Sharif

9th National Research Conference

Abstract
Urdu is understood in most regions of South Asia and is a
rapidly growing medium of communication in the Arab world.
Transliteration, commonly known as Romanization, is a way of
mapping words from one system of writing into another. The
research carried out at the “Center of Excellence for Urdu
Informatics” concerned the mapping of Romanized Urdu, written in
English script, to the Urdu script, with the aim of software that
converts transliterated text into Urdu script for those who cannot
write the Urdu script. In the research, the standard letters of the
Urdu alphabet are listed in all their states (initial, medial, final
and isolated) together with their Romanized English equivalents.
Similar lists were developed for letters that take two English
letters, e.g. بھ ~ bh, as well as for Urdu vowels and diacritics
(airaab). The rules of application discuss the letters that may be
romanized in different ways depending on their context, and the
Romanization of orthographic symbols other than letters and vowel
signs. The study also examines in depth how transliteration is
affected by grammatical structure, special characters and
character modifiers.


Keywords: Transliteration, English, Urdu, Phonetic issues

I. INTRODUCTION
The methodology of converting text from one writing system
to another in a systematic manner is known as transliteration:
a mapping from one system of writing to another, letter by
letter. Transliteration is often assumed to be the same thing as
transcription, but the two work in opposite ways.
Romanization attempts to transliterate the original script; the
guiding principle is a one-to-one mapping of characters in the
source language into the target script, with less emphasis on
how the result sounds when pronounced according to the
reader's language. Transliteration is thus a mapping of the
written words of a language, whereas transcription is a mapping
of the sounds of a language into written form.
The Arabic script is so well established that the Roman script
is not easily accepted by the communities that use it, and they
oppose this trend. Meanwhile, the Roman script is very popular
online because support for the Arabic script is often
unavailable: it is rarely implemented or still under development.
Numerous websites and blogs, including technical forums, are
written in Roman script, and this form of communication is
widely understood by the online community.
Some Central Asian countries are not able to read or
understand the Arabic script, and they have used transliteration
for the recitation of the Quran in Arabic. Even the Arab nations,
the owners of the language, accept this, and transliteration has
been standardized for languages such as Malay and Indonesian.
These languages have standardized equivalents for the letters of
the Arabic alphabet to make this possible; without
standardization, transliteration is not possible. As the Urdu
character set is not yet completely implemented, how to
standardize it so that it can be relied on for transliteration is
much discussed. While transcription implies seeking the best way
to render foreign words in a particular language, typing
transliteration is a purely pragmatic process of inputting text in
a particular language. Standardizing the character set of Urdu for
transliteration is therefore in high demand.
Transliteration is mostly useful where the original script is
not available for writing or is not understood by the
foreign-language user. It can help non-native or foreign-language
users to learn and understand a language by using their own
script, which makes learning easier.

II. EARLY HISTORY OF TRANSLITERATION
Work on transliteration began several hundred years ago in
Asia (India and the Arab world). Indian phoneticians worked
extensively on categorizing sounds and analysed the issues
that arose. Some later work on transliteration was done under
the guidance of a committee set up at the Geneva Oriental
Congress in September 1894, which broadly finalized the
standard for the transliteration of Sanskrit.

Transliteration / Romanization for Urdu
processing (June 2009)
Rashida Sharif
Center of Excellence for Urdu Informatics (CEUI),
National Language Authority, Islamabad
[email protected]

Previous Work
Several systems have been developed for the transliteration
of Urdu into English and other languages, but they are not
reversible. The most popular are those of the British Library,
the Library of Congress and the Encyclopaedia of Islam. Since
Urdu and Hindi are grammatically the same language and share
a large number of words, it is easy for speakers of each to
understand the other's language. The only obstacle is the
script. Pakistan chose the Arabic script instead of Devanagari
for Urdu, and that script does not transliterate well into Hindi,
or probably into Roman either.
1. Urdu Informatics, “Reversible Urdu Transliteration to
be used in computer/Email/Internet” by Dr. Attash
Durrani, published by the National Language
Authority in 2008. This article discusses the letters of
the alphabet and their Roman properties and values in
great detail. [APPENDIX B][1]
2. The letters of the Urdu alphabet are discussed by the
Library of Congress. [4]
3. A transliteration editor for Arabic, Persian and Urdu
was developed in India at Carnegie Mellon
University, Hyderabad; it highlights the
commonalities of Middle Eastern languages and
discusses all the primary forms of the letters. [3]
4. British Library [6]
5. Google Labs (Google Indic Transliteration) [8]
6. The Encyclopaedia of Islam [7]
7. Urdu orthography is also explained, with its character
sets including basic and secondary letters, diacritics
(airaab), punctuation marks and special symbols, by
the Center for Research in Urdu Language Processing,
National University of Computer and Emerging
Sciences. [2]
The resources mentioned above do not provide a complete
set of Urdu letters, and their transliteration schemes are not
sufficient to develop a system that can provide fully
intelligible conversion as the user wants.

III. SCHEMES/SYSTEMS FOR TRANSLITERATION
Different languages have their own phonetic schemes for
resolving native-language issues when converting text from one
writing system to another in a systematic way. For Urdu, some
worthwhile schemes to follow are the following:
1. Speech Assessment Methods Phonetic Alphabet
(SAMPA), a scheme widely used across the world for
encoding the International Phonetic Alphabet (IPA).
2. Universal Intermediate Transcription (UIT), a scheme to
transcribe text in Urdu, Punjabi and Hindi, considered
an unambiguous standard. [2]
3. ALA-LC Romanization scheme for letters of the
alphabet: Transliteration Schemes for Non-Roman
Scripts, approved by the Library of Congress and the
American Library Association. [4]

Issues of Mapping Urdu Alphabets into English
We need a transliteration system based on both letter
conversion and a phonetic approach. For Urdu transliteration, a
great deal of work has been done, or is still under way, to
handle the complexity of its structure compared with other
languages. Numerous systems have been completed, but many
ambiguities remain for users. In mapping Urdu letters into
English there are many obstacles to achieving a perfectly
reversible system, one that gives a 100% correct equivalent
transliteration from Roman into Urdu, with the expectation that
no symbols have to be added with, before or after the letters.
At FAST University, a scheme was developed for their
corpus-based Urdu lexicon development system. I examined it
with respect to the standard Urdu letters listed, analysed the
ambiguities found, and suggest a possible solution.

SAMPA    Urdu Letters
t        ت، ط
s        س، ص، ث
z        ذ، ض، ظ، ز
h        ہ، ح
a        ا، آ
@        ء، ع
Table A: Ambiguous letters in Appendix A

If “ت” and “ط” are both converted to “t”, reverse
transliteration is not possible, because the mapping in this table
is uni-directional. The same is the case for all the letters listed
in Table A.
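
The ambiguity described above can be checked mechanically. The following is a minimal Python sketch, not part of the original study: it encodes Table A as a dictionary and tests whether the Urdu-to-Roman mapping can be inverted. The names TABLE_A, URDU_TO_ROMAN, ambiguous_symbols and is_reversible are illustrative assumptions.

```python
# Table A from the text: each SAMPA symbol and the Urdu letters mapped to it.
TABLE_A = {
    "t": ["ت", "ط"],
    "s": ["س", "ص", "ث"],
    "z": ["ذ", "ض", "ظ", "ز"],
    "h": ["ہ", "ح"],
    "a": ["ا", "آ"],
    "@": ["ء", "ع"],
}

# Forward direction (Urdu letter -> Roman symbol) is well defined.
URDU_TO_ROMAN = {u: r for r, letters in TABLE_A.items() for u in letters}


def ambiguous_symbols(table):
    """Return the Roman symbols that stand for more than one Urdu letter."""
    return {r: letters for r, letters in table.items() if len(letters) > 1}


def is_reversible(forward):
    """A mapping can be inverted only if no two sources share a target."""
    return len(set(forward.values())) == len(forward)


if __name__ == "__main__":
    for symbol, letters in ambiguous_symbols(TABLE_A).items():
        print(symbol, "<-", " ".join(letters))
    print("reversible:", is_reversible(URDU_TO_ROMAN))
```

Because every symbol in Table A stands for at least two Urdu letters, the check reports that the mapping is not reversible: given only “t”, nothing in the table says whether to restore “ت” or “ط”.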

With the mapping given in Table C, the revised system can
convert in both directions, again and again, without the
problems that the Appendix A mapping shows or ignores.
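
A round trip of this kind can be sketched in Python as follows. The pairs below are hypothetical placeholders chosen only to make the mapping one-to-one; they are NOT the entries of Table C, which is not reproduced in this text. The sketch assumes longest-match-first conversion so that a digraph such as “bh” is read as one unit.

```python
# HYPOTHETICAL one-to-one pairs for illustration only (not Table C).
PAIRS = [
    ("ب", "b"),
    ("بھ", "bh"),  # digraph for an aspirate, as in the abstract's "bh"
    ("ت", "t"),
    ("ط", "T"),    # placeholder symbol, assumed here to keep ت and ط apart
    ("ا", "a"),
]

TO_ROMAN = {urdu: roman for urdu, roman in PAIRS}
TO_URDU = {roman: urdu for urdu, roman in PAIRS}


def convert(text, table):
    """Convert text with a longest-match-first scan over the table keys."""
    keys = sorted(table, key=len, reverse=True)  # try digraphs first
    out = []
    i = 0
    while i < len(text):
        for key in keys:
            if text.startswith(key, i):
                out.append(table[key])
                i += len(key)
                break
        else:
            raise ValueError(f"no mapping for {text[i]!r}")
    return "".join(out)


def to_roman(urdu_text):
    return convert(urdu_text, TO_ROMAN)


def to_urdu(roman_text):
    return convert(roman_text, TO_URDU)


if __name__ == "__main__":
    word = "بھات"
    roman = to_roman(word)
    print(roman, to_urdu(roman) == word)
```

One-to-one pairs alone are not enough in general: if the second letter of a digraph were also mapped on its own (for example a standalone “h”), the Roman string “bh” would again have two readings, so a reversible table has to avoid such overlaps.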

IV. CONCLUSION
As we need a transliteration system based on both letter
conversion and a phonetic approach, the suggested Urdu
character set [Table C] for a reversible transliteration scheme
is worth adopting. The number of systems under development is
rising day by day, reflecting the need of native users to
communicate in writing. All letters are covered and settled for
English to Urdu transliteration. In the list, all consonants,
digraphs representing Urdu aspirates, Urdu vowels and
diphthongs are described separately to avoid the ambiguity of
letters with the same sound, which otherwise stands in the way
of making transliteration possible.

Appendix A

Appendix B


REFERENCES
[1] Dr. Attash Durrani, “Reversible Urdu Transliteration” (book,
Urdu Informatics), 1st ed., vol. 1, Islamabad, 2008, pp. 48-50.
[2] Dr. Sarmad Hussain and Madiha Ijaz, “Corpus Based Urdu
Lexicon Development” (article), Center for Research in Urdu
Language Processing, National University of Computer and
Emerging Sciences, Lahore, 2007.
[3] M. G. Abbas Malik, Pushpak Bhattacharyya and Christian Boitet,
“Hindi Urdu Machine Transliteration using Finite-state
Transducers (HUMT)” (article), GETALP, Laboratoire
d'Informatique de Grenoble, Université Joseph Fourier,
France; Dept. of Computer Science and Engineering, IIT
Bombay, India, 2008.
[4] ALA-LC, “Library of Congress” (online source), www.loc.gov
[5] SAMPA, “Speech Assessment Methods Phonetic Alphabet” (online
source, article), as used in Sarmad Hussain and Madiha Ijaz, “Corpus
Based Urdu Lexicon Development” [2], http://phone.ucl.ac.uk/home/sampa/
[6] British Library, “Transliteration scheme” (online source),
http://www.bl.uk/
[7] The Encyclopaedia of Islam (online source),
www.muslimphilosophy.com/ei2/list.htm
[8] Google Labs, “Google Indic Transliteration” (online source),
http://www.google.com/transliterate/indic/Urdu


Rashida Sharif works at the Center of Excellence for Urdu Informatics,
National Language Authority, Islamabad.