NAMA BAYDA (နာမဗေဒ): An Online Tool Developed for Operations Related to Myanmar Names

Nama Bayda (နာမဗေဒ)
Hein Htet Arkar Mg (ဟိန်းထက်အာကာမောင်)

November 24
Abstract

Internationally, AI technology is advancing rapidly and has now reached GPT-4, yet it remains weak for the Myanmar script and the Myanmar language. I believe this weakness comes from underdeveloped linguistic resources and from a lack of sufficient data on the Myanmar script and language, so this project was undertaken with the aim of helping Myanmar language technology improve.
Because generative AI is still weak for Myanmar, the models were trained with a traditional method, the Recurrent Neural Network (RNN) [5][6]. The dataset was compiled by hand from the names of students at universities in Myanmar and currently contains more than 4,000 entries. The TensorFlow API released by Google was used.
Each syllable of a Myanmar name carries its own meaning (for example, in the name “ကောင်းမြတ်”, “ကောင်း” means “goodness” and “မြတ်” means “nobility”). This was taken into account during generation, with the main focus on joining syllables so that the resulting names are both beautiful and meaningful.

Introduction

Nama Bayda is a tool that provides functions related to Myanmar names. It was built as an online tool so that names can be generated more easily for office work.

Nama Bayda has four main parts: the Burmese Name Generator, the Burmese Name Romanizer, the Burmese Name Gender Detector, and the Burmese Name Meaning Explainer. The Burmese Name Generator uses AI to generate Myanmar names.

A name is indispensable for every living and non-living thing, and it should not be given carelessly. The Burmese Name Generator was created to offer suggestions to people choosing such an important thing with care (especially foreigners who need a Myanmar name). For NLP, it adds another Myanmar-language dataset and shows how Myanmar text can be managed. By simply replacing the dataset, it can also generate other kinds of names (pet names, street names, business names, and so on).
The Burmese Name Romanizer converts Myanmar names from Myanmar script to English and from English back to Myanmar script. It was created to make this conversion easy wherever large amounts of data are handled (for example, in data analysis).
The Burmese Name Gender Detector uses AI to classify a Myanmar name as male or female. It was created in the belief that it can be of great help in areas such as data analysis.
The Burmese Name Meaning Explainer exists because each syllable of a Myanmar name carries its own meaning; it was created to help people who want to know those meanings (especially foreigners who need a Myanmar name).

The tool is deployed on Streamlit Cloud so that users can try it out and send suggestions.
Streamlit Link: https://nama-bayda-ue44squejirjyddmylit3m.streamlit.app/
GitHub: https://github.com/Xeyn-X/Nama-Bayda

Methodology
1. Burmese Name Generator

The Burmese Name Generator in Nama Bayda is built with a traditional method, the Recurrent Neural Network (RNN). The construction steps are described below, and the code used to train the model is also shared.
Kaggle: https://www.kaggle.com/code/heinhtetahkarmg/burmese-name-generator
1.1. Data Collection

To obtain attractive, contemporary names, the names of students at universities in Myanmar were collected and compiled into a list. More than 4,000 entries have been collected so far. The dataset link is given below.
Dataset link: https://www.kaggle.com/datasets/heinhtetahkarmg/burmese-name
1.2. Data Preprocessing

Unlike English, Myanmar text is not written with spaces between words, so tokenizing it by character scatters every letter (for example, “မြန်မာ” becomes “မ ြ န ် မ ာ”). Moreover, a model trained on character-level tokens gave poor results, so tokenization must be done by syllable. Syllable segmentation splits text according to how it is spoken (for example, “ကျွန်တော်သည်” becomes “ကျွန် တော် သည်”). The names are first tokenized in this way [1][2].

Next, the tokenized syllables are converted into integer sequences with a tokenizer that uses an Out of Vocabulary (OOV) token. Converting to sequences ensures that, when Y is predicted, the output comes back as the same syllable tokens (for example, X = “ကောင်းမြတ်”, Y = “မင်း”). The resulting sequences are concatenated like one long paragraph, the sequence length is set to 10 for training, and the data is split into X and Y. Finally, X and Y are converted from sequences into one-hot encodings [3][4].
1.3. Model Training
The model uses LSTM, an advanced RNN architecture, followed by one Flatten layer, one Dense layer with a ReLU activation function, and one Dense layer with a Softmax activation function. Adam is used as the optimizer and categorical_crossentropy as the loss. The model is then trained with a batch size of 64 for 100 epochs.
1.4. Result and Evaluation

The model reached an accuracy of 0.8 and a loss of 0.5; since this suggested that it could generate good names, work proceeded to the evaluation stage.
In the evaluation stage, X is first obtained by randomly taking a short sequence, a little over three names long, from the concatenated sequence. This X is then used to predict Y and start a new name, and once the prediction reaches the Enter (\n) token, a new name is complete. The results are shown in Figure 1.4(a).

Figure 1.4(a) Generated names

Because the generated names were good, the model was saved in the Keras file format. Streamlit was then used to build the user interface. To improve it further, filters were added for the number of names to generate and for male or female names, and the meaning of each new name is shown alongside it. The gender classification and meaning explanation parts are described in detail below.

The result is shown in Figure 1.4(b).

Figure 1.4(b) Final result of the Burmese Name Generator
2. Burmese Name Gender Detector


Another part of Nama Bayda, the Burmese Name Gender Detector, is also built with a traditional method, the Recurrent Neural Network (RNN). The construction steps are described below, and the code used to train the model is also shared.
Kaggle: https://www.kaggle.com/code/heinhtetahkarmg/gender-detection-for-burmese-name
2.1. Data Collection

A new dataset was compiled by adding a male/female column to the names from the Burmese Name Generator dataset described above. The dataset link is given below.
Dataset link: https://www.kaggle.com/datasets/heinhtetahkarmg/burmese-name-with-gender
2.2. Data Preprocessing

First, the names are tokenized by syllable as described for the Burmese Name Generator and split into a training dataset and a testing dataset. The syllable-tokenized names in the training dataset are then converted into sequences with the Out of Vocabulary (OOV) tokenizer.
2.3. Model Training
The model uses LSTM, an advanced RNN architecture, followed by one Flatten layer, two Dense layers with a ReLU activation function, and one Dense layer with a Sigmoid activation function. Adam is used as the optimizer and binary_crossentropy as the loss. The model is then trained for 20 epochs.
2.4. Result and Evaluation
On the training dataset, the model reached an accuracy of 0.99 and a loss of 0.004. On the testing dataset, it scored 94.8% accuracy, 94.7% precision, 94.7% recall, and a 94.7% F1 score (shown in Figure 2.4(a)). Work then proceeded to the evaluation stage.
Figure 2.4(a) Results on the testing dataset


Because the results were good, the model was saved in the Keras file format.

It was then added to the user interface described above. The result is shown in Figure 2.4(b).

Figure 2.4(b) Final result of the Gender Detector
3. Burmese Name Romanizer

Another part of Nama Bayda, the Burmese Name Romanizer, does not need AI, so it uses a formula-based approach only. The construction steps are described below. The code is on GitHub, and the link is shared in the Introduction.
3.1. Data Collection

The word index produced by the OOV tokenizer in the previous part (for example, ‘အောင်’, ‘စု’, and so on) was saved as a CSV file, and a new list was compiled by adding an English Romanize column to that file. An example is shown in Figure 3.1(a). The dataset is also available on GitHub.

Figure 3.1(a) Recompiled dataset

3.2. Data Preprocessing


First, some Myanmar names must be written as one joined word when romanized (for example, ‘အာကာ’ is written as ‘Arkar’). These names were compiled into a separate list of special names (shown in Figure 3.2(a)) and saved as a CSV file.

Figure 3.2(a) Special names dataset

In Myanmar Unicode, some syllables can be typed in more than one way (for example, ‘သိမ့်’ can be typed with the asat (်) first and the dot below (့) after it, or with the dot below (့) first and the asat (်) after it). Such syllables are normalized to a single fixed form before conversion. The names are then tokenized by syllable and romanized.

For reverse romanization from English, a new dataset is built by swapping the columns of the dataset in Figure 3.1(a), and the names are then converted back into Myanmar script.
3.3. Integration

A filter lets the user choose between Myanmar-to-English and English-to-Myanmar conversion. In the Myanmar-to-English part, a CSV file can also be uploaded and romanized, and the result can be downloaded as a CSV file and saved to the user's own device.

In the English-to-Myanmar part, several Myanmar syllables share the same English spelling, so all possible names are shown (for example, romanizing ‘San’ gives ‘စံ’, ‘စန်း’, ‘စမ်း’, ‘စန်’, and ‘စမ်’). English uppercase and lowercase input is normalized to a single format. This part was then added to the user interface described above.
4. Burmese Name Meaning Explainer


Another part of Nama Bayda, the Burmese Name Meaning Explainer, does not need AI either, so it also uses a formula-based approach only. The construction steps are described below. The code is on GitHub, and the link is shared in the Introduction.
4.1. Data Collection

The word index produced by the OOV tokenizer in the earlier part (for example, ‘အောင်’, ‘စု’, and so on) was compiled into a list with the meaning of each syllable and saved as a JSON file. The JSON file is also available on GitHub.
4.2. Data Preprocessing and Integration

First, the Myanmar name is tokenized by syllable. Some Myanmar names have no meaning syllable by syllable and only gain a meaning when the syllables are joined (for example, ‘ရတနာ’ means ‘Treasure’ only as a whole). These combined words and their meanings were therefore built into a dictionary list. This part was then added to the user interface described above.

Challenges and Limitations
1. Challenges

The main challenges encountered were as follows:
1. Data was both a strength and a weakness. The names in the dataset were not clean (for example, wrong fonts, misspellings, and many ethnic-minority and old-fashioned names), which led to poor or incorrect results, so the data first had to be cleaned.
2. When generation started from a syllable picked at random from the word index, the generated names were incorrect. Randomly taking a short sequence, a little over three names long, from the concatenated sequence and generating from it produced correct names and good results.
3. The male/female labels had to be entered one row at a time, which took considerable time.
4. When the saved model was loaded for prediction, some data from the training stage turned out to be needed again, so that data was saved as a pkl file and reused.
5. In Myanmar Unicode, some syllables can be typed in more than one way, so these had to be found and normalized.
6. In English-to-Myanmar romanization, several Myanmar syllables share the same English spelling, so only the possible names are shown; this was accepted as a limitation.
7. Some Myanmar names have no meaning syllable by syllable and only gain a meaning when joined, so these had to be found and handled separately.
8. Some Myanmar names have no clear meaning and are chosen purely for their beauty, and some are borrowed from Pali, English, Hindi, and other languages, which made explaining their meanings difficult.
2. Limitations

The main limitations were as follows:
1. AI predicts the most probable outcome, so its results are never 100% correct.
2. When the data contains many ethnic-minority or old-fashioned names, results can be poor or incorrect.
3. Some names are used for both males and females, which leads to incorrect gender predictions.
4. The romanization part uses only a formula, so syllables that are not in the dataset (especially those in ethnic-minority names) cannot be romanized.
5. In English-to-Myanmar romanization, several Myanmar syllables share the same English spelling, so only the possible names can be shown.
6. The meaning explanation part also uses only a formula, so syllables that are not in the dataset cannot be explained.

Conclusion

In summary, Nama Bayda provides functions related to Myanmar names and is a first step toward addressing the weakness of advancing AI technology for the Myanmar script and language. It can also help in office record-keeping and in data analysis. For NLP, it contributes more Myanmar datasets and offers a reference for the basics and distinctive features of Myanmar text. It may also be of some help to people studying Myanmar linguistics.
Future Work

As next steps, I aim to build, one at a time, tools based on the Myanmar script and language, such as a Burmese Translator, a Spell Checker, Text Prediction, Burmese Chatbots, a Hatespeech Detector, and an Emotional Detector that estimates sentiment from text. I will also study and apply methods other than RNN.

References
1. Myanmar Syllable. https://www.slideshare.net/slideshow/myanmar-syllable-239356703/239356703
2. Ye Kyaw Thu. myWord: Burmese Syllable Segmentation Tool. https://github.com/ye-kyaw-thu/myWord
3. Ouassim Adnane. Arabic Name Generator with RNN in Keras. https://www.kaggle.com/code/ishivinal/arabic-name-generator/notebook
4. Meet Ranoliya. Indian Baby Names Generator: Text-Processing+RNN. https://www.kaggle.com/code/meemr5/indian-baby-names-generator-text-processing-rnn
5. GeeksforGeeks. Introduction to Recurrent Neural Networks. https://www.geeksforgeeks.org/introduction-to-recurrent-neural-network/
6. GeeksforGeeks. What is LSTM - Long Short Term Memory? https://www.geeksforgeeks.org/deep-learning-introduction-to-long-short-term-memory/

Nama Bayda
Hein Htet Arkar Mg
November 24
Abstract
Internationally, AI technology is advancing significantly and has now reached the level of GPT-4. For the Myanmar language, however, progress is still lacking. This gap is believed to stem from limited linguistic resources and from the insufficient availability of data on the Myanmar language, so this project was undertaken to help advance Myanmar language technology.
Because generative AI still performs poorly for Myanmar, a traditional method, the Recurrent Neural Network (RNN) [5][6], was used for training. The dataset was compiled manually from the names of students at universities in Myanmar and currently contains over 4,000 names. The TensorFlow API released by Google was used as the primary API for this project.
Myanmar names have individual meanings (e.g., in the name “Kaung Myat,” “Kaung” means
“goodness” and “Myat” means “glory”). This consideration was integrated into the generation
process to ensure that the generated names are both meaningful and harmonious, taking into account
the connection between individual components.

Introduction
Nama Bayda is an online tool developed for operations related to Myanmar names, designed
to facilitate easier name generation for office and administrative tasks. It comprises four main
components: the Burmese Name Generator, the Burmese Name Romanizer, the Burmese Name Gender
Detector, and the Burmese Name Meaning Explainer.
The Burmese Name Generator utilizes AI technology to generate meaningful Myanmar
names. Names are essential and should not be chosen carelessly, as they are significant identifiers for
both animate and inanimate entities. This tool was created to assist users (especially foreigners in
need of Myanmar names) by providing thoughtful name suggestions. Additionally, this project
contributes to the field of Natural Language Processing (NLP) by enriching datasets related to the
Myanmar language, offering insights into how to manage Burmese script. By simply replacing the
dataset, the generator can be adapted to create names for various other categories (e.g., pet names,
street names, business names).
The Burmese Name Romanizer converts Myanmar names between Burmese and English
scripts. This feature was designed to streamline data handling in scenarios where large datasets are
involved (e.g., data analysis).
The Burmese Name Gender Detector uses AI to classify Myanmar names by gender (male
or female). It was developed with the intention of providing significant support in areas like data
analysis, where gender classification is often required.
Myanmar names are composed of individual syllables, each carrying a distinct meaning (e.g.,
in the name “Kaung Myat,” “Kaung” means “goodness” and “Myat” means “glory”). The Burmese
Name Meaning Explainer was created to help users (especially foreigners seeking Myanmar
names) understand the meanings behind each part of a name.
The tool has been deployed on Streamlit Cloud for testing and feedback, allowing users to
explore its features and provide suggestions for improvement.
• Streamlit Link: https://nama-bayda-ue44squejirjyddmylit3m.streamlit.app/
• GitHub Repository: https://github.com/Xeyn-X/Nama-Bayda
This project not only enhances accessibility to Myanmar names but also contributes to the
growing field of Burmese NLP by providing useful datasets and tools.

Methodology
1. Burmese Name Generator
The Burmese Name Generator in Nama Bayda is built using a traditional method, a Recurrent
Neural Network (RNN). The construction process is described below, and the code used to train the
model is also shared.
Kaggle: https://www.kaggle.com/code/heinhtetahkarmg/burmese-name-generator
1.1. Data Collection
To generate modern and appealing names, a list has been compiled using the names of
students from various universities in Myanmar. Currently, over 4,000 data entries have been
collected. The dataset link is provided below.
Dataset link: https://www.kaggle.com/datasets/heinhtetahkarmg/burmese-name
1.2. Data Preprocessing
Since Burmese text is not written with spaces between words like English, tokenizing at the character
level would split every letter apart (for example, “မြန်မာ” would become “မ ြ န ် မ ာ”). In addition,
character-level tokenization did not yield good results during model training. Tokenization is
therefore done at the syllable level. A syllable is a segment of speech separated by pronunciation
(for example, “ကျွန်တော်သည်” would be tokenized as “ကျွန် တော် သည်”). The names are first
tokenized this way [1][2].
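The syllable rule can be sketched with a single regular expression. This is a simplified version of the approach used by Ye Kyaw Thu's sylbreak tool, not necessarily the tokenizer used in Nama Bayda: a new syllable starts at a consonant that is not stacked under a virama (္) and not itself followed by an asat (်) or virama, or at a standalone character such as an independent vowel, Myanmar digit, or punctuation mark.

```python
import re

CONSONANTS = "\u1000-\u1021"   # က .. အ, used as a regex character range
ASAT = "\u103a"                # ်  kills the inherent vowel of the consonant before it
VIRAMA = "\u1039"              # ္  stacks the following consonant under the previous one
STANDALONE = "ဣဤဥဦဧဩဪဿ၌၍၏၀-၉၊။"  # independent vowels, symbols, digits, punctuation

# A syllable starts at a consonant that is not stacked (no virama before it)
# and is not killed (no asat/virama after it), or at any standalone character.
SYLLABLE_START = re.compile(
    rf"((?<!{VIRAMA})[{CONSONANTS}](?![{ASAT}{VIRAMA}])|[{STANDALONE}])"
)


def syllables(text: str) -> list[str]:
    """Insert a space before every syllable start, then split on whitespace."""
    return SYLLABLE_START.sub(r" \1", text).split()


print(syllables("ကျွန်တော်သည်"))  # ['ကျွန်', 'တော်', 'သည်']
```

Final consonants such as န in ကျွန် stay attached to their syllable because they are followed by an asat; this sketch does not handle Latin letters.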
After tokenization, the syllables are converted into integer sequences with a tokenizer that uses an
Out of Vocabulary (OOV) token. This conversion is needed so that Y can be predicted in terms of the
same syllable tokens (for example, X = “ကောင်းမြတ်”, Y = “မင်း”). The sequences are then
concatenated into one long stream, like a single large text, and the sequence length is set to 10 for
training. The data is split into X and Y, and both are converted into one-hot encodings. [3][4]
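The preprocessing pipeline (OOV indexing, concatenation into one stream, windows of length 10, one-hot encoding) can be sketched in NumPy. The `<OOV>` token name, giving it index 0, and closing each name with a newline token are assumptions; the original notebook most likely used the Keras `Tokenizer` with an `oov_token`, which numbers its indices differently.

```python
import numpy as np

SEQ_LEN = 10       # sequence length stated in the text
OOV = "<OOV>"      # hypothetical name of the out-of-vocabulary token
NEWLINE = "\n"     # marks the end of each name in the concatenated stream


def build_vocab(tokenized_names):
    """Map each syllable to an integer; OOV and newline get the first two indices."""
    vocab = {OOV: 0, NEWLINE: 1}
    for name in tokenized_names:
        for syl in name:
            vocab.setdefault(syl, len(vocab))
    return vocab


def to_stream(tokenized_names, vocab):
    """Concatenate all names into one index stream; unknown syllables map to OOV."""
    stream = []
    for name in tokenized_names:
        stream.extend(vocab.get(s, vocab[OOV]) for s in name)
        stream.append(vocab[NEWLINE])
    return stream


def make_xy(stream, vocab_size, seq_len=SEQ_LEN):
    """X = seq_len consecutive tokens, Y = the token that follows; both one-hot."""
    n = len(stream) - seq_len
    X = np.zeros((n, seq_len, vocab_size), dtype=np.float32)
    Y = np.zeros((n, vocab_size), dtype=np.float32)
    for i in range(n):
        for t, idx in enumerate(stream[i:i + seq_len]):
            X[i, t, idx] = 1.0
        Y[i, stream[i + seq_len]] = 1.0
    return X, Y
```

Every window in the stream becomes one training example, so names near each other in the list also provide context across name boundaries.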
1.3. Model Training
To build the model, an advanced RNN architecture known as LSTM is used. It includes a
Flatten layer, a Dense layer with a ReLU activation function, and another Dense layer with a
Softmax activation function. The Adam optimizer is applied, and categorical cross-entropy is used as
the loss function. Next, the model is trained with a batch size of 64 and 100 epochs.
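A Keras sketch of the layer stack described in Section 1.3 follows. The LSTM and Dense widths (128 and 256) are hypothetical, and `return_sequences=True` is our assumption so that the Flatten layer receives a whole sequence of LSTM outputs to flatten.

```python
import numpy as np
from tensorflow import keras


def build_generator(seq_len: int, vocab_size: int) -> keras.Model:
    """LSTM -> Flatten -> Dense(ReLU) -> Dense(Softmax); widths are illustrative."""
    model = keras.Sequential([
        keras.Input(shape=(seq_len, vocab_size)),              # one-hot syllable windows
        keras.layers.LSTM(128, return_sequences=True),          # one output per time step
        keras.layers.Flatten(),                                 # concatenate all time steps
        keras.layers.Dense(256, activation="relu"),
        keras.layers.Dense(vocab_size, activation="softmax"),   # distribution over next syllable
    ])
    model.compile(optimizer="adam", loss="categorical_crossentropy", metrics=["accuracy"])
    return model


# With X, Y from the preprocessing step and the stated hyperparameters:
# model = build_generator(10, vocab_size)
# model.fit(X, Y, batch_size=64, epochs=100)
```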
1.4. Result and Evaluation

With an accuracy of 0.8 and a loss of 0.5, the results are promising, so the evaluation phase
proceeds.
During the evaluation, a random sequence with a length of more than three names is selected
from the concatenated sequence to obtain X. Using this X, Y is predicted, and a new name is
generated. The prediction continues until the Enter (\n) character is reached, signaling the completion
of the name. The resulting output is shown in Figure 1.4(a).
Figure 1.4 (a) The Result of Generated Name
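The evaluation loop can be sketched as a function that slides a window of syllable indices forward until the model predicts the newline token. `predict_next` is a hypothetical wrapper around the trained model; seeding from a random window that ends on a newline (so the model starts a fresh name) and greedy `argmax` decoding are our assumptions.

```python
import random

import numpy as np


def generate_name(predict_next, stream, inv_vocab, newline_idx,
                  seq_len=10, max_len=20, rng=None):
    """Generate one name from a random seed window of the concatenated stream.

    predict_next: callable mapping a list of seq_len indices to a probability
    vector over the vocabulary (hypothetical wrapper around the model).
    """
    rng = rng or random.Random()
    # Candidate seed windows end exactly on a newline, i.e. at a name boundary.
    ends = [i for i in range(seq_len - 1, len(stream)) if stream[i] == newline_idx]
    end = rng.choice(ends)
    window = list(stream[end - seq_len + 1:end + 1])
    syllables = []
    for _ in range(max_len):                  # safety cap on name length
        nxt = int(np.argmax(predict_next(window)))
        if nxt == newline_idx:                # newline completes the name
            break
        syllables.append(inv_vocab[nxt])
        window = window[1:] + [nxt]           # slide the window forward by one
    return "".join(syllables)
```

With the Keras model, `predict_next` could be `lambda w: model.predict(np.eye(vocab_size)[w][None], verbose=0)[0]`. Sampling from the probabilities instead of taking `argmax` would give more varied names from the same seed.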
Since the generated names yielded good results, the model was saved in the Keras file
format. Next, Streamlit was used to build and integrate the user interface. To improve functionality,
filters were added to specify the number of names to generate and to distinguish between male and
female names. Additionally, the meanings of the new names were included alongside the generated
names.
The sections for gender classification and the display of name meanings are detailed below.
The final output is shown in Figure 1.4(b).

Figure 1.4(b) The Final Result of Burmese Name Generator
2. Burmese Name Gender Detector
Another part of the project, the Burmese name gender detector, is also built using the
traditional method of Recurrent Neural Network (RNN). The construction process follows the steps
outlined below, and the code used to train the model is also shared.
Kaggle: https://www.kaggle.com/code/heinhtetahkarmg/gender-detection-for-burmese-name
2.1. Data Collection
A new dataset has been compiled by adding a gender column (male/female) to the names
from the Burmese Name Generator dataset mentioned above. The updated dataset is available via the
link provided below.
Dataset link: https://www.kaggle.com/datasets/heinhtetahkarmg/burmese-name-with-gender
2.2. Data Preprocessing
First, following the method described for the Burmese Name Generator, the names are
tokenized at the syllable level. The dataset is then split into a training dataset and a testing dataset.
Next, the syllable-tokenized names in the training dataset are converted into integer sequences with
the Out of Vocabulary (OOV) tokenizer, so that they can be used to train the model.
2.3. Model Training

To build the model, an advanced RNN architecture, LSTM, is used. It includes a Flatten
layer, two Dense layers with a ReLU activation function, and one Dense layer with a Sigmoid
activation function. The Adam optimizer is applied, and binary cross-entropy is used as the loss
function. Next, the model is trained for 20 epochs.
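A Keras sketch of the gender model from Section 2.3 follows. The text does not say how the integer sequences reach the LSTM, so the padding length and the `Embedding` layer are assumptions, as are all layer widths and which gender is encoded as 1.

```python
import numpy as np
from tensorflow import keras

MAX_SYLLABLES = 6  # hypothetical cap on syllables per name


def pad(seqs, max_len=MAX_SYLLABLES):
    """Right-pad (and truncate) integer syllable sequences with 0."""
    out = np.zeros((len(seqs), max_len), dtype="int32")
    for i, s in enumerate(seqs):
        s = s[:max_len]
        out[i, :len(s)] = s
    return out


def build_gender_model(vocab_size, max_len=MAX_SYLLABLES):
    """LSTM -> Flatten -> 2 x Dense(ReLU) -> Dense(Sigmoid) binary classifier."""
    model = keras.Sequential([
        keras.Input(shape=(max_len,)),
        keras.layers.Embedding(vocab_size, 32),        # assumption: indices -> dense vectors
        keras.layers.LSTM(64, return_sequences=True),
        keras.layers.Flatten(),
        keras.layers.Dense(64, activation="relu"),
        keras.layers.Dense(32, activation="relu"),
        keras.layers.Dense(1, activation="sigmoid"),   # probability that the label is 1
    ])
    model.compile(optimizer="adam", loss="binary_crossentropy", metrics=["accuracy"])
    return model


# model = build_gender_model(vocab_size)
# model.fit(pad(train_sequences), train_labels, epochs=20)
```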
2.4. Result and Evaluation
The model achieved an accuracy of 0.99 and a loss of 0.004 on the training dataset. On the
testing dataset, the accuracy score was 94.8%, precision score was 94.7%, recall score was 94.7%,
and F1 score was 94.7%, as shown in Figure 2.4(a). Next, the evaluation phase continues.
Figure 2.4(a) The result of Testing Dataset
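The four test scores follow directly from the confusion counts. This pure-Python sketch shows the definitions for binary labels, where predictions would come from thresholding the sigmoid output (for example at 0.5, an assumption). scikit-learn's `accuracy_score`, `precision_score`, `recall_score`, and `f1_score` compute the same values with their default binary averaging; which averaging the project used is not stated.

```python
def binary_scores(y_true, y_pred, positive=1):
    """Accuracy, precision, recall and F1 computed from confusion counts."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(t == positive and p == positive for t, p in pairs)  # true positives
    fp = sum(t != positive and p == positive for t, p in pairs)  # false positives
    fn = sum(t == positive and p != positive for t, p in pairs)  # false negatives
    accuracy = sum(t == p for t, p in pairs) / len(pairs)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
    return {"accuracy": accuracy, "precision": precision, "recall": recall, "f1": f1}
```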
Since the results were promising, the model was saved in the Keras file format. The model was
then integrated into the user interface described earlier. The final output is shown in Figure 2.4(b).
Figure 2.4(b) The Final Result of Gender Detector
3. Burmese Name Romanizer

Another part of the project, the Burmese name romanizer, does not require AI technology and
instead relies on a formula-based approach. The construction process follows the steps outlined
below. The code has been uploaded to GitHub, and the link is shared in the Introduction section.
3.1. Data Collection
The word index produced by the OOV tokenizer in the previous part (e.g., ‘အောင်’, ‘စု’, etc.) was
saved in a CSV file. This file was updated by adding an “English Romanize” column to create a new
list. An example is shown in Figure 3.1(a). The dataset has also been uploaded to GitHub.
Figure 3.1(a) Updated Dataset

3.2. Data Preprocessing
Firstly, some Burmese names must be written as a single word when romanized (e.g., ‘အာကာ’
becomes ‘Arkar’). These names were therefore categorized as special names and added to a new
list, which is saved as a CSV file, as shown in Figure 3.2(a).

Figure 3.2(a) Dataset of Special Names
In Burmese Unicode, some syllables can be typed in more than one way (e.g., ‘သိမ့်’ can be
written with the asat (်) followed by the dot below (့), or vice versa). These variants are
standardized into a single form. After that, the names are tokenized by syllable and then romanized.
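The normalization and romanization steps can be sketched with two small lookup tables. The table contents are hypothetical excerpts (only ‘အာကာ’ → ‘Arkar’ and Kaung Myat come from this document), special names are assumed to span two syllables, and the input is assumed to be already split into syllables. The fixed order chosen for the asat/dot-below pair is also an assumption; any single order works as long as the lookup tables are normalized the same way.

```python
ASAT, DOT_BELOW = "\u103a", "\u1037"   # ်  and  ့

# Hypothetical excerpts of the Section 3.1 and 3.2 tables.
SYLLABLE_ROMAN = {"ကောင်း": "Kaung", "မြတ်": "Myat", "အောင်": "Aung", "စု": "Su"}
SPECIAL_ROMAN = {("အာ", "ကာ"): "Arkar"}   # syllable pairs written as one word


def normalize(text: str) -> str:
    """Rewrite 'asat + dot below' as 'dot below + asat' so both typings match."""
    return text.replace(ASAT + DOT_BELOW, DOT_BELOW + ASAT)


def romanize(syllables: list[str]) -> str:
    """Romanize a syllable list: special pairs first, then one syllable at a time."""
    words, i = [], 0
    while i < len(syllables):
        pair = tuple(syllables[i:i + 2])
        if pair in SPECIAL_ROMAN:
            words.append(SPECIAL_ROMAN[pair])
            i += 2
        else:
            # Syllables missing from the table pass through unchanged.
            words.append(SYLLABLE_ROMAN.get(syllables[i], syllables[i]))
            i += 1
    return " ".join(words)
```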
For reverse romanization (English to Burmese), the columns of the dataset in Figure 3.1(a) were
swapped to create a new dataset, and the names were then converted back into Burmese script.
3.3. Integration
A filter has been added for both Burmese-to-English and English-to-Burmese romanization.
For the Burmese-to-English romanization section, a CSV file can be uploaded, and the romanized
results can be downloaded as a CSV file, allowing users to save it to their own devices.
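The CSV round trip in the Burmese-to-English section can be sketched with the standard `csv` module. The `name` column header, the added `romanized` column, and the UTF-8-with-BOM encoding are assumptions; `romanize_name` stands for any function that romanizes one name.

```python
import csv
import io


def romanize_csv(csv_bytes: bytes, romanize_name, column="name") -> bytes:
    """Read an uploaded CSV, add a 'romanized' column, return CSV bytes for download."""
    reader = csv.DictReader(io.StringIO(csv_bytes.decode("utf-8-sig")))
    fieldnames = list(reader.fieldnames) + ["romanized"]
    out = io.StringIO()
    writer = csv.DictWriter(out, fieldnames=fieldnames)
    writer.writeheader()
    for row in reader:
        row["romanized"] = romanize_name(row[column])
        writer.writerow(row)
    # A BOM helps spreadsheet programs detect UTF-8 for Burmese text.
    return out.getvalue().encode("utf-8-sig")
```

In Streamlit, the input bytes could come from `st.file_uploader(...).getvalue()`, and the returned bytes could be passed as `data` to `st.download_button`.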
In the English-to-Burmese romanization section, several Burmese syllables share the same English
spelling, so a single romanized name can have multiple possible outcomes. For example, the name
“San” can correspond to ‘စံ’, ‘စန်း’, ‘စမ်း’, ‘စန်’, or ‘စမ်’. All of these possible results are displayed
for the user. Uppercase and lowercase input is also normalized to maintain consistency. Finally,
these features were integrated into the previously described user interface.
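Swapping the columns of the Section 3.1 table turns it into a one-to-many map from English spellings to Burmese syllables. This sketch uses a hypothetical excerpt of that table (the five ‘San’ syllables are the ones listed in this section), lowercases the input to normalize case, and expands every combination of candidates with `itertools.product`.

```python
from collections import defaultdict
from itertools import product

# Hypothetical (Burmese syllable, English romanization) rows.
ROWS = [("စံ", "San"), ("စန်း", "San"), ("စမ်း", "San"), ("စန်", "San"),
        ("စမ်", "San"), ("ကောင်း", "Kaung"), ("မြတ်", "Myat")]


def build_reverse(rows):
    """Swap the columns: lowercased romanization -> all matching Burmese syllables."""
    reverse = defaultdict(list)
    for burmese, roman in rows:
        reverse[roman.lower()].append(burmese)
    return reverse


def to_burmese(name, reverse):
    """Candidate syllables for each space-separated part of an English name."""
    return [reverse.get(part.lower(), []) for part in name.split()]


def candidate_names(name, reverse):
    """Every full Burmese name the parts can combine into (empty if a part is unknown)."""
    return ["".join(combo) for combo in product(*to_burmese(name, reverse))]
```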
4. Burmese Name Meaning Explainer

Another part of the project, the Burmese name meaning explainer, does not require AI
technology and instead relies on a formula-based approach. The construction process follows the
steps outlined below. The code has been uploaded to GitHub, and the link is shared in the
Introduction section.
4.1. Data Collection
The word index produced by the OOV tokenizer in the previous part (e.g., ‘အောင်’, ‘စု’, etc.) was
compiled together with the meaning of each individual syllable. These entries were stored as a list
in a JSON file, which has also been uploaded to GitHub.
4.2. Data Preprocessing and Integration
First, Burmese names are tokenized by syllable. Some Burmese names have no meaning syllable by
syllable and gain their meaning only when combined (e.g., ‘ရတနာ’ means ‘Treasure’ only as a
whole). Therefore, a dictionary list was created to map these combined syllables to their meanings.
Afterward, this functionality was integrated into the user interface mentioned above, allowing
users to view the meanings of these combined syllables.
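The meaning lookup can be sketched as a greedy longest match: at each position, try a combined word from the dictionary list first, and otherwise fall back to the per-syllable meanings from the JSON file. The tiny tables below are hypothetical stand-ins built only from examples in this document, and syllables missing from the data return `None`, matching Limitation 6.

```python
import json

# Hypothetical stand-ins for the project's JSON file and combined-word list.
SYLLABLE_MEANINGS = json.loads('{"ကောင်း": "goodness", "မြတ်": "glory"}')
COMBINED = {("ရ", "တ", "နာ"): "Treasure"}
MAX_COMBINED = max(len(key) for key in COMBINED)


def explain(syllables):
    """Return (word, meaning) pairs, preferring the longest combined word."""
    result, i = [], 0
    while i < len(syllables):
        for size in range(min(MAX_COMBINED, len(syllables) - i), 1, -1):
            key = tuple(syllables[i:i + size])
            if key in COMBINED:
                result.append(("".join(key), COMBINED[key]))
                i += size
                break
        else:  # no combined word starts here: explain a single syllable
            syl = syllables[i]
            result.append((syl, SYLLABLE_MEANINGS.get(syl)))  # None = not in dataset
            i += 1
    return result
```

In the real tool the per-syllable table would be read with `json.load` from the file on GitHub, opened with UTF-8 encoding.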

Challenges and Limitations
1. Challenges
The main challenges encountered were as follows:
1. Data Quality: While the dataset was a strength, it was also a weakness. The names in the
dataset were not clean: they contained incorrect fonts, misspellings, and a high proportion of
ethnic-minority and old-fashioned names, which led to poor or inaccurate results. The data
therefore required cleaning before processing.
2. Random Name Generation: When the start of a name was picked at random from the word
index, the generated names were often incorrect. To address this, a short sequence a little over
three names long was randomly selected from the concatenated sequence and used as the seed,
which yielded correct and meaningful names.
3. Gender Data Handling: The gender labels (male, female) had to be entered one row at a time,
which took considerable time.
4. Model Loading and Data Reuse: When loading the saved model for prediction, it became
apparent that some of the data used during training was needed again. The necessary data was
therefore saved in .pkl format and reused.
5. Handling of Myanmar Unicode: Some Myanmar syllables have multiple valid ways of being
typed, so these had to be identified and standardized before processing.
6. Romanization from English to Myanmar: Because several Myanmar syllables share the same
English spelling, a name can have multiple possible interpretations. Only these possible variations
are displayed, and this was accepted as a limitation.
7. Meaning of Myanmar Names: Some Myanmar names have no meaning syllable by syllable and
gain a meaning only when combined. These names had to be identified and processed separately.
8. Challenges with Meaning Interpretation: Some Myanmar names either have no clear meaning
(being purely aesthetic) or are derived from languages such as Pali, English, or Hindi. This made it
difficult to interpret their meanings accurately.
These challenges required careful data handling, cleaning, and adjustments to ensure the
accuracy and consistency of results.
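Challenge 4 (reusing training-time data when loading the saved model) can be sketched with `pickle`. Which objects the project actually stored is not stated; the vocabulary and sequence length here are assumptions. Only unpickle files from a trusted source, since loading a pickle can execute arbitrary code.

```python
import pickle
from pathlib import Path


def save_artifacts(path, vocab, seq_len):
    """Write the preprocessing state inference needs (the model is saved separately)."""
    Path(path).write_bytes(pickle.dumps({"vocab": vocab, "seq_len": seq_len}))


def load_artifacts(path):
    """Read back the vocabulary and sequence length saved at training time."""
    data = pickle.loads(Path(path).read_bytes())
    return data["vocab"], data["seq_len"]
```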

2. Limitations
The main limitations encountered were as follows:
1.AI’s Nature: AI, by design, works on probability-based predictions, meaning that results are
never 100% accurate. Therefore, some names may not be classified or generated correctly,
especially when dealing with rare or uncommon data.
2.Ethnic and Outdated Names: The inclusion of ethnic and outdated names in the dataset
often led to poor or inaccurate results. These names may not follow common patterns or
conventions, making it harder for the model to generate or classify them correctly.
3.Names with Dual Gender: Some names are used for both male and female, causing
classification issues and leading to inaccurate predictions of gender in those cases.
4.Romanization Limitations: The Romanization process relied on a formula, which means
that names or characters not present in the dataset, especially ethnic names or lesser-known
variants, could not be Romanized correctly. This limits the system’s ability to handle new or
unseen names.
5.Romanization from English to Myanmar: For the English-to-Myanmar Romanization, names
that share similar characters or pronunciations can produce multiple possible outputs, and only a
limited set of these variations is displayed. This can lead to ambiguity or inaccuracies in some
cases.
6.Meaning Interpretation Limitations: Similarly, the meanings of names were interpreted with a
formula-based approach. As a result, the meanings of names or words not found in the dataset
could not be explained accurately, and the system could not handle such unseen names or words
effectively.
These limitations highlight the challenges of working with AI-based systems, particularly those
that rely on existing data and formulas. While the system works well for common or previously
seen cases, it may fall short for rare or unseen ones.
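Limitations 4 and 5 can be illustrated with a small table-driven converter. This is a hedged sketch, not the project's actual formula. It assumes the Latin name has already been split into syllables, and `SYLLABLES`, `to_myanmar`, and the `limit` parameter are hypothetical names with only a few illustrative entries. A syllable missing from the table is reported rather than converted, which mirrors limitation 4. The Cartesian product of spelling options shows how one Latin spelling fans out into several Myanmar candidates, of which only the first few are kept, which mirrors limitation 5.

```python
from itertools import islice, product

# Hypothetical Latin-to-Myanmar syllable table (illustrative entries only).
# Some Latin syllables map to more than one Myanmar spelling.
SYLLABLES = {
    "aung": ["အောင်"],
    "hein": ["ဟိန်း"],
    "htet": ["ထက်"],
    "kyaw": ["ကျော်"],
    "thu": ["သူ", "သု"],
}


def to_myanmar(latin_syllables, limit=3):
    """Return (candidates, unknown) for a pre-segmented Latin name.

    If any syllable is missing from the table, no candidates are built
    and the missing syllables are returned instead.
    """
    unknown = [s for s in latin_syllables if s.lower() not in SYLLABLES]
    if unknown:
        return [], unknown
    options = [SYLLABLES[s.lower()] for s in latin_syllables]
    # product() enumerates every combination lazily; islice() keeps only
    # the first `limit`, so only a limited set of variations is shown.
    combos = islice(product(*options), limit)
    return ["".join(combo) for combo in combos], []
```

For example, `to_myanmar(["Kyaw", "Thu"])` yields two candidates because "thu" has two spellings in the table. `to_myanmar(["Aung", "Zaw"])` yields none and reports "Zaw" as unknown.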

Conclusion
In summary, the field of name linguistics (or onomastics) plays a vital role in understanding
and processing Burmese names. This work is an important step toward addressing the challenges of
applying AI technologies to the Burmese language, where existing resources and tools are often
limited. It lays a foundation for resolving these issues and opens the door to broader
applications.
Furthermore, the system can offer significant support in various domains, such as in office
management systems or data analysis, where name handling and gender classification are crucial. For
the field of Natural Language Processing (NLP), this work contributes to the development of
Burmese-specific datasets and provides a reference for understanding the unique characteristics of
the Burmese language. It can also be a valuable tool for researchers and students who are studying
Burmese linguistics, offering insights into language processing techniques that are particularly
tailored to the language’s nuances.
Future Work
The next steps focus on developing further Burmese language-based applications. Beyond RNNs,
other methods such as Transformers and other attention-based models could be explored to improve
performance and address a wider range of language processing challenges.

References
7.Myanmar Syllable https://www.slideshare.net/slideshow/myanmar-syllable-239356703/239356703
8.myWord: Burmese Syllable Segmentation Tool by Mr. Ye Kyaw Thu https://github.com/ye-kyaw-thu/myWord
9.Arabic Name Generator with RNN in Keras by Ouassim Adnane https://www.kaggle.com/code/ishivinal/arabic-name-generator/notebook
10.Indian Baby Names Generator: Text-Processing+RNN by Meet Ranoliya https://www.kaggle.com/code/meemr5/indian-baby-names-generator-text-processing-rnn
11.Introduction to Recurrent Neural Networks by GeeksforGeeks https://www.geeksforgeeks.org/introduction-to-recurrent-neural-network/
12.What is LSTM - Long Short Term Memory? by GeeksforGeeks https://www.geeksforgeeks.org/deep-learning-introduction-to-long-short-term-memory/
