Analyzing Textual Data for Fatality Classification in Afghanistan’s Armed Conflicts: A BERT Approach

peacefulquraniqbal 14 views 18 slides Mar 12, 2025

About This Presentation

ML for BERT Approach


Slide Content

Analyzing Textual Data for Fatality Classification in Afghanistan’s Armed Conflicts: A BERT Approach
Hikmatullah Mohammadi, Ziaullah Momand, Parwin Habibi, Nazifa Ramaki, Bibi Storay Fazli, Sayed Zobair Rohany, and Iqbal Samsoor
Faculty of Computer Science, Kabul University, Kabul, Afghanistan
Thursday, October 26, 2023
Event: 14th Joint Symposium on Computational Intelligence (JSCI14)
The preprint is available at: https://arxiv.org/abs/2310.08653

Agenda
- Introduction
- Background and Rationale
- Research Questions
- Research objectives
- Methodology
- Data Collection and Preparation
- Dataset Preparation
- Model Development and Training
- Model Validation and Performance Evaluation
- Q&A
Thursday, November 30, 2023

Agenda
- Introduction
  - Background information
  - Problem statement
  - Research objective
- Methodology
  - Data preparation
  - EDA
  - Model architecture
  - Model’s Hyperparameters
  - Evaluation metrics
- Results
- Discussions and future works
- Conclusion
- Q&A

Intro - Background info
- Afghanistan's history is marked by conflicts and power struggles.
- The US intervened in Afghanistan after the September 2001 terrorist attacks.
- The withdrawal of American troops from Afghanistan began in 2014.
- Armed conflicts resulted in high casualties among Afghan security forces.
- Thousands of civilians lost their lives between 2014 and 2019.
- The Taliban's takeover of Kabul in August 2021 led to the collapse of the republican government, a major change.
- Over 1,000 civilian deaths have been reported since the Taliban's rise to power. According to the UN, this is a decrease compared to the previous equivalent period.
- Conflicts in Afghanistan pose a fatal threat to human lives.
All of this information is cited in the original paper: https://arxiv.org/abs/2310.08653

Intro - Problem Statement
- Devastating impacts of armed conflicts on human lives in Afghanistan
- Lack of research on event fatality analysis in Afghanistan's armed conflicts

Intro - Research Objective
- Analyze diverse textual data to identify factors influencing conflict outcomes
- Build a robust machine learning model to classify armed conflict events in Afghanistan, based on their textual descriptions, as:
  - Fatal: resulting in casualties
  - Non-fatal: not resulting in casualties

Methodology - Data Preparation
- Utilized the Armed Conflict Location & Event Data Project (ACLED) dataset
- Focused on events occurring from August 2021 to March 2023
- Extracted the ‘notes’ feature, which provides textual descriptions of the events
- Dropping duplicates resulted in 4752 observations
- Converted the fatalities feature from continuous to binary: set non-zero fatalities to 1 and the rest to 0
- Ensured robust evaluation by dividing the dataset into three sets: 3826 for training, 426 for validation, and 500 for testing
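The preparation steps above can be sketched in plain Python. This is only an illustration under stated assumptions: the slides do not say how duplicates were identified or how the split was drawn, so deduplicating on the `notes` text and using a seeded random shuffle are guesses, and the records below are synthetic placeholders rather than ACLED data.

```python
import random


def prepare_dataset(events, n_val=426, n_test=500, seed=42):
    """Deduplicate notes, binarize fatalities, and split into train/val/test."""
    seen = set()
    examples = []
    for event in events:
        note = event["notes"]
        if note in seen:  # assumption: duplicates are identical descriptions
            continue
        seen.add(note)
        label = 1 if event["fatalities"] > 0 else 0  # non-zero -> 1, else 0
        examples.append((note, label))

    # Assumption: a seeded random split; the slides give only the set sizes.
    rng = random.Random(seed)
    rng.shuffle(examples)
    test = examples[:n_test]
    val = examples[n_test:n_test + n_val]
    train = examples[n_test + n_val:]
    return train, val, test


# Synthetic stand-in records (NOT real ACLED data), plus one duplicate.
events = [{"notes": f"event {i}", "fatalities": i % 3} for i in range(4752)]
events.append({"notes": "event 0", "fatalities": 0})

train, val, test = prepare_dataset(events)
print(len(train), len(val), len(test))
```

With 4752 unique descriptions, the remaining examples after reserving 500 for testing and 426 for validation match the 3826 training examples stated on the slide.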

Methodology - EDA
Table 1. Statistical descriptions of the length of event descriptions in the dataset

Types                 Min  Mean   Max
Number of characters  72   245.5  920
Number of words       14   40.5   147
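Statistics of the kind shown in Table 1 could be computed as in the sketch below. Splitting on whitespace to count words is an assumption (the slides do not state the tokenization), and the two sample notes are invented for illustration.

```python
from statistics import mean


def length_stats(notes):
    """Return (min, mean, max) of character and word counts per description."""
    char_lengths = [len(note) for note in notes]
    word_lengths = [len(note.split()) for note in notes]  # whitespace tokens
    return {
        "characters": (min(char_lengths), mean(char_lengths), max(char_lengths)),
        "words": (min(word_lengths), mean(word_lengths), max(word_lengths)),
    }


# Invented sample descriptions, not taken from the dataset.
sample_notes = [
    "An armed clash was reported in the district.",
    "Residents held a protest.",
]
stats = length_stats(sample_notes)
print(stats)
```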

Methodology - EDA cont’d.
Fig. 1. A bar chart of the top 10 most common words in the fatal events descriptions
Fig. 2. A bar chart of the top 10 most common words in the non-fatal events descriptions
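Word frequencies like those behind Figs. 1 and 2 could be obtained with a simple counter, run once per class. The lowercasing, the regular-expression tokenizer, and the small stop-word set are all assumptions made for this sketch, and the example sentences are invented.

```python
import re
from collections import Counter

# Hypothetical stop-word list; the slides do not say whether one was used.
STOP_WORDS = {"a", "an", "the", "in", "of", "on", "and", "to", "was", "were", "by"}


def top_words(notes, k=10):
    """Return the k most frequent non-stop-word tokens with their counts."""
    counts = Counter()
    for note in notes:
        tokens = re.findall(r"[a-z']+", note.lower())
        counts.update(token for token in tokens if token not in STOP_WORDS)
    return counts.most_common(k)


# Invented sentences standing in for descriptions of one class.
example_notes = [
    "Taliban forces killed two men in the district.",
    "Unidentified gunmen killed a man; Taliban forces arrived later.",
]
top3 = top_words(example_notes, k=3)
print(top3)
```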

Methodology - EDA cont’d.
Fig. 3. The word cloud of event descriptions in Afghanistan

Methodology - Model architecture
- Implemented BERT (Bidirectional Encoder Representations from Transformers).
- Utilized the "small_bert/bert_en_uncased_L-4_H-512_A-8/2" model and the "bert_en_uncased_preprocess/3" preprocessing block.
- BERT encodes raw text input into contextualized text embeddings.
- Employed a single-neuron Dense layer with sigmoid activation as the final classification layer.
- Fig. 4 on the next slide illustrates the process.

Methodology - Model architecture cont’d.
Fig. 4. The model architecture and flow
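The final classification step can be illustrated without any deep learning library. The sketch below is not the authors' implementation: it only shows what a single-neuron dense layer with sigmoid activation computes on a 512-dimensional embedding (the size implied by `H-512` in the encoder name). The random embedding, the untrained weights, and the 0.5 decision threshold are placeholders assumed for illustration.

```python
import math
import random

HIDDEN_SIZE = 512  # implied by "H-512" in the encoder name


def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))


def classify(embedding, weights, bias, threshold=0.5):
    """Single-neuron dense layer + sigmoid: returns (probability, label)."""
    z = sum(w * x for w, x in zip(weights, embedding)) + bias
    probability = sigmoid(z)
    # Assumption: probabilities >= 0.5 are labeled fatal (1).
    return probability, int(probability >= threshold)


rng = random.Random(0)
# Stand-in for the encoder's output; a real run would use the BERT embedding.
embedding = [rng.uniform(-1.0, 1.0) for _ in range(HIDDEN_SIZE)]
# Untrained placeholder parameters; training would learn these values.
weights = [rng.gauss(0.0, 0.02) for _ in range(HIDDEN_SIZE)]

probability, label = classify(embedding, weights, bias=0.0)
print(probability, label)
```

Dropout is omitted here because it is only active during training; at prediction time the dense layer sees the embedding unchanged.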

Methodology - Model’s Hyperparameters
- Configured the number of training epochs to be 10.
- Utilized the AdamW optimizer with Binary Cross Entropy loss for model training.
- AdamW is a modified variant of the Adam optimizer that separates weight decay from the gradient update.
- Set the initial learning rate to 3e-5 (0.00003) for the optimization process.
- Implemented a warm-up strategy, with the warm-up phase covering 10% of the total training steps.
- Incorporated a Dropout layer with a rate of 0.3 to mitigate overfitting risks.
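One common way to realize a warm-up covering 10% of the training steps is a linear ramp to the configured learning rate followed by a linear decay to zero. The sketch below shows that schedule. The decay shape and the batch size of 32 are assumptions, since the slides specify only the learning rate (3e-5), the warm-up fraction, the epoch count, and the training set size.

```python
import math


def warmup_linear_decay(step, total_steps, peak_lr=3e-5, warmup_fraction=0.1):
    """Linear warm-up to peak_lr, then linear decay to zero (assumed shape)."""
    warmup_steps = max(1, int(total_steps * warmup_fraction))
    if step < warmup_steps:
        return peak_lr * step / warmup_steps
    decay_steps = max(1, total_steps - warmup_steps)
    return peak_lr * max(0.0, (total_steps - step) / decay_steps)


BATCH_SIZE = 32  # assumption: not stated in the slides
EPOCHS = 10
TRAIN_SIZE = 3826

steps_per_epoch = math.ceil(TRAIN_SIZE / BATCH_SIZE)
total_steps = steps_per_epoch * EPOCHS

for step in (0, 60, 120, 660, total_steps):
    print(step, warmup_linear_decay(step, total_steps))
```

Under these assumptions there are 1200 training steps, so the warm-up phase spans the first 120 of them.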

Methodology - Evaluation Metrics
- Evaluation metrics used include accuracy, precision, recall, and F1 score.
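For reference, the four metrics can be computed from true and predicted binary labels as in this minimal sketch, which treats the fatal class (1) as the positive class. The label vectors are invented toy values.

```python
def binary_metrics(y_true, y_pred):
    """Accuracy, precision, recall, and F1 with label 1 as the positive class."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == 1 and p == 1)
    tn = sum(1 for t, p in pairs if t == 0 and p == 0)
    fp = sum(1 for t, p in pairs if t == 0 and p == 1)
    fn = sum(1 for t, p in pairs if t == 1 and p == 0)

    accuracy = (tp + tn) / len(pairs)
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    f1 = (2 * precision * recall / (precision + recall)
          if (precision + recall) else 0.0)
    return {"accuracy": accuracy, "precision": precision,
            "recall": recall, "f1": f1}


# Toy labels for illustration only.
y_true = [1, 1, 1, 0, 0, 1, 0, 0]
y_pred = [1, 1, 0, 0, 0, 1, 1, 0]
metrics = binary_metrics(y_true, y_pred)
print(metrics)
```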

Results
Table 2. BERT-based Afghanistan event fatality classifier’s performance

No.  Subset          Accuracy  Precision  Recall  F1 Score
1    Validation Set  98.12%    97.9%      98.73%  98.31%
2    Test Set        98.8%     99.6%      98.05%  98.82%
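As a quick consistency check, the F1 scores in Table 2 can be recomputed from the reported precision and recall, since F1 is their harmonic mean. The snippet uses only the percentages from the table; the function name is chosen for this sketch.

```python
def f1_from(precision, recall):
    """Harmonic mean of precision and recall (same units as the inputs)."""
    return 2 * precision * recall / (precision + recall)


# Percentages as reported in Table 2.
reported = {
    "Validation Set": {"precision": 97.9, "recall": 98.73, "f1": 98.31},
    "Test Set": {"precision": 99.6, "recall": 98.05, "f1": 98.82},
}

recomputed = {
    name: round(f1_from(row["precision"], row["recall"]), 2)
    for name, row in reported.items()
}
print(recomputed)
```

Both recomputed values agree with the reported F1 scores to two decimal places.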

Discussions & future works
- Addressed the scarcity of research on event fatality prediction in Afghanistan.
- The model's performance benefited from BERT's ability to capture contextual patterns.
- Utilized the ACLED dataset; its diversity and inclusion of various sources support the model's generalization.
- Required minimal preprocessing in comparison to previous approaches.
- Identified significant patterns, such as the prevalence of 'Taliban' in event descriptions.
- The model's robustness enables applications in resource allocation, policymaking, and humanitarian aid.
- Future work: enhance the current approach and research tailored event severity scoring that considers multiple dimensions for assessing event severity.

Conclusion
- Limited research on event fatality prediction in Afghanistan despite ongoing conflicts and severe impact.
- Developed a machine learning-based text classification approach for accurate prediction of fatality in Afghanistan's armed conflicts.
- Utilized the ACLED dataset with comprehensive descriptions of armed conflicts from August 2021 to March 2023.
- Leveraged the BERT model's ability to capture contextual information effectively.
- Achieved robust performance with high evaluation metrics on the validation and test sets.
- Minimal preprocessing required, as the model accepts raw text for predicting event fatality.
- The model's strength enables implementation in resource allocation, policymaking, and humanitarian aid efforts.
- Pioneering effort in event fatality prediction, filling a significant research void in Afghanistan.
- Insights gained serve as a foundation for future endeavors, focusing on event severity scoring.

Thank you! Your questions are valuable and can lead to deeper discussions and new insights.