Machine Learning In Information And Communication Technology Proceedings Of Icict 2021 Smit Hiren Kumar Deva Sarma

ljubavsallau 7 views 78 slides May 17, 2025


Lecture Notes in Networks and Systems 498
Hiren Kumar Deva Sarma
Vincenzo Piuri
Arun Kumar Pujari   Editors
Machine Learning in Information and Communication Technology
Proceedings of ICICT 2021, SMIT

Lecture Notes in Networks and Systems 
Volume  498 
Series Editor 
Janusz  Kacprzyk, Systems  Research  Institute,  Polish  Academy  of  Sciences, 
Warsaw,  Poland 
Advisory Editors 
Fernando  Gomide, Department  of  Computer  Engineering  and  Automation—DCA, 
School  of  Electrical  and  Computer  Engineering—FEEC,  University  of 
Campinas—UNICAMP,  São  Paulo,  Brazil 
Okyay  Kaynak, Department  of  Electrical  and  Electronic  Engineering, 
Bogazici  University,  Istanbul,  Turkey 
Derong  Liu, Department  of  Electrical  and  Computer  Engineering,  University  of 
Illinois  at  Chicago,  Chicago,  USA 
Institute  of  Automation,  Chinese  Academy  of  Sciences,  Beijing,  China 
Witold  Pedrycz, Department  of  Electrical  and  Computer  Engineering,  University  of 
Alberta,  Alberta,  Canada 
Systems  Research  Institute,  Polish  Academy  of  Sciences,  Warsaw,  Poland 
Marios  M.  Polycarpou, Department  of  Electrical  and  Computer  Engineering, 
KIOS  Research  Center  for  Intelligent  Systems  and  Networks,  University  of  Cyprus,  Nicosia,  Cyprus 
Imre  J.  Rudas, Óbuda  University,  Budapest,  Hungary 
Jun  Wang, Department  of  Computer  Science,  City  University  of  Hong  Kong, 
Kowloon,  Hong  Kong

The  series  “Lecture  Notes  in  Networks  and  Systems”  publishes  the  latest 
developments  in  Networks  and  Systems—quickly,  informally  and  with  high  quality. 
Original  research  reported  in  proceedings  and  post-proceedings  represents  the  core 
of  LNNS. 
Volumes  published  in  LNNS  embrace  all  aspects  and  subfields  of,  as  well  as  new 
challenges  in,  Networks  and  Systems. 
The  series  contains  proceedings  and  edited  volumes  in  systems  and  networks, 
spanning  the  areas  of  Cyber-Physical  Systems,  Autonomous  Systems,  Sensor  Networks,  Control  Systems,  Energy  Systems,  Automotive  Systems,  Biological  Systems,  Vehicular  Networking  and  Connected  Vehicles,  Aerospace  Systems, 
Automation,  Manufacturing,  Smart  Grids,  Nonlinear  Systems,  Power  Systems,  Robotics,  Social  Systems,  Economic  Systems  and  other.  Of  particular  value  to  both  the  contributors  and  the  readership  are  the  short  publication  timeframe  and 
the  world-wide  distribution  and  exposure  which  enable  both  a  wide  and  rapid  dissemination  of  research  output. 
The  series  covers  the  theory,  applications,  and  perspectives  on  the  state  of  the  art 
and  future  developments  relevant  to  systems  and  networks,  decision  making,  control,  complex  processes  and  related  areas,  as  embedded  in  the  fields  of  interdisciplinary 
and  applied  sciences,  engineering,  computer  science,  physics,  economics,  social,  and  life  sciences,  as  well  as  the  paradigms  and  methodologies  behind  them. 
Indexed  by  SCOPUS,  INSPEC,  WTI  Frankfurt  eG,  zbMATH,  SCImago. 
All  books  published  in  the  series  are  submitted  for  consideration  in  Web  of  Science. 
For  proposals  from  Asia  please  contact  Aninda  Bose  ([email protected]).

Hiren  Kumar  Deva  Sarma 
Arun  Kumar  Pujari 
Editors 
Machine Learning 
in Information 
and Communication 
Technology 
Proceedings  of  ICICT  2021,  SMIT

Editors
Hiren  Kumar  Deva  Sarma 
Department  of  Information  Technology 
Gauhati  University 
Jalukbari,  Assam,  India 
Arun  Kumar  Pujari 
Department  of  Computer  Science  and  Engineering  Mahindra  University  Hyderabad  Hyderabad,  India 
Vincenzo  Piuri 
University  of  Milan  Milan,  Italy 
ISSN  2367-3370 ISSN  2367-3389  (electronic) 
Lecture  Notes  in  Networks  and  Systems 
ISBN  978-981-19-5089-6 ISBN  978-981-19-5090-2  (eBook) 
https://doi.org/10.1007/978-981-19-5090-2 
©  The  Editor(s)  (if  applicable)  and  The  Author(s),  under  exclusive  license  to  Springer  Nature 
Singapore  Pte  Ltd.  2023 
This  work  is  subject  to  copyright.  All  rights  are  solely  and  exclusively  licensed  by  the  Publisher,  whether  the  whole  or  part  of  the  material  is  concerned,  specifically  the  rights  of  translation,  reprinting,  reuse 
of  illustrations,  recitation,  broadcasting,  reproduction  on  microfilms  or  in  any  other  physical  way,  and  transmission  or  information  storage  and  retrieval,  electronic  adaptation,  computer  software,  or  by  similar  or  dissimilar  methodology  now  known  or  hereafter  developed. 
The  use  of  general  descriptive  names,  registered  names,  trademarks,  service  marks,  etc.  in  this  publication 
does  not  imply,  even  in  the  absence  of  a  specific  statement,  that  such  names  are  exempt  from  the  relevant 
protective  laws  and  regulations  and  therefore  free  for  general  use. 
The  publisher,  the  authors,  and  the  editors  are  safe  to  assume  that  the  advice  and  information  in  this  book  are  believed  to  be  true  and  accurate  at  the  date  of  publication.  Neither  the  publisher  nor  the  authors  or 
the  editors  give  a  warranty,  expressed  or  implied,  with  respect  to  the  material  contained  herein  or  for  any  errors  or  omissions  that  may  have  been  made.  The  publisher  remains  neutral  with  regard  to  jurisdictional  claims  in  published  maps  and  institutional  affiliations. 
This  Springer  imprint  is  published  by  the  registered  company  Springer  Nature  Singapore  Pte  Ltd.  The  registered  company  address  is:  152  Beach  Road,  #21-01/04  Gateway  East,  Singapore  189721,  Singapore

Preface 
Information and communication technologies have grown significantly and penetrated almost all dimensions of human society. There has been tremendous growth in all areas of information technology, including information gathering, information processing, information sharing, information distribution, information retrieval, information security, and so on. The Internet has revolutionized human society and opened up many new dimensions that include information sharing models, electronic commerce, and new economic models altogether. Although artificial intelligence was conceptualized long ago, there is now a tendency toward wide, meaningful practice of artificial intelligence in all major domains of knowledge, even apart from information technology-based applications. Machine learning (ML) algorithms are playing an important role in this respect. Health care is one such domain, in which ML algorithms are being widely applied for better processing of healthcare data in order to make meaningful decisions. The penetration of information technology into the medical and healthcare domain is worth noting and is expected to help health professionals significantly in making wiser and better decisions. Apart from healthcare data-related studies, the application of ML algorithms is expanding and influencing all other domains of information and communication technologies in general. Communication technologies are being enhanced significantly, and modern communication paradigms are emerging and evolving. In the area of communication, optimization in terms of energy expenditure, overall cost of communication, and even the life span of communication systems has always been an issue of great significance. Recently, there has been growing interest in applying ML algorithms in various parts of communication systems in search of better solutions.
Therefore, in this book, an effort has been made to view the three major contemporary areas of knowledge, namely information technology, communication technology, and machine learning, in a connected manner. The aim of the book is to share the latest research developments in applying ML algorithms in different information technology-based applications as well as in communication technologies.

There are 31 research articles in this book. These are the research papers presented at the first International Conference on Information and Communication Technology (ICICT) 2021, held at Sikkim Manipal Institute of Technology (SMIT) during 23–24 December 2021 and organized by the Department of Information Technology. The articles are broadly categorized into the following: healthcare informatics, recommendation systems, communication networks, assistive technology, social networks, image and video processing, cybersecurity, and miscellaneous applications.
We are thankful to all the authors of ICICT 2021. We sincerely thank SMIT for extending all the support in organizing ICICT 2021. We are highly grateful to Prof. Janusz Kacprzyk for his guidance and encouragement throughout. We sincerely thank the editorial team of Springer Nature for the great support we have received. We sincerely thank Springer Nature for being our publication partner and for all the help and support extended to us during the making of this book.
Hiren Kumar Deva Sarma, Jalukbari, India
Vincenzo Piuri, Milan, Italy
Arun Kumar Pujari, Hyderabad, India

Contents
Healthcare Informatics
Earth Mover’s Distance-Based Automated Disease Tagging of Indian ECGs
Burhan Basha, Dhruva Nandi, Karuna Nidhi Kaur, Priyadarshini Arambam, Shikhar Gupta, Mehak Segan, Priya Ranjan, Upendra Kaul, and Rajiv Janardhanan
Cardial Disease Prediction in Multi-variant Systems Using MT-MrSBC Model
Pandiyan Nandakumar and Subhashini Narayan
SVM-based Pre- and Post-treatment Cancer Segmentation from Lung and Abdominal CT Images via Neighborhood-Influenced Features
Tiyasa Chakraborty, Ashok Kumar Bhadra, and Debashis Nandi
Blood Cancer Detection with Microscopic Images Using Machine Learning
Christo Ananth, P. Tamilselvi, S. Agnes Joshy, and T. Ananth Kumar
A Survey on Machine Learning-Based Approaches for Leukaemia Detection
Leena I. Sakri and Rajeshwari V. Patil
Periocular Region Recognition—A Brief Survey
R. Sheela and R. Suchithra
A Bibliometric Analysis on the Relationship Between Emotional Intelligence, Self-Management and Health Information Seeking
Jennifer Gurung, Vivek Pandey, Samrat Kumar Mukherjee, Saibal Kumar Saha, Ankit Singh, and Ajeya Jha

Preferences in the Detailing Process Among Young and Senior Physicians
Saibal Kumar Saha, Ankita Sarangi, Sonia Munjal, Piyanka Dhar, and Ajeya Jha
Non-cognitive Differences on Social Media Branded Drugs Promotion: Study of Indian Patients and Physicians
Samrat Kumar Mukherjee, Jitendra Kumar, Jaya Rani Pandey, Vivek Chhetri, and Ajeya Jha
AUTCD-Net: An Automated Framework for Efficient Covid-19 Diagnosis on Computed Tomography Scans
Palash Ghosal, Amish Kumar, Soumya Snigdha Kundu, Utkarsh Prakash Srivastava, Ashis Datta, and Hiren Kumar Deva Sarma
Computer-Aided Detection of Brain Midline Using CT Images
Palash Ghosal, Amish Kumar, Ashis Datta, Hiren Kumar Deva Sarma, and Debashis Nandi
Deep Neural Network-Based Classification of ASD and Neurotypical Subjects Using Functional Connectivity Features Derived from Resting-State fMRI Data
Nirmal Rai, P. C. Pradhan, Hemanta Saikia, O. P. Singh, and Rinkila Bhutia
Application of Deep Learning in Healthcare
Aryan Shahi, Chandralika Chakraborty, Shubhodeep Ghosh, and Ankit Anand
Recommendation Systems
A Popularity-Based Recommendation System Using Machine Learning
Pranati Rakshit, Sougata Saha, Arindam Chatterjee, Subhayan Mistri, Swagata Das, and Gunjan Dhar
Machine Learning-Based Movies and Shows Recommendation System
Sanmoy Dev Purkayastha, Suraj Kumar, Ashish Saha, and Saumya Das
Communication Networks
A Survey on Application of LSTM as a Deep Learning Approach in Traffic Classification for SDN
Prerna Rai and Hiren Kumar Deva Sarma

Assistive Technology
A Novel Low-cost Visual Aid System for the Completely Blind People
Christo Ananth, M. Kameswari, R. Srinivasan, S. Surya, and T. Ananth Kumar
Social Networks
Twitter Sentiment Analysis Using Machine Learning Techniques: A Case Study Based on Farmers Protest
C. Sahithi, Y. Sreeja, S. Akhil, K. Taruni, and C. C. Sobin
Sentiment Analysis to Find Sentence Polarity on Tweet Data
Pranati Rakshit, Sumit Gupta, and Tarpan Das
Impact of Emotional Support and Medical Adherence on Social Media Branded Drug Promotion
Samrat Kumar Mukherjee, Jitendra Kumar, Ajeya Jha, Jaya Rani Pandey, and Saibal Kumar Saha
Image and Video Processing
A Survey on Video Description and Summarization Using Deep Learning-Based Methods
Pranati Rakshit, Anuj Kumar, and Amlan Chakraborty
Aerial Image Classification Using Convolution Neural Network
Praveen Kumar Pradhan and Udayan Baruah
Image Retrieval Using Neural Networks for Word Image Spotting—A Review
Naiwrita Borah and Udayan Baruah
Cybersecurity
Priority-Based Mitigation in Education Sector using Machine Learning
Sonal Shukla and Anand Sharma
Miscellaneous Applications
Prediction and Analysis of Air Quality Index Using Machine Learning Algorithms
Avishek Choudhuri, R. Sujatha, Chhazed Shreyans Nitin, Jyotir Moy Chatterjee, and R. N. Thakur
A Short Overview on Various Bio-Inspired Algorithms
K. Boopalan, C. Shanmuganathan, K. Lokeshwaran, and T. Balaji

Gesture-Based Drawing Application: A Survey
Leena I. Sakri, Vijeta Kerur, Gautam Shet, Sohail Mokashi, and Abhishek Patki
Simulation and Modeling of Electrical Load Data Using Machine Learning
Manish Kumar and Nitai Pal
Analysis of Single Missing Gate Faults in Quantum Circuit
Shubhrojit Paul, Mousum Handique, and Hiren Kumar Deva Sarma
Experimental Validation of Frequency Scaling Over Indian Hilly Region
Badichapta Deka Baro and Swastika Chakraborty
School Uniform Identification Using Deep Learning Based Approach
Ashis Datta, Sanju Kumar Giri, Vibhuti Sharma, Anushka Das, and Joyashri Basak
Author Index

Editors and Contributors 
About the Editors 
Prof. Hiren Kumar Deva Sarma 
Technology, Gauhati University, Assam, India. He received Bachelor of Engineering in Mechanical Engineering from Assam Engineering College, Guwahati, Assam (1998). He completed Master of Technology in Information Technology from Tezpur University, Assam (2000). He received Doctor of Philosophy (in Computer Science and Engineering) from Jadavpur University, West Bengal (2013). He has co-authored three books, edited four book volumes, and published more than eighty research papers in different International Journals and refereed International and National Conferences of repute. He is the recipient of the Young Scientist Award from International Union of Radio Science (URSI) in the XVIII General Assembly 2005, held at New Delhi, India and has received IEEE Early Adopter Award in 2014. His current research interests are networks, network security, robotics and big data analytics.
Prof. Vincenzo Piuri guished Scientist. He is known for his work in the field of information processing, with specific focus on artificial intelligence, computational intelligence, signal/image processing, biometrics, industrial applications, measurement systems, arithmetic units and fault-tolerant architectures. He received his M.S. and Ph.D. in Computer Engineering from Politecnico di Milano, Italy. He is Full Professor at the University of Milan, Italy (since 2000), where he was also Department Chair (2007–2012). He was Associate Professor at Politecnico di Milano, Italy (1992–2000), Visiting Professor at the University of Texas at Austin, USA (summers 1996–1999), and visiting researcher at George Mason University, USA (summers 2012–2019). He founded a start-up company, Sensure srl, in the area of intelligent systems for industrial applications (leading it from 2007 until 2010) and was active in industrial research projects with several companies.
Prof. Arun Kumar Pujari 
and Information Sciences at the University of Hyderabad (UoH). He has been Vice-Chancellor of the Central University of Rajasthan (2015–2020) and Vice-Chancellor of Sambalpur University, Odisha (2008–2011). He joined the UoH as Reader in 1985 and became a professor in 1990. Prior to joining University of Hyderabad, he served JNU, New Delhi. He completed his postgraduation from the Sambalpur University in 1974 and got his Ph.D. from IIT-Kanpur in 1980. He has more than 25 years’ experience as Vice-Chancellor, Dean, Head of Department, and other administrative positions. He has served several high-level bodies of UGC, DST, DRDO, ISRO and AICTE. He has wide exposure in the national and international arena, and he has been invited to University of Tokyo, Japan, University of Paris-Sud, France, Griffith University, Australia, Memphis University, USA and IIST, United Nations University, Macau on different visiting assignments. He has more than 100 publications to his credit and his book, Data Mining Techniques, was a popular textbook adopted by several universities in India and abroad.
Contributors 
Akhil S. 
Anand Ankit 
Ananth Christo 
University,  Samarkand,  Uzbekistan 
Arambam Priyadarshini 
Noida,  India 
Balaji T.  Tamil  Nadu,  India 
Baro Badichapta Deka  nology,  Majitar,  East  Sikkim,  India 
Baruah Udayan  of  Technology,  Sikkim  Manipal  University,  Majitar,  East  Sikkim,  India;  Department  of  Information  Technology,  Sikkim  Manipal  Institute  of  Technology, 
Sikkim  Manipal  University,  Rangpo,  Sikkim,  India 
Basak Joyashri 
Bengal,  India 
Basha Burhan  School  of  Engineering  and  Applied  Sciences,  SRM  University,  Amaravati,  Andhra 
Pradesh,  India 
Bhadra Ashok Kumar 

Bhutia Rinkila 
Boopalan K. 
Andhra  Pradesh,  India 
Borah Naiwrita  of  Technology,  Sikkim  Manipal  University,  Rangpo,  Sikkim,  India 
Chakraborty Amlan 
sity  of  Calcutta,  Kolkata,  India 
Chakraborty Chandralika  Majhitar,  India 
Chakraborty Swastika  nology,  Majitar,  East  Sikkim,  India 
Chakraborty Tiyasa 
Institute  of  Technology  Durgapur,  Durgapur,  India 
Chatterjee Arindam  College  of  Engineering,  Kalyani,  India 
Chatterjee Jyotir Moy 
Chhetri Vivek 
Technology,  Sikkim  Manipal  University,  Gangtok,  India 
Choudhuri Avishek 
Das Anushka 
Sikkim,  India 
Das Saumya  Technology,  Sikkim  Manipal  University,  Sikkim,  India 
Das Swagata  Engineering,  Kalyani,  India 
Das Tarpan 
Engineering,  Kalyani,  India 
Datta Ashis  Sikkim  Manipal  Institute  of  Technology,  Majitar,  Rangpo,  East  Sikkim,  India 
Dhar Gunjan  Engineering,  Kalyani,  India 
Dhar Piyanka 
sity,  Gangtok,  Sikkim,  India 
Ghosal Palash 

Ghosh Shubhodeep 
India 
Giri Sanju Kumar 
Sikkim,  India 
Gupta Shikhar 
Gupta Sumit 
Engineering,  Kalyani,  India 
Gurung Jennifer  sity,  Gangtok,  India 
Handique Mousum 
Janardhanan Rajiv 
Health  Sciences,  SRM  Institute  of  Science  &  Technology,  Chennai,  India 
Jha Ajeya  Technology,  Sikkim  Manipal  University,  Gangtok,  Sikkim,  India 
Joshy S. Agnes  India 
Kameswari M. 
Kalasalingam  Academy  of  Research  and  Education,  Srivilliputhur,  Tamilnadu,  India 
Kaul Upendra  Medical  Research  Centre,  New  Delhi,  India 
Kaur Karuna Nidhi 
ology,  Amity  Institute  of  Public  Health,  Noida,  India 
Kerur Vijeta  of  Engineering  and  Technology,  Dharwad,  Karnataka,  India 
Kumar Anuj  Bengaluru,  India 
Kumar Jitendra  Technology,  Sikkim  Manipal  University,  Gangtok,  India 
Kumar Manish  Mines),  Dhanbad,  India;  Department  of  Nuclear  Science  and  Technology,  Pandit  Deendayal  Energy  Univer-
sity,  Gandhinagar,  India 
Kumar Suraj 
Technology,  Sikkim  Manipal  University,  Sikkim,  India 
Kumar T. Ananth 
College  of  Engineering,  Villupuram,  India

Kundu Soumya Snigdha 
lathur,  India 
Lokeshwaran K. 
Melvisharam,  Tamil  Nadu,  India 
Mistri Subhayan  of  Engineering,  Kalyani,  India 
Mokashi Sohail  College  of  Engineering  and  Technology,  Dharwad,  Karnataka,  India 
Mukherjee Samrat Kumar  Institute  of  Technology,  Sikkim  Manipal  University,  Gangtok,  India 
Munjal Sonia  Uttarakhand,  India 
Nandakumar Pandiyan 
Vellore  Institute  of  Technology,  Vellore,  Tamil  Nadu,  India 
Nandi Debashis  Institute  of  Technology  Durgapur,  Durgapur,  India 
Nandi Dhruva  Sciences,  SRM  Institute  of  Science  &  Technology,  Chennai,  India 
Narayan Subhashini 
Institute  of  Technology,  Vellore,  Tamil  Nadu,  India 
Nitin Chhazed Shreyans 
Pal Nitai 
Dhanbad,  India 
Pandey Jaya Rani  of  Technology,  Sikkim  Manipal  University,  Gangtok,  India 
Pandey Vivek  Gangtok,  India 
Patil Rajeshwari V. 
Karnataka,  India 
Patki Abhishek  College  of  Engineering  and  Technology,  Dharwad,  Karnataka,  India 
Paul Shubhrojit 
Pradhan P. C. 
India 
Pradhan Praveen Kumar  Chisopani,  South  Sikkim,  India

Purkayastha Sanmoy Dev 
Manipal  Institute  of  Technology,  Sikkim  Manipal  University,  Sikkim,  India 
Rai Nirmal 
Rai Prerna 
Rakshit Pranati 
of  Engineering,  Kalyani,  India 
Ranjan Priya 
Saha Ashish  Technology,  Sikkim  Manipal  University,  Sikkim,  India 
Saha Saibal Kumar 
of  Technology,  Sikkim  Manipal  University,  Gangtok,  Sikkim,  India 
Saha Sougata  Engineering,  Kalyani,  India 
Sahithi C. 
Saikia Hemanta 
India 
Sakri Leena I.  of  Engineering  and  Technology,  Dharwad,  Karnataka,  India 
Sarangi Ankita 
sity,  Gangtok,  Sikkim,  India 
Deva Sarma Hiren Kumar  Rangpo,  East  Sikkim,  Sikkim,  India 
Segan Mehak  Amity  Institute  of  Public  Health,  Noida,  India 
Shahi Aryan 
Shanmuganathan C. 
Chennai,  India 
Sharma Anand 
Sharma Vibhuti  Sikkim,  India 
Sheela R.  (Deemed-to-be  University),  Bangalore,  India 
Shet Gautam  of  Engineering  and  Technology,  Dharwad,  Karnataka,  India 
Shukla Sonal 

Singh Ankit 
Singh O. P. 
Sobin C. C. 
Sreeja Y. 
Srinivasan R. 
and  Technology,  Kattankulathur,  India 
Srivastava Utkarsh Prakash  Sikkim,  India 
Suchithra R. 
(Deemed-to-be  University),  Bangalore,  India 
Sujatha R. 
Surya S. 
Academy  of  Research  and  Education,  Krishnan  Koil,  Srivilliputtur,  Tamilnadu,  India 
Tamilselvi P.  (Deemed  to  Be)  University,  Bangalore,  India 
Taruni K. 
Thakur R. N. 

Healthcare Informatics

Earth Mover’s Distance-Based Automated Disease Tagging of Indian ECGs
Burhan Basha, Dhruva Nandi, Karuna Nidhi Kaur,
Priyadarshini Arambam, Shikhar Gupta, Mehak Segan, Priya Ranjan,
Upendra Kaul, and Rajiv Janardhanan
B. Basha
Department of Electronics and Communication Engineering, School of Engineering and Applied Sciences, SRM University, Amaravati, Andhra Pradesh, India
e-mail: [email protected]
D. Nandi · R. Janardhanan
Department of Medical Research, Faculty of Medical and Health Sciences, SRM Institute of Science & Technology, Kattankulathur, Chennai, India
e-mail: [email protected]
R. Janardhanan
e-mail: [email protected]
K. N. Kaur · M. Segan
Laboratory of Disease Dynamics and Molecular Epidemiology, Amity Institute of Public Health, Noida, India
e-mail: [email protected]
M. Segan
e-mail: [email protected]
P. Arambam · S. Gupta
Amity Institute of Public Health, Amity University, Noida, India
e-mail: [email protected]
S. Gupta
e-mail: [email protected]
P. Ranjan (✉)
ECE, Bhubaneshwar Institute of Technology, Info Valley, Harapur, Odisha, India
e-mail: [email protected]
U. Kaul
Batra Heart Centre, Academics and Research, Batra Hospital and Medical Research Centre, New Delhi, India
e-mail: [email protected]
© The Author(s), under exclusive license to Springer Nature Singapore Pte Ltd. 2023
H. K. Deva Sarma et al. (eds.), Machine Learning in Information and Communication Technology, Lecture Notes in Networks and Systems 498, https://doi.org/10.1007/978-981-19-5090-2_1

Abstract Ours is the era of cardiovascular disorders. In this work, a corpus of ECGs collected in the state of Jammu and Kashmir is studied for automated perceptual similarity-based disease tagging. Building on our earlier work, we have deployed the Earth Mover’s distance (EMD) as the similarity metric to generate automated disease tags. The rationale for generating these tags is the similarity of the test ECG to representative healthy and unhealthy ECGs. If the test ECG resembles the representative healthy ECG, it is tagged as healthy, and if it resembles the representative unhealthy ECG, it is tagged as unhealthy. Future directions for increasing the accuracy of this work are discussed. Its integration with biomarkers in a multi-sensor data fusion-based automated CVD tagging criterion is also explored.
Keywords Electrocardiogram · Disease tagging · Cardiovascular disease · Rural health
1 Introduction
Electrocardiogram (ECG) is a noninvasive diagnostic modality that has a substantial clinical impact on investigating the severity of cardiovascular diseases (CVD) [1]. The ECG is an important screening tool that offers practitioners a wealth of information that can be used alongside the history and clinical findings [2]. ECGs are pivotal in the diagnosis of cardiac ischaemia and infarction, provide evidence for pacemaker implantation, and detect inherited abnormalities such as cardiomyopathy and long-QT syndrome [3]. ECGs are also useful in detecting non-cardiac pathology, for example, pulmonary emboli and electrolyte disorders. According to the World Health Organization (WHO), India accounts for one-fifth of CVD-associated mortalities worldwide, especially in the younger population [4]. The Global Burden of Disease study has also reported an age-standardized cardiovascular disease death rate of 272 per 100,000 population in India, which is much higher than the global average of 235 [5]. This pattern is uniform throughout the country despite wide variation in risk factors. India faces a great challenge in providing quality health care, especially in rural areas, due to a lack of resources and trained healthcare providers.
The lack of resources for triaging or stratification of patients has worsened the prognosis of patients. Moreover, the scarcity of super-specialized cardiologists significantly impacts the clinical prognosis of the patients, intensifying the cardiovascular disease burden. The hospital-based registries have neither been able to provide accurate estimates of the CVD burden nor identify the disease drivers for the CVD epidemic, despite it being the largest cause of mortality.

India is the world’s seventh-largest country by land area and one of its two most populous. The country’s citizens come from a wide range of socioeconomic, linguistic, cultural, and ethnic origins. The Indian subcontinent has historically served as a conduit for several migratory waves originating in Africa, both on land and along
the coast [6]. In mainland India, genetic research has shown four different ancestral populations, as well as a separate lineage in the Andaman and Nicobar Islands [7]. As a melting pot of genetic variety, India also has strict inbreeding practices and founder effects, resulting in the accumulation of adverse genetic variants [8]. India has a reported birth defect rate of 64.4 per 1000 live births [9]. Independent research [10–13] has underlined India’s significant genetic load. Because there was no national newborn screening programme until recently, a large proportion of the Indian population was affected by genetic illnesses [14]. The genetic diversity of India is well reflected in mitochondrial DNA (mtDNA), Y chromosomes, and candidate genes/markers, which have offered a reasonable knowledge of their relatedness and divergence [15–23]. Due to cultural and social traditions, consanguinity in marriage is common in several subpopulations in India, resulting in the accumulation of genetic characteristics within groups [7, 24]. According to the studies, there
is a significant level of relatedness among subgroups, implying the accumulation of detrimental variants [25, 26]. The Indian government adopted a national genome-wide strategy to analyse population architecture and search for markers unique to the Indian subcontinent. The Indian Genome Variation (IGV) group typed 900 genes from approximately 1800 people across 55 endogamous populations using single nucleotide polymorphisms (SNPs). The heterogeneity of subpopulations was highlighted by high heterozygosity values, varied allele frequencies, and frequent polymorphic haplotypes within the Indian community. In addition, novel mutations with concurrent founder effects have been observed in the subcontinent [27, 28]. The IGV consortium’s discoveries have led to the identification of markers and a better knowledge of genotype–phenotype relationships across the Indian subcontinent. Susceptibility or resistance to Plasmodium falciparum [29–33], risk of acquiring glaucoma [34], homocysteine levels [35], risk of getting high-altitude pulmonary oedema [36, 37],
and various CVDs are only a few instances of phenotypically unique results of subpopulation-specific genotypes. As a result of the substantial genetic variation and endogamous cultural practices, it is evident that genetic affinities and differences between subpopulations must be defined. These findings also highlight the genetic differences between Indian and other groups, underscoring the dangers of imputing genetic data from other populations. In clinical contexts, a generalization of the population architecture might lead to erroneous conclusions.

Many genetic variants that predispose people to CVDs have been discovered in the era of contemporary genomics [38]. Furthermore, single nucleotide polymorphisms (SNPs) in genes that are associated with a favourable response to CVD medications have been predicted by these genome-wide association analyses. SNPs associated with adverse drug-induced toxicity in CVD patients have also been implicated in such investigations. Taken together, these findings provide a genetic foundation for stratifying individuals who are likely to benefit from aggressive disease care by starting them on pharmacological regimens tailored to their genetic variants. Cardiomyopathy is reported to be more prevalent in the South Asian population than in the Western population [39]. Polymorphisms in the genes that code for myosin-binding protein C (MYBPC) and beta myosin heavy chain can cause cardiomyopathy by impairing the structural integrity and growth of the heart muscle. In the Indian population, only a few studies have looked at the existence of polymorphisms in genes linked to CVD risk [39–42]. Even fewer studies have included the Indian population, where CVD mortality is said to be greater than in other parts of the nation (>300 deaths per 100,000 per year) [43]. Because Indian demographics are highly diverse, ethnicity plays a strong role in determining the burden of CVDs. In considering Indian ethnic
identity and health, the genetic basis may be relevant in two broad senses. First, the gene pools of different ethnic groups in different parts of India may contain different frequencies of alleles at some loci that are pertinent to health status or to disease processes. Second, the phenotype consequent on a given genotype may vary between ethnic groups because of interactions with environmental factors, which include prenatal effects, zonal nutritional influences, the preventive consequences of health care, peer group pressures, educational level, religious instruction, toxins in homes and in the air and water, occupational hazards, job stress, exposure to infectious agents, and many others [44]. Better understanding of the genetics behind cardiovascular conditions will enhance the understanding of the pathophysiological basis of these diseases, help identify novel pharmacological targets, and contribute to risk assessment, triaging, and clinical management of the patients. Also, recent data indicate that cardiac conduction measurements obtained from ECGs are heritable [45–47] and have a genetic basis [48–51]. To gain insight into which genes are implicated and the nature of the effect these genes have on the heart, advanced gene sequencing such as Next Generation Sequencing (NGS) needs to be incorporated into clinically actionable modalities.

The development of novel artificial intelligence (AI)-enabled ECG interpretation has become important in the clinical ECG workflow since its inception over 50 years ago. Such interpretation can serve as a crucial aid to physicians, not only in resource-limited clinical settings prevalent across the Indian subcontinent but also elsewhere in the world. The availability of cheap, portable, accessible, and scalable computational platforms with capabilities to process large-scale raw data will not only improve expert human ECG interpretation by accurately triaging or prioritizing the most urgent conditions but also, importantly, reduce the rates of misdiagnosed ECG interpretations. Hence, AI-enabled differentiation of normal from abnormal ECGs at the community level, using categorization based on similarity metrics such as the Earth Mover’s Distance (EMD), could take into account the genetic, regional, and ethnic differences that may modify the graphs and hence the diagnosis. There are no data available regarding the variations in healthy as well as unhealthy ECG patterns at the community level. Our algorithm-enabled segregation of normal records could be a pioneering step in differentiating them from abnormal traces based on genetic background. In our work, we present a novel framework for computing the distance between images. The question of image similarity is complex and delicate. The EMD not only takes the corresponding histogram bins into account but also considers the correlations between non-corresponding bins, and is thus robust to histogram shift and rotation.
2 Literature Review
ECG  interpretation  algorithms  have  gradually  become  the  industry  standard  for 
automating  diagnosis  and  anatomical  recognition  on  various  systems  throughout 
the  world.  These  algorithms  use  gender  and  age-specific  criteria  to  provide  a  virtual 
analysis  for  resting  ECG  interpretation,  including  detection  of  various  cardiovascular 
diseases and diagnostic aids to provide rhythm and morphology interpretation for a variety of patient populations, as well as an effective system for triaging patients to
prioritize  the  most  serious  conditions.  Conventional  statistical  models,  such  as  linear 
and  logistic  regression,  and  even  neural  networks  with  few  layers  (often  referred 
to  as  “shallow  models”)  were  previously  constructed  based  on  human-selected  data 
elements  as  inputs.  In  ECG  analysis,  for  example,  the  inputs  were  morphological 
and  temporal  parameters  such  as  the  QT,  QRS,  and  RR  intervals  or  QRS  and/or 
T-wave  morphology,  while  the  outputs  were  the  ECG  rhythm,  serum  potassium 
level,  and  Left  ventricular  ejection  fraction  (LVEF).  The  model  is  constrained  to 
those  attributes  alone  since  the  inputs  are  human-selected  features.  Furthermore, 
any  random  or  systematic  error  in  computing  features  will  be  propagated  to  the 
output,  limiting  the  model’s  accuracy.  Deep  learning  allows  the  model  to  learn  a 
representation  of  the  input  data  that  includes  features  relevant  to  the  task  at  hand, 
without  any  human  bias  and  without  the  need  for  human  feature  selection  and  engi-
neering,  which  can  be  time  consuming,  inaccurate,  and  reliant  on  expertise  and 
current physiological theories. The agnostic representation learned by a neural network may be the most effective one, but it is also nonlinear, and the learned associations between input and output data are currently unexplainable, making the model a “black box” (humans cannot understand how the network makes decisions), which is one of the concerns raised about the clinical application of deep learning [52]. Static and predictable algorithms, such as the Earth Mover’s Distance (EMD), on the other hand, perform image matching by computing perceptual similarity and give more meaningful and interpretable matching results. To reiterate, Artificial Neural Networks (ANNs) have a long and well-documented history of intrinsic instability, and their automated choices cannot be trusted to make life-saving decisions for a patient suffering from a severe cardiac episode.

Due to a lack of resources and competent workers, India confronts several obstacles in providing health care, particularly in rural areas. The fact that CVD-related fatalities account for 26% of all deaths in India demonstrates the scale of the public health challenge it faces [53]. This is exacerbated by the lack of appropriate algorithms to map out clinical resource allocation, particularly in rural underserved areas where doctor–patient ratios are low and access to inexpensive and high-quality health care is severely limited. Despite the limits of current algorithms for effectively triaging individuals, the 12-lead ECG is still the most often used modality for distinguishing normal ECGs from those indicating cardiac disorders. Furthermore, the absence of well-structured niche-specific databases that span the various genetic foundations for referencing and analysis stymies the advancement of AI-assisted research to aid and optimize processes and clinical decision-making. Aside from the complexity already discussed, the advancement of these technologies poses significant technological, ethical, privacy, and therapeutic issues. Despite the aforementioned challenges, AI-based automation of ECG analysis has a substantial socioeconomic impact and benefit for low- and middle-income countries (LMICs) like India. AI-based systems have a wide range of potential applications in health care, including screening, illness diagnosis, patient risk assessment, and niche-specific best therapeutic options. Given the widespread use of mobile platforms and the Internet across the Indian subcontinent, the development of AI-enabled computational applications on these platforms would provide a valuable precision public health tool for better management of the CVD epidemic, alleviating a significant burden on government finances.
2.1 Earth Mover’s Distance
The formal definition of the Earth Mover’s Distance is built on “signatures” of the form $\{(x_1, p_1), (x_2, p_2), \ldots, (x_m, p_m)\}$, where $x_i$ is the centre of data cluster $i$ and $p_i$ is the number of points in the cluster. The two masses may be unequal if two signatures are not normalized. Given two signatures $R = \{(x_1, r_1), \ldots, (x_m, r_m)\}$ and $S = \{(y_1, s_1), \ldots, (y_n, s_n)\}$, the EMD is defined in terms of the optimal flow $F = (f_{ij})$, which minimizes

$$W(R, S, F) = \sum_{i=1}^{m} \sum_{j=1}^{n} f_{ij}\, d_{ij},$$

where $d_{ij} = d(x_i, y_j)$ represents the dissimilarity between $x_i$ and $y_j$. In the EMD computation, the flow $(f_{ij})$ must satisfy the following constraints:

$$f_{ij} \ge 0, \qquad \sum_{j=1}^{n} f_{ij} \le r_i, \qquad \sum_{i=1}^{m} f_{ij} \le s_j, \qquad \sum_{i=1}^{m} \sum_{j=1}^{n} f_{ij} = \min\left( \sum_{i=1}^{m} r_i,\; \sum_{j=1}^{n} s_j \right).$$

Once the optimal flow $f_{ij}^{*}$ is determined, the Earth Mover’s Distance between $R$ and $S$ is

$$\mathrm{EMD}(R, S) = \frac{\sum_{i=1}^{m} \sum_{j=1}^{n} f_{ij}^{*}\, d_{ij}}{\sum_{i=1}^{m} \sum_{j=1}^{n} f_{ij}^{*}}.$$
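As a small illustration of the definition, the sketch below handles only a special case, not the general transportation problem: two histograms on the same unit-spaced bins, normalized to equal total mass, with ground distance |i − j| between bins. In that case the minimum-cost flow is known to equal the sum of absolute differences of the cumulative sums, and the total flow in the denominator is 1. The function name `emd_1d` is hypothetical and is not part of the authors' implementation.

```python
from itertools import accumulate


def emd_1d(r, s):
    """EMD between two histograms defined on the same unit-spaced bins.

    Assumption: ground distance between bins i and j is |i - j|.
    Both histograms are normalized to total mass 1, so the total flow
    (the denominator in the EMD definition) is also 1.
    """
    if len(r) != len(s):
        raise ValueError("histograms must have the same number of bins")
    total_r, total_s = sum(r), sum(s)
    if total_r <= 0 or total_s <= 0:
        raise ValueError("histograms must have positive total mass")
    r_norm = [v / total_r for v in r]
    s_norm = [v / total_s for v in s]
    # Running surplus: mass that must cross each boundary between bins.
    carried = accumulate(a - b for a, b in zip(r_norm, s_norm))
    return sum(abs(c) for c in carried)


# All mass moves two bins to the right, so the cost is 2.
print(emd_1d([1, 0, 0], [0, 0, 1]))  # 2.0
# Same shape after normalization, so nothing has to move.
print(emd_1d([2, 2], [1, 1]))  # 0.0
```

Comparing ECG images, as in this work, needs the two-dimensional version, which requires a transportation solver rather than this cumulative-sum shortcut.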

2.2 Various Distances
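Let $\mathcal{X}$ be a compact metric set, and let $\Sigma$ represent the set of all the Borel subsets of $\mathcal{X}$. Let $\mathrm{Prob}(\mathcal{X})$ denote the space of probability measures defined on $\mathcal{X}$. We can characterize fundamental distances and divergences between two distributions $P_r, P_g \in \mathrm{Prob}(\mathcal{X})$:
(I) The total variation (TV) distance [54]
$$\delta(P_r, P_g) = \sup_{A \in \Sigma} \left| P_r(A) - P_g(A) \right|.$$
(II) The Kullback–Leibler (KL) divergence
$$KL(P_r \,\|\, P_g) = \int \log\left( \frac{P_r(x)}{P_g(x)} \right) P_r(x)\, d\mu(x),$$
where both $P_r$ and $P_g$ are assumed to be absolutely continuous, and consequently admit densities, with respect to the same measure $\mu$ defined on $\mathcal{X}$. The KL divergence [54] is famously asymmetric and possibly infinite when there are points such that $P_g(x) = 0$ and $P_r(x) > 0$.
(III) The Jensen–Shannon (JS) divergence [54]
$$JS(P_r, P_g) = KL(P_r \,\|\, P_m) + KL(P_g \,\|\, P_m),$$
where $P_m$ is the average $(P_r + P_g)/2$. This divergence is symmetric and always defined because we can choose $\mu = P_m$.
A sequence of probability distributions can converge under the Earth Mover’s Distance even when it does not converge under the other distances and divergences discussed above.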
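The three quantities above can be tried out on small discrete distributions. The sketch below is illustrative only: it assumes distributions given as lists of probabilities over the same finite set, uses the JS form stated above (without factors of 1/2), and the function names are hypothetical.

```python
import math


def total_variation(p, q):
    """TV distance; for discrete distributions the supremum over events
    equals half the L1 distance between the probability vectors."""
    return 0.5 * sum(abs(a - b) for a, b in zip(p, q))


def kl_divergence(p, q):
    """KL(p || q); infinite if q is zero somewhere p is positive."""
    total = 0.0
    for a, b in zip(p, q):
        if a == 0:
            continue  # terms with p(x) = 0 contribute nothing
        if b == 0:
            return math.inf
        total += a * math.log(a / b)
    return total


def js_divergence(p, q):
    """JS(p, q) = KL(p || m) + KL(q || m) with m the average of p and q."""
    m = [(a + b) / 2 for a, b in zip(p, q)]
    return kl_divergence(p, m) + kl_divergence(q, m)


# Two distributions with disjoint support.
p, q = [1.0, 0.0], [0.0, 1.0]
print(total_variation(p, q))  # 1.0
print(kl_divergence(p, q))    # inf
print(js_divergence(p, q))    # 2 * log(2)
```

For distributions with disjoint support, these values stay the same however far apart the supports are, whereas the EMD grows with the ground distance between them.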
3 Statistics of ECG Training and Test Data Set
3.1 Data Set
This data set was collected from a medical camp conducted in Pulwama district, Jammu and Kashmir. Every ECG stream records the electrical activity over one heartbeat. Two categories of heartbeat were used: normal heartbeat and myocardial infarction. The data set consists of 25 training ECG images and ten test ECG images in each of the two categories.

3.2 EMD-Based Analysis of Training Data
First, we apply EMD to the training data set. The training data set has 25 ECG images in each of the two categories, normal heartbeat and myocardial infarction heartbeat, and we keep the two categories in two different folders. The main issue is that there is wide variation among the healthy ECG images and among the unhealthy ECG images. This raises the question of which ECG in the collection of healthy ECGs is representative of normal ECGs and can therefore be used to measure the dissimilarity of test ECG images. The same question arises for identifying a representative unhealthy ECG. EMD answers both questions: the representative healthy and unhealthy ECGs are identified by pairwise EMD computation within the healthy ECG collection and the unhealthy ECG collection, respectively.
Table 1 presents the pairwise EMD computed over a sample of five ECG images so that the EMD matrix can be easily printed here. The complete EMD matrix for the entire healthy ECG data, with dimension 25 × 25, is available. In Table 1, the last row gives the column sums: the entry in the first column is the sum of the EMDs of the first healthy ECG with respect to the other ECGs, the entry in the second column is the sum for the second ECG, and so on. The representative ECG is selected as the one with the minimum sum of EMDs; in other words, the representative ECG is the one closest to all other healthy ECGs. The sums of EMDs show that the third column has the minimum, so the third healthy ECG is closest to all other healthy ECGs and is taken as the representative of all healthy ECGs.
Table 1 Pairwise EMD for a sample of five healthy ECGs
                Healthy ECG 1   Healthy ECG 2   Healthy ECG 3   Healthy ECG 4   Healthy ECG 5
Healthy ECG 1   0.0000000       0.3672638       0.2124510       0.3108338       0.4157938
Healthy ECG 2   0.3647268       0.0000000       0.1776833       0.2161831       0.2967646
Healthy ECG 3   0.2110799       0.1792834       0.0000000       0.1594389       0.1562445
Healthy ECG 4   0.3129947       0.2195767       0.1604886       0.0000000       0.2897190
Healthy ECG 5   0.4157698       0.2962481       0.1572403       0.2889528       0.0000000
Sum of EMDs     1.3045712       1.0623720       0.7078631       0.9754086       1.1585219

The same task is done for unhealthy ECGs to identify the unhealthy representative ECG, and the corresponding five-sample pairwise EMD matrix is shown in Table 2, where the sum-of-EMDs row contains, in the first column, the sum of the EMDs of the first ECG with respect to all other unhealthy ECGs, in the second column the sum for the second ECG, and so on. The sums show that the fourth column has the minimum, so the fourth unhealthy ECG is closest to all other unhealthy ECGs and is taken as the representative of all unhealthy ECGs. The same task is carried out on all 25 unhealthy ECGs, and ECG number xx is selected as the representative ECG, which is used for comparison with the rest of the unhealthy ECGs.

Table 2 Pairwise EMD for a sample of five unhealthy ECGs
                  Unhealthy ECG 1   Unhealthy ECG 2   Unhealthy ECG 3   Unhealthy ECG 4   Unhealthy ECG 5
Unhealthy ECG 1   0.0000000         0.3141770         0.2334288         0.2265901         0.4711726
Unhealthy ECG 2   0.3171718         0.0000000         0.2862049         0.2094208         0.3973091
Unhealthy ECG 3   0.2351154         0.2880150         0.0000000         0.3344508         0.4517883
Unhealthy ECG 4   0.2281418         0.2109436         0.3351363         0.0000000         0.3081703
Unhealthy ECG 5   0.4728687         0.3988757         0.4531963         0.3074964         0.0000000
Sum of EMDs       1.253298          1.212011          1.307966          1.077958          1.628440
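The selection rule described in this subsection (pick the ECG whose summed EMD to all others is smallest) can be sketched as follows. This is an illustrative Python sketch, not the authors' R code; the function names are hypothetical, and the matrix holds the five-sample values from Table 1.

```python
def column_sums(matrix):
    """Sum each column of a square pairwise-distance matrix."""
    return [sum(row[j] for row in matrix) for j in range(len(matrix))]


def representative_index(matrix):
    """Zero-based index of the column with the smallest sum of distances."""
    sums = column_sums(matrix)
    return min(range(len(sums)), key=sums.__getitem__)


# Pairwise EMD values for the five healthy ECGs in Table 1.
healthy_emd = [
    [0.0000000, 0.3672638, 0.2124510, 0.3108338, 0.4157938],
    [0.3647268, 0.0000000, 0.1776833, 0.2161831, 0.2967646],
    [0.2110799, 0.1792834, 0.0000000, 0.1594389, 0.1562445],
    [0.3129947, 0.2195767, 0.1604886, 0.0000000, 0.2897190],
    [0.4157698, 0.2962481, 0.1572403, 0.2889528, 0.0000000],
]

idx = representative_index(healthy_emd)
print(f"Representative: healthy ECG {idx + 1}")  # Representative: healthy ECG 3
```

Applied to the Table 2 values, the same function would return the fourth unhealthy ECG, in line with the text.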
3.3 ECG Tagging with EMD
The test data set has 20 ECGs, ten healthy and ten unhealthy, and each is compared against the representative healthy ECG and the representative unhealthy ECG. If the test ECG is closer to the healthy representative, that is, its EMD to the healthy representative is smaller than its EMD to the unhealthy representative, then the test ECG is tagged as “+1”; if it is closer to the unhealthy representative, then it is tagged as “−1”. Against the ground truth in the computational results, this gives an accuracy of 60% on the healthy test ECGs and an overall accuracy of 41.66%.
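The tagging rule can be written in a few lines. The sketch below is a hedged illustration in Python rather than the authors' R code: the function names are hypothetical, ties are assigned to “−1” as an assumption (the text does not say how ties are handled), and the sample values are the EMDs of test ECGs T1 to T10 from the computational results.

```python
def tag_ecg(emd_healthy, emd_unhealthy):
    """Return +1 if the ECG is closer to the healthy representative, else -1.

    Assumption: equal distances are tagged -1.
    """
    return 1 if emd_healthy < emd_unhealthy else -1


def accuracy(rows):
    """Fraction of rows (emd_healthy, emd_unhealthy, truth) tagged correctly."""
    rows = list(rows)
    correct = sum(tag_ecg(h, u) == truth for h, u, truth in rows)
    return correct / len(rows)


# Healthy test ECGs T1..T10: (EMD to healthy, EMD to unhealthy, ground truth).
healthy_test = [
    (0.163895, 0.1991414, 1),
    (0.1340776, 0.1587326, 1),
    (0.2082027, 0.1810499, 1),
    (0.1531404, 0.1779477, 1),
    (0.1307294, 0.1722854, 1),
    (0.2092571, 0.1817679, 1),
    (0.1531407, 0.1485877, 1),
    (0.3140732, 0.243153, 1),
    (0.1515926, 0.1835698, 1),
    (0.1809756, 0.190618, 1),
]

print(accuracy(healthy_test))  # 0.6
```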
3.4 Algorithmic Framework Setup
The emd function calculates the Earth Mover’s Distance between distributions, and emd2d compares two distributions expressed as matrices. The EMD computation for this task is done in the R scripting language (Fig. 1).

Fig. 1 Flowchart of the EMD-based tagging procedure. Recoverable labels: Initialization; Healthy ECG Training Dataset and Unhealthy ECG Training Dataset, each followed by Pairwise EMD Calculation; Test ECG, with Pairwise EMD Calculation against each data set; decision H>U; Tag Healthy or Tag Unhealthy.
Computational results
ID     EMD to healthy ECG   EMD to unhealthy ECG   EMD difference (healthy − unhealthy)   Our disease tag   Ground truth   Error
T1     0.163895             0.1991414              < 0                                    +1                +1             0
T2     0.1340776            0.1587326              < 0                                    +1                +1             0
T3     0.2082027            0.1810499              0.027153                               −1                +1             2
T4     0.1531404            0.1779477              < 0                                    +1                +1             0
T5     0.1307294            0.1722854              < 0                                    +1                +1             0
T6     0.2092571            0.1817679              0.027489                               −1                +1             2
T7     0.1531407            0.1485877              0.004553                               −1                +1             2
T8     0.3140732            0.243153               0.07092                                −1                +1             2
T9     0.1515926            0.1835698              < 0                                    +1                +1             0
T10    0.1809756            0.190618               < 0                                    +1                +1             0
T11    0.1804128            0.1940299              < 0                                    +1                −1             −2
T12    0.1562672            0.179345               < 0                                    +1                −1             −2
T13    0.1536508            0.2079541              < 0                                    +1                −1             −2
T14    0.1732075            0.1709712              0.002236                               −1                −1             0
T15    0.2482202            0.2111942              0.037026                               −1                −1             0
T16    0.2020969            0.219492               < 0                                    +1                −1             −2
T17    0.1474309            0.181211               < 0                                    +1                −1             −2
T18    0.1864928            0.2115686              < 0                                    +1                −1             −2
T19    0.1650139            0.2058548              < 0                                    +1                −1             −2
T20    0.1325707            0.146884               < 0                                    +1                −1             −2
Computation of EMD for training healthy data
H1     0.1705611            0.2016369              < 0                                    +1                +1             0
H2     0.1438227            0.1609232              < 0                                    +1                +1             0
H3     0.1537007            0.1908258              < 0                                    +1                +1             0
H4     0.1456105            0.1852883              < 0                                    +1                +1             0
H5     0.1405678            0.1863929              < 0                                    +1                +1             0
H6     0.1845443            0.161641               0.022903                               −1                +1             +2
H7     0.171209             0.1568242              0.014385                               −1                +1             +2
H8     0.1768019            0.1938189              < 0                                    +1                +1             0
H9     0.1644214            0.1686704              < 0                                    +1                +1             0
H10    0.1446915            0.1840659              < 0                                    +1                +1             0
H11    0.1251102            0.1712205              < 0                                    +1                +1             0
H12    0.1780795            0.1564595              0.02162                                −1                +1             +2
H13    0.1101195            0.177229               < 0                                    +1                +1             0
H14    0                    0.2010695              < 0                                    +1                +1             0
H15    0.1948021            0.2117433              < 0                                    +1                +1             0
H16    0.1572228            0.2044841              < 0                                    +1                +1             0
H17    0.2382775            0.1522211              0.086056                               −1                +1             +2
H18    0.1453631            0.2098166              < 0                                    +1                +1             0
H19    0.1533407            0.1819317              < 0                                    +1                +1             0
H20    0.2001808            0.1653652              0.034816                               −1                +1             +2
Computation of EMD for training unhealthy data
UH1    0.1495372            0.1863064              < 0                                    +1                −1             −2
UH2    0.1629056            0.2187011              < 0                                    +1                −1             −2
UH3    0.1441234            0.1671735              < 0                                    +1                −1             −2
UH4    0.1294056            0.1888588              < 0                                    +1                −1             −2
UH5    0.1870512            0.163291               0.02376                                −1                −1             0
UH6    0.1948595            0.2015528              < 0                                    +1                −1             −2
UH7    0.1712404            0.1901085              < 0                                    +1                −1             −2
UH8    0.1545726            0.1652415              < 0                                    +1                −1             −2
UH9    0.2042541            0                      0.204254                               −1                −1             0
UH10   0.1465899            0.1531914              < 0                                    +1                −1             −2
UH11   0.1532719            0.1724425              < 0                                    +1                −1             −2
UH12   0.1371117            0.1599319              < 0                                    +1                −1             −2
UH13   0.1967969            0.2129231              < 0                                    +1                −1             −2
UH14   0.1179075            0.1661596              < 0                                    +1                −1             −2
UH15   0.1331991            0.1931969              < 0                                    +1                −1             −2
UH16   0.1689143            0.1795368              < 0                                    +1                −1             −2
UH17   0.1467736            0.1930697              < 0                                    +1                −1             −2
UH18   0.1580702            0.1878384              < 0                                    +1                −1             −2
UH19   0.1395352            0.1605549              < 0                                    +1                −1             −2
UH20   0.1610765            0.2151979              < 0                                    +1                −1             −2
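As a worked check on the reported accuracies, the counts below treat a row as correct when its error entry is 0, which is an assumption about how the accuracy was computed. Under that reading, the results above give 8 correct tags among the 20 test ECGs (6 of the 10 healthy and 2 of the 10 unhealthy), 15 among the 20 healthy training ECGs, and 2 among the 20 unhealthy training ECGs:

$$\text{accuracy}_{\text{healthy test}} = \frac{6}{10} = 60\%, \qquad \text{accuracy}_{\text{healthy training}} = \frac{15}{20} = 75\%,$$

$$\text{accuracy}_{\text{overall}} = \frac{8 + 15 + 2}{20 + 20 + 20} = \frac{25}{60} \approx 41.67\%.$$

The last value matches the reported 41.66% up to rounding. It also shows that most errors come from unhealthy ECGs being tagged as healthy.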
Future implications
The Indian subcontinent’s current population is a mosaic of social, cultural, and ethnic variety. More than one billion people live on the Indian subcontinent today, representing hundreds of linguistic and ethnic groupings [54, 55]. Genetic and anthropological research has revealed that the subcontinent’s peopling has a complicated past, with contributions from several ancestral groups [56]. Recent autosomal single nucleotide polymorphism (SNP) and short tandem repeat (STR) investigations have also revealed that Indian ethnic and linguistic groupings are genetically diverse [7–57]. In a 2010 study titled “Genetic diversity in India and the inference of Eurasian population expansion,” Jinchuan Xing et al. found that the average number of single nucleotide polymorphisms (SNPs) per individual for Indians is 1.5, which is greater than for both European (1.3 SNPs) and East Asian individuals (1.1 SNPs). This finding shows that Indian people have more genetic diversity than other Eurasian groups [58].

Given this profound genetic diversity, variations can be expected in the healthy as well as unhealthy ECGs of Indian individuals, and these are yet to be explored. The genetic variation in the ECGs can be very useful in classifying the ECGs into wider boundaries of CVDs and can provide better diagnosis of the various CVDs among the Indian population regardless of the wide genetic variations. Our study was based on ECGs of a particular community with a limited sample size and performs disease tagging based on two classes. With larger and more diverse ECG collections, our software can be easily coupled with conventional markers to serve as an effective diagnostic tool for rapid screening of large populations in the community. Currently, no data are accessible on the variances in ECG patterns at the community level throughout the Indian subcontinent, and this repository would be a pioneering endeavour that can further classify ECGs into various CVD tags with the help of EMD. Beyond CVDs, our software can also be used for other disease tagging tasks such as lung cancer detection and brain tumour classification. Our programme may easily be combined with traditional CVD indicators while also having a patient-agnostic genetic base, allowing it to be used as a quick screening tool for large populations in both community and tertiary care settings. By incorporating a large amount of genetic information on Indian patients, ECG tagging will be much more resourceful in detecting CVDs: rather than being confined to zonal boundaries in India, it will be able to detect various CVD and pre-CVD conditions influenced by genetic variations.
4 Conclusion
In this work, we used EMD, a similarity metric used mainly in image retrieval applications, and it can yield an accuracy of seventy-five per cent in healthy ECG tagging. This work helps patients make sense of ECGs and recommends doctor consultation if required. This AI-enabled computational platform using EMD will segregate the ECG image repository, and if the genetic basis of the individual is considered, it can provide accurate, precision-oriented triaging of ECG-based CVD anomalies even in resource-limited settings, thereby improving the clinical prognosis of patients, especially in LMICs like India, where the scarcity of specialized cardiologists significantly affects the clinical prognosis of patients. Next Generation Sequencing analysis is one of the anticipated strategies that can be implemented to map out the frequency of novel molecular markers predictive of an impending cardiovascular event at a community level, after considering the Indian genetic, regional, and ethnic differences that could play a significant role in modifying the graphs and hence the diagnosis. Currently, there are no data available regarding the variations in ECG patterns at the community level across the Indian subcontinent, and this repository would be a pioneering effort with capabilities to be deployed. Our software can be easily coupled with conventional markers of CVDs, incorporating the patient-agnostic genetic basis, to enable its application as an effective diagnostic tool for rapid screening of large populations in community settings as well as tertiary care settings. As future work, this approach can be extended to classify ECGs into several disease tags; with the ever-expanding number of variants associated with disease, we might expect that over time hundreds of gene regions will be implicated in the pathophysiology of cardiac conditions.
Acknowledgements
support as grant in aid for the project entitled “Intelligent Decision Support System for Cardiovascular Disease Detection Using ECG Traces at Tertiary Care Centres and Extended Community” (ISRM/12(83)/2020 ID No. 2020-3500) to Dr. Rajiv Janardhanan.
Funding Indian  Council  of  Medical  Research,  New  Delhi,  India. 

Cardial Disease Prediction in Multi-variant Systems Using MT-MrSBC Model
Pandiyan Nandakumar and Subhashini Narayan 
Abstract
There are many reasons behind heart disease, including smoking, heredity, and diabetes. Day to day, people experience various common symptoms of heart disease that they neglect out of lethargy, which leads to serious and life-threatening complications. To predict these diseases in advance, several methods exist that take a certain number of parameters for prognosis. The system proposed here is an ensemble approach that combines the MT-MrSBC algorithm with bagging and boosting. The algorithm overcomes the issues faced by other algorithms in handling the multi-variant environment. It deploys iterative techniques involving bagging and boosting concepts that enhance the system. The trained system is thus capable of predicting the disease of the patient. This helps the patient take precautionary measures that can be life saving.
Keywords Prediction · MT-MrSBC algorithm · Cardiac disease
1  Introduction 
Heart disease is among the most pressing health problems and a leading cause of death for both men and women. Medical institutions (hospitals and medical centres) accumulate an enormous amount of findings; still, these observations are not utilized properly. The healthcare system is “data rich” yet “knowledge poor”. Nowadays, heart disease is a main problem in most developed countries as well as in developing countries. The rate of heart disease increases due to improper care and unawareness of the condition. This kind of illness becomes life threatening when it is not monitored regularly. There are several kinds of heart disease, such as arrhythmia, coronary heart disease, congenital heart defects, cardiovascular disease, and cardiomyopathy. Cardiovascular disease causes different problems in the human body, such as a sudden rise in blood pressure and unexpected stroke. It may also lead to the death of the person within a short time.
P. Nandakumar (B) · S. Narayan
School of Information Technology and Engineering, Vellore Institute of Technology, Vellore 632014, Tamil Nadu, India
e-mail: [email protected]
S. Narayan
e-mail: [email protected]
© The Author(s), under exclusive license to Springer Nature Singapore Pte Ltd. 2023
H. K. Deva Sarma et al. (eds.), Machine Learning in Information and Communication Technology
https://doi.org/10.1007/978-981-19-5090-2_2
As per recent statistics [1], heart diseases cause 6.5 million deaths globally per annum. Coronary heart disease accounts for 0.9 million deaths (64%) and stroke for 0.4 million deaths (28%). The death rate of 300 per 100,000 population in India is higher than the global average of 240 per 100,000 population. Adults born after 1970 are much more vulnerable to such deaths than those born earlier.
This paper is an initial attempt to reduce the time and effort taken by the doctor by systematizing risk prognosis with the help of a binary classifier. The risk of cardiovascular disease is affected by environmental problems and health service constraints. The incidence of heart attack increases due to various risk factors [2], including smoking, lack of exercise, high blood pressure, cholesterol, sugar levels, pulse, an improper diet that leads to obesity, family history of coronary illness, physical inactivity, etc. Early detection and treatment can keep heart disease from getting worse. Doctors usually depend on their own knowledge and experience, rather than on the knowledge hidden in the data stored in databases, to make clinical decisions. Moreover, many doctors, despite their training, lack resources and expertise in every sub-specialty. The information given by patients may contain repeated symptoms, and different diseases may have matching symptoms. For this reason, physicians who are less experienced in handling such cases cannot diagnose them reliably. Diagnosing heart disease accurately at an early stage is a very challenging task for many doctors because of its strong dependence on diverse factors.
A suitable computer-based information and decision system can aid in achieving the above task at a reduced cost. To help curb the rising death rate, the algorithm is used to predict the disease early, so that treatment can start on time and the disease can be prevented from becoming severe. The term heart disease encompasses the different kinds of disease that affect the heart. The work done in this paper is as follows:
• In this paper, we present the experiment that was executed with the MT-MrSBC algorithm to build a predictive model for heart disease based on the available dataset.
• It also deploys the ensemble concepts of bagging and boosting, which help to reduce the bias and error in prediction.
• The experimental results show that the proposed model achieves around 97% accuracy on three different datasets: UCI Statlog, Cleveland, and Framingham.
• It also compares the proposed model with existing models such as logistic regression, K-nearest neighbor, and decision tree.
The remainder of this paper is organized as follows: In Sect. 2, related works on the proposed model are surveyed. In Sects. 3 and 4, the proposed model is explained with its architecture. In Sect. 5, the experimental results are shown in visual form. Future work that guides research toward disease classification in the health sector is described in Sect. 6. Finally, in Sect. 7, the work is concluded with the necessity of the proposed model for detecting heart disease earlier.
2  Related Works 
The work that proposed MrSBC [3] described its testing benchmark and reported accurate results. The authors explained how MrSBC solves problems efficiently: first, through an integrated approach; second, by handling both discrete and continuous attributes; and third, by using the database schema, embedding data model knowledge when generating rules. In the future, the model can be compared with other related data mining systems on a larger dataset.
The heart disease prediction model in [4] explains how prediction was carried out with advanced data mining techniques such as fuzzy logic. With the support of membership functions, the authors used fuzzy logic comprising input variables, a rule base, and output variables.
Ensemble learning [5] combines bagging and boosting methods with six sub-classifiers, each part using a voting methodology to predict the final result. The main objective is to integrate numerous weak learners into a single, more stable one. Based on standardized data, there are two well-known research areas in ensemble learning: ensemble clustering and ensemble classification. For clustering and classification [6], heterogeneous networks were used for a multi-type model, with a proposed method called HENPC for predictive purposes. For bagging and boosting, Quinlan [7] applied both techniques to decision tree learning and tested them on the given datasets. According to the reported error rates, the boosted and bagged classifiers achieve notably lower error than the other classifiers, with bagging applied to C4.5 doing so on 18 of the 27 datasets used.
Using the probabilistic formulation given in the following section, [8] worked with Gaussian models that provide results for the linear mean part alone. For further classification, the authors of [9] proposed a method based on the meta-path idea and classified the objects collectively by describing them in a heterogeneous network. A meta-path is a path consisting of an ordered sequence of links between two classified objects. For predicting structured outputs, [10] learns models with global and local tree ensembles for different predictive tasks, such as hierarchical multi-label classification, multi-target regression, and classification. As further work, with some enhancement, global random forests can be useful for feature ranking for structured outputs.
For generating multiple versions of a predictor, Breiman [11] showed that the bagging method performs well when applied to weak learners, and described how making changes in the learning set can increase accuracy. Another type of classifier combines the strengths of the NB and K-NN algorithms, namely the locally weighted Naïve Bayes classifier. In that work, the authors implemented a self-labeled weighted variant with Naïve Bayes as the base classifier and compared it with other semi-supervised classification algorithms on standard datasets. They concluded that the technique had better accuracy in most cases. Self-labeling techniques are also easy to implement and can be applied to almost all existing classifiers. These two points, together with the results in [12], which used the Naïve Bayes model, guided us to combine bagging and boosting methods in the same ensemble learning algorithm.
A further idea for our proposal came from considering the predictions obtained at each iteration from different training sets. Each training set may focus on different target attributes or on different properties of the concept to be learned. Thus, we are naturally able to handle multi-type classification, while also improving the predictions for different types of objects.
3  Methodology 
In this part, we propose a cardiology-related disease prediction system based on an ensemble learning concept with MrSBC classifiers. The proposed system is capable of predicting the disease of the patient from the input, i.e., the symptoms, that the patient provides. To build a system capable of handling multi-variant data, we take partially labeled heterogeneous data as input and work on it iteratively. The ensemble approach is exploited heavily in our system. The primary algorithm used here is the multi-type multi-relational structural Naïve Bayes classifier (MT-MrSBC). The proposed model is shown in Fig. 1. It is an enhancement of the MrSBC algorithm that gives better results due to its iterative nature. Initially, the partially labeled heterogeneous data gathered is segregated into labeled and unlabeled data. The purpose of the segregation is to train the system with the labeled data first, followed by the unlabeled data. The labeled data is used to build a weak MrSBC model. This model cannot handle the diverse situations of a heterogeneous environment. Hence, the algorithm is deployed along with the concepts of bagging and boosting using the unlabeled data.
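The segregation step can be illustrated with a minimal Python sketch. The record layout (a dictionary with a `symptoms` list and an optional `disease` label) and the function name `segregate` are assumptions made for illustration; the source does not describe its data format.

```python
def segregate(records, label_key="disease"):
    """Split records into those that carry a label and those that do not.

    The record layout is hypothetical: each record is a dict, and a missing
    or None value under `label_key` marks the record as unlabeled.
    """
    labeled = [r for r in records if r.get(label_key) is not None]
    unlabeled = [r for r in records if r.get(label_key) is None]
    return labeled, unlabeled


# Illustrative records only; they are not taken from the datasets in the paper.
records = [
    {"symptoms": ["chest pain", "fainting"], "disease": "Hypertrophic cardiomyopathy"},
    {"symptoms": ["fatigue", "heart murmur"]},
]
labeled, unlabeled = segregate(records)
print(len(labeled), len(unlabeled))
```

The labeled part would then be used to fit the initial weak model, and the unlabeled part would be consumed during the later iterations.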
An ensemble is the process of building and combining numerous varied models to predict an outcome, using a variety of modeling algorithms or training datasets. Artificial intelligence is useful for automatic diagnosis in the healthcare sector, and its subfield, machine learning, has produced many models focused on various human health-related problems. To improve computational performance in classifying types of cardiac disease, many traditional machine learning methods train multiple models with the same learning algorithm to produce an optimal solution. Ensemble methods fall into several groups, such as sequential and parallel methods; these are called multi-classifiers, in which hundreds or thousands of learners with a common goal are combined to tackle a specific problem.
Fig. 1 MT-MrSBC
The main causes of error in learning are noise in the data, bias, and variance. The ensemble method helps to limit these factors. These strategies are chiefly intended to improve the stability and accuracy of machine learning models. Combining numerous classifiers reduces variance, particularly for unstable classifiers, and yields a more dependable classification than a single classifier.
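The variance reduction can be made concrete with a standard textbook result that is not stated in the source. Assuming N base classifiers whose outputs f_i each have variance σ² and pairwise correlation ρ (both symbols are introduced here for illustration), the variance of their average is:

```latex
\operatorname{Var}\!\left(\frac{1}{N}\sum_{i=1}^{N} f_i\right)
  = \rho\,\sigma^{2} + \frac{1-\rho}{N}\,\sigma^{2}
```

For uncorrelated learners (ρ = 0) this reduces to σ²/N, so the variance shrinks as more learners are combined; the more correlated the learners are, the smaller the benefit.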
The proposed model used the following datasets from the UCI repository: the Cleveland, Statlog, and Framingham heart disease datasets. The Cleveland dataset [13] contains 76 attributes, but most published experiments use a subset of 14 of them. In particular, most ML researchers have used the Cleveland database. The UCI Statlog (heart) dataset [14] is a heart disease database similar to one already present in the machine learning repository, but in a slightly different form. The task is to detect the absence (1) or the presence (2) of heart disease. It has 270 instances with 13 attributes and no missing values. The UCI Framingham heart dataset [15] comes from the Framingham study, which was founded in 1948 under the U.S. Public Health Service and was moved under the direct operations of the new National Heart Institute, NIH, in 1949. It contains 4240 instances with 16 attributes. Participants of both genders were sampled from Framingham, Massachusetts.
To use bagging and boosting concepts in the ensemble, a base learner algorithm is required. The base learner algorithm used in our system is the MT-MrSBC algorithm. Bagging stands for bootstrap aggregation. Boosting refers to a group of algorithms that can convert weak learners into strong learners. Both bagging and boosting obtain N learners by generating additional data in the training phase: N new training datasets are created from the original dataset by random sampling with replacement, so some observations may be repeated in each new training dataset. In bagging, every element has the same probability of appearing in a new dataset. In boosting, however, the observations are weighted, so some of them will appear in the new sets more often.
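The difference between the two sampling schemes can be sketched in a few lines of Python. This is a minimal illustration using only the standard library; the function names and the example weights are assumptions, and the sketch shows only how the N training sets are drawn, not how MT-MrSBC is trained on them.

```python
import random


def bootstrap_sample(data, rng):
    """Bagging: draw len(data) items uniformly at random, with replacement."""
    return [rng.choice(data) for _ in data]


def weighted_sample(data, weights, rng):
    """Boosting-style resampling: items with larger weights are drawn more often."""
    return rng.choices(data, weights=weights, k=len(data))


def make_training_sets(data, n_learners, weights=None, seed=0):
    """Create N resampled training sets; sampling is uniform if no weights are given."""
    rng = random.Random(seed)
    if weights is None:
        return [bootstrap_sample(data, rng) for _ in range(n_learners)]
    return [weighted_sample(data, weights, rng) for _ in range(n_learners)]


# Hypothetical record identifiers and weights, for illustration only.
records = ["r1", "r2", "r3", "r4", "r5"]
bagged = make_training_sets(records, n_learners=3)
boosted = make_training_sets(records, n_learners=3, weights=[1, 1, 1, 1, 10])
print(bagged)
print(boosted)
```

In the weighted call, the last record is ten times as likely to be drawn as each of the others, which mirrors the idea that boosting lets some observations take part in the new sets more often.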
Thus, we take the labeled data from the given heterogeneous data and apply first-order classification rules to develop a weak classifier called MrSBC. The weak MrSBC model turns out to be incapable of dealing with the multi-variant nature of the system. The multi-variant nature here refers to the numerous symptoms of the patient, which include both primary and secondary targets. The primary target indicates the common symptoms of the patient. As cardiology is the focus here, common symptoms include chest pain, sweating, etc. The secondary symptoms are the ones that add to the decision made by the system; they help to confirm it. The major risk factors that can predict heart disease in an individual are high blood pressure (hypertension), high LDL cholesterol (usually called bad cholesterol), low HDL cholesterol (commonly called good cholesterol), high triglycerides, high blood glucose level, family history of early heart disease, smoking, and physical inactivity.
This is followed by training the system with unlabeled data. The unlabeled data is a mixture of primary and secondary targets. As mentioned in [8], picking the target types at random exposes the system to many circumstances and makes it accustomed to them. Hence, we pick a random sample, which may be either a primary or a secondary target type, and train the system. By training the system, we mean that a classifier is built that can handle a similar situation if it is encountered later.
To build the classifier, we use the Naïve Bayes theorem, which can be stated as
$$P(C \mid A_1, A_2, \ldots, A_n) = \frac{P(A_1, A_2, \ldots, A_n \mid C)\,P(C)}{P(A_1, A_2, \ldots, A_n)} \tag{1}$$
Given a record with attributes $A_1, A_2, \ldots, A_n$, the goal here is to predict its class $C$.
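Equation (1) is Bayes' theorem in its general form. A Naïve Bayes classifier usually adds the assumption that the attributes are conditionally independent given the class. The source does not spell this step out, so the factorization below is the standard textbook form rather than a formula taken from the paper:

```latex
P(A_1, A_2, \ldots, A_n \mid C) = \prod_{i=1}^{n} P(A_i \mid C),
\qquad
\hat{C} = \arg\max_{C} \; P(C) \prod_{i=1}^{n} P(A_i \mid C)
```

The denominator of (1) is the same for every class, so it can be dropped when only the most probable class is needed.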
For each given symptom in the target type taken, the probability is calculated based on the theorem stated above. At this stage, the concept of bagging is deployed. Bagging stands for bootstrapping and aggregating. Bootstrapping here refers to splitting the target type into multiple symptoms and finding the individual probability of occurrence of each specific symptom. The averages of those probabilities are then computed, and the highest is chosen. This is known as aggregation (Fig. 1).
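The aggregation step described above, averaging the per-symptom probabilities and choosing the highest, can be sketched as follows. The function name `aggregate`, the disease keys, and the probability values are hypothetical and serve only to show the mechanics; the actual probabilities would come from the trained classifier.

```python
def aggregate(symptom_probs):
    """Average the per-symptom probabilities for each disease and
    return the disease with the highest average, plus all averages."""
    averages = {
        disease: sum(probs) / len(probs)
        for disease, probs in symptom_probs.items()
        if probs  # skip diseases that have no probability estimates
    }
    best = max(averages, key=averages.get)
    return best, averages


# Hypothetical per-symptom probabilities, for illustration only.
example = {
    "D1": [0.20, 0.40, 0.30],
    "D2": [0.60, 0.50, 0.70],
    "D3": [0.10, 0.30, 0.20],
}
best, averages = aggregate(example)
print(best, averages)
```

With these illustrative numbers, D2 has the largest average and would be the disease chosen by the aggregation step.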
While calculating the probabilities, a zero probability may occur, which affects the system. This can be handled by using Laplacian smoothing, stated as follows:
$$P(A_i \mid C) = \frac{N_{ic} + 1}{N_c + c} \tag{2}$$
where $N_{ic}$ is the number of training records of class $C$ in which the attribute takes the given value, $N_c$ is the number of training records of class $C$, and $c$ is the number of possible values of the property predicate with which the attribute is associated (the distinct words in the dataset).
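A small numeric sketch shows why the smoothing matters. It assumes that Eq. (2) is the usual add-one (Laplace) estimate, (N_ic + 1)/(N_c + c); the function name and the counts are illustrative and do not come from the paper's datasets.

```python
def laplace_estimate(n_ic, n_c, c):
    """Smoothed estimate of P(A_i | C) = (N_ic + 1) / (N_c + c).

    n_ic: records of class C in which the attribute takes the given value
    n_c:  records of class C
    c:    number of distinct values the attribute can take
    """
    if n_ic < 0 or n_c < 0 or n_ic > n_c or c < 1:
        raise ValueError("expected 0 <= n_ic <= n_c and c >= 1")
    return (n_ic + 1) / (n_c + c)


# A symptom never observed with a class still gets a small non-zero probability.
unseen = laplace_estimate(n_ic=0, n_c=8, c=2)
seen = laplace_estimate(n_ic=8, n_c=8, c=2)
print(unseen, seen)
```

Without smoothing, the unseen case would contribute a factor of zero and wipe out the whole product of probabilities for that class; with smoothing, the estimates over the two possible values still sum to one.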
This makes it possible to include missing values without affecting the probabilities of other symptom occurrences. The classifier built is added to the labeled data to enhance the system in further prediction. The concept deployed here is known as boosting. Thus, the ensemble concepts help in building the system further. Once the system is trained, when patients enter their symptoms as input, the system can make a prognosis of the disease most closely related to those symptoms. Owing to the algorithm used, the system can cover a wide range of diseases that the patient may suffer from. Hence, our system is suitable for foreseeing the patient's disease in advance.
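One possible reading of this iterative loop is sketched below: a weak model is fitted on the labeled records, randomly ordered unlabeled records are labeled by the current model, and each newly labeled record is added to the training data before retraining. This resembles what is often called self-training. The tiny Naïve Bayes model, the function names, and the example records are all assumptions for illustration and are not the MT-MrSBC implementation.

```python
import math
import random
from collections import Counter, defaultdict


def train(labeled):
    """Fit a tiny Naive Bayes model on (symptom_set, disease) pairs."""
    class_counts = Counter(label for _, label in labeled)
    symptom_counts = defaultdict(Counter)
    vocab = set()
    for symptoms, label in labeled:
        symptom_counts[label].update(symptoms)
        vocab.update(symptoms)
    return class_counts, symptom_counts, vocab


def predict(model, symptoms):
    """Return the most probable disease using add-one smoothed log-probabilities."""
    class_counts, symptom_counts, vocab = model
    total = sum(class_counts.values())
    best_label, best_score = None, -math.inf
    for label, n_c in class_counts.items():
        score = math.log(n_c / total)
        # Assumes the training data contains at least one symptom.
        denom = sum(symptom_counts[label].values()) + len(vocab)
        for s in symptoms:
            score += math.log((symptom_counts[label][s] + 1) / denom)
        if score > best_score:
            best_label, best_score = label, score
    return best_label


def self_train(labeled, unlabeled, seed=0):
    """Label randomly ordered unlabeled records one by one and retrain each time."""
    rng = random.Random(seed)
    labeled = list(labeled)
    pool = list(unlabeled)
    rng.shuffle(pool)
    model = train(labeled)
    for symptoms in pool:
        labeled.append((symptoms, predict(model, symptoms)))
        model = train(labeled)  # retrain on the enlarged labeled set
    return model


# Hypothetical records, for illustration only.
labeled = [({"chest pain", "fainting"}, "D7"), ({"heart murmur", "fatigue"}, "D8")]
unlabeled = [{"chest pain", "dizziness"}, {"fatigue", "heart murmur"}]
model = self_train(labeled, unlabeled)
print(predict(model, {"heart murmur"}))
```

Retraining after every newly labeled record is the simplest choice for a sketch; a practical system would more likely retrain in batches and keep only confident predictions.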
4  System Architecture Explanation
The proposed model architecture is shown in Fig. 2. The system undergoes an initial training stage followed by a prediction stage. The admin trains the system with partially labeled data using the MT-MrSBC algorithm, with ensemble techniques such as bagging and boosting deployed. Though the system is initially a weak predictive model, it transforms into a strong predictive model in subsequent iterations. Thus, when patients give the symptoms they have faced as input, the system predicts the disease they may have. The sequence in which the system works is depicted in Fig. 2.
5  Experimental Approach 
Let us see how the approach explained above works in practice with a simple example. Suppose we are building a classifier that determines which of a given list of diseases a given set of symptoms points to. The different types of cardiac disease are shown in Fig. 3. Our training data has ten diseases, which are given in Table 1 with their relevant symptoms:

28 P. Nandakumar and S. Narayan
Fig. 2 
Fig. 3 
Now, which condition do the symptoms S belong to?
S = {chest pain, shortness of breath, fainting, stroke}
Since Naïve Bayes is a probabilistic classifier, we need
to calculate the probability that the symptoms belong to pericarditis (D1), myocardial
infarction (D2), angina pectoris (D3), etc.
It can be calculated as

Table 1
| Disease | Symptoms |
|---|---|
| Pericarditis | Sharp pain in the left side of the chest, shortness of breath, heart palpitation, heavy cough with fever, abdominal swelling, and leg swelling |
| Myocardial infarction | Chest pain radiating to shoulder, arm, back, shortness of breath, cold sweat, fainting, hypertension, high blood cholesterol, smoking habit, diabetes |
| Angina pectoris | Chest pain for less than 5 min, hypertension, clots in blood vessels |
| Arrhythmia | Heartbeat does not work properly, fatigue or weakness, fainting |
| Coronary artery | Decreasing blood flow, chest pain, shortness of breath |
| Dilated cardiomyopathy | Shortness of breath, fatigue, and swelling of the ankles, feet, and legs |
| Hypertrophic cardiomyopathy | Chest pain, dizziness, shortness of breath, fainting |
| Mitral regurgitation | Abnormal heart sound (heart murmur), shortness of breath (dyspnea), fatigue |
| Mitral valve prolapse | Fluttering or rapid heartbeat called palpitations, shortness of breath especially with exercise, dizziness, passing out or fainting known as syncope, panic, and anxiety |
| Pulmonary stenosis | Heart murmur, fatigue, shortness of breath, especially during exertion, chest pain, and loss of consciousness |
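$$P(D_1 \mid S_i) = \frac{P(S_i \mid D_1)\, P(D_1)}{P(S_i)} \tag{3}$$
$$P(D_2 \mid S_i) = \frac{P(S_i \mid D_2)\, P(D_2)}{P(S_i)} \tag{4}$$
$$P(D_3 \mid S_i) = \frac{P(S_i \mid D_3)\, P(D_3)}{P(S_i)} \tag{5}$$
$$P(S_i \mid D_j) = P(\mathrm{CP} \mid D_j)\, P(\mathrm{SoB} \mid D_j)\, P(\mathrm{F} \mid D_j)\, P(\mathrm{S} \mid D_j) \tag{6}$$
where CP denotes chest pain, SoB shortness of breath, F fainting, and S stroke, and $D_j$ is any one of the ten diseases.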
From Fig. 4, out of the ten diseases, D2 (myocardial infarction) has the highest
value (1.458 in Table 2). We can therefore conclude that a patient presenting the
symptoms S given above suffers from myocardial infarction. This gives a glimpse of
how the system works when trained using the Naïve Bayes classifier.
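The final selection step can be sketched by ranking the per-disease values reported in Table 2 and taking the largest. The values are copied from Table 2; the thresholds behind the "High" and "Medium" labels are not stated in the paper, so they are not reproduced here.

```python
# Per-disease values as reported in Table 2
values = {
    "Pericarditis": 1.361,
    "Myocardial infarction": 1.458,
    "Angina pectoris": 1.203,
    "Arrhythmia": 1.101,
    "Coronary artery": 1.256,
    "Dilated cardiomyopathy": 1.345,
    "Hypertrophic cardiomyopathy": 1.211,
    "Mitral regurgitation": 1.045,
    "Mitral valve prolapse": 1.011,
    "Pulmonary stenosis": 1.321,
}

# Rank diseases from the highest value to the lowest
ranked = sorted(values.items(), key=lambda item: item[1], reverse=True)
top_disease, top_value = ranked[0]
```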
The comparative analysis of the proposed model with existing models is given
in Table 3. It shows that earlier machine learning models achieved accuracies
of around 78% to 95% on the different datasets used by various researchers. The proposed model uses an ensemble approach with MT-MrSBC, which is
an extension of the Naïve Bayes classification method to the multi-relational setting

[Figure: value computed for each of the ten diseases, as listed in Table 2. X-axis: Type of Diseases; y-axis scale: 0 to 2.]
Fig. 4
Table 2 … suffering from a different kind of cardiac disease
| S. No. | Disease | Value | Result |
|---|---|---|---|
| 1 | Pericarditis | 1.361 | Medium |
| 2 | Myocardial infarction | 1.458 | High |
| 3 | Angina pectoris | 1.203 | Medium |
| 4 | Arrhythmia | 1.101 | Medium |
| 5 | Coronary artery | 1.256 | Medium |
| 6 | Dilated cardiomyopathy | 1.345 | Medium |
| 7 | Hypertrophic cardiomyopathy | 1.211 | Medium |
| 8 | Mitral regurgitation | 1.045 | Medium |
| 9 | Mitral valve prolapse | 1.011 | Medium |
| 10 | Pulmonary stenosis | 1.321 | Medium |
and has achieved an accuracy of around 97%. Model execution starts by
splitting the data into training and testing datasets, as labeled and unlabeled data. Based
on the symptoms or features, we pass the valid input data to the proposed
ensemble model, which produces the final heart disease prediction. Figure 5 shows
a visual representation of the proposed model's performance. The results show that
the proposed MT-MrSBC achieves better accuracy than the earlier research.
6  Future Works 
Health-related issues are a serious concern for everyone. Predicting, analyzing, and treating disease is costly, and handling the associated data is difficult. This system helps predict a patient's disease from the symptoms they report. This work can be further enhanced by adding features such as suggestions of hospitals and of the doctors available in those hospitals. Patients can choose doctors at their convenience. Also,

Table 3
| References | Year | Models | Dataset | Accuracy (%) |
|---|---|---|---|---|
| Tay et al. [16] | 2015 | Support vector machine (SVM) | Cardiovascular health study (CHS) dataset | 95.3, 84.8, and 90.1 |
| Mufudza and Erol [17] | 2016 | Poisson mixture regression model | Cleveland clinic foundation heart disease data set | 86.7 |
| Dogan et al. [18] | 2018 | Random forest classification model for symptomatic CHD and integrated genetic-epigenetic algorithms | Framingham heart study | 78 |
| Dwivedi [19] | 2018 | Artificial neural network (ANN), support vector machine (SVM), logistic regression, K-nearest neighbor (K-NN), classification tree, and Naïve Bayes | Statlog heart disease dataset | 85 |
| Haq et al. [20] | 2018 | K-NN, ANN, SVM, DT, and NB | Cleveland heart disease dataset 2016 | 83–89 |
| Mohan et al. [21] | 2019 | Hybrid random forest with a linear model (HRFLM) | Cleveland dataset | 88.7 |
| Javeed et al. [22] | 2019 | Random search algorithm (RSA), random forest model, grid search algorithm | Cleveland dataset | 93.33 |
| Vankara and Devi [23] | 2020 | Ensemble learning by cuckoo search | UCI | 83–86 |
| Proposed model: MT-MrSBC | 2021 | Multi-type multi-relational structural Naïve Bayes classifier | UCI: Cleveland, Statlog, and Framingham datasets | 97.3 |
they can choose a nearby hospital from a list of hospitals and select among the doctors
available in its cardiology department. Being treated at a nearby hospital saves
both lives and time. At present, the patient has to
wait in person for an appointment, whereas booking online saves time in emergencies. To
enhance the system, it could also offer online appointments with the chosen doctor
and show which doctors are available on a particular day or week. The
patient would also be provided with a provisional diagnosis on the spot, within
a few minutes. In the future, our model can be enhanced by applying feature selection
concepts that are not used in existing approaches. A major problem in machine
learning is the high dimensionality of datasets, which forces the
analysis of many features and results in large storage requirements.
For this reason, the proposed model can be further enhanced with a
bio-inspired algorithm to produce an optimal result.
Fig. 5
7  Conclusion 
Heart attack is a critical health problem in human society and has been a major global health challenge over the last decade. Many research works have been
carried out to predict, detect, and diagnose heart disease, which could prove vital in combating it. Because healthcare generates a large volume of information that clinicians cannot interpret and use efficiently, machine learning with an
ensemble model has emerged as a more accurate and effective approach to a wide range of medical problems such as prognosis, prediction, and intervention. The results of this paper facilitate taking precautionary measures to control heart disease. Applying
machine learning strategies at a reasonable computational cost could help reduce the illness and death rates of the worldwide population. The models here are trained and validated against a sample test dataset. In this work, the Naïve
Bayes approach proved to be the most efficient model for predicting patients with heart disease. This paper has summarized the state of cardiac disease prediction using the multi-type
multi-relational structural Naïve Bayes classifier (MT-MrSBC) with respect to ease of
model interpretation and accuracy. In our work, good results with improved accuracy
were obtained in the prediction of cardiac diseases.
Acknowledgements 
Funding: No funding was received for this research work.
Compliance with Ethical Standards
participants performed by any of the authors.
References 
1. Virani SS et al (2020) Heart disease and stroke statistics—2020 update: a report from the American heart association. Circulation 141(9):e139–e596
2. Psaltopoulou T et al (2017) Socioeconomic status and risk factors for cardiovascular disease: impact of dietary mediators. Hellenic J Cardiol 58(1):32–42
3. Ceci M, Appice A, Malerba D (2003) Mr-SBC: a multi-relational Naive Bayes classifier. In: PKDD. Springer, pp 95–106
4. Kowsigan M et al (2017) Heart disease prediction by analysing various parameters using fuzzy logic. Pak J Biotechnol 14(2):157–161
5. Kotsiantis S (2011) Combining bagging, boosting, rotation forest and random subspace methods. Artif Intell Rev 35(3):223–240
6. Pio G et al (2018) Multi-type clustering and classification from heterogeneous networks. Inf Sci 425(2018):107–126
7. Quinlan JR (1996) Bagging, boosting, and C4.5. In: AAAI/IAAI, vol 1, pp 725–730
8. Roberts GO, Sahu SK (1997) Updating schemes, correlation structure, blocking and parameterization for the Gibbs sampler. J R Statis Soc: Ser B (Statis Methodol) 59(2):291–317
9. Kong X et al (2012) Meta path-based collective classification in heterogeneous information networks. In: Proceedings of the 21st ACM international conference on Information and knowledge management, pp 1567–1571
10. Kocev D et al (2013) Tree ensembles for predicting structured outputs. Patt Recogn 46(3):817–833
11. Breiman L (1996) Bagging predictors. Mach Learn 24(2):123–140
12. Karlos S et al (2017) Locally application of Naive Bayes for self-training. Evol Syst 8(1):3–18
13. Detrano R et al (1989) International application of a new probability algorithm for the diagnosis of coronary artery disease. The Am J Cardiol 64(5):304–310
14. Xiao L, Wang X et al (2017) A hybrid classification system for heart disease diagnosis based on the RFRS method. Comput Math Meth Med 2017:11. Article ID 8272091. https://doi.org/10.1155/2017/8272091
15. Splansky GL, Corey D et al (2007) The third generation cohort of the national heart, lung, and blood institute’s Framingham heart study: design, recruitment, and initial examination. Am J Epidemiol 165(11):1328–1335. https://doi.org/10.1093/aje/kwm021. Epub 2007 Mar 19. PMID: 17372189
16. Tay D et al (2014) The effect of sample age and prediction resolution on myocardial infarction risk prediction. IEEE J Biomed Health Inform 19(3):1178–1185
17. Mufudza C, Erol H (2016) Poisson mixture regression models for heart disease prediction. Comput Math Meth Med
18. Dogan MV et al (2018) Integrated genetic and epigenetic prediction of coronary heart disease in the Framingham heart study. PloS One 13(1):e0190549

19. Dwivedi AK (2018) Performance evaluation of different machine learning techniques for prediction of heart disease. Neural Comput Appl 29(10):685–693
20. Haq AUl et al (2018) A hybrid intelligent system framework for the prediction of heart disease using machine learning algorithms. Mobile Inf Syst
21. Mohan S, Thirumalai C, Srivastava G (2019) Effective heart disease prediction using hybrid machine learning techniques. IEEE Access 7:81542–81554
22. Javeed A et al (2019) An intelligent learning system based on random search algorithm and optimized random forest model for improved heart disease detection. IEEE Access 7(2019):180235–180243
23. Vankara J, Devi GL (2020) PAELC: predictive analysis by ensemble learning and classification heart disease detection using beat sound. Int J Speech Technol 23(1):31–43
