PPT on Voice Morphing: Two Identities in One Voice

shreyanikam07 24 views 19 slides Oct 08, 2024

About This Presentation

PPT presentation on Voice Morphing: Two Identities in One Voice


Slide Content

Presented By
Nikam Shreya Rajendra
Third Year of Artificial Intelligence and Data Science Engineering
Department of Artificial Intelligence and Data Science Engineering
Jaihind College of Engineering, Kuran
SAVITRIBAI PHULE PUNE UNIVERSITY
2023-2024
Presentation on
“Voice Morphing: Two Identities in One Voice”

CONTENTS:
• Introduction
• Problem Statement
• Motivation
• Goal/Objectives
• Literature Survey
• System Architecture
• Algorithmic Survey and Selected Algorithm
• Results Analysis
• Advantages & Application
• Future Scope
• References
• Conclusion

INTRODUCTION :
Biometric systems use physical or behavioral traits to recognize individuals. A biometric system
acquires a biometric sample of an individual (e.g., voice) using a sensor (e.g., microphone) and
extracts a salient feature set (or template). This template is then used to recognize the individual.
Typically, a template is associated with a single identity. However, over the past decade, several
adversarial techniques, called morph attacks, have been developed to create synthetic biometric
samples that can successfully match multiple identities. In recent times, DeepFake-based synthetic
image generators have been used to launch morph attacks on image-based biometric systems, viz.,
face, fingerprint, and iris, with high success rates. This motivates extending morph attacks to
voice, which is the subject of voice morphing.
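
To make the idea of a morph attack concrete, the sketch below compares a probe template against enrolled templates using cosine similarity. The three-dimensional vectors, the 0.6 threshold, and the function names are illustrative assumptions, not values or APIs from the presentation or from any real speaker recognition system.

```python
import math

def cosine_similarity(u, v):
    """Cosine of the angle between two equal-length feature vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)

def matches(template, probe, threshold=0.6):
    """Accept the probe if it is similar enough to the enrolled template.

    The threshold is a hypothetical value chosen for this toy example.
    """
    return cosine_similarity(template, probe) >= threshold

# Toy templates for two enrolled identities (made-up numbers).
identity_a = [1.0, 0.0, 0.2]
identity_b = [0.0, 1.0, 0.2]

# A morphed template placed halfway between the two identities.
morph = [0.5 * a + 0.5 * b for a, b in zip(identity_a, identity_b)]

print(matches(identity_a, morph))       # morph is accepted as identity A
print(matches(identity_b, morph))       # morph is also accepted as identity B
print(matches(identity_a, identity_b))  # the two real identities do not match
```

In this toy setting a single morphed template is accepted for both identities, while the two genuine templates do not match each other, which is the behavior a morph attack aims for.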

PROBLEM STATEMENT :
In various applications such as entertainment, security, and telecommunications, there is a growing
demand for creating hybrid voices that combine characteristics of two different individuals. This
process, known as voice morphing, aims to blend unique vocal traits such as pitch, tone, accent, and
speech patterns of two speakers into a single, coherent output.

MOTIVATION :
1. Creative Applications in Entertainment
2. Personalization in Virtual Assistants
3. Medical and Therapeutic Applications
4. Security and Identity Protection
5. Psychological and Emotional Impact

GOALS/OBJECTIVES :
1. Create a Distinct Hybrid Voice
2. Ensure Naturalness and Intelligibility
3. Cultural and Creative Projects
4. Entertainment and Media Effects
5. Customization Across Languages and Accents

LITERATURE SURVEY :
1. Name of Paper: "Voice Morphing: Two Identities in One Voice"
Author(s): Sushanta K. Pani, Anurag Chowdhury, Morgan Sandler, Arun Ross
Year: 20201
Description: This paper introduces a technique called Voice Identity Morphing (VIM), which
creates synthesized voice samples that impersonate two distinct individuals.
Drawback(s): The combination of two voices often leads to a decline in naturalness and
intelligibility when blending multiple identities, as the acoustic features of both voices can clash.

2. Name of Paper: "Analysis of Speaker Recognition Systems under Voice Morphing Attacks"
Author(s): Gaëtan Boulianne, Taras Kryze, Richard Michaud
Year: 2022
Description: This paper explores how different speaker recognition systems handle voice
morphing attacks, focusing on how identity morphing can fool systems.
Drawback(s): Speaker recognition systems are becoming increasingly adept at detecting synthetic
or morphed voices.

3. Name of Paper: "Vulnerability of Speaker Verification Systems to Voice Morphing"
Author(s): Nils Bomash, Matt Hayes, Alexander Yantsevich
Year: 2023
Description: This research examines how voice morphing techniques can be used to compromise
speaker verification systems, especially in secure communication systems.
Drawback(s): Voice morphing poses significant ethical and security concerns. If misused, it can
be a tool for identity theft or fraud.

4. Name of Paper: "Towards Improved Detection of Voice Morphing Attacks in Biometric Systems"
Author(s): Guojun Chen
Year: 2022
Description: The focus of this study is on creating detection mechanisms to identify morphed
voices in biometric systems, particularly in settings where morphing two identities into one voice
can be a security threat.
Drawback(s): Generating high-quality morphed voices requires significant computational power
and time.

SYSTEM ARCHITECTURE :

ALGORITHMIC SURVEY :
1. Voice Preprocessing:
Input: Two distinct voice samples (Voice A and Voice B).
Noise Removal: Apply noise reduction techniques to clean the input voices for higher quality processing.
2. Feature Mapping:
Normalize Audio Features: Normalize the extracted features from both voices to ensure they are in a
compatible format for blending.
3. Voice Morphing Process:
Weighting Factor Selection: Choose weighting factors (α, β) to control the blending ratio between
Voice A and Voice B. The formula for the morphed voice (Voice C) can be represented as:
\text{Voice C} = \alpha \cdot \text{Voice A} + \beta \cdot \text{Voice B}
4. Emotion and Expression Blending:
Emotion Mapping: Analyze and map the emotional states in both voices (e.g., happy, sad, angry) to
preserve and blend emotional content in the morphed voice.
Smooth Transitions: Ensure smooth transitions between phonemes, prosody, and emotional expressions
to avoid abrupt changes in the final voice output.
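
The normalization and weighted-blend steps above can be sketched in a few lines of Python. This is a minimal illustration, assuming each voice is already represented as a list of numeric features of equal length. The z-score normalization, the function names, and the default α = β = 0.5 are assumptions made for the example, since the slides do not specify them.

```python
import math

def normalize(features):
    """Z-score normalize a feature list (assumed normalization scheme)."""
    mean = sum(features) / len(features)
    variance = sum((x - mean) ** 2 for x in features) / len(features)
    std = math.sqrt(variance) or 1.0  # avoid dividing by zero for constant input
    return [(x - mean) / std for x in features]

def morph_features(voice_a, voice_b, alpha=0.5, beta=0.5):
    """Blend two feature lists: Voice C = alpha * Voice A + beta * Voice B."""
    if len(voice_a) != len(voice_b):
        raise ValueError("both voices must have the same number of features")
    return [alpha * a + beta * b for a, b in zip(voice_a, voice_b)]

# Toy feature values (made-up numbers, not real audio features).
voice_a = normalize([110.0, 120.0, 130.0])
voice_b = normalize([200.0, 220.0, 240.0])
voice_c = morph_features(voice_a, voice_b, alpha=0.7, beta=0.3)
print(voice_c)
```

Choosing α + β = 1 keeps the morphed features on the same scale as the inputs. The formula itself does not require this, so treat it as a design choice rather than a rule from the presentation.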

ALGORITHMIC SURVEY :
5. Post-Processing:
Noise Addition: Re-add background noise or reverb if necessary to match the intended environment
(e.g., live setting, studio quality).
Pitch and Speed Adjustment: Fine-tune the pitch, speech speed, and tone to further enhance the
naturalness of the morphed voice.
6. Output:
Morphed Voice: The final voice output, which is a hybrid of Voice A and Voice B, is ready for
playback, real-time applications, or further modifications.
Optional Steps:
Real-Time Implementation: Optimize the algorithm for real-time voice morphing, reducing latency for
live applications such as telecommunication or live broadcasting.
Cross-Language Support: Expand the algorithm to work across different languages, accents, and
dialects by integrating language models.
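
As a rough illustration of the post-processing step, the sketch below adds white Gaussian noise at a chosen signal-to-noise ratio and changes playback speed by linear-interpolation resampling. Both functions and their parameters are assumptions for illustration. In particular, plain resampling changes pitch and speed together, so independent pitch control would need a dedicated method that is not shown here.

```python
import math
import random

def add_noise(signal, snr_db, seed=0):
    """Add white Gaussian noise so the result has roughly the given SNR in dB."""
    rng = random.Random(seed)  # fixed seed keeps the example reproducible
    signal_power = sum(x * x for x in signal) / len(signal)
    noise_power = signal_power / (10 ** (snr_db / 10))
    sigma = math.sqrt(noise_power)
    return [x + rng.gauss(0.0, sigma) for x in signal]

def change_speed(signal, rate):
    """Resample by linear interpolation; rate > 1 plays faster (and higher)."""
    n_out = max(1, int(len(signal) / rate))
    out = []
    for i in range(n_out):
        pos = i * rate                       # fractional position in the input
        lo = int(pos)
        hi = min(lo + 1, len(signal) - 1)    # clamp at the last sample
        frac = pos - lo
        out.append((1 - frac) * signal[lo] + frac * signal[hi])
    return out

# Toy "morphed voice": one second of a 5 Hz sine sampled at 100 Hz.
morphed = [math.sin(2 * math.pi * 5 * n / 100) for n in range(100)]
noisy = add_noise(morphed, snr_db=20)
faster = change_speed(noisy, rate=1.25)
print(len(morphed), len(faster))
```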

FIGURE: Overview of voice conversion on the basis of voice morphing.

RESULT ANALYSIS :
1. Analysis Parameter: Voice Quality
Description: Resemblance of the morphed voice to the original identities.
Observation: Degree of similarity to original voices; naturalness score.
2. Analysis Parameter: Smoothness of Transition
Description: Seamless blending of vocal characteristics.
Observation: No noticeable breaks or artifacts between transitions.
3. Analysis Parameter: Pitch
Description: Average pitch and pitch variation in morphed voices.
Observation: Pitch lies between the two voices.
4. Analysis Parameter: Emotion Transfer
Description: Ability of morphed voices to convey emotions from original voices.
Observation: Retains emotional tone from both identities.
5. Analysis Parameter: Word Recognition Accuracy
Description: Percentage of correctly recognized words from the morphed voice.
Observation: High accuracy in word recognition by listeners.
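
Two of the parameters above lend themselves to a simple numeric check. The sketch below computes word recognition accuracy as the percentage of reference words matched position by position, and tests whether the morphed pitch falls between the two source pitches. The function names, the position-wise comparison (a simplification of a proper edit-distance word error rate), and the sample values are assumptions for illustration and are not results from the presentation.

```python
def word_recognition_accuracy(reference, recognized):
    """Percentage of reference words recognized correctly, compared by position."""
    ref_words = reference.lower().split()
    hyp_words = recognized.lower().split()
    correct = sum(1 for r, h in zip(ref_words, hyp_words) if r == h)
    return 100.0 * correct / len(ref_words)

def pitch_lies_between(pitch_a, pitch_b, pitch_morphed):
    """True if the morphed average pitch is within the range of the two sources."""
    low, high = sorted((pitch_a, pitch_b))
    return low <= pitch_morphed <= high

# Hypothetical listener transcript and average pitch values in Hz.
print(word_recognition_accuracy("the quick brown fox", "the quick brow fox"))
print(pitch_lies_between(120.0, 210.0, 165.0))
```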

ADVANTAGES & APPLICATIONS :
Advantages:
1. Unique Vocal Effects
2. Entertainment Industry
3. Language Learning and Translation
4. Marketing and Branding
5. Privacy
6. Personalization

ADVANTAGES & APPLICATIONS :
Applications:
1. Entertainment and Media : Animated Movies and TV Shows.
2. Film Post-Production : Dubbing and ADR.
3. Gaming and Virtual Reality.
4. Personalized Voices for Assistants.
5. Marketing and Branding.
6. Language Learning and Translation.

FUTURE SCOPE:
The future scope of voice morphing two identities into one voice holds immense potential, driven by
advancements in AI, machine learning, and deep learning technologies. In the coming years, we can
expect more seamless and real-time voice morphing applications, enhancing personalization and
customization across various industries. Virtual assistants, entertainment, and social media platforms
could offer hyper-personalized voices, blending user preferences with familiar or synthetic identities.
In entertainment, voice actors could leverage voice morphing to portray multiple characters, while
gaming and virtual reality environments could feature dynamic character voices that adapt based on
user interactions.

REFERENCES :
1) Chowdhury, Anurag; Ross, Arun: DeepVOX: Discovering Features From Raw Audio For Speaker
Recognition in Non-Ideal Audio Signals. arXiv preprint arXiv:2008.11668, 2020.
2) Chowdhury, Anurag; Ross, Arun; David, Prabu: DeepTalk: Vocal Style Encoding for Speaker
Recognition and Speech Synthesis. In: IEEE International Conference on Acoustics, Speech and Signal
Processing (ICASSP). pp. 6189–6193, 2021.
3) Ferrara, Matteo; Franco, Annalisa; Maltoni, Davide; Busch, Christoph: Morphing Attack Potential.
In: International Workshop on Biometrics and Forensics (IWBF). IEEE, pp. 1–6, 2022.
4) Chung, J. S.; Nagrani, A.; Zisserman, A.: VoxCeleb2: Deep Speaker Recognition. In:
INTERSPEECH. 2018.
5) Desplanques, Brecht; Thienpondt, Jenthe; Demuynck, Kris: ECAPA-TDNN: Emphasized Channel
Attention, Propagation and Aggregation in TDNN Based Speaker Verification. In: INTERSPEECH
2020. pp. 3830–3834.

CONCLUSION :
In conclusion, voice morphing that combines two identities into one voice stands at the forefront of
technological innovation, offering exciting possibilities while also necessitating careful consideration
of its limitations and ethical implications. As the technology matures, it has the potential to
revolutionize how we interact with machines and each other, making communication more personalized,
inclusive, and expressive.

THANK YOU !