Audio Signal Processing Basics, mirtoolbox contains many useful audio processing library functions

nisharobinrohit 69 views 43 slides Aug 17, 2024



Slide Content

Unit 3 Audio Signal Processing Basics, mirtoolbox contains many useful audio processing library functions, VOICEBOX: Speech Processing Toolbox for MATLAB, Audio processing in Matlab.

Audio Signal Processing Basics What is audio data? Audio data represents analog sounds in digital form, preserving the main properties of the original. As we know from school physics lessons, a sound is a wave of vibrations that travels through a medium such as air or water and finally reaches our ears. Three key characteristics matter when analyzing audio data: time period, amplitude, and frequency. The time period is how many seconds it takes to complete one cycle of vibration.

Audio Signal Processing Basics Amplitude is the sound intensity, measured in decibels (dB), which we perceive as loudness. Frequency, measured in hertz (Hz), indicates how many sound vibrations happen per second. People interpret frequency as low or high pitch. While frequency is an objective parameter, pitch is subjective. The human hearing range lies between 20 and 20,000 Hz. Scientists claim that most people perceive all sounds below 500 Hz, such as the roar of a plane engine, as low-pitched. In turn, everything above 2,000 Hz (for example, a whistle) sounds high-pitched to us.
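The three characteristics can be related with a few lines of Python. This is only a minimal sketch: the function names are illustrative, the decibel value is a level relative to an assumed reference amplitude of 1.0, and the "mid" label for the range between 500 Hz and 2,000 Hz is an assumption, since the slides only name the low and high ends.

```python
import math


def period_seconds(frequency_hz):
    """Time needed to complete one cycle of vibration, in seconds."""
    return 1.0 / frequency_hz


def amplitude_to_db(amplitude, reference=1.0):
    """Level of an amplitude relative to a reference amplitude, in dB."""
    return 20.0 * math.log10(amplitude / reference)


def pitch_category(frequency_hz):
    """Rough perceptual label using the 500 Hz and 2,000 Hz figures."""
    if frequency_hz < 500:
        return "low"
    if frequency_hz > 2000:
        return "high"
    return "mid"  # assumed label; the slides do not name this range


print(period_seconds(440.0))        # period of a 440 Hz tone
print(amplitude_to_db(0.5))         # half the reference amplitude
print(pitch_category(100), pitch_category(3000))
```

Halving the amplitude lowers the level by roughly 6 dB, which is why the decibel scale is convenient for comparing loudness.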

Audio Signal Processing Basics Audio data file formats Similar to texts and images, audio is unstructured data, meaning that it's not arranged in tables with connected rows and columns. Instead, you can store audio in various file formats. WAV or WAVE (Waveform Audio File Format), developed by Microsoft and IBM, is a lossless or raw file format, meaning that it doesn't compress the original sound recording. AIFF (Audio Interchange File Format), developed by Apple, also works with uncompressed audio. FLAC (Free Lossless Audio Codec), developed by the Xiph.Org Foundation, which offers free multimedia formats and software tools, compresses files without losing sound quality.
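To make the idea of an uncompressed WAV file concrete, the sketch below writes a short sine tone as 16-bit PCM and reads its header back, using Python's standard `wave` module. The helper names, the 440 Hz tone, and the 8,000 Hz sample rate are illustrative choices, and an in-memory buffer stands in for a file on disk.

```python
import array
import io
import math
import wave


def write_sine_wav(target, frequency_hz=440.0, seconds=0.1, sample_rate=8000):
    """Write a mono 16-bit PCM sine tone to a path or file-like object."""
    n_frames = round(seconds * sample_rate)
    samples = array.array(
        "h",
        (
            int(32767 * 0.5 * math.sin(2 * math.pi * frequency_hz * n / sample_rate))
            for n in range(n_frames)
        ),
    )
    with wave.open(target, "wb") as wav:
        wav.setnchannels(1)       # mono
        wav.setsampwidth(2)       # 2 bytes per sample = 16 bits
        wav.setframerate(sample_rate)
        wav.writeframes(samples.tobytes())
    return n_frames


def read_wav_info(source):
    """Return (channels, bytes per sample, sample rate, frame count)."""
    with wave.open(source, "rb") as wav:
        return (
            wav.getnchannels(),
            wav.getsampwidth(),
            wav.getframerate(),
            wav.getnframes(),
        )


buffer = io.BytesIO()             # stands in for a .wav file on disk
write_sine_wav(buffer)
buffer.seek(0)
print(read_wav_info(buffer))
```

Every sample is stored as written, so the size of the audio data is simply frames times channels times bytes per sample.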

Audio Signal Processing Basics MP3 (MPEG-1 Audio Layer 3, where MPEG stands for Moving Picture Experts Group) was developed by the Fraunhofer Society in Germany and is supported globally. It's the most common file format since it makes music easy to store on portable devices and send back and forth via the Internet. Though MP3 compresses audio, it still offers acceptable sound quality. We recommend using AIFF and WAV files for analysis, as they store the recording without discarding any of its information. At the same time, keep in mind that neither these nor other audio files can be fed directly to machine learning models. To make audio understandable for computers, the data must undergo a transformation.

Audio Signal Processing Basics What are audio signals? Audio signals are signals that vibrate in the audible frequency range. When someone talks, they generate air pressure signals; the ear takes in these air pressure differences and communicates with the brain. That's how the brain helps a person recognize that the signal is speech and understand what someone is saying. There are a lot of MATLAB tools for audio processing, but not as many exist in Python. Before we get into some of the tools that can be used to process audio signals in Python, let's examine some of the features of audio that apply to audio processing and machine learning.


Audio Signal Processing Basics Some data features and transformations that are important in speech and audio processing are Mel-frequency cepstral coefficients (MFCCs), Gammatone-frequency cepstral coefficients (GFCCs), Linear-frequency cepstral coefficients (LFCCs), Bark-frequency cepstral coefficients (BFCCs), Power-normalized cepstral coefficients (PNCCs), spectrum, cepstrum, spectrogram, and more. We can use some of these features directly and extract features from others, like the spectrum, to train a machine learning model.

Audio Signal Processing Basics What are spectrum and cepstrum? Spectrum and cepstrum are two particularly important features in audio processing.

Audio Signal Processing Basics Mathematically, a spectrum is the Fourier transform of a signal. A Fourier transform converts a time-domain signal to the frequency domain. In other words, a spectrum is the frequency-domain representation of the input audio's time-domain signal. A cepstrum is formed by taking the log magnitude of the spectrum followed by an inverse Fourier transform. The result is neither in the frequency domain (because we took an inverse Fourier transform) nor in the time domain (because we took the log magnitude before the inverse Fourier transform). The domain of the resulting signal is called the quefrency domain.
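The two definitions can be followed step by step in a small pure-Python sketch. It uses a direct discrete Fourier transform, which is slow but easy to read; real projects would normally use an FFT routine instead. The function names, the 8 Hz test tone, and the small floor that avoids taking the log of zero are assumptions made for illustration.

```python
import cmath
import math


def dft(signal):
    """Discrete Fourier transform: time domain -> frequency domain."""
    n = len(signal)
    return [
        sum(signal[t] * cmath.exp(-2j * cmath.pi * k * t / n) for t in range(n))
        for k in range(n)
    ]


def inverse_dft(spectrum):
    """Inverse discrete Fourier transform."""
    n = len(spectrum)
    return [
        sum(spectrum[k] * cmath.exp(2j * cmath.pi * k * t / n) for k in range(n)) / n
        for t in range(n)
    ]


def real_cepstrum(signal, floor=1e-12):
    """Inverse transform of the log magnitude of the spectrum."""
    log_magnitude = [math.log(max(abs(c), floor)) for c in dft(signal)]
    return [c.real for c in inverse_dft(log_magnitude)]


sample_rate = 64  # samples per second; one second of signal gives 1 Hz bins
tone = [math.sin(2 * math.pi * 8 * t / sample_rate) for t in range(sample_rate)]

magnitudes = [abs(c) for c in dft(tone)]
peak_bin = max(range(sample_rate // 2), key=lambda k: magnitudes[k])
print(peak_bin)                    # index of the strongest frequency bin
print(len(real_cepstrum(tone)))    # one cepstral value per quefrency bin
```

With 64 samples covering one second, each spectrum bin is 1 Hz wide, so the strongest bin of the 8 Hz tone lands at index 8.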

Audio Signal Processing Basics To start, we want pyAudioProcessing to classify audio into three categories: speech, music, or birds.

Audio Signal Processing Basics Using a small dataset (50 samples for training per class) and without any fine-tuning, we can gauge the potential of this classification model to identify audio categories.

Audio Signal Processing Basics What is audio analysis? Audio analysis is a process of transforming, exploring, and interpreting audio signals recorded by digital devices. Aiming at understanding sound data, it applies a range of technologies, including state-of-the-art deep learning algorithms. Audio analysis has already gained broad adoption in various industries, from entertainment to healthcare to manufacturing.

Audio Signal Processing Basics Speech recognition Speech recognition is the ability of computers to distinguish spoken words with natural language processing techniques. It allows us to control PCs, smartphones, and other devices via voice commands and to dictate texts to machines instead of entering them manually. Siri by Apple, Alexa by Amazon, Google Assistant, and Cortana by Microsoft are popular examples of how deeply the technology has penetrated our daily lives.

Audio Signal Processing Basics Voice recognition Voice recognition is meant to identify people by the unique characteristics of their voices rather than to isolate separate words. The approach finds applications in security systems for user authentication. For instance, the Nuance Gatekeeper biometric engine verifies employees and customers by their voices in the banking sector.

Audio Signal Processing Basics Music recognition Music recognition is a popular feature of apps such as Shazam, which helps you identify unknown songs from a short sample. Another application of musical audio analysis is genre classification: for example, Spotify runs its proprietary algorithm to group tracks into categories (its database holds more than 5,000 genres).

Audio Signal Processing Basics Environmental sound recognition Environmental sound recognition focuses on identifying the noises around us, promising a number of advantages to the automotive and manufacturing industries. It's vital for understanding surroundings in IoT applications. Systems like Audio Analytic "listen" to the events inside and outside your car, enabling the vehicle to make adjustments that increase the driver's safety. Another example is SoundSee technology by Bosch, which can analyze machine noises and facilitate predictive maintenance to monitor equipment health and prevent costly failures.

Audio Signal Processing Basics Healthcare is another field where environmental sound recognition comes in handy. It offers a non-invasive type of remote patient monitoring to detect events like falling. Besides that, analysis of coughing, sneezing, snoring, and other sounds can facilitate pre-screening, identifying a patient's status, assessing the infection level in public spaces, and so on.

Audio Signal Processing Basics A real-life use case of such analysis is Sleep.ai, which detects teeth grinding and snoring sounds during sleep. The solution created by AltexSoft for a Dutch healthcare startup helps dentists identify and monitor bruxism to eventually understand the causes of this abnormality and treat it. No matter what type of sounds you analyze, it all starts with an understanding of audio data and its specific characteristics.

Audio Signal Processing Basics Audio data analysis steps
1. Obtain project-specific audio data stored in standard file formats.
2. Prepare data for your machine learning project, using software tools.
3. Extract audio features from visual representations of sound data.
4. Select the machine learning model and train it on audio features.
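The steps can be walked through end to end with a toy example. This is a deliberately simplified sketch, not the workflow of any particular library: synthetic sine tones stand in for recorded audio, two time-domain features (RMS energy and zero-crossing rate) stand in for features read off a spectrogram, and a nearest-centroid rule stands in for the machine learning model. All names and values are hypothetical.

```python
import math


def rms(signal):
    """Root-mean-square energy of the signal."""
    return math.sqrt(sum(x * x for x in signal) / len(signal))


def zero_crossing_rate(signal):
    """Fraction of neighbouring sample pairs whose sign differs."""
    crossings = sum(1 for a, b in zip(signal, signal[1:]) if (a < 0) != (b < 0))
    return crossings / (len(signal) - 1)


def features(signal):
    """Step 3: turn a signal into a small feature vector."""
    return (rms(signal), zero_crossing_rate(signal))


def train_centroids(labelled_signals):
    """Step 4 (training): average the feature vectors of each class."""
    sums = {}
    for label, signal in labelled_signals:
        f = features(signal)
        total, count = sums.get(label, ((0.0, 0.0), 0))
        sums[label] = ((total[0] + f[0], total[1] + f[1]), count + 1)
    return {label: (t[0] / c, t[1] / c) for label, (t, c) in sums.items()}


def classify(signal, centroids):
    """Step 4 (prediction): pick the class with the nearest centroid."""
    f = features(signal)
    return min(centroids, key=lambda label: math.dist(f, centroids[label]))


def tone(frequency_hz, sample_rate=8000, n_samples=800):
    """Steps 1 and 2 stand-in: a clean synthetic tone instead of a recording."""
    return [
        math.sin(2 * math.pi * frequency_hz * t / sample_rate)
        for t in range(n_samples)
    ]


centroids = train_centroids(
    [("low", tone(100)), ("low", tone(150)), ("high", tone(2500)), ("high", tone(3000))]
)
print(classify(tone(120), centroids), classify(tone(2800), centroids))
```

The zero-crossing rate grows with frequency, so it is enough to separate the two toy classes; real audio needs richer features and a proper model.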


Audio Signal Processing Basics Audio analysis software Audacity is a free and open-source audio editor used to split recordings, remove noise, transform waveforms to spectrograms, and label them. Audacity doesn't require coding skills. The Tensorflow-io package for preparation and augmentation of audio data lets you perform a wide range of operations: noise removal, converting waveforms to spectrograms, frequency and time masking, and more. Librosa is an open-source Python library that has almost everything you need for audio and music analysis.

Audio Signal Processing Basics Audio Toolbox by MathWorks offers numerous tools for audio data processing and analysis, from labeling to estimating signal metrics to extracting certain features.

MIRtoolbox What is MIRtoolbox? MIRtoolbox offers an integrated set of functions written in MATLAB, dedicated to the extraction of musical features such as tonality, rhythm, and structure from audio files. The objective is to offer an overview of computational approaches in the area of Music Information Retrieval.

MIRtoolbox What features does MIRtoolbox have? In short, MIRtoolbox allows us to extract data about musical features dealing with waveform and spectral analysis, tonality, pitch, dynamics, rhythm, tempo, timbre, and other high-level audio features.

MIRtoolbox What is MATLAB? MATLAB® is a programming platform designed specifically for engineers and scientists to analyze and design systems and products that transform our world. The heart of MATLAB is the MATLAB language, a matrix-based language allowing the most natural expression of computational mathematics.

MIRtoolbox How many toolboxes are there in MATLAB? MATLAB add-on toolboxes include Statistics and Machine Learning Toolbox™, Curve Fitting Toolbox™, Control System Toolbox™, Signal Processing Toolbox™, and Mapping Toolbox™.

MIRtoolbox The toolbox is available free of charge under the GNU General Public License. This distribution actually includes, besides MIRtoolbox itself, three other toolboxes: the Auditory toolbox, version 2, by Malcolm Slaney; the Netlab toolbox, version 3.3, by Ian Nabney; and the SOM toolbox, version 2.0, by Esa Alhoniemi, Johan Himberg, Jukka Parviainen and Juha Vesanto. MIRtoolbox requires MATLAB version 7 and MathWorks' Signal Processing Toolbox.

MIRtoolbox Why use MATLAB for Audio Processing? MATLAB offers toolboxes for different domains such as deep learning, machine learning, and image processing. One example is the Audio Toolbox, which supports many operations on audio files, such as speech analysis and acoustic measurement. It has a set of predefined algorithms for audio processing, such as equalization and pitch extraction.

MIRtoolbox Why use MATLAB for Audio Processing? The Audio Toolbox can be used to import, label, analyze, and experiment on datasets, which can then be used to train machine learning and deep learning models. Overall, the Audio Toolbox in MATLAB covers a host of features that very few software packages provide.

MIRtoolbox What are library functions? A library function is accessed by simply writing the function name, followed by a list of arguments, which represent the information being passed to the function. The arguments must be enclosed in parentheses and separated by commas; they can be constants, variables, or more complex expressions.
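The calling convention described above looks the same in most languages. As a small illustration, and purely as a choice of example language, here are three calls to functions from Python's standard `math` library; the variable names are hypothetical.

```python
import math

radius = 2.0

root = math.sqrt(16)                    # a constant as the argument
cube = math.pow(radius, 3)              # a variable and a constant, separated by a comma
diagonal = math.hypot(radius + 1, 4)    # a more complex expression as an argument

print(root, cube, diagonal)
```

In each call the function name comes first and the arguments follow inside parentheses, exactly as the definition states.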

MIRtoolbox What is a library function in MATLAB? A shared library is a collection of functions dynamically loaded by an application at run time. The MATLAB interface supports libraries containing functions defined in C header files. To call functions in C++ libraries, use the interface described in Call C++ from MATLAB.

MIRToolbox - Library Functions
Blocks: Subsystem - Group blocks to create model hierarchy
Functions: libinfo - Get information about library blocks referenced by model
Functions: gcb - Get path name of current block
Functions: gcbh - Get handle of current block
Tools: Library Browser - Find and add blocks to model
Objects: LibraryBrowser.LBStandalone - Display, hide, size, and position Simulink Library Browser

MIRToolbox Create Custom Library
1. From the Simulink start page, select Blank Library and click Create Library.
2. (Optional) Define data types to be used on block interfaces in a Simulink data dictionary.
3. Add blocks to the new library.
4. Add annotations or images.
5. If you plan to add the library to the Library Browser, you can order the blocks and annotations in your library.
6. If you want the library to appear in the Library Browser, enable the EnableLBRepository library property before you save the library.
7. Save the library.

MIRtoolbox Create a Sublibrary If your library contains many blocks, you can group the blocks into subsystems or separate sublibraries. To create a sublibrary, you create a library of the sublibrary blocks and reference the library from a Subsystem block in the parent library.
1. In the library you want to add a sublibrary to, add a Subsystem block.
2. Inside the Subsystem block, delete the default input and output ports.
3. If you want, create a mask for the subsystem that displays text or an image that conveys the sublibrary purpose.
4. In the subsystem block properties, set the OpenFcn callback to the name of the library you want to reference.

VOICEBOX: Speech Processing Toolbox for MATLAB, Audio processing in Matlab. What is voice recognition in MATLAB? A voice recognition system analyzes a person's input voice with the help of its features. It then compares these features with the features saved in the database for prerecorded signals. It displays an output that tells whether any other audio of the same person is present in the database.
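The comparison step can be sketched as follows, written in Python for brevity. This is a hypothetical illustration, not a VOICEBOX or MATLAB routine: it assumes each voice has already been reduced to a short feature vector, uses cosine similarity as the comparison measure, and accepts a match only above an arbitrarily chosen threshold. The names and numbers in the database are made up.

```python
import math


def cosine_similarity(a, b):
    """Similarity of two feature vectors, from -1 to 1."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm


def identify_speaker(input_features, database, threshold=0.95):
    """Return the best-matching enrolled speaker, or None if no match."""
    best_name, best_score = None, -1.0
    for name, enrolled_features in database.items():
        score = cosine_similarity(input_features, enrolled_features)
        if score > best_score:
            best_name, best_score = name, score
    return best_name if best_score >= threshold else None


# Hypothetical enrolled feature vectors for two speakers.
database = {"alice": [1.0, 0.2, 0.1], "bob": [0.1, 0.9, 0.4]}

print(identify_speaker([0.9, 0.25, 0.1], database))   # close to an enrolled voice
print(identify_speaker([0.0, 0.0, 1.0], database))    # unlike any enrolled voice
```

The threshold decides whether the system reports a match at all, which corresponds to the output saying whether the same person is present in the database.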

VOICEBOX: Speech Processing Toolbox for MATLAB, Audio processing in Matlab. What is a voice processing system? Voice processing is the computerized handling of voice, which includes voice store-and-forward, voice response, voice recognition, and text-to-speech technologies.

VOICEBOX: Speech Processing Toolbox for MATLAB, Audio processing in Matlab. How does voice processing work? Voice recognition software on computers requires analog audio to be converted into digital signals, a step known as analog-to-digital (A/D) conversion. For a computer to decipher a signal, it must have a digital database of words or syllables as well as a quick process for comparing this data to signals. The speech patterns are stored on the hard drive and loaded into memory when the program is running. A comparator checks these stored patterns against the output of the A/D converter, an action called pattern recognition.
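A/D conversion boils down to two operations: sampling the analog signal at regular intervals and rounding each sample to one of a fixed number of levels. The Python sketch below imitates this with a mathematical function playing the role of the analog input; the function names, the 1 kHz tone, the 8 kHz sample rate, and the 8-bit resolution are all illustrative assumptions.

```python
import math


def sample_and_quantize(analog, duration_s, sample_rate, bits):
    """Sample analog(t) at sample_rate and round to signed integer codes."""
    levels = 2 ** (bits - 1) - 1            # largest positive code, e.g. 127
    n_samples = round(duration_s * sample_rate)
    codes = []
    for n in range(n_samples):
        value = analog(n / sample_rate)      # sampling: read the signal at time t
        value = max(-1.0, min(1.0, value))   # clip to the converter's input range
        codes.append(round(value * levels))  # quantization: nearest integer code
    return codes


def analog_tone(t):
    """Stand-in for an analog input: a 1 kHz sine between -1 and 1."""
    return math.sin(2 * math.pi * 1000 * t)


codes = sample_and_quantize(analog_tone, duration_s=0.001, sample_rate=8000, bits=8)
print(codes)
```

A higher sample rate captures faster changes and more bits reduce the rounding error, at the cost of more data for the comparator to process.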


VOICEBOX: Speech Processing Toolbox for MATLAB, Audio processing in Matlab. Audio also must be processed for clarity, so some devices may filter out background noise. In some voice recognition systems, certain frequencies in the audio are emphasized so the device can recognize a voice better. Voice recognition systems analyze speech through one of two models: the hidden Markov model or neural networks. The hidden Markov model breaks down spoken words into their phonemes (the basic units of sound), while recurrent neural networks use the output from previous steps to influence the input to the current step.
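One widely used way to emphasize certain frequencies before recognition is a first-order pre-emphasis filter, y[n] = x[n] - a * x[n-1], which boosts high frequencies relative to low ones. The slides do not say which method a given system uses, so the Python sketch below is only an example of the idea; the function name and the coefficient 0.97 are assumptions.

```python
def pre_emphasis(signal, coefficient=0.97):
    """Apply y[n] = x[n] - coefficient * x[n-1]; the first sample is kept as is."""
    if not signal:
        return []
    return [signal[0]] + [
        signal[n] - coefficient * signal[n - 1] for n in range(1, len(signal))
    ]


slow = [1.0, 1.0, 1.0, 1.0]       # no change between samples (low frequency)
fast = [1.0, -1.0, 1.0, -1.0]     # sign flips every sample (high frequency)

print(pre_emphasis(slow, 0.5))
print(pre_emphasis(fast, 0.5))
```

The slowly varying signal shrinks after filtering while the rapidly alternating one grows, which is the emphasis effect described above.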

VOICEBOX: Speech Processing Toolbox for MATLAB, Audio processing in Matlab. As uses for voice recognition technology grow and more users interact with it, the organizations implementing voice recognition software will have more data and information to feed into neural networks for voice recognition systems. This improves the capabilities and accuracy of voice recognition products.

VOICEBOX: Speech Processing Toolbox for MATLAB, Audio processing in Matlab. Voice recognition uses The uses for voice recognition have grown quickly as AI, machine learning and consumer acceptance have matured. Examples of how voice recognition is used include the following: Virtual assistants. Siri, Alexa and Google virtual assistants all implement voice recognition software to interact with users. The way consumers use voice recognition technology varies depending on the product, but they can use it to transcribe voice to text, set up reminders, search the internet and get responses to simple questions and requests, such as playing music or sharing weather or traffic information.

VOICEBOX: Speech Processing Toolbox for MATLAB, Audio processing in Matlab. Smart devices. Users can control their smart homes, including smart thermostats and smart speakers, using voice recognition software. Automated phone systems. Organizations use voice recognition with their phone systems so that callers can reach the corresponding department by saying a specific number. Conferencing. Voice recognition is used for live captioning of a speaker so others can follow what is said in real time as text.

VOICEBOX: Speech Processing Toolbox for MATLAB, Audio processing in Matlab. Bluetooth. Bluetooth systems in modern cars support voice recognition to help drivers keep their eyes on the road. Drivers can use voice recognition to perform commands such as "call my office." Dictation and voice recognition software. These tools can help users dictate and transcribe documents without having to enter text using a physical keyboard or mouse. Government. The National Security Agency has used voice recognition systems dating back to 2006 to identify terrorists and spies or to verify the audio of anyone speaking.

VOICEBOX: Speech Processing Toolbox for MATLAB, Audio processing in Matlab. Voice recognition advantages and disadvantages Voice recognition offers numerous benefits: Consumers can multitask by speaking directly to their voice assistant or other voice recognition technology. Users who have trouble with sight can still interact with their devices. Machine learning and sophisticated algorithms help voice recognition technology quickly turn spoken words into written text. This technology can capture speech faster than some users can type. This makes tasks like taking notes or setting reminders faster and more convenient.

VOICEBOX: Speech Processing Toolbox for MATLAB, Audio processing in Matlab. Disadvantages of the technology include the following: Background noise can produce false input. While accuracy rates are improving, all voice recognition systems and programs make errors. There's a problem with words that sound alike but are spelled differently and have different meanings, for example, "hear" and "here." This issue might be largely overcome using stored contextual information. However, this requires more RAM and faster processors.

VOICEBOX: Speech Processing Toolbox for MATLAB, Audio processing in Matlab. What are 3 uses for voice recognition software? Voice recognition can be used to control a smart home, instruct a smart speaker, and command phones and tablets. In addition, we can set reminders and interact hands-free with personal technologies. The most significant use is for the entry of text without using an on-screen or physical keyboard.

VOICEBOX: Speech Processing Toolbox for MATLAB, Audio processing in Matlab. VOICEBOX is a speech processing toolbox consisting of MATLAB routines that are maintained by and mostly written by Mike Brookes, Speech and Audio Processing Lab, CSP Group, EEE Dept, Imperial College London. The routines are available as a GitHub repository or a zip archive and are made available under the terms of the GNU Public License. To avoid conflicts, all routine names begin with a "v_" prefix. For compatibility with legacy code, aliased versions without the prefix are included, but these are likely to be removed in the future (the routine v_voicebox_update.m is included to update legacy code to the new names).

VOICEBOX: Speech Processing Toolbox for MATLAB, Audio processing in Matlab. Audio Toolbox™ enables real-time audio signal processing and analysis in MATLAB® and Simulink® (a graphical programming environment). It provides low-latency (that is, low-delay) connectivity for streaming audio from and to sound cards via the following driver standards: Windows: DirectSound, WASAPI, ASIO™; Apple Mac OS X: Core Audio; Linux®: ALSA.

VOICEBOX: Speech Processing Toolbox for MATLAB, Audio processing in Matlab. All audio device interfaces in both MATLAB and Simulink support C code generation for acceleration and desktop prototyping. For example, you can generate libraries or standalone applications that process audio in real time on the desktop. Audio Toolbox also enables you to tune algorithm parameters interactively during simulations using external MIDI controls.