Pangborn Sensory Science Conference (2021)

Data integration strategies for chemical and sensory product spaces: a craft beers and gins case study

Mpho Mafata 1,2, Cody Williams 1, Markus Kruger 2,3, Jeanne Brand 1, Bruce Watson 3, and Astrid Buica 1

1 South African Grape and Wine Research Institute, Department of Viticulture and Oenology, Stellenbosch University, South Africa
2 School for Data Science and Computational Thinking, Stellenbosch University, South Africa
3 Department of Information Science, Stellenbosch University, South Africa

Motivation

Craft market allure and interest, and the investigation of new product spaces
- No scientific studies exist on the chemical and sensory space of (South African) craft beers and gins.

How do you investigate an unknown product space where everybody tries to be as different as possible?
- Be highly inclusive (open-ended → unsupervised/non-confirmatory)
- Add as many samples as possible (variability), from as far and as different as possible (variation)
- Capture different types of data

Background

What is captured and how is it captured?
- Colour, smell, taste, mouthfeel
- Photometric measurements of colour
- Gas chromatography for volatile compounds
- Liquid chromatography for non-volatile compounds

What we know
- The chemistry is the source of sensory stimuli
- How to capture the sensory stimuli and chemical compounds
- Which compound classes are responsible for which stimuli

What we are trying to figure out
- Why the correlation between the two modes falls short of theoretical expectations

Hypotheses
- Due to the complex nature of sensory interactions between the chemical components
- Quantity: capture more (samples, measurements)
- Quality: capture the vital stuff
- Better modelling

Background: Multiblock Approaches

- Designate data blocks
- Treat blocks separately
- Find joint or unique information
    e.g. optimize unique information between different classes, for discrimination analyses
    e.g. optimize joint information between dependent and independent variables, for regression analyses and prediction
- Can be used for exploratory or confirmatory purposes

Phase 1: acquisition
Phase 2: pre-processing to remove artifacts, reduce noise, or address peak shifts
Phase 3: standardization, scaling, weighting, and rank
Phase 4: optimization
Phase 5: final model

Background: Multiblock Approaches

Purpose of exploratory analyses
- Reduce the number of dimensions
- Used prior to confirmatory analyses

Purpose of confirmatory analyses
- Prediction
- Calibration, validation, and testing
- Require large sample variation and variability

Examples
- Variations of PCA: sum-PCA, m-PCA, h-PCA, etc.
- Factor analysis: PARAFAC, PARADISe, and variations, MFA, ComDim, etc.
- Predictive analysis: PLS variations such as OPLS and OPLS-DA, P-ComDim, LDA, etc.

Practical optimization criteria
- Discard factors/dimensions with an eigenvalue less than 1
- Retain dimensions up to a cumulative variation of 70%
- Retain dimensions up to the first inflection point in the eigenvalue decay (scree plot)
- Optimize a particular criterion, e.g. indices such as coefficients of fit (covariance, correlation, or regression)

References
Matt C. Howard (2016). A Review of Exploratory Factor Analysis Decisions and Overview of Current Practices: What We Are Doing and How Can We Improve? International Journal of Human-Computer Interaction 32:1, 51-62. DOI: 10.1080/10447318.2015.1087664
James B. Schreiber (2021). Issues and recommendations for exploratory factor analysis and principal component analysis. Research in Social and Administrative Pharmacy 17, 1004–1011.
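The first three practical criteria can be sketched in a few lines of Python. This is a minimal illustration, not the authors' implementation: the function name `retention_criteria`, the toy eigenvalues, and the use of the largest second difference as the "inflection point" are all assumptions made for the example (the slides do not say how the inflection was located).

```python
import numpy as np

def retention_criteria(eigenvalues, cum_threshold=70.0):
    """Number of dimensions suggested by three common retention rules."""
    ev = np.sort(np.asarray(eigenvalues, dtype=float))[::-1]  # descending order
    cum_pct = np.cumsum(100.0 * ev / ev.sum())

    # Rule 1: keep dimensions whose eigenvalue is at least 1
    kaiser = int(np.sum(ev >= 1.0))

    # Rule 2: first dimension at which cumulative variation reaches the threshold
    cumulative = int(np.argmax(cum_pct >= cum_threshold) + 1)

    # Rule 3 (assumed heuristic): scree "elbow" at the largest second difference
    second_diff = ev[:-2] - 2.0 * ev[1:-1] + ev[2:]
    elbow = int(np.argmax(second_diff) + 2)  # 1-based index of the middle point

    return {"eigenvalue_ge_1": kaiser, "cumulative": cumulative, "elbow": elbow}

# Toy eigenvalues, chosen only for illustration
toy = [4.0, 2.0, 1.0, 0.5, 0.25, 0.25]
print(retention_criteria(toy))
```

The three rules generally disagree, which is why the slides treat the choice of criterion as part of the optimization rather than as a fixed default.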

Problem statement

- Choice: large number of techniques and their variants
- Execution: advanced options require programming skills
- Optimization: highly experimental and laborious
- Visualization: large fused data sets are difficult to interpret

Aim

Comprehensive data fusion of craft beer and gin sensory descriptors and chemical data
- Choice: must be comprehensive*, exploratory, and unsupervised
- Optimization: criteria must be exploratory (not blanketed)
- Interpretation: unambiguous visual representation

*Comprehensive meaning that it contains both joint and unique information

Materials and methods

Sampling
- Commercially available industrial and craft beers; local and international gins
- 68 beers from 23 breweries: 36 ales and 32 lagers ('crafty' categorization, blurry lines)
- 61 gins from 37 producers
- Origins: South Africa, UK, USA, Belgium, Denmark, Italy, and Japan

Materials and methods

Sensory data
- Mining of sensory descriptors for the products that were chemically analysed (poster presentation: P06.014)
- Distinguish flavour/aroma descriptors from emotional/marketing terms
- Data consolidation: crafty drinks, crafty descriptions

Materials and methods

Chemical data
- Headspace solid-phase microextraction gas chromatography-mass spectrometry (HS-SPME-GC-MS)
- Untargeted analysis, Scripps XCMS alignment
- Terpenoid analysis by Williams & Buica (2021)

Materials and methods

Data analysis
- Sensory attributes: multiple correspondence analysis (MCA)
- Chemical data: principal component analysis (PCA)
- Data fusion: multiple factor analysis (MFA)
- Cluster analysis: agglomerative hierarchical clustering (AHC)
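The slides name the methods but not the software or settings used. As a rough illustration of the data fusion step only, the sketch below shows the core idea of MFA for quantitative blocks: standardize each block, divide it by its first singular value so that no block dominates, then run one global PCA on the concatenated blocks. The function name `mfa_sketch` and the random toy blocks are hypothetical, and a categorical block such as the sensory attributes would first need MCA-style indicator coding, which is not shown.

```python
import numpy as np

def mfa_sketch(blocks, n_components=2):
    """Minimal MFA for quantitative blocks that share the same rows (samples)."""
    weighted = []
    for X in blocks:
        X = np.asarray(X, dtype=float)
        Z = X - X.mean(axis=0)                      # centre each variable
        sd = Z.std(axis=0)
        Z = Z / np.where(sd > 0, sd, 1.0)           # scale to unit variance
        s1 = np.linalg.svd(Z, compute_uv=False)[0]  # first singular value of the block
        weighted.append(Z / s1)                     # block's first eigenvalue becomes 1
    G = np.hstack(weighted)                         # global (fused) table
    U, s, _ = np.linalg.svd(G, full_matrices=False)
    scores = U[:, :n_components] * s[:n_components] # sample coordinates
    eigenvalues = s ** 2                            # global eigenvalues
    return scores, eigenvalues

# Toy data: 10 samples, two blocks of different width (illustrative only)
rng = np.random.default_rng(0)
block_a = rng.normal(size=(10, 4))
block_b = rng.normal(size=(10, 6))
scores, eigenvalues = mfa_sketch([block_a, block_b])
print(scores.shape, eigenvalues[:3])
```

With this weighting, the first global eigenvalue lies between 1 and the number of blocks, which gives a quick check on how much of the leading structure the blocks share.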

Results: BEER

- 44 sensory attributes (categorical and nominal)
- 598 GC-MS signals (spectral data)
- 52 terpenoids (discrete)

Results: GIN

- 39 sensory attributes (categorical and nominal)
- 1275 GC-MS signals (spectral data)
- 52 terpenoids (discrete)

Results: RV coefficients (configurational similarity)

GIN
                     Sensory attributes   Targeted   Untargeted   MFA
Sensory attributes   1.0000               0.2032     0.2177       0.8274
Targeted             0.2032               1.0000     0.4425       0.6200
Untargeted           0.2177               0.4425     1.0000       0.6519
MFA                  0.8274               0.6200     0.6519       1.0000

BEER
                     Sensory attributes   Targeted   Untargeted   MFA
Sensory attributes   1.0000               0.2166     0.2458       0.8571
Targeted             0.2166               1.0000     0.2316       0.5785
Untargeted           0.2458               0.2316     1.0000       0.5903
MFA                  0.8571               0.5785     0.5903       1.0000
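For readers unfamiliar with the RV coefficient, the sketch below computes it for two tables measured on the same samples, using the standard definition RV(X, Y) = tr(XX'YY') / sqrt(tr((XX')^2) tr((YY')^2)) on column-centred tables. The function name `rv_coefficient` and the toy matrices are illustrative assumptions; the slides do not state how the reported values were computed or which block weighting was applied beforehand.

```python
import numpy as np

def rv_coefficient(X, Y):
    """RV coefficient between two tables with the same rows (samples)."""
    X = np.asarray(X, dtype=float)
    Y = np.asarray(Y, dtype=float)
    X = X - X.mean(axis=0)          # column-centre both tables
    Y = Y - Y.mean(axis=0)
    Sx = X @ X.T                    # sample-by-sample configuration of X
    Sy = Y @ Y.T                    # sample-by-sample configuration of Y
    numerator = np.trace(Sx @ Sy)
    denominator = np.sqrt(np.trace(Sx @ Sx) * np.trace(Sy @ Sy))
    return numerator / denominator

# Toy tables: 8 samples, different numbers of variables (illustrative only)
rng = np.random.default_rng(0)
X = rng.normal(size=(8, 3))
Y = rng.normal(size=(8, 5))
print(rv_coefficient(X, Y), rv_coefficient(X, X))
```

The coefficient ranges from 0 to 1 and equals 1 when two sample configurations match up to rotation and overall scaling, which is why it is read as configurational similarity rather than as a variable-by-variable correlation.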

Results: Variation Captured

GIN: overall data fusion model
  Total eigenvalue                19.7498
  Cumulative eigenvalue at F19    14.0803
  Cumulative %EV at F19           71.2935

  Cumulative contribution (%) at F19
  Sensory attributes              39.11
  Targeted                        13.04
  Untargeted                      19.13

BEER: overall data fusion model
  Total eigenvalue                19.3407
  Cumulative eigenvalue at F18    13.6204
  Cumulative %EV at F18           70.4235

  Cumulative contribution (%) at F18
  Sensory attributes              40.96
  Targeted                        22.45
  Untargeted                      7.01

Optimization criteria
- Eigenvalue ≥ 1 at F3 (22.2 %EV) for Beers and F3 (19.7 %EV) for Gins
- Cumulative variation ≥ 70% at F18 for Beers and F19 for Gins
- Inflection point at F39 (95 %EV) for Beers and F48 (98 %EV) for Gins
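The reported percentages can be checked with simple arithmetic: the cumulative %EV is the cumulative eigenvalue divided by the total eigenvalue. The short script below redoes that calculation from the slide values. It also sums the per-block cumulative contributions, which appear to add up to the same percentage within rounding; that reading is an observation from the numbers, not something the slides state. The dictionary layout and function name are hypothetical.

```python
# Values transcribed from the slide; the structure is only for illustration
models = {
    "gin": {
        "total_ev": 19.7498, "cum_ev": 14.0803, "reported_pct": 71.2935,
        "block_pct": {"sensory": 39.11, "targeted": 13.04, "untargeted": 19.13},
    },
    "beer": {
        "total_ev": 19.3407, "cum_ev": 13.6204, "reported_pct": 70.4235,
        "block_pct": {"sensory": 40.96, "targeted": 22.45, "untargeted": 7.01},
    },
}

def cumulative_pct(cum_ev, total_ev):
    """Cumulative explained variation as a percentage of the total eigenvalue."""
    return 100.0 * cum_ev / total_ev

for name, m in models.items():
    pct = cumulative_pct(m["cum_ev"], m["total_ev"])
    block_sum = sum(m["block_pct"].values())
    print(f"{name}: {pct:.2f} % from eigenvalues, "
          f"{block_sum:.2f} % from summed block contributions, "
          f"{m['reported_pct']:.2f} % reported")
```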

Results: Interpretation by Pattern Recognition

After optimization, many dimensions are still left to visualise. To avoid ambiguous assignment by naked-eye observation, hierarchical cluster analysis was run on the projected points of the biplots.

BEER: 18 dimensions, 805 projected points, 91 clusters
Cophenetic correlation: 0.7337

Variance decomposition for the optimal classification
                    Absolute    Percent
Within-class        0.1968      0.64%
Between-classes     30.7731     99.36%
Total               30.9699     100.00%

Results: BEER, hierarchical cluster analysis on projected points biplots

Cluster   Members
46        alpha-Pinene; β-Pinene; β-Myrcene; alpha-Phellandrene; Humulene; Caryophyllene
91        WBB_IPA
40        Mint-1; NS_L
47        Camphene; alpha and sigma-terpinene; cis-Beta-Ocimene; trans-beta-Ocimene; Isophorone; Pulegone; Beta-citral; alpha-terpineol; Valencene; alpha-citral; Nerol; 6-methyl-5-hepten-2-one; Geraniol; LW_TB; SF_IPA; Citronellol
75        KC_APA
87        SS_PA
66        FB_OPA
49        alpha-thujone; Beta-cyclocitral; Farnesene; Carvone; Geranyl acetate; Farnesol(E,E); cis-PseudoIonone; DB_NA; DB_PA; CB_PR; SS_L; SS_HB; DB_PR; JB_IPA; NG_PR; RB_PR; SB_L
56        DB_BS
1         Fruity-0; Gooseberry-0; Banana-0; Lychee-0; Passion Fruit-0; Citrus-0; Grapefruit-0; Orange-0; Lemon-0; Floral-0; Spicy-0; Clove-0; 3-Carene; Limonene; gamma-Terpinene; cis-Rose oxide; trans-rose-oxide; cis-linalool oxide; trans-linalool-oxide; Linalyl acetate; Beta-Damascenone; 2-(1-Mercapto-1-methylethyl)-5-methylcyclohexanone; Caryophyllene Oxide; Thymol; alpha-bisabolol; HB_YL; JB_L_B; JB_L_C; NG_IPA; NG_L; PC_EPA; RB_L; RB_PA
48        1,8-cineole; Camphor; Linalool; Neryl acetate; Carvacrol; PC_CL
51        Alpha-Ionone; trans-pseudoionone
42        Hay-1
3         Berry-0; Strawberry-0; Black Currant-0; Tropical-0; Mango-0; Papaya-0; Guava-0; Peach-0; Apricot-0; Cherry-0; Plum-0; Red Apple-0; Pear-0; Dried Fruit-0; Melon-0; Sweet Aromatic-0; Caramel-0; Honey-0; Toffee-0; Vanilla-0; Coriander-0; White Bread-0; Biscuit-0; Toast-0; Pine-0; Herbaceous-0; Mint-0; Earthy-0; Hay-0; Smoke-0; 1,4-Cineole; Fenchone; UT_1
52        UT_77 to UT_111

Results: GIN, hierarchical cluster analysis on projected points biplots

GIN: 19 dimensions, 1465 projected points, 149 clusters
Cophenetic correlation: 0.877

Variance decomposition for the optimal classification
                    Absolute    Percent
Within-class        0.0219      0.21%
Between-classes     10.2381     99.79%
Total               10.2601     100.00%
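The two diagnostics reported for the beer and gin clusterings, the cophenetic correlation and the within/between-class variance decomposition, can be reproduced in outline with SciPy. The sketch below is an illustration under stated assumptions: Ward linkage on Euclidean distances and a fixed number of clusters are choices made for the example (the slides do not give the linkage or the truncation rule), and `ahc_summary` and the toy points are hypothetical.

```python
import numpy as np
from scipy.cluster.hierarchy import linkage, cophenet, fcluster
from scipy.spatial.distance import pdist

def ahc_summary(points, n_clusters):
    """Agglomerative clustering with cophenetic correlation and variance split."""
    points = np.asarray(points, dtype=float)
    Z = linkage(points, method="ward")              # assumed linkage method
    coph_corr, _ = cophenet(Z, pdist(points))       # tree vs. original distances
    labels = fcluster(Z, t=n_clusters, criterion="maxclust")

    # Total variance: mean squared distance to the grand centroid
    total = np.mean(np.sum((points - points.mean(axis=0)) ** 2, axis=1))
    # Within-class variance: squared distances to each cluster's own centroid
    within = 0.0
    for k in np.unique(labels):
        members = points[labels == k]
        within += np.sum((members - members.mean(axis=0)) ** 2)
    within /= len(points)
    between = total - within
    return {"labels": labels, "cophenetic": coph_corr,
            "within": within, "between": between, "total": total}

# Toy data: three well-separated groups in 4 dimensions (illustrative only)
rng = np.random.default_rng(0)
centres = np.array([[0, 0, 0, 0], [10, 0, 0, 0], [0, 10, 0, 0]], dtype=float)
pts = np.vstack([c + rng.normal(scale=0.5, size=(15, 4)) for c in centres])
out = ahc_summary(pts, n_clusters=3)
print(out["cophenetic"], out["within"] / out["total"])
```

A within-class share close to zero, as in the reported results, indicates that almost all of the variation among projected points lies between clusters rather than inside them.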

Results: GIN, hierarchical cluster analysis on projected points biplots

Cluster   Members
58        Cassia-0; UT_1 to UT_4; UT_18; UT_22; UT_28; UT_32; UT_36; UT_59 to UT_84; ... (continued)
88        UT_5 to UT_17; UT_19; UT_20; UT_21; UT_23; UT_24; UT_25; UT_26; ... (continued)
89        UT_351; UT_526; UT_534; UT_547; UT_548; UT_671; UT_697; UT_725; ... (continued)

Conclusion

- Created comprehensive MFA data fusion models of sensory descriptors, terpenoids, and untargeted GC-MS features of craft beers and gins
- Choice: the unsupervised approach allowed us to see common and unique products and their related chemical compounds/signals and sensory descriptors
- Practical optimization criteria (not blanketed) were used
- Cluster analysis on the projected points of the biplots allowed for unambiguous interpretation
- Recommendation: run confirmatory cluster analyses (e.g. k-NN or fuzzy c-means); if there is natural clustering by category (ale and lager), design further studies to exploit each category

*Not included here: contextual descriptions of the chemical and sensory space

Acknowledgements

Funding:
- Office of the Vice Rector, Stellenbosch University
- School for Data Science and Computational Thinking, Stellenbosch University
