d any for profit sale of the slides (in any form) requires the consent of the copyright owners Reference-Presentation.pptx

Hateful Content Detection Mohd Zia Ur Rehman Advisor: Dr. Nagendra Kumar 13 -04-2022 IIT Indore COMPUTER SCIENCE & ENGINEERING

Data Access in E-Commerce Users struggle to access e-commerce databases due to Lack of technical expertise Limited familiarity with database content User-friendly data access techniques Keyword search Forms, faceted search 2

Faceted Search 3

Background Query and result panel Facet Queriable (shown) Non- queriable (hidden) Facet value Categorical Numerical Summary digest Exploration path 4

Background Faceted search can be used for both documents and structured data Assume a relational database D with n numerical or categorical attributes Queriable facets F ={ f 1 , f 2 , …, f m } Hidden facets H ={ h 1 , h 2 , …, h (n-m) } 5

Mary’s Predicament Consider a used-car database of an auto-dealer Has many attributes Many values for each attribute Attributes have complex dependencies Mary: A potential used-car buyer unfamiliar with car domain Mary’s initial choice Year  2012 Transmission = Automatic 6

Mary’s Predicament (contd.) Faceted navigation helps Mary to Form queries Get organized result But she cannot Gain good familiarity with database content Efficiently browse the result panel 7

Limitations of Faceted Navigation Limited feedback Query panel Low adaptivity Result panel Hidden facets Query & result panel 8

Limitation 1: Limited Feedback Faceted interface Self- explorable design Exponential choices But… Users feel lost Too much freedom without any feedback 9

Limitation 2: Low Adaptivity Result panel not adaptive to users’ need Variable-speed browsing not possible 10 Variable-speed Browsing in Google Maps

Limitation 3: Hidden Facet Only few queriable facets Limited screen-estate User fatigue Limitations of non- queriable facets No way to express selections No summary digest 11

Challenges in Overcoming Limitations Need to retain fundamental structure of faceted interface Difficult to quantify interface improvement Search interfaces must meet real-time performance 12

1 2 3 4 5 6 Add-on Extensions at Amazon.com 13

Research Overview 14 ICSummary SFI View Skimmer

Querying High-Dimensional Data Using Faceted Interface TPFacet : Two-Phased Faceted Browsing Skimmer: Skimming through Relational Query Result Thesis Roadmap 15 Single Exploration Path Multiple Exploration Paths Result Set Presentation

Querying High-Dimensional Data Using Faceted Interface TPFacet : Two-Phased Faceted Browsing Skimmer: Skimming through Relational Query Result Thesis Roadmap: ICSummary 16 Single Exploration Path Multiple Exploration Paths Result Set Presentation

Mary’s Predicament (contd.) Mary’s initial search parameters Year  2012 Transmission = Automatic How should Mary choose the right body style(s) 12 different body styles Each body style has complex dependencies with other attributes 17

Struggle in Choosing Body Style Choosing SUV Pros : Cargo capacity, Ground clearance, Engine power Cons : Price, Fuel economy Choosing Hatchback Pros : Fuel economy, Price Cons : Cargo capacity, Engine power Independent facets Mileage (number of miles), Color 18

Cause of Mary’s Predicament Mary’s unknown preference function Spans across multiple attributes Changes on seeing the comparison between available choices Hard to find optimal choices Optimal choices are contradictory One attribute’s optimal choice is non-optimal for others 19

Browsing Individual Tuples Suitable for Small result set Detailed and pairwise comparison between tuples 20

Browsing Aggregated Summaries Suitable for Large result set Comparing a given tuple relative to the whole dataset Quickly determine relevance of search result But… Summarizing complex datasets is challenging Simple data summaries, such as min, max, mean, etc., are not effective Single summary cannot satisfy all the needs of all users 21

Data Summary Using Parallel Coordinates 22 Standard parallel coordinates Can show interaction between attributes Cannot scale to large dataset Similar to faceted navigation summary digest Cannot show interaction between attributes Can scale to large dataset

Limitations of Query Panel Information in non- queriable facets not displayed in the query panel Need considerable back-and-forth browsing across the query panel Count information not shown in multi-selection faceted interface 23

Problem Overview Problem : Query panel of faceted interface cannot show implicit selections due to explicit selections Solution : A novel data summary called Implicit Choices Summary ( ICSummary ) Integrated with faceted navigation Summarizes the interaction of a given facet value with facet values in top- L related facets 24

What is ICSummary ? Given a facet f , for values V and v selected in f , consider result sets R V and R v such that v  V R v  R V 25 R V R v ICSummary for facet value v is a statistical comparison of R V and R v

Mary’s selection: Year ≥ 2012 v = SUV V = {Sedan, SUV, Coupe, Hatchback,…} Font colors: indicative of likelihood Green: increased Blue: unaltered Red: decreased 26

Observed vs. Expected Count Given v = SUV, V = {Sedan, SUV} AWD observed count = 500 AWD expected count = 2200/7400 *700 = 208 Increased likelihood 27

28 Mary’s selection: Year ≥ 2012 v = Sedan V = {Sedan, SUV, Coupe, Hatchback,…} Font colors: indicative of likelihood Green: increased Blue: unaltered Red: decreased

ICSummary Design Comparison between tuple count and ICSummary Both provide preview about the next result Tuple count is preview of next result set size ICSummary is a preview of change in facet value distribution in other facets Structure of ICSummary Ordered list of dependent facets Change in facet value distribution of each dependent facet 29

User Interaction Two modes of user-interaction Enable ICSummaries Disable ICSummaries ICSummaries cannot be shown as tuple counts Show ICSummary as pop-up window 30

Problem Definition Problem 1 : Creating ICSummary Find dependent facets Summarize dependent facets Problem 2 : Selecting top- K queriable facets Must fit within a given query panel space Minimize information loss for non- queriable facets 31

Problem 1a: Find Dependent Facets Two contradictory goals in creating ICSummaries Maximize information quality Maximize comparability Solution : Show same set of dependent facets for all facet values currently selected within a facet Algorithms used chi-square statistical test for independence p-scores 32

Problem 1b: Summarize Dependent Facets Solution : Use a categorization function C to uniformly summarize the distribution change in each dependent facet C has two dimensions Frequency based on observed count Likelihood based on observed and expected count difference E.g. obs. count > exp. count implies increased likelihood Chi-square test to determine significance of count difference 33

Problem 2: Top-K Queriable Facets Solution Assume S pre-selected facets Select K-|S| more facets such that they Have less than C values Are central nodes in the graph Example Year → Mileage BodyType → Model Make → Model 34 Facet Dependency Using Bayesian Network

Efficient Implementation of ICSummary Faceted search engine (FSE) Built on top of inverted index Fast response through efficient bitset operations Constant retrieval time for any query (assuming few retrieved tuples ) Limitations of FSEs in creating ICSummaries Not suitable for retrieving large number of tuples Summary digest has commingled count information 35

Extended Summary Digest (ESD) ESD can improve system performance by two orders of magnitude Efficient computation of contingency table required for computing dependent facets Dependent facets summarized without additional queries ESD stores the summary digest of all next possible result sets in form of hash tables Normal ESD: O( n 2 R ) Optimized ESD: O( lmR ) , where l,m << n 36

Evaluation Performance Evaluation Static computations Once per query result Dynamic computations Computing ICSummaries as requested by the user Datasets 37 Dataset # Tuples # Dimensions Yahoo Used Car 40,000 11 Census 2,458,285 64

Static Computations Retrieving tuples takes maximum time Optimization: Retrieve at most 5K-10K tuples Computing ESD takes maximum time Optimized ESD is 20 times faster compared to ESD 38 Low Dimensional Dataset High Dimensional Dataset

Dynamic Computations Amortized cost analysis Assumption : Caching and lazy evaluation For N ICSummaries from M queriable facets the computation cost is: k 1 M + k 2 N 39 Algorithms Find Dependent Facets (in ms) Summarize Dependent Facets (in ms) Pre-computed Dependent Facets 130 Baseline 230 130 ESD Based 2 3 Time to Compute a Single ICSummary

Conclusion: ICSummary Identified three limitations in the query panel for complex datasets Created an add-on extension, ICSummary , addressing those limitations Implemented algorithms and optimizations to make ICSummary computation real-time 40

Querying High-Dimensional Data Using Faceted Interface TPFacet : Two-Phased Faceted Browsing Skimmer: Skimming through Relational Query Result Thesis Roadmap: SFI View 41 Single Exploration Path Multiple Exploration Paths Result Set Presentation

Mary’s Predicament (contd.) Mary’s initial search parameters Year  2012 Transmission = Automatic Let’s assume that Mary prefers SUVs. She has narrowed her choices to: SUVs, Sedans, Convertibles, Hatchbacks and Coupes. She wants to compare SUVs with other four body styles. 42

Limitations in the Query Panel 43 Compare different body styles Find sub-categories within each body style Express selection in hidden facets

Cause of Limitations Missing integrated theme For example, which color is associated with which object? Even ICSummary can show only 2D interactions Need to show higher-dimensional interactions 44 Aquamarine Red Orange Door Doorway Wall

Solution Overview Create an extension of the query panel that can show users integrated information. 45

Two-Phased Faceted Browsing Add a new component: Summarized Facet Interaction (SFI) View Two browsing phases Query revision phase : Query panel + SFI View Result set phase : Query panel + Result panel The appropriate phase can be explicitly toggled by the user or intelligently chosen by the system 46

Components of an SFI View 47

Interaction Support Find similar top ranked IUnits Find the sub-categories of SUVs that are most similar to Sedans and Hatchbacks. Find similar facet values within the Pivot Facet Which body styles are most similar to SUVs? Which manufacturers are most similar to Ford? 48

Components of an SFI View 49

Addressing Limitations Using SFI View Compare different body styles Find sub-categories within each body style Express selection in hidden facets 50

Algorithms Creating SFI View Generate candidate IUnits Find informative facets Compute important facet interactions Find top- K IUnits Browsing SFI View Find similar IUnits Find similar facet values 51

Experimental Evaluation User study Mushroom dataset from UCI repository 8124 tuples with 23 features It is a simple to understand but unfamiliar dataset Compare TPFacet with faceted interface in Apache Solr Performance evaluation Yahoo-used car dataset with 40K tuples and 11 attributes 52

User Study Three exploratory search tasks that tests users understanding of database Build a simple binary classifier Find the most similar facet value pair Find an alternative search condition Conclusions from the user study 4-5 times gain in users efficiency and accuracy using SFI View Useful feedback in both SFI and existing query panel 53

User Study Results 54 Binary Classifier Similar Facet Value Alternative Search Condition

Performance Results SFI View for 40K tuples with 11 facets 4.5 secs without optimizations 0.5 secs with optimizations 55

Conclusion: SFI View Identified three limitations in the query panel due to missing integrated information Designed a two-phased faceted interface that provides facet-wise comparison of multiple exploration paths Evaluated the benefits of SFI View for three exploratory search tasks. 56

Querying High-Dimensional Data Using Faceted Interface TPFacet : Two-Phased Faceted Browsing Skimmer: Skimming through Relational Query Result Thesis Roadmap: Skimmer 57 Single Exploration Path Multiple Exploration Path Result Set Presentation

Mary’s Predicament (contd.) Information navigation has two components Searching Browsing Mary would like to look at tuples in the result panel to alter her search query But exploring large number of relational tuples is difficult 58

Relational Query Result Relational result set Boolean retrieval model Lacks visual aids Users use non-optimal solutions, such as ordering or clustering 59

Fast Browsing Using Scrolling Interface Scrolling is widely used interface Constraints in fast scrolling Technical constraints : Data distortion Bodily constraints : Visual perception, memory retention etc. Solution : Integrate scrolling with automatic zooming Easy transition between overview and detailed information. 60

Scrolling Large Text Documents Our eyes follows visual hints, such as Difference in font size Section headings, graphics etc. Known structural outline Variable speed scrolling Automatic zooming based on scrolling rate 61 [Igarashi, UIST’00]

Text Documents Vs. Relational Data Alphanumeric with few visual markers Lacks structural outline Ineffective ranking 62

Problem Definition Design a variable-speed scrolling system for relational data Use user’s scrolling behavior to determine how much and what information to display Skimmer : Allow users to skim through information at different browsing speeds. 63

Outline Motivation Scrolling Interface User Interface Goodness Metric Algorithms Experimental Evaluation 64

User Interface 65

User Interface Input : Ordered or clustered query result R Output : R requires S pages { P 1 , P 2 , …, P S } for display Display representatives: { D 1 , D 2 , …, D S } D i is computed based on: User’s current scrolling speed Contents of page P i User’s current browsing history Goal : To support fast browsing, we show users summarized , non-redundant and diverse information. 66

Goodness Metric: Information Loss Tuplewise information loss of a non-displayed tuple , t nd from P i where t d is most similar tuple from D i and H( sid ) Pagewise information loss score of page P i Cumulative information loss for result set R and scroll log SL 67

Outline Motivation Scrolling Interface Algorithms Two Naïve Algorithms Two K- Medoids Based Algorithms Relationship between K- Medoid and PIL Score Three K-Means Based Algorithms Experimental Evaluation 68

Algorithms Naïve sampling Random Uniform K- Medoids based sampling Local K- Medoid ( LKMed ) Historical K- Medoid ( HKMed ): Minimizes exact PIL score. K-Means based sampling (for better performance) Local K-Means ( LKMeans ) Historical K-Means ( HKMeans ) Two-phase K-Means ( TPKMeans ) 69

Outline Motivation Scrolling Interface Algorithms Experimental Evaluation Performance Information Quality User Study 70

Experimental Goals Computational performance Page size Number of dimensions Sampling rate Information quality User study 71

Performance 72 LKMed and HKMed need more time Not suitable for large page size or high sampling rate HKMed is faster than LKMed All algorithms satisfy interactive response constraint

Experimental Goals Computational Performance Information Quality Information Gain : Random Sampling as baseline B Page size Number of dimensions Sampling rate User Study 73

Information Quality 74 HKMed is best followed by TPKMeans and LKMed HKMeans is almost close to random sampling Information gain decreases with increasing # dimensions

Summary Recommendations 75 Sampling Rate Page Size Two Phase K-Means Two Phase K-Means Two Phase K-Means Historical K- Medoids

User Study 76 Interesting Patterns Regression Task Discriminating Features Almost similar or better quality of response for all three tasks Users are able to do the tasks 1.5 - 2 times faster

Conclusion: Skimmer Designed a scrolling-aware result panel Defined information loss metric Implemented five algorithms to minimize information loss Evaluated algorithms for computational efficiency 77

Conclusion: Overall Thesis Designed three add-on extensions to improve faceted navigation ICSummary : Summary of implicit selections SFI View : Top-k sub-categories within each facet value Skimmer : Scrolling aware result panel These extensions blend multiple navigation techniques 78 ICSummary Faceted Navigation OLAP Ranking Categorization SFI View Faceted Navigation Clustering Ranking Skimmer Faceted Navigation Scrolling Clustering Sampling

Future Work Improving the result panel in faceted interface Query biased snippet, ranking, personalization etc. Data exploration in text and graph data Workflow in healthcare and similar domains 79

Publications Manish Singh, Qiang Zhu, H. V. Jagadish : SWST: A Disk Based Index for Sliding Window Spatio -Temporal Data. ICDE 2012: 342-353 Manish Singh, Arnab Nandi, H. V. Jagadish : Skimmer: rapid scrolling of relational query results. SIGMOD Conference 2012: 181-192 Manish Singh, Michael Cafarella , H. V. Jagadish : TPFacet : Two-Phase Faceted Browsing (under preparation) Manish Singh, H. V. Jagadish : Querying High-Dimensional Data Using Faceted Interface (under preparation) 80

Thank You! 81

d any for profit sale of the slides (in any form) requires the consent of the copyright owners Reference-Presentation.pptx

About This Presentation

Slide Content

Tags

Categories

Download

Quick Actions

Statistics

Related Slideshows

d any for profit sale of the slides (in any form) requires the consent of the copyright owners Reference-Presentation.pptx

About This Presentation

Slide Content

Slide 1

Slide 2

Slide 3

Slide 4

Slide 5

Slide 6

Slide 7

Slide 8

Slide 9

Slide 10

Slide 11

Slide 12

Slide 13

Slide 14

Slide 15

Slide 16

Slide 17

Slide 18

Slide 19

Slide 20

Slide 21

Slide 22

Slide 23

Slide 24

Slide 25

Slide 26

Slide 27

Slide 28

Slide 29

Slide 30

Slide 31

Slide 32

Slide 33

Slide 34

Slide 35

Slide 36

Slide 37

Slide 38

Slide 39

Slide 40

Slide 41

Slide 42

Slide 43

Slide 44

Slide 45

Slide 46

Slide 47

Slide 48

Slide 49

Slide 50

Slide 51

Slide 52

Slide 53

Slide 54

Slide 55

Slide 56

Slide 57

Slide 58

Slide 59

Slide 60

Slide 61

Slide 62

Slide 63

Slide 64

Slide 65

Slide 66

Slide 67

Slide 68

Slide 69

Slide 70

Slide 71

Slide 72

Slide 73

Slide 74

Slide 75

Slide 76

Slide 77