d any for profit sale of the slides (in any form) requires the consent of the copyright owners Reference-Presentation.pptx
bhaveshagrawal35
0 views
81 slides
Oct 10, 2025
Slide 1 of 81
1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
37
38
39
40
41
42
43
44
45
46
47
48
49
50
51
52
53
54
55
56
57
58
59
60
61
62
63
64
65
66
67
68
69
70
71
72
73
74
75
76
77
78
79
80
81
About This Presentation
The slides below are copyright Silberschatz, Galvin and Gagne, 2008. The slides are authorized for personal use, and for use in conjunction with a course for which Operating System Concepts is the prescribed text. Instructors are free to modify the slides to their taste, as long as the modified slid...
The slides below are copyright Silberschatz, Galvin and Gagne, 2008. The slides are authorized for personal use, and for use in conjunction with a course for which Operating System Concepts is the prescribed text. Instructors are free to modify the slides to their taste, as long as the modified slides acknowledge the source and the fact that they have been modified. Paper copies of the slides may be sold strictly at the price of reproduction, to students of courses where the book is the prescribed text. Any use that differs from the above, and any for profit sale of the slides (in any form) requires the consent of the copyright owners; contact Avi Silberschatz ([email protected]) to obtain the copyright owners consent.
line separator
Size: 6.41 MB
Language: en
Added: Oct 10, 2025
Slides: 81 pages
Slide Content
Hateful Content Detection Mohd Zia Ur Rehman Advisor: Dr. Nagendra Kumar 13 -04-2022 IIT Indore COMPUTER SCIENCE & ENGINEERING
Data Access in E-Commerce Users struggle to access e-commerce databases due to Lack of technical expertise Limited familiarity with database content User-friendly data access techniques Keyword search Forms, faceted search 2
Faceted Search 3
Background Query and result panel Facet Queriable (shown) Non- queriable (hidden) Facet value Categorical Numerical Summary digest Exploration path 4
Background Faceted search can be used for both documents and structured data Assume a relational database D with n numerical or categorical attributes Queriable facets F ={ f 1 , f 2 , …, f m } Hidden facets H ={ h 1 , h 2 , …, h (n-m) } 5
Mary’s Predicament Consider a used-car database of an auto-dealer Has many attributes Many values for each attribute Attributes have complex dependencies Mary: A potential used-car buyer unfamiliar with car domain Mary’s initial choice Year 2012 Transmission = Automatic 6
Mary’s Predicament (contd.) Faceted navigation helps Mary to Form queries Get organized result But she cannot Gain good familiarity with database content Efficiently browse the result panel 7
Limitations of Faceted Navigation Limited feedback Query panel Low adaptivity Result panel Hidden facets Query & result panel 8
Limitation 1: Limited Feedback Faceted interface Self- explorable design Exponential choices But… Users feel lost Too much freedom without any feedback 9
Limitation 2: Low Adaptivity Result panel not adaptive to users’ need Variable-speed browsing not possible 10 Variable-speed Browsing in Google Maps
Limitation 3: Hidden Facet Only few queriable facets Limited screen-estate User fatigue Limitations of non- queriable facets No way to express selections No summary digest 11
Challenges in Overcoming Limitations Need to retain fundamental structure of faceted interface Difficult to quantify interface improvement Search interfaces must meet real-time performance 12
1 2 3 4 5 6 Add-on Extensions at Amazon.com 13
Research Overview 14 ICSummary SFI View Skimmer
Querying High-Dimensional Data Using Faceted Interface TPFacet : Two-Phased Faceted Browsing Skimmer: Skimming through Relational Query Result Thesis Roadmap 15 Single Exploration Path Multiple Exploration Paths Result Set Presentation
Querying High-Dimensional Data Using Faceted Interface TPFacet : Two-Phased Faceted Browsing Skimmer: Skimming through Relational Query Result Thesis Roadmap: ICSummary 16 Single Exploration Path Multiple Exploration Paths Result Set Presentation
Mary’s Predicament (contd.) Mary’s initial search parameters Year 2012 Transmission = Automatic How should Mary choose the right body style(s) 12 different body styles Each body style has complex dependencies with other attributes 17
Struggle in Choosing Body Style Choosing SUV Pros : Cargo capacity, Ground clearance, Engine power Cons : Price, Fuel economy Choosing Hatchback Pros : Fuel economy, Price Cons : Cargo capacity, Engine power Independent facets Mileage (number of miles), Color 18
Cause of Mary’s Predicament Mary’s unknown preference function Spans across multiple attributes Changes on seeing the comparison between available choices Hard to find optimal choices Optimal choices are contradictory One attribute’s optimal choice is non-optimal for others 19
Browsing Individual Tuples Suitable for Small result set Detailed and pairwise comparison between tuples 20
Browsing Aggregated Summaries Suitable for Large result set Comparing a given tuple relative to the whole dataset Quickly determine relevance of search result But… Summarizing complex datasets is challenging Simple data summaries, such as min, max, mean, etc., are not effective Single summary cannot satisfy all the needs of all users 21
Data Summary Using Parallel Coordinates 22 Standard parallel coordinates Can show interaction between attributes Cannot scale to large dataset Similar to faceted navigation summary digest Cannot show interaction between attributes Can scale to large dataset
Limitations of Query Panel Information in non- queriable facets not displayed in the query panel Need considerable back-and-forth browsing across the query panel Count information not shown in multi-selection faceted interface 23
Problem Overview Problem : Query panel of faceted interface cannot show implicit selections due to explicit selections Solution : A novel data summary called Implicit Choices Summary ( ICSummary ) Integrated with faceted navigation Summarizes the interaction of a given facet value with facet values in top- L related facets 24
What is ICSummary ? Given a facet f , for values V and v selected in f , consider result sets R V and R v such that v V R v R V 25 R V R v ICSummary for facet value v is a statistical comparison of R V and R v
Mary’s selection: Year ≥ 2012 v = SUV V = {Sedan, SUV, Coupe, Hatchback,…} Font colors: indicative of likelihood Green: increased Blue: unaltered Red: decreased 26
Observed vs. Expected Count Given v = SUV, V = {Sedan, SUV} AWD observed count = 500 AWD expected count = 2200/7400 *700 = 208 Increased likelihood 27
28 Mary’s selection: Year ≥ 2012 v = Sedan V = {Sedan, SUV, Coupe, Hatchback,…} Font colors: indicative of likelihood Green: increased Blue: unaltered Red: decreased
ICSummary Design Comparison between tuple count and ICSummary Both provide preview about the next result Tuple count is preview of next result set size ICSummary is a preview of change in facet value distribution in other facets Structure of ICSummary Ordered list of dependent facets Change in facet value distribution of each dependent facet 29
User Interaction Two modes of user-interaction Enable ICSummaries Disable ICSummaries ICSummaries cannot be shown as tuple counts Show ICSummary as pop-up window 30
Problem Definition Problem 1 : Creating ICSummary Find dependent facets Summarize dependent facets Problem 2 : Selecting top- K queriable facets Must fit within a given query panel space Minimize information loss for non- queriable facets 31
Problem 1a: Find Dependent Facets Two contradictory goals in creating ICSummaries Maximize information quality Maximize comparability Solution : Show same set of dependent facets for all facet values currently selected within a facet Algorithms used chi-square statistical test for independence p-scores 32
Problem 1b: Summarize Dependent Facets Solution : Use a categorization function C to uniformly summarize the distribution change in each dependent facet C has two dimensions Frequency based on observed count Likelihood based on observed and expected count difference E.g. obs. count > exp. count implies increased likelihood Chi-square test to determine significance of count difference 33
Problem 2: Top-K Queriable Facets Solution Assume S pre-selected facets Select K-|S| more facets such that they Have less than C values Are central nodes in the graph Example Year → Mileage BodyType → Model Make → Model 34 Facet Dependency Using Bayesian Network
Efficient Implementation of ICSummary Faceted search engine (FSE) Built on top of inverted index Fast response through efficient bitset operations Constant retrieval time for any query (assuming few retrieved tuples ) Limitations of FSEs in creating ICSummaries Not suitable for retrieving large number of tuples Summary digest has commingled count information 35
Extended Summary Digest (ESD) ESD can improve system performance by two orders of magnitude Efficient computation of contingency table required for computing dependent facets Dependent facets summarized without additional queries ESD stores the summary digest of all next possible result sets in form of hash tables Normal ESD: O( n 2 R ) Optimized ESD: O( lmR ) , where l,m << n 36
Evaluation Performance Evaluation Static computations Once per query result Dynamic computations Computing ICSummaries as requested by the user Datasets 37 Dataset # Tuples # Dimensions Yahoo Used Car 40,000 11 Census 2,458,285 64
Static Computations Retrieving tuples takes maximum time Optimization: Retrieve at most 5K-10K tuples Computing ESD takes maximum time Optimized ESD is 20 times faster compared to ESD 38 Low Dimensional Dataset High Dimensional Dataset
Dynamic Computations Amortized cost analysis Assumption : Caching and lazy evaluation For N ICSummaries from M queriable facets the computation cost is: k 1 M + k 2 N 39 Algorithms Find Dependent Facets (in ms) Summarize Dependent Facets (in ms) Pre-computed Dependent Facets 130 Baseline 230 130 ESD Based 2 3 Time to Compute a Single ICSummary
Conclusion: ICSummary Identified three limitations in the query panel for complex datasets Created an add-on extension, ICSummary , addressing those limitations Implemented algorithms and optimizations to make ICSummary computation real-time 40
Querying High-Dimensional Data Using Faceted Interface TPFacet : Two-Phased Faceted Browsing Skimmer: Skimming through Relational Query Result Thesis Roadmap: SFI View 41 Single Exploration Path Multiple Exploration Paths Result Set Presentation
Mary’s Predicament (contd.) Mary’s initial search parameters Year 2012 Transmission = Automatic Let’s assume that Mary prefers SUVs. She has narrowed her choices to: SUVs, Sedans, Convertibles, Hatchbacks and Coupes. She wants to compare SUVs with other four body styles. 42
Limitations in the Query Panel 43 Compare different body styles Find sub-categories within each body style Express selection in hidden facets
Cause of Limitations Missing integrated theme For example, which color is associated with which object? Even ICSummary can show only 2D interactions Need to show higher-dimensional interactions 44 Aquamarine Red Orange Door Doorway Wall
Solution Overview Create an extension of the query panel that can show users integrated information. 45
Two-Phased Faceted Browsing Add a new component: Summarized Facet Interaction (SFI) View Two browsing phases Query revision phase : Query panel + SFI View Result set phase : Query panel + Result panel The appropriate phase can be explicitly toggled by the user or intelligently chosen by the system 46
Components of an SFI View 47
Interaction Support Find similar top ranked IUnits Find the sub-categories of SUVs that are most similar to Sedans and Hatchbacks. Find similar facet values within the Pivot Facet Which body styles are most similar to SUVs? Which manufacturers are most similar to Ford? 48
Components of an SFI View 49
Addressing Limitations Using SFI View Compare different body styles Find sub-categories within each body style Express selection in hidden facets 50
Algorithms Creating SFI View Generate candidate IUnits Find informative facets Compute important facet interactions Find top- K IUnits Browsing SFI View Find similar IUnits Find similar facet values 51
Experimental Evaluation User study Mushroom dataset from UCI repository 8124 tuples with 23 features It is a simple to understand but unfamiliar dataset Compare TPFacet with faceted interface in Apache Solr Performance evaluation Yahoo-used car dataset with 40K tuples and 11 attributes 52
User Study Three exploratory search tasks that tests users understanding of database Build a simple binary classifier Find the most similar facet value pair Find an alternative search condition Conclusions from the user study 4-5 times gain in users efficiency and accuracy using SFI View Useful feedback in both SFI and existing query panel 53
User Study Results 54 Binary Classifier Similar Facet Value Alternative Search Condition
Performance Results SFI View for 40K tuples with 11 facets 4.5 secs without optimizations 0.5 secs with optimizations 55
Conclusion: SFI View Identified three limitations in the query panel due to missing integrated information Designed a two-phased faceted interface that provides facet-wise comparison of multiple exploration paths Evaluated the benefits of SFI View for three exploratory search tasks. 56
Querying High-Dimensional Data Using Faceted Interface TPFacet : Two-Phased Faceted Browsing Skimmer: Skimming through Relational Query Result Thesis Roadmap: Skimmer 57 Single Exploration Path Multiple Exploration Path Result Set Presentation
Mary’s Predicament (contd.) Information navigation has two components Searching Browsing Mary would like to look at tuples in the result panel to alter her search query But exploring large number of relational tuples is difficult 58
Relational Query Result Relational result set Boolean retrieval model Lacks visual aids Users use non-optimal solutions, such as ordering or clustering 59
Fast Browsing Using Scrolling Interface Scrolling is widely used interface Constraints in fast scrolling Technical constraints : Data distortion Bodily constraints : Visual perception, memory retention etc. Solution : Integrate scrolling with automatic zooming Easy transition between overview and detailed information. 60
Scrolling Large Text Documents Our eyes follows visual hints, such as Difference in font size Section headings, graphics etc. Known structural outline Variable speed scrolling Automatic zooming based on scrolling rate 61 [Igarashi, UIST’00]
Text Documents Vs. Relational Data Alphanumeric with few visual markers Lacks structural outline Ineffective ranking 62
Problem Definition Design a variable-speed scrolling system for relational data Use user’s scrolling behavior to determine how much and what information to display Skimmer : Allow users to skim through information at different browsing speeds. 63
User Interface Input : Ordered or clustered query result R Output : R requires S pages { P 1 , P 2 , …, P S } for display Display representatives: { D 1 , D 2 , …, D S } D i is computed based on: User’s current scrolling speed Contents of page P i User’s current browsing history Goal : To support fast browsing, we show users summarized , non-redundant and diverse information. 66
Goodness Metric: Information Loss Tuplewise information loss of a non-displayed tuple , t nd from P i where t d is most similar tuple from D i and H( sid ) Pagewise information loss score of page P i Cumulative information loss for result set R and scroll log SL 67
Outline Motivation Scrolling Interface Algorithms Two Naïve Algorithms Two K- Medoids Based Algorithms Relationship between K- Medoid and PIL Score Three K-Means Based Algorithms Experimental Evaluation 68
Algorithms Naïve sampling Random Uniform K- Medoids based sampling Local K- Medoid ( LKMed ) Historical K- Medoid ( HKMed ): Minimizes exact PIL score. K-Means based sampling (for better performance) Local K-Means ( LKMeans ) Historical K-Means ( HKMeans ) Two-phase K-Means ( TPKMeans ) 69
Outline Motivation Scrolling Interface Algorithms Experimental Evaluation Performance Information Quality User Study 70
Experimental Goals Computational performance Page size Number of dimensions Sampling rate Information quality User study 71
Performance 72 LKMed and HKMed need more time Not suitable for large page size or high sampling rate HKMed is faster than LKMed All algorithms satisfy interactive response constraint
Experimental Goals Computational Performance Information Quality Information Gain : Random Sampling as baseline B Page size Number of dimensions Sampling rate User Study 73
Information Quality 74 HKMed is best followed by TPKMeans and LKMed HKMeans is almost close to random sampling Information gain decreases with increasing # dimensions
Summary Recommendations 75 Sampling Rate Page Size Two Phase K-Means Two Phase K-Means Two Phase K-Means Historical K- Medoids
User Study 76 Interesting Patterns Regression Task Discriminating Features Almost similar or better quality of response for all three tasks Users are able to do the tasks 1.5 - 2 times faster
Conclusion: Skimmer Designed a scrolling-aware result panel Defined information loss metric Implemented five algorithms to minimize information loss Evaluated algorithms for computational efficiency 77
Conclusion: Overall Thesis Designed three add-on extensions to improve faceted navigation ICSummary : Summary of implicit selections SFI View : Top-k sub-categories within each facet value Skimmer : Scrolling aware result panel These extensions blend multiple navigation techniques 78 ICSummary Faceted Navigation OLAP Ranking Categorization SFI View Faceted Navigation Clustering Ranking Skimmer Faceted Navigation Scrolling Clustering Sampling
Future Work Improving the result panel in faceted interface Query biased snippet, ranking, personalization etc. Data exploration in text and graph data Workflow in healthcare and similar domains 79
Publications Manish Singh, Qiang Zhu, H. V. Jagadish : SWST: A Disk Based Index for Sliding Window Spatio -Temporal Data. ICDE 2012: 342-353 Manish Singh, Arnab Nandi, H. V. Jagadish : Skimmer: rapid scrolling of relational query results. SIGMOD Conference 2012: 181-192 Manish Singh, Michael Cafarella , H. V. Jagadish : TPFacet : Two-Phase Faceted Browsing (under preparation) Manish Singh, H. V. Jagadish : Querying High-Dimensional Data Using Faceted Interface (under preparation) 80