Topic: Mining Compressed or Approximate Patterns
Nabea Yaseen 2020-uam-1848
Compressed Patterns: Finding Big Lego Shapes
Instead of looking at each small Lego piece, we want to find big shapes made from many pieces. These big shapes help us understand what is in the box without checking every single piece. In pattern mining, the small pieces are the many frequent patterns found in the data, and the big shapes are a small set of patterns that summarize them.
Example: You see lots of red cars and blue houses in the box. Instead of counting each car or house, you just note "lots of red cars" and "lots of blue houses". This tells you quickly what is in the box.
Approximate Patterns: Finding Similar Lego Shapes
Sometimes we don't need exactly the same Lego shape; shapes that look almost the same are good enough.
Example: If you find lots of red trucks that look similar to the red cars, you can say there are lots of "red vehicles". Trucks and cars are not exactly the same, but they are close enough.
Why do we do this?
Faster and Easier: It's faster to look at big shapes and similar shapes than to count every single Lego piece. This helps when you have a big box of Legos and need to know quickly what's inside.
Understanding Better: Knowing that there are many red vehicles gives you a good idea of what you have, even if you don't know the exact number of cars or trucks.
Where do we use this?
Shopping: Stores look at which items people often buy together, like milk and cookies. This helps them decide where to place items in the store.
Health: Doctors look at patterns in patients' health records to find which symptoms often occur together, which helps them diagnose diseases faster.
Detecting Weird Stuff: If something strange keeps happening, like finding broken Lego pieces often in the same spot, the pattern helps us find and fix the problem quickly.
Compressed Patterns by Pattern Clustering
Finding Groups of Similar Lego Shapes (Pattern Clustering): This means putting similar Lego shapes into groups, so that each group holds shapes that look alike.
Example: You find lots of small red cars, big red trucks, and blue airplanes. You put all the red cars in one group, all the red trucks in another, and all the blue airplanes in a third.
Creating a Summary (Compressed Patterns): Instead of looking at every single Lego piece, you look at the groups you made, and each group is described by one representative shape. These groups tell you what is in your Lego box without checking each piece.
Example: Now you know you have three main groups: "red cars", "red trucks", and "blue airplanes". This is much faster than counting every car, truck, and airplane separately.
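The grouping step can be sketched in Python. This is a minimal sketch under stated assumptions, not the exact method from the original material: each closed pattern is a frozenset of items mapped to the IDs of the transactions that contain it; a representative covers another pattern when its items include that pattern's items and the two are within distance `delta` under the pattern distance Pat_Dist defined later in this section; and representatives are chosen greedily, each time taking the pattern that covers the most still-uncovered patterns. The names `greedy_representatives` and `covers`, the value of `delta`, and the toy data are hypothetical.

```python
# Toy data (hypothetical): closed pattern -> IDs of transactions containing it
patterns = {
    frozenset("ab"): {1, 2, 3, 4, 5},
    frozenset("abc"): {1, 2, 3, 4},
    frozenset("abcd"): {1, 2, 3},
    frozenset("xy"): {6, 7, 8},
}


def pat_dist(t1, t2):
    """Pattern distance: 1 - |T1 & T2| / |T1 | T2| (assumes non-empty sets)."""
    return 1 - len(t1 & t2) / len(t1 | t2)


def covers(rep, p, tids, delta):
    """True if rep's items include p's items and their distance is at most delta."""
    return p <= rep and pat_dist(tids[p], tids[rep]) <= delta


def greedy_representatives(tids, delta):
    """Greedily pick representatives until every pattern is covered by one."""
    # A fixed candidate order makes tie-breaking deterministic.
    candidates = sorted(tids, key=lambda p: tuple(sorted(p)))
    uncovered = set(tids)
    clusters = []
    while uncovered:
        # Every pattern covers itself, so the best candidate covers at least one.
        best = max(candidates,
                   key=lambda r: sum(covers(r, p, tids, delta) for p in uncovered))
        group = {p for p in uncovered if covers(best, p, tids, delta)}
        clusters.append((best, group))
        uncovered -= group
    return clusters


for rep, group in greedy_representatives(patterns, delta=0.5):
    print(sorted(rep), "covers", sorted(sorted(p) for p in group))
```

With `delta = 0.5`, {a, b, c, d} covers {a, b, c} (distance 0.25) and {a, b} (distance 0.4), so four patterns shrink to two representatives. Finding the smallest such set is a set-cover-style problem, which is why a greedy choice is used here.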
Why Do We Do This?
Saves Time: It's quicker to look at groups of Legos than to check each piece. This helps when you have a big box of Legos and need to know quickly what's inside.
Simplifies Understanding: Knowing you have "red cars", "red trucks", and "blue airplanes" gives you a good idea of your Lego collection without needing exact numbers.
How Is This Used in the Real World?
Shopping: Stores look at which items people buy together, like bread and butter, and group these items to understand shopping patterns.
Health: Doctors group symptoms that often appear together, which helps them diagnose diseases faster.
Detecting Issues: If something strange happens, like many broken Lego pieces turning up in one group, it helps find and fix problems quickly.
Closed Frequent Itemset
Imagine You Have a Grocery List: You look at the grocery lists of many people and find that certain items are frequently bought together, like bread and butter.
Closed Frequent Itemset: A frequent itemset is closed if no larger itemset containing it has the same support (the number of times it is bought).
Example: If bread and butter are bought together 100 times, and no larger group (like bread, butter, and jam) is also bought together 100 times, then "bread and butter" is a closed frequent itemset.
Maximal Frequent Itemset
Imagine You Have the Same Grocery Lists: Again, you look at which items people often buy together.
Maximal Frequent Itemset: A frequent itemset is maximal if no larger itemset containing it is frequent.
Example: If "bread, butter, and jam" are bought together 80 times, and adding any other item to this group makes its support fall below the minimum support threshold (so the larger group is no longer frequent), then "bread, butter, and jam" is a maximal frequent itemset.
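To make the two definitions concrete, here is a small brute-force sketch in Python. The six baskets and `min_sup = 3` are invented for illustration (they are not the 100/80 counts from the example above), and the function names are hypothetical. Enumerating every subset is exponential in the number of items, so this only suits toy data; real miners such as Apriori or FP-growth prune the search.

```python
from itertools import combinations

# Hypothetical grocery transactions (each set is one customer's basket).
transactions = [
    {"bread", "butter", "jam"},
    {"bread", "butter", "jam"},
    {"bread", "butter", "jam"},
    {"bread", "butter"},
    {"bread", "butter", "milk"},
    {"milk"},
]


def frequent_itemsets(transactions, min_sup):
    """Brute force: count every itemset, keep those with support >= min_sup."""
    items = sorted(set().union(*transactions))
    freq = {}
    for k in range(1, len(items) + 1):
        for combo in combinations(items, k):
            itemset = frozenset(combo)
            support = sum(1 for t in transactions if itemset <= t)
            if support >= min_sup:
                freq[itemset] = support
    return freq


def closed_and_maximal(freq):
    """Split frequent itemsets into closed ones and maximal ones."""
    closed, maximal = [], []
    for itemset, support in freq.items():
        # Proper supersets that are also frequent.
        supersets = [other for other in freq if itemset < other]
        # Closed: no superset has the same support (a superset with equal
        # support would itself be frequent, so checking freq is enough).
        if all(freq[other] < support for other in supersets):
            closed.append(itemset)
        # Maximal: no superset is frequent at all.
        if not supersets:
            maximal.append(itemset)
    return closed, maximal


freq = frequent_itemsets(transactions, min_sup=3)
closed, maximal = closed_and_maximal(freq)
for itemset in closed:
    print("closed: ", sorted(itemset), freq[itemset])
for itemset in maximal:
    print("maximal:", sorted(itemset), freq[itemset])
```

With these baskets, {bread, butter} (support 5) is closed but not maximal, because its superset {bread, butter, jam} (support 3) is still frequent, while {bread, butter, jam} is both closed and maximal.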
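Key Differences
Closed Frequent Itemset: Focuses on ensuring that no larger group has the same frequency.
Example: If "bread and butter" appear together 100 times and no larger group (such as one that adds "jam") appears 100 times, it is closed.
Maximal Frequent Itemset: Focuses on being the largest group that is still frequent: no larger group reaches the minimum support.
Example: If "bread, butter, and jam" is bought 80 times and adding any other item (like "milk") drops the support below the minimum support threshold, it is maximal.
Every maximal frequent itemset is also closed, but a closed itemset need not be maximal: assuming a minimum support of at most 80, "bread and butter" is closed but not maximal, because "bread, butter, and jam" is still frequent.
Example itemsets with their support counts:
| ID | Itemset         | Support |
|----|-----------------|---------|
| 1  | {milk, bread}   | 100     |
| 2  | {milk, eggs}    | 80      |
| 3  | {bread, eggs}   | 75      |
| 4  | {milk, cheese}  | 60      |
| 5  | {bread, cheese} | 50      |
| 6  | {eggs, cheese}  | 40      |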
We can use the following distance measure between closed patterns. Let P1 and P2 be two closed patterns, with supporting transaction sets T(P1) and T(P2), respectively. The pattern distance of P1 and P2, Pat_Dist(P1, P2), is defined as
$$\mathrm{Pat\_Dist}(P_1, P_2) = 1 - \frac{|T(P_1) \cap T(P_2)|}{|T(P_1) \cup T(P_2)|}$$
Example 7.13 Pattern distance. Suppose P1 and P2 are two patterns such that T(P1) = {t1, t2, t3, t4, t5} and T(P2) = {t1, t2, t3, t4, t6}, where t_i is a transaction in the database. The two sets share four transactions (t1 to t4) out of six distinct ones, so the distance between P1 and P2 is
$$\mathrm{Pat\_Dist}(P_1, P_2) = 1 - \frac{4}{6} = \frac{1}{3}$$
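The pattern distance is the Jaccard distance between the two supporting transaction sets. A minimal Python sketch reproduces Example 7.13; `pat_dist` is a hypothetical helper name, and `Fraction` keeps the result exact instead of a floating-point approximation.

```python
from fractions import Fraction


def pat_dist(t1, t2):
    """Pat_Dist(P1, P2) = 1 - |T(P1) & T(P2)| / |T(P1) | T(P2)|, as an exact fraction."""
    if not (t1 or t2):
        raise ValueError("at least one supporting transaction set must be non-empty")
    return 1 - Fraction(len(t1 & t2), len(t1 | t2))


# Supporting transaction sets from Example 7.13
T_P1 = {"t1", "t2", "t3", "t4", "t5"}
T_P2 = {"t1", "t2", "t3", "t4", "t6"}

print(pat_dist(T_P1, T_P2))  # 1/3
```

Identical transaction sets give distance 0 and disjoint ones give 1, so the measure always lies between 0 and 1.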
“Extracting Redundancy-Aware Top-k Patterns”
Imagine you have a big box of toys and want to find the most interesting ones, while making sure the ones you pick are different from each other and not too similar. In pattern mining, the toys are frequent patterns, a toy's importance is the pattern's significance (for example, its support), and two toys being similar means two patterns are redundant.
Identify Frequent Toys: First, you look at all the toys in the box and see which ones appear a lot. These are like the popular toys that many kids like.
Decide How Important They Are: Then you rate how important each toy is by how many times it appears; toys that appear more often are more important.
Check for Similar Toys: Next, you check whether any of the toys are very similar. For example, if you have two cars that look almost the same, you keep only one, so you don't end up with many near-copies.
Pick the Best Ones: Finally, you pick the top toys by importance while making sure they are all different and interesting.
Make Sure They Are Unique: Each toy you pick should be different from the others, so the final selection contains no group of very similar toys.
In the end, you have a selection of the best and most distinct toys from your box: a good variety, without near-duplicates.
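The toy-box steps map onto a simple greedy filter, sketched below in Python. It is a simplification under stated assumptions, not the exact formulation from the literature (which combines a significance score with a redundancy measure rather than using a fixed cutoff): importance is taken to be support, redundancy is the Jaccard similarity of two patterns' supporting transaction sets (that is, 1 - Pat_Dist), and `max_sim` is a hypothetical threshold above which a pattern counts as a near-copy. The data and function names are illustrative.

```python
# Toy data (hypothetical): pattern -> IDs of transactions that contain it
patterns = {
    frozenset("ab"): {1, 2, 3, 4, 5, 6},
    frozenset("abc"): {1, 2, 3, 4, 5},
    frozenset("xy"): {7, 8, 9, 10},
    frozenset("xyz"): {7, 8, 9},
    frozenset("q"): {2, 8},
}


def similarity(t1, t2):
    """Overlap of two supporting transaction sets (Jaccard similarity)."""
    return len(t1 & t2) / len(t1 | t2)


def redundancy_aware_top_k(tids, k, max_sim):
    """Pick up to k patterns by support, skipping ones too similar to a pick."""
    # Importance = support; ties broken by item order for determinism.
    ranked = sorted(tids, key=lambda p: (-len(tids[p]), tuple(sorted(p))))
    chosen = []
    for p in ranked:
        if len(chosen) == k:
            break
        # Keep p only if it is not a near-copy of anything already chosen.
        if all(similarity(tids[p], tids[q]) <= max_sim for q in chosen):
            chosen.append(p)
    return chosen


naive = sorted(patterns, key=lambda p: -len(patterns[p]))[:3]
diverse = redundancy_aware_top_k(patterns, k=3, max_sim=0.5)
print("plain top-3:     ", [sorted(p) for p in naive])
print("redundancy-aware:", [sorted(p) for p in diverse])
```

With `max_sim = 0.5`, {a, b, c} is skipped because it shares 5 of its 6 combined transactions with {a, b}, and {x, y, z} is skipped for overlapping {x, y}; the plain top-3 by support would have kept the near-duplicate {a, b, c}.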