Data Reduction Strategies

854 views 14 slides Jul 30, 2021

About This Presentation

Application of Data Reduction Strategies

Size: 97.56 KB

Language: en

Added: Jul 30, 2021

Slides: 14 pages

Slide Content

DATA REDUCTION STRATEGIES: DATA CUBE AGGREGATION AND ATTRIBUTE SUBSET SELECTION

Why data reduction? A huge amount of data is created every day, which has driven the development of big data platforms. Older algorithms perform poorly on data of this size, and most data mining algorithms are implemented column-wise. Together these factors have pushed the adoption of data reduction procedures.

What is data reduction? Data reduction is a process that reduces the volume of the original data and represents it in a much smaller form, while maintaining the integrity of the data. The time spent on data reduction should not outweigh the time saved by mining the reduced data set. Mining the reduced data should give the same, or almost the same, results as mining the original data, so data reduction increases the efficiency of data mining.

Data reduction strategies: data cube aggregation, attribute subset selection, dimensionality reduction, numerosity reduction, and discretization and concept hierarchy generation.

Data Cube Aggregation This technique aggregates (combines) data into a simpler, summarized form, so that the summary itself can be used as the result of the analysis.

Data Cube Aggregation The example data gives the gross profit, in dollars, earned from selling laptops in each state, with one table per country. Summing the state figures gives the gross profit for each country.

Country: USA
States        Gross Profit ($)
Arizona       500
Texas         320
Illinois      430

Country: India
States        Gross Profit ($)
Kerala        245
Tamil Nadu    380
Goa           950

Country: Canada
States        Gross Profit ($)
Alberta       420
Manitoba      200
Ontario       300

Aggregated by country
Country       Gross Profit ($)
USA           1250
India         1575
Canada        920
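The aggregation in these tables can be reproduced with a short script. This is only a minimal sketch: the variable names (state_profit, country_profit) are made up for illustration, and the figures are copied from the slide.

```python
from collections import defaultdict

# (country, state, gross profit in $) rows copied from the slide's tables.
state_profit = [
    ("USA", "Arizona", 500),
    ("USA", "Texas", 320),
    ("USA", "Illinois", 430),
    ("India", "Kerala", 245),
    ("India", "Tamil Nadu", 380),
    ("India", "Goa", 950),
    ("Canada", "Alberta", 420),
    ("Canada", "Manitoba", 200),
    ("Canada", "Ontario", 300),
]

# Roll the state-level figures up to one total per country.
totals = defaultdict(int)
for country, _state, profit in state_profit:
    totals[country] += profit

country_profit = dict(totals)
print(country_profit)
# {'USA': 1250, 'India': 1575, 'Canada': 920}
```

Nine state-level rows shrink to three country-level rows, and the state detail is no longer available afterwards. That is the trade-off data cube aggregation makes.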

Attribute Subset Selection From a large number of attributes, a minimal attribute set is obtained by eliminating the irrelevant attributes, whose removal does not much affect the data. Patterns mined from the reduced data are also easier to understand.

Methods of attribute subset selection:
Stepwise forward selection: starts with an empty set and adds the most relevant attribute at each step, ignoring the rest.
Stepwise backward elimination: starts with the full set and removes the most irrelevant attribute at each step, keeping the rest.
Combined forward selection and backward elimination: at each step, selects the best attribute and removes the worst of the remaining attributes.
Decision tree induction: builds a flowchart-like structure in which each node chooses the best attribute to partition the data.

Example A data set is given from which we need to count the male, female and transgender individuals who are eligible to vote. Initial attribute set = {Name, Age, Gender, Address, Phone}

Forward Selection
Initial attribute set = {Name, Age, Gender, Address, Phone}
Initial reduced set => { }
=> {Age}
=> {Age, Gender}
Reduced attribute set => {Age, Gender}
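The forward selection trace can be sketched as a greedy loop. The scoring function here is an assumption made only for illustration: toy_score and NEEDED are hypothetical names that hard-code which attributes the voting task needs, whereas a real application would typically score each candidate subset with a statistical test or a model's accuracy.

```python
# Hypothetical: the attributes the voting-eligibility count actually needs.
NEEDED = {"Age", "Gender"}


def toy_score(subset):
    """Toy relevance score: +1 per needed attribute, -0.1 per unneeded one."""
    chosen = set(subset)
    return len(chosen & NEEDED) - 0.1 * len(chosen - NEEDED)


def forward_selection(attributes, score):
    """Start from an empty set; repeatedly add the attribute that helps most."""
    selected = []
    best = score(selected)
    while True:
        candidates = [a for a in attributes if a not in selected]
        if not candidates:
            break
        # Candidate whose addition yields the highest score.
        top = max(candidates, key=lambda a: score(selected + [a]))
        top_score = score(selected + [top])
        if top_score <= best:
            break  # no remaining attribute improves the score
        selected.append(top)
        best = top_score
    return selected


attributes = ["Name", "Age", "Gender", "Address", "Phone"]
forward_result = forward_selection(attributes, toy_score)
print(forward_result)
# ['Age', 'Gender']
```

With this toy score the loop adds Age, then Gender, and stops because every remaining attribute would lower the score.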

Backward Elimination
Initial attribute set => {Name, Age, Gender, Address, Phone}
Initial reduced set => {Name, Age, Gender, Address, Phone}
=> {Age, Gender, Address, Phone}
=> {Age, Gender, Phone}
=> {Age, Gender}
Reduced attribute set => {Age, Gender}
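Backward elimination can be sketched the same way, starting from the full set and dropping one attribute per step. As before, toy_score and NEEDED are hypothetical stand-ins for a real relevance measure and are assumptions made for illustration only.

```python
# Hypothetical: the attributes the voting-eligibility count actually needs.
NEEDED = {"Age", "Gender"}


def toy_score(subset):
    """Toy relevance score: +1 per needed attribute, -0.1 per unneeded one."""
    chosen = set(subset)
    return len(chosen & NEEDED) - 0.1 * len(chosen - NEEDED)


def backward_elimination(attributes, score):
    """Start from the full set; repeatedly drop the least useful attribute."""
    selected = list(attributes)
    best = score(selected)
    steps = [list(selected)]  # record each intermediate set
    while selected:
        # Attribute whose removal yields the highest score.
        worst = max(
            selected,
            key=lambda a: score([x for x in selected if x != a]),
        )
        new_score = score([x for x in selected if x != worst])
        if new_score <= best:
            break  # removing anything else would not improve the score
        selected.remove(worst)
        best = new_score
        steps.append(list(selected))
    return selected, steps


attributes = ["Name", "Age", "Gender", "Address", "Phone"]
backward_result, backward_steps = backward_elimination(attributes, toy_score)
for step in backward_steps:
    print(step)
```

Under this toy score the loop removes Name, then Address, then Phone, and stops at Age and Gender, which is the same reduced set that forward selection reaches.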

Decision Tree Induction
Initial attribute set = {Name, Age, Gender, Address, Phone}
[Figure: decision tree with branches labelled >= 18 and < 18]
Reduced attribute set = {Age, Gender}
