Lecture 3 Data Mining.pptx power points for graduates

Josephmwanika 73 views 20 slides May 31, 2024

About This Presentation

Graduate students interested in health informatics


Slide Content

ECU-M 213: HEALTH INFORMATICS
By: Patience A. Jaffu, BSc Maths, CSC (Mak 2012) and MHI (Mak 2020)
Lecture 3: Data Mining

Data mining is the process of finding anomalies, patterns and correlations within large data sets in order to predict outcomes. Using a broad range of techniques, organizations can apply this information to increase revenue, cut costs, improve customer or patient relationships, and reduce risk. Data mining is applied in many domains, such as health care, biomedical research, computer research and banking. It is especially useful for extracting hidden information, e.g. identifying likely causes of heart disease.

Data Mining models include:
Artificial Neural Networks
Decision Trees: https://www.researchgate.net/publication/11205595_Decision_Trees_An_Overview_and_Their_Use_in_Medicine
Random Forests, e.g. medical ultrasound image classification: https://www.researchgate.net/publication/343009844_Research_on_Application_of_Improved_Random_Forest_in_Medical_Ultrasound_Image_Classification
Logistic Regression: used to predict a categorical dependent variable from a given set of independent variables. Study: Using Logistic Regression to Distinguish Between Fatty and Fibroid Masses in Medical Imaging (Ultrasound Image)
Linear Regression: used to predict a continuous dependent variable from a given set of independent variables
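The difference between the two regression models can be sketched in plain Python. This is only an illustration: the numbers are made up, and the function names (`fit_linear`, `fit_logistic`) are hypothetical helpers, not something from the lecture or from WEKA. Linear regression returns a continuous value, while logistic regression returns a probability for a yes/no category.

```python
import math

def fit_linear(xs, ys):
    """Ordinary least squares for one predictor: y = intercept + slope * x."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return intercept, slope

def sigmoid(z):
    """Map any real number to a value between 0 and 1."""
    return 1.0 / (1.0 + math.exp(-z))

def fit_logistic(xs, labels, lr=0.1, epochs=5000):
    """Logistic regression for one predictor, fitted by gradient descent."""
    w, b = 0.0, 0.0
    n = len(xs)
    for _ in range(epochs):
        grad_w = grad_b = 0.0
        for x, y in zip(xs, labels):
            err = sigmoid(w * x + b) - y  # predicted probability minus true label
            grad_w += err * x
            grad_b += err
        w -= lr * grad_w / n
        b -= lr * grad_b / n
    return w, b

# Continuous outcome (made-up data that lies exactly on y = 1 + 2x).
intercept, slope = fit_linear([1, 2, 3, 4], [3, 5, 7, 9])

# Categorical outcome (made-up standardized measurement; 1 = condition present).
xs = [-2.0, -1.5, -1.0, -0.5, 0.5, 1.0, 1.5, 2.0]
labels = [0, 0, 0, 1, 0, 1, 1, 1]
w, b = fit_logistic(xs, labels)
prob_high = sigmoid(w * 2.0 + b)   # probability of the condition at x = 2
prob_low = sigmoid(w * -2.0 + b)   # probability of the condition at x = -2
```

With this data the linear fit recovers an intercept of 1 and a slope of 2, and the logistic fit assigns a probability above 0.5 at x = 2 and below 0.5 at x = -2.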

Designing ANNs in WEKA
Breast cancer dataset: https://storm.cis.fordham.edu/~gweiss/data-mining/weka-data/diabetes.arff

ANNs are trained with "backpropagation", which adjusts the connection weights of the network, including those of its hidden layers of neurons, whenever the output doesn't match the outcome the creator expects, for example when a network designed to recognize dogs misidentifies a cat as a dog.
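The weight adjustment can be sketched for a very small network: two inputs, two hidden neurons and one output neuron, with sigmoid activations and squared error. This is a minimal illustration in plain Python, not how WEKA implements it; the starting weights, learning rate and training example are arbitrary assumptions, and bias terms are left out to keep the sketch short.

```python
import math

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

def forward(x, w_hidden, w_out):
    """Return the hidden activations and the network output for input x."""
    hidden = [sigmoid(sum(w * xi for w, xi in zip(row, x))) for row in w_hidden]
    output = sigmoid(sum(w * h for w, h in zip(w_out, hidden)))
    return hidden, output

def loss(x, target, w_hidden, w_out):
    """Squared error 0.5 * (output - target)^2 for one example."""
    _, output = forward(x, w_hidden, w_out)
    return 0.5 * (output - target) ** 2

def backprop_step(x, target, w_hidden, w_out, lr=0.5):
    """One gradient-descent update of all weights for one example."""
    hidden, output = forward(x, w_hidden, w_out)
    # Error signal at the output neuron (chain rule through the sigmoid).
    delta_out = (output - target) * output * (1.0 - output)
    # Propagate the error signal back to each hidden neuron.
    delta_hidden = [delta_out * w * h * (1.0 - h) for w, h in zip(w_out, hidden)]
    new_w_out = [w - lr * delta_out * h for w, h in zip(w_out, hidden)]
    new_w_hidden = [
        [w - lr * d * xi for w, xi in zip(row, x)]
        for row, d in zip(w_hidden, delta_hidden)
    ]
    return new_w_hidden, new_w_out

# Arbitrary example: the network should output 1.0 for this input.
x = [1.0, 0.5]
target = 1.0
w_hidden = [[0.1, -0.2], [0.4, 0.3]]
w_out = [0.2, -0.1]

loss_before = loss(x, target, w_hidden, w_out)
for _ in range(100):
    w_hidden, w_out = backprop_step(x, target, w_hidden, w_out)
loss_after = loss(x, target, w_hidden, w_out)
```

After the repeated updates, `loss_after` is smaller than `loss_before`: the network's output has moved closer to the expected outcome.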

Data Mining process

Data Mining process
1. Business understanding: This is the first step in the DM process. Establish the goals of the project and how DM can help you reach them. Develop a plan that includes timelines, actions and role assignments.
2. Data understanding: Data is collected from all applicable data sources. Data visualization tools are also used in this step to explore the properties of the data and to confirm that it will help achieve the business goals.

Data Mining process
3. Data preparation: The data is cleansed and missing values are filled in or otherwise handled so that it is ready to be mined. Data preparation can take an enormous amount of time, depending on the amount of data analysed and the number of data sources.
4. Data modeling: Mathematical models are then used to find patterns in the data, using sophisticated data tools.

5. Evaluation: The findings are evaluated and compared with the business objectives to determine whether they should be deployed across the organization.
6. Deployment: This is the last stage. The data mining findings are shared across everyday business operations. An enterprise business intelligence platform can be used to provide a single source of truth for self-service data discovery.
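The evaluation step can be sketched as follows for a yes/no diagnostic model. This is only an illustration: the labels and predictions are made up, and the objective (`MIN_SENSITIVITY`, at least 70% of true cases detected) is a hypothetical business goal, not one stated in the lecture.

```python
def confusion_counts(actual, predicted):
    """Count true/false positives and negatives for 0/1 labels."""
    tp = fp = tn = fn = 0
    for a, p in zip(actual, predicted):
        if a == 1 and p == 1:
            tp += 1
        elif a == 0 and p == 1:
            fp += 1
        elif a == 0 and p == 0:
            tn += 1
        else:
            fn += 1
    return tp, fp, tn, fn

def evaluate(actual, predicted):
    """Return accuracy, sensitivity and specificity as a dictionary."""
    tp, fp, tn, fn = confusion_counts(actual, predicted)
    return {
        "accuracy": (tp + tn) / len(actual),
        # Share of true cases the model detects (0.0 if there are no true cases).
        "sensitivity": tp / (tp + fn) if (tp + fn) else 0.0,
        # Share of healthy patients correctly cleared (0.0 if there are none).
        "specificity": tn / (tn + fp) if (tn + fp) else 0.0,
    }

# Made-up test set: 1 = disease present, 0 = disease absent.
actual = [1, 1, 1, 1, 0, 0, 0, 0, 0, 0]
predicted = [1, 1, 1, 0, 0, 0, 0, 0, 1, 0]

MIN_SENSITIVITY = 0.7  # hypothetical objective set during business understanding
metrics = evaluate(actual, predicted)
ready_to_deploy = metrics["sensitivity"] >= MIN_SENSITIVITY
```

Here the model reaches an accuracy of 0.8 and a sensitivity of 0.75, so it meets the assumed objective. In a clinical setting, sensitivity and specificity are often more informative than accuracy alone because the two kinds of error have different consequences.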


Benefits of Data Mining
1. Automated decision making: DM allows organisations to continually analyze data and automate both routine and critical decisions without the delay of human judgment. DM models can collect, analyze, and act on data independently to streamline decision making and improve the organization's daily processes.
2. Accurate prediction and forecasting: DM facilitates planning and provides managers with reliable forecasts based on past trends and current conditions. In health care, well-validated DM models can also reduce the human bias that leads to diagnostic errors, which can have fatal consequences for patients.

3. Cost reduction: DM allows for more efficient use and allocation of resources. Organizations can plan and make automated decisions based on accurate forecasts, which reduces costs, e.g. by reducing the number of unnecessary tests and the number of days spent in hospital care.
4. Patient insights: Organizations apply DM to patients' data to uncover key characteristics and differences among patients. This is important in treatment selection.

Challenges in Data Mining
1. Big Data: Big data challenges are faced by every organization that collects, stores and analyses data.
Characteristics of Big Data
Volume: This describes the challenge of storing and processing the enormous quantity of data collected by organizations. It presents two major challenges: first, it is more difficult to find the correct data, and second, it slows down the processing speed of data mining tools.

Challenges in Data Mining cont’d
Variety: This encompasses the many different types of data collected and stored. Data mining tools must be equipped to process a wide array of data formats simultaneously.
Velocity: This describes the increasing speed at which new data is created, collected, and stored. While volume refers to increasing storage requirements and variety refers to the increasing types of data, velocity is the challenge associated with the rapidly increasing rate of data generation.
Value: The ability to extract value from vast amounts of information through processing and analysis.

The use of big data in the healthcare industry has the potential to create a great amount of value for patients, medical practitioners, hospitals and governmental institutions. How?
Big data has the potential to make healthcare more patient-centered and proactive.
Technologies that track aspects of an individual's health make it possible to collect data that builds a profile of that individual's general health condition.

2. Noisy and Incomplete Data: Data mining is the process of obtaining information from huge volumes of data. Real-world data is noisy, incomplete, and heterogeneous. Data in huge amounts is often unreliable or inaccurate. These problems may be caused by human error or by errors in the instruments that measure the data.
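One simple way to handle both problems is sketched below: readings outside a plausible range are treated as noise, and both those and the missing readings are replaced with the mean of the valid ones (mean imputation). This is one of several possible strategies, not a method prescribed by the lecture; the heart-rate values and the plausible range of 30 to 220 beats per minute are assumptions for illustration, and `clean_column` is a hypothetical helper.

```python
def clean_column(values, low, high):
    """Replace missing (None) and out-of-range readings with the mean of valid ones.

    Returns the cleaned list and the number of readings that were replaced.
    """
    def is_valid(v):
        return v is not None and low <= v <= high

    valid = [v for v in values if is_valid(v)]
    if not valid:
        raise ValueError("no valid readings to impute from")
    mean = sum(valid) / len(valid)
    cleaned = [v if is_valid(v) else mean for v in values]
    return cleaned, len(values) - len(valid)

# Made-up heart rates: None is a missing reading, 700 is likely a typing error.
heart_rates = [72, None, 80, 700, 76]
cleaned, n_replaced = clean_column(heart_rates, low=30, high=220)
```

In this example the two problem readings are replaced with 76.0, the mean of 72, 80 and 76. Mean imputation is easy to apply but shrinks the variability of the data, so the number of replaced values is returned to let the analyst judge how much of the column was altered.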

Challenges in Data Mining cont’d
3. Cost of Scale: As data velocity continues to increase the volume and variety of data, firms must scale their models and apply them across the entire organization. This requires significant investment in computing infrastructure and processing power, e.g. computers, servers and software designed to handle the firm's large quantity and variety of data.
4. Privacy and Security: The increased storage requirements of data have forced many organisations to turn to cloud computing and storage. While the cloud has enabled many modern advances in data mining, the nature of the service creates significant privacy and security threats. Organization data needs to be protected.

Other datasets
https://archive.ics.uci.edu/ml/datasets.html
https://storm.cis.fordham.edu/~gweiss/data-mining/weka-data/ionosphere.arff
http://repository.seasr.org/Data...
http://tunedit.org/search?q=arff
https://storm.cis.fordham.edu/~gweiss/data-mining/datasets.html