Lecture 3 Data Mining.pptx power points for graduates
Josephmwanika
20 slides
May 31, 2024
About This Presentation
Graduate students interested in health informatics
ECU-M 213: HEALTH INFORMATICS
By: Patience A. Jaffu, BSc Maths, CSC (Mak 2012) and MHI (Mak 2020)
Lecture 3: Data Mining
Data mining is the process of finding anomalies, patterns and correlations within large data sets in order to predict outcomes. Using a broad range of techniques, organizations can use this information to increase revenue, cut costs, improve customer or patient relationships, and reduce risk. Data mining is applied in domains such as health care, biomedical research, computing research and banking. It is especially useful for extracting hidden information, e.g. identifying the causes of heart disease.
Data Mining models include:
Artificial Neural Networks
Decision Trees: https://www.researchgate.net/publication/11205595_Decision_Trees_An_Overview_and_Their_Use_in_Medicine
Random Forests: Medical Ultrasound Image Classification. https://www.researchgate.net/publication/343009844_Research_on_Application_of_Improved_Random_Forest_in_Medical_Ultrasound_Image_Classification
Logistic Regression: Used to predict a categorical dependent variable from a given set of independent variables. Study: Using Logistic Regression to Distinguish Between Fatty and Fibroid Masses in Medical Imaging (Ultrasound Image)
Linear Regression: Used to predict a continuous dependent variable from a given set of independent variables.
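The difference between the two regression models above can be sketched in a few lines of Python. This is a minimal illustration, not a production implementation: the data values, the variable names (marker_levels, days_in_care, disease_present) and the learning-rate settings are all made up for the example. Linear regression fits a line to a continuous outcome; logistic regression passes the same kind of linear score through a sigmoid to get the probability of a category.

```python
import math

def fit_linear(xs, ys):
    """Ordinary least squares for one predictor: y is approximated by a + b*x."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return intercept, slope

def sigmoid(z):
    """Map any real score to a probability between 0 and 1."""
    return 1.0 / (1.0 + math.exp(-z))

def fit_logistic(xs, labels, lr=0.1, epochs=10000):
    """Logistic regression for one predictor, fitted by batch gradient descent."""
    w0, w1 = 0.0, 0.0
    n = len(xs)
    for _ in range(epochs):
        g0 = g1 = 0.0
        for x, y in zip(xs, labels):
            err = sigmoid(w0 + w1 * x) - y  # predicted probability minus true label
            g0 += err
            g1 += err * x
        w0 -= lr * g0 / n
        w1 -= lr * g1 / n
    return w0, w1

# Hypothetical data: continuous outcome (days in care) for linear regression.
marker_levels = [1, 2, 3, 4, 5]
days_in_care = [2.1, 4.0, 6.2, 7.9, 10.1]
intercept, slope = fit_linear(marker_levels, days_in_care)

# Hypothetical data: categorical outcome (0 = absent, 1 = present) for logistic regression.
levels = [1, 2, 3, 4, 5, 6, 7, 8]
disease_present = [0, 0, 0, 1, 0, 1, 1, 1]
w0, w1 = fit_logistic(levels, disease_present)
p_low = sigmoid(w0 + w1 * 1)   # estimated probability at a low marker level
p_high = sigmoid(w0 + w1 * 8)  # estimated probability at a high marker level
```

The linear model returns a number on the scale of the outcome, while the logistic model returns a probability that is usually turned into a category with a cut-off such as 0.5.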
Designing ANNs in WEKA
Dataset (diabetes.arff): https://storm.cis.fordham.edu/~gweiss/data-mining/weka-data/diabetes.arff
ANNs are trained with “backpropagation”, which lets a network adjust the weights of its hidden layers of neurons when its output does not match the expected outcome, for example when a network designed to recognize dogs misidentifies a cat.
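As a rough illustration of backpropagation, the sketch below trains a tiny network (two inputs, one hidden layer, one output) in plain Python. It is not how WEKA implements its neural network; the toy data, the network size, the learning rate and all function names are assumptions chosen to keep the example short. The key step is that the output error is passed backwards to compute how much each hidden neuron contributed to it.

```python
import math
import random

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

def forward(x, w_hidden, w_out):
    """Return the hidden activations and the network output for one input."""
    # Each hidden row is [weight for x[0], weight for x[1], bias].
    hidden = [sigmoid(w[0] * x[0] + w[1] * x[1] + w[2]) for w in w_hidden]
    # w_out holds one weight per hidden neuron, followed by a bias.
    z = sum(w * h for w, h in zip(w_out[:-1], hidden)) + w_out[-1]
    return hidden, sigmoid(z)

def mean_squared_error(data, w_hidden, w_out):
    return sum((forward(x, w_hidden, w_out)[1] - t) ** 2 for x, t in data) / len(data)

def train(data, n_hidden=2, lr=0.5, epochs=2000, seed=0):
    rng = random.Random(seed)
    w_hidden = [[rng.uniform(-1, 1) for _ in range(3)] for _ in range(n_hidden)]
    w_out = [rng.uniform(-1, 1) for _ in range(n_hidden + 1)]
    initial_loss = mean_squared_error(data, w_hidden, w_out)
    for _ in range(epochs):
        for x, target in data:
            hidden, out = forward(x, w_hidden, w_out)
            # Error signal at the output (squared error with a sigmoid output).
            delta_out = (out - target) * out * (1 - out)
            # Propagate the output error back through the output weights.
            delta_hidden = [delta_out * w_out[j] * hidden[j] * (1 - hidden[j])
                            for j in range(n_hidden)]
            # Gradient-descent updates for the output layer.
            for j in range(n_hidden):
                w_out[j] -= lr * delta_out * hidden[j]
            w_out[-1] -= lr * delta_out
            # Gradient-descent updates for the hidden layer.
            for j in range(n_hidden):
                w_hidden[j][0] -= lr * delta_hidden[j] * x[0]
                w_hidden[j][1] -= lr * delta_hidden[j] * x[1]
                w_hidden[j][2] -= lr * delta_hidden[j]
    final_loss = mean_squared_error(data, w_hidden, w_out)
    return w_hidden, w_out, initial_loss, final_loss

# Toy data (hypothetical): two binary features, target is 1 if either feature is set.
toy_data = [((0, 0), 0), ((0, 1), 1), ((1, 0), 1), ((1, 1), 1)]
w_hidden, w_out, initial_loss, final_loss = train(toy_data)
```

Comparing initial_loss with final_loss shows the effect of training: each pass nudges the weights in the direction that reduces the mismatch between the output and the expected outcome.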
Data Mining process
1. Business understanding: This is the first step in the DM process. Establish the goals of the project and how DM can help you reach them. Develop a plan that includes timelines, actions and role assignments.
2. Data understanding: Data is collected from all applicable data sources. Data visualization tools are also used in this step to explore the properties of the data and confirm that it will help achieve the business goals.
3. Data preparation: Data is cleansed and missing values are filled in or otherwise handled so that the data is ready to be mined. Data preparation can take an enormous amount of time, depending on the amount of data analysed and the number of data sources.
4. Data modeling: Mathematical models are used to find patterns in the data using sophisticated data tools.
5. Evaluation: The findings are evaluated and compared to the business objectives to determine whether they should be deployed across the organization.
6. Deployment: This is the last stage. The data mining findings are shared across everyday business operations. An enterprise business intelligence platform can be used to provide a single source of truth for self-service data discovery.
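Steps 3 to 5 can be sketched end to end on a toy example. The code below is only an illustration under stated assumptions: the glucose readings and labels are invented, mean imputation is just one of several ways to handle missing values, and the model is a single-threshold rule (a "decision stump") chosen for brevity rather than any model named in the lecture.

```python
def impute_with_mean(values, fill=None):
    """Step 3: replace missing values (None) with the mean of the observed ones."""
    if fill is None:
        observed = [v for v in values if v is not None]
        fill = sum(observed) / len(observed)
    return [fill if v is None else v for v in values], fill

def accuracy(values, labels, threshold):
    """Share of cases where 'value > threshold' matches the true label."""
    hits = sum(int(v > threshold) == y for v, y in zip(values, labels))
    return hits / len(values)

def fit_stump(values, labels):
    """Step 4: pick the threshold that best separates the two classes."""
    distinct = sorted(set(values))
    best_threshold, best_accuracy = None, -1.0
    for low, high in zip(distinct, distinct[1:]):
        threshold = (low + high) / 2
        acc = accuracy(values, labels, threshold)
        if acc > best_accuracy:
            best_threshold, best_accuracy = threshold, acc
    return best_threshold

# Hypothetical training data: glucose readings (None = missing) and diagnosis labels.
train_glucose = [85, 92, None, 150, 165, 98, 140, None]
train_labels = [0, 0, 0, 1, 1, 0, 1, 0]

# Hypothetical held-out data, kept aside for evaluation.
test_glucose = [88, 155, None, 125]
test_labels = [0, 1, 0, 1]

# Step 3: prepare the data. The fill value is learned from the training data only.
train_clean, train_mean = impute_with_mean(train_glucose)
test_clean, _ = impute_with_mean(test_glucose, fill=train_mean)

# Step 4: model.
threshold = fit_stump(train_clean, train_labels)

# Step 5: evaluate on data the model has not seen.
test_accuracy = accuracy(test_clean, test_labels, threshold)
print(f"threshold={threshold:.2f}, test accuracy={test_accuracy:.2f}")
```

The evaluation result is what gets compared with the business objectives before a decision on deployment is made.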
Benefits of Data Mining
1. Automated decision making: DM allows organisations to continually analyze data and automate both routine and critical decisions without the delay of human judgment. DM models can collect, analyze and act on data independently to streamline decision making and enhance the daily processes of the organization.
2. Accurate prediction and forecasting: DM facilitates planning and provides managers with reliable forecasts based on past trends and current conditions. Because DM models apply the same criteria to every case, they can reduce the human bias that leads to diagnostic errors with fatal consequences for patients, provided the training data is itself representative.
3. Cost reduction: DM allows for more efficient use and allocation of resources. Organizations can plan and make automated decisions based on accurate forecasts, which reduces costs, e.g. by reducing the number of unnecessary tests and the number of days spent in hospital care.
4. Patient insights: Organizations apply DM to patients' data to uncover key characteristics and differences among them. This is important in treatment selection.
Challenges in Data Mining
1. Big Data: Big data challenges are faced by every organization that collects, stores and analyses data.
Characteristics of Big Data:
Volume: This describes the challenge of storing and processing the enormous quantity of data collected by organizations. This enormous amount of data presents two major challenges: first, it is more difficult to find the correct data, and second, it slows down the processing speed of data mining tools.
Variety: This encompasses the many different types of data collected and stored. Data mining tools must be equipped to simultaneously process a wide array of data formats.
Velocity: This details the increasing speed at which new data is created, collected and stored. While volume refers to increasing storage requirements and variety refers to the increasing types of data, velocity is the challenge associated with the rapidly increasing rate of data generation.
Value: The ability to extract value from the vast amounts of information through processing and analysis.
The use of big data in the healthcare industry has the potential to create a great amount of value for patients, medical practitioners, hospitals and governmental institutions. How? Big data has the potential to make healthcare more patient-centered and proactive. Technologies that track aspects of an individual's health make it possible to collect data that builds a profile of that individual's general health condition.
2. Noisy and Incomplete Data: Data mining is the process of extracting information from huge volumes of data. Real-world data is noisy, incomplete and heterogeneous. Data collected in huge amounts is often unreliable or inaccurate. These problems can be caused by human error or by errors in the instruments that measure the data.
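A simple first check for noisy and incomplete data is to count missing entries and flag values that fall outside a plausible range. The sketch below assumes hypothetical heart-rate readings and an assumed plausible range of 20 to 250 beats per minute; both the data and the limits are illustrative only.

```python
def audit_readings(readings, low, high):
    """Split readings into missing positions, implausible positions and valid values."""
    missing, implausible, valid = [], [], []
    for index, value in enumerate(readings):
        if value is None:
            missing.append(index)          # incomplete: nothing was recorded
        elif not (low <= value <= high):
            implausible.append(index)      # noisy: likely a typing or instrument error
        else:
            valid.append(value)
    return missing, implausible, valid

# Hypothetical heart-rate readings in beats per minute.
heart_rates = [72, 68, None, 75, 700, 80, None, 66, -5, 71]
missing, implausible, valid = audit_readings(heart_rates, low=20, high=250)
print("missing at positions:", missing)
print("implausible at positions:", implausible)
print("usable readings:", valid)
```

Flagged positions can then be reviewed, corrected at the source or handled during data preparation rather than silently fed into a model.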
Challenges in Data Mining cont’d
3. Cost of Scale: As data velocity continues to drive up the volume and variety of data, firms must scale their models and apply them across the entire organization. This requires significant investment in computing infrastructure and processing power, e.g. computers, servers and software designed to handle the firm’s large quantity and variety of data.
4. Privacy and security: The increasing storage requirements of data have forced many organisations to turn to cloud computing and storage. While the cloud has enabled many modern advances in data mining, the nature of the service creates significant privacy and security threats. Organizational data must therefore be protected.
Other datasets
https://archive.ics.uci.edu/ml/datasets.html
https://storm.cis.fordham.edu/~gweiss/data-mining/weka-data/ionosphere.arff
http://repository.seasr.org/Data...
http://tunedit.org/search?q=arff
https://storm.cis.fordham.edu/~gweiss/data-mining/datasets.html