about types of data used in machine learning.pptx

annupriya1295 19 slides Aug 30, 2025

About This Presentation

data types in machine learning


Slide Content

Statistical Machine Learning

Structured data and Unstructured data

Methods of Data Collection

Structured Data
Structured data is organized in a predefined format, usually rows and columns, as in a database table or a spreadsheet. Its columns are named and typed, so the features (attributes) and their relationships are well-defined and easy to access.
Examples: Databases (SQL tables), spreadsheets (Excel files), CSV files, etc.
Characteristics:
Format: Tabular, with a clear schema (rows and columns).
Labels: Often comes with named features and, in supervised learning, labeled outcomes.
Ease of Use: Easier to analyze and to apply machine learning algorithms to, because of its well-organized nature.
Common Algorithms: Regression, Decision Trees, Random Forests, Gradient Boosting Machines (GBMs), K-Nearest Neighbors (KNN), Support Vector Machines (SVMs).
Use Cases in Machine Learning:
Predictive Modeling: Predicting sales, prices, or customer behavior from historical data.
Classification: Categorizing emails as spam or not spam, based on labeled training data.
Clustering: Segmenting customers into groups based on purchasing behavior.
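To make "tabular data with a clear schema" concrete, here is a minimal sketch that reads a small CSV table and classifies a new row with k-nearest neighbors, one of the algorithms listed above. The column names and values are hypothetical, and only the Python standard library is used; in practice a library such as scikit-learn would normally do this.

```python
import csv
import io
import math
from collections import Counter

# Hypothetical customer table: every row follows the same schema
# (two numeric feature columns and one label column).
CSV_TEXT = """age,monthly_spend,segment
22,40,budget
25,55,budget
31,60,budget
45,220,premium
52,180,premium
38,210,premium
"""

def load_rows(text):
    """Parse the CSV into (features, label) pairs, using the header as the schema."""
    reader = csv.DictReader(io.StringIO(text))
    return [((float(r["age"]), float(r["monthly_spend"])), r["segment"]) for r in reader]

def knn_predict(rows, query, k=3):
    """Return the majority label among the k rows closest to `query` (Euclidean distance)."""
    nearest = sorted(rows, key=lambda row: math.dist(row[0], query))[:k]
    votes = Counter(label for _, label in nearest)
    return votes.most_common(1)[0][0]

rows = load_rows(CSV_TEXT)
print(knn_predict(rows, (40, 200)))  # → premium
```

The features are deliberately left unscaled to keep the sketch short; because `monthly_spend` has a much larger range than `age`, it dominates the distance, which is why real pipelines usually standardize columns before applying KNN.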

Unstructured Data
Unstructured data lacks a predefined structure or format, which makes it more complex to analyze. It does not fit neatly into rows and columns and usually needs preprocessing before it can be used in machine learning.
Examples:
Text data (emails, documents, social media posts)
Images (photographs, medical scans)
Audio files (voice recordings, music)
Video files (movies, surveillance footage)
Characteristics:
Format: Unorganized and non-tabular, often text, images, or multimedia.
Preprocessing Needs: Requires significant preprocessing before it can be used in machine learning models (e.g., tokenization and vectorization for text, resizing and pixel normalization for images).
Common Algorithms: Convolutional Neural Networks (CNNs) for images, Recurrent Neural Networks (RNNs) and Transformers for text and sequence data, autoencoders, Generative Adversarial Networks (GANs).
Use Cases in Machine Learning:
Natural Language Processing (NLP): Sentiment analysis, text classification, machine translation.
Computer Vision: Image classification, object detection, facial recognition.
Speech Recognition: Transcribing spoken language into text, voice-based authentication.
Multimedia Analysis: Video content analysis, automated video editing, and summarization.
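The preprocessing step for text can be illustrated with a bag-of-words representation: each document becomes a vector of word counts over a shared vocabulary, turning free text into the fixed-length numeric input most models expect. This is a minimal standard-library sketch; the tokenizer (lowercase, keep runs of letters) is a simplifying assumption, and real NLP pipelines typically use dedicated tokenizers or learned embeddings.

```python
import re
from collections import Counter

def tokenize(text):
    """Lowercase the text and keep runs of letters/apostrophes as tokens."""
    return re.findall(r"[a-z']+", text.lower())

def bag_of_words(docs):
    """Return (vocabulary, count vectors) for a list of raw text documents."""
    tokenized = [tokenize(d) for d in docs]
    vocab = sorted({tok for toks in tokenized for tok in toks})
    vectors = []
    for toks in tokenized:
        counts = Counter(toks)
        vectors.append([counts[word] for word in vocab])
    return vocab, vectors

docs = ["Great product, great price!", "Terrible support."]
vocab, vectors = bag_of_words(docs)
print(vocab)    # → ['great', 'price', 'product', 'support', 'terrible']
print(vectors)  # → [[2, 1, 1, 0, 0], [0, 0, 0, 1, 1]]
```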

Choosing Between Structured and Unstructured Data
Structured data is often used in traditional machine learning applications where the data is already well organized and labeled, which suits models that need clearly defined features and outcomes.
Unstructured data is increasingly important in modern applications, particularly with the rise of deep learning. It is prevalent in real-world sources such as social media, customer feedback, and multimedia, and it often holds insights that structured data cannot provide.
* We will focus on structured data with traditional machine learning algorithms here.

Semi-Structured Data
Semi-structured data does not reside in a traditional relational database (like SQL) but still has some organizational properties, such as tags or markers, that make it easier to analyze than completely unstructured data. It does not follow a strict schema like structured data, but it still contains elements such as labels or keys that make the data identifiable and searchable.
Examples of Semi-Structured Data:
JSON (JavaScript Object Notation)
XML (eXtensible Markup Language)
CSV files with inconsistent rows
Emails (structured headers and unstructured body text)
Sensor data from IoT devices
HTML web pages
Source: https://www.geeksforgeeks.org/dbms/what-is-semi-structured-data/
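Because semi-structured records share keys but not a strict schema, a common first step is to flatten them into a table and mark missing fields. The sketch below does this for JSON with the standard library; the records and field names are invented for illustration, and joining nested keys with a dot is one convention among several.

```python
import json

# Hypothetical JSON records: the same kind of object, but with optional
# and nested fields instead of a fixed schema.
RAW = """[
  {"id": 1, "name": "Asha", "contact": {"email": "asha@example.com"}},
  {"id": 2, "name": "Ravi", "contact": {"phone": "555-0101"}, "tags": ["vip"]},
  {"id": 3, "name": "Meera"}
]"""

def flatten(obj, prefix=""):
    """Flatten nested dicts into a single dict with dotted keys."""
    flat = {}
    for key, value in obj.items():
        full_key = f"{prefix}{key}"
        if isinstance(value, dict):
            flat.update(flatten(value, prefix=full_key + "."))
        else:
            flat[full_key] = value
    return flat

def to_table(records):
    """Return (columns, rows), using None where a record lacks a field."""
    flat_records = [flatten(r) for r in records]
    columns = sorted({k for r in flat_records for k in r})
    rows = [[r.get(c) for c in columns] for r in flat_records]
    return columns, rows

columns, rows = to_table(json.loads(RAW))
print(columns)  # → ['contact.email', 'contact.phone', 'id', 'name', 'tags']
print(rows[2])  # → [None, None, 3, 'Meera', None]
```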

Different Sources of Data:
Surveys and Questionnaires
Source Type: Primary Data
Examples: Customer feedback surveys, demographic questionnaires.
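Survey answers are often collected on an ordered agreement scale (a Likert scale). Before such primary data can feed a model, the text answers are usually mapped to integers that preserve their order (ordinal encoding). A minimal sketch, assuming a five-point scale; the question names and responses are hypothetical.

```python
# Hypothetical five-point Likert scale, mapped to ordered integers.
LIKERT = {
    "strongly disagree": 1,
    "disagree": 2,
    "neutral": 3,
    "agree": 4,
    "strongly agree": 5,
}

def encode_response(response):
    """Map one survey response dict to numeric features; unknown answers become None."""
    return {question: LIKERT.get(answer.strip().lower()) for question, answer in response.items()}

responses = [
    {"q1_satisfied": "Agree", "q2_recommend": "Strongly agree"},
    {"q1_satisfied": "Neutral", "q2_recommend": "Disagree"},
]
encoded = [encode_response(r) for r in responses]
print(encoded)
# → [{'q1_satisfied': 4, 'q2_recommend': 5}, {'q1_satisfied': 3, 'q2_recommend': 2}]
```

Unrecognized answers map to `None` rather than raising, so incomplete or free-text replies can be handled later as missing values.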

Sensor Data
Source Type: Primary Data
Examples: Temperature sensors, motion detectors, environmental sensors.

Social Media Posts
Source Type: Unstructured Data
Examples: Twitter posts, Facebook updates, user comments.

Web Data
Source Type: Primary Data
Examples: Website traffic data, user behavior analytics.
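Website traffic data usually starts as raw log lines. As a sketch of turning it into user-behavior features, the code below counts page views and distinct pages per user. The log format (timestamp, user ID, and path separated by spaces) is an assumption for illustration; real web server and analytics logs each use their own formats.

```python
from collections import defaultdict

# Hypothetical log lines in the form "<timestamp> <user_id> <path>".
LOG = """2025-08-30T10:00:01 u1 /home
2025-08-30T10:00:05 u1 /products
2025-08-30T10:01:10 u2 /home
2025-08-30T10:02:00 u1 /products
2025-08-30T10:03:30 u2 /checkout"""

def user_features(log_text):
    """Return {user_id: {"views": int, "distinct_pages": int}} from raw log lines."""
    views = defaultdict(int)
    pages = defaultdict(set)
    for line in log_text.splitlines():
        _timestamp, user, path = line.split()
        views[user] += 1
        pages[user].add(path)
    return {u: {"views": views[u], "distinct_pages": len(pages[u])} for u in views}

print(user_features(LOG))
# → {'u1': {'views': 3, 'distinct_pages': 2}, 'u2': {'views': 2, 'distinct_pages': 2}}
```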

Transaction Records
Source Type: Structured Data
Examples: Sales transactions, financial records.
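Transaction records are a typical structured source for predictive modeling such as sales forecasting. A minimal sketch, assuming a hypothetical CSV with `txn_id`, `date`, and `amount` columns, aggregates individual sales into monthly totals that a forecasting model could use as its target series.

```python
import csv
import io
from collections import defaultdict
from datetime import date

# Hypothetical sales transactions: one row per sale, fixed columns.
SALES_CSV = """txn_id,date,amount
T1,2025-01-05,120.50
T2,2025-01-20,80.00
T3,2025-02-02,200.00
T4,2025-02-15,50.25
"""

def monthly_totals(text):
    """Sum transaction amounts per (year, month), ordered chronologically."""
    totals = defaultdict(float)
    for row in csv.DictReader(io.StringIO(text)):
        d = date.fromisoformat(row["date"])
        totals[(d.year, d.month)] += float(row["amount"])
    return dict(sorted(totals.items()))

print(monthly_totals(SALES_CSV))
# → {(2025, 1): 200.5, (2025, 2): 250.25}
```

`float` keeps the sketch short; for real financial records, `decimal.Decimal` avoids binary floating-point rounding in currency amounts.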

Text Documents
Source Type: Unstructured Data
Examples: Articles, reports, emails.

Image Data Sources:
Surveillance cameras
Satellite imagery
Medical imaging
Social media photos
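Whatever the source, a digital image reaches a model as a grid of pixel intensities. A minimal sketch, assuming an 8-bit grayscale image stored as nested lists (values 0 to 255), shows two common preprocessing steps: scaling pixels to [0, 1] and flattening the grid into a feature vector. In practice libraries such as NumPy or Pillow handle this.

```python
def normalize(image, max_value=255):
    """Scale 8-bit pixel intensities to the range [0.0, 1.0]."""
    return [[pixel / max_value for pixel in row] for row in image]

def flatten(image):
    """Turn a 2-D grid of pixels into a 1-D feature vector (row-major order)."""
    return [pixel for row in image for pixel in row]

# Hypothetical 2x3 grayscale image (0 = black, 255 = white).
image = [
    [0, 51, 255],
    [102, 204, 0],
]
features = flatten(normalize(image))
print(features)  # → [0.0, 0.2, 1.0, 0.4, 0.8, 0.0]
```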

Benchmark Databases

Data Sources from APIs

Financial Market Data API:
Description: APIs from financial institutions or market data providers that deliver stock prices, market indices, and financial news.
Use Case: Stock market analysis, algorithmic trading.
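Market data APIs usually return JSON. The sketch below parses such a response and computes simple daily returns, a common first feature for stock market analysis. The payload shape (`symbol` plus a `prices` list of `date`/`close` entries) is hypothetical; every provider defines its own schema, so the field names must be adapted to the provider's documentation.

```python
import json

# Hypothetical API response; real providers use their own field names.
RESPONSE = """{
  "symbol": "EXAMPLE",
  "prices": [
    {"date": "2025-08-26", "close": 100.0},
    {"date": "2025-08-27", "close": 102.0},
    {"date": "2025-08-28", "close": 99.96}
  ]
}"""

def daily_returns(payload):
    """Return [(date, simple return)] where return = close_t / close_(t-1) - 1."""
    prices = sorted(json.loads(payload)["prices"], key=lambda p: p["date"])
    return [
        (cur["date"], cur["close"] / prev["close"] - 1)
        for prev, cur in zip(prices, prices[1:])
    ]

for day, ret in daily_returns(RESPONSE):
    print(day, f"{ret:.2%}")
# → 2025-08-27 2.00%
# → 2025-08-28 -2.00%
```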

Social Media API:
Description: APIs provided by platforms like Twitter, Facebook, or Instagram that allow access to user data, posts, and engagement metrics.
Use Case: Social media analytics, sentiment analysis.
Here are some popular social media APIs along with their official documentation links:
1. Twitter API (v2): Allows developers to access Twitter data, including tweets, user profiles, and trends. https://developer.x.com/en/docs/twitter-api
2. Facebook Graph API: Provides access to Facebook's social graph, allowing interaction with Facebook users, pages, and more. https://developers.facebook.com/docs/graph-api
3. Instagram Graph API: Enables interaction with Instagram's platform for business profiles, offering insights, media management, and more. https://developers.facebook.com/docs/instagram-platform/instagram-api-with-facebook-login
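As a sketch of calling one of these APIs from Python, the function below builds (but does not send) a request to the Twitter/X API v2 recent-search endpoint using only the standard library. The endpoint URL and the `query`, `max_results`, and `tweet.fields` parameters are assumptions to verify against the Twitter API documentation linked above; access depends on the developer account's plan, and the bearer token is a placeholder.

```python
import json
import urllib.parse
import urllib.request

# Assumed v2 recent-search endpoint; check the official docs for the current base URL.
SEARCH_URL = "https://api.twitter.com/2/tweets/search/recent"

def build_search_request(query, bearer_token, max_results=10):
    """Build a GET request for recent posts matching `query` (not sent here)."""
    params = urllib.parse.urlencode({
        "query": query,
        "max_results": max_results,         # v2 accepts 10 to 100 per page
        "tweet.fields": "created_at,lang",  # extra fields to include per post
    })
    return urllib.request.Request(
        f"{SEARCH_URL}?{params}",
        headers={"Authorization": f"Bearer {bearer_token}"},
    )

def fetch_texts(request):
    """Send the request and return the text of each returned post."""
    with urllib.request.urlopen(request, timeout=10) as resp:
        body = json.load(resp)
    return [item["text"] for item in body.get("data", [])]

req = build_search_request("machine learning -is:retweet", "YOUR_BEARER_TOKEN")
print(req.full_url)
# texts = fetch_texts(req)  # needs a valid token and network access
```

Building the request separately from sending it keeps the network call in one place, which makes the query construction easy to check offline.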

Maps and Geolocation API:
Description: APIs offered by mapping services like Google Maps that provide geolocation data, directions, and place information.
Use Case: Location-based services, route planning.
Examples:
1. Google Maps Geocoding API
2. OpenCage Geocoding API
3. Mapbox Geocoding API
4. HERE Geocoding API
5. LocationIQ Geocoding API
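A geocoding API turns an address into coordinates. The sketch below builds a request URL for the Google Maps Geocoding API and extracts the first result from a JSON response in its `results[0].geometry.location` shape. The API key is a placeholder, and the sample response is abbreviated with illustrative coordinates rather than a real API reply.

```python
import json
import urllib.parse

GEOCODE_URL = "https://maps.googleapis.com/maps/api/geocode/json"

def geocode_url(address, api_key):
    """Build the request URL for forward-geocoding an address."""
    return f"{GEOCODE_URL}?{urllib.parse.urlencode({'address': address, 'key': api_key})}"

def first_location(response_text):
    """Return (lat, lng) of the first result, or None if the lookup failed."""
    body = json.loads(response_text)
    if body.get("status") != "OK" or not body.get("results"):
        return None
    loc = body["results"][0]["geometry"]["location"]
    return loc["lat"], loc["lng"]

# Abbreviated sample response with illustrative coordinates.
sample = '{"status": "OK", "results": [{"geometry": {"location": {"lat": 12.97, "lng": 77.59}}}]}'
print(geocode_url("MG Road, Bengaluru", "YOUR_API_KEY"))
print(first_location(sample))  # → (12.97, 77.59)
```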

E-commerce Product API:
Description: APIs from e-commerce platforms that grant access to product details, prices, and customer reviews.
Use Case: Price comparison, product recommendations.

News and Media APIs:
Description: APIs from news organizations that offer access to headlines, articles, and multimedia content.
Use Case: News aggregation, sentiment analysis.
Examples:
Newscatcher API
Bing News Search API
Google News API
NewsomaticAPI
News Article Data Extract and Summarization API
News API
News Sentiment API
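For the sentiment-analysis use case, the sketch below scores headlines from a news API response with a tiny word lexicon. The sample mimics the `articles` list with a `title` field returned by News API's `/v2/top-headlines` endpoint, but it is abbreviated and invented; the positive/negative word lists are hypothetical, and real systems use trained models or much larger curated lexicons.

```python
import json

# Tiny hypothetical sentiment lexicon, for illustration only.
POSITIVE = {"gain", "gains", "growth", "record", "surge"}
NEGATIVE = {"loss", "losses", "decline", "crisis", "falls"}

def headline_sentiment(title):
    """Score = (#positive words - #negative words); >0 positive, <0 negative."""
    words = [w.strip(".,:;!?'\"").lower() for w in title.split()]
    return sum(w in POSITIVE for w in words) - sum(w in NEGATIVE for w in words)

def score_articles(response_text):
    """Pair each article title in a news-API-style response with its score."""
    articles = json.loads(response_text).get("articles", [])
    return [(a["title"], headline_sentiment(a["title"])) for a in articles]

# Abbreviated, invented sample in the shape of a /v2/top-headlines response.
SAMPLE = json.dumps({
    "status": "ok",
    "articles": [
        {"title": "Tech stocks surge to record high"},
        {"title": "Retail sales decline amid supply crisis"},
        {"title": "City council meets on Tuesday"},
    ],
})

for title, score in score_articles(SAMPLE):
    print(score, title)
# → 2 Tech stocks surge to record high
# → -2 Retail sales decline amid supply crisis
# → 0 City council meets on Tuesday
```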

Further Readings:
UCI Machine Learning Repository: https://archive.ics.uci.edu/
Kaggle datasets: https://www.kaggle.com/datasets
AWS public datasets (Registry of Open Data on AWS): https://registry.opendata.aws/
Government datasets (public): https://data.gov/
Healthcare API example (Apple HealthKit): https://developer.apple.com/documentation/healthkit