Intro to Data Science and Data Wrangling.pptx

kunaltomarmu26 0 views 24 slides Sep 28, 2025

Introduction to Data Science and Data Wrangling
By Vineeta Rathore

What is Data Science?
Key Points:
- Study of data to derive useful insights for business decision-making.
- Combines mathematics, computer science, and domain expertise to tackle real-world challenges.
- Processes raw data to solve business problems and make predictions about future trends.

Why Does It Matter? (Need for Data Science)
- Crucial for organizations to extract meaningful insights from vast amounts of data.
- Drives better decision-making and problem-solving across various industries.
- Essential for navigating the complexities of the modern, data-driven world.
- Helps businesses optimize operations, anticipate trends, and personalize experiences.
Example questions Data Science can answer:
- "What do customers want?"
- "How can we improve our service?"
- "What will be the upcoming trends in sales?"
- "How much stock is needed for the upcoming festival?"

Hands-On with Basic Data Science Operations

Data Exploration and Summarization:
Core Libraries: Pandas, NumPy
Key Operations:
Loading and Inspecting Data (Operation 1) - You'll almost always start by loading a dataset (commonly from a CSV file) into a Pandas DataFrame and performing initial inspections.
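
A minimal sketch of this first step, assuming a small hypothetical dataset (the column names and values below are invented for illustration). The CSV is read from an in-memory string so the example is self-contained; with a real file you would pass its path to `pd.read_csv` instead.

```python
import io

import pandas as pd

# Hypothetical CSV content standing in for a real file on disk.
csv_text = """order_id,region,units,price
1,North,3,9.99
2,South,5,4.50
3,North,2,9.99
4,East,7,2.25
"""

# Load the CSV into a DataFrame.
df = pd.read_csv(io.StringIO(csv_text))

# Initial inspections.
print(df.head())   # first few rows
print(df.shape)    # (number of rows, number of columns)
print(df.dtypes)   # data type of each column
df.info()          # non-null counts and memory usage
```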

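Descriptive Statistics (Operation 2) - Summary statistics such as the count, mean, standard deviation, and quartiles give a quick numerical overview of each column.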
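
The slide does not show its own example here, so the following is only a sketch of common Pandas calls for descriptive statistics. The DataFrame and its column names are hypothetical.

```python
import pandas as pd

# Hypothetical data for illustration.
df = pd.DataFrame({
    "region": ["North", "South", "North", "East"],
    "units": [3, 5, 2, 7],
    "price": [9.99, 4.50, 9.99, 2.25],
})

# Count, mean, std, min, quartiles, and max for every numeric column.
summary = df.describe()
print(summary)

# Individual statistics on a single column.
mean_units = df["units"].mean()
median_units = df["units"].median()

# Frequency of each category in a non-numeric column.
region_counts = df["region"].value_counts()
print(region_counts)
```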

Data Cleaning and Preprocessing:
Raw data is rarely clean. We have to identify and handle common data quality issues.
Core Libraries: Pandas, NumPy
Key Operations:
Handling Missing Values (Operation 1) - This is one of the most common data cleaning tasks.
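
One possible way to handle missing values is sketched below, on a hypothetical DataFrame. Whether to drop or fill, and what to fill with, depends on the dataset; the mean and the "Unknown" placeholder are assumptions chosen for illustration.

```python
import numpy as np
import pandas as pd

# Hypothetical data with gaps in both a numeric and a text column.
df = pd.DataFrame({
    "age": [25, np.nan, 40, 31],
    "city": ["Pune", "Delhi", None, "Pune"],
})

# Count missing values per column.
missing_per_column = df.isna().sum()
print(missing_per_column)

# Option 1: drop every row that contains at least one missing value.
dropped = df.dropna()

# Option 2: fill the gaps instead of dropping rows.
filled = df.copy()
filled["age"] = filled["age"].fillna(filled["age"].mean())  # numeric: column mean
filled["city"] = filled["city"].fillna("Unknown")           # text: placeholder label
print(filled)
```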

Handling Duplicates (Operation 2) - Duplicate records can skew your analysis and model training.
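
A short sketch of detecting and removing duplicates with Pandas, using hypothetical customer data. Which columns define a "duplicate" is a judgment call for each dataset; the `subset="customer"` choice below is only an example.

```python
import pandas as pd

# Hypothetical data: row 2 repeats row 0 exactly.
df = pd.DataFrame({
    "customer": ["A", "B", "A", "C", "B"],
    "amount": [10, 20, 10, 30, 25],
})

# Boolean mask marking rows that repeat an earlier row in full.
dup_mask = df.duplicated()
n_dups = int(dup_mask.sum())
print(n_dups)

# Remove fully duplicated rows, keeping the first occurrence.
deduped = df.drop_duplicates()

# Treat rows as duplicates when only selected columns match.
first_per_customer = df.drop_duplicates(subset="customer", keep="first")
print(first_per_customer)
```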

Data Selection and Manipulation:
You'll often need to select specific subsets of your data or manipulate it to create new columns or structures.
Core Libraries: Pandas
Key Operations:
Selecting Data with loc and iloc (Operation 1) - Understanding the difference between label-based indexing (loc) and integer-based indexing (iloc) is fundamental.
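
The difference can be illustrated on a small hypothetical DataFrame with string row labels. Note in particular that slices with `loc` include the end label, while slices with `iloc` exclude the end position.

```python
import pandas as pd

# Hypothetical data with string labels as the index.
df = pd.DataFrame(
    {"units": [3, 5, 2], "price": [9.99, 4.50, 2.25]},
    index=["a", "b", "c"],
)

# loc selects by label; iloc selects by integer position.
by_label = df.loc["b", "price"]
by_position = df.iloc[1, 1]  # same cell: second row, second column

# Label slices include the end label; position slices exclude the end.
label_slice = df.loc["a":"b"]  # rows "a" and "b"
pos_slice = df.iloc[0:1]       # row at position 0 only

# loc also accepts a boolean mask together with a list of columns.
high_units = df.loc[df["units"] > 2, ["units"]]
print(high_units)
```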

Applying Functions (Operation 2) - You can apply functions to a DataFrame to perform custom transformations.
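
As a sketch, `apply` can run a function over each value of a column or over each row. The data and the "high"/"low" threshold here are hypothetical. For simple arithmetic, a vectorized expression usually gives the same result more efficiently, so it is shown alongside for comparison.

```python
import pandas as pd

# Hypothetical data for illustration.
df = pd.DataFrame({"units": [3, 5, 2], "price": [10.0, 4.0, 2.5]})

# Apply a function to every value in one column.
df["price_band"] = df["price"].apply(lambda p: "high" if p >= 5 else "low")

# Apply a function to every row (axis=1) to combine several columns.
df["revenue"] = df.apply(lambda row: row["units"] * row["price"], axis=1)

# Equivalent vectorized form, usually preferable for plain arithmetic.
df["revenue_vectorized"] = df["units"] * df["price"]
print(df)
```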

Grouping and Aggregating (Operation 3) - The groupby operation is powerful for calculating statistics on different segments of your data.
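
A minimal groupby sketch on hypothetical sales data: rows are split by region, and statistics are then computed per group.

```python
import pandas as pd

# Hypothetical data for illustration.
df = pd.DataFrame({
    "region": ["North", "South", "North", "East", "South"],
    "units": [3, 5, 2, 7, 1],
})

# Total units sold per region.
total_units = df.groupby("region")["units"].sum()
print(total_units)

# Several statistics per region in one call.
stats = df.groupby("region")["units"].agg(["sum", "mean", "count"])
print(stats)
```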

Data Visualization:
Visualizing your data is key to understanding patterns and communicating your findings.
Core Libraries: Matplotlib, Seaborn
Key Operations:
Histograms and Box Plots (Operation 1) - For understanding the distribution of a single variable.
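
A sketch of both plot types using Matplotlib alone, on synthetic data generated for illustration (Seaborn's `histplot` and `boxplot` offer similar plots). The non-interactive "Agg" backend is an assumption so the script runs without a display; in a notebook you would typically omit that line and call `plt.show()`.

```python
import matplotlib

matplotlib.use("Agg")  # non-interactive backend; omit in a notebook
import matplotlib.pyplot as plt
import numpy as np

# Synthetic data: 200 values drawn from a normal distribution.
rng = np.random.default_rng(0)
values = rng.normal(loc=50, scale=10, size=200)

fig, (ax_hist, ax_box) = plt.subplots(1, 2, figsize=(8, 3))

# Histogram: how often values fall into each of 20 bins.
counts, bin_edges, _ = ax_hist.hist(values, bins=20)
ax_hist.set_title("Histogram")

# Box plot: median, quartiles, and potential outliers.
ax_box.boxplot(values)
ax_box.set_title("Box plot")

fig.tight_layout()
```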

Scatter and Line Plots (Operation 2) - For exploring the relationship between two variables.
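
A sketch of both plot types with Matplotlib, using a made-up linear relationship between two variables (Seaborn's `scatterplot` and `lineplot` are alternatives). As an assumption for running without a display, the "Agg" backend is selected; omit that line in a notebook.

```python
import matplotlib

matplotlib.use("Agg")  # non-interactive backend; omit in a notebook
import matplotlib.pyplot as plt
import numpy as np

# Hypothetical data: y depends linearly on x.
x = np.arange(10)
y = 2 * x + 1

fig, (ax_scatter, ax_line) = plt.subplots(1, 2, figsize=(8, 3))

# Scatter plot: one marker per (x, y) observation.
points = ax_scatter.scatter(x, y)
ax_scatter.set_xlabel("x")
ax_scatter.set_ylabel("y")

# Line plot: observations connected in order, useful for trends.
(line,) = ax_line.plot(x, y)
ax_line.set_xlabel("x")
ax_line.set_ylabel("y")

fig.tight_layout()
```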