Moving from Data Scientist To Data Analyst (1).pptx
hrvaluebound
0 views
10 slides
Sep 28, 2025
Slide 1 of 10
1
2
3
4
5
6
7
8
9
10
About This Presentation
Moving from Data Scientist To Data Analyst (1).pptx
Size: 393.02 KB
Language: en
Added: Sep 28, 2025
Slides: 10 pages
Slide Content
Moving from Data Scientist To Data Analyst Presented By :- Anchal Singh Roll number:- BCA2302419
SQL Introduction SQL (Structured Query Language) is a standard language used for accessing and manipulating data in relational databases, where information is stored in tables consisting of rows and columns.
RELATIONAL DATABASE A relational database is a type of database that organizes data into tables consisting of rows and columns, where each table stores data about a specific entity, and relationships are established between these tables using keys.
Common SQL Structures Databases : The highest-level structure, containing one or more tables and related objects. Tables : Fundamental building blocks where data is stored in rows (records) and columns (attributes) Columns (Attributes) : Define the type of data that each field in a table can hold Rows (Records) : Individual data entries within a table. Primary Keys : Uniquely identify each row in a table to ensure data integrity. Foreign Keys : Establish relationships between tables by referencing primary keys in other tables. Indexes : Structures that speed up data retrieval by enabling quick searches on specific columns. Queries : SQL statements used to retrieve, insert, update, or delete data in the database
Using SQL Data Types Numeric: INT, FLOAT String/Text: CHAR, VARCHAR Date/Time: DATE, DATETIME,TIME, YEAR Boolean: BOOL Identifying The Wrong Data Types Storing numeric data (like age or salary) in VARCHAR or CHAR columns, which prevents mathematical operations. Using a DATE type for time-only data, or vice versa. Assigning a TEXT/BLOB type to data that should fit in VARCHAR, causing unnecessary storage use
FINDING MISSING VALUES To find missing (NULL) values in a column: SELECT * FROM table_name WHERE column_name IS NULL; To find rows where the value is present (not missing): SELECT * FROM table_name WHERE column_name IS NOT NULL; Data Type Checks: Using functions like ISDATE() to validate if values match expected formats, e.g., dates. Range Checks: Using conditions in SQL queries to ensure values fall within valid ranges. For example, filtering scores to check if they are between 0 and 100 Pattern Matching: Using LIKE or regular expressions to detect values that violate expected formats or patterns, such as alphanumeric codes of a certain structure FINDING POSSIBLE INACCURATE DATA VALUE
WORKING WITH DATES Dates are stored in columns using date/time data types like DATE, DATETIME, TIMESTAMP, TIME, depending on the database. Exact date format depends on the DBMS but is generally YYYY-MM-DD for dates, YYYY-MM-DD HH:MM:SS for timestamps. Exact date match: SELECT * FROM Orders WHERE OrderDate = '2025-09-13’; Date range filtering using BETWEEN: SELECT * FROM Orders WHERE OrderDate BETWEEN '2025-09-01' AND '2025-09-13’; Filtering dates before or after a given date: SELECT * FROM Orders WHERE OrderDate > '2025-09-01'; SELECT * FROM Orders WHERE OrderDate < '2025-09-13'; Common Concepts for SQL Dates FILTERING DATA BY DATES
Common SQL string functions Function Description CONCAT() Concatenates two or more strings CHAR_LENGTH() / LENGTH() Returns length of a string in characters or bytes UPPER() Converts string to uppercase LOWER() Converts string to lowercase Function Description COUNT() Counts rows or non-null values SUM() Calculates the total sum of a numeric column AVG() Calculates the average of a numeric column MIN() Finds the minimum value in a column MAX() Finds the maximum value in a column Common SQL Aggregate Functions
INTRODUCTION TO DATA MANIPULATION Data manipulation in SQL refers to the process of querying, adding, modifying, and deleting data stored within a database. It is performed using commands categorized under Data Manipulation Language (DML), which is a subset of SQL. Key DML Operations: SELECT: Retrieves data from a database table or multiple tables using queries. INSERT: Adds new rows (records) into a database table. UPDATE: Modifies existing data in one or more columns of a table. DELETE: Removes rows from a table based on specified conditions.
GETTING STARTED WITH DATA VISUALIZATION Introduction Data visualization is the graphical representation of information and data using visual elements like charts, graphs, maps, and dashboards. It helps make complex data easier to understand and interpret by revealing patterns, trends, and relationships that might be hard to detect in raw data Steps for Creating Data Visualization with SQL Formulate Your Question or Goal Write SQL Queries to Extract Data Choose a Visualization Tool Create Visuals Within the Tool Interpret and Share