python_data_science_guide_20250903054121.pptx

Uploaded by nagasaipavanjandrama, 12 slides, Sep 16, 2025

About This Presentation

A guide to Python libraries for data science.


Slide Content

Python Library Guide: Python Data Science Complete Guide
A comprehensive presentation covering Python modules, NumPy arrays, SciPy operations, Pandas DataFrames, and Seaborn visualizations with practical code examples.
Presented by [Your Name]
Date: September 3, 2025

Built-in Modules in Python
Explore core built-in modules: os, sys, math, datetime, random.

Key Points:
- The os module provides functions for interacting with the operating system
- The sys module offers access to Python interpreter variables
- The math module includes mathematical functions and constants
- The datetime module handles date and time operations
- The random module generates pseudo-random numbers

Built-in Module Examples:

import os
print('Current directory:', os.getcwd())

import sys
print('Python version:', sys.version)

import math
print('Square root of 16:', math.sqrt(16))

import datetime
print('Today:', datetime.date.today())

import random
print('Random number:', random.randint(1, 10))
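The slide shows only datetime.date.today(); a minimal sketch of date arithmetic and formatting with the standard library's timedelta and strftime (variable names are illustrative):

import datetime

today = datetime.date.today()
next_week = today + datetime.timedelta(days=7)          # date arithmetic with timedelta
print('One week from today:', next_week)
print('ISO-style format:', today.strftime('%Y-%m-%d'))  # custom string formatting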

User Defined Modules
Creating and using your own Python modules enhances code organization and reusability.

Benefits of User Defined Modules:
- Reusability: write code once, use it many times in different programs
- Organization: group related functions and variables together
- Namespace Management: avoid naming conflicts in larger projects
- Maintainability: easier to debug and update modularized code
- Distribution: share your modules with other developers

Creating a Simple Module (mymodule.py):

# File: mymodule.py
def add(a, b):
    return a + b

def subtract(a, b):
    return a - b

# Variables can be part of modules too
PI = 3.14159

Importing and Using the Module:

# Import the entire module
import mymodule
print("5 + 3 =", mymodule.add(5, 3))
print("5 - 3 =", mymodule.subtract(5, 3))
print("PI value:", mymodule.PI)

# Or import specific functions
from mymodule import add, PI
print("7 + 2 =", add(7, 2))
print("PI directly:", PI)
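Not shown on the slide, but commonly paired with user-defined modules: the __name__ guard lets the same file be run as a script or imported without side effects. A minimal sketch, with a hypothetical _self_test helper:

# File: mymodule.py (extended)
def add(a, b):
    return a + b

def _self_test():
    # Runs only when the file is executed directly, e.g. `python mymodule.py`
    assert add(2, 3) == 5
    print('mymodule self-test passed')

if __name__ == '__main__':
    # Skipped when the module is imported from another program
    _self_test()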

NumPy Arrays Overview
NumPy is fundamental for numerical computing in Python. Arrays are the core data structure.

Array Properties:
- Shape: dimensions of the array (array.shape)
- Size: total number of elements (array.size)
- Ndim: number of dimensions (array.ndim)
- Dtype: data type of elements (array.dtype)

Special Arrays:
- Identity matrix: np.eye(3)
- Constants: np.full((2, 2), 7)
- Random values: np.random.rand(2, 3)
- Random integers: np.random.randint(10, size=5)

(Array properties and special arrays are sketched after the example below.)

NumPy Array Creation Examples:

import numpy as np

# Creating arrays from lists
array_1d = np.array([1, 2, 3, 4, 5])
array_2d = np.array([[1, 2, 3], [4, 5, 6]])
print(array_1d)    # [1 2 3 4 5]
print(array_2d)    # [[1 2 3] [4 5 6]]

# Creating arrays with specific values
zeros = np.zeros((3, 4))    # 3x4 array of zeros
ones = np.ones(5)           # Array of 5 ones
empty = np.empty((2, 3))    # Uninitialized array

# Ranges and sequences
range_arr = np.arange(0, 10, 2)    # [0 2 4 6 8]
linspace = np.linspace(0, 1, 5)    # 5 evenly spaced values from 0 to 1
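The property and special-array bullets above have no runnable counterpart on the slide; a minimal sketch (output comments assume a 64-bit platform where the default integer dtype is int64):

import numpy as np

a = np.random.randint(10, size=(2, 3))   # 2x3 array of random integers in [0, 10)
print('shape:', a.shape)    # (2, 3)
print('size:', a.size)      # 6
print('ndim:', a.ndim)      # 2
print('dtype:', a.dtype)    # int64 on most 64-bit platforms

print(np.eye(3))            # 3x3 identity matrix
print(np.full((2, 2), 7))   # 2x2 array filled with the constant 7
print(np.random.rand(2, 3)) # 2x3 array of uniform random floats in [0, 1)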

NumPy Array Indexing
Access and manipulate array elements using indexing and slicing techniques.

Key Indexing Techniques:
- Basic indexing: access specific elements with [row, col]
- Slicing: extract ranges with start:stop syntax
- Boolean indexing: filter arrays using conditions
- Fancy indexing: select elements with integer arrays (see the sketch below)

Important Notes:
- Indexing starts at 0 (zero-based indexing)
- Slices include the start index but exclude the stop index
- Array views share memory with the original array; use arr.copy() to create independent copies

Array Indexing Examples:

import numpy as np

# Create a sample array
arr = np.array([[1, 2, 3], [4, 5, 6], [7, 8, 9]])

# Basic indexing
element = arr[0, 1]          # Element at row 0, column 1 (value: 2)

# Slicing arrays
row = arr[1, :]              # Entire row 1 ([4, 5, 6])
column = arr[:, 2]           # Entire column 2 ([3, 6, 9])
submatrix = arr[0:2, 1:3]    # 2x2 submatrix ([[2, 3], [5, 6]])

# Boolean indexing
mask = arr > 5               # Create boolean mask
filtered = arr[mask]         # Elements > 5 ([6, 7, 8, 9])
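Fancy indexing and the view-versus-copy caveat are mentioned above without code; a minimal sketch:

import numpy as np

arr = np.array([[1, 2, 3], [4, 5, 6], [7, 8, 9]])

# Fancy indexing: pick arbitrary elements with integer index arrays
rows = np.array([0, 2])
cols = np.array([1, 2])
print(arr[rows, cols])           # [2 9] -> elements (0, 1) and (2, 2)

# Views share memory with the original array
view = arr[0, :]
view[0] = 99
print(arr[0, 0])                 # 99 -> the original changed too

# copy() creates an independent array
independent = arr[1, :].copy()
independent[0] = -1
print(arr[1, 0])                 # still 4 -> original unaffected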

NumPy Datatypes
NumPy supports various data types for optimization and memory efficiency.

Common NumPy Datatypes:
- np.int8, np.int16, np.int32, np.int64: integers of different sizes
- np.uint8, np.uint16, np.uint32, np.uint64: unsigned integers
- np.float16, np.float32, np.float64: floating point numbers
- np.complex64, np.complex128: complex numbers
- np.bool_: Boolean (True/False) values

Benefits of NumPy Datatypes:
- Memory efficiency: choose the optimal size for your data (see the sketch below)
- Performance: operations on appropriate types are faster
- Compatibility: interface with low-level libraries
- Precision control: specify exact numerical precision
- Type safety: prevent type-related bugs

NumPy Datatype Examples:

import numpy as np

# Basic datatypes
int_arr = np.array([1, 2, 3], dtype=np.int32)
float_arr = np.array([1.0, 2.0, 3.0], dtype=np.float64)
bool_arr = np.array([True, False, True], dtype=np.bool_)

# Check datatypes
print('Integer array dtype:', int_arr.dtype)
print('Float array dtype:', float_arr.dtype)
print('Boolean array dtype:', bool_arr.dtype)

# Convert datatypes
converted = int_arr.astype(np.float32)
print('Converted dtype:', converted.dtype)

# Memory usage comparison
print('int32 size:', int_arr.itemsize, 'bytes')
print('float64 size:', float_arr.itemsize, 'bytes')
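To make the memory-efficiency and precision points concrete, a small sketch comparing total memory with nbytes and showing how small integer types wrap on overflow; the default integer dtype is platform dependent:

import numpy as np

million = np.arange(1_000_000)
print(million.dtype, million.nbytes, 'bytes')   # typically int64 -> 8000000 bytes
small = million.astype(np.int32)
print(small.dtype, small.nbytes, 'bytes')       # int32 -> 4000000 bytes

# Caution: small integer types wrap around on overflow instead of raising an error
tiny = np.array([127], dtype=np.int8)
print(tiny + 1)                                 # [-128]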

NumPy Array Math
Perform arithmetic and mathematical operations efficiently on NumPy arrays.

Element-wise Operations:
- NumPy operations are vectorized: they operate on all array elements at once
- Much faster than Python loops for large datasets
- Supports all standard arithmetic operators: +, -, *, /, ** (power), // (floor division)
- Also includes % (modulo) and bit-wise operations

Mathematical Functions:
- Universal functions (ufuncs) for element-wise operations
- Trigonometric: sin, cos, tan
- Exponential: exp, log, log10
- Statistical: sum, mean, std, min, max
- Linear algebra functions through the np.linalg module (see the sketch below)

Array Math Operations:

import numpy as np

# Basic arithmetic operations
a = np.array([1, 2, 3])
b = np.array([4, 5, 6])
print('Addition:', a + b)         # [5 7 9]
print('Subtraction:', b - a)      # [3 3 3]
print('Multiplication:', a * b)   # [4 10 18]
print('Division:', b / a)         # [4.  2.5 2. ]

# Advanced mathematical functions
print('Square root:', np.sqrt(a))
print('Exponential:', np.exp(a))
print('Sine:', np.sin(a))

# Aggregation operations
print('Sum:', np.sum(b))          # 15
print('Mean:', np.mean(b))        # 5.0
print('Standard deviation:', np.std(b))
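The np.linalg bullet has no example on the slide; a minimal sketch with a 2x2 system:

import numpy as np

m = np.array([[1.0, 2.0], [3.0, 4.0]])
v = np.array([1.0, 1.0])

print('Determinant:', np.linalg.det(m))          # -2.0 (up to floating-point rounding)
print('Inverse:\n', np.linalg.inv(m))
print('Matrix-vector product:', m @ v)           # [3. 7.]
print('Solve m @ x = v:', np.linalg.solve(m, v)) # [-1.  1.]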

NumPy Broadcasting
Broadcasting allows arithmetic operations between arrays of different shapes without unnecessary copying.

Broadcasting Rules:
- Arrays are compared starting from the trailing dimensions
- Dimensions must be equal, or one of them must be 1 (incompatible shapes raise an error; see the sketch below)
- Missing dimensions are treated as having size 1
- Smaller arrays are "stretched" to match larger ones

Benefits:
- Eliminates unnecessary memory allocation
- Improves computational efficiency
- Makes code more concise and readable
- Enables vectorized operations on arrays of different shapes

Broadcasting Examples:

import numpy as np

# Example 1: Scalar + Array
a = np.array([1, 2, 3])
b = 5
print('a + b:', a + b)    # [6 7 8]

# Example 2: Arrays with compatible dimensions
x = np.array([[1, 2, 3], [4, 5, 6]])   # Shape: (2, 3)
y = np.array([[10], [20]])             # Shape: (2, 1)
print('x + y:')
print(x + y)                           # Broadcasting y to shape (2, 3)

# Example 3: Arrays with different numbers of dimensions
p = np.zeros((3, 4, 5))
q = np.zeros((4, 5))
r = p + q                              # q is broadcast to shape (3, 4, 5)
print('r.shape:', r.shape)
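A short sketch of a successful broadcast between a row and a column vector, plus the error raised when trailing dimensions are incompatible:

import numpy as np

# Row vector (shape (3,)) + column vector (shape (4, 1)) broadcasts to shape (4, 3)
row = np.array([1, 2, 3])
col = np.array([[10], [20], [30], [40]])
print((row + col).shape)          # (4, 3)

# Incompatible trailing dimensions are rejected rather than silently stretched
try:
    np.ones((2, 3)) + np.ones((2, 4))
except ValueError as err:
    print('Broadcast failed:', err)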

SciPy Image Operations
SciPy extends NumPy with advanced functions, including powerful image processing capabilities.

Key SciPy Image Processing Features:
- Filtering: Gaussian, median, and Sobel filters for denoising and edge detection
- Morphology: binary operations such as erosion, dilation, opening, and closing (see the sketch below)
- Transformations: rotate, zoom, shift, and affine transformations
- Measurements: extract features from labeled images
- Interpolation: various interpolation methods for image resizing

Image Processing Examples:

# Basic image processing
import numpy as np
import matplotlib.pyplot as plt
from scipy import ndimage
from scipy import datasets   # scipy.misc in SciPy versions before 1.10

# Loading a sample image (downloaded on first use; requires the pooch package)
face = datasets.face(gray=True)

# Applying a Gaussian filter (blurring)
blurred = ndimage.gaussian_filter(face, sigma=5)

# Edge detection using Sobel filters
sobel_x = ndimage.sobel(face, axis=0)
sobel_y = ndimage.sobel(face, axis=1)
edge_detected = np.hypot(sobel_x, sobel_y)

# Rotating an image
rotated = ndimage.rotate(face, 45)

# Display the results
plt.figure(figsize=(12, 8))
plt.subplot(221), plt.imshow(face, cmap='gray'), plt.title('Original')
plt.subplot(222), plt.imshow(blurred, cmap='gray'), plt.title('Blurred')
plt.subplot(223), plt.imshow(edge_detected, cmap='gray'), plt.title('Edge Detection')
plt.subplot(224), plt.imshow(rotated, cmap='gray'), plt.title('Rotated 45°')
plt.tight_layout()
plt.show()
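The morphology bullet has no example on the slide; a minimal sketch on a tiny synthetic binary image (the pixel counts in the final comment assume ndimage's default cross-shaped structuring element):

import numpy as np
from scipy import ndimage

# A 3x3 square of True pixels inside a 7x7 grid
img = np.zeros((7, 7), dtype=bool)
img[2:5, 2:5] = True

eroded = ndimage.binary_erosion(img)     # shrinks the square to its centre pixel
dilated = ndimage.binary_dilation(img)   # grows the square outward by one pixel (4-connected)
opened = ndimage.binary_opening(img)     # erosion then dilation: removes features smaller than the structuring element
closed = ndimage.binary_closing(img)     # dilation then erosion: fills small holes

print(img.sum(), eroded.sum(), dilated.sum())   # 9 1 21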

Pandas DataFrame Operations
Pandas enables convenient data manipulation with DataFrames, the main data structure for tabular data analysis.

Common DataFrame Operations:
- Creation: pd.DataFrame(data)
- Reading data: pd.read_csv('file.csv')
- Column selection: df['column'] or df.column
- Row selection: df.loc[] and df.iloc[]
- Filtering: df[df.column > value]

Advanced Operations (sketched below):
- Merging: pd.merge(df1, df2)
- Grouping: df.groupby('column')
- Pivoting: df.pivot_table()
- Applying functions: df.apply(func)
- Handling missing data: df.fillna(), df.dropna()

Pandas DataFrame Examples:

import pandas as pd
import numpy as np

# Creating a DataFrame
data = {'Name': ['Alice', 'Bob', 'Charlie'],
        'Age': [25, 30, 35],
        'City': ['New York', 'Boston', 'Chicago']}
df = pd.DataFrame(data)
print(df.head())

# Selecting columns
print(df['Name'])

# Adding a new column
df['Salary'] = [60000, 70000, 80000]

# Filtering data
high_salary = df[df['Salary'] > 65000]
print(high_salary)

# Basic statistics
print(df.describe())
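The advanced operations listed above have no example on the slide; a minimal sketch of grouping, merging, and missing-data handling, using the same illustrative columns:

import pandas as pd
import numpy as np

df = pd.DataFrame({'Name': ['Alice', 'Bob', 'Charlie', 'Dana'],
                   'City': ['New York', 'Boston', 'New York', 'Boston'],
                   'Salary': [60000, 70000, 80000, np.nan]})

# Grouping: average salary per city (NaN values are skipped)
print(df.groupby('City')['Salary'].mean())

# Merging: attach a per-city lookup table
costs = pd.DataFrame({'City': ['New York', 'Boston'], 'CostIndex': [100, 90]})
print(pd.merge(df, costs, on='City'))

# Missing data: fill the missing salary with the column mean, or drop the row
print(df.fillna({'Salary': df['Salary'].mean()}))
print(df.dropna())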

Seaborn Visualization
Seaborn simplifies statistical data visualization with beautiful defaults and seamless integration with Pandas.

Multiple Seaborn Plot Examples:
- Bar Plot: statistical visualization with error bars
- Regression Plot: scatter with a fitted regression line
- Heatmap: color-encoded matrix visualization

Key Seaborn Features:
- Built on Matplotlib with tight Pandas integration
- Beautiful default aesthetics for statistical visualizations
- High-level functions for common visualization patterns (catplot, relplot)
- Support for categorical data visualization (boxplot, violinplot, stripplot); see the sketch below
- Automatic color palette management with customization options

Seaborn Examples:

import seaborn as sns
import matplotlib.pyplot as plt
import numpy as np
import pandas as pd

# Load example datasets
tips = sns.load_dataset('tips')
flights = sns.load_dataset('flights')

# Example 1: Bar plot with error bars
plt.figure(figsize=(8, 5))
sns.barplot(x='day', y='total_bill', data=tips)
plt.title('Average Bill by Day')

# Example 2: Scatter plot with regression line
plt.figure(figsize=(8, 5))
sns.regplot(x='total_bill', y='tip', data=tips)
plt.title('Tips vs Total Bill with Regression')

# Example 3: Heatmap for correlation matrix
plt.figure(figsize=(8, 6))
corr = tips.corr(numeric_only=True)   # restrict to numeric columns
sns.heatmap(corr, annot=True, cmap='coolwarm')
plt.title('Correlation Heatmap')

plt.tight_layout()
plt.show()
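The categorical plots and the high-level catplot interface are listed above without code; a minimal sketch on the same tips dataset:

import seaborn as sns
import matplotlib.pyplot as plt

tips = sns.load_dataset('tips')

# Categorical distributions: box and violin plots of total_bill per day
fig, axes = plt.subplots(1, 2, figsize=(10, 4))
sns.boxplot(x='day', y='total_bill', data=tips, ax=axes[0])
sns.violinplot(x='day', y='total_bill', data=tips, ax=axes[1])

# High-level figure interface: a faceted categorical plot in one call
sns.catplot(x='day', y='total_bill', hue='sex', kind='box', data=tips)
plt.show()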

Summary & Key Takeaways

Foundation Libraries:
- Python's data science ecosystem is robust and versatile
- NumPy and Pandas are foundational for numerical and tabular data

Advanced Analytics:
- SciPy and Seaborn add advanced analytics and visualization
- Built-in and user-defined modules provide flexibility and scalability

Best Practices:
- Practice and explore these core tools for effective data science in Python!

(Closing diagram: Python, Data Processing, Visualization)