Introduction to Pandas: loading, modeling, analyzing, manipulating, and preparing data

ssuser36fa07 10 views 54 slides Sep 18, 2024

About This Presentation

Pandas is a very popular open-source library among Python developers, especially in Data Science and Machine Learning, because it offers powerful and flexible data structures that make data manipulation and processing easier.


Slide Content

Introduction to pandas Dr. Noman Islam

Introduction: pandas contains data structures and data manipulation tools designed to make data cleaning and analysis fast and easy in Python. Pandas is designed for working with tabular or heterogeneous data. NumPy, by contrast, is best suited for working with homogeneous numerical array data.
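To make the contrast concrete, here is a minimal sketch. The column names and values are made up for illustration and do not come from the slides. A NumPy array holds a single dtype, while a DataFrame can hold a different dtype in each column.

```python
import numpy as np
import pandas as pd

# Homogeneous numerical data: every element shares one dtype.
arr = np.array([1.5, 2.0, 3.5])

# Tabular, heterogeneous data: each column has its own dtype.
# (Hypothetical columns, chosen only for illustration.)
df = pd.DataFrame(
    {
        "name": ["a", "b", "c"],          # text
        "score": [1.5, 2.0, 3.5],         # floating point
        "passed": [False, True, True],    # boolean
    }
)

print(arr.dtype)   # one dtype for the whole array
print(df.dtypes)   # one dtype per column
```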

Series

Using indexes to access data

Filtering

Creating series from dictionary

Checking for null values

Joining two series

Dataframe

Modifying column values

Adding / deleting a column

Transpose a dataframe

Slicing in dataframe

Values attribute

Indexing, selection and filtering

Dropping entries from axis

Axis parameter

Indexing and slicing

Selection with loc and iloc

Apply function

Function application and mapping

Descriptive statistics

Reading and writing data

Filtering out missing values

Filling missing data

Reading data in pandas

Optional arguments
Indexing: Can treat one or more columns as the index of the returned DataFrame, and controls whether column names come from the file, the user, or not at all.
Type inference and data conversion: Includes user-defined value conversions and custom lists of missing-value markers.
Datetime parsing: Includes combining date and time information spread over multiple columns into a single column in the result.
Iterating: Support for iterating over chunks of very large files.
Unclean data issues: Skipping rows or a footer, comments, or other minor things like numeric data with thousands separated by commas.
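The categories above map onto keyword arguments of pd.read_csv. The following is a rough sketch rather than an excerpt from the slides: the CSV text, its column names, and the "MISSING" marker are hypothetical, and the data is read from an in-memory string so the example runs without any files.

```python
import io
import pandas as pd

# Hypothetical messy export: a comment line, a custom missing-value
# marker, and numbers written with thousands separators.
raw = """# exported by a hypothetical tool
date,item,amount
2024-01-05,widget,"1,200"
2024-01-06,gadget,MISSING
2024-01-07,widget,"3,450"
"""

df = pd.read_csv(
    io.StringIO(raw),
    comment="#",            # unclean data: ignore comment lines
    index_col="date",       # indexing: use a column as the index
    parse_dates=True,       # datetime parsing: parse the index as dates
    na_values=["MISSING"],  # conversion: extra missing-value marker
    thousands=",",          # unclean data: "1,200" is read as 1200
)
print(df)

# Iterating: read the same data in chunks of two rows at a time.
total = 0.0
reader = pd.read_csv(
    io.StringIO(raw),
    comment="#",
    na_values=["MISSING"],
    thousands=",",
    chunksize=2,
)
for chunk in reader:
    total += chunk["amount"].sum()  # sum() skips missing values
print(total)
```

Chunked reading keeps only one piece in memory at a time, which is the point of the "Iterating" option for very large files.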

Examples
df = pd.read_csv('examples/ex1.csv')
pd.read_table('examples/ex1.csv', sep=',')
pd.read_csv('examples/ex2.csv', header=None)
pd.read_csv('examples/ex2.csv', names=['a', 'b', 'c', 'd', 'message'])
names = ['a', 'b', 'c', 'd', 'message']
pd.read_csv('examples/ex2.csv', names=names, index_col='message')
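The example files are not included with the slides, so the sketch below reproduces the header-related calls against an in-memory string. The three-row content is an assumption standing in for a headerless file such as examples/ex2.csv.

```python
import io
import pandas as pd

# Assumed stand-in for a CSV file that has no header row.
headerless = "1,2,3,4,hello\n5,6,7,8,world\n9,10,11,12,foo\n"

# header=None: pandas assigns integer column labels 0..4.
auto = pd.read_csv(io.StringIO(headerless), header=None)

# names=...: supply the column labels yourself.
names = ["a", "b", "c", "d", "message"]
named = pd.read_csv(io.StringIO(headerless), names=names)

# index_col=...: promote one of the named columns to the row index.
indexed = pd.read_csv(io.StringIO(headerless), names=names, index_col="message")

print(auto)
print(named)
print(indexed)
```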

Passing a regular expression as separator
result = pd.read_table('examples/ex3.txt', sep=r'\s+')
Skipping rows
pd.read_csv('examples/ex4.csv', skiprows=[0, 2, 3])
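A self-contained sketch of both options follows. The two text snippets are hypothetical stand-ins for files like examples/ex3.txt and examples/ex4.csv, whose actual contents are not shown in the slides.

```python
import io
import pandas as pd

# Fields separated by a variable amount of whitespace (assumed content).
ragged = "x   y    z\n1  2   3\n4     5 6\n"
result = pd.read_table(io.StringIO(ragged), sep=r"\s+")
print(result)

# Junk on lines 0, 2 and 3; the real header is on line 1 (assumed content).
noisy = "# junk\na,b,c\n# more junk\n# still junk\n1,2,3\n4,5,6\n"
cleaned = pd.read_csv(io.StringIO(noisy), skiprows=[0, 2, 3])
print(cleaned)
```

The row numbers in skiprows are zero-based positions in the file, counted before the header is identified.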

Handling null values

Reading text file in pieces

Iterating over chunk

Writing file

Working with delimited format

Writing csv data

JSON format

pandas.read_json can automatically convert JSON datasets in specific arrangements into a Series or DataFrame. The default options for pandas.read_json assume that each object in the JSON array is a row in the table.
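As a small illustration of that default, the sketch below parses a JSON array of objects into a DataFrame. The payload is made up, and it is wrapped in io.StringIO because recent pandas versions expect a path or file-like object rather than a literal JSON string.

```python
import io
import json
import pandas as pd

# Hypothetical payload: a JSON array in which each object is one row.
payload = '[{"a": 1, "b": 2, "c": 3}, {"a": 4, "b": 5, "c": 6}, {"a": 7, "b": 8, "c": 9}]'

df = pd.read_json(io.StringIO(payload))
print(df)

# Writing back with orient="records" produces the same row-per-object layout.
round_trip = json.loads(df.to_json(orient="records"))
print(round_trip)
```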

Web scraping: The pandas.read_html function has a number of options, but by default it searches for and attempts to parse all tabular data contained within <table> tags.
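A minimal sketch of that default behavior, using a made-up HTML snippet instead of a live web page. pandas.read_html depends on an optional parser (lxml, or BeautifulSoup with html5lib), so the call is guarded here in case none is installed.

```python
import io
import pandas as pd

# Hypothetical page: one paragraph and one table.
html = """
<html><body>
<p>Not a table, so it is ignored.</p>
<table>
  <tr><th>name</th><th>value</th></tr>
  <tr><td>alpha</td><td>1</td></tr>
  <tr><td>beta</td><td>2</td></tr>
</table>
</body></html>
"""

try:
    # Returns a list with one DataFrame per <table> found.
    tables = pd.read_html(io.StringIO(html))
except ImportError:
    # No optional HTML parser is installed in this environment.
    tables = None

if tables is not None:
    print(len(tables))
    print(tables[0])
```

Because the result is always a list, a page with a single table still needs tables[0] to reach the DataFrame.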

Saving in binary format

HDF5 format

Excel format

Interacting with Web API

Interacting with database