yerrasaniayyapparedd
52 views
73 slides
Aug 02, 2024
About This Presentation
routing and switching room at Cisco, a dream was born...
Today, Cisco certifications are the gold standard in IT training. We’ve issued more than 4 million certifications so far. In the next 30 years, we aim to train over 10 million more people in our pledge to close the IT skills gap and reshape diversity in the tech industry.
99%
of organizations surveyed use technical certifications to make hiring decisions.
91%
of employers believe IT certifications are a reliable predictor of a successful employee.
Size: 1.04 MB
Language: en
Slide Content
Unit 3
Customizing Plots: Introduction to Matplotlib, plots, making subplots, controlling axes, ticks, labels and legends, annotations and drawing on subplots, saving plots to files, matplotlib configuration using different plot styles, Seaborn library.
Making sense of data through advanced visualization: Controlling line properties of a chart, creating multiple plots, scatter plot, line plot, bar plot, histogram, box plot, pair plot, playing with text, styling your plot, 3D plot of a surface.
Introduction to Matplotlib: Plotting and Visualization
Making informative visualizations (sometimes called plots) is one of the most important tasks in data analysis. It may be part of the exploratory process, for example to help identify outliers or needed data transformations, or a way of generating ideas for models.
There are two primary uses for data visualization:
- To explore data
- To communicate data
matplotlib API primer: Matplotlib is a low-level graph plotting library in Python that serves as a visualization utility.
Installation of Matplotlib
If you have Python and pip already installed on a system, installing Matplotlib is easy. Install it using this command:
    pip install matplotlib
Most of the Matplotlib utilities lie under the pyplot submodule and are usually imported under the plt alias. With matplotlib, we use the following import convention:
    import matplotlib.pyplot as plt
Plotting x and y points
The plot() function is used to draw points (markers) in a diagram. By default, plot() draws a line from point to point. The function takes parameters for specifying points in the diagram:
- Parameter 1 is an array containing the points on the x-axis.
- Parameter 2 is an array containing the points on the y-axis.
If we need to plot a line from (1, 3) to (8, 10), we have to pass two arrays, [1, 8] and [3, 10], to the plot function.
The x-axis is the horizontal axis. The y-axis is the vertical axis.
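The line from (1, 3) to (8, 10) described above can be sketched as follows. The variable names and the non-interactive "Agg" backend are assumptions made so the snippet runs without a display; in an interactive session, drop the backend line and call plt.show().

```python
import matplotlib
matplotlib.use("Agg")  # assumption: non-interactive backend so no window is needed
import matplotlib.pyplot as plt
import numpy as np

xpoints = np.array([1, 8])   # x-coordinates of the two points
ypoints = np.array([3, 10])  # y-coordinates of the two points

# plot() joins the points with a line by default and returns a list of Line2D objects
lines = plt.plot(xpoints, ypoints)
```

Passing a format string such as 'o' as a third argument would draw only the markers instead of the connecting line.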
Figures and Subplots
Plots in matplotlib reside within a Figure object. You can create a new figure with plt.figure:
    fig = plt.figure()
In IPython, an empty plot window will appear, but in Jupyter nothing will be shown until we use a few more commands. plt.figure has a number of options; notably, figsize will guarantee the figure has a certain size and aspect ratio if saved to disk.
You can't make a plot with a blank figure. You have to create one or more subplots using add_subplot:
    ax1 = fig.add_subplot(2, 2, 1)
This means that the figure should be 2 × 2 (so up to four plots in total), and we're selecting the first of four subplots (numbered from 1).
If you create the next two subplots, you'll end up with a visualization that looks like Figure 9-2:
    ax2 = fig.add_subplot(2, 2, 2)
    ax3 = fig.add_subplot(2, 2, 3)
If we add the following command, you'll get something like Figure 9-3 (plt.plot draws on the most recently used figure and subplot):
    plt.plot(np.random.randn(50).cumsum(), 'k--')
The 'k--' is a style option instructing matplotlib to plot a black dashed line.
Adjusting the spacing around subplots
By default, matplotlib leaves a certain amount of padding around the outside of the subplots and spacing between subplots. You can change the spacing using the subplots_adjust method on Figure objects, also available as a top-level function:
    subplots_adjust(left=None, bottom=None, right=None, top=None, wspace=None, hspace=None)
wspace and hspace control the percent of the figure width and figure height, respectively, to use as spacing between subplots:
    plt.subplots_adjust(wspace=0, hspace=0)
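As a minimal sketch of the spacing controls above, the snippet below builds a 2 × 2 grid and shrinks the spacing between subplots to zero. The random data, the seed, and the use of plt.subplots with shared axes are illustrative assumptions, not part of the slide.

```python
import matplotlib
matplotlib.use("Agg")  # assumption: non-interactive backend so no window is needed
import matplotlib.pyplot as plt
import numpy as np

rng = np.random.default_rng(0)  # hypothetical seed, only for repeatability

# 2 x 2 grid of subplots that share their x and y axes
fig, axes = plt.subplots(2, 2, sharex=True, sharey=True)
for i in range(2):
    for j in range(2):
        axes[i, j].hist(rng.standard_normal(500), bins=50, color="k", alpha=0.5)

# Remove the horizontal and vertical gaps between the subplots
fig.subplots_adjust(wspace=0, hspace=0)
```

With the gaps removed, tick labels of neighbouring subplots can overlap; sharing the axes, as here, keeps labels only on the outer edges.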
Colors, Markers, and Line Styles
Matplotlib's main plot function accepts arrays of x and y coordinates and optionally a string abbreviation indicating color and line style. For example, to plot x versus y with green dashes, you would execute either of these equivalent calls:
    ax.plot(x, y, 'g--')
    ax.plot(x, y, linestyle='--', color='g')
Line plots can additionally have markers to highlight the actual data points. The marker can be part of the style string, which must have color followed by marker type and line style:
    from numpy.random import randn
    plt.plot(randn(30).cumsum(), 'ko--')
O/p
The marker example above could also have been written more explicitly as:
    plt.plot(randn(30).cumsum(), color='k', linestyle='dashed', marker='o')
Ticks, Labels, and Legends
The pyplot interface, designed for interactive use, consists of functions like xlim and xticks. These control the plot range and the tick locations; tick labels can be passed as the second argument to xticks. They can be used in two ways:
- Called with no arguments, they return the current parameter value (e.g., plt.xlim() returns the current x-axis plotting range).
- Called with parameters, they set the parameter value (e.g., plt.xlim([0, 10]) sets the x-axis range to 0 to 10).
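The getter/setter behaviour of plt.xlim can be checked with a short sketch like this one. The plotted data and the variable names are made up for illustration.

```python
import matplotlib
matplotlib.use("Agg")  # assumption: non-interactive backend so no window is needed
import matplotlib.pyplot as plt
import numpy as np

plt.figure()
plt.plot(np.arange(20))   # illustrative data: 20 points on a straight line

current = plt.xlim()      # no arguments: returns the current (left, right) range
plt.xlim([0, 10])         # with arguments: sets the x-axis range to 0..10
updated = plt.xlim()      # read the range back after setting it
```

The same pattern applies to plt.ylim and plt.xticks; on an axes object the equivalents are split into get_xlim and set_xlim.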
Setting the title, axis labels, ticks, and ticklabels
To illustrate customizing the axes, create a simple figure and plot a random walk (see Figure 9-8):
    import numpy as np
    fig = plt.figure()
    ax = fig.add_subplot(1, 1, 1)
    ax.plot(np.random.randn(1000).cumsum())
To change the x-axis ticks, it's easiest to use set_xticks and set_xticklabels. The former instructs matplotlib where to place the ticks along the data range; by default these locations will also be the labels. But we can set any other values as the labels using set_xticklabels:
    ticks = ax.set_xticks([0, 250, 500, 750, 1000])
    labels = ax.set_xticklabels(['one', 'two', 'three', 'four', 'five'],
                                rotation=30, fontsize='small')
The rotation option sets the x tick labels at a 30-degree rotation. Lastly, set_xlabel gives a name to the x-axis and set_title sets the subplot title (see Figure 9-9 for the resulting figure):
    ax.set_title('My first matplotlib plot')
    ax.set_xlabel('Stages')
Adding legends
The easiest way to add a legend is to pass the label argument when adding each piece of the plot:
    from numpy.random import randn
    fig = plt.figure()
    ax = fig.add_subplot(1, 1, 1)
    ax.plot(randn(1000).cumsum(), 'k', label='one')
    ax.plot(randn(1000).cumsum(), 'k--', label='two')
    ax.plot(randn(1000).cumsum(), 'k.', label='three')
Once you've done this, you can call either ax.legend() or plt.legend() to automatically create a legend. The resulting plot is in Figure 9-10:
    ax.legend(loc='best')
Annotations and Drawing on a Subplot
In addition to the standard plot types, you may wish to draw your own plot annotations, which could consist of text, arrows, or other shapes. You can add annotations and text using the text, arrow, and annotate functions. text draws text at given coordinates (x, y) on the plot with optional custom styling:
    ax.text(x, y, 'Hello world!', family='monospace', fontsize=10)
Annotations can draw both text and arrows arranged appropriately.
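A minimal sketch of text and annotate used together is shown below. The curve, the coordinates, and the label strings are hypothetical choices for illustration only.

```python
import matplotlib
matplotlib.use("Agg")  # assumption: non-interactive backend so no window is needed
import matplotlib.pyplot as plt
import numpy as np

fig, ax = plt.subplots()
x = np.arange(10)
y = x ** 2
ax.plot(x, y)

# Plain text placed at data coordinates (1, 60)
ax.text(1, 60, "Hello world!", family="monospace", fontsize=10)

# Text plus an arrow: the arrow points at xy, the label sits at xytext
ann = ax.annotate("last point", xy=(9, 81), xytext=(5, 70),
                  arrowprops=dict(arrowstyle="->"))
```

annotate takes the point being described (xy) and the label position (xytext) separately, which is what lets it draw the connecting arrow.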
Saving Plots to File
You can save the active figure to a file using plt.savefig; the file type is inferred from the file extension:
    plt.savefig('figpath.svg')
    plt.savefig('figpath.png', dpi=400, bbox_inches='tight')
savefig doesn't have to write to disk; it can also write to any file-like object, such as a BytesIO:
    from io import BytesIO
    buffer = BytesIO()
    plt.savefig(buffer)
    plot_data = buffer.getvalue()
See Table 9-2 for a list of some other options for savefig.
matplotlib Configuration
One way to modify the configuration programmatically from Python is to use the rc method; for example, to set the global default figure size to be 10 × 10, you could enter:
    plt.rc('figure', figsize=(10, 10))
Plotting with pandas and seaborn
In pandas we may have multiple columns of data, along with row and column labels. pandas itself has built-in methods that simplify creating visualizations from DataFrame and Series objects. Another library is seaborn, a statistical graphics library created by Michael Waskom. Seaborn simplifies creating many common visualization types.
Line Plots
Series and DataFrame each have a plot attribute for making some basic plot types. By default, plot() makes line plots (see Figure 9-13):
    import numpy as np
    import pandas as pd
    s = pd.Series(np.random.randn(10).cumsum(), index=np.arange(0, 100, 10))
    s.plot()
DataFrame's plot method plots each of its columns as a different line on the same subplot, creating a legend automatically (see Figure 9-14):
    df = pd.DataFrame(np.random.randn(10, 4).cumsum(0),
                      columns=['A', 'B', 'C', 'D'],
                      index=np.arange(0, 100, 10))
    df.plot()
DataFrame Plot
Bar Plots
plot.bar() and plot.barh() make vertical and horizontal bar plots, respectively (the h stands for horizontal). In this case, the Series or DataFrame index will be used as the x (bar) or y (barh) ticks (see Figure 9-15):
    fig, axes = plt.subplots(2, 1)
    data = pd.Series(np.random.rand(16), index=list('abcdefghijklmnop'))
    data.plot.bar(ax=axes[0], color='k', alpha=0.7)
    data.plot.barh(ax=axes[1], color='k', alpha=0.7)
The options color='k' and alpha=0.7 set the color of the plots to black and use partial transparency on the filling.
Refer: Textbook 286, Python for Data Analysis: Data Wrangling with Pandas, NumPy, and IPython, Wes McKinney
For example, making simple plots (like Figure 3-1) is pretty simple:
    from matplotlib import pyplot as plt
    years = [1950, 1960, 1970, 1980, 1990, 2000, 2010]
    gdp = [300.2, 543.3, 1075.9, 2862.5, 5979.6, 10289.7, 14958.3]
    # create a line chart, years on x-axis, gdp on y-axis
    plt.plot(years, gdp, color='green', marker='o', linestyle='solid')
    # add a title
    plt.title("Nominal GDP")
    # add a label to the y-axis
    plt.ylabel("Billions of $")
    plt.show()
Simple line chart
Bar Charts
A bar chart is a good choice when you want to show how some quantity varies among some discrete set of items. For instance, Figure 3-2 shows how many Academy Awards were won by each of a variety of movies:
    from matplotlib import pyplot as plt
    movies = ["Annie Hall", "Ben-Hur", "Casablanca", "Gandhi", "West Side Story"]
    num_oscars = [5, 11, 3, 8, 10]
    # plot bars at x-coordinates [0, 1, 2, 3, 4] with heights [num_oscars]
    # (current matplotlib centers each bar on its x-coordinate by default)
    plt.bar(range(len(movies)), num_oscars)
    plt.title("My Favorite Movies")      # add a title
    plt.ylabel("# of Academy Awards")    # label the y-axis
    # label x-axis with movie names at bar centers
    plt.xticks(range(len(movies)), movies)
    plt.show()
Bar chart
A bar chart can also be a good choice for plotting histograms of bucketed numeric values, as in Figure 3-3, in order to visually explore how the values are distributed:
    from collections import Counter
    from matplotlib import pyplot as plt
    grades = [83, 95, 91, 87, 70, 0, 85, 82, 100, 67, 73, 77, 0]
    # Bucket grades by decile, but put 100 in with the 90s
    histogram = Counter(min(grade // 10 * 10, 90) for grade in grades)
    plt.bar([x + 5 for x in histogram.keys()],  # Shift bars right by 5
            list(histogram.values()),           # Give each bar its correct height
            10,                                 # Give each bar a width of 10
            edgecolor=(0, 0, 0))                # Black edges for each bar
    plt.axis([-5, 105, 0, 5])                   # x-axis from -5 to 105, y-axis from 0 to 5
    plt.xticks([10 * i for i in range(11)])     # x-axis labels at 0, 10, ..., 100
    plt.xlabel("Decile")
    plt.ylabel("# of Students")
    plt.title("Distribution of Exam 1 Grades")
    plt.show()
Bar chart for histogram
Line Charts
As we saw already, we can make line charts using plt.plot. These are a good choice for showing trends, as illustrated in Figure 3-6 (the bias-variance tradeoff):
    from matplotlib import pyplot as plt
    variance = [1, 2, 4, 8, 16, 32, 64, 128, 256]
    bias_squared = [256, 128, 64, 32, 16, 8, 4, 2, 1]
    total_error = [x + y for x, y in zip(variance, bias_squared)]
    xs = [i for i, _ in enumerate(variance)]
    # We can make multiple calls to plt.plot
    # to show multiple series on the same chart
    plt.plot(xs, variance, 'g-', label='variance')         # green solid line
    plt.plot(xs, bias_squared, 'r-.', label='bias^2')      # red dot-dashed line
    plt.plot(xs, total_error, 'b:', label='total error')   # blue dotted line
    # Because we've assigned labels to each series,
    # we can get a legend for free (loc=9 means "top center")
    plt.legend(loc=9)
    plt.xlabel("model complexity")
    plt.xticks([])
    plt.title("The Bias-Variance Tradeoff")
    plt.show()
Scatterplots
A scatterplot is the right choice for visualizing the relationship between two paired sets of data. For example, Figure 3-7 illustrates the relationship between the number of friends your users have and the number of minutes they spend on the site every day.
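A scatterplot of friends against daily minutes could be sketched as below. The numbers and the single-letter user labels are made-up illustrative values, not data taken from the figure.

```python
import matplotlib
matplotlib.use("Agg")  # assumption: non-interactive backend so no window is needed
import matplotlib.pyplot as plt

# Hypothetical paired data: one (friends, minutes) pair per user
friends = [70, 65, 72, 63, 71, 64, 60, 64, 67]
minutes = [175, 170, 205, 120, 220, 130, 105, 145, 190]
labels = ["a", "b", "c", "d", "e", "f", "g", "h", "i"]

points = plt.scatter(friends, minutes)

# Label each point, nudging the text slightly away from the marker
for label, friend_count, minute_count in zip(labels, friends, minutes):
    plt.annotate(label,
                 xy=(friend_count, minute_count),
                 xytext=(5, -5),
                 textcoords="offset points")

plt.title("Daily Minutes vs. Number of Friends")
plt.xlabel("# of friends")
plt.ylabel("daily minutes spent on the site")
```

Letting matplotlib choose the axis ranges can exaggerate small differences when the two variables are on comparable scales; plt.axis("equal") is one way to avoid that.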
Histograms and Density Plots
A histogram is a kind of bar plot that gives a discretized display of value frequency. The data points are split into discrete, evenly spaced bins, and the number of data points in each bin is plotted. In other words, it is a graph showing the number of observations within each given interval.
Example: Say you ask for the height of 250 people; you might end up with a histogram like this:
Histogram
You can read:
- 2 people from 140 to 145 cm
- 5 people from 145 to 150 cm
- 15 people from 151 to 156 cm
- 31 people from 157 to 162 cm
- 46 people from 163 to 168 cm
- 53 people from 168 to 173 cm
- 45 people from 173 to 178 cm
- 28 people from 179 to 184 cm
- 21 people from 185 to 190 cm
- 4 people from 190 to 195 cm
The hist() function takes in an array-like dataset and plots a histogram, which is a graphical representation of the distribution of the data. Here's how you can use the hist() function to create a basic histogram:
    import matplotlib.pyplot as plt
    data = [1, 2, 2, 3, 3, 3, 4, 4, 4, 4]
    plt.hist(data)
    plt.show()
    # Output:
    # A histogram plot with x-axis representing the data and y-axis representing the frequency.
The 'bins' parameter in the hist() function determines the number of equal-width bins in the range. Let's see how changing the 'bins' parameter affects the histogram:
    import matplotlib.pyplot as plt
    data = [1, 2, 2, 3, 3, 3, 4, 4, 4, 4]
    plt.hist(data, bins=20)
    plt.show()
    # Output:
    # A histogram plot with x-axis representing the data and y-axis representing the frequency.
    # The number of bars is increased due to the increased number of bins.
Working with 'range'
The 'range' parameter specifies the lower and upper range of the bins. Anything outside the range is ignored.
    import matplotlib.pyplot as plt
    data = [1, 2, 2, 3, 3, 3, 4, 4, 4, 4]
    plt.hist(data, range=[2, 3])
    plt.show()
    # Output:
    # A histogram plot with x-axis representing the data and y-axis representing the frequency.
    # The plot only includes data within the specified range.
In this example, we've set the 'range' to [2, 3]. As a result, the histogram only includes the data points between 2 and 3.
Exploring 'density'
The 'density' parameter, when set to True, normalizes the histogram so that the total area (the integral) under the histogram equals 1. This is useful when you want to visualize the probability distribution.
    import matplotlib.pyplot as plt
    data = [1, 2, 2, 3, 3, 3, 4, 4, 4, 4]
    plt.hist(data, density=True)
    plt.show()
    # Output:
    # A histogram plot with x-axis representing the data and y-axis representing the probability density.
    # The total area under the histogram is 1.
Histograms with Seaborn and Pandas
Seaborn: An Enhanced Visualization Library
Seaborn is a statistical plotting library built on top of Matplotlib. It provides a high-level interface for creating attractive graphics, including histograms.
    import seaborn as sns
    data = [1, 2, 2, 3, 3, 3, 4, 4, 4, 4]
    sns.histplot(data)
    # Output:
    # A histogram plot similar to Matplotlib but with a different style.
In this example, we use the histplot() function from Seaborn to create a histogram. The output is similar to Matplotlib's histogram, but it comes with a distinct Seaborn style.
pandas can also draw a histogram directly from a DataFrame column:
    import pandas as pd
    data = pd.DataFrame([1, 2, 2, 3, 3, 3, 4, 4, 4, 4], columns=['Values'])
    data['Values'].plot(kind='hist')
    # Output:
    # A histogram plot similar to Matplotlib but created from a DataFrame.
Scatter or Point Plots
Point plots or scatter plots can be a useful way of examining the relationship between two one-dimensional data series. For example, here we load the macrodata dataset from the statsmodels project, select a few variables, then compute log differences:
    import numpy as np
    import pandas as pd
    macro = pd.read_csv('examples/macrodata.csv')
    data = macro[['cpi', 'm1', 'tbilrate', 'unemp']]
    trans_data = np.log(data).diff().dropna()
    trans_data[-5:]
Output:
              cpi        m1  tbilrate     unemp
    198 -0.007904  0.045361 -0.396881  0.105361
    199 -0.021979  0.066753 -2.277267  0.139762
    200  0.002340  0.010286  0.606136  0.160343
    201  0.008419  0.037461 -0.200671  0.127339
    202  0.008894  0.012202 -0.405465  0.042560
We can then use seaborn's regplot method, which makes a scatter plot and fits a linear regression line (see Figure 9-24). Recent seaborn versions expect x and y as keyword arguments:
    import seaborn as sns
    sns.regplot(x='m1', y='unemp', data=trans_data)
    plt.title('Changes in log %s versus log %s' % ('m1', 'unemp'))
In exploratory data analysis it's helpful to be able to look at all the scatter plots among a group of variables; this is known as a pairs plot or scatter plot matrix. Making such a plot from scratch is a bit of work, so seaborn has a convenient pairplot function, which supports placing histograms or density estimates of each variable along the diagonal (see Figure 9-25 for the resulting plot):
    sns.pairplot(trans_data, diag_kind='kde', plot_kws={'alpha': 0.2})
Pair plot
Three-dimensional Plotting in Python using Matplotlib
3D plots are important tools for visualizing data that have three dimensions, such as data with two independent variables and one dependent variable. Plotting such data in 3D gives a deeper understanding of how the three variables relate. Matplotlib provides several functions for 3D plots.
We first create a set of 3D axes. To do so, we just change the projection parameter of plt.axes() from None to '3d':
    import numpy as np
    import matplotlib.pyplot as plt
    fig = plt.figure()
    ax = plt.axes(projection='3d')
3D plot
With the above syntax, three-dimensional axes are enabled and data can be plotted in three dimensions. In an interactive window a 3D graph can be rotated, which makes the data easier to explore. As with 2D graphs, there are different ways to plot 3D graphs: we can make a scatter plot, contour plot, surface plot, and so on. Let's have a look at the different 3D plots.
Graphs with lines and points are the simplest three-dimensional graphs. We will use the ax.plot3D and ax.scatter functions to plot line and point graphs, respectively.
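A line and a set of points on 3D axes might be sketched as follows. The spiral data is invented for illustration, and the sketch calls ax.plot with three coordinate arrays, which on 3D axes is expected to behave like the plot3D function named above; it also assumes a Matplotlib version that registers the '3d' projection without an explicit mpl_toolkits import.

```python
import matplotlib
matplotlib.use("Agg")  # assumption: non-interactive backend so no window is needed
import matplotlib.pyplot as plt
import numpy as np

fig = plt.figure()
ax = fig.add_subplot(projection="3d")

# Illustrative data: a spiral that widens as it rises along z
z = np.linspace(0, 1, 100)
x = z * np.sin(25 * z)
y = z * np.cos(25 * z)

# Line graph through all points
ax.plot(x, y, z, "green")

# Point graph of every fifth point, colored by height
ax.scatter(x[::5], y[::5], z[::5], c=z[::5])

ax.set_xlabel("x")
ax.set_ylabel("y")
ax.set_zlabel("z")
```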
Creating 3D Surface Plot
The axes3d module in Matplotlib's mpl_toolkits.mplot3d toolkit provides the functions needed to create 3D surface plots. Surface plots are created with the ax.plot_surface() function.
Syntax: ax.plot_surface(X, Y, Z)
where X and Y are 2D arrays of the x and y points, and Z is a 2D array of heights.
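A minimal surface plot following the syntax above could look like this. The function z = sin(sqrt(x² + y²)), the grid size, and the colormap are illustrative assumptions.

```python
import matplotlib
matplotlib.use("Agg")  # assumption: non-interactive backend so no window is needed
import matplotlib.pyplot as plt
import numpy as np

# Build 2D coordinate grids from 1D ranges
x = np.linspace(-5, 5, 50)
y = np.linspace(-5, 5, 50)
X, Y = np.meshgrid(x, y)

# Heights: one z value for every (x, y) grid point
Z = np.sin(np.sqrt(X ** 2 + Y ** 2))

fig = plt.figure()
ax = fig.add_subplot(projection="3d")
surf = ax.plot_surface(X, Y, Z, cmap="viridis")
fig.colorbar(surf, shrink=0.5)  # color scale for the heights
ax.set_title("Surface plot")
```

np.meshgrid is what turns the two 1D ranges into the 2D X and Y arrays that plot_surface expects.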
Pie chart
A pie chart is a circular statistical plot that can display only one series of data. The whole area of the chart represents 100% of the given data, and each wedge represents one item's share. Pie charts are commonly used in business presentations (sales, operations, survey results, resources, etc.) because they provide a quick summary.
The Matplotlib API has a pie() function in its pyplot module, which creates a pie chart representing the data in an array.
Syntax: matplotlib.pyplot.pie(data, explode=None, labels=None, colors=None, autopct=None, shadow=False)
Pie chart
    # Import libraries
    from matplotlib import pyplot as plt
    import numpy as np
    # Creating dataset
    cars = ['AUDI', 'BMW', 'FORD', 'TESLA', 'JAGUAR', 'MERCEDES']
    data = [23, 17, 35, 29, 12, 41]
    # Creating plot
    fig = plt.figure(figsize=(10, 7))
    plt.pie(data, labels=cars)
    # show plot
    plt.show()
Box Plot
A box plot, also known as a whisker plot, displays a summary of a set of data values: minimum, first quartile, median, third quartile, and maximum. A box is drawn from the first quartile to the third quartile, and a line goes through the box at the median. In Matplotlib's default vertical orientation, the y-axis shows the data values and each dataset gets its own box along the x-axis.
The matplotlib.pyplot module provides a boxplot() function for creating box plots.
Syntax: matplotlib.pyplot.boxplot(data, notch=None, vert=None, patch_artist=None, widths=None)
The data values given to the ax.boxplot() method can be a NumPy array, a Python list, or a tuple of arrays. numpy.random.normal() is a convenient way to create some random data for a box plot; it takes the mean, the standard deviation, and the desired number of values as arguments.
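A box plot of random normal data, as described above, might be sketched like this. The seed, means, standard deviations, sample sizes, and tick labels are arbitrary illustrative choices.

```python
import matplotlib
matplotlib.use("Agg")  # assumption: non-interactive backend so no window is needed
import matplotlib.pyplot as plt
import numpy as np

np.random.seed(10)  # hypothetical seed, only for repeatability

# Three datasets: same mean (100), increasing standard deviation, 200 values each
data = [np.random.normal(100, std, 200) for std in (10, 20, 30)]

fig, ax = plt.subplots()
bp = ax.boxplot(data)  # one box per dataset, drawn at x = 1, 2, 3

ax.set_xticklabels(["std=10", "std=20", "std=30"])
ax.set_ylabel("value")
```

boxplot returns a dictionary of the drawn artists (keys such as 'boxes', 'medians', and 'whiskers'), which is handy for restyling individual parts afterwards.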
Add Text Inside the Plot in Matplotlib
The matplotlib.pyplot.text() function is used to add text inside the plot. It adds text at an arbitrary location of the axes and also supports mathematical expressions.
Syntax: matplotlib.pyplot.text(x, y, s, fontdict=None, **kwargs)
Adding Mathematical Equations as Text Inside the Plot
This example uses Matplotlib and NumPy to plot the parabolic function y = x^2 over the range -10 to 10. The code adds the text label "Parabola $Y = x^2$" at coordinates (-5, 60) within the plot, plots the parabola in green, sets the axis labels, and displays the plot.
    import matplotlib.pyplot as plt
    import numpy as np
    x = np.arange(-10, 10, 0.01)
    y = x**2
    # adding text inside the plot
    plt.text(-5, 60, 'Parabola $Y = x^2$', fontsize=22)
    plt.plot(x, y, c='g')
    plt.xlabel("X-axis", fontsize=15)
    plt.ylabel("Y-axis", fontsize=15)
    plt.show()
Textbook: Python for Data Analysis: Data Wrangling with Pandas, NumPy, and IPython, Wes McKinney, Second Edition.