UNIT 1-Introduction to Data Science.pptx

MadhavanR30 36 views 75 slides Oct 11, 2024
Slide 1
Slide 1 of 75
Slide 1
1
Slide 2
2
Slide 3
3
Slide 4
4
Slide 5
5
Slide 6
6
Slide 7
7
Slide 8
8
Slide 9
9
Slide 10
10
Slide 11
11
Slide 12
12
Slide 13
13
Slide 14
14
Slide 15
15
Slide 16
16
Slide 17
17
Slide 18
18
Slide 19
19
Slide 20
20
Slide 21
21
Slide 22
22
Slide 23
23
Slide 24
24
Slide 25
25
Slide 26
26
Slide 27
27
Slide 28
28
Slide 29
29
Slide 30
30
Slide 31
31
Slide 32
32
Slide 33
33
Slide 34
34
Slide 35
35
Slide 36
36
Slide 37
37
Slide 38
38
Slide 39
39
Slide 40
40
Slide 41
41
Slide 42
42
Slide 43
43
Slide 44
44
Slide 45
45
Slide 46
46
Slide 47
47
Slide 48
48
Slide 49
49
Slide 50
50
Slide 51
51
Slide 52
52
Slide 53
53
Slide 54
54
Slide 55
55
Slide 56
56
Slide 57
57
Slide 58
58
Slide 59
59
Slide 60
60
Slide 61
61
Slide 62
62
Slide 63
63
Slide 64
64
Slide 65
65
Slide 66
66
Slide 67
67
Slide 68
68
Slide 69
69
Slide 70
70
Slide 71
71
Slide 72
72
Slide 73
73
Slide 74
74
Slide 75
75

About This Presentation

unit 1


Slide Content

INTRODUCTION TO DATA SCIENCE (IDS) COURSE CODE : B20CS0101 ​ School of Computer Science and Engineering

2 Data science is the study of data to extract meaningful insights for business

3 Unit-I Introduction to Microsoft Excel Creating Excel tables, understand how to Add, Subtract, Multiply, Divide in Excel. Excel Data Validation, Filters, Grouping. Introduction to formulas and functions in Excel. Logical functions (operators) and conditions. Visualizing data using charts in Excel. Import XML Data into Excel How to Import CSV Data (Text) into Excel, How to Import MS Access Data into Excel, Working with Multiple Worksheets.

101 102 103 104 105 106 107 108 109 110

5 Unit-II Introduction to Data Science What is Data Science? Probability theory, bayes theorem, bayes probability; Cartesian plane, equations of lines, graphs; exponents. Introduction to SQL: SQL: creation, insertion, deletion, retrieval of Tables by experimental demonstrations. Import SQL Database Data into Excel

6 Unit-III Data science components Tools for data science, definition of AI, types of machine learning (ML), list of ML algorithms for classification, clustering, and feature selection. Description of linear regression and Logistic Regression. Introducing the Gaussian, Introduction to Standardization, Standard Normal Probability Distribution in Excel, Calculating Probabilities from Z-scores, Central Limit Theorem, Algebra with Gaussians, Markowitz Portfolio Optimization, Standardizing x and y Coordinates for Linear Regression, Standardization Simplifies Linear Regression, Modeling Error in Linear Regression, Information Gain from Linear Regression.

7 Unit-IV Data visualization using scatter plots, charts, graphs, histograms and maps Statistical Analysis: Descriptive statistics- Mean, Standard Deviation for Continuous Data, Frequency, Percentage for Categorical Data. Applications of Data Science: Data science life cycle, Applications of data science with demonstration of experiments by using Microsoft Excel

8 Assessment in weightage CIE(Continues internal evaluation )=50 -IA -20 Marks -IA -20 Marks -Assignment .Assignment 1- 5 Marks . Assignment 2 - 5 Marks SEE (Semester End Examination)-100 Marks ( Scale down to 50 Marks) -Total No. Of 8 questions. -Each unit carries 2 questions (one question u have to answer) -Each question carries 25 Marks -25 *4(units)= 100

9 Unit-I Introduction to Microsoft Excel Creating Excel tables; understand how to Add, Subtract, Multiply, Divide in Excel. Logical functions (operators) and conditions. Introduction to formulas and functions in Excel. Excel Data Validation, Filters, Grouping. Visualizing data using charts in Excel. Import XML Data into Excel How to Import CSV Data (Text) into Excel, How to Import MS Access Data into Excel, Working with Multiple Worksheets.

10 Excel definition A software program created by Microsoft that uses spreadsheets to organize numbers and data with formulas and functions. Excel is typically used to organize data and perform financial analysis.  It is used across all business functions and at companies from small to large.

GETTING STARTED To open Excel, click the Start button, point to All Programs, point to Microsoft Office, and then click Microsoft Office Excel 2010. 11

12 Work book -Its an entire excel file. By default its name will be Book1. -A worksheet is a collection of cells where you keep and manipulate the data. -The name of the worksheet appears on its sheet tab at the bottom of the document window.

13 Work sheet An excel file/work book can have multiple work sheet. We can rename the sheet. To give a worksheet a more specific name, execute the following steps. 1. Right click on the sheet tab of Sheet1. 2. Choose Rename.

14 Cell Name Its named as Column Letter followed by Row Number. Here its A1. It can be given other names too. In each cell there may be the following data types: 1. Labels -- (text with no numerical value) 2. Number data (constant values) 3. Formulas (mathematical equation used to calculate)

15 Formula Bar: Here we build our own formulas beginning with equal sign. We can use predefined functions that are created by the excel. By clicking on fx we can view various available formulas.

16 Ribbon: It Contain various functionalities. These functions can be used directly by clicking on those options or by using shortcuts appropriately.

17

18 Creating Excel tables: It is a special object that works as a whole and allows you to manage the table's contents independently from the rest of the worksheet data. There are many powerful features inside: Excel tables are dynamic by nature, meaning they expand and contract automatically as you add or remove rows and columns.  Integrated sort and filter options; visual filtering with slicers.  Easy formatting with inbuilt table styles.  Column headings remain visible while scrolling.  Quick totals allow you to sum and count data as well as find average, min or max value in a click.  Calculated columns allow you to compute an entire column by entering a formula in one cell.  Easy-to-read formulas due to a special syntax that uses table and column names rather than cell references.  Dynamic charts adjust automatically as you add or remove data in a table.

19 How to create a table in Excel With the source data organized in rows and columns, carry out the below steps to covert a range of cells into a table: 1. Select any cell within your data set. 2. On the Insert tab, in the Tables group, click the Table button or press the Ctrl + T shortcut or On the Home tab, in the Styles group, click Format as Table , and select one of the predefined table styles. 3. The Create Table dialog box appears with all the data selected for you automatically; you can adjust the range if needed. If you want the first row of data to become the table headers, make sure the My table has headers box is selected. 4. Click OK.

20

21 Features of Excel tables Integrated sorting and filtering options In tables, filter arrows are automatically added in the header row and enable you to use various text and number filters, sort in ascending or descending order, by color, or create a custom sort order.

22 2. Column headings are visible while scrolling When you are working with a large table that does not fit on a screen, the header row always remains visible when you scroll down. If this doesn't work for you, just be sure to select any cell inside the table before scrolling.

23 3. Easy formatting (Excel table styles) newly created table is already formatted with banded rows, borders, shading , and so on. can easily be changed it by selecting from 50+ p redefined styles available in the Table Styles gallery on the Design tab.

24 4. Automatic table expansion to include new data whenever you draw a table in Excel, it is a " dynamic table " by nature, and like a dynamic named range it expands automatically to accommodate new values.

25 5. One-click data selection You can select cells and ranges in a table with the mouse like you normally do. You can also select table rows and columns in a click.

26 6. Dynamic charts When you create a chart based on a table, the chart updates automatically as you edit the table data. Once a new row or column is added to the table, the graph dynamically expands to take the new data in. When you delete some data in the table, Excel removes it from the chart straight away. Automatic adjustment of a chart source range is an extremely useful feature when working with data sets that frequently expand or contract.

27 7. Printing only the table If you want to print just the table and leave out other stuff on the worksheet, select any sell within your table and press Ctrl+P or click File > Print . The Print Selected Table option will get selected automatically without you having to adjust any print settings:

28 Understand how to Add, Subtract, Multiply, Divide in Excel Types of operators The Excel provides four different types of calculation operators: arithmetic, comparison, text concatenation, and reference.

29 Excel logical functions (operators) and conditions

30 Introduction to Formulas and Functions in Excel. Formula is an expression that calculates values in a cell or in a range of cells. For example, =A2+A2+A3+A4 is a formula that adds up the values in cells A2 through A4. Function is a predefined formula already available in Excel. Functions perform specific calculations in a particular order based on the specified values, called arguments, or parameters. you can use the SUM function to add up a range of cells: =SUM(A2:A4). We can find all available Excel functions in the Function Library on the Formulas tab:

31 The basic functions we must know 1. SUM The first Excel function you should be familiar with is the one that performs the basic arithmetic operation of addition: SUM(number1, [number2], …) =SUM(B2:B6) - adds up values in cells B2 through B6. =SUM(B2, B6) - adds up values in cells B2 and B6. 2. AVERAGE The Excel AVERAGE function does exactly what its name suggests, i.e. finds an average, or arithmetic mean, of numbers. Its syntax is similar to SUM's: AVERAGE(number1, [number2], …) = AVERAGE (B2:B6)

32 3. MAX & MIN The MAX and MIN formulas in Excel get the largest and smallest value in a set of numbers, respectively. For our sample data set, the formulas will be as simple as: =MAX(B2:B6) =MIN(B2:B6) 4. COUNT & COUNTA If you are curious to know how many cells in a given range contain numeric values (numbers or dates), don't waste your time counting them by hand. The Excel COUNT function will bring you the count in a heartbeat: COUNT(value1, [value2], …) While the COUNT function deals only with those cells that contain numbers, the COUNTA function counts all cells that are not blank , whether they contain numbers, dates, times, text, logical values of TRUE and FALSE, errors or empty text strings (""): COUNTA(value1, [value2], …)

33 IF To test a certain condition and return one value or perform one calculation if the condition is met, and another value or calculation if the condition is not met: IF( logical_test , [ value_if_true ], [ value_if_false ]) For example, the following IF statement checks if the order is completed (i.e. there is a value in column C) or not. To test if a cell is not blank, you use the "not equal to" operator ( < >) in combination with an empty string (""). As the result, if cell C2 is not empty, the formula returns "Yes", otherwise "No": =IF(C2<>"", "Yes", "No")

34 LEN To know the number of characters in a certain cell, LEN(text) Wish to find out how many characters are in cell A2? Just type the below formula into another cell: =LEN(A2)

35 AND & OR These are the two most popular logical functions to check multiple criteria. The difference is how they do this:  AND returns TRUE if all conditions are met, FALSE otherwise.  OR returns TRUE if any condition is met, FALSE otherwise.

36 CONCATENATE In case you want to take values from two or more cells and combine them into one cell, use the concatenate operator (&) or the CONCATENATE function: CONCATENATE(text1, [text2], …)

37 TODAY & NOW To see the current date and time whenever you open your worksheet without having to manually update it on a daily basis, use either: =TODAY() to insert the today's date in a cell. =NOW() to insert the current date and time in a cell.

38 Excel Data Validation Excel Data Validation is a feature that restricts (validates) user input to a worksheet. Excel's data validation can do:  Allow only numeric or text values in a cell.  Allow only numbers within a specified range .  Allow data entries of a specific  Restrict dates and times outside a given time frame .  Restrict entries to a selection from a drop-down list .  Validate an entry based on another cell .  Show an input message when the user selects a cell.  Show a warning message when incorrect data has been entered.  Find incorrect entries in validated cells.

39 To add data validation in Excel, perform the following steps. Open the Data Validation dialog box Select one or more cells to validate, go to the Data tab > Data Tools group, and click the Data Validation button. You can also open the Data Validation dialog box by pressing Alt > D > L, with each key pressed separately. dl

40 Create an Excel validation rule On the Settings tab, define the validation criteria according to your needs. In the criteria, you can supply any of the following:  Values - type numbers in the criteria boxes like shown in the screenshot below.  Cell references - make a rule based on a value or formula in another cell.  Formulas - allow to express more complex conditions like in this example. With the validation rule configured, either click OK to close the Data Validation window or switch to another tab to add an input message or/and error alert.

41 Add an input message (optional) If you want to display a message that explains to the user what data is allowed in a given cell, open the Input Message tab and do the following:  Make sure the Show input message when cell is selected box is checked.  Enter the title and text of your message into the corresponding fields.  Click OK to close the dialog window.

42 Display an error alert (optional) In addition to the input message, you can show one of the following error alerts when invalid data is entered in a cell.

43 To configure a custom error message To configure a custom error message, go to the Error Alert tab and define the following parameters:  Check the Show error alert after invalid data is entered box (usually selected by default).  In the Style box, select the desired alert type.  Enter the title and text of the error message into the corresponding boxes.  Click OK .

44 Filters and Grouping Sorting lists is a common spreadsheet task that allows you to easily reorder your data. -Select a cell in the column you want to sort (In this example, we choose a cell in column A). - Click the Sort & Filter command in the Editing group on the Home tab. -Select Sort A to Z . Now the information in the Category column is organized in alphabetical order.

45 2. Grouping cells using the Subtotal command Grouping is a useful Excel feature that gives you control over how the information is displayed. sort before you can group . To create groups with subtotals:  Select any cell with information in it.  Click the Subtotal command on the Data tab. The information in your spreadsheet is automatically selected, and the Subtotal dialog box appears.

46 Example

47 Filtering cells Filtering, or temporarily hiding, data in a spreadsheet is simple. This allows you to focus on specific spreadsheet entries.

48 Visualizing data using Charts In Excel, charts are used to make a graphical representation of any set of data Suppose you have the target and actual profits for the fiscal year 2015-2016 that you obtained from different regions

49 Types of charts Column Chart Line Chart Bar Chart Area chart Pie chart or Doughnut chart Surface chart

50 Column Charts column chart - vertical chart that is used to represent the data in vertical bars. It works efficiently with different types of data, used for comparing the information

51 Line Chart Line charts are most useful for showing trends We can easily analyze the ups and downs in your data over time. In this chart, data points are connected with lines.

52 Bar charts Bar charts are horizontal bars that work like column charts. Bar charts are horizontally plotted. bar charts and column charts are just opposite to each other.

53 Pie chart Pie chart is a rounded shape graph that is divided into slices of pie. Using this chart, you can easily analyze data that is divided into slices. It makes the data easy to compare the proportion.

54 Surface chart Surface chart is actually a 3D chart that helps to represent the data into a 3D landscape. These charts are best to use with a large dataset. This chart allows to displaying a variety of data at the same time.

55 Area chart Area charts are just like line charts. Unlike the line charts, gaps are filled with color in area charts. Area charts are easy to analyze the growth in business as its shows ups and downs through line.

56 Sparklines Sparklines are tiny charts placed in single cells, each representing a row of data in your selection. They provide a quick way to see trends. Line Sparkline : Line Sparkline in Excel will be in the form of lines, and high values will indicate fluctuations in height difference. Column Sparkline : Column Sparkline in Excel will be in the form of column chart or bar chart. Each bar shows each value. Win/Loss Sparkline : It is mainly used to show negative values like ups and downs on the floated costs.

57 Pivot Charts Pivot Charts are used to graphically summarize data and explore complicated data. Pivot Charts are useful when you have data in a huge PivotTable, or many complex worksheet data that includes text and numbers.

58 The PivotChart has three filters – Region, Salesperson and Month.

59 Import XML Data into Excel External data is data that you link/import into excel from a source that resides outside excel. Examples of external include the following:  Data stored in a Microsoft Access database. This could the information from a custom application i.e. Payroll, Point of Sale, Inventory, etc.  Data from SQL Server or other database engines i.e. MySQL, Oracle, etc. – This could be information from a custom application  From a web site/web service – this could be information from a Web services i.e. currency exchange rates from the internet, stock prices, etc.  Text file i.e. CSV, tab separated, etc. – this could be information from a third party application that does not provide direct links. Such data could include bank payments exported to comma separated file CSV, etc.  Other types i.e. HTML data , Windows Azure Market Place, etc.

60 Website (XML data) external data source example - Open a new workbook Click on the DATA tab on the ribbon bar Click on "From TEXT/ Web/TABLE" button You will get the following window TEXT FILE/Enter http://www.ecb.europa.eu/stats/eurofxref/eurofxref-daily.xml in the address Click on Go button, you will get the XML data preview Click on Import button when done Click on OK button You will get the following data

61 Conti…..

62 Handling CSV Data in Excel

63 How to Import a CSV file into Excel Open a new Excel document and navigate to the Data tab.

64 Conti…. 2. Click “From Text”.

65 3. Navigate to the CSV file you wish to open and click “Import”.

66 Conti…. 4.From the newly-opened window, choose “Delimited”. Then click “Next”.

67 Conti…. 5. Check the box next to the type of delimiter – in most cases this is either a semicolon or a comma. Then click “Next”.

68 Conti… 6. Click “Finish”.

69 Import Access Data On the Data tab, in the Get & Transform Data group, click Get Data . 2 . Click From Database, From Microsoft Access Database

70 Conti… 3 . Select the Access file.

71 Conti… 4. Click Import. 5. Select a table on the left side of the Navigator window and click Load.

72 Result. Your database records in Excel.

73 Conti…. 6. When your Access data changes, you can easily refresh the data in Excel. First, select a cell inside the table. Next, on the Design tab, in the External Table Data group, click Refresh.

74 Exporting Excel Data to Access Database The following steps needs to be followed to import an Excel spreadsheet into a new table in Access: Open the Access database. If you receive a security warning, click the Enable Content button. On the Office ribbon, select the External Data tab and click Excel. "Get External Data - Excel Spreadsheet" wizard appears. In the File name field, browse to the Excel file. Select the "Import the source data into a new table in the current database" option and click OK. Select the worksheet to import. Click Next. If the first row contains headers, mark the "First Row Contains Column Headings" checkbox. Click Next. Select the options for each column or just leave it at the default and click Next. Accept the default of "Let Access add primary key." Click Next. The Import to Table field defaults to the worksheet name. Update it if needed. Click Finish. The worksheet imports into a table.

75 Working with multiple Work Sheets Every workbook contains at least one worksheet by default. When working with a large amount of data, you can create multiple worksheets to help organize your workbook and make it easier to find content. You can also group worksheets to quickly add information to multiple worksheets at the same time. To insert a new worksheet: Locate and select the New sheet button near the bottom-right corner of the Excel window.
Tags