1_Course Overview, Data Science Lifecycle.pptx

pritisavailable 11 views 26 slides Sep 05, 2024

About This Presentation

Course Overview, Data Science Lifecycle and its applications


Slide Content

Course Overview: An overview of data science, CS 577, and the data science lifecycle. Josh Hug and Lisa Yan

Outline: Intros. What is data science? What will you learn in this class? Course overview (lots of important details). Data Science Lifecycle. Demo. This section: What is Data Science?

Why I Care About Data Science

Why Data Science? The world is complicated, and data is a tool for finding truth in this complicated world! Many questions in different domains still need answers. Data science uses a combination of methods and principles from statistics and computer science to work with and draw insights from data.

Data-Centric Problems: Assess whether a vaccine works. Filter out fake news automatically. Calibrate air quality sensors. Advise analysts on policy changes.

Primary Goal of This Course: Be able to take data and produce useful insights on the world’s most challenging and ambiguous problems.

What is Data Science? Principles and Techniques of Data Science

Data is changing the world. (From Joey Gonzalez.)

Data science is a fundamentally interdisciplinary field (Joey Gonzalez). Data science is the application of data-centric, computational, and inferential thinking to understand the world (science) and to solve problems (engineering).

Data Science Venn Diagram by Drew Conway, 2010

Insight: Good data analysis is not the simple application of a statistics recipe, nor the simple application of statistical software. There are many tools out there for data science, but they are merely tools. They don’t do any of the important thinking! “The purpose of computing is insight, not numbers.” - R. Hamming, Numerical Methods for Scientists and Engineers (1962).

Example Questions in Data Science: Some (broad) questions we might try to answer with data science: What show should we recommend to our user to watch? In which markets should we focus our advertising campaign? What areas of the world are at higher risk of climate change impact in 10 years? In 20? What should we eat to avoid dying early of heart disease? Do immigrants from poor countries have a positive or negative impact on the economy? Is the world getting better or worse?

Outline: Intros. What is data science? What will you learn in this class? Course overview (lots of important details). Data Science Lifecycle. Demo. This section: What will you learn in this class?

Tentative List of Topics to Be Covered in CS-577: Pandas and NumPy; Relational Databases & SQL; Exploratory Data Analysis; Regular Expressions; Visualization (matplotlib, Seaborn, plotly); Sampling; Probability and random variables; Model design and loss formulation; Linear Regression; Feature Engineering; Regularization, Bias-Variance Tradeoff, Cross-Validation; Gradient Descent; Logistic Regression; Decision Trees and Random Forests.

Course Websites / Platforms

Online platforms: Course website on Canvas, where all lectures, assignments, and discussions are posted. Textbook (www.textbook.ds100.org): supplemental reading.

Programming Environment

Jupyter Notebook: “Jupyter notebooks are documents that combine live runnable code with narrative text (Markdown), equations (LaTeX), images, interactive visualizations and other rich output.” Installing Jupyter: https://jupyterlab.readthedocs.io/en/stable/getting_started/installation.html

JupyterLab: JupyterLab offers notebooks and more tools for data science. Two options: use JupyterLab locally on your own machine, or use Google Colab.

Learning Advanced JupyterLab: Resources for learning fancier JupyterLab functionality. The quickest intro is this great 2-minute overview by Serena Bonaretti. Note: Unlike Serena’s example, in our course we’re using JupyterLab notebooks hosted on the internet, not on your own local computer. The interface overview from the official docs has more details and short, embedded videos. For a more detailed discussion from a bio/data angle, see the ~45 minute video. A full ~3h in-depth tutorial is available from the core team.

Google Colab: What is Colab? Colab, or "Colaboratory", allows you to write and execute Python in your browser, with zero configuration required, access to GPUs free of charge, and easy sharing.

Course Logistics: Content and workflow

Weekly Flow: Class days: TTR. Class times, Section 1: 12:30 pm to 1:45 pm. Class times, Section 2: 2:00 pm to 3:15 pm. Class location: LH 347.

Discussion Section: There is a discussion board in Canvas with two types of topics: topics covered in lecture and topics covered in assignments.

Homework: 4 assignments in Jupyter Notebook that must be submitted individually. Midterm exam: Oct. 19. Final exam: Dec. 12. Group term project: by Dec. 10. Format (current plan): primarily in-person exams with the option for virtual exams. Details TBD. Alternate exam times will be provided for all exams for pre-approved reasons, such as a concurrent final exam. If you miss an exam due to a personal emergency or illness, please contact me.

Grading Logistics: Grades will be posted on Canvas. Deadlines are firm at 11:59 PM. Extensions are provided only to students with DSP accommodations or, in exceptional circumstances, only if you email me before the deadline. You can submit assignments up to 2 days late, at 10% off per day. Lateness is rounded up to the next day: 2 minutes late = 1 day late. Grade weights: Mid-term exam 25%; Final exam 25%; Assignments 30%; Discussions 5%; Semester project 15%.
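The late policy and grade weights above can be sketched in Python. This is only an illustration, not the official grading code. The function and variable names are hypothetical, and two readings are assumptions the slides do not confirm: that "10% off per day" is taken off the earned score, and that a submission more than 2 days late scores zero.

```python
import math
from datetime import datetime, timedelta

# Grade weights from the slide, in percent.
WEIGHTS_PERCENT = {
    "midterm": 25,
    "final": 25,
    "assignments": 30,
    "discussions": 5,
    "project": 15,
}

def days_late(deadline, submitted):
    """Whole days late, rounded up (2 minutes late counts as 1 day)."""
    late = submitted - deadline
    if late <= timedelta(0):
        return 0
    return math.ceil(late / timedelta(days=1))

def late_adjusted_score(raw_score, deadline, submitted,
                        penalty_per_day=0.10, max_days_late=2):
    """Apply the per-day late penalty to an assignment score.

    Assumption: the penalty scales the earned score, and anything
    later than max_days_late receives zero.
    """
    n = days_late(deadline, submitted)
    if n > max_days_late:
        return 0.0
    return raw_score * (1 - penalty_per_day * n)

def course_grade(scores):
    """Weighted course grade from component scores on a 0-100 scale."""
    return sum(WEIGHTS_PERCENT[k] * scores[k] for k in WEIGHTS_PERCENT) / 100

if __name__ == "__main__":
    deadline = datetime(2024, 9, 10, 23, 59)  # hypothetical deadline
    submitted = deadline + timedelta(minutes=2)
    print(days_late(deadline, submitted))
    print(late_adjusted_score(90, deadline, submitted))
```

Under these assumptions, a submission 2 minutes past the deadline counts as 1 day late and loses 10% of its score.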