Data Science and Visualization
The power of big data and data science is revolutionizing the world. In our everyday life, we are constantly surrounded by data.
Key terms: data, big data, structured data, semi-structured data, unstructured data, and data science. Data science represents process and resource optimization.
What is Data Science? Data science is the field of study that uses scientific methods, processes, algorithms, and systems to extract knowledge and insights from structured and unstructured data. It combines various disciplines such as statistics, machine learning, data analysis, and visualization to uncover hidden patterns, trends, and correlations in data. Data science plays a crucial role in decision-making, forecasting, and problem-solving across industries, driving innovation and enabling organizations to make data-driven decisions.
Why Data Science? To extract knowledge from data, to understand data, to find hidden relationships, and to build a model. Data science uses scientific methods, such as probability and statistics. It yields practical insights that you can apply to real business situations. The application domain is an important concept: data scientists often need at least some degree of expertise in the problem domain, for example finance, medicine, or marketing.
Fundamentals of Data Science: Data Collection, Data Processing, Data Analysis, Pattern Uncovering, Predictive Maintenance, Implementation.
Interesting facts about data science and big data
Data growth: Every day, we create 2.5 quintillion bytes of data, and this pace is accelerating with the growth of digital technologies like IoT devices, social media, and online transactions. This massive volume of data is what constitutes "big data." 2.5 quintillion bytes is equivalent to 2.5 exabytes, where an exabyte is 1 followed by 18 zeros, or 1,000,000,000,000,000,000 bytes.
The human brain has the ability to store the equivalent of 2.5 million gigabytes of digital memory.
Every second, about 40,000 search queries are performed on Google alone, which amounts to roughly 3.5 billion searches per day and 1.2 trillion searches per year.
Up to 300 hours of video are uploaded to YouTube every minute.
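The figures above can be sanity-checked with plain arithmetic. The short sketch below uses only the numbers quoted on the slide; note that the daily and yearly search totals correspond to a rate of about 40,000 searches per second. The variable names are illustrative.

```python
# Sanity-check the quoted figures with plain arithmetic.
BYTES_PER_EXABYTE = 10**18            # 1 followed by 18 zeros
daily_bytes = 2.5 * 10**18            # 2.5 quintillion bytes created per day

daily_exabytes = daily_bytes / BYTES_PER_EXABYTE

SEARCHES_PER_SECOND = 40_000          # rate consistent with the quoted daily total
SECONDS_PER_DAY = 24 * 60 * 60

searches_per_day = SEARCHES_PER_SECOND * SECONDS_PER_DAY
searches_per_year = searches_per_day * 365

print(f"{daily_exabytes} exabytes of data per day")
print(f"{searches_per_day:,} searches per day")
print(f"{searches_per_year:,} searches per year")
```

Running it shows that the daily total rounds to about 3.5 billion and the yearly total to a little over 1.2 trillion.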
Data: facts with no meaning.
Information: learning from facts.
Knowledge: practical understanding of a subject.
Understanding: the ability to absorb knowledge and learn to reason.
Wisdom: the quality of having experience and good judgment; the ability to think and foresee.
Validity: ways to confirm truth.
Coding, coding, coding … it’s just part of the game
Think about some real-life process or problem in different problem domains, and how you can improve it using the Data Science process.
Which data can you collect? How would you collect it? How would you store the data? How large is the data likely to be? Which insights might you be able to get from this data? Which decisions would we be able to take based on the data?
Try to think about 3 different problems or processes and describe each one:
How can you use data to improve the education process for children in schools?
How can you use data to control vaccination during the pandemic?
How can you use data to make sure you are being productive at work?

Domain | Problem | Which data to collect | How to store the data | Which insights/decisions we can make
Education | | | |
Vaccination | | | |
Productivity | | | |
Getting Past the Hype
In data science, there is a difference between industry and academia.
The hype: excitement about something (here, data science).
Universal Solutions: Data science is often portrayed as a magical solution to all problems.
Instant Results: There is a perception that data science delivers immediate results.
AI Overestimation: Artificial intelligence and machine learning are frequently overhyped, leading to unrealistic expectations.
Skill Requirements: The field is sometimes marketed as accessible to anyone with minimal training.
What Exactly Is Datafication? Datafication is the process of converting an organization into a data-driven enterprise using a variety of tools, technologies, and techniques.
Datafication
Datafication is the application of data to an organization.
Datafication implies that an organization has complete control over the storage, extraction, and manipulation of data and related information.
Datafication is revolutionizing fields like healthcare, business, and politics.
It also poses challenges in privacy and security.
In the modern world everything is converted into data, which places a premium on our ability to understand, interpret, and use this data.
What is the purpose of datafication?
Enhanced Decision-Making: Recruitment (TCS, Wipro, Dell, HP, etc.)
Improved Efficiency and Optimization: Online Quiz
Innovation and Development: Data Scientists, Researchers
Personalization of Services: Fitness
Predictive Capabilities: Business
Societal Insights: E-commerce, Finance, Medical Science, etc.
Regulatory Compliance and Risk Management: Weather Forecasting
Why do we need datafication?
Current Landscape of Perspectives and Skill Sets
In today's dynamic and rapidly evolving landscape, perspectives and skill sets are continuously adapting to meet the demands of technological advancements, global interconnectedness, and changing societal needs.
Perspectives
Digital Transformation: Emphasis on the adoption and integration of digital technologies in various sectors. Recognition of the importance of digital literacy for both individuals and organizations.
Sustainability and Ethical Considerations: Growing awareness of environmental issues and the need for sustainable practices. Increased focus on corporate social responsibility (CSR) and ethical business practices.
Globalization and Cultural Awareness: Enhanced appreciation for cultural diversity and global perspectives. Need for cross-cultural communication and collaboration skills.
Innovation and Agility: Valuing creativity and innovation as critical drivers of success. Emphasis on adaptability and agility in responding to change.
Data-Driven Decision Making: Importance of using data analytics to inform strategic decisions. Growing reliance on big data, machine learning, and artificial intelligence.
Skill Sets
Digital and Technical Skills: Proficiency in software development, cybersecurity, data science, and AI. Competence in using digital tools and platforms for various business functions.
Soft Skills: Strong communication, teamwork, and interpersonal skills. Emotional intelligence and empathy are highly valued in leadership roles.
Critical Thinking and Problem Solving: Ability to analyze complex problems and develop effective solutions. Strategic thinking and decision-making skills.
Continuous Learning and Adaptability: Commitment to lifelong learning and staying updated with industry trends. Flexibility to adapt to new roles, technologies, and environments.
Cross-Functional Competencies: Ability to work across different functions and disciplines. Understanding of business operations, finance, marketing, and supply chain management.
Leadership and Management: Skills in leading teams, managing resources, and driving organizational goals. Capability to inspire and motivate others, fostering a positive work culture.
Knowledge is not just power, it's everything. https://drive.google.com/file/d/1IVV72waJ025ZSRIsDeg6c1z13IxOZT45/view?usp=drive_link
Data Science
Data science is the science of analyzing raw data using statistics and machine learning techniques with the purpose of drawing conclusions about that information. In short, data science involves:
Statistics, computer science, and mathematics
Data cleaning and formatting
Data visualization
“There is computer science and there are scientific domains; somewhere in the middle is data science.”
Machine learning is a branch of data science. It is used for a great deal of work that spans the application domains, often in applications that are called artificial intelligence rather than machine learning. Machine learning is also useful for creating general-purpose things like chatbots, which do not fall into any particular domain.
Math and statistics have many applications in computer science, for example proving that algorithms developed with computer science and machine learning skills will work effectively.
Population and Sample
In statistics, as well as in quantitative methodology, data are collected and selected from a statistical population using defined procedures. There are two types of data sets: population and sample.
Population: A population includes all the elements of the data set. Measurable characteristics of the population, such as the mean and standard deviation, are known as parameters. For example, all people living in India make up the population of India.
There are different types of population: Finite Population, Infinite Population, Existent Population, and Hypothetical Population. Let us discuss each type in turn.
Types
Finite Population: The finite population is also known as a countable population, because its members can be counted. In other words, it is a population in which the number of individuals or objects is finite. For statistical analysis, a finite population is more advantageous than an infinite population. Examples of finite populations are the employees of a company and the potential consumers in a market.
Infinite Population: The infinite population is also known as an uncountable population, in which counting the units is not possible. An example is the germs in a patient's body, which cannot be counted.
Existent Population: The existent population is defined as a population of concrete individuals. In other words, a population whose units are available in solid form is known as an existent population. Examples are books and students.
Hypothetical Population: A population whose units are not available in solid form is known as a hypothetical population. A population consists of sets of observations, objects, etc. that all have something in common, and in some situations the population is only hypothetical. Examples are the outcomes of rolling a die or tossing a coin.
Difference between Population and Sample

Comparison | Population | Sample
Meaning | Collection of all the units or elements that possess common characteristics | A subgroup of the members of the population
Includes | Each and every element of a group | Only a handful of units of the population
Characteristics | Parameter | Statistic
Data Collection | Complete enumeration or census | Sampling or sample survey
Focus on | Identification of the characteristics | Making inferences about the population
Sample
A sample includes one or more observations drawn from the population, and a measurable characteristic of a sample is a statistic. Sampling is the process of selecting the sample from the population. For example, some of the people living in India form a sample of the population of India.
There are two basic types of sampling: probability sampling and non-probability sampling.
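The distinction between a parameter (computed on the whole population) and a statistic (computed on a sample) can be illustrated with a short sketch. The population below is synthetic and hypothetical, generated only for illustration; the names `population` and `sample` and the sample size of 200 are assumptions, not part of the slides.

```python
import random
import statistics

random.seed(42)  # fixed seed so the sketch is repeatable

# Hypothetical finite population: ages of 10,000 people (synthetic data).
population = [random.randint(18, 80) for _ in range(10_000)]

# Parameters describe the whole population.
population_mean = statistics.mean(population)
population_sd = statistics.pstdev(population)   # population standard deviation

# A sample is a subset of units drawn from the population.
sample = random.sample(population, k=200)

# Statistics describe the sample and serve as estimates of the parameters.
sample_mean = statistics.mean(sample)
sample_sd = statistics.stdev(sample)            # sample standard deviation

print(f"parameter: mean={population_mean:.2f}, sd={population_sd:.2f}")
print(f"statistic: mean={sample_mean:.2f}, sd={sample_sd:.2f}")
```

The sample values will usually be close to, but not equal to, the population values, which is why a statistic is treated as an estimate of a parameter.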
Probability Sampling
Probability sampling is a technique in which the researcher chooses samples from a larger population using a method based on probability theory. It uses statistical theory to randomly select a small group of people (the sample) from an existing large population and then uses their responses to draw conclusions about the overall population.
Some of the techniques used for probability sampling are simple random sampling, cluster sampling, and multi-stage sampling.
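The three techniques named above can be sketched with Python's standard `random` module. This is a minimal illustration under stated assumptions, not a survey-grade implementation: the population of 20 schools with 30 students each is hypothetical, and the function names are chosen here for illustration.

```python
import random

random.seed(7)  # fixed seed so the sketch is repeatable

# Hypothetical population: 20 schools (clusters) with 30 students each.
population = {
    f"school_{s}": [f"school_{s}_student_{i}" for i in range(30)]
    for s in range(20)
}
all_students = [st for students in population.values() for st in students]


def simple_random_sample(units, k):
    """Every unit has the same chance of being selected."""
    return random.sample(units, k)


def cluster_sample(clusters, n_clusters):
    """Pick whole clusters at random and keep every unit inside them."""
    chosen = random.sample(sorted(clusters), n_clusters)
    return [unit for name in chosen for unit in clusters[name]]


def multi_stage_sample(clusters, n_clusters, k_per_cluster):
    """Stage 1: pick clusters at random. Stage 2: sample units within each."""
    chosen = random.sample(sorted(clusters), n_clusters)
    return [
        unit
        for name in chosen
        for unit in random.sample(clusters[name], k_per_cluster)
    ]


print(len(simple_random_sample(all_students, 50)))   # 50 individual students
print(len(cluster_sample(population, 3)))            # 3 whole schools
print(len(multi_stage_sample(population, 4, 5)))     # 5 students from each of 4 schools
```

In each case selection is driven by a random draw rather than by the researcher's choice, which is what makes these probability sampling methods.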
Non-Probability Sampling
In non-probability sampling, the population units are selected at the discretion of the researcher. Such samples rely on human judgment to select units and provide no theoretical basis for estimating the characteristics of the population.
Some of the techniques used for non-probability sampling are quota sampling, judgment sampling, and purposive sampling.
Population and Sample Examples
All the people who have ID proofs are the population, and the group of people who have only a voter ID is the sample.
All the students in the class are the population, whereas the top 10 students in the class are the sample.
All the members of the parliament are the population, and the female members present there are the sample.
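The "top 10 students" example shows why a hand-picked sample gives no sound basis for estimating population characteristics. The sketch below compares it with a random sample of the same size; the class of 60 students and their scores are synthetic, hypothetical data.

```python
import random
import statistics

random.seed(1)  # fixed seed so the sketch is repeatable

# Hypothetical class of 60 students with synthetic exam scores.
scores = [random.randint(35, 100) for _ in range(60)]

# Population parameter: the mean score of the whole class.
class_mean = statistics.mean(scores)

# Non-probability sample: the top 10 students, chosen by judgment.
top_10 = sorted(scores, reverse=True)[:10]
top_10_mean = statistics.mean(top_10)

# Probability sample of the same size, for comparison.
random_10 = random.sample(scores, k=10)
random_10_mean = statistics.mean(random_10)

print(f"class mean:          {class_mean:.1f}")
print(f"top-10 sample mean:  {top_10_mean:.1f}")
print(f"random sample mean:  {random_10_mean:.1f}")
```

The top-10 mean can never fall below the class mean, so it systematically overstates how the class performed, whereas the random sample has no built-in direction of error.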
Statistical Modelling
A statistical model is a type of mathematical model that comprises the assumptions made to describe the data generation process. The mathematical expressions must be general enough to include parameters whose values are not yet known.
In mathematical expressions, the convention is to use Greek letters for parameters and Latin letters for data. For example, if you have two columns of data, x and y, and you think there is a linear relationship, you would write down y = β0 + β1x. You do not yet know the actual numerical values of β0 and β1, so they are the parameters.
Other people prefer pictures and will first draw a diagram of data flow, possibly with arrows, showing how things affect other things or what happens over time. This gives them an abstract picture of the relationships before choosing equations to express them.
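Once the model y = β0 + β1x is written down, the parameters have to be estimated from the data. The slides do not say which method to use; the sketch below assumes ordinary least squares, a common choice, and uses a small made-up data set. The function name `fit_line` is hypothetical.

```python
def fit_line(x, y):
    """Estimate beta0 and beta1 in y = beta0 + beta1 * x by least squares."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    # Sum of cross-deviations and sum of squared deviations of x.
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    beta1 = sxy / sxx                  # slope
    beta0 = mean_y - beta1 * mean_x    # intercept
    return beta0, beta1


# Two columns of illustrative (made-up) data.
x = [1.0, 2.0, 3.0, 4.0, 5.0]
y = [3.1, 4.9, 7.2, 8.8, 11.0]

beta0, beta1 = fit_line(x, y)
print(f"estimated model: y = {beta0:.2f} + {beta1:.2f} * x")
```

Before fitting, β0 and β1 are unknown parameters; after fitting, the code reports numerical estimates for them based on the observed x and y.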
Probability Distributions
What Is Probability? Probability denotes the possibility of something happening. It is a mathematical concept that measures how likely events are to occur, with values expressed between 0 and 1. This fundamental idea also underlies probability distributions.
What Is a Probability Distribution? A probability distribution is a statistical function that describes all the possible values of a random variable within a given range, together with their probabilities. This range is bounded by the minimum and maximum possible values, but where a particular value falls on the probability distribution is determined by a number of factors.
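As a concrete illustration, the sketch below builds the probability distribution of the sum of two fair six-sided dice. The dice example is an assumption chosen for illustration; it shows each possible value with its probability, the bounded range (2 to 12), and the fact that every probability lies between 0 and 1.

```python
from collections import Counter
from fractions import Fraction
from itertools import product

# Enumerate all 36 equally likely outcomes of rolling two fair dice.
outcomes = [a + b for a, b in product(range(1, 7), repeat=2)]
counts = Counter(outcomes)

# Map each possible sum to its exact probability.
distribution = {
    total: Fraction(count, len(outcomes))
    for total, count in sorted(counts.items())
}

for total, p in distribution.items():
    print(f"P(sum = {total:2d}) = {str(p):>4} ~ {float(p):.3f}")

print("minimum value:", min(distribution))
print("maximum value:", max(distribution))
print("total probability:", sum(distribution.values()))
```

Exact fractions are used instead of floating-point numbers so that the probabilities add up to exactly 1.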