05 - Use Case 3 (Manufacturing) - Energy Consumption Optimization _ Mumbai Models.pptx

sandeepstiwari1 17 views 59 slides Jul 18, 2024

About This Presentation

#AIML


Slide Content

Reducing Energy Costs and Environmental Impact by Optimizing Energy Use in Manufacturing Processes
Team - Mumbai Models
Use Case 2: Energy Consumption Optimization, Manufacturing

Agenda (© 2022 by kipi.bi)
Defining the Business Problem
Business Stakeholders
Why is this Problem Relevant Now?
Pain Points/Challenges
Business Impact of the Solution
High Level Description of the Data
Proposed Architecture and Overview
Methodology Diagram
Data Statistics and Sample Data

Defining the Business Problem (www.kipi.bi)
Energy expenses can constitute up to 4.1% of manufacturing costs. [source]
Industrial production currently makes up over half of all global energy use.
Manufacturing companies are always looking to decrease costs and optimise current operations to improve efficiency. This involves leveraging data science methodologies to analyze energy consumption patterns and identify opportunities for optimization.
Through data-driven decision-making, businesses can enhance operational efficiency while aligning with sustainability goals. This not only improves financial performance but also demonstrates corporate responsibility and ensures compliance with regulatory requirements for environmental sustainability.
The United States is a highly industrialized country. In 2022, the industrial sector accounted for 35% of total U.S. end-use energy consumption and 33% of total U.S. energy consumption. [source]

Business Stakeholders: Production Engineers, Energy Management team, Suppliers, Manufacturing Company Management, Regulatory Agencies

Why is this problem relevant now?
In today’s world, where sustainability and cost-effectiveness are key drivers of success, energy management plays a pivotal role in manufacturing industries.
Rising Energy Costs: As energy costs continue to rise, businesses must cut energy usage to stay competitive. Doing so not only lowers immediate expenses but also boosts long-term profits by reducing risks from unpredictable energy prices.
Global Climate Concerns: Manufacturing accounts for about 20% of greenhouse gas emissions globally. As climate change becomes a pressing issue, addressing energy consumption in manufacturing is crucial. Companies that proactively reduce their environmental impact contribute to global efforts to combat climate change.

Pain Points and Challenges
High Operational Costs: Energy costs are a substantial part of manufacturing expenses. Price fluctuations, inefficient equipment, and wasteful practices can inflate bills, eroding profit margins and market competitiveness.
Environmental Impact: Manufacturing contributes to emissions and environmental harm, primarily through high fossil fuel energy use. This poses regulatory and reputational risks.
Resource Scarcity and Dependency: Heavy reliance on finite resources such as fossil fuels, electricity, and water, together with rising energy demand, can lead to supply chain disruptions, price volatility, and dependency risks, affecting production continuity and cost stability.
Operational Inefficiencies: Inefficient equipment and outdated technologies lead to energy waste in manufacturing. Poor maintenance, training, and monitoring further reduce productivity.
Technological Challenges: Adopting energy-efficient technologies and renewables necessitates considerable investment and confronts technical, compatibility, and implementation hurdles.
Compliance and Regulatory Requirements: Manufacturing sites must adhere to energy regulations, emissions targets, and international agreements, which can be complex, costly, and time-consuming.

Business Impact of the Solution
Cost Reduction: By accurately predicting energy consumption, businesses can optimize energy usage during production processes, reduce energy waste, and lower utility bills, resulting in significant cost savings for the company.
Enhance Market Reputation: Cutting energy use in production reduces emissions, helping companies meet sustainability goals and enhance their eco-friendly reputation.
Operational Efficiency: Understanding energy consumption factors helps businesses identify production inefficiencies, enhancing operational efficiency and maximizing productivity.
Data-Driven Process Improvements: Businesses can make informed decisions about process improvements and equipment upgrades by understanding energy consumption factors.
Risk Management: Energy consumption predictions help businesses anticipate and mitigate risks from shortages or price fluctuations, minimizing disruptions and financial losses.

High Level Description of the Data
Columns | Type | Description
Energy consumption (dependent variable) | Number | The amount of energy used during a specific time period, measured in kilowatt-hours (kWh) or another relevant unit.
Weather conditions (independent variables) | Number/String | Environmental factors such as temperature, humidity, wind speed, and sunlight, which can affect energy usage.
Process Variables (independent variables) | Number/String | Parameters related to the manufacturing process, such as temperature, pressure, speed, etc.
Maintenance activity (independent variables) | Number/String | Records of maintenance activities performed on machinery and equipment, which can impact energy efficiency.
Equipment configuration | Number | Operating status of machinery and equipment (e.g., age, speed, on/off, idle, running at full capacity).
Date & Time | Timestamp | Date and time of the reading.

Architecture Diagram

Overview: Proposed Architecture
Data Source: Manufacturing. Starting point: raw data files from an SFTP server. Contains: energy consumption, temperature, equipment performance.
Ingestion Layer: Amazon S3. Bridge between the manufacturing data source and Amazon S3 storage. Secure transfer and storage of raw data files. Leverages the scalability, durability, and security features of Amazon S3.
Transform Layer: Ensures data quality and prepares data for analysis in Snowpark UDF. Tasks include data type validation, column integrity checks, and typecasting.
Raw Layer: Initial landing zone for ingested data in Snowflake tables. Preserves raw, unprocessed data for access. Central repository for subsequent transformation and analysis.
ML Layer: Conducts EDA, feature engineering, model training, and version control. Enables efficient development and management of ML models within Snowflake.
Consumption Layer: Interface for users to interact with analytical outputs and models. Features a Streamlit application with customizable filters and visualizations.
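The Transform Layer tasks named above (data type validation, column integrity checks, typecasting) can be sketched in plain Python. This is only an illustration, not the team's Snowpark code: the `SCHEMA` mapping, the `validate_and_cast` helper, and the column names `ENERGY_CONSUMED_KWH` and `TEMPERATURE_F` are assumptions made for the example.

```python
# Hypothetical schema: column name -> type to cast to (names are assumptions).
SCHEMA = {
    "ENERGY_CONSUMED_KWH": float,
    "TEMPERATURE_F": float,
    "EQUIPMENT_TYPE": str,
}


def validate_and_cast(raw_row, schema=SCHEMA):
    """Check that all expected columns exist, then cast each value."""
    # Column integrity check: every schema column must be present.
    missing = [col for col in schema if col not in raw_row]
    if missing:
        raise ValueError(f"missing columns: {missing}")

    clean = {}
    for col, caster in schema.items():
        value = raw_row[col]
        if value is None or value == "":
            clean[col] = None  # keep missing cells for later imputation
        else:
            # Type validation and typecasting: float("abc") raises ValueError.
            clean[col] = caster(value)
    return clean


row = validate_and_cast(
    {"ENERGY_CONSUMED_KWH": "12.5", "TEMPERATURE_F": "68", "EQUIPMENT_TYPE": "Lifters"}
)
```

Empty cells are passed through as `None` rather than rejected, so that the missing-value handling described later in the deck still has something to work on.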

Methodology (diagram labels: Stored Procedure, Python UDFs)

Methodology Details
The methodology outlines the key steps involved in our project, including data preparation, model building, training, evaluation, and prediction, to build a predictive model for continuous target variables.
Data Acquisition and Preparation: In our case, we used Python to generate the data so that it is not completely random; the data has some expected correlation between the features and the outcome variable. We used a heatmap to improve data quality iteratively. Missing data is imputed instead of removed.
Feature Engineering and Model Training: Select an appropriate regression algorithm (linear regression, SVR, etc.) based on iterative training and evaluation. In our case, we use PyCaret for preliminary model building and performance evaluation. Create new features to improve the initial performance iteratively. Train the regression model and tune hyperparameters (if any) to maximize performance.
Model Evaluation, Prediction and Explainability: Evaluate the performance of the selected model using the test dataset, with evaluation metrics such as Mean Squared Error (MSE), Root Mean Squared Error (RMSE), and R-squared (R²). Make predictions on unseen data. Interpret the model based on its coefficients.
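As a small illustration of the evaluation metrics named above, the sketch below computes MSE, RMSE and R² by hand. The `regression_metrics` helper and the sample readings are made up for the example; they are not taken from the project's data or code.

```python
import math


def regression_metrics(y_true, y_pred):
    """Return MSE, RMSE and R-squared for paired actual/predicted values."""
    n = len(y_true)
    # Residual sum of squares: squared gaps between actual and predicted.
    ss_res = sum((t - p) ** 2 for t, p in zip(y_true, y_pred))
    mse = ss_res / n
    rmse = math.sqrt(mse)
    # Total sum of squares: spread of the actual values around their mean.
    mean_true = sum(y_true) / n
    ss_tot = sum((t - mean_true) ** 2 for t in y_true)
    r2 = 1 - ss_res / ss_tot
    return {"mse": mse, "rmse": rmse, "r2": r2}


# Hypothetical energy readings (kWh) and model predictions.
actual = [100.0, 150.0, 200.0, 250.0]
predicted = [110.0, 140.0, 210.0, 240.0]
metrics = regression_metrics(actual, predicted)
# Here every prediction is off by 10 kWh, so MSE = 100, RMSE = 10, R² = 0.968.
```

Lower MSE and RMSE mean smaller errors, while an R² closer to 1 means the model explains more of the variance in the actual values.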

Project Overview
Scope of the Project: Considers the energy consumed by equipment used in the assembly line of the automotive industry. The solution is based on historical data collected for each piece of equipment.
Out of Scope of the Project: Excludes energy consumption optimization across the supply chain, such as sourcing, shipping strategies and production scheduling. Excludes forecasting of energy consumption based on the date and time of equipment usage.
Limitations: Generalization: a model for specific production settings wouldn’t generalize well to other production environments. Equipment interactions: the combined effects of multiple equipment configurations on energy consumption will not be captured by the dataset we generated; the readings are for equipment in isolation.
Future Enhancements: Integrate the model with EMS for real-time energy monitoring and optimization. Implement advanced algorithms (deep learning) to capture the complexity of interconnected processes. Factor in the usage of renewable energy to consider the effects of energy sources during modelling.

Data Statistics: Statistics of Raw Data
Total Rows: 4217
Total Columns: 13

Sample Data
Dataset link: https://docs.google.com/spreadsheets/d/1KhgXODYWBIsst_JHe_ikxBpHOA86g4rQP2vgyXDCoiY/edit#gid=1326060398

EDA Findings & Insights

Data Summary
Equipment Age: Std. deviation: 0.9 years. Range: 1 to 5 years. Non-normal distribution, so the standard deviation is NOT indicative of the spread of the data.
Dataset Span: 10 years, Jan 2012 to Dec 2022 (Date column).
Temperature (°F) of equipment: Mean: 68°F. Range: 52.3°F to 86.8°F. Indicates a lack of temperature control in manufacturing.
Missing values in data: Equipment Type and Equipment Age (years) have missing values. We will analyse them further.

Data Summary & Descriptive Statistics
Energy Consumed range: 3.9 - 554.2 kWh. Has no missing values.
The boxplot allows outlier detection: some data points fall outside the whiskers, indicating potential outliers. These points are investigated further on the next slide.

Outlier Analysis: Energy Consumption (kWh)
Outlier Characteristics by Equipment Type: The outlier mean for Electric Screwdriver (493.6) is very large for a small piece of equipment, while the count of readings is close to that of the other equipment types. There may be an error in these readings.
Solution: Clip the values to the minimum and maximum values within the IQR rather than removing them completely. This retains the data while keeping the outliers from distorting it.
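One possible reading of the clipping step is sketched below. It assumes the usual boxplot whisker bounds (Q1 − 1.5·IQR and Q3 + 1.5·IQR); the slide does not state the exact bounds used, and the `clip_to_iqr_fences` helper and sample readings are hypothetical.

```python
import statistics


def clip_to_iqr_fences(values, k=1.5):
    """Clip values to [Q1 - k*IQR, Q3 + k*IQR] instead of dropping them."""
    q1, _, q3 = statistics.quantiles(values, n=4, method="inclusive")
    iqr = q3 - q1
    lower, upper = q1 - k * iqr, q3 + k * iqr
    # Out-of-range readings are pulled back to the nearest fence.
    return [min(max(v, lower), upper) for v in values]


# Hypothetical screwdriver readings (kWh) with one suspicious spike.
readings = [10.0, 12.0, 11.0, 13.0, 12.0, 493.6]
clipped = clip_to_iqr_fences(readings)
```

Every row is kept, so the record count is unchanged; only the extreme value is replaced by the upper fence.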

Data Summary & Descriptive Statistics
Equipment Type frequency: Lifters, Conveyor Belt, Electric Screwdriver, Robotic Arm.
Has 0.9% missing values.
Handling missing values: Since the percentage of missing values is quite low (0.9%), we can use the available data to fit a clustering algorithm and classify the equipment type most likely to belong in the missing cells.

Data Summary & Descriptive Statistics
Has 1.1% missing values. Non-normal distribution. Additional analysis is needed for outlier/anomaly detection; further investigation on the next slide.
Handling missing values:
Approach 1: Since the share of missing values is quite low (1.1%), we can use the dataset itself to model and predict the values for the missing cells.
Approach 2: We can simply remove them, depending on how the model performs in the preliminary stages.
Approach 3: Impute the average age for that equipment type.
We will choose the best approach based on model performance.
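Approach 3 (impute the average age for that equipment type) could look roughly like the sketch below. The row layout, the `type` and `age` keys, and the `impute_age_by_type` helper are assumptions for illustration only.

```python
from collections import defaultdict


def impute_age_by_type(rows):
    """Fill a missing age with the mean age of the same equipment type."""
    totals = defaultdict(lambda: [0.0, 0])  # type -> [sum of ages, count]
    for r in rows:
        if r["age"] is not None:
            totals[r["type"]][0] += r["age"]
            totals[r["type"]][1] += 1
    means = {t: s / c for t, (s, c) in totals.items()}
    # Note: a type with no known ages at all would raise KeyError here.
    return [
        dict(r, age=means[r["type"]]) if r["age"] is None else dict(r)
        for r in rows
    ]


rows = [
    {"type": "Lifters", "age": 2.0},
    {"type": "Lifters", "age": 4.0},
    {"type": "Lifters", "age": None},
    {"type": "Robotic Arm", "age": 1.0},
]
filled = impute_age_by_type(rows)
```

In this toy data the missing Lifters age becomes 3.0, the mean of the two known Lifters ages.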

Outlier Analysis: Equipment Age (years)
Only 342 equipments (2% of equipments) fall in the age group 4-5 years. Based on domain knowledge, equipment gets replaced at the 5-year mark in most US manufacturing, and hence these readings shouldn’t be considered anomalies. The imbalance in representation may degrade model performance.
Solution: We will use an SVM or a tree-based model to ensure this imbalance doesn’t affect our solution, and use an ensemble modelling* technique to mitigate the effects of outliers without having to remove them.
* ensemble modelling: combining multiple models to make predictions, making the result more robust than using a single algorithm.

Data Summary & Descriptive Statistics: Defective Units
Handling zero values: Around 1.4% of the readings indicate ZERO defects in units produced. The mean defect count is 3.58 for the overall dataset. Higher defect counts are less frequent. These values may be treated as regular values for the preliminary model; further processing will be guided by model performance.

Data Summary & Descriptive Statistics
Temperature of equipment has a normal distribution. The boxplot shows a few outliers beyond the farthest whisker. The quantile plot shows a slight skew, which can be reduced with transformations such as a log transformation* if needed.
* log transformation: one of the popular ways to transform a skewed distribution into one closer to a normal distribution.
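To illustrate the footnote on log transformation, the sketch below measures skewness before and after applying `log1p` to a small right-skewed sample. The sample values and the `skewness` helper are made up for the example and are not the temperature data from the deck.

```python
import math


def skewness(values):
    """Moment-based sample skewness: third central moment over std cubed."""
    n = len(values)
    mean = sum(values) / n
    m2 = sum((v - mean) ** 2 for v in values) / n
    m3 = sum((v - mean) ** 3 for v in values) / n
    return m3 / m2 ** 1.5


# Hypothetical right-skewed sample: a long tail of large values.
right_skewed = [1.0, 2.0, 2.0, 3.0, 3.0, 4.0, 20.0, 50.0]
# log1p(x) = log(1 + x); it compresses large values more than small ones.
logged = [math.log1p(v) for v in right_skewed]

skew_before = skewness(right_skewed)
skew_after = skewness(logged)
```

The transform shrinks the right tail, so `skew_after` is closer to zero than `skew_before`; it reduces skew but does not guarantee a normal distribution.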

Correlation & Association Analysis: Defective Units vs Production Output
As the equipment produces more units, the number of defective units tends to increase. Inference: performance goes down with more work. This correlation may introduce collinearity*. Note their correlation coefficient on the next slide.
* collinearity: when two features in the dataset are correlated and cause bias in the model. Ideally, features should be independent of each other.

Correlation and Association Analysis (light green = higher correlation)
Total work time and Production Output have the strongest positive correlation with Energy Consumed, i.e., the longer the equipment works, the more energy it consumes.
Equipment Age and Energy Consumption are loosely correlated (0.17). Hence, older equipment (4-5 years) may not consume more energy, contrary to expectation. This conclusion may not hold for equipment aged beyond 5 years in another manufacturing dataset (generalization limitation).
Defective Units and Production Output have a moderate correlation (0.34). We will keep them in the preliminary model and do feature selection if required.
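The coefficients quoted above are correlation values between pairs of columns. A minimal sketch of the Pearson coefficient is shown below; the `pearson` helper and the sample work-time and energy values are hypothetical and are not drawn from the project dataset.

```python
import math


def pearson(x, y):
    """Pearson correlation coefficient between two equal-length sequences."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    spread_x = math.sqrt(sum((a - mean_x) ** 2 for a in x))
    spread_y = math.sqrt(sum((b - mean_y) ** 2 for b in y))
    return cov / (spread_x * spread_y)


# Hypothetical total work time (hr) and energy consumed (kWh).
work_hours = [1.0, 2.0, 3.0, 4.0, 5.0]
energy_kwh = [12.0, 19.0, 33.0, 41.0, 50.0]
r = pearson(work_hours, energy_kwh)
```

Values near +1 indicate that the two columns rise together, values near 0 indicate little linear relationship, and values near -1 indicate that one falls as the other rises.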

Summary of Key Analysis

Feature Engineering & Model Selection

Visualization & Insights: Energy Consumed by each Equipment Type
All four Equipment Types are equally represented in the data, so there is no imbalance in the representation of Equipment Types. The energy consumed by Electric Screwdriver is the least (3.41%), while the energy consumed by Robotic Paint Arms (36.03%) and Lifters (35.48%) is comparable, as they are large equipment with longer running times.

Visualization & Insights
Each Equipment Type is represented equally in every age group (1-5). This is important to achieve an unbiased model.

Feature Engineering: Treating Null/Missing Values
Impute Missing Values: For EQUIPMENT_TYPE, we use the most frequent category, since it is a categorical variable. For EQUIPMENT_AGE_YEARS, we impute missing values using the median, given its skewed distribution.
Post-imputation, the distribution includes the imputed median values, slightly increasing its central peak. This helps stabilize the dataset for further analysis without significant distortion.
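The mode and median imputation described above can be sketched with the standard library. The column names `EQUIPMENT_TYPE` and `EQUIPMENT_AGE_YEARS` come from the slide; the `impute_missing` helper and the sample rows are assumptions for illustration.

```python
import statistics


def impute_missing(rows):
    """Fill missing type with the mode and missing age with the median."""
    types = [r["EQUIPMENT_TYPE"] for r in rows if r["EQUIPMENT_TYPE"] is not None]
    ages = [r["EQUIPMENT_AGE_YEARS"] for r in rows if r["EQUIPMENT_AGE_YEARS"] is not None]
    fill_type = statistics.mode(types)   # most frequent category
    fill_age = statistics.median(ages)   # median is robust to skew
    filled = []
    for r in rows:
        new = dict(r)  # copy so the input rows stay untouched
        if new["EQUIPMENT_TYPE"] is None:
            new["EQUIPMENT_TYPE"] = fill_type
        if new["EQUIPMENT_AGE_YEARS"] is None:
            new["EQUIPMENT_AGE_YEARS"] = fill_age
        filled.append(new)
    return filled


sample = [
    {"EQUIPMENT_TYPE": "Lifters", "EQUIPMENT_AGE_YEARS": 1.0},
    {"EQUIPMENT_TYPE": "Lifters", "EQUIPMENT_AGE_YEARS": 2.0},
    {"EQUIPMENT_TYPE": None, "EQUIPMENT_AGE_YEARS": 5.0},
    {"EQUIPMENT_TYPE": "Robotic Arm", "EQUIPMENT_AGE_YEARS": None},
]
imputed = impute_missing(sample)
```

In this toy sample the missing type becomes "Lifters" and the missing age becomes 2.0, the median of 1.0, 2.0 and 5.0.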

Feature Engineering: Treating Null/Missing Values, Equipment Age (years)
Approach 1: We imputed the median age for the missing cells and tried out many models, compared below.
* Low error values (MAE, MSE, RMSE) indicate better models.
* High R2 values indicate better models, as more of the variance is captured (better fit).

Feature Engineering: Treating Null/Missing Values, Equipment Age (years)
Approach 2: We also removed the rows with missing values from the data and tried out many models, compared below.
* Low error values (MAE, MSE, RMSE) indicate better models.
* High R2 values indicate better models, as more of the variance is captured (better fit).

Feature Engineering: Treating Null/Missing Values, Equipment Age (years)
Selected approach for null value treatment: Each null value treatment gave comparable model performance, so we went ahead with the imputed missing value approach (1) for ‘Equipment Age (years)’ to retain more information from the dataset.

Feature Engineering: Treating Null/Missing Values, Equipment Type
Performed encoding, feature scaling and PCA before modelling the missing values. We trained an unsupervised model (knn) with k = 4 clusters for the four equipment types. The training and validation sets contain the rows without nulls; the test set contains the rows with nulls. Using the clustering model, we predicted the ‘missing’ values.

Feature Engineering: Categorical Data Conversion (Encoding)
Why Encoding? Most machine learning models accept only numerical features. Encoding ensures we do not unintentionally introduce a pattern into the data by mapping categorical values to 0, 1, 2, and so on. Some tree-based models can accept categorical variables without encoding. The categorical fields in our data are all nominal in nature and have no order associated with their values.

Feature Engineering: Categorical Data Conversion, Encoding the Features
Encoding was performed before we compared various ML models (model selection), as encoded features are accepted by any model. Each categorical field is split into separate features.
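Splitting each categorical field into separate features is commonly called one-hot encoding. The sketch below shows the idea in plain Python; the `one_hot` helper, the `ENERGY_KWH` column and the sample rows are assumptions for illustration, not the project's code.

```python
def one_hot(rows, column):
    """Replace one categorical column with a 0/1 column per category."""
    categories = sorted({r[column] for r in rows})
    encoded = []
    for r in rows:
        # Keep every other column unchanged.
        new = {k: v for k, v in r.items() if k != column}
        for cat in categories:
            new[f"{column}_{cat}"] = 1 if r[column] == cat else 0
        encoded.append(new)
    return encoded


rows = [
    {"EQUIPMENT_TYPE": "Lifters", "ENERGY_KWH": 120.0},
    {"EQUIPMENT_TYPE": "Robotic Arm", "ENERGY_KWH": 300.0},
]
encoded = one_hot(rows, "EQUIPMENT_TYPE")
```

Because each category gets its own 0/1 column, no artificial ordering such as Lifters < Robotic Arm is introduced for nominal fields.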

Feature Engineering: Outlier Handling, Equipment Age (years)
The outlier group for Equipment Age between 4-5 years is treated as a minority class, and SMOTE* is applied to treat the imbalance. Data is not removed, as Equipment Age of 4-5 years is not an outlier and needs to be represented in the data.
Charts: before oversampling data with SMOTE; after oversampling data with SMOTE.
* SMOTE: a method of oversampling the minority class to obtain a uniform distribution of all classes.
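The core idea behind SMOTE is to create synthetic minority samples by interpolating between a minority point and one of its nearest minority neighbours. The sketch below is a deliberately simplified stand-in, not the library implementation the team may have used; the `smote_sketch` helper, its parameters and the (age, energy) points are hypothetical.

```python
import math
import random


def smote_sketch(minority, n_new, k=2, seed=0):
    """Create n_new synthetic points between minority points and neighbours."""
    rng = random.Random(seed)
    synthetic = []
    for _ in range(n_new):
        base = rng.choice(minority)
        # The k nearest other minority points by Euclidean distance.
        neighbours = sorted(
            (p for p in minority if p is not base),
            key=lambda p: math.dist(base, p),
        )[:k]
        other = rng.choice(neighbours)
        gap = rng.random()  # position along the segment base -> other
        synthetic.append(tuple(b + gap * (o - b) for b, o in zip(base, other)))
    return synthetic


# Hypothetical minority rows: (equipment age in years, energy in kWh).
minority = [(4.1, 210.0), (4.5, 230.0), (4.8, 250.0), (5.0, 260.0)]
new_points = smote_sketch(minority, n_new=6)
```

Each synthetic point lies on a line segment between two real minority points, so the new ages stay inside the 4-5 year range of the original minority rows.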

Feature Engineering: Outlier Handling
Compared: models trained after the SMOTE method on Equipment Age (years), and models trained without SMOTE on Equipment Age (years).
In both cases, the tree-based ensemble method ‘LGBM’ was the best model. Since it is tree-based and robust to outliers, it performs better even without oversampling.

Feature Engineering: Creating New Features Based on Business Use Case
Throughput = productivity per unit time = Non-defective Production Output / Total work time
Cycle Time (hr) = the time equipment takes to process one unit of work = Total work time / Production Output
Days since maintained = day difference between maintenance and the record date = datediff(day, Last Maintenance Date, DATE)
The line plot illustrates changes in throughput and cycle time as the days since the last maintenance increase. A common trend might show decreasing throughput or increasing cycle time if maintenance is delayed, indicating potential wear and inefficiency in equipment.
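The three engineered features above translate directly into arithmetic. In the sketch below, the record keys and the `engineer_features` helper are hypothetical, and non-defective output is assumed to be production output minus defective units, which the slide does not spell out.

```python
from datetime import date


def engineer_features(record):
    """Compute throughput, cycle time and days since maintenance for a row."""
    # Assumption: non-defective output = production output - defective units.
    good_units = record["production_output"] - record["defective_units"]
    return {
        "throughput": good_units / record["total_work_time_hr"],
        "cycle_time_hr": record["total_work_time_hr"] / record["production_output"],
        "days_since_maintained": (
            record["date"] - record["last_maintenance_date"]
        ).days,
    }


# Hypothetical reading for one piece of equipment.
record = {
    "production_output": 100,
    "defective_units": 4,
    "total_work_time_hr": 8.0,
    "date": date(2022, 3, 15),
    "last_maintenance_date": date(2022, 3, 1),
}
features = engineer_features(record)
# 96 good units over 8 hours gives a throughput of 12 units per hour.
```

Subtracting two `date` objects yields a `timedelta`, whose `.days` attribute plays the role of the `datediff(day, ...)` expression on the slide.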

Feature Selection
Due to the multicollinearity observed to some degree in the correlation analysis, we attempted feature selection to remove any biases in the model.
RFE selection (Wrapper Method): a robust technique for feature selection that gave the following model results.
Recursive Feature Elimination Process: Start with all the features in the model. Test each feature and drop the one that contributes least to model performance. Repeat the process until no significant improvement is observed.

Model Selection & Development
Used PyCaret to accelerate model selection on the final features, with the following top regression models:
* Low error values (MAE, MSE, RMSE) indicate better models.
* High R2 values indicate better models, as more of the variance is captured (better fit).
LGBM was the best model with an R2 of 97%, which means that the model was able to capture almost 97% of the variance in the data. It is also low in computational cost, with a training time (TT sec) of 0.28 seconds.

Streamlit App Sneak Peek

Landing Page (1)

Landing Page (2)

Five Essential Tabs
The exploration page contains five essential tabs: EDA, Model Evaluation, Inference, Bulk Inference, Model Bias. The EDA tab has the following expandable sections for ease of use. These are covered in the next slides.

Five Essential Tabs: EDA
The EDA tab contains sample data: it gives a quick glance at what the data looks like. The Data Dictionary is available to understand the data better. Data Summary is an expandable section giving more information on the data: data types, missing values, statistical summary, and a grid showing how the data distribution looks for each column.

Five Essential Tabs: EDA
Aggregation Tabulation describes the data in terms of data types, missing values, count summary, and more at a single glance. Statistical Summary describes the data using the three measures of central tendency and the values at different quantiles (25%, 50%, 75%, and so on).

Five Essential Tabs: EDA
The Generated Features section describes the features calculated as part of feature engineering. The scatterplot shows each feature’s relationship with the output variable, energy consumption.

Univariate Analysis: Categorical and Numerical
The app allows column selection for analysis of categorical variables as a pie chart distribution, and column selection for analysis of numerical variables as a box plot.

Multivariate Analysis: All Features at a single glance
The app shows the correlation between all features as a heatmap (matrix), and a pairplot between all features as a scatterplot (matrix).

Model Evaluation

Model Inference
Singular Point Inference
Bulk Inference with save-to-table feature

Model Inference: Bulk Inference, Analysing the solution
The app allows users to view the predicted Energy Consumption in an interactive chart, broken down by Equipment Type and Date of record. Users can drill down to a particular record to understand what parameters were provided for a predicted value (such as a spike in predicted energy consumption). A few suggestions for possible next steps are provided after such analysis.

Model Bias
Negligible bias was found in the model, which means it exhibits fairness and can be used in production.

Future Scope

SonarLint Report
SonarLint is an extension for maintaining code quality, which we used in our development.
SonarLint Report: SonarLint by Sonar is a free IDE extension that empowers you to fix coding issues before they exist. More than a linter, SonarLint detects and highlights issues that can lead to bugs, vulnerabilities, and code smells as you create your code.
SonarLint Report After Resolving Issues:

Feedback Incorporated

Feedback Incorporated
Presentation Improvements: Added the source of the article cited in defining the Business Problem, which included facts on energy consumption cost in the manufacturing industry. Modified the methodology diagram to include the Snowflake Stored Procedure for training and the Snowpark UDF used for inference in the implementation.
Streamlit App Improvements: Added a Feature Importance chart for the Bayesian Ridge Model in the Point Inference tab of the app, which explains how to interpret the point inference given the feature inputs provided by the user. Added enhancements to data as a Future Scope, as per the judge’s feedback from week 2 to add features on the equipment’s operating parameters.