Internship Review Presentation submission

AryanRajesh12 43 views 21 slides Aug 06, 2024



Slide Content

BCSE399J (Summer Industrial Internship)/
CBS1902/CSE1902/ CSI3903 (Industrial Internship)
Type of Intern: Industry
Title of Intern: Machine Learning
Internship completed at:
i-RESEAT Conference,
Best Western Nada Don Mueang Airport Hotel,
Bangkok, Thailand, i-RESEAT, [email protected]
Duration: 30 Days: 21/11/2023 to 20/12/2023
Date of Presentation: 04/05/2024
by
Aryan Rajesh
20BCE0718
School of Computer Science and Engineering
Internship Review - Presentation

Overview (what was first discovered and
why it is a breakthrough)
Housing data from Kaggle is used to build prediction models. The
data typically includes attributes such as square footage, location, and number
of bedrooms. Location data requires special preprocessing because it
has an outsized impact on prices; techniques like one-hot encoding
for neighborhoods are common. Identifying and removing outliers
is also very important. Predictive accuracy in the 70-80% range on
unseen test data is considered quite good. Model interpretability
is also important for understanding which factors are most influential.
Sometimes insights are extracted as well, e.g. ranking locations by average
price per square foot. These add business value beyond
predictions alone.
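
The two ideas above, one-hot encoding of neighborhoods and ranking locations by average price per square foot, can be sketched with pandas. This is a minimal illustration on made-up rows; the column names (`location`, `total_sqft`, `bedrooms`, `price_lakh`) and all values are assumptions, not taken from the project's dataset.

```python
import pandas as pd

# Toy data: column names and values are illustrative assumptions.
df = pd.DataFrame({
    "location": ["Whitefield", "Indiranagar", "Whitefield", "Hebbal", "Indiranagar"],
    "total_sqft": [1200.0, 900.0, 1500.0, 1000.0, 1100.0],
    "bedrooms": [2, 2, 3, 2, 3],
    "price_lakh": [60.0, 99.0, 78.0, 55.0, 132.0],
})

# Price per square foot in rupees (1 lakh = 100,000 rupees).
df["price_per_sqft"] = df["price_lakh"] * 100_000 / df["total_sqft"]

# Rank locations by average price per square foot, most expensive first.
ranking = (
    df.groupby("location")["price_per_sqft"]
    .mean()
    .sort_values(ascending=False)
)

# One-hot encode the location column so a regression model can consume it.
encoded = pd.get_dummies(df, columns=["location"], prefix="loc")

print(ranking)
print(encoded.columns.tolist())
```

With many distinct localities, rare locations are often grouped into a single "other" bucket before encoding to keep the number of columns manageable.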

Overview (what was first discovered and
why it is a breakthrough)
This study addresses house price prediction in Bengaluru using linear
and multiple regression techniques. Utilizing a dataset of 1298 unique
localities, the research focuses on forecasting land prices in the
Bengaluru Metropolitan Area (BMA) in Karnataka, India. Beyond the
House Price Index (HPI), factors such as area type, availability, location,
society, and apartment size are considered. The goal is to predict the
price per square foot for apartments. In metropolitan cities like
Bengaluru, determining accurate sales prices remains challenging,
making predictive modeling crucial for real estate decision-making. The
models aim to capture the complex interplay of these factors in
influencing individual house prices in the dynamic real estate market of
Bengaluru.

Introduction

Understanding House Price Prediction
Introduction:
•Houses are essential for shelter and livelihood, influencing economic, financial, and political structures.
•Fluctuations in house prices pose challenges for stakeholders, including buyers and investors.
Importance of Data:
•Real estate data analysis aids in predicting market variations and mitigating future losses.
•Accurate prediction models are crucial for real estate businesses to make informed decisions.
Prediction Models:
•Support Vector Regression, Artificial Neural Network, and Bayesian Classifier are commonly used for house price
prediction.
•These models help stakeholders determine property valuations and make budget-based decisions.

Factors Influencing House Prices
Key Considerations for Home Buyers:
•Location, property size, proximity to amenities, and noise pollution are critical factors.
•Factors like air quality and noise pollution significantly impact property prices.
Market Trends:
•Bengaluru, as a real estate hotspot, has seen shifts in demand and price due to factors like COVID-19
and regulatory changes.
•Trust issues with property developers have affected sales and prices in cities like Bengaluru.
Machine Learning Approach:
•Bayesian Classifier, a supervised learning technique, is utilized for predictive analysis based on various
factors.

House Price Prediction Model Overview
Utilization of Technology:
•Leveraging data from trusted sources and employing machine learning algorithms for prediction.
•Supervised learning techniques like Bayesian Classifier are used for accurate predictions.
Application Across Industries:
•Predictive models find applications in economics, banking, healthcare, e-commerce, and more.
•Algorithms like KNN, Decision Tree, and Regression techniques are employed based on data characteristics
and requirements.
Ensuring Integrity:
•The model aims to provide the best predictions based on gathered data, maintaining system integrity and
user trust.

Aim/Objective
The primary aim of the project was to develop an accurate and reliable
model for predicting house prices in Bengaluru, India. The specific
objectives were:
•To explore and analyze the factors that influence house prices in
Bengaluru.
•To preprocess and clean the dataset to prepare it for modeling.
•To implement and evaluate various machine learning algorithms for
house price prediction.
•To identify the best-performing model and deploy it for practical usage.

Motivation
•The motivation behind this project was driven by the growing
demand for housing in metropolitan cities like Bengaluru and the
need for reliable tools to assist home buyers, investors, and real
estate developers in making informed decisions.
•Accurate house price prediction can help stakeholders determine
fair property valuations, identify overpriced or underpriced
properties, and make sound investment choices.

About the Industry
The 5th International Conference on Renewable Energy, Sustainable Environmental and
Agricultural and Artificial Intelligence Technologies (i-RESEAT-2023) is a Hybrid Mode
Conference being organized by Thammasat University (Pathum Thani City, Thailand) and
co-organized with Maejo University (Chiang Mai City, Thailand), Kaohsiung Medical University
(Kaohsiung City, Taiwan), University of Stavanger (Stavanger, Norway), and other supporting
partner universities/institutions around the world. This conference aims to be the premier
forum for presenting new breakthroughs and research results in the theoretical, experimental,
and practical domains of Energy, Environment, and Agriculture Innovations and Technologies.
It also serves as an excellent international forum for researchers, practitioners, industries and
educators to present and discuss the most recent innovations, trends, and concerns, as well
as the practical challenges encountered and the solutions implemented. The conference will bring
together world-renowned researchers, engineers, and scientists from around the world in this
field of interest. The main theme of the i-RESEAT-2023 conference is “Go-Green, Go-Eco,
Go-Smart Agri-Tech, and Go-BCG”.

Certificate

Skills Acquired during Industrial Internship Period
During the industrial internship, I have gained valuable skills in various domains, which can be categorized into
the following sections:
Data Preprocessing and Exploration using Pandas
Handling Missing Data
•Identifying and addressing missing values in the dataset
•Employing techniques like imputation or removal of missing data
Dealing with Outliers
•Detecting and handling outliers in the data
•Using appropriate methods like winsorization or removal of extreme values
Data Transformation
•Converting data into suitable formats for modeling
•Techniques like normalization, scaling, and encoding categorical variables
Exploratory Data Analysis (EDA)
•Visualizing data using various plots (scatter plots, bar plots, histograms)
•Analyzing patterns, trends, and relationships between variables
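
As a rough sketch of the missing-data and outlier steps listed above, the snippet below imputes missing values with the column median and then either winsorizes or removes extreme values. The column names, the toy values, and the 5th/95th percentile cut-offs are assumptions chosen for illustration.

```python
import numpy as np
import pandas as pd

# Toy data with two missing values and one extreme area (illustrative only).
df = pd.DataFrame({
    "total_sqft": [1000.0, 1200.0, np.nan, 1100.0, 900.0,
                   1300.0, 1150.0, 1050.0, 950.0, 30000.0],
    "bath": [2.0, 2.0, 3.0, np.nan, 1.0, 3.0, 2.0, 2.0, 1.0, 2.0],
})

# Imputation: fill each missing value with its column's median.
df_imputed = df.fillna(df.median(numeric_only=True))

# Winsorization: cap values at the nearest observed value inside the
# 5th-95th percentile range instead of dropping the rows.
lower = df_imputed["total_sqft"].quantile(0.05, interpolation="higher")
upper = df_imputed["total_sqft"].quantile(0.95, interpolation="lower")
df_imputed["total_sqft_wins"] = df_imputed["total_sqft"].clip(lower=lower, upper=upper)

# Alternative: remove the rows that fall outside the same range.
df_removed = df_imputed[df_imputed["total_sqft"].between(lower, upper)]

print(df_imputed[["total_sqft", "total_sqft_wins"]])
print(len(df_removed), "rows kept after removal")
```

Winsorizing keeps every row, which matters for small datasets; removal is simpler when the extreme rows are likely data-entry errors.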

Feature Engineering and Selection
Relevant Feature Identification
•Determining the most influential features for house price prediction
•Employing techniques like correlation analysis and feature importance
Feature Creation
•Generating new features from existing ones
•Combining or transforming features to capture additional information
Feature Scaling and Encoding
•Scaling numerical features for improved model performance
•Encoding categorical features for use in machine learning algorithms
Machine Learning Modeling
Supervised Learning Algorithms
•Linear Regression
•Decision Trees
•Lasso Regression
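
One possible way to combine the scaling, encoding, and modeling steps above is a scikit-learn pipeline. The sketch below fits the three listed algorithms on synthetic data; the feature names (`location`, `total_sqft`, `bhk`), the price formula, and the hyperparameter values are assumptions for illustration, not the project's actual setup.

```python
import numpy as np
import pandas as pd
from sklearn.base import clone
from sklearn.compose import ColumnTransformer
from sklearn.linear_model import Lasso, LinearRegression
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import OneHotEncoder, StandardScaler
from sklearn.tree import DecisionTreeRegressor

# Synthetic data standing in for the housing dataset (assumed columns).
rng = np.random.default_rng(0)
n = 200
df = pd.DataFrame({
    "location": rng.choice(["A", "B", "C"], size=n),
    "total_sqft": rng.uniform(500, 3000, size=n),
    "bhk": rng.integers(1, 5, size=n),
})
premium = df["location"].map({"A": 0.0, "B": 20.0, "C": 50.0})
price = 0.05 * df["total_sqft"] + 5.0 * df["bhk"] + premium + rng.normal(0, 2, size=n)

# Scale numeric features and one-hot encode the categorical one.
preprocess = ColumnTransformer([
    ("num", StandardScaler(), ["total_sqft", "bhk"]),
    ("cat", OneHotEncoder(handle_unknown="ignore"), ["location"]),
])

models = {
    "linear": LinearRegression(),
    "lasso": Lasso(alpha=0.1),
    "tree": DecisionTreeRegressor(max_depth=5, random_state=0),
}

# Fit one pipeline per model, each with its own copy of the preprocessing step.
fitted = {}
for name, estimator in models.items():
    pipe = Pipeline([("prep", clone(preprocess)), ("model", estimator)])
    fitted[name] = pipe.fit(df, price)

for name, pipe in fitted.items():
    print(name, round(pipe.score(df, price), 3))  # R-squared on the training data
```

Keeping preprocessing inside the pipeline means the same transformations are applied automatically at prediction time.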

Model Evaluation Metrics
•Mean Squared Error (MSE)
•R-squared
•Root Mean Squared Error (RMSE)
Hyperparameter Tuning
•Techniques like GridSearchCV for optimizing model parameters
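
The three metrics listed above can be computed with scikit-learn as in the short sketch below. The true and predicted prices are made-up numbers used only to show the calls.

```python
import numpy as np
from sklearn.metrics import mean_squared_error, r2_score

# Made-up true and predicted prices, for illustration only.
y_true = np.array([50.0, 80.0, 120.0, 65.0])
y_pred = np.array([55.0, 75.0, 110.0, 70.0])

mse = mean_squared_error(y_true, y_pred)  # mean of squared errors
rmse = np.sqrt(mse)                       # same units as the target
r2 = r2_score(y_true, y_pred)             # share of variance explained

print(f"MSE={mse:.2f}  RMSE={rmse:.2f}  R2={r2:.3f}")
```

RMSE is usually the easiest of the three to communicate, since it is expressed in the same units as the predicted price.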
Web Development and Deployment
Flask Framework
•Building a web application using the Python Flask framework
•Integrating the trained machine learning model for user interaction
HTML, CSS and Web Design
•Creating user-friendly interfaces for input and output display
•Enhancing the website's visual appeal and usability

New Technologies/Frameworks/Real-time
Problems/Analysis-based Knowledge Acquired
Introduction to Machine Learning Frameworks
Scikit-learn Library
•Utilizing the powerful Scikit-learn library in Python
•Implementing various machine learning algorithms and preprocessing
techniques
TensorFlow or PyTorch (if applicable)
•Exposure to deep learning frameworks like TensorFlow or PyTorch
•Understanding the potential of deep learning for complex problems

Real-world Problem: House Price Prediction
Understanding the Importance
•Recognizing the significance of accurate house price prediction
•Implications for stakeholders (buyers, sellers, investors, developers)
Influencing Factors
•Identifying key factors that impact house prices
•Location, area, number of rooms, amenities, neighborhood characteristics
Data Acquisition and Preprocessing
•Sourcing relevant datasets for house price prediction
•Cleaning, transforming, and preparing data for modeling

Exploratory Data Analysis and Visualization
Univariate Analysis
•Analyzing the distribution and characteristics of individual features
•Identifying outliers, skewness, and central tendencies
Bivariate Analysis
•Exploring relationships between pairs of features
•Scatter plots, correlation matrices, and other visualizations
Multivariate Analysis
•Investigating interactions among multiple features
•Techniques like principal component analysis (PCA) or t-SNE
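
A compact sketch of the three levels of analysis above, using pandas and scikit-learn on synthetic data. The column names and the relationships between them are assumptions; plots are omitted so the example runs without a display.

```python
import numpy as np
import pandas as pd
from sklearn.decomposition import PCA
from sklearn.preprocessing import StandardScaler

# Synthetic features standing in for the housing data (assumed columns).
rng = np.random.default_rng(1)
sqft = rng.uniform(500, 3000, size=300)
df = pd.DataFrame({
    "total_sqft": sqft,
    "bath": np.round(sqft / 700 + rng.normal(0, 0.3, size=300)),
    "price": 0.05 * sqft + rng.normal(0, 5, size=300),
})

# Univariate: central tendency, spread, and skewness of each feature.
summary = df.describe()
skewness = df.skew()

# Bivariate: pairwise Pearson correlations.
corr = df.corr()

# Multivariate: PCA on standardized features, keeping two components.
pca = PCA(n_components=2)
components = pca.fit_transform(StandardScaler().fit_transform(df))

print(summary.loc[["mean", "std"]])
print(corr.round(2))
print(pca.explained_variance_ratio_.round(3))
```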

Model Evaluation and Interpretation
Evaluation Metrics
•Understanding and interpreting evaluation metrics, and using K-Fold Cross Validation, which evaluates a
predictive model's performance by splitting a dataset into folds, and training and evaluating the
model on each subset
•Selecting appropriate metrics based on the problem and data characteristics, and tuning model parameters such as
copy_X and n_jobs for linear regression, alpha and selection for Lasso Regression, and criterion and splitter for Decision Tree
Model Comparison
•Comparing the performance of different machine learning models to obtain the most accurate predictions
•Using GridSearchCV to find the best model among those tried, based on its cross-validated score
Feature Importance
•Determining the relative importance of features
•Techniques like coefficients (linear models) or feature importances (tree-based models)

Web Application Development and Deployment
Front-end Development
•Creating user-friendly interfaces using HTML, CSS, and JavaScript
•Designing intuitive layouts and interactive elements
Back-end Integration
•Integrating the trained machine learning model with the web application using Flask framework
•Handling user inputs and generating predictions
Deployment Strategies
•Running the web application locally on localhost
•Ensuring scalability, security, and reliability
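
A minimal sketch of the back-end integration described above, assuming a JSON endpoint. The route name `/predict`, the input fields `total_sqft` and `bhk`, and the `predict_price` function are hypothetical; in particular, `predict_price` is a stand-in with made-up coefficients where the trained model would be loaded and called.

```python
from flask import Flask, jsonify, request

app = Flask(__name__)


def predict_price(total_sqft: float, bhk: int) -> float:
    """Hypothetical stand-in for the trained model; coefficients are made up."""
    return 0.05 * total_sqft + 5.0 * bhk


@app.route("/predict", methods=["POST"])
def predict():
    # Read the JSON body; fall back to an empty dict if it is missing or invalid.
    data = request.get_json(silent=True) or {}
    try:
        total_sqft = float(data["total_sqft"])
        bhk = int(data["bhk"])
    except (KeyError, TypeError, ValueError):
        return jsonify(error="total_sqft and bhk must be provided as numbers"), 400
    return jsonify(predicted_price=predict_price(total_sqft, bhk))


# To serve on localhost, save this as app.py and run: flask --app app run
```

The endpoint can be exercised without starting a server by using Flask's built-in test client, e.g. `app.test_client().post("/predict", json={"total_sqft": 1000, "bhk": 2})`.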

Conclusion
The industrial internship provided a valuable opportunity to apply
theoretical concepts to a real-world problem of house price prediction in
Bengaluru. Through hands-on experience, I have acquired practical skills
in data preprocessing, exploratory data analysis, feature engineering,
and implementing machine learning algorithms like linear regression,
decision trees, and lasso regression.
A key achievement was the development of an accurate linear regression
model for predicting house prices, achieving an impressive 85%
accuracy. This model was successfully deployed as a user-friendly web
application using the Flask framework, allowing users to input relevant
parameters and obtain predicted house prices.

The internship also exposed me to the latest technologies and frameworks
in the field of data analysis and machine learning, such as scikit-learn, data
visualization techniques, and web development tools like HTML, CSS, and
JavaScript.
Overall, the industrial internship proved to be an enriching experience,
enabling me to gain practical skills, industry exposure, and a deeper
understanding of the real estate domain. The knowledge and expertise
acquired during this internship will serve as a strong foundation for future
endeavors in the field of data science and machine learning.