AIRLINE FARE PREDICTION USING MACHINE LEARNING

Uploaded by spub1985, Feb 23, 2025


A MAJOR PROJECT REPORT
ON
“AIRLINE FARE PREDICTION USING MACHINE LEARNING”
Submitted to

SRI INDU COLLEGE OF ENGINEERING & TECHNOLOGY, HYDERABAD
In partial fulfillment of the requirements for the award of degree of
BACHELOR OF TECHNOLOGY
In
COMPUTER SCIENCE AND ENGINEERING

Submitted by
J. PAVITHRA [20D41A05N8]
D. SHIVA [20D41A05N2]
S. AJAYREDDY [20D41A05Q0]
SHUBHANKAR HALDAR [20D41A05M7]

Under the esteemed guidance of
Mr. SNVASRK PRASAD
(Assistant Professor)


DEPARTMENT OF COMPUTER SCIENCE AND ENGINEERING

SRI INDU COLLEGE OF ENGINEERING AND TECHNOLOGY
(An Autonomous Institution under UGC, Accredited by NBA, Affiliated to JNTUH)
Sheriguda (V), Ibrahimpatnam (M), Rangareddy Dist – 501 510
(2023-2024)

SRI INDU COLLEGE OF ENGINEERING AND TECHNOLOGY
(An Autonomous Institution under UGC, Accredited by NBA, Affiliated to JNTUH)

DEPARTMENT OF COMPUTER SCIENCE & ENGINEERING


CERTIFICATE


Certified that the Major project entitled “AIRLINE FARE PREDICTION USING MACHINE LEARNING” is a bona fide work carried out by J. PAVITHRA [20D41A05N8], D. SHIVA [20D41A05N2], S. AJAYREDDY [20D41A05Q0], and SHUBHANKAR HALDAR [20D41A05M7] in partial fulfillment of the requirements for the award of the degree of Bachelor of Technology in Computer Science and Engineering of SICET, Hyderabad, for the academic year 2023-2024. The project has been approved as it satisfies the academic requirements in respect of the work prescribed for the IV Year, II Semester of the B. Tech course.



INTERNAL GUIDE HEAD OF THE DEPARTMENT

(Mr. SNVASRK PRASAD) (Prof. Ch. GVN. Prasad)
(Assistant Professor)




EXTERNAL EXAMINER

ACKNOWLEDGEMENT

The satisfaction that accompanies the successful completion of a task would be incomplete without mentioning the people who made it possible, whose constant guidance and encouragement crowned all our efforts with success. We are thankful to our Principal, Dr. G. SURESH, for giving us permission to carry out this project. We are highly indebted to Prof. Ch. GVN. Prasad, Head of the Department of Computer Science and Engineering, for providing the necessary infrastructure and labs, and for his valuable guidance at every stage of this project. We are grateful to our internal project guide, Mr. SNVASRK PRASAD, Assistant Professor, for his constant motivation and guidance during the execution of this project work. We would like to thank the teaching and non-teaching staff of the Department of Computer Science and Engineering for sharing their knowledge with us. Last but not least, we express our sincere thanks to everyone who helped, directly or indirectly, in the completion of this project.







J. PAVITHRA [20D41A05N8]
D. SHIVA [20D41A05N2]
S. AJAYREDDY [20D41A05Q0]
SHUBHANKAR HALDAR [20D41A05M7]

ABSTRACT

This paper addresses the problem of airfare prediction. For this purpose, a set of characteristics defining a typical flight is chosen, on the assumption that these characteristics influence the price of an airline ticket. Flight ticket prices fluctuate depending on parameters such as the flight schedule, destination, and duration, and on occasions such as vacations or the holiday season. As a result, a basic understanding of flight prices before booking a trip can save many people money and time. We analyze three datasets to gain insights into airline fares, apply their features to seven different machine learning (ML) models that predict airline ticket prices, and compare the models' performance. The goal is to investigate the factors that determine the cost of a flight; the findings can then be used to build a system that predicts flight prices.


CONTENTS

S.No. Chapters

i. List of Contents
ii. List of Figures
iii. List of Screenshots
1. INTRODUCTION
1.1 INTRODUCTION TO PROJECT
1.2 LITERATURE SURVEY
1.3 MODULES
2. SYSTEM ANALYSIS
2.1 EXISTING SYSTEM & ITS DISADVANTAGES
2.2 PROPOSED SYSTEM & ITS ADVANTAGES
2.3 SYSTEM REQUIREMENTS
3. SYSTEM STUDY
3.1 FEASIBILITY STUDY
4. SYSTEM DESIGN
4.1 ARCHITECTURE
4.2 UML DIAGRAMS
4.2.1 USE CASE DIAGRAM
4.2.2 CLASS DIAGRAM
4.2.3 SEQUENCE DIAGRAM
4.2.4 ACTIVITY DIAGRAM
4.2.5 DEPLOYMENT DIAGRAM
5. TECHNOLOGIES USED
5.1 WHAT IS PYTHON
5.1.1 ADVANTAGES & DISADVANTAGES OF PYTHON
5.1.2 HISTORY
5.2 WHAT IS MACHINE LEARNING?
5.2.1 CATEGORIES OF ML
5.2.2 NEED FOR ML
5.2.3 CHALLENGES IN ML
5.2.4 APPLICATIONS
5.2.5 HOW TO START LEARNING ML?
5.2.6 ADVANTAGES & DISADVANTAGES OF ML
5.3 PYTHON DEVELOPMENT STEPS
5.4 MODULES USED IN PYTHON
5.5 INSTALL PYTHON STEP BY STEP IN WINDOWS & MAC
6. IMPLEMENTATION
6.1 SOFTWARE ENVIRONMENT
6.1.1 PYTHON
6.1.2 SAMPLE CODE
7. SYSTEM TESTING
7.1 INTRODUCTION TO TESTING
7.2 TESTING STRATEGIES
8. SCREENSHOTS
9. CONCLUSION
10. REFERENCES


LIST OF FIGURES


Fig No  Name
4.1.1  System Architecture
4.2.1  Use Case Diagram
4.2.2  Class Diagram
4.2.3  Sequence Diagram
4.2.4  Activity Diagram
4.2.5  Deployment Diagram


LIST OF SCREENSHOTS

Fig No  Name
8.1   Home Page
8.2   Admin Page
8.3   Register Page
8.4   User Page
8.5   Contact Information
8.6   Dashboard
8.7   Uploading Data
8.8   Data Formed
8.9   Graph Analysis
8.10  About Page
8.11  About Page Success
8.12  Predicted Page
8.13  User Profile


1. INTRODUCTION

1.1 Introduction: In today's world, airlines attempt to control flight ticket prices in order to maximize profits. Most people who fly regularly know the best times to buy cheap tickets. However, many customers who are less experienced at booking tickets fall into discount traps set by the airlines and end up overspending. The main goal of airline companies is to make a profit, while the customer is looking for the best deal. Customers frequently try to purchase tickets well in advance of the departure date to avoid the price increases that occur as the departure date approaches. Because the fare models used by airlines are highly complex, prices fluctuate constantly, which makes it very difficult for a customer to buy a ticket at a low price. Airlines may lower their ticket prices when they need to build a market, and adjust them when tickets become harder to obtain. These tactics take into account a number of financial, marketing, commercial, and social factors that are all linked to the final flight price, with the aim of maximizing profit. As a result, fares may be influenced by many different factors. Research on airfares from both the customer and the airline perspectives has grown steadily over the last two decades. From a customer's point of view, an important question is what counts as a low price, or when is a good time to buy a ticket. In this paper, we use data collected from three different sources to build models with machine learning algorithms. Using the proposed method, customers could get the information they need to book tickets at the right moment, potentially saving millions of rupees in total.


1.2 Literature Survey

TITLE: "Robust Dynamic Pricing With Strategic Customers"
ABSTRACT: We consider the canonical revenue management (RM) problem wherein a
seller must sell an inventory of some product over a finite horizon via an anonymous, posted
price mechanism. Unlike typical models in RM, we assume that customers are forward
looking. In particular, customers arrive randomly over time and strategize about their times of
purchases. The private valuations of these customers decay over time and the customers incur
monitoring costs; both the rates of decay and these monitoring costs are private information.
This setting has resisted the design of optimal dynamic mechanisms heretofore. Optimal
pricing schemes—an almost necessary mechanism format for practical RM considerations—
have been similarly elusive.

TITLE: "Airline ticket price and demand prediction: A survey"
ABSTRACT: Nowadays, airline ticket prices can vary dynamically and significantly for
the same flight, even for nearby seats within the same cabin. Customers are seeking to get the
lowest price while airlines are trying to keep their overall revenue as high as possible and
maximize their profit. Airlines use various kinds of computational techniques to increase their
revenue such as demand prediction and price discrimination. From the customer side, two
kinds of models are proposed by different researchers to save money for customers: models
that predict the optimal time to buy a ticket and models that predict the minimum ticket price.
In this paper, we present a review of customer-side and airline-side prediction models. Our
review analysis shows that models on both sides rely on a limited set of features such as
historical ticket price data, ticket purchase date and departure date. Features extracted from
external factors such as social media data and search engine queries are not considered.
Therefore, we introduce and discuss the concept of using social media data for ticket/demand
prediction.


TITLE: "Data-driven Modeling of Airlines Pricing"
ABSTRACT: The popularity of travelling by airplane is constantly growing. Much of the
existing research describes the global flight market. At the same time, the Russian air market is
characterized by peculiarities that have to be identified to build proper models of airfare.
The objective of this study is to analyze the Russian air transportation market and compare the
behavior of prices on local and global flights. Using data collected from two
independent ticket price information aggregators (AviaSales and Sabre) for the period of
spring-summer 2015, an empirical data-driven model was built for air price prediction for
different flight directions. We found that the form of price dependency on purchase earliness
differs dramatically between local and international flights in the two largest Russian cities
(Moscow and Saint-Petersburg).
TITLE: "Airfare prices prediction using machine learning techniques"
ABSTRACT: This paper deals with the problem of airfare price prediction. For this
purpose, a set of features characterizing a typical flight is selected, supposing that these features
affect the price of an air ticket. The features are applied to eight state-of-the-art machine
learning (ML) models, used to predict air ticket prices, and the performance of the models
is compared to each other. Along with the prediction accuracy of each model, this paper
studies the dependency of the accuracy on the feature set used to represent an airfare. For the
experiments a novel dataset consisting of 1814 data flights of the Aegean Airlines for a
specific international destination (from Thessaloniki to Stuttgart) is constructed and used to
train each ML model. The derived experimental results reveal that the ML models are able to
handle this regression problem with almost 88% accuracy, for a certain type of flight features.
TITLE: "A Bayesian Approach for Flight Fare Prediction Based on Kalman Filter"
ABSTRACT: Decision-making under uncertainty is one of the major issues faced by
recent computer-aided solutions and applications. Bayesian prediction techniques come in handy
in such areas of research. In this paper, we have tried to predict flight fares using Kalman filter
which is a famous Bayesian estimation technique. This approach presents an algorithm based
on the linear model of the Kalman Filter. This model predicts the fare of a flight based on the
input provided from an observation of previous fares. The observed data is given as input in
the form of a matrix as required to the linear model, and an estimated fare for a specific
upcoming flight is calculated.
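As an illustration only, the idea of estimating the next fare from previously observed fares can be sketched with a one-dimensional Kalman filter that treats the fare as a random walk observed with noise. This is not the cited paper's matrix formulation; the function name `kalman_fare_filter`, the initial values and both variances are hypothetical choices for the sketch.

```python
def kalman_fare_filter(observed_fares, initial_fare=0.0, initial_variance=1e6,
                       process_variance=1.0, measurement_variance=25.0):
    """Scalar Kalman filter (hypothetical parameters): return the fare estimate after each observation."""
    estimate = initial_fare
    variance = initial_variance
    estimates = []
    for z in observed_fares:
        # Predict: random-walk model, so the expected fare is unchanged but uncertainty grows.
        variance += process_variance
        # Update: the Kalman gain weighs the new observation against the prediction.
        gain = variance / (variance + measurement_variance)
        estimate += gain * (z - estimate)
        variance *= (1.0 - gain)
        estimates.append(estimate)
    return estimates


# Made-up fare history, used only to show the call.
history = [4520.0, 4610.0, 4480.0, 4700.0, 4650.0]
print(kalman_fare_filter(history, initial_fare=history[0]))
```

A large `initial_variance` makes the first observation dominate the starting guess; after that, the ratio of `process_variance` to `measurement_variance` controls how quickly the estimate follows new prices.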



TITLE: "A regression model for predicting optimal purchase timing for airline tickets"
ABSTRACT: Optimal timing for airline ticket purchasing from the consumer’s perspective
is challenging principally because buyers have insufficient information for reasoning about
future price movements. This paper presents a model for computing expected future prices and
reasoning about the risk of price changes. The proposed model is used to predict the future
expected minimum price of all available flights on specific routes and dates based on a corpus
of historical price quotes. Also, we apply our model to predict prices of flights with specific
desirable properties such as flights from a specific airline, non-stop only flights, or
multi-segment flights. By comparing models with different target properties, buyers can determine
the likely cost of their preferences. We present the expected costs of various preferences for
two high-volume routes. Performance of the prediction models presented is achieved by
including instances of time-delayed features, by imposing a class hierarchy among the raw
features based on feature similarity, and by pruning the classes of features used in prediction
based on in-situ performance. Our results show that purchase policy guidance using these
models can lower the average cost of purchases in the 2-month period prior to a desired
departure. The proposed method compares favorably with a deployed commercial web
site providing similar purchase policy recommendations.

TITLE: "Credit Card Fraud Detection Using Machine Learning"
ABSTRACT: Credit card frauds are easy and friendly targets. E-commerce and many
other online sites have increased the online payment modes, increasing the risk of online
fraud. As fraud rates increased, researchers started using different machine learning methods
to detect and analyse frauds in online transactions. The main aim of the paper is to design and
develop a novel fraud detection method for streaming transaction data, with the objective of
analysing the past transaction details of customers and extracting their behavioural patterns.
Cardholders are clustered into different groups based on their transaction amount, and a
sliding window strategy [1] is then used to aggregate the transactions made by the cardholders.
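As a rough illustration of sliding-window aggregation, the sketch below keeps each cardholder's last `window_size` transaction amounts and emits their mean and maximum as features. The clustering step is omitted, and the function name, window size and chosen statistics are assumptions, not details taken from the cited paper.

```python
from collections import defaultdict, deque


def sliding_window_features(transactions, window_size=3):
    """For each (card_id, amount) in time order, aggregate that card's last `window_size` amounts."""
    # One bounded deque per cardholder; old amounts drop out automatically.
    windows = defaultdict(lambda: deque(maxlen=window_size))
    features = []
    for card_id, amount in transactions:
        window = windows[card_id]
        window.append(amount)
        features.append({
            "card_id": card_id,
            "amount": amount,
            "window_mean": sum(window) / len(window),
            "window_max": max(window),
        })
    return features


for row in sliding_window_features([("A", 10.0), ("A", 20.0), ("B", 100.0), ("A", 30.0)]):
    print(row)
```

Because `deque(maxlen=...)` discards the oldest element on overflow, each update costs time proportional to the window size rather than to the cardholder's whole history.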


1.3 MODULES
Machine learning offers several techniques for predicting airline ticket prices. The
algorithms that we have used include:
• Linear Regression.
• K-Neighbor Regression.
• Support Vector Machine.
• Decision Tree.
• Random Forest.
These models have been implemented using the scikit-learn Python library. To evaluate the
performance of these models, the metrics R-squared (R²), mean absolute error (MAE), mean
squared error (MSE), and root mean squared error (RMSE) are used.
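The four evaluation metrics can be written out directly. The sketch below is a minimal pure-Python version with a hypothetical helper name, `regression_metrics`; in a scikit-learn project the equivalents are `mean_absolute_error`, `mean_squared_error` and `r2_score` from `sklearn.metrics`, with RMSE taken as the square root of MSE.

```python
import math


def regression_metrics(y_true, y_pred):
    """Return R², MAE, MSE and RMSE for two equal-length sequences of fares."""
    n = len(y_true)
    errors = [t - p for t, p in zip(y_true, y_pred)]
    mae = sum(abs(e) for e in errors) / n          # mean absolute error
    mse = sum(e * e for e in errors) / n           # mean squared error
    rmse = math.sqrt(mse)                          # same units as the fare
    mean_true = sum(y_true) / n
    ss_res = sum(e * e for e in errors)            # residual sum of squares
    ss_tot = sum((t - mean_true) ** 2 for t in y_true)
    r2 = 1.0 - ss_res / ss_tot                     # undefined if all true fares are equal
    return {"R2": r2, "MAE": mae, "MSE": mse, "RMSE": rmse}


print(regression_metrics([3000, 5000, 7000], [3500, 4500, 7000]))
```

MAE and RMSE are in the same currency units as the fare, which makes them easier to interpret than MSE; R² is unitless and compares the model against always predicting the mean fare.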
KNN Regression
K-nearest neighbors (KNN) regression predicts the average target value of the k nearest
neighbors of a new data point. Like SVM, it is a non-parametric approach: it makes no
assumptions about the form of the underlying data, and each prediction depends only on a few
nearby training examples. KNN is best known as a supervised classification technique, but it
can also be used as a regressor. To make a prediction, it calculates the distance between the
new data point and each training example, selects the K training examples closest to the new
point, and averages their target values. The distance is typically calculated using the
Euclidean, Manhattan, or Hamming distance.
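The prediction rule described above fits in a few lines. This is a from-scratch sketch for illustration, not the scikit-learn `KNeighborsRegressor` used in the project; the toy features `[days_before_departure, stops]` and their fares are hypothetical. In practice the features should be scaled first, because the raw distance is dominated by whichever feature has the largest range.

```python
import math


def euclidean(a, b):
    """Straight-line distance between two feature vectors."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def manhattan(a, b):
    """Sum of absolute coordinate differences."""
    return sum(abs(x - y) for x, y in zip(a, b))


def knn_regress(train_X, train_y, query, k=3, distance=euclidean):
    """Predict the target for `query` as the mean of its k nearest training examples."""
    # Sort training examples by their distance to the query, nearest first.
    ranked = sorted(zip(train_X, train_y), key=lambda pair: distance(pair[0], query))
    nearest = ranked[:k]
    return sum(target for _, target in nearest) / len(nearest)


# Hypothetical toy data: [days_before_departure, stops] -> fare.
X = [[1, 0], [2, 0], [30, 0], [31, 1], [60, 1]]
y = [9000, 8800, 5200, 5600, 4100]
print(knn_regress(X, y, [3, 0], k=2))                       # 8900.0
print(knn_regress(X, y, [3, 0], k=2, distance=manhattan))   # 8900.0
```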
Linear Regression
Linear regression is a supervised machine learning technique for regression tasks. It is a
linear model that assumes a linear relationship between the input variables (x) and a single
output variable (y), so that y is computed as a linear combination of the input variables.
Because our dataset contains many independent features that the price may depend on, we use
multiple linear regression (MLR) to estimate the relationship between two or more independent
variables and the dependent variable.
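Multiple linear regression fits y = b0 + b1·x1 + ... + bp·xp by least squares. The sketch below solves the normal equations (XᵀX)b = Xᵀy with Gauss-Jordan elimination in plain Python, purely to show what the fit computes; the project itself uses scikit-learn's `LinearRegression`, and the synthetic `[duration_hours, stops]` data below is hypothetical and noise-free so the true coefficients can be recovered exactly.

```python
def fit_linear_regression(X, y):
    """Ordinary least squares for y = b0 + b1*x1 + ... + bp*xp via the normal equations."""
    rows = [[1.0] + [float(v) for v in r] for r in X]   # prepend an intercept column
    p = len(rows[0])
    # Augmented matrix [X^T X | X^T y].
    A = []
    for i in range(p):
        lhs = [sum(r[i] * r[j] for r in rows) for j in range(p)]
        rhs = sum(r[i] * t for r, t in zip(rows, y))
        A.append(lhs + [rhs])
    # Gauss-Jordan elimination with partial pivoting.
    for col in range(p):
        pivot = max(range(col, p), key=lambda r: abs(A[r][col]))
        A[col], A[pivot] = A[pivot], A[col]
        for r in range(p):
            if r != col:
                factor = A[r][col] / A[col][col]
                A[r] = [a - factor * c for a, c in zip(A[r], A[col])]
    return [A[i][p] / A[i][i] for i in range(p)]


def predict_linear(coefs, x):
    """Intercept plus the weighted sum of the features."""
    return coefs[0] + sum(b * v for b, v in zip(coefs[1:], x))


# Hypothetical data generated from fare = 2000 + 150*duration_hours + 500*stops.
X = [[1, 0], [2, 1], [3, 0], [4, 1], [5, 2]]
y = [2150, 2800, 2450, 3100, 3750]
coefs = fit_linear_regression(X, y)
print(coefs)
```

Solving the normal equations directly is fine for a handful of features; library implementations typically use more numerically stable factorizations for larger or ill-conditioned problems.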


Decision Tree Regression
A decision tree is a tree structure used to build regression or classification models. The
dataset is repeatedly split into smaller subsets while the tree is built incrementally, which
produces decision nodes and leaf nodes. At each decision node, the tree selects an independent
variable from the dataset, together with a split value, on which to divide the data. When test
data is fed into the model, the result is determined by following the decisions down to the
leaf (segment) that the data point belongs to, and the tree outputs the average target value of
the training data points in that leaf.
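A minimal regression tree following the description above: each split is chosen to minimise the summed squared error of the two resulting subsets, and each leaf predicts the mean fare of its training points. This is a simplified illustration, not scikit-learn's `DecisionTreeRegressor`; the stopping parameters `max_depth` and `min_samples` and the one-feature toy data (`days_before_departure`) are assumptions.

```python
def mean(values):
    return sum(values) / len(values)


def sse(values):
    """Sum of squared deviations from the mean."""
    m = mean(values)
    return sum((v - m) ** 2 for v in values)


def build_tree(X, y, depth=0, max_depth=3, min_samples=2):
    # Stop and return a leaf that predicts the mean target.
    if depth >= max_depth or len(y) < min_samples or len(set(y)) == 1:
        return {"leaf": mean(y)}
    best = None
    for feature in range(len(X[0])):
        for threshold in sorted(set(row[feature] for row in X)):
            left = [i for i, row in enumerate(X) if row[feature] <= threshold]
            right = [i for i, row in enumerate(X) if row[feature] > threshold]
            if not left or not right:
                continue
            cost = sse([y[i] for i in left]) + sse([y[i] for i in right])
            if best is None or cost < best[0]:
                best = (cost, feature, threshold, left, right)
    if best is None:
        return {"leaf": mean(y)}
    _, feature, threshold, left, right = best
    return {
        "feature": feature,
        "threshold": threshold,
        "left": build_tree([X[i] for i in left], [y[i] for i in left], depth + 1, max_depth, min_samples),
        "right": build_tree([X[i] for i in right], [y[i] for i in right], depth + 1, max_depth, min_samples),
    }


def predict_tree(node, x):
    # Follow the decisions until a leaf is reached.
    while "leaf" not in node:
        node = node["left"] if x[node["feature"]] <= node["threshold"] else node["right"]
    return node["leaf"]


# Hypothetical data: [days_before_departure] -> fare.
X = [[1], [2], [3], [10], [11], [12]]
y = [9000, 9100, 8900, 4000, 4100, 3900]
tree = build_tree(X, y, max_depth=1)
print(tree["threshold"], predict_tree(tree, [2.5]), predict_tree(tree, [11.5]))
```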
Random Forest Regression
The random forest algorithm combines many less accurate models (decision trees) into a more
accurate one. Each tree is trained on a bootstrap sample of the training data, drawn with
replacement, and only a random subset of the features is considered when choosing each split.
This randomness keeps the correlation between the trees low, which is what makes combining
them effective. The main principle that distinguishes a random forest from a single decision
tree is this aggregation of weakly correlated trees. A random forest is therefore an ensemble
learning technique in which the predictions of many trees are combined to produce the final
result. For regression, the forest outputs the average of the individual trees' predictions,
which places it within the bagging family of ensemble learning.
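The two sources of randomness (bootstrap rows and a random feature subset at each split) and the final averaging can be sketched without any library. This is an illustrative simplification, not scikit-learn's `RandomForestRegressor` (whose corresponding parameters are `n_estimators`, `max_features` and `bootstrap`); the tree depth, tree count, seed and the toy features `[days_before_departure, peak_season_flag]` are hypothetical.

```python
import random


def _mean(values):
    return sum(values) / len(values)


def _sse(values):
    m = _mean(values)
    return sum((v - m) ** 2 for v in values)


def _grow_tree(X, y, rng, max_features, depth, max_depth):
    """Grow one regression tree; each split considers a random subset of features."""
    if depth >= max_depth or len(set(y)) == 1:
        return ("leaf", _mean(y))
    best = None
    for f in rng.sample(range(len(X[0])), max_features):
        for t in sorted(set(row[f] for row in X)):
            left = [i for i, row in enumerate(X) if row[f] <= t]
            right = [i for i, row in enumerate(X) if row[f] > t]
            if left and right:
                cost = _sse([y[i] for i in left]) + _sse([y[i] for i in right])
                if best is None or cost < best[0]:
                    best = (cost, f, t, left, right)
    if best is None:  # the sampled features cannot split this node
        return ("leaf", _mean(y))
    _, f, t, left, right = best

    def grow(idx):
        return _grow_tree([X[i] for i in idx], [y[i] for i in idx],
                          rng, max_features, depth + 1, max_depth)

    return ("split", f, t, grow(left), grow(right))


def _predict_tree(node, x):
    while node[0] == "split":
        _, f, t, left, right = node
        node = left if x[f] <= t else right
    return node[1]


def fit_random_forest(X, y, n_trees=25, max_features=1, max_depth=3, seed=0):
    rng = random.Random(seed)
    n = len(X)
    forest = []
    for _ in range(n_trees):
        # Bootstrap sample: n row indices drawn with replacement.
        idx = [rng.randrange(n) for _ in range(n)]
        forest.append(_grow_tree([X[i] for i in idx], [y[i] for i in idx],
                                 rng, max_features, 0, max_depth))
    return forest


def predict_forest(forest, x):
    # Bagging: average the individual trees' predictions.
    return _mean([_predict_tree(tree, x) for tree in forest])


X = [[1, 1], [2, 1], [3, 1], [20, 0], [21, 0], [22, 0]]
y = [9000, 9100, 8900, 4000, 4100, 3900]
forest = fit_random_forest(X, y)
print(predict_forest(forest, [1, 1]), predict_forest(forest, [22, 0]))
```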
Support Vector Machine
A support vector machine (SVM) is a supervised machine learning algorithm that classifies
data by finding an optimal line or hyperplane that maximizes the distance (margin) between
the classes in an N-dimensional space.
There are two approaches to calculating the margin: hard-margin classification, which requires
every training point to lie outside the margin, and soft-margin classification, which allows
some points to violate the margin at a penalty. For regression tasks such as fare prediction,
the corresponding variant is support vector regression (SVR).
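To make the soft-margin idea concrete, the from-scratch sketch below trains a linear soft-margin SVM classifier by minimising the regularised hinge loss with Pegasos-style subgradient steps. This training procedure is a swapped-in illustration: it is not what scikit-learn's `SVC` (based on libsvm) does internally, and the regularisation strength `lam`, the epoch count and folding the bias into the weight vector (which also regularises the bias) are simplifying assumptions. Labels must be +1 or -1.

```python
def train_linear_svm(X, y, lam=0.01, epochs=200):
    """Soft-margin linear SVM via Pegasos-style subgradient descent (labels +1/-1)."""
    rows = [[float(v) for v in r] + [1.0] for r in X]   # constant feature acts as the bias
    w = [0.0] * len(rows[0])
    t = 0
    for _ in range(epochs):
        for x, label in zip(rows, y):
            t += 1
            eta = 1.0 / (lam * t)                        # decreasing step size
            margin = label * sum(wi * xi for wi, xi in zip(w, x))
            # Shrink w: subgradient step for the L2 regulariser.
            w = [(1.0 - eta * lam) * wi for wi in w]
            # Points inside the margin also contribute a hinge-loss step.
            if margin < 1.0:
                w = [wi + eta * label * xi for wi, xi in zip(w, x)]
    return w


def svm_predict(w, x):
    score = sum(wi * xi for wi, xi in zip(w, [float(v) for v in x] + [1.0]))
    return 1 if score >= 0 else -1


X = [[2, 2], [3, 3], [2, 3], [-2, -2], [-3, -3], [-2, -3]]
y = [1, 1, 1, -1, -1, -1]
w = train_linear_svm(X, y)
print([svm_predict(w, x) for x in X])
```

The hinge-loss step only fires for points with margin below 1, which is exactly the "soft" part: such points are penalised rather than forbidden. For fare prediction itself, scikit-learn's `SVR` class provides the regression counterpart.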


2. SYSTEM ANALYSIS

2.1 Existing System & its Disadvantages:
Airlines may lower their ticket prices when they need to build a market, and adjust them when
tickets become harder to obtain. These tactics take into account a number of financial, marketing,
commercial, and social factors that are all linked to the final flight price, with the aim of
maximizing profit. Because the pricing models used by airlines are so complex, prices fluctuate
constantly, making it very difficult for customers to buy tickets at low prices. Research on
airfares from both the customer and the airline perspectives has grown steadily over the last
two decades.
Regression machine learning models for airline ticket price prediction were developed in [4].
That model was built from data on 1814 flights on a single international route, with features
including departure and arrival times, the baggage allowance, and the number of free baggage
items per flight. The authors used eight regression machine learning models: Extreme Learning
Machine (ELM), Multilayer Perceptron (MLP), Generalized Regression Neural Network, Random
Forest Regression Tree, Regression Tree, Linear Regression (LR), Regression SVM (Polynomial and
Linear), and Bagging Regression Tree. The Bagging Regression Tree achieved an accuracy of
87.42%, and the Random Forest Regression Tree an accuracy of 85.91%.
DISADVANTAGES:

• Limited Data: The model in [4] was built on 1814 flights from a single international route,
so its results may not generalize to other routes, airlines, or time periods.
• Limited Features: Existing models rely on a limited set of features, such as historical ticket
prices, purchase date, and departure date.
• Difficulty for Customers: Because airline fare models are complex and prices fluctuate
constantly, customers find it very difficult to buy tickets at low prices.


2.2 Proposed System & its Advantages:

The proposed system aims to address the problem of airfare prediction by analysing a set of
characteristics that define a typical flight, assuming that these features significantly influence
the price of an airline ticket. Fluctuations in flight ticket prices are attributed to various
parameters, including flight schedule, destination, duration, and occasions such as vacations or
holiday seasons.
Data Collection: Gather a dataset of historical flight information, including departure and
arrival locations, dates, times, airlines, ticket prices, and other relevant features. The dataset
should cover a wide range of routes, airlines, and time periods to capture diverse patterns.
Data Preprocessing: Clean the data, handle missing values, encode categorical variables, and,
where appropriate, apply feature scaling or normalization.
Feature Engineering: Create new features, or transform existing ones, that better represent the
relationships between the input variables and the target variable (fare). For example, extract
features such as the day of the week, the time of day, the distance between the departure and
arrival locations, and seasonal trends.
Model Selection and Training: Choose appropriate machine learning algorithms for regression.
Common choices include linear regression, decision trees, random forests, gradient boosting
methods (such as XGBoost or LightGBM), and neural networks. Split the dataset into training and
testing sets, and train the selected model(s) on the training data.
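The preprocessing, feature-engineering and splitting steps can be sketched with the standard library alone. The record field names (`date`, `dep_time`, `duration`, `stops`, `airline`), the `"2h 50m"` duration format and the airline names are hypothetical, since the datasets' actual columns are not given here; with pandas and scikit-learn, the same steps would typically use `pd.get_dummies` and `sklearn.model_selection.train_test_split`.

```python
import random
from datetime import datetime


def extract_features(record, airlines):
    """Convert one raw flight record (hypothetical field names) into numeric features."""
    journey = datetime.strptime(record["date"], "%Y-%m-%d")
    departure = datetime.strptime(record["dep_time"], "%H:%M")
    # Duration is assumed to always look like "2h 50m".
    hours, minutes = record["duration"].replace("h", "").replace("m", "").split()
    features = {
        "day_of_week": journey.weekday(),   # Monday = 0 ... Sunday = 6
        "month": journey.month,             # coarse seasonal signal
        "dep_hour": departure.hour,         # time of day
        "duration_min": int(hours) * 60 + int(minutes),
        "stops": int(record["stops"]),
    }
    # One-hot encode the categorical airline column.
    for name in airlines:
        features["airline_" + name] = 1 if record["airline"] == name else 0
    return features


def split_train_test(rows, test_fraction=0.2, seed=42):
    """Shuffle reproducibly, then hold out the first `test_fraction` of rows for testing."""
    shuffled = list(rows)
    random.Random(seed).shuffle(shuffled)
    n_test = int(len(shuffled) * test_fraction)
    return shuffled[n_test:], shuffled[:n_test]   # (train, test)


record = {"airline": "AirlineB", "date": "2024-03-15", "dep_time": "22:20",
          "duration": "2h 50m", "stops": "1"}
print(extract_features(record, ["AirlineA", "AirlineB"]))
```

Fixing the shuffle seed keeps the train/test split reproducible, so different models are compared on exactly the same held-out flights.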

ADVANTAGES:

Improved Accuracy: Machine learning models can analyze vast amounts of historical data and
complex patterns to make more accurate fare predictions compared to traditional methods. This
can help both airlines and travelers make better-informed decisions regarding ticket prices.
Dynamic Pricing: Airlines can leverage machine learning models to implement dynamic pricing
strategies, adjusting fares in real-time based on factors such as demand, time until departure,
competitor pricing, and seat availability. This flexibility can maximize revenue for airlines while
offering competitive prices to travelers.
Personalized Pricing: Machine learning algorithms can analyze individual traveler preferences,
booking history, and browsing behavior to offer personalized fare recommendations. This can
enhance customer satisfaction and increase loyalty by providing tailored pricing options.


2.3 SYSTEM REQUIREMENTS:
HARDWARE REQUIREMENTS:
• System : i3 Processor, 5th Gen.
• Hard Disk : 200 GB.
• RAM : 4GB.

SOFTWARE REQUIREMENTS:

• Operating System : Windows 10/11
• Development Software : Python 3.8
• Programming Language : Python
• Integrated Development Environment (IDE) : Visual Studio Code
• Front End Technologies : HTML5, CSS3, JavaScript
• Database Language : SQL
• Design/Modelling : Rational Rose


3. SYSTEM STUDY

3.1 FEASIBILITY STUDY
1. TECHNICAL FEASIBILITY
2. OPERATIONAL FEASIBILITY
3. ECONOMIC FEASIBILITY
INTRODUCTION
A feasibility study assesses the operational, technical and economic merits of the proposed
project. The feasibility study is intended to be a preliminary review of the facts to see if it is
worthy of proceeding to the analysis phase. From the systems analyst perspective, the
feasibility analysis is the primary tool for recommending whether to proceed to the next phase
or to discontinue the project.
The feasibility study is a management-oriented activity. The objective of a feasibility study is
to find out if an information system project can be done and to suggest possible alternative
solutions.
Projects are initiated for two broad reasons:
1. Problems that lend themselves to systems solutions
2. Opportunities for improving through:
(a) upgrading systems
(b) altering systems
(c) installing new systems
TECHNICAL FEASIBILITY
A large part of determining resources has to do with assessing technical feasibility. It considers
the technical requirements of the proposed project. The technical requirements are then
compared to the technical capability of the organization. The systems project is considered
technically feasible if the internal technical capability is sufficient to support the project
requirements.


The analyst must find out whether current technical resources can be upgraded or added to in a
manner that fulfils the request under consideration. This is where the expertise of system
analysts is beneficial, since using their own experience and their contact with vendors they will
be able to answer the question of technical feasibility.
The essential questions that help in testing the technical feasibility of a system include the
following:
• Is the project feasible within the limits of current technology?
• Does the technology exist at all?
• Is it available within given resource constraints?
• Is it a practical proposition?
OPERATIONAL FEASIBILITY
Operational feasibility is dependent on human resources available for the project and involves
projecting whether the system will be used if it is developed and implemented.
Operational feasibility is a measure of how well a proposed system solves the problems, and
takes advantage of the opportunities identified during scope definition and how it satisfies the
requirements identified in the requirements analysis phase of system development.
The essential questions that help in testing the operational feasibility of a system include the
following:
• Does the current mode of operation provide adequate throughput and response time?
• Does the current mode provide end users and managers with timely, pertinent, accurate, and
usefully formatted information?
• Does the current mode of operation provide cost-effective information services to the
business?
• Could there be a reduction in cost and/or an increase in benefits?
• Does the current mode of operation offer effective controls to protect against fraud and to
guarantee the accuracy and security of data and information?
• Does the current mode of operation make maximum use of available resources, including
people, time, and flow of forms?
• Does the current mode of operation provide reliable services?
• Are the services flexible and expandable?
• Are the current work practices and procedures adequate to support the new system?
• If the system is developed, will it be used?
ECONOMIC FEASIBILITY
Economic analysis could also be referred to as cost/benefit analysis. It is the most frequently
used method for evaluating the effectiveness of a new system. In economic analysis the
procedure is to determine the benefits and savings that are expected from a candidate system
and compare them with costs. If benefits outweigh costs, then the decision is made to design
and implement the system. An entrepreneur must accurately weigh the costs against the benefits
before taking action.
Possible questions raised in economic analysis are:
• Is the system cost effective?
• Do benefits outweigh costs?
• The cost of doing full system study
• The cost of business employee time
• Estimated cost of hardware
• Estimated cost of software/software development
• Is the project possible, given the resource constraints?
• What are the savings that will result from the system?
• Cost of employees' time for study
• Cost of packaged software/software development.


4. SYSTEM DESIGN
4.1 DATA FLOW DIAGRAM:
1. The DFD is also called a bubble chart. It is a simple graphical formalism that can be
used to represent a system in terms of the input data to the system, the various processing
carried out on this data, and the output data generated by the system.
2. The data flow diagram (DFD) is one of the most important modeling tools. It is used to
model the system components. These components are the system process, the data used by the
process, an external entity that interacts with the system and the information flows in the
system.

4.1.1 System Architecture
4.2 UML DIAGRAMS
UML stands for Unified Modeling Language. UML is a standardized general-purpose modeling
language in the field of object-oriented software engineering. The standard is managed, and
was created by, the Object Management Group.
The goal is for UML to become a common language for creating models of object-oriented
computer software. In its current form, UML consists of two major components: a meta-model
and a notation. In the future, some form of method or process may also be added to, or
associated with, UML.


4.2.1 USE CASE DIAGRAM:
A use case diagram in the Unified Modeling Language (UML) is a type of behavioral diagram
defined by and created from a Use-case analysis. Its purpose is to present a graphical overview
of the functionality provided by a system in terms of actors, their goals (represented as use
cases), and any dependencies between those use cases. The main purpose of a use case diagram
is to show what system functions are performed for which actor. Roles of the actors in the
system can be depicted.



Fig-4.2.1


4.2.2 CLASS DIAGRAM:
In software engineering, a class diagram in the Unified Modeling Language (UML) is a type of
static structure diagram that describes the structure of a system by showing the system's classes,
their attributes, operations (or methods), and the relationships among the classes. It explains
which class contains information.


Fig-4.2.2
4.2.3 SEQUENCE DIAGRAM:
A sequence diagram in Unified Modeling Language (UML) is a kind of interaction diagram
that shows how processes operate with one another and in what order. It is a construct of a
Message Sequence Chart. Sequence diagrams are sometimes called event diagrams, event
scenarios, and timing diagrams.



Fig-4.2.3


4.2.4 ACTIVITY DIAGRAM:
Activity diagrams are graphical representations of workflows of stepwise activities and actions
with support for choice, iteration and concurrency. In the Unified Modeling Language, activity
diagrams can be used to describe the business and operational step-by-step workflows of
components in a system. An activity diagram shows the overall flow of control.

Fig-4.2.4


4.2.5 DEPLOYMENT DIAGRAM:
A deployment diagram is a type of UML diagram that specifies the physical hardware on which the
software system will execute. It also shows how the software is deployed on the underlying
hardware, mapping the software pieces of a system to the devices that execute them.
The deployment diagram maps the software architecture created in design to the physical
system architecture that executes it. In distributed systems, it models the distribution of the
software across the physical nodes.
The software system is manifested as artifacts, which are then mapped to the execution
environments, such as nodes, that run the software. Because many nodes are usually involved
in a deployment diagram, the relationships between them are represented using
communication paths.

Fig-4.2.5


There are two forms of a deployment diagram.
• Descriptor form: contains nodes, the relationships between nodes, and artifacts.
• Instance form: contains node instances, the relationships between node instances, and artifact
instances. Node instances are shown with underlined names.
Purpose of a deployment diagram
Deployment diagrams are used solely to describe how software is deployed onto the hardware
system. They visualize how the software interacts with the hardware to deliver the complete
functionality, describing the interaction from software to hardware and vice versa.
Deployment diagram symbols and notations

Deployment Diagram Notations


5. TECHNOLOGIES

5.1 WHAT IS PYTHON

Below are some facts about Python.

Python is currently one of the most widely used multi-purpose, high-level programming languages.

Python allows programming in Object-Oriented and Procedural paradigms. Python
programs are generally smaller than equivalent programs in other languages like Java.
Programmers have to type relatively less, and the language's indentation requirement
keeps programs readable.
Python is used by many large technology companies, such as Google, Amazon,
Facebook, Instagram, Dropbox and Uber.
5.1.1 ADVANTAGES & DISADVANTAGES OF PYTHON
1. Extensible

Python can be extended with code written in other languages. You can write some of
your code in languages like C++ or C, which comes in handy for performance-critical parts of a project.
2. Embeddable

Complementary to extensibility, Python is embeddable as well. You can embed Python code
in the source code of a program written in a different language, like C++. This lets us add
scripting capabilities to that program.
3. Improved Productivity

The language's simplicity and extensive libraries make programmers more productive than
languages like Java and C++ do, because you need to write less code to get more done.
4. IoT Opportunities

Since Python forms the basis of new platforms like Raspberry Pi, its future looks bright for
the Internet of Things. This is a way to connect the language with the real world.
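Item 1 above describes extending Python with code written in C or C++. Writing a real C extension module needs a compiler and a build step; a lighter-weight way to call already-compiled C code is the standard-library ctypes module. The sketch below calls the C math library's sqrt function and assumes a Linux or macOS system where that library can be found; it illustrates the idea and is not code used in this project.

```python
import ctypes
import ctypes.util

# Locate the C math library (e.g. libm.so.6 on Linux). If find_library returns
# None, CDLL(None) falls back to symbols already loaded into the process (POSIX only).
libm = ctypes.CDLL(ctypes.util.find_library("m"))

# Declare the C signature double sqrt(double) so ctypes converts values correctly
libm.sqrt.argtypes = [ctypes.c_double]
libm.sqrt.restype = ctypes.c_double

result = libm.sqrt(2.0)  # the computation runs in compiled C code
print(result)
```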

5.1.2 HISTORY OF PYTHON
What do the alphabet and the programming language Python have in common? Right, both start
with ABC. If we are talking about ABC in the Python context, it's clear that the programming
language ABC is meant. ABC is a general-purpose programming language and programming
environment, which had been developed at the CWI (Centrum Wiskunde & Informatica) in
Amsterdam, the Netherlands. The greatest achievement of ABC was to influence the design of
Python. Python was conceptualized in the late 1980s. At that time, Guido van Rossum was working
at the CWI on a project called Amoeba, a distributed operating system. In an interview with Bill
Venners, Guido van Rossum said: "In the early 1980s, I worked as an implementer on a team
building a language called ABC at Centrum Wiskunde en Informatica (CWI). I don't know how
well people know ABC's influence on Python. I try to mention ABC's influence because I'm
indebted to everything I learned during that project and to the people who worked on it." Later on
in the same interview, Guido van Rossum continued: "I remembered all my experience and some
of my frustration with ABC. I decided to try to design a simple scripting language that possessed
some of ABC's better properties, but without its problems. So I started typing. I created a simple
virtual machine, a simple parser, and a simple runtime. I made my own version of the various
ABC parts that I liked. I created a basic syntax, used indentation for statement grouping instead
of curly braces or begin-end blocks, and developed a small number of powerful data types: a hash
table (or dictionary, as we call it), a list, strings, and numbers."


5.2 WHAT IS MACHINE LEARNING

Before we take a look at the details of various machine learning methods, let's start by
looking at what machine learning is, and what it isn't. Machine learning is often categorized
as a subfield of artificial intelligence, but I find that categorization can often be misleading
at first brush. The study of machine learning certainly arose from research in this context,
but in the data science application of machine learning methods, it's more helpful to think
of machine learning as a means of building models of data.

Fundamentally, machine learning involves building mathematical models to help
understand data. "Learning" enters the fray when we give these models tunable parameters
that can be adapted to observed data; in this way the program can be considered to be
"learning" from the data. Once these models have been fit to previously seen data, they can
be used to predict and understand aspects of newly observed data. I'll leave to the reader
the more philosophical digression regarding the extent to which this type of mathematical,
model-based "learning" is similar to the "learning" exhibited by the human brain.
Understanding the problem setting in machine learning is essential to using these tools
effectively, and so we will start with some broad categorizations of the types of approaches
we'll discuss here.
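As a minimal illustration of "tunable parameters adapted to observed data" (not code from this project), the sketch below fits a straight line to four made-up points with NumPy's polyfit. The slope and intercept are the tunable parameters; once fitted, they are used to predict the value for a new input.

```python
import numpy as np

# Made-up observations that follow y = 2x + 1 exactly (illustrative values only)
x = np.array([1.0, 2.0, 3.0, 4.0])
y = np.array([3.0, 5.0, 7.0, 9.0])

# "Learning": least-squares choice of the two parameters (slope, intercept)
slope, intercept = np.polyfit(x, y, deg=1)

# Use the fitted model to predict the value for a newly observed input
x_new = 5.0
y_pred = slope * x_new + intercept
print(f"slope={slope:.2f}, intercept={intercept:.2f}, prediction={y_pred:.2f}")
```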

5.2.1 Categories Of Machine Learning

At the most fundamental level, machine learning can be categorized into two main types:
supervised learning and unsupervised learning.

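Supervised learning involves somehow modeling the relationship between measured
features of data and some label associated with the data; once this model is determined, it
can be used to apply labels to new, unknown data. This is further subdivided into
classification tasks, in which the labels are discrete categories, and regression tasks, in
which the labels are continuous quantities.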
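To make the supervised setting concrete, the following toy sketch (unrelated to this project's model) stores a few labeled examples and labels a new point with the label of its nearest training example, a 1-nearest-neighbour classifier. The feature values and the labels "cheap" and "expensive" are invented for illustration; math.dist requires Python 3.8 or newer.

```python
import math

# Labeled training data: (feature vector, label). Invented values for illustration.
training = [
    ((1.0, 1.0), "cheap"),
    ((1.5, 2.0), "cheap"),
    ((8.0, 9.0), "expensive"),
    ((9.0, 8.5), "expensive"),
]


def predict_label(point):
    """Return the label of the training example closest to point (1-NN)."""
    _, label = min(training, key=lambda item: math.dist(item[0], point))
    return label


print(predict_label((2.0, 1.5)))  # nearest examples are labeled "cheap"
print(predict_label((7.5, 9.5)))  # nearest examples are labeled "expensive"
```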

5.2.2 Need for Machine Learning
Human beings are, at this moment, the most intelligent and advanced species on earth
because they can think, evaluate and solve complex problems. AI, on the other hand, is still
in its initial stage and has not surpassed human intelligence in many aspects. So why do we
need to make machines learn? The most suitable reason is "to make decisions, based on data,
with efficiency and scale".


Lately, organizations have been investing heavily in newer technologies like Artificial
Intelligence, Machine Learning and Deep Learning to extract key information from data,
perform several real-world tasks and solve problems. These can be called data-driven
decisions taken by machines, particularly to automate processes. Such data-driven decisions
can be used, instead of programming logic, for problems that cannot be solved by explicit
programming. Human intelligence remains indispensable, but we also need to solve real-world
problems efficiently and at a huge scale. That is why the need for machine learning arises.
Machine learning also faces several challenges:
Quality of data − Having good-quality data for ML algorithms is one of the biggest
challenges. Using low-quality data leads to problems with data preprocessing.

Time-consuming tasks − Another challenge faced by ML models is the time consumed,
especially by data acquisition, feature extraction and retrieval.
Lack of specialists − As ML technology is still in its infancy, finding expert resources is
difficult.
No clear objective for formulating business problems − Having no clear objective and
well-defined goal for business problems is another key challenge for ML, because the
technology is not that mature yet.
5.2.3 Applications of Machine Learning :-
Machine Learning is one of the most rapidly growing technologies, and according to some
researchers we are in the golden age of AI and ML. It is used to solve many complex real-world
problems that cannot be solved with traditional approaches. Following are some real-world
applications of ML −

• Emotion analysis
• Sentiment analysis
• Error detection and prevention
• Weather forecasting and prediction
• Stock market analysis and forecasting
• Speech synthesis
• Speech recognition
• Customer segmentation


5.2.4 How to Start Learning Machine Learning?

Arthur Samuel coined the term "Machine Learning" in 1959 and defined it as a "field of
study that gives computers the capability to learn without being explicitly
programmed".
And that was the beginning of Machine Learning! In modern times, Machine Learning is one
of the most popular (if not the most popular) career choices. According to Indeed, Machine
Learning Engineer was the best job of 2019, with 344% growth and an average base salary of
$146,085 per year.
But there is still a lot of doubt about what exactly Machine Learning is and how to start
learning it. So this section deals with the basics of Machine Learning and the path you
can follow to eventually become a full-fledged Machine Learning Engineer.

5.2.5 How to start learning ML?

This is a rough roadmap you can follow on your way to becoming a highly skilled
Machine Learning Engineer. Of course, you can always modify the steps according to your
needs to reach your desired end goal.
Step 1 – Understand the Prerequisites
If you are a genius, you could start ML directly, but normally there are some
prerequisites you need to know, including Linear Algebra, Multivariate Calculus,
Statistics, and Python. If you don't know these, never fear! You don't need a Ph.D. degree
in these topics to get started, but you do need a basic understanding.

(a) Learn Linear Algebra and Multivariate Calculus

Both Linear Algebra and Multivariate Calculus are important in Machine Learning.
However, the extent to which you need them depends on your role as a data scientist. If you
are more focused on application heavy machine learning, then you will not be that heavily
focused on maths as there are many common libraries available. But if you want to focus on
R&D in Machine Learning, then mastery of Linear Algebra and Multivariate Calculus is very
important as you will have to implement many ML algorithms from scratch.


5.2.6 ADVANTAGES & DISADVANTAGES OF ML
Advantages of Machine learning :-
1. Easily identifies trends and patterns -

Machine Learning can review large volumes of data and discover specific trends and patterns
that would not be apparent to humans. For instance, an e-commerce website like Amazon uses
it to understand the browsing behaviors and purchase histories of its users, so that it can
offer them the right products, deals, and reminders.
2. No human intervention needed (automation)
With ML, you don't need to babysit your project every step of the way. Since ML gives
machines the ability to learn, it lets them make predictions and also improve the algorithms
on their own. A common example of this is anti-virus software, which learns to filter new
threats as they are recognized. ML is also good at recognizing spam.
3. Continuous Improvement

As ML algorithms gain experience, they keep improving in accuracy and efficiency. This
lets them make better decisions. Say you need to make a weather forecast model. As the
amount of data you have keeps growing, your algorithms learn to make more accurate
predictions faster.
Disadvantages of Machine Learning :-

1. Data Acquisition

Machine Learning requires massive data sets to train on, and these should be
inclusive/unbiased, and of good quality. There may also be times when you must wait for
new data to be generated.
2. Time and Resources

ML needs enough time to let the algorithms learn and develop enough to fulfill their purpose
with a considerable amount of accuracy and relevancy. It also needs massive resources to
function, which can mean additional computing-power requirements.

3. Interpretation of Results

Another major challenge is the ability to accurately interpret results generated by the
algorithms. You must also carefully choose the algorithms for your purpose.


5.3 PYTHON DEVELOPMENT STEPS

Guido van Rossum published the first version of Python code (version 0.9.0) at alt.sources
in February 1991. This release already included exception handling, functions, and the core
data types list, dict, str and others. It was also object-oriented and had a module system.
Python version 1.0 was released in January 1994. The major new features included in this
release were the functional programming tools lambda, map, filter and reduce, which Guido
van Rossum never liked. Six and a half years later, in October 2000, Python 2.0 was introduced.
This release included list comprehensions and a full garbage collector, and it supported
Unicode. Python flourished for another 8 years in the 2.x versions before the next major
release, Python 3.0 (also known as "Python 3000" and "Py3K"), was released. Python 3 is not
backwards compatible with Python 2.x. The emphasis in Python 3 was on the removal
of duplicate programming constructs and modules, thus fulfilling or coming close to
fulfilling the 13th law of the Zen of Python: "There should be one -- and preferably only one
-- obvious way to do it." Some changes in Python 3.0:
• print is now a function
• Views and iterators instead of lists
• The rules for ordering comparisons have been simplified. E.g. a heterogeneous list
cannot be sorted, because all the elements of a list must be comparable to each other.
• There is only one integer type left, i.e. int; long is now int as well.
• The division of two integers returns a float instead of an integer. "//" can be used to get
the "old" behaviour.
• Text vs. data instead of Unicode vs. 8-bit
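Several of the Python 3 changes listed above can be checked directly in an interpreter. The snippet below is a small illustration written for this section, assuming any Python 3 version; the variable names are arbitrary.

```python
# print is a function in Python 3, so it accepts keyword arguments such as sep
print("Python", "3", sep="-")  # Python-3

# Views instead of lists: dict.keys() is a live view of the dictionary
counts = {"a": 1}
keys_view = counts.keys()
counts["b"] = 2
print(list(keys_view))  # ['a', 'b']: the view reflects the later insertion

# Ordering comparisons between unrelated types raise TypeError
try:
    sorted([3, "a", 1.0])
except TypeError as exc:
    print("Cannot sort a heterogeneous list:", exc)

# Only one integer type, int, with arbitrary precision (no separate long)
big = 2 ** 100
print(type(big).__name__)  # int

# "/" is true division; "//" gives the old floor-division result
print(7 / 2, 7 // 2)  # 3.5 3

# Text (str) and binary data (bytes) are distinct types
text = "caf\u00e9"
data = text.encode("utf-8")
print(len(text), len(data))  # 4 5
```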

Python
Python is an interpreted high-level programming language for general-purpose
programming. Created by Guido van Rossum and first released in 1991, Python has a
design philosophy that emphasizes code readability, notably using significant whitespace.


5.4 MODULES USED IN PROJECT

TensorFlow
TensorFlow is a free and open-source software library for dataflow and differentiable
programming across a range of tasks. It is a symbolic math library, and is also used for
machine learning applications such as neural networks. It is used for both research and
production at Google.
NumPy
NumPy is a general-purpose array-processing package. It provides a high-performance
multidimensional array object, and tools for working with these arrays.
It is the fundamental package for scientific computing with Python. It contains various
features, including these important ones:
• A powerful N-dimensional array object
• Sophisticated (broadcasting) functions
• Tools for integrating C/C++ and Fortran code
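The broadcasting feature in the list above can be seen in a few lines. This is a minimal sketch with made-up, fare-like numbers (not data from the project): an array of shape (3,) is applied to every row of a (2, 3) array without an explicit loop.

```python
import numpy as np

# Made-up fares: 2 routes (rows) x 3 travel dates (columns)
fares = np.array([[4000.0, 5000.0, 6000.0],
                  [3000.0, 3500.0, 4000.0]])

# One multiplier per date; shape (3,) is broadcast across both rows of shape (2, 3)
multipliers = np.array([1.0, 0.5, 2.0])
adjusted = fares * multipliers

print(adjusted)
print(adjusted.shape)  # (2, 3)
```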
Pandas
Pandas is an open-source Python library providing high-performance data manipulation
and analysis tools using its powerful data structures. Before Pandas, Python was mainly used
for data munging and preparation and contributed very little to data analysis. Pandas
solved this problem. Using Pandas, we can accomplish five typical steps in the processing
and analysis of data, regardless of the origin of the data: load, prepare, manipulate, model, and
analyze. Python with Pandas is used in a wide range of academic and commercial domains,
including finance, economics, statistics, analytics, etc.
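The sketch below walks through four of the five steps named above (load, prepare, manipulate, analyze; modelling is left out) on a tiny CSV held in memory. The column names resemble flight-fare data, but the rows and prices are invented for illustration and are not the project's dataset.

```python
import io

import pandas as pd

# A tiny, made-up CSV standing in for a real data file
csv_text = """Airline,Total_Stops,Price
IndiGo,non-stop,4000
IndiGo,1 stop,5000
Air India,2 stops,7500
Jet Airways,2 stops,
"""

# Load
df = pd.read_csv(io.StringIO(csv_text))

# Prepare: drop rows whose Price (the value to predict) is missing
df = df.dropna(subset=["Price"]).copy()

# Manipulate: turn the stop description into a number
df["Stops"] = df["Total_Stops"].map({"non-stop": 0, "1 stop": 1, "2 stops": 2})

# Analyze: average price per airline
summary = df.groupby("Airline")["Price"].mean()
print(summary)
```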
Matplotlib
Matplotlib is a Python 2D plotting library which produces publication quality figures in a
variety of hardcopy formats and interactive environments across platforms. Matplotlib can
be used in Python scripts, the Python and IPython shells, the Jupyter Notebook, web
application servers, and four graphical user interface toolkits. Matplotlib tries to make easy
things easy and hard things possible. You can generate plots, histograms, power spectra,
bar charts, error charts, scatter plots, etc., with just a few lines of code.
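As a small example of "a few lines of code", the sketch below draws a bar chart of invented average fares and saves it to a PNG file in the system's temporary directory. It selects the non-interactive Agg backend so that no window is needed; the airline labels and values are hypothetical, not project results.

```python
import os
import tempfile

import matplotlib
matplotlib.use("Agg")  # non-interactive backend: render straight to a file
import matplotlib.pyplot as plt

airlines = ["IndiGo", "Air India", "SpiceJet"]  # hypothetical labels
avg_fare = [4500, 7500, 4200]                   # hypothetical toy values

fig, ax = plt.subplots(figsize=(5, 3))
ax.bar(airlines, avg_fare)
ax.set_xlabel("Airline")
ax.set_ylabel("Average fare")
ax.set_title("Average fare by airline (toy data)")
fig.tight_layout()

out_path = os.path.join(tempfile.gettempdir(), "avg_fare.png")
fig.savefig(out_path)
plt.close(fig)
print("Saved chart to", out_path)
```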



5.5 INSTALL PYTHON STEP-BY-STEP IN WINDOWS AND MAC

Python, a versatile programming language, does not always come pre-installed on your
computer. Python was first released in 1991 and is still a very popular high-level
programming language today. Its design philosophy emphasizes code readability, with its
notable use of significant whitespace.
The object-oriented approach and language constructs provided by Python enable
programmers to write clear and logical code for projects.
First, download the latest version of Python from the download page.
Second, double-click the installer file to launch the setup wizard.
In the setup window, check Add Python 3.8 to PATH and click Install Now
to begin the installation.


It’ll take a few minutes to complete the setup.




Once the setup completes, you’ll see the following window:


Verify the installation


To verify the installation, open the Run window, type cmd and press Enter:


In the Command Prompt, type the python command as follows:


If you see output like the above screenshot, you've successfully installed Python on
your computer.
To exit the interpreter, type Ctrl-Z and press Enter.
If you see the following output from the Command Prompt after typing the python
command:

'python' is not recognized as an internal or external command,
operable program or batch file.

you most likely did not check the Add Python 3.8 to PATH checkbox when you installed Python.
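Besides the interactive check, a short script can confirm which interpreter the python command actually runs. The sketch below is an illustration, not part of the original instructions; the file name check_python.py and the 3.8 minimum (chosen to match the installer above) are assumptions.

```python
import sys

# Show which interpreter the "python" (or "python3") command actually runs
print("Executable:", sys.executable)
print("Version:", sys.version.split()[0])

# Assumed minimum, matching the Python 3.8 installer used in the steps above
MIN_VERSION = (3, 8)
if sys.version_info < MIN_VERSION:
    raise SystemExit(f"Python {MIN_VERSION[0]}.{MIN_VERSION[1]} or newer is expected")
print("Python installation looks OK")
```

Save it as check_python.py and run it with python check_python.py (or python3 check_python.py on macOS and Linux).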


Install Python on macOS
It's recommended to install Python on macOS using an official installer. Here are the steps:
• First, download a Python release for macOS.
• Second, run the installer by double-clicking the installer file.
• Third, follow the instructions on the screen until the installer completes.
Install Python on Linux
Before installing Python 3 on your Linux distribution, check whether Python 3 is
already installed by running the following command from the terminal:
python3 --version
If you see a response with the version of Python, then your computer already has Python 3
installed. Otherwise, you can install Python 3 using a package management system.
For example, you can install Python 3.10 on Ubuntu using apt:
sudo apt install python3.10
To install a newer version, replace 3.10 with that version.
A quick introduction to Visual Studio Code
Visual Studio Code, often called VS Code, is a lightweight source code editor that runs on
your desktop. It's available for Windows, macOS, and Linux.
VS Code comes with many features, such as IntelliSense, code editing, and extensions, that
allow you to edit Python source code effectively. The best part is that VS Code is free and
its source code is open.
Besides the desktop version, VS Code also has a browser version that you can use directly
in your web browser without installing anything.
This section shows how to set up Visual Studio Code for a Python environment so
that you can edit, run, and debug Python code.
Setting up Visual Studio Code
To set up VS Code, follow these steps:
First, navigate to the VS Code official website and download VS Code for your
platform (Windows, macOS, or Linux).
Second, launch the setup wizard and follow the steps.
Once the installation completes, you can launch the VS Code application:



Install Python Extension
To make VS Code work with Python, you need to install the Python extension from
the Visual Studio Marketplace.
The following picture illustrates the steps:


• First, click the Extensions tab.
• Second, type python extension pack in the search box.
• Third, click Python Extension Pack. It'll show detailed information in the right
pane.
• Finally, click the Install button to install the Python extension.
Now, you're ready to develop your first program in Python.


Creating a new Python project
First, create a new folder called helloworld.
Second, launch VS Code and open the helloworld folder.
Third, create a new app.py file, enter the following code and save the file:
print('Hello, World!')
The print() function is a built-in function that displays a message on the screen. In this
example, it'll show the message 'Hello, World!'.
What is a function
When you sum two numbers, that's a function. And when you multiply two numbers, that's
also a function.
Each function takes your inputs, applies some rules, and returns a result.
In the above example, print() is a function. It accepts a string and shows it on the screen.
Python has many built-in functions, like print(), that you can use out of the box in
your program.
In addition, Python allows you to define your own functions, which you'll learn how to do
later.
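As a small preview, here is a user-defined function written for this section (the name multiply is arbitrary). It follows the pattern described above: it takes inputs, applies a rule, and returns a result.

```python
def multiply(a, b):
    """Take two inputs, apply a rule (multiplication), and return the result."""
    return a * b


product = multiply(6, 7)
print(product)  # 42
```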
Executing the Python Hello World program
To execute the app.py file, first launch the Command Prompt on Windows or the Terminal
on macOS or Linux.
Then, navigate to the helloworld folder.
After that, type the following command to execute the app.py file:
python app.py
If you use macOS or Linux, use the python3 command instead:
python3 app.py
If everything is fine, you'll see the following message on the screen:
Hello, World!
If you use VS Code, you can also launch the Terminal within VS Code by:
• Accessing the menu Terminal > New Terminal
• Or using the keyboard shortcut Ctrl+Shift+`.
Typically, the backtick key (`) is located below the Esc key on the keyboard.




Python IDLE
Python IDLE (Integrated Development and Learning Environment) is the IDE that comes with
the Python distribution by default.
IDLE also includes an interactive interpreter, the Python Shell. It has many features, such as:
• Code editing with syntax highlighting
• Smart indenting
• Auto-completion
In short, Python IDLE helps you experiment with Python quickly in a trial-and-error
manner.
The following shows you step by step how to launch the Python IDLE and use it to execute
the Python code:
First, launch the Python IDLE program:

A new Python Shell window will display as follows:


Now, you can enter the Python code after the cursor >>> and press Enter to execute it.


For example, you can type the code print('Hello, World!') and press Enter, you’ll see the message Hello,
World! immediately on the screen:




Python Syntax
Whitespace and indentation
If you’ve been working in other programming languages such as Java, C#, or C/C++, you know
that these languages use semicolons (;) to separate the statements.
However, Python uses whitespace and indentation to construct the code structure.
The following shows a snippet of Python code:
# define main function to print out something
def main():
    i = 1
    max = 10
    while (i < max):
        print(i)
        i = i + 1
# call function main
main()
The meaning of the code isn’t important to you now. Please pay attention to the code structure
instead.


At the end of each line, you don’t see any semicolon to terminate the statement. And the code
uses indentation to format the code.
By using indentation and whitespace to organize the code, Python code gains the following
advantages:
• First, you’ll never miss the beginning or ending code of a block like in other programming
languages such as Java or C#.
• Second, the coding style is essentially uniform. If you have to maintain another
developer’s code, that code looks the same as yours.
• Third, the code is more readable and clearer in comparison with other programming
languages.
Comments
The comments are as important as the code because they describe why a piece of code was
written.
When the Python interpreter executes the code, it ignores the comments.
In Python, a single-line comment begins with a hash (#) symbol followed by the comment. For
example:
# This is a single line comment in Python

Continuation of statements
Python uses a newline character to separate statements. It places each statement on one line.
However, a long statement can span multiple lines by using the backslash (\) character.
The following example illustrates how to use the backslash (\) character to continue a statement
in the second line:
if (a == True) and (b == False) and \
        (c == True):
    print("Continuation of statements")
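The snippet above uses the names a, b and c without defining them. The self-contained version below assigns example values (an assumption, not part of the original text) so that it runs as is, and also shows that Python joins lines implicitly inside parentheses, brackets and braces, where no backslash is needed.

```python
# Example values so the snippet runs on its own
a, b, c = True, False, True
messages = []

# Explicit continuation: the backslash joins the two physical lines
if (a == True) and (b == False) and \
        (c == True):
    messages.append("Continuation of statements")

# Implicit continuation: lines inside parentheses join without a backslash
if (a
        and not b
        and c):
    messages.append("Continuation inside parentheses")

print("\n".join(messages))
```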
Identifiers
Identifiers are names that identify variables, functions, modules, classes, and other objects in
Python.
The name of an identifier needs to begin with a letter or underscore (_). The following
characters can be alphanumeric or underscore.
Python identifiers are case-sensitive. For example, the counter and Counter are different
identifiers.
In addition, you cannot use Python keywords for naming identifiers.
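The identifier rules above can be checked programmatically. This sketch combines the built-in str.isidentifier() method with keyword.iskeyword() from the standard library; is_valid_identifier is a hypothetical helper name. Note that isidentifier() also accepts non-ASCII letters, which the rule above does not mention.

```python
import keyword


def is_valid_identifier(name: str) -> bool:
    """Return True if name follows Python's identifier rules and is not a keyword."""
    return name.isidentifier() and not keyword.iskeyword(name)


for candidate in ["counter", "Counter", "_total", "2nd_place", "class"]:
    print(candidate, is_valid_identifier(candidate))
```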
Keywords


Some words have special meanings in Python. They are called keywords.
The following shows the list of keywords in Python 3.6 (Python 3.7 added async and await):

False class finally is return
None continue for lambda try
True def from nonlocal while
and del global not with
as elif if or yield
assert else import pass
break except in raise

Python is a growing and evolving language, so its set of keywords may change between
versions.
Python provides a special module for listing its keywords called keyword.
To find the current keyword list, you use the following code:
import keyword

print(keyword.kwlist)

String literals

Python uses single quotes ('), double quotes ("), triple single quotes (''') and triple-double quotes
(""") to denote a string literal.
A string literal needs to be surrounded by the same type of quotes. For example, if you use
a single quote to start a string literal, you need to use the same single quote to end it.
The following shows some examples of string literals:
s = 'This is a string'
print(s)
s = "Another string using double quotes"
print(s)
s = ''' string can span
multiple lines '''
print(s)


6. IMPLEMENTATIONS

6.1 SOFTWARE ENVIRONMENT

Python is a high-level, general-purpose, interpreted programming language.
1) High-level
Python is a high-level programming language that is easy to learn. Python doesn't
require you to understand the details of the computer in order to develop programs efficiently.
2) General-purpose
Python is a general-purpose language. It means that you can use Python in various domains
including:
• Web applications
• Big data applications
• Testing
• Automation
• Data science, machine learning, and AI
• Desktop software
• Mobile apps
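By contrast, a targeted language such as SQL is designed for a narrower task, such as querying
data from relational databases.
3) Interpreted
Python is an interpreted language. To develop a Python program, you write Python code into a
file called source code, and then run that source code with the Python interpreter, without a
separate compile step of your own.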
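In CPython, the reference interpreter, source code is first compiled to bytecode, which the interpreter's virtual machine then executes. The sketch below makes both stages visible with the built-in compile() and exec() functions and the standard dis module; the exact bytecode printed differs between Python versions.

```python
import dis

source = "total = sum(range(5))\n"

# Stage 1: compile the source text into a code object (bytecode)
code_obj = compile(source, "<example>", "exec")
dis.dis(code_obj)  # print the bytecode instructions

# Stage 2: the virtual machine executes the bytecode
namespace = {}
exec(code_obj, namespace)
print(namespace["total"])  # 10
```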

6.1.1 PYTHON

Python increases your productivity. Python allows you to solve complex problems in less time
and fewer lines of code. It’s quick to make a prototype in Python.
Python is used as a solution in many areas across industries, from web applications to data
science and machine learning.
Python is quite easy to learn in comparison with other programming languages. Python syntax
is clear and beautiful.
Python has a large ecosystem that includes lots of libraries and frameworks.
Python is cross-platform. Python programs can run on Windows, Linux, and macOS.
Python has a huge community. Whenever you get stuck, you can get help from an active
community.
Python developers are in high demand.



6.1.2 SAMPLE CODE

from django.shortcuts import render, redirect
from django.contrib import messages
from mainapp.models import *
from userapp.models import *
from adminapp.models import *
import pandas as pd

# Create your views here.
def user_index(request):
    user_id = request.session['user_id']
    user = UserModel.objects.get(user_id=user_id)
    if request.method == 'POST':
        source = request.POST.get("source")
        to = request.POST.get('to')
        airline = request.POST.get("airline")
        dept_time = request.POST.get("dept_time")
        stops = request.POST.get('stops')
        arr_time = request.POST.get('arr_time')
        # Save the journey details entered by the user, then open the prediction view
        obj = PredModel.objects.create(
            source=source, to=to, airline=airline,
            dept_time=dept_time, stops=stops, arr_time=arr_time,
        )
        return redirect("Predict", id=obj.id)
    return render(request, 'user/user-index.html')


def user_myprofile(request):
    user_id = request.session['user_id']
    user = UserModel.objects.get(user_id=user_id)
    if request.method == 'POST':
        user.user_username = request.POST.get("user_username")
        user.user_passportnumber = request.POST.get('user_passportnumber')
        user.user_email = request.POST.get("user_email")
        user.user_contact = request.POST.get("user_contact")
        user.user_password = request.POST.get("user_password")
        user.user_address = request.POST.get('user_address')
        # Replace the profile image only when a new one was uploaded
        if 'user_image' in request.FILES:
            user.user_image = request.FILES['user_image']
        user.save()
        messages.success(request, 'Updated Successfully')
        return redirect('user_myprofile')
    return render(request, 'user/user-myprofile.html', {'user': user})


def Predict(request, id):
    data = Dataset.objects.all().first()
    user_data = PredModel.objects.get(pk=id)

    # One-hot encode the city: the matching flag is 1, all others 0
    # (a city not in the list leaves every flag at 0)
    cities = ['Chennai', 'Delhi', 'Kolkata', 'Mumbai', 'Cochin', 'Hyderabad']
    Chennai, Delhi, Kolkata, Mumbai, Cochin, Hyderabad = [
        int(user_data.source == city) for city in cities]
    # The destination reuses the same six names, so these flags
    # overwrite the source-city flags computed just above
    Chennai, Delhi, Kolkata, Mumbai, Cochin, Hyderabad = [
        int(user_data.to == city) for city in cities]

    # One-hot encode the airline; every flag is defined, even for an unlisted airline
    airlines = ['Air_India', 'GoAir', 'IndiGo', 'Jet_Airways', 'Jet_Airways_Business',
                'Multiple_carriers', 'Multiple_carriers_Premium_economy', 'SpiceJet',
                'Trujet', 'Vistara', 'Vistara_Premium_economy']
    (Air_India, GoAir, IndiGo, Jet_Airways, Jet_Airways_Business,
     Multiple_carriers, Multiple_carriers_Premium_economy, SpiceJet,
     Trujet, Vistara, Vistara_Premium_economy) = [
        int(user_data.airline == name) for name in airlines]

    # Split the departure timestamp (YYYY-MM-DDTHH:MM) into day, month, hour and minute
    dept = pd.to_datetime(user_data.dept_time, format="%Y-%m-%dT%H:%M")
    journey_day = int(dept.day)
    journey_month = int(dept.month)
    Dep_Time_hour = int(dept.hour)
    Dep_Time_min = int(dept.minute)
Arrival_Time_hour=int(pd.to_datetime(user_data.arr_time,format="%Y-%m-
%dT%H:%M").hour)
Arrival_Time_min=int(pd.to_datetime(user_data.arr_time,format="%Y-%m-

49

%dT%H:%M").minute)
dur_hour=abs(Arrival_Time_hour-Dep_Time_hour)
dur_min=abs(Arrival_Time_min-Dep_Time_min)
lp=[journey_day,Chennai,Hyderabad,Cochin,Mumbai,Air_India,Jet_Airways,Jet_Airways
_Business,Multiple_carriers,Multiple_carriers_Premium_economy,IndiGo,Vistara_Premiu
m_economy,Vistara,
Trujet,SpiceJet,dur_hour,dur_min,int(user_data.stops),journey_month,Dep_Time_hour,De
p_Time_min,Arrival_Time_hour,Arrival_Time_min,Delhi,GoAir,Kolkata]
# output=lp
print(lp,'lllllllllllllllllllllll')
# print(Predict,'llllllllllll')
# from sklearn.ensemble import RandomForestRegressor
# reg_rf=RandomForestRegressor()
# data = Dataset.objects.get(data_id = data)
# # y=reg_rf.fit(output)
# y_pred=reg_rf.predict(lp)
# print(y_pred)
# output1=round(y_pred,2)
# # print(lp)
# id = request.session['id']
# user = PredictModel.objects.get(pk=id)
test = TestingModel.objects.create(Total_Stops=lp[17],Air_India=lp[5]
,Jet_Airways=lp[6],journey_day =lp[0],
Chennai=lp[1],Hyderabad=lp[2],Cochin=lp[3],

50


Mumbai=lp[4],Jet_Airways_Business=lp[7],Multiple_carriers=lp[8],
Multiple_carriers_Premium_economy=lp[9],
IndiGo=lp[10],Vistara_Premium_economy=lp[11],Vistara=lp[12],Trujet=lp[13]
SpiceJet=lp[14],dur_hour=lp[15],dur_min=lp[16],journey_month=lp[18],Dep_Time_hour
=lp[19],Dep_Time_min=lp[20],
Arrival_Time_hour=lp[21],Arrival_Time_min=lp[22],
Delhi=lp[23],GoAir=lp[24],Kolkata=lp[25])
print(test,'kkkkkkkkkkkkkkkkkk')
print(test.id,'jjjjjjjj')
return redirect('button',id=test.id)
# file = str(data.data_set)
# df = pd.read_csv('./media/'+ file)
# from sklearn.preprocessing import LabelEncoder
# le = LabelEncoder()
# for col in lp:
# if type(lp[col]) == 'str':
# lp[col] = le.transform(lp[col])
# return render(request,'user/user-index.html')

51

7.SYSTEM TESTING

7.1 INTRODUCTION TO TESTING

As testers, we are aware of the various types of software testing, such as functional testing,
non-functional testing, automation testing, and agile testing, along with their sub-types.
Each type of testing has its own features, advantages, and disadvantages. This chapter
covers the types of software testing most commonly used in day-to-day testing.

Different Types of Software Testing


7.2 Testing Strategies:-
There are four main types of functional testing.
#1) Unit Testing
Unit testing is a type of software testing that is performed on an individual unit or
component to verify its correctness. Typically, unit testing is done by the developer
during the application development phase. Each unit can be viewed as
a method, function, procedure, or object. Developers often use test automation
tools to run unit tests.

For example, consider a simple calculator application. The developer can write
a unit test to check that entering two numbers returns the correct sum from
the addition function.
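A minimal sketch of the calculator unit test described above, using Python's built-in `unittest` module. The `add` function is a hypothetical stand-in for the calculator's addition feature, not code from this project.

```python
import unittest


def add(a, b):
    """Hypothetical calculator addition function under test."""
    return a + b


class TestCalculatorAddition(unittest.TestCase):
    """Unit tests that exercise add() in isolation."""

    def test_adds_two_positive_numbers(self):
        self.assertEqual(add(2, 3), 5)

    def test_adds_negative_and_positive(self):
        self.assertEqual(add(-4, 1), -3)

    def test_adds_floats(self):
        # assertAlmostEqual avoids false failures from floating-point rounding.
        self.assertAlmostEqual(add(0.1, 0.2), 0.3)
```

Saved as, for example, `test_calculator.py` (a hypothetical file name), the tests run with `python -m unittest test_calculator`.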

a) White Box Testing
White box testing is a test technique in which the internal structure or code of an
application is visible and accessible to the tester. This technique makes it easier to
find loopholes in an application’s design or faults in its business logic. Statement
coverage and decision (branch) coverage are examples of white box test
techniques.
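As a white-box illustration of decision (branch) coverage, the sketch below chooses test inputs by reading the code, so that every branch of a small, hypothetical `fare_band` helper runs at least once. The helper and its 5000 threshold are assumptions for illustration only. A tool such as coverage.py can confirm the result, for example with `coverage run --branch -m pytest` followed by `coverage report`.

```python
def fare_band(fare: float) -> str:
    """Hypothetical helper: classify a fare as 'invalid', 'budget' or 'premium'."""
    if fare < 0:
        return "invalid"   # branch 1
    if fare < 5000:
        return "budget"    # branch 2
    return "premium"       # branch 3


# White-box test cases: one input per branch plus the 5000 boundary,
# chosen by inspecting the conditions inside fare_band().
BRANCH_CASES = [
    (-1, "invalid"),
    (4999.99, "budget"),
    (5000, "premium"),
    (12000, "premium"),
]


def test_every_branch():
    for fare, expected in BRANCH_CASES:
        assert fare_band(fare) == expected, (fare, expected)
```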
b) Gorilla Testing
Gorilla testing is a test technique in which the tester and/or developer tests a
single module of the application thoroughly in all aspects. Gorilla testing is done to
check how robust the application is.
For example, suppose the tester is testing a pet insurance company’s website, which
lets customers buy an insurance policy, a tag for their pet, and a lifetime
membership. The tester can focus on one module, say, the insurance
policy module, and test it thoroughly with positive and negative test scenarios.

#2) Integration Testing
Integration testing is a type of software testing in which two or more modules of an
application are logically grouped and tested as a whole. The focus of this
type of testing is to find defects in the interfaces, communication, and data flow
among modules. A top-down or bottom-up approach is used while integrating
modules into the whole system.

This type of testing is done when integrating modules within a system or between
systems. For example, a user buys a flight ticket from an airline website and
sees both flight details and payment information while buying it, but
flight details and payment processing are two different systems. Integration
testing should therefore be performed when the airline website is integrated with the
payment processing system.
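A minimal sketch of an integration test for the airline example, assuming two hypothetical modules: a `BookingService` that creates a ticket booking and a `PaymentProcessor` that charges the card. The test wires the two classes together and checks the data that crosses their interface, rather than testing either class alone. All names here are illustrative, not part of this project.

```python
from dataclasses import dataclass, field


@dataclass
class PaymentProcessor:
    """Hypothetical payment module: records each charge it accepts."""
    charges: list = field(default_factory=list)

    def charge(self, card_token: str, amount: float) -> bool:
        if amount <= 0 or not card_token:
            return False
        self.charges.append((card_token, amount))
        return True


@dataclass
class Booking:
    flight_no: str
    fare: float
    paid: bool = False


class BookingService:
    """Hypothetical flight-booking module that depends on the payment module."""

    def __init__(self, payments: PaymentProcessor):
        self.payments = payments

    def book(self, flight_no: str, fare: float, card_token: str) -> Booking:
        booking = Booking(flight_no, fare)
        booking.paid = self.payments.charge(card_token, fare)
        return booking


def test_booking_and_payment_integration():
    payments = PaymentProcessor()
    service = BookingService(payments)

    # The fare shown to the user must be exactly what reaches the payment module.
    booking = service.book("AI-101", 5400.0, "tok_test")
    assert booking.paid
    assert payments.charges == [("tok_test", 5400.0)]

    # A rejected payment must leave the booking unpaid and record no charge.
    failed = service.book("AI-102", 0.0, "tok_test")
    assert not failed.paid
    assert len(payments.charges) == 1
```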
a) Gray box testing
As the name suggests, gray box testing is a combination of white-box testing and
black-box testing. Testers have partial knowledge of the internal structure or code
of an application.
#3) System Testing
System testing is a type of testing in which the tester evaluates the whole system against
the specified requirements.
a) End-to-End Testing
It involves testing a complete application environment in a situation that mimics
real-world use, such as interacting with a database, using network
communications, or interacting with other hardware, applications, or systems where
appropriate.
For example, suppose a tester is testing a pet insurance website. End-to-end testing
covers buying an insurance policy, LPM, and a pet tag, adding another pet,
updating credit card information on the user’s account, updating the user’s address,
and receiving order confirmation emails and policy documents.
b) Black Box Testing
Black box testing is a software testing technique in which testing is performed
without knowing the internal structure, design, or code of the system under test.
Testers focus only on the inputs and outputs of the test objects.

c) Smoke Testing
Smoke testing is performed to verify, at a high level, that the basic and critical
functionality of the system under test works.
Whenever the development team provides a new build, the software testing
team validates the build and ensures that no major issue exists. Once the testing
team confirms that the build is stable, more detailed testing is carried out.
#4) Acceptance Testing
Acceptance testing is a type of testing in which the client, business, or customer tests the
software with real-world business scenarios.
The client accepts the software only when all the features and functionalities work
as expected. This is the last phase of testing, after which the software goes into
production. It is also called User Acceptance Testing (UAT).
a) Alpha Testing
Alpha testing is a type of acceptance testing performed by a team within the
organization to find as many defects as possible before releasing the software to
customers.
For example, suppose the pet insurance website is under UAT. The UAT team runs real-world
scenarios such as buying an insurance policy, buying an annual membership,
changing the address, and transferring ownership of a pet, in the same way a user
would use the real website. The team can use test credit card information to process
payment-related scenarios.
b) Beta Testing
Beta testing is a type of software testing that is carried out by the
clients/customers. It is performed in the real environment before the
product is released to the market for actual end users.
Beta testing is carried out to ensure that there are no major failures in the software
or product and that it satisfies the business requirements from an end-user
perspective. Beta testing is successful when the customer accepts the software.
This testing is usually done by end users, and it is the final testing
done before releasing the application for commercial purposes. Usually, the beta
version of the software or product is released to a limited number of users
in a specific area.
The end users use the software and share their feedback with the company. The
company then takes the necessary action before releasing the software worldwide.
c) Operational acceptance testing (OAT)
Operational acceptance testing of the system is performed by operations or system
administration staff in the production environment. The purpose of operational
acceptance testing is to make sure that the system administrators can keep the
system working properly for the users in a real-time environment.
The focus of the OAT is on the following points:
• Testing of backup and restore.
• Installing, uninstalling, and upgrading the software.
• The recovery process in case of a natural disaster.
• User management.
• Maintenance of the software.
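The first OAT point, backup and restore, can be rehearsed with Python's standard `sqlite3` module, whose `Connection.backup()` method copies an entire database into another connection. The sketch assumes an SQLite database (Django's default backend; this report does not state which database the project used) and works on throwaway in-memory databases with a hypothetical `fares` table.

```python
import sqlite3


def backup_database(source: sqlite3.Connection) -> sqlite3.Connection:
    """Copy the whole source database into a new in-memory snapshot."""
    snapshot = sqlite3.connect(":memory:")
    source.backup(snapshot)
    return snapshot


def restore_database(snapshot: sqlite3.Connection, target: sqlite3.Connection) -> None:
    """Overwrite the target database with the contents of the snapshot."""
    snapshot.backup(target)


def rehearse_backup_and_restore() -> int:
    """Back up, simulate data loss, restore, and return the restored row count."""
    live = sqlite3.connect(":memory:")
    live.execute("CREATE TABLE fares (route TEXT, price INTEGER)")
    live.executemany("INSERT INTO fares VALUES (?, ?)",
                     [("DEL-BOM", 5400), ("BLR-DEL", 6100)])
    live.commit()  # commit before taking the backup

    snapshot = backup_database(live)

    live.execute("DELETE FROM fares")  # simulated data loss
    live.commit()  # the restore target must not have an open transaction

    restore_database(snapshot, live)
    return live.execute("SELECT COUNT(*) FROM fares").fetchone()[0]
```

An OAT run would do the same against a copy of the production database file and then verify that the application still works on the restored data.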
Non-Functional Testing
The main types of non-functional testing are described below.
#1) Security Testing
It is a type of testing performed by a specialized team to check whether any hacking
method can penetrate the system.
Security testing is done to check how secure the software, application, or website is
from internal and/or external threats. This includes checking how well the
software is protected from malicious programs and viruses, and how secure and strong the
authorization and authentication processes are. It also checks how the software
behaves under hacker attacks and malicious programs.
a) Penetration Testing
Penetration Testing or Pen testing is the type of security testing performed as an
authorized cyberattack on the system to find out the weak points of the system in
terms of security.
Pen testing is performed by outside contractors, generally known as ethical
hackers, which is why it is also known as ethical hacking. The contractors perform
operations such as SQL injection, URL manipulation, privilege elevation, and
session expiry checks, and report their findings to the organization.
Note: Do not perform pen testing on your own laptop/computer. Always obtain
written permission before doing a pen test.
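SQL injection, the first attack listed above, can be demonstrated safely against a throwaway in-memory SQLite database. The sketch below contrasts a hypothetical login query built by string formatting with a parameterised one; the table, user, and function names are illustrative assumptions, and passwords are stored in plain text only to keep the demo short.

```python
import sqlite3


def make_demo_db() -> sqlite3.Connection:
    """Throwaway in-memory database with one hypothetical user."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE users (username TEXT, password TEXT)")
    conn.execute("INSERT INTO users VALUES ('alice', 's3cret')")
    return conn


def login_unsafe(conn: sqlite3.Connection, username: str, password: str) -> bool:
    # VULNERABLE: user input is pasted directly into the SQL text.
    query = (
        "SELECT COUNT(*) FROM users "
        f"WHERE username = '{username}' AND password = '{password}'"
    )
    return conn.execute(query).fetchone()[0] > 0


def login_safe(conn: sqlite3.Connection, username: str, password: str) -> bool:
    # Parameterised: values are bound separately and never parsed as SQL.
    query = "SELECT COUNT(*) FROM users WHERE username = ? AND password = ?"
    return conn.execute(query, (username, password)).fetchone()[0] > 0


# Classic payload: turns the WHERE clause into "... OR '1'='1'", true for every row.
PAYLOAD = "' OR '1'='1"
```

With `PAYLOAD` as the password, `login_unsafe` accepts any username, while `login_safe` treats the payload as an ordinary (wrong) password. Django's ORM, which this project's views use, binds query parameters in the same way; the risk returns mainly when raw SQL is built from user input.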
#2) Performance Testing
Performance testing checks an application’s stability and response time under
load.
Stability is the ability of the application to keep working under
load. Response time is how quickly the application responds to users.
Performance testing is done with tools such as Loader.io, JMeter, and
LoadRunner.
a) Load testing
Load testing checks an application’s stability and response time by applying a
load equal to or less than the number of users the application was designed
for.
b) Stress Testing
Stress testing checks an application’s stability and response time by applying a
load greater than the number of users the application was designed for.
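The difference between load and stress testing comes down to the number of simulated users relative to the design capacity. The sketch below, using only the standard library, runs concurrent simulated users against a stand-in request handler and reports the mean and 95th-percentile response times. `handle_request`, its 10 ms delay, and the design capacity of 20 users are assumptions; a real test would send HTTP requests to the deployed site with a tool such as JMeter or LoadRunner.

```python
import statistics
import time
from concurrent.futures import ThreadPoolExecutor

DESIGNED_USERS = 20  # hypothetical design capacity of the application


def handle_request() -> None:
    """Hypothetical stand-in for one request to the fare-prediction page."""
    time.sleep(0.01)


def simulate_user(requests_per_user: int) -> list:
    """One simulated user sending requests back to back; returns latencies in seconds."""
    latencies = []
    for _ in range(requests_per_user):
        start = time.perf_counter()
        handle_request()
        latencies.append(time.perf_counter() - start)
    return latencies


def run_test(concurrent_users: int, requests_per_user: int = 5) -> dict:
    """Run all simulated users concurrently and summarise their response times."""
    with ThreadPoolExecutor(max_workers=concurrent_users) as pool:
        futures = [pool.submit(simulate_user, requests_per_user)
                   for _ in range(concurrent_users)]
        latencies = [t for f in futures for t in f.result()]
    return {
        "users": concurrent_users,
        "requests": len(latencies),
        "mean_s": statistics.mean(latencies),
        "p95_s": statistics.quantiles(latencies, n=20)[18],  # 95th percentile
    }


load_report = run_test(DESIGNED_USERS)        # load test: at the design capacity
stress_report = run_test(DESIGNED_USERS * 3)  # stress test: beyond the design capacity
```

Because the stand-in handler only sleeps, its latency will not degrade under stress; against a real server, rising p95 times and errors at the higher user count are what stress testing looks for.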


8.SCREENSHOTS

a. Home Page:-

Fig-8.1
b. Admin:-


Fig-8.2


c. Register Page

Fig-8.3


d. User Page:-

Fig-8.4


e. Contact Information:-

Fig-8.5


f. Dashboard:-



Fig-8.6


g. Uploading Data:-


Fig-8.7



h. Data Formed:-




Fig-8.8


i. Graph Analysis:-


Fig-8.9


j. About Page:-


Fig-8.10



k. About Page Success:-



Fig-8.11


l. Predicted Page:-



Fig-8.12


m. User Profile:-


Fig-8.13


9.CONCLUSION
To estimate the dynamic fare of flights, three different datasets from three different sources
have been used. Many insights were found while visualizing the datasets. Seven
different machine learning algorithms have been used to build the model. Only limited
information can be obtained because the data is acquired from websites that sell flight tickets.
The performance of the models was assessed using the evaluation metrics listed in Table I.
The Random Forest Regressor outperformed the other algorithms, so it is well suited to
predicting airline fares. If more data, such as actual seat availability, can be obtained in
the future, the predictions are likely to be more accurate. Prediction-based services are
currently employed in a variety of sectors, including stock price prediction programs used
by stockbrokers and services like Zestimate, which provides an estimate of housing
values. A similar service in the aviation business would help customers decide when to
reserve tickets. Numerous studies have been conducted on this topic using various
methodologies, and further research with different algorithms is needed to improve
prediction accuracy. More accurate data with richer features could yield more reliable
findings.
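The model comparison summarised above can be sketched with scikit-learn. This is a minimal illustration under stated assumptions: it uses synthetic data with three hypothetical features (stops, duration in hours, departure hour) instead of the project's datasets, compares only two of the algorithms, and reports MAE and R², which may differ from the metrics in Table I.

```python
import numpy as np
from sklearn.ensemble import RandomForestRegressor
from sklearn.linear_model import LinearRegression
from sklearn.metrics import mean_absolute_error, r2_score
from sklearn.model_selection import train_test_split

# Synthetic flights (illustrative only, not the project's data).
rng = np.random.default_rng(42)
n = 1000
stops = rng.integers(0, 4, n)        # 0 to 3 stops
dur_hour = rng.uniform(1, 24, n)     # flight duration in hours
dep_hour = rng.integers(0, 24, n)    # departure hour
# Assumed fare rule: per-stop cost, per-hour cost, a morning peak surcharge, noise.
fare = (3000 + 1200 * stops + 150 * dur_hour
        + 2500 * ((dep_hour >= 7) & (dep_hour <= 10))
        + rng.normal(0, 400, n))

X = np.column_stack([stops, dur_hour, dep_hour])
X_train, X_test, y_train, y_test = train_test_split(
    X, fare, test_size=0.2, random_state=42)

models = {
    "LinearRegression": LinearRegression(),
    "RandomForestRegressor": RandomForestRegressor(n_estimators=200, random_state=42),
}

# Fit each model on the training split and score it on the held-out split.
scores = {}
for name, model in models.items():
    model.fit(X_train, y_train)
    pred = model.predict(X_test)
    scores[name] = {"MAE": mean_absolute_error(y_test, pred),
                    "R2": r2_score(y_test, pred)}

for name, s in scores.items():
    print(f"{name:22s} MAE={s['MAE']:8.1f}  R2={s['R2']:.3f}")
```

A tree ensemble can represent threshold effects such as the assumed peak-hour surcharge, which a single linear model cannot; that is one plausible reason for a Random Forest to come out ahead, but the printed numbers depend entirely on the synthetic data.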


10 REFERENCES

[1] T. Janssen, "A linear quantile mized regression model for prediction of airline ticket
prices," in A Treatise on Electricity and Magnetism 3rd ed., vol. 2, 2014, pp. 68- 73.
[2] Yiwei Chen and F. Vivek Farias, " Robust Dynamic Pricing With Strategic Customers,"
Mathematics of Operations Research 43, pp. 1119-1142, 2018.
[3] Juhar Ahmed Abdella, Nazar Zaki, Khaled Shuaib and Fahad Khan, "Airline ticket price
and demand prediction: A survey.," Journal od King Saud University - Computer and
Information Sciences, vol. 33, no. 4, pp. 375-391, 2021.
[4] Lantseva, Anastasia, Mukhina, Ksenia, Nikishova, Anna, Ivanov, Sergey, Knyazkov and
Konstantin, "Data-driven Modeling of Airlines Pricing," Procedia Computer Science, vol.
66, pp. 267-276, 2015.
[5] K. Tziridis, T. Kalampokas, G. A. Papakostas and K. I. Diamantaras, "Airfare prices
predictiono using machine learning techniques," in 25th European Signal Processing
Conference (EUSIPCO). Kos 2017, 2017.
[6] A. Boruah, K. Baruah, B. Das, M. Das and N. Gohain, "A Bayesian Approach for Flight
Fare Prediction Based on Kalman Filter," in Progress in Advanced Computing and
Intelligent Engineering, Singapore, 2019, pp. 191-203.
[7] William Groves and Maria Gini, "A regression model for predicting optimal purchase
timing for airline tickets.," Technical report, University of Minnesota, Minneapolis, USA,
Report number 11-025, 2011.
[8] D. Tanouz, R. R. Subramanian, D. Eswar, G. V. P. Reddy, A. R. Kumar and C. V. N. M.
Praneeth, "Credit Card Fraud Detection Using Machine Learning," in 5th International
Conference on Intelligent Computing and Control Systems (ICICCS), 2021.
[9] R. R. Subramanian, N. Akshith, G. N. Murthy, M. Vikas, S. Amara and K. Balaji, "A
Survey on Sentiment Analysis," in 11th International Conference on Cloud Computing,
Data Science & Engineering (Confluence), 2021, 2021.
[10] S. Amara and R. R. Subramanian, " Collaborating personalized recommender system
and content-based recommender system using TextCorpus," in 6th International
Conference on Advanced Computing and Communication Systems (ICACCS),
Coimbatore, India, 2020.
[11] Andi and Hari Kirshnan, "An Accurate Bitcoin Price Prediction using logistic regression
with LSTM Machine Learning model," Journal of Soft Computing Paradigm 3, pp. 205-
217, 2021.

69

[12] Manoharan and J. Samuel, "Study of Variants of Extreme Learning Machine (ELM)
Brands and its Performance Measure on Classification Algorithm," Journal of Soft
Computing Paradigm (JSCP) 3, pp. 83- 95, 2021.
[13] V. Suma and Shavige Malleshwara Hills, "Data Mining based Prediction of Demand in
Indian Market for Refurbished Electronics," Journal of Soft Computing Paradigm (JSCP)
2, pp. 101-110, 2020.
[14] W. K. Michael and A. G. Thomas, "A Framework for the Evaluation of Statistical
Prediction Models," CHEST, vol. 158, no. 1, pp. S29-S38, 2020.
[15] L. Yuling and L. Zhichao, "Design and implementation of ticket price forecasting
system," in AIP Conference Proceedings, 2018.
[16] Elizaveta Stavinova, Petr Chunaev and Klavdiya Bochenina, "Forecasting railway ticket
dynamic price with Google Trends open data," Procedia Computer Science, vol. 193, pp.
333-342, 2021.
[17] S. Deepa, A. Alli, Sheetac and S. Gokila, "Machine learning regression model for
material synthesis prices prediction in agriculture," in materialstoday, 2021.
[18] S. Matthew and Lewis, "Identifying airline price discrimination and the effect of
competition," International Journal of Industrial Organization, vol. 78, 2021.
[19] Ismail Koc and Emel Arslan, "Dynamic ticket pricing of airlines using variant batch size
interpretable multivariable long short-term memory," Expert Systems with Applications,
vol. 175, 2021.
[20] Rian Mehta, Stephen Rice, John Deaton and Scott R. Winter, "Creating a prediction
model of passenger preference between low cost and legacy airlines," Transportation
Research Interdisciplinary Perspectives, vol. 3, 2019.