In this project presentation, we explore the application of machine learning techniques to detect and predict crime hotspots. By analyzing historical crime data, we aim to identify patterns and trends that can help law enforcement agencies allocate resources more efficiently and proactively address crime-prone areas. Key components of the project include data preprocessing, feature engineering, model selection, and evaluation. The presentation also covers visualization tools that highlight crime hotspots on a map, making the findings easy for stakeholders to interpret. This project demonstrates the potential of data science to enhance public safety and support informed decision-making in crime prevention efforts. For more information, visit: https://bostoninstituteofanalytics.org/cyber-security-and-ethical-hacking/
Added: Jun 28, 2024
Slides: 13 pages
Slide Content
CRIME HOTSPOT DETECTION
Presented by Aaron Lewis J
TABLE OF CONTENTS
01 PROBLEM IDENTIFICATION
02 APPROACH TO SOLVE THE PROBLEM
03 DATASET AND LIBRARIES
04 EDA (EXPLORATORY DATA ANALYSIS)
05 TRAIN TEST SPLIT
06 BUILDING CLASSIFICATION MODEL
07 RESULTS AND CONCLUSION
PROBLEM IDENTIFICATION
The goal is to develop a machine learning model that accurately identifies crime hotspots within Los Angeles. Using LAPD crime data, the model will provide actionable insights that support targeted security recommendations for clients operating in high-risk areas. The project also aims to strengthen community safety initiatives, foster collaboration with local law enforcement, and reinforce a commitment to data-driven public safety solutions.
APPROACH TO SOLVE THE PROBLEM
1. Collect and preprocess data
2. Model selection
3. Modeling
4. Evaluation
5. Comparison and conclusion
Dataset and Libraries
Dataset columns:
DATE OCC: Date of occurrence, in MM/DD/YYYY format.
TIME OCC: Time of occurrence, in 24-hour military time.
AREA: The LAPD has 21 Community Police Stations, referred to within the department as Geographic Areas. These Geographic Areas are numbered sequentially from 1 to 21.
AREA NAME: Each of the 21 Geographic Areas (Patrol Divisions) also has a name that references a landmark or the surrounding community it is responsible for. For example, 77th Street Division is located at the intersection of South Broadway and 77th Street, serving neighborhoods in South Los Angeles.
Rpt Dist No: A four-digit code that represents a sub-area within a Geographic Area.
Part 1-2: Indicates the type of crime.
Crm Cd: Indicates the crime committed (same as Crime Code 1).
Vict Age: Victim age, a two-character numeric field.
Vict Sex: F - Female, M - Male, X - Unknown.
Vict Descent: Descent code: A - Other Asian, B - Black, C - Chinese, D - Cambodian, F - Filipino, G - Guamanian, H - Hispanic/Latin/Mexican, I - American Indian/Alaskan Native, J - Japanese, K - Korean, L - Laotian, O - Other, P - Pacific Islander, S - Samoan, U - Hawaiian, V - Vietnamese, W - White, X - Unknown, Z - Asian Indian.
Premis Cd: The type of structure, vehicle, or location where the crime took place.
Premis Desc: Description of the Premise Code.
Weapon Used Cd: The type of weapon used in the crime.
Weapon Desc: Description of the Weapon Used Code.
LOCATION: Street address of the crime incident, rounded to the nearest hundred block to maintain anonymity.
LAT: Latitude.
LON: Longitude.
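The DATE OCC and TIME OCC formats described above need parsing before they can serve as model features. The sketch below is illustrative only: it uses three hypothetical rows, assumes DATE OCC is a MM/DD/YYYY string as stated on the slide, and assumes pandas reads TIME OCC as an integer (so 0045 arrives as 45). The derived column names (year, month, day_of_week, hour, minute) are hypothetical, not taken from the project.

```python
import pandas as pd

# Hypothetical sample rows; column names follow the dataset description above.
df = pd.DataFrame({
    "DATE OCC": ["01/08/2020", "06/15/2021", "12/31/2022"],
    # 24-hour military time read as integers, so leading zeros are lost (0045 -> 45).
    "TIME OCC": [2230, 45, 1200],
})

# Parse the MM/DD/YYYY string into a datetime column.
df["DATE OCC"] = pd.to_datetime(df["DATE OCC"], format="%m/%d/%Y")

# Derive calendar features a hotspot model might use (hypothetical feature names).
df["year"] = df["DATE OCC"].dt.year
df["month"] = df["DATE OCC"].dt.month
df["day_of_week"] = df["DATE OCC"].dt.dayofweek  # Monday=0 ... Sunday=6

# Split HHMM military time into hour and minute.
df["hour"] = df["TIME OCC"] // 100
df["minute"] = df["TIME OCC"] % 100

print(df[["year", "month", "day_of_week", "hour", "minute"]])
```

Integer division by 100 separates hours from minutes in HHMM values, which works whether or not leading zeros survived the CSV import.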
Libraries:
Pandas: Used to load and process the data, which was provided in CSV format.
Matplotlib and Seaborn: Used for data visualization, creating various types of charts and plots.
Scikit-learn (also known as sklearn): A Python library for implementing machine learning models and statistical modelling.
EXPLORATORY DATA ANALYSIS
FUNCTION: OPERATION
df = pd.read_csv(""): Import the dataset into a DataFrame stored in the variable df (pd refers to pandas).
df.head(), df.tail(): Display the first 5 rows and the last 5 rows.
df.shape: The dimensions of the DataFrame as a (rows, columns) tuple; it is an attribute, so it is written without parentheses.
df.info(): Display column names, data types, non-null counts, and memory usage.
df.describe(): Summary statistics such as count, mean, standard deviation, minimum, quartiles (the 50% value is the median), and maximum.
df.isnull().sum(): Count the missing/null values in each column.
df.duplicated().sum(): Count the duplicate rows.
LabelEncoder(): Convert categorical values to numerical labels.
StandardScaler(): Standardize numerical features to zero mean and unit variance.
sns.histplot(): Display the distribution of a continuous variable.
sns.boxplot(): Identify outliers.
sns.countplot(): Show the number of records in each category.
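The EDA functions listed above can be chained as in the following sketch. It is a minimal illustration, not the project's notebook: a small hypothetical DataFrame stands in for pd.read_csv, and filling missing Vict Sex values with X (the dataset's Unknown code) is an assumed cleaning choice. The seaborn plots are omitted so the sketch runs without a display.

```python
import numpy as np
import pandas as pd
from sklearn.preprocessing import LabelEncoder, StandardScaler

# Hypothetical stand-in for df = pd.read_csv(...); rows 0, 2 and 4 are identical on purpose.
df = pd.DataFrame({
    "AREA NAME": ["Central", "77th Street", "Central", "Hollywood", "Central"],
    "Vict Sex": ["M", "F", "M", np.nan, "M"],
    "Vict Age": [34, 27, 34, 45, 34],
    "LAT": [34.0444, 33.9706, 34.0444, 34.1016, 34.0444],
    "LON": [-118.2628, -118.2784, -118.2628, -118.3267, -118.2628],
})

print(df.head())                   # first 5 rows
print(df.shape)                    # (rows, columns); an attribute, not a method
df.info()                          # dtypes, non-null counts, memory usage
print(df.describe())               # count, mean, std, min, quartiles, max
n_missing = df.isnull().sum()      # missing values per column
n_dupes = df.duplicated().sum()    # fully duplicated rows
print(n_missing, n_dupes, sep="\n")

# Assumed cleaning: drop duplicate rows, mark missing sex as X (Unknown).
df = df.drop_duplicates().fillna({"Vict Sex": "X"})

# LabelEncoder maps each category to an integer, in sorted order of the categories.
for col in ["AREA NAME", "Vict Sex"]:
    df[col] = LabelEncoder().fit_transform(df[col])

# StandardScaler rescales each numeric column to mean 0 and standard deviation 1.
num_cols = ["Vict Age", "LAT", "LON"]
df[num_cols] = StandardScaler().fit_transform(df[num_cols])
print(df)
```

scikit-learn documents LabelEncoder as intended for target labels; OrdinalEncoder is its counterpart for encoding several feature columns at once.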
Building Classification Model
We trained three algorithms and compared their accuracy on our features:
Decision Tree: A decision tree is a machine learning model used for classification and regression. It splits the data into branches based on feature values, creating a tree-like structure. It is easy to interpret but can overfit, so techniques such as pruning are used to limit its complexity.
Naïve Bayes: Naïve Bayes is a probabilistic machine learning model used for classification. It applies Bayes' theorem under the assumption that features are independent given the class. It is efficient, particularly on large datasets, and performs well on text data such as spam detection despite its naive assumption.
KNN: The K-Nearest Neighbors (KNN) algorithm is a supervised machine learning algorithm that uses a distance-based approach: it classifies a data point by the majority class among its k nearest training points.
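For reference, the standard textbook decision rules behind the three models can be written as follows. These are general formulations, not equations from the slides; Gini impurity is shown because it is the default split criterion of scikit-learn's DecisionTreeClassifier, though the project's settings are not stated.

```latex
% Decision tree: Gini impurity of a node with class proportions p_c
G = 1 - \sum_{c} p_c^{2}

% Naive Bayes: Bayes' theorem with P(x) dropped (it is the same for every class)
% and P(x | c) factorised under conditional independence of the n features
\hat{y} = \arg\max_{c} \; P(c) \prod_{i=1}^{n} P(x_i \mid c)

% KNN: Euclidean distance, then a majority vote over the k nearest neighbours N_k(x)
d(\mathbf{x}, \mathbf{z}) = \sqrt{\sum_{i=1}^{n} (x_i - z_i)^{2}}
\qquad
\hat{y} = \arg\max_{c} \sum_{j \in N_k(\mathbf{x})} \mathbb{1}[\, y_j = c \,]
```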
Decision Tree
[Figures: importing the algorithm and training the model; classification report]
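The Decision Tree slide's code is not recoverable from the extracted text, so the following is only a sketch of the same steps (import, train, classification report) under stated assumptions: make_classification generates hypothetical stand-in data because the slides do not name the target column, and max_depth=5 is an arbitrary illustrative value for pre-pruning. Comparing train and test accuracy shows the overfitting risk described earlier.

```python
from sklearn.datasets import make_classification
from sklearn.metrics import accuracy_score, classification_report
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeClassifier

# Hypothetical synthetic stand-in for the encoded, scaled crime features.
X, y = make_classification(n_samples=1000, n_features=8, n_informative=5, random_state=42)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, random_state=42, stratify=y
)

# An unconstrained tree splits until every leaf is pure, memorising the training set.
full_tree = DecisionTreeClassifier(random_state=42).fit(X_train, y_train)

# Capping depth is a simple form of pre-pruning (max_depth=5 is an illustrative value).
pruned_tree = DecisionTreeClassifier(max_depth=5, random_state=42).fit(X_train, y_train)

for name, model in [("full", full_tree), ("pruned", pruned_tree)]:
    train_acc = accuracy_score(y_train, model.predict(X_train))
    test_acc = accuracy_score(y_test, model.predict(X_test))
    print(f"{name}: train={train_acc:.2f} test={test_acc:.2f}")

report = classification_report(y_test, pruned_tree.predict(X_test))
print(report)
```

An unconstrained tree typically reaches a training accuracy of 1.0; scikit-learn also offers cost-complexity post-pruning through the ccp_alpha parameter.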
Naïve Bayes
[Figures: importing the algorithm and training the model; classification report]
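A comparable sketch for Naïve Bayes, again on hypothetical synthetic data rather than the LAPD dataset. The slides do not say which Naïve Bayes variant was used; GaussianNB is assumed here because the scaled features are continuous.

```python
from sklearn.datasets import make_classification
from sklearn.metrics import classification_report
from sklearn.model_selection import train_test_split
from sklearn.naive_bayes import GaussianNB

# Hypothetical synthetic stand-in for the project's features and target.
X, y = make_classification(n_samples=1000, n_features=8, n_informative=5, random_state=42)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, random_state=42, stratify=y
)

# GaussianNB models each feature as normally distributed within each class,
# independently of the other features (the "naive" assumption).
nb = GaussianNB().fit(X_train, y_train)

# predict_proba returns P(class | x) per row; predict picks the most probable class.
proba = nb.predict_proba(X_test)
y_pred = nb.predict(X_test)
print(classification_report(y_test, y_pred))
```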
KNN
[Figures: importing the algorithm and training the model; classification report]
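A KNN sketch under the same assumptions (hypothetical synthetic data, target column unknown). Because KNN relies on distances, StandardScaler is placed in a pipeline so the scaler is fitted on the training split only; n_neighbors=5 is scikit-learn's default, not a value reported in the slides.

```python
from sklearn.datasets import make_classification
from sklearn.metrics import classification_report
from sklearn.model_selection import train_test_split
from sklearn.neighbors import KNeighborsClassifier
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler

# Hypothetical synthetic stand-in for the project's features and target.
X, y = make_classification(n_samples=1000, n_features=8, n_informative=5, random_state=42)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, random_state=42, stratify=y
)

# Scale first so no single feature dominates the distance, then vote among 5 neighbours.
knn = make_pipeline(StandardScaler(), KNeighborsClassifier(n_neighbors=5))
knn.fit(X_train, y_train)

y_pred = knn.predict(X_test)
print(classification_report(y_test, y_pred))
```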
Results and Conclusion
MODEL            ACCURACY   MODEL FIT
Decision Tree    1          Overfit
Naïve Bayes      0.71       Partially overfit
KNN              0.86       Good fit
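The slides do not show how each model's fit was judged. One common check, sketched below on hypothetical synthetic data, is to compare training and validation accuracy with scikit-learn's cross_validate(return_train_score=True): a large gap suggests overfitting, while similar scores suggest a good fit. The numbers it prints will not reproduce the table above.

```python
from sklearn.datasets import make_classification
from sklearn.model_selection import cross_validate
from sklearn.naive_bayes import GaussianNB
from sklearn.neighbors import KNeighborsClassifier
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.tree import DecisionTreeClassifier

# Hypothetical synthetic stand-in for the project's features and target.
X, y = make_classification(n_samples=1000, n_features=8, n_informative=5, random_state=42)

models = {
    "Decision Tree": DecisionTreeClassifier(random_state=42),
    "Naive Bayes": GaussianNB(),
    "KNN": make_pipeline(StandardScaler(), KNeighborsClassifier()),
}

results = {}
for name, model in models.items():
    # 5-fold cross-validation, also scoring each model on its own training folds.
    scores = cross_validate(model, X, y, cv=5, scoring="accuracy", return_train_score=True)
    train_acc = scores["train_score"].mean()
    val_acc = scores["test_score"].mean()
    results[name] = (train_acc, val_acc)
    # A large train/validation gap points to overfitting.
    print(f"{name}: train={train_acc:.2f} validation={val_acc:.2f} gap={train_acc - val_acc:.2f}")
```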