A Tool for Detecting Source Code Plagiarism: SourcePlag

ntu727 15 views 9 slides Mar 02, 2025

About This Presentation

Paper Accepted At ICBDS-2024.


Slide Content

SourcePlag: A Source Code Plagiarism Detector
2024 IEEE International Conference on Blockchain and Distributed System (IEEE ICBDS 2024)
Nakul Sharma, Siddharth Shinde, Swarup Bhosale, Suyog Patil
Vishwakarma Institute of Information Technology

Abstract
Code plagiarism poses a significant challenge in programming communities and calls for effective detection mechanisms. This paper introduces a novel system that uses Abstract Syntax Trees (ASTs) for code representation and comparison. ASTs capture the structure of code rather than its surface text, which supports a thorough analysis of code similarity and lets the approach extend across multiple programming languages. The system applies the Levenshtein Distance algorithm to compare Python code and uses node counting for other languages such as Java and C/C++. By combining AST-based representation with these comparison techniques, the system aims to identify plagiarized code accurately across programming environments. The paper describes the system's methodology in detail and discusses its potential to address code plagiarism in programming communities.

Introduction
Importance of Academic Integrity: Ensuring originality in software is crucial in both academic and professional environments.
Limitations of Traditional Methods: Basic textual comparisons are often insufficient for detecting source code plagiarism, especially when the code has been modified by renaming variables or altering formatting.
Role of Abstract Syntax Trees (ASTs): ASTs provide a structural representation of the code, capturing its logical structure beyond mere text.
Plagiarism Detection Using ASTs: AST-based analysis helps detect plagiarism attempts even when obfuscation techniques are used, which makes it more reliable than text matching.
Objective: This paper presents a method to detect source code plagiarism by leveraging ASTs, offering a more effective solution than text-based methods.

Related Work
Winnowing Algorithm Based Models: The approach segments the input into N-grams, hashes each N-gram, and selects the minimum hash in each window as a fingerprint that identifies that part of the document. Tools such as MOSS build on this fingerprinting, and related techniques use cosine similarity and other measures to improve detection accuracy.
Abstract Syntax Tree Based Models: AST-based models for plagiarism detection include DECKARD, which uses Euclidean distance and locality-sensitive hashing (LSH) for efficient code comparison, and Greenan's AST-based exact matching with the Smith-Waterman algorithm. Chilowicz's tool combines hashing and ASTs, using cryptographic hash functions for subtree matching. CodEx uses ASTs and hashing to measure node contributions with a weight-based depth-first search and generates similarity scores. These models improve the efficiency of plagiarism detection.
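The winnowing idea described above can be illustrated with a minimal sketch. This is not the implementation used by MOSS or by SourcePlag; the function names, the CRC32 hash, the k-gram length `k=5`, the window size `window=4`, and the Jaccard comparison are all assumptions chosen for illustration.

```python
import zlib


def normalize(text: str) -> str:
    """Drop all whitespace and lowercase the text before fingerprinting."""
    return "".join(text.split()).lower()


def kgram_hashes(text: str, k: int) -> list[int]:
    """Hash every k-character substring (k-gram) of the text with CRC32."""
    return [
        zlib.crc32(text[i:i + k].encode("utf-8"))
        for i in range(len(text) - k + 1)
    ]


def winnow(text: str, k: int = 5, window: int = 4) -> set[int]:
    """Keep the minimum hash of each sliding window as a fingerprint."""
    hashes = kgram_hashes(normalize(text), k)
    if not hashes:
        return set()
    if len(hashes) <= window:
        return {min(hashes)}
    return {
        min(hashes[i:i + window])
        for i in range(len(hashes) - window + 1)
    }


def jaccard(a: set[int], b: set[int]) -> float:
    """Share of fingerprints the two documents have in common."""
    if not a and not b:
        return 1.0
    return len(a & b) / len(a | b)
```

Two files can then be compared with `jaccard(winnow(file_a), winnow(file_b))`. Because whitespace is stripped before hashing, reformatting alone does not change the fingerprints, but renaming identifiers does, which is one reason AST-based models are considered next.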

Methodology Overview for Source Code Plagiarism Detection
Input: The system accepts source code files written in Python, Java, or C++.
Preprocessing: Comments and unnecessary whitespace are removed from the source code to standardize the input across languages.
AST Generation: The Abstract Syntax Tree (AST) is generated from the preprocessed code to convert it into a structured form for further analysis.
Similarity Analysis: The system takes one of two approaches, depending on the programming language. For Python, it applies the Levenshtein Distance algorithm to calculate the similarity between code sequences. For Java/C++, it uses Node Counting within the AST to assess structural similarity.
Similarity Score & Report Generation: The results from the Python and Java/C++ analyses are combined to generate a final similarity score and plagiarism report.
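The Python branch of the steps above might look roughly like the following sketch. The slides do not say which sequence the Levenshtein Distance is computed over, so this example assumes it is the list of AST node type names. The function names and the normalization of the distance into a score between 0 and 1 are also assumptions. Python's built-in `ast` parser already ignores comments and formatting, so it stands in for the preprocessing step here.

```python
import ast


def node_sequence(source: str) -> list[str]:
    """Parse Python source and list the AST node type names it contains."""
    tree = ast.parse(source)  # comments and layout are discarded by the parser
    return [type(node).__name__ for node in ast.walk(tree)]


def levenshtein(a: list[str], b: list[str]) -> int:
    """Edit distance between two sequences, computed row by row."""
    previous = list(range(len(b) + 1))
    for i, item_a in enumerate(a, start=1):
        current = [i]
        for j, item_b in enumerate(b, start=1):
            cost = 0 if item_a == item_b else 1
            current.append(min(
                previous[j] + 1,         # delete item_a
                current[j - 1] + 1,      # insert item_b
                previous[j - 1] + cost,  # substitute (free if equal)
            ))
        previous = current
    return previous[-1]


def similarity(src_a: str, src_b: str) -> float:
    """Assumed scoring: 1.0 means identical node sequences."""
    seq_a, seq_b = node_sequence(src_a), node_sequence(src_b)
    longest = max(len(seq_a), len(seq_b))
    return 1.0 - levenshtein(seq_a, seq_b) / longest
```

Under these assumptions, `similarity("def add(a, b):\n    return a + b\n", "def plus(x, y):\n    return x + y\n")` evaluates to 1.0, because identifier names are attributes of AST nodes rather than nodes themselves, so renaming leaves the node type sequence unchanged.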

Results and Discussion
Experiments & Analysis: We tested the system on a dataset of source code files in Python, Java, and C++. The dataset included pairs of source code samples with known levels of similarity, ranging from identical copies to functionally similar but structurally different code. We evaluated the system's effectiveness using Levenshtein Distance for Python and Node Counting for Java/C++.

Discussion: Interpretation of Results
Levenshtein Distance for Python proved effective in detecting plagiarized code even with minor changes such as variable renaming or formatting differences.
Node Counting for Java/C++ was particularly robust in identifying structural similarities, such as rearranged functions or classes, which makes it a good fit for these languages.
Overall, the system performed well across different programming languages and code structures, demonstrating its potential to detect source code plagiarism accurately in varied scenarios.
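A small sketch can show why node counting tolerates rearranged functions. The system applies node counting to Java/C++ ASTs, but the Python standard library cannot parse those languages, so this example uses Python's `ast` module as a stand-in. The function names and the ratio used as a score (shared node counts over combined node counts) are assumptions, not the paper's formula.

```python
import ast
from collections import Counter


def node_counts(source: str) -> Counter:
    """Count how many times each AST node type occurs in the source."""
    tree = ast.parse(source)
    return Counter(type(node).__name__ for node in ast.walk(tree))


def count_similarity(src_a: str, src_b: str) -> float:
    """Assumed score: overlap of the two node-count multisets, from 0 to 1."""
    counts_a, counts_b = node_counts(src_a), node_counts(src_b)
    shared = sum((counts_a & counts_b).values())    # per-type minimum
    combined = sum((counts_a | counts_b).values())  # per-type maximum
    return shared / combined


first = "def f(x):\n    return x + 1\n\ndef g(y):\n    return y * 2\n"
swapped = "def g(y):\n    return y * 2\n\ndef f(x):\n    return x + 1\n"
print(count_similarity(first, swapped))
```

Swapping the order of `f` and `g` changes where nodes appear but not how many of each type exist, so the two count tables are identical. The trade-off is that unrelated programs with similar node totals can also score highly, so counts alone are a coarse signal.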

Conclusion and Future Scope
Conclusion: An online assignment plagiarism checker is a crucial resource for preserving the integrity of education. It empowers both educators and students to uphold the values of originality and honesty. By discouraging plagiarism, it fosters deeper engagement with learning and ensures that academic assessments truly reflect students' knowledge and skills. The tool supports the academic community and helps educational institutions maintain their reputation for excellence and ethical scholarship.
Future Scope: Future work is to improve and extend the source code plagiarism detector. In real-world scenarios, software projects often involve multiple languages or components written in different languages. By supporting multiple languages, the tool can accommodate modern software development practices and provide practical plagiarism detection for heterogeneous codebases.
