design and analysis of algorithm (Longest common subsequence)

RoneekPatel 22 views 14 slides Sep 06, 2024
Slide 1
Slide 1 of 14
Slide 1
1
Slide 2
2
Slide 3
3
Slide 4
4
Slide 5
5
Slide 6
6
Slide 7
7
Slide 8
8
Slide 9
9
Slide 10
10
Slide 11
11
Slide 12
12
Slide 13
13
Slide 14
14

About This Presentation

To design an algorithm to determine the longest subsequence that is common to two given sequences. Given two sequences, sequence A and sequence B, the task is to identify the longest sequence of elements that appear in the same relative order in both A and B. The elements of the sequences may be cha...


Slide Content

LONGEST COMMON SUBSEQUENCE Presented by 1.Himanshu Sharma(RA2211031010146) 2.Pratyush Golwala (RA2211031010155) 3.Roneek Patel(RA2211031010154)

Problem Statement To design an algorithm to determine the longest subsequence that is common to two given sequences. Given two sequences, sequence A and sequence B, the task is to identify the longest sequence of elements that appear in the same relative order in both A and B. The elements of the sequences may be characters, numbers, or any comparable entities. The algorithm should efficiently compute the length of the longest common subsequence and optionally provide the actual subsequence itself. The output of the algorithm should enable users to understand the degree of similarity and the shared elements between the two sequences. Ensure that the algorithm is scalable and can handle sequences of varying lengths and contents.

Introduction The problem of finding the longest subsequence that is common to two given sequences, known as the Longest Common Subsequence (LCS), is a fundamental challenge in computer science with wide-ranging applications. 01. 02. 03. Versatility Across Domains: LCS ( Longest Common Subsequence) algorithms are versatile tools with applications in diverse fields, including bioinformatics, text comparison, plagiarism detection, and version control systems. Their ability to identify the maximum-length sequence of elements in the same order within two sequences, even if not consecutive, makes them valuable for solving various problems across different domains. Bioinformatics Applications: In bioinformatics, LCS algorithms play a crucial role in the analysis of biological sequences, such as DNA, RNA, and protein sequences. They are extensively used for tasks like sequence alignment, genome assembly, and evolutionary analysis. By identifying common subsequences and differences between genetic sequences, LCS algorithms contribute to understanding genetic structures and functions, aiding researchers in deciphering the complexities of biological data. Contributions to Knowledge and Innovation: The use of LCS algorithms in different domains contributes to advancements in knowledge and innovation. In bioinformatics, these algorithms help uncover relationships between biological sequences, leading to insights into genetic evolution and function. Similarly, in text comparison, plagiarism detection, and version control systems, LCS algorithms enhance the efficiency of content analysis and version tracking, fostering innovation in information management and software development processes.

Objective: Use r-Friendly Interface: Develop an intuitive and user-friendly interface for inputting sequences, configuring algorithm parameters, and visualizing the results. Ensure accessibility for users with varying levels of technical expertise, facilitating seamless integration into research workflows, educational environments, or practical applications. Alg orithm Development: Design and implement an efficient LCS algorithm capable of identifying the longest common subsequence between two given sequences, considering the non-consecutive nature of elements. Optimize the algorithm for performance, ensuring scalability to handle large datasets, and evaluate its time and space complexity. Cross-D omain Application: Extend the LCS algorithm's utility beyond bioinformatics by incorporating features for text comparison, plagiarism detection, or version control systems. Showcase the adaptability of the algorithm in different contexts, demonstrating its effectiveness in addressing problems related to content analysis, intellectual property protection, and software version tracking. Perf ormance Evaluation: Conduct rigorous testing and performance evaluation of the LCS algorithm in diverse scenarios, considering varying sequence lengths, types of data, and real-world use cases. Provide comprehensive documentation and benchmarks to showcase the algorithm's robustness, efficiency, and versatility across different application domains.

Comparison Of different Algorithms: ●The brute force algorithm is the most basic method for solving the LCS problem. ●It involves generating all possible subsequences of the two input sequences and checking if they are common to both sequences. ●The brute force algorithm has a time complexity of O(2^n) and a space complexity of O(n), where n is the length of the longer sequence. ●Although the brute force algorithm is simple to implement, it is not efficient for large input sizes. 1.Brute Force Algorithm

2.Divide and Conquer: ●The divide and conquer algorithm for LCS involves dividing the input sequences into smaller subproblems, solving these subproblems recursively, and then combining the solutions to obtain the LCS of the original sequences. ●The divide and conquer algorithm has a time complexity of O(n^2logn) and a space complexity of O(n^2). ●Although the divide and conquer algorithm is not as efficient as the dynamic programming algorithm, it is useful for solving other problems that involve finding common subsequences

3.Dynamic Programming ●The dynamic programming algorithm for LCS is a more efficient method for solving the problem. ● It involves breaking down the input sequences into smaller subproblems and storing the solutions to these subproblems in a table. ● The dynamic programming algorithm has a time complexity of O(mn), where m and n are the lengths of the input sequences, and a space complexity of O(mn). ● The dynamic programming algorithm is widely used for solving the LCS problem due to its efficiency.

Pseudocode Dynamic Programming: def Ics(string1, string2): if len (string1) == 0 or len(string2) == 0: return "" elif string1[-1] == string2[-1]: return Ics(string1[:-1], string2[:-1]) + string1[-1] else: Ics1 = Ics(string1[:-1], string2) Ics2 = Ics(string1, string2[:-1]) if len (Ics1) > len(Ics2): return Ics1 else: return Ics2 Divide and Conquer: def Ics (string1, string2): m = len (string1) n = len (string2) # initialize table with zeros table = [[0]*(n+1) for i in range(m+1)] for i in range(1, m+1): for j in range(1, n+1): if string1[i-1] == string2[j-1]: table[ i ][j] = table[i-1][j-1]+1 else: table[ i ][j] = max(table[i-1][j], table[ i ][j-1]) #backtrack to find LCS Ics = "" while i > 0 and j > 0: i = m j = n if string1 [i-1] == string2[j-1]: Ics = string1 [i-1] + Ics i -=1 elif table[i-1][j] > table[ i ][j-1]: i-1 else: j-=1 j-=1 return Ics Brute Force: def Ics(string1, string2): if len(string1) == 0 or len(string2) == 0: return "" elif string1[-1] == string2[-1]: return Ics(string1[:-1], string2[:-1]) + string1[-1] else:

Source Code: Brute Force Algorithm:

Source Code: Dynamic Programming:

Source Code: Divide and Conquer:

Time Complexity There are two choices for each character in a string: it can either be included in the subsequence or not. This means there are 2^m possible subsequences of X and 2^n possible subsequences of Y. So in this recursive approach, we are comparing each subsequence of X with each subsequence of Y. So the overall time complexity is O(2^m * 2^n) = O(2^(m + n)), which is inefficient for large values of m and n. The space complexity of this solution depends on the size of the call stack, which is equal to the height of the recursion tree. In this case, input parameters (m and n) are at most decreasing by 1 on each recursive call and terminate when either m or n becomes 0. The height of the recursion tree in the worst case will be O(max(m, n) and space complexity = O(max(m, n)). Space Complexity Brute Force Algorithm

Time Complexity In the worst case, we will be solving each subproblem only once, and there are (m + 1) x (n + 1) different subproblems. So, the time complexity is O(mn). We are using a 2D table of size (m + 1) x (n + 1) to store the solutions of the subproblems. So the space complexity is O(mn). Space Complexity Dynamic Programming

Conclusion: 1.We discussed three different algorithms for solving the LCS problem: brute force, dynamic programming, and divide and conquer. 2.We compared the time and space complexity of these algorithms and discussed their pros and cons. 3. We also highlighted the numerous applications of LCS in computer science and industry. 4.LCS is a fundamental problem in computer science, and efficient algorithms for solving it are crucial for many applications.