Data Representation of Strings

profansari 2,494 views 3 slides May 09, 2017
Slide 1
Slide 1 of 3
Slide 1
1
Slide 2
2
Slide 3
3

About This Presentation

Character String Operations
A close analysis of the essential string-handling facilities required of any text creation and editing system (formal or otherwise) should lead to the following list of primitive functions:
 Create a string of test
Concatenate two strings to form another string
�...


Slide Content

For More Visit: Https://www.ThesisScientist.com


Unit 7
Text and String Processing
Data Representation of Strings
In deciding on a data representation for a given data object one must take into consideration the cost of
performing different operations using that representation. In addition, a hidden cost resulting from the
necessary storage management operations must also be taken into account.
Strings are stored in three types of structures:
1) Fixed length structure
2) Variable length structure
3) Linked structure
Let us look at the first two of these schemes:
Sequential Fixed length Structure
In this representation successive characters of a string will be placed in consecutive character positions. The
string S = 'x1---xn' could then be represented as in Figure 7.1 with s as a pointer to the first character. x
1
x
2
----------x
n
C
i
C
i+1
C
i+n-1
array
x
1
x
2
----------x
n
x
1
x
2
----------x
n
C
i
C
i+1
C
i+n-1
array

Figure 7.1: Sequential Representation of S = 'x1----xn'
Now, if we want to pick a substring of size k from the string of size n, the time required to achieve this
would be O(k) plus the time needed to locate a free space big enough to hold the string.
Linked List Fixed Size Nodes
The available memory is divided into nodes of fixed size. Each node has two fields: Data and Link. The
size of a node is number of characters that can be stored in the DATA fields.

For More Visit: Https://www.ThesisScientist.com

1 2 3 4 5 6 7 8 9 10 11 12
data link data linked
Node 1 Node 1
1 2 3 4 5 6 7 8 9 10 11 12
data link data linked
Node 1 Node 1

Figure 7.2

In the above figure memory is divided into nodes of size 4 with a link field that is two characters long.
Deletion of a substring can be carried out by replacing all characters in this substring by 0 and freeing
nodes in which the data fields consist of only 0's.
Storage compaction can be carried out when there are no free nodes. String representation with variable
size is similar.
Character String Operations
A close analysis of the essential string-handling facilities required of any text creation and editing system
(formal or otherwise) should lead to the following list of primitive functions:
 Create a string of test
 Concatenate two strings to form another string
 Search and replace (if desired) a given substring within a string
 Test for the identity of a string
 Compute the length of a string
Pattern Matching
When one is searching for a substring within a given string there must be some method of returning the
position of the substring within the string. If the substring is found, this position is indicated by an integer
value indicating the character position of the left-most character of the substring being sought. Function
FIND (s, PAT, i) returns i as a value the cursor position of the left-most occurrence of the string PAT, in the
string s, if PAT does not occur in s, the value 0 is returned.
Given two string S and PAT, the value of PAT as a pattern to be searched for in s. If it occurs, then we want
to know the node in s where PAT begins.
Analysis of Algorithm
The algorithm is not very efficient.
Let S = 'aaa ... a'; PAT = 'aaa...ab'
where length (S) = m, and length (PAT) = n.

For More Visit: Https://www.ThesisScientist.com


If m is much larger than n, then the first n-1 letters of PAT will match with 'a's in string S but the nth letter
of PAT will not. The pointer p will be moved to the second occurrence of 'a' in S and n-1 a's of PAT will
match with S again.
Proceeding in this way we see there will be m-n+1 times that S and PAT have n-1 'a's in common.
Therefore the algorithm Find will require at least (m-n+1) (n-1) = O(mn) operations. This makes the cost of
Find proportional to the product of the length of the two lists or quadratic rather than linear.
Improvements
There are several improvements that can be made
i) To avoid the situation where length (PAT) is greater than the remaining length of S but the
algorithm is still searching for a match.
ii) To check that the first and last characters of PAT matched in S before checking the remaining
characters of PAT.