04_KM_Bio_State Space Search_about DNA and rna.pdf

Uploaded by someyamohsen2 · 38 slides · May 27, 2024

About This Presentation

This presentation is about kernel methods in bioinformatics.


Slide Content

2021
Collected and Edited by:
Dr. Mina Younan
Lecturer of Computer Science,
Faculty of Computers and Information, Minia University
Kernel Methods in
Bioinformatics (CS711)
Lecture 04: State Space Search

Outline of this lecture
•Introduction
•Structures for State Space Search
•Graph Theory Basics
•Data-driven and Goal-driven Search
oBacktrack
oBreadth-First Search Algorithm
oDepth-First Search Algorithm
•Local Search Algorithms (Travelling Salesman Problem)
•A Brief Summary of Search Algorithms

Introduction
Search Terminology
•Problem Space − The environment in which the search takes place (a set of states and a set of operators to change those states).
•Problem Instance − Initial state + goal state.
•Problem Space Graph − Represents the problem space. States are shown by nodes and operators are shown by edges.
•Depth of a problem − Length of the shortest path, or shortest sequence of operators, from the initial state to the goal state.
•Space Complexity − The maximum number of nodes that are stored in memory.
•Time Complexity − The maximum number of nodes that are created.
•Admissibility − The property of an algorithm that always finds an optimal solution.
•Branching Factor − The average number of child nodes in the problem space graph.

Introduction
Brute-Force Search Strategies
•These are the simplest strategies, as they do not require any domain-specific knowledge. They work well when the number of possible states is small.
•Requirements:
—State description
—A set of valid operators
—Initial state
—Goal state description

Introduction
•Well-formed predicate calculus expressions provide a means of
describing objects and relations in a problem domain, and
inference rules such as modus ponens allow us to infer new
knowledge from these descriptions.
•These inferences define a space that is searched to find a
problem solution.
•By representing a problem as a state space graph, we can use
graph theory to analyze the structure and complexity of both the
problem and the search procedures that we employ to solve it.
•This lecture introduces the theory of state space search.

Structures for State Space Search
Graph Theory Basics:
A graph G(N, A) consists of two sets:
N: a set of nodes N1, N2, …, Nn, …, which need not be finite;
A: a set of arcs that connect pairs of nodes.
The directed arc connecting nodes N3 and N4 is represented by the ordered pair (N3, N4).
The undirected arc connecting nodes N3 and N4 is represented by the two ordered pairs (N3, N4) and (N4, N3).
Terms used to describe relationships between nodes include parent, child, and sibling. These are used in the usual familial fashion, with the parent preceding its child along a directed arc. Children of the same parent are called siblings.
A rooted graph has a unique node, called the root, from which all paths in the graph originate; that is, the root has no parent in the graph.
[Figure: a directed arc (N3, N4), an undirected arc between N3 and N4, and a parent node N3 with children N4 and N5, which are siblings.]

A tree is a graph with the following properties:
•There is a unique path between every pair of nodes.
•Each node has a unique parent.
•The paths contain no cycles.
The edges in a rooted tree are directed away from the root.
A rooted tree
- a is the root
- b is the parent of d, e, and f
- d, e, and f are children of b,
and siblings of each other
- a and c are ancestors of g
- g is a descendant of a and c.
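The rooted tree above can be encoded as an adjacency list, which makes the familial terms concrete. A minimal Python sketch (the dict encoding and helper names are illustrative, not from the lecture):

```python
# The rooted tree from the figure: each node maps to its list of children.
tree = {
    'a': ['b', 'c'],
    'b': ['d', 'e', 'f'],
    'c': ['g'],
    'd': [], 'e': [], 'f': [], 'g': [],
}

def parent(tree, node):
    """Return the unique parent of node, or None for the root."""
    for p, children in tree.items():
        if node in children:
            return p
    return None

def siblings(tree, node):
    """Nodes sharing node's parent, excluding node itself."""
    p = parent(tree, node)
    return [] if p is None else [c for c in tree[p] if c != node]

print(parent(tree, 'd'))    # b
print(siblings(tree, 'd'))  # ['e', 'f']
```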

Data-driven and Goal-driven Search
A state space may be searched in two directions:
•from the given data of a problem instance toward a goal, or
•from the goal back to the data.
In data-driven search, sometimes called forward chaining:
•The problem solver begins with the given facts of the problem and a set of
legal moves or rules for changing state.
•Search proceeds by applying rules to facts to produce new facts, which are
in turn used by the rules to generate more new facts.
•This process continues until it generates a path that satisfies the goal.
In goal-driven search, sometimes called backward chaining:
•The problem solver takes the goal, and finds the rules or legal moves that
could produce this goal, and determines the conditions that must be true to
use them.
•These conditions become the new goals, or sub-goals, for the search.
•Search continues, working backward through successive sub-goals until it
works back to the facts of the problem.

Data-driven and Goal-driven Search
•The preferred strategy is determined by careful analysis of the problem,
considering such issues as:
1.The complexity of the rules.
2.The branching factor of rule application (on average how many new
states are generated by rule application in both directions?), i.e. the shape
of the state space.
3.The availability of data.
4.Ease of determining the potential goals.
•In solving a problem using either goal- or data-driven search, a problem solver
must find a path from a start state to a goal through the state space graph.
•The sequence of arcs in this path corresponds to the ordered steps of the solution.
•The problem solver must consider different paths through the space until it finds
a goal.

The Backtrack Search Algorithm
Backtracking is a technique for systematically trying all paths through a state space: it begins at the start state and pursues a path until it reaches either a goal or a "dead end".
•If it finds a goal, it quits and returns the solution path.
•If it reaches a dead end, it "backtracks" to the most recent node on the path
having unexamined children and continues down one of these branches.
The algorithm continues until it finds
a goal or exhausts the state space.
The following figure shows the
backtrack algorithm applied to
a hypothetical state space.
The dashed arrows indicate the
progress of the search up and
down the space. The number beside each node indicates the order in
which it is visited.
[Figure: a hypothetical state space with root A; the number beside each node (1-10) indicates the order in which backtrack visits it.]

The Backtrack Search Algorithm
We now define an algorithm that performs a backtrack using three lists to keep
track of nodes in the state space:
•SL, for state list, lists the states in the current path being tried. If a goal is
found, SL contains the ordered list of states on the solution path.
•NSL, for new state list, contains nodes whose descendants have not yet been
generated and searched.
•DE, for dead ends, lists states whose descendants have failed to contain a
goal node. If these states are encountered again, they will be eliminated from
consideration again.
In addition, the algorithm uses the variable CS to hold the state currently under
consideration.

The Backtrack Search Algorithm
Function backtrack
begin
  SL = [Start]; NSL = [Start]; DE = [ ]; CS = Start;  % initialize
  while NSL ≠ [ ] do
  begin
    if CS = goal (or meets goal description) then
      return SL;  % on success, return list of states in path
    if CS has no children (excluding nodes already on DE, SL, and NSL) then
    begin  % backtrack
      while SL is not empty and CS = 1st element of SL do
      begin  % the state has been visited before
        add CS to DE;  % record state as dead end
        remove 1st element from SL;  % backtrack
        remove 1st element from NSL;
        CS = 1st element of NSL;
      end
      add CS to SL;
    end

The Backtrack Search Algorithm
    else  % CS has children
    begin
      generate and place children of CS (except nodes already on DE, SL, and NSL) on NSL;
      CS = 1st element of NSL;
      add CS to SL;
    end
  end;
  return FAIL;
end.
◼As presented here, backtrack implements data-driven search, taking the root as the start state and evaluating its children to search for the goal.
◼The algorithm can also be viewed as a goal-driven search by letting the goal be the root of the graph and evaluating descendants in an attempt to work back to a start state.
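The pseudocode above can be transcribed almost line for line into Python. A sketch, assuming the state space is given as a dict mapping each state to its list of children (an encoding the lecture leaves implicit):

```python
def backtrack(start, goal, children):
    """Backtrack search with the SL/NSL/DE bookkeeping from the slides."""
    SL = [start]    # states on the current path; SL[0] is the current state
    NSL = [start]   # states whose descendants have not yet been searched
    DE = []         # dead ends
    CS = start      # current state
    while NSL:
        if CS == goal:
            return SL  # solution path, goal first, start last
        # children of CS, excluding nodes already on DE, SL, or NSL
        new = [c for c in children.get(CS, [])
               if c not in DE and c not in SL and c not in NSL]
        if not new:  # dead end: backtrack
            while SL and CS == SL[0]:
                DE.append(CS)          # record state as dead end
                SL.pop(0)
                NSL.pop(0)
                CS = NSL[0] if NSL else None
            if CS is not None:
                SL.insert(0, CS)
        else:        # CS has unexamined children
            NSL = new + NSL
            CS = NSL[0]
            SL.insert(0, CS)
    return None

# The state space from the trace slides: A's children are B, C, D, etc.
graph = {'A': ['B', 'C', 'D'], 'B': ['E', 'F'], 'C': ['G'], 'D': [],
         'E': ['H', 'I'], 'F': ['J'], 'G': [], 'H': [], 'I': [], 'J': []}
print(backtrack('A', 'G', graph))  # ['G', 'C', 'A']
```

Running this on the example graph reproduces the trace table that follows, ending with the path [G C A].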


The Backtrack Search Algorithm
Initialize: SL = [A]; NSL = [A]; DE = [ ]; CS = A

Iteration #  CS  SL         NSL                DE
0            A   [A]        [A]                [ ]
1            B   [B A]      [B C D A]          [ ]
2            E   [E B A]    [E F B C D A]      [ ]
3            H   [H E B A]  [H I E F B C D A]  [ ]
             I   [E B A]    [I E F B C D A]    [H]
4            I   [I E B A]  [I E F B C D A]    [H]
             E   [E B A]    [E F B C D A]      [I H]
             F   [B A]      [F B C D A]        [E I H]
5            F   [F B A]    [F B C D A]        [E I H]
6            J   [J F B A]  [J F B C D A]      [E I H]
             F   [F B A]    [F B C D A]        [J E I H]
             B   [B A]      [B C D A]          [F J E I H]
             C   [A]        [C D A]            [B F J E I H]
7            C   [C A]      [C D A]            [B F J E I H]
8            G   [G C A]    [G C D A]          [B F J E I H]
9            G is the goal; Path = [G C A]

[Figure: the state space searched, with root a and children b, c, d; b's children e and f; e's children h and i; f's child j; c's child g.]

Depth-First and Breadth-First Search
In addition to specifying a search direction (data-driven or goal-driven), a search
algorithm must determine the order in which states are examined in the tree or
the graph, for example: depth-first and breadth-first search.
Depth-first search (DFS) goes deeper into the search space whenever this is
possible. Only when no further descendants of a state can be found are its
siblings considered.
•DFS examines the states in the previous graph in the order: A, B, E, H, I, F,
J, C, G, D.
•The backtrack algorithm implemented depth-first search.
Breadth-first search (BFS), in contrast, explores the space in a level-by-level
fashion. Only when there are no more states to be explored at a given level does
the algorithm move on to the next level.
•A BFS of the previous graph considers the states in the order: A, B, C,
D, E, F, G, H, I, J.

Depth-First and Breadth-First Search (continued)
BFS and DFS algorithms use two lists:
•open: lists states that have been generated but whose children
have not been examined. The order in which states are placed in
open determines the order of the search. It is implemented as a
queue in BFS and as a stack in DFS. (open is like NSL in
backtrack).
•closed: records states that have already been examined. (closed
is the union of the DE and SL lists of the backtrack algorithm).
The current state is stored in a variable X.
Child states are generated by inference rules, legal moves of a
game, or other state transition operators. Each iteration produces
all children of the state X and adds them to open.

Breadth-First Search Algorithm
Function BFS
begin
  open = [Start]; closed = [ ];  % initialize
  while open ≠ [ ] do
  begin
    remove leftmost state from open, and store it in X;
    if X is a goal then
      return SUCCESS;  % goal found
    else
    begin
      generate children of X;
      put X on closed;
      discard children of X if already on open or closed;  % loop check
      put remaining children on right end of open;  % queue
    end
  end
  return FAIL
end.
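A Python sketch of this algorithm, with open as a FIFO queue (`collections.deque`); the dict-of-children encoding of the graph is an assumption:

```python
from collections import deque

def bfs(start, goal, children):
    """Breadth-first search: open is a FIFO queue, closed records states
    already examined. Returns the expansion order up to the goal, or None."""
    open_ = deque([start])
    closed = []
    while open_:
        X = open_.popleft()          # remove leftmost state from open
        if X == goal:
            return closed + [X]      # goal found
        closed.append(X)
        for child in children.get(X, []):
            # loop check: discard children already on open or closed
            if child not in closed and child not in open_:
                open_.append(child)  # queue: put children on the right end
    return None

graph = {'A': ['B', 'C', 'D'], 'B': ['E', 'F'], 'C': ['G'], 'D': [],
         'E': ['H', 'I'], 'F': ['J'], 'G': [], 'H': [], 'I': [], 'J': []}
print(bfs('A', 'G', graph))  # ['A', 'B', 'C', 'D', 'E', 'F', 'G']
```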

Breadth-First Search Algorithm
A trace of BFS on the previous graph, where the desired goal is G, appears below:

Iteration #  X  open       closed
0            -  [A]        [ ]
1            A  [B C D]    [A]
2            B  [C D E F]  [B A]
3            C  [D E F G]  [C B A]
4            D  [E F G]    [D C B A]
5            E  [F G H I]  [E D C B A]
6            F  [G H I J]  [F E D C B A]
7            G  G is the goal

Because BFS considers every node at each level of the graph before going deeper into the space, all states are first reached along the shortest path from the start state. Breadth-first search is therefore guaranteed to find the shortest path from the start state to the goal.

Depth-First Search Algorithm
Function DFS
begin
  open = [Start]; closed = [ ];  % initialize
  while open ≠ [ ] do
  begin
    remove leftmost state from open, and store it in X;
    if X is a goal then
      return SUCCESS;  % goal found
    else
    begin
      generate children of X;
      put X on closed;
      discard children of X if already on open or closed;  % loop check
      put remaining children on left end of open;  % stack
    end
  end
  return FAIL
end.
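The depth-first variant keeps open as a stack: new children go on the left end rather than the right. A sketch under the same dict-of-children assumption:

```python
def dfs(start, goal, children):
    """Depth-first search: open is used as a stack (children go on the
    left). Returns the expansion order up to the goal, or None."""
    open_ = [start]
    closed = []
    while open_:
        X = open_.pop(0)             # remove leftmost state from open
        if X == goal:
            return closed + [X]      # goal found
        closed.append(X)
        # loop check: discard children already on open or closed
        new = [c for c in children.get(X, [])
               if c not in closed and c not in open_]
        open_ = new + open_          # stack: put children on the left end
    return None

graph = {'A': ['B', 'C', 'D'], 'B': ['E', 'F'], 'C': ['G'], 'D': [],
         'E': ['H', 'I'], 'F': ['J'], 'G': [], 'H': [], 'I': [], 'J': []}
print(dfs('A', 'G', graph))  # ['A', 'B', 'E', 'H', 'I', 'F', 'J', 'C', 'G']
```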

Depth-First Search Algorithm
A trace of DFS on the previous graph, where the desired goal is G, appears below:

Iteration #  X  open         closed
0            -  [A]          [ ]
1            A  [B C D]      [A]
2            B  [E F C D]    [B A]
3            E  [H I F C D]  [E B A]
4            H  [I F C D]    [H E B A]
5            I  [F C D]      [I H E B A]
6            F  [J C D]      [F I H E B A]
7            J  [C D]        [J F I H E B A]
8            C  [G D]        [C J F I H E B A]
9            G  G is the goal
Unlike BFS, a DFS search is not
guaranteed to find the shortest path to
a state the first time that state is
encountered. Later in the search, a
different path may be found to any
state.

Local Search Algorithms
Travelling Salesman Problem
•Suppose a salesperson has five cities to visit and then must return home.
•The goal of the problem is to find the shortest path for the salesperson to travel,
visiting each city, and then returning to the starting city.
•The following figure gives an instance of this problem.
•The nodes of the graph represent cities, and
each arc is labeled with a weight indicating the
cost of traveling that arc.
•Assume that the salesperson lives in city A and
will return there.
•The path [A, D, C, B, E, A], with associated
cost of 450 miles, is an example of a possible
circuit.
•The goal description requires a complete circuit
with minimum cost.

Local Search Algorithms
Travelling Salesman Problem
•The goal description is a property of the entire path, rather than of a single state.
•Now we present three different search techniques to solve the salesperson problem:
(1)Exhaustive Search:
•In this technique, beginning
with node A, possible next
states are added until all cities
are included and the path
returns home. The goal is the
lowest-cost path.
•The complexity of the
exhaustive search in the
traveling salesperson problem
is (N-1)!, where N is the
number of cities in the graph.
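The exhaustive technique can be sketched with `itertools.permutations`, which enumerates the (N-1)! orderings directly. The cost matrix below is hypothetical, since the figure's edge weights are not reproduced in this text:

```python
from itertools import permutations

def exhaustive_tsp(cost, start):
    """Try every ordering of the remaining cities: (N-1)! candidate tours."""
    cities = [c for c in cost if c != start]
    best_tour, best_cost = None, float('inf')
    for perm in permutations(cities):
        tour = (start,) + perm + (start,)            # complete circuit
        total = sum(cost[a][b] for a, b in zip(tour, tour[1:]))
        if total < best_cost:
            best_tour, best_cost = tour, total
    return best_tour, best_cost

# Hypothetical symmetric cost matrix for five cities (illustrative only).
cost = {
    'A': {'B': 100, 'C': 75, 'D': 125, 'E': 125},
    'B': {'A': 100, 'C': 125, 'D': 100, 'E': 75},
    'C': {'A': 75, 'B': 125, 'D': 100, 'E': 125},
    'D': {'A': 125, 'B': 100, 'C': 100, 'E': 50},
    'E': {'A': 125, 'B': 75, 'C': 125, 'D': 50},
}
tour, total = exhaustive_tsp(cost, 'A')
print(tour, total)  # ('A', 'B', 'E', 'D', 'C', 'A') 400
```

With N = 5 this tries 4! = 24 tours; at N = 20 it would already be hopeless, which motivates the pruning techniques that follow.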

Local Search Algorithms
Travelling Salesman Problem
•The complexity of N! grows so fast that the search soon becomes intractable. The following two techniques can reduce the search complexity.
(2) Branch and Bound Technique:
•This technique generates paths one at a time, keeping track of the best circuit found
so far. This value is used as a bound on future candidates.
•As paths are constructed one city at a time, the algorithm examines each partially
completed path.
•If the algorithm determines that the best possible extension to a path, the branch, will
have greater cost than the bound, it eliminates that partial path and all of its possible
extensions.
•This reduces the search considerably (empirically about 1.26^N rather than N!).

Local Search Algorithms
Travelling Salesman Problem
(3) Nearest Neighbor Technique
•This technique constructs the path according to the rule "go to the closest
unvisited city".
•The nearest neighbor path through the above graph is [A, E, D, B, C, A], at
a cost of 375 miles.
•This method is highly efficient, as there is only one path to be tried!
•The nearest neighbor heuristic is fallible, as graphs exist for which it does
not find the shortest path, but it is a possible compromise when the time
required makes exhaustive search impractical.
•For example, the algorithm fails to find the shortest path if the cost of arc (A, C) is 300.
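The "go to the closest unvisited city" rule can be sketched as follows; the small cost matrix is hypothetical, chosen so the greedy trace is easy to follow:

```python
def nearest_neighbour_tsp(cost, start):
    """Greedy heuristic: repeatedly move to the closest unvisited city,
    then return home. Tries only one path, so it is fast but fallible."""
    tour, current = [start], start
    unvisited = set(cost) - {start}
    total = 0
    while unvisited:
        nxt = min(unvisited, key=lambda c: cost[current][c])  # closest city
        total += cost[current][nxt]
        tour.append(nxt)
        unvisited.remove(nxt)
        current = nxt
    total += cost[current][start]  # close the circuit back to the start
    tour.append(start)
    return tour, total

# Hypothetical four-city cost matrix (illustrative only).
cost = {'A': {'B': 1, 'C': 4, 'D': 3},
        'B': {'A': 1, 'C': 1, 'D': 5},
        'C': {'A': 4, 'B': 1, 'D': 1},
        'D': {'A': 3, 'B': 5, 'C': 1}}
print(nearest_neighbour_tsp(cost, 'A'))  # (['A', 'B', 'C', 'D', 'A'], 6)
```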

Local Search Algorithms
Travelling Salesman Problem
(3) Nearest Neighbor Technique
•An instance of the traveling salesperson problem with the nearest neighbor
path in bold.
•Note that this path [A, E, D, B, C, A], at a cost of 550, is not the shortest path.
•The comparatively high
cost of arc (C, A)
defeated the heuristic.

Breadth-First Search
•It starts from the root node, explores the neighboring nodes first, and moves towards the next-level neighbors. It generates one tree at a time until the solution is found. It can be implemented using a FIFO queue data structure. This method provides the shortest path to the solution.
•If the branching factor (average number of child nodes for a given node) = b and the depth = d, then the number of nodes at level d = b^d.
•The total number of nodes created in the worst case is b + b^2 + b^3 + … + b^d.
•Disadvantage − Since each level of nodes is saved for creating the next one, it consumes a lot of memory space. The space requirement to store nodes is exponential.
•Its complexity depends on the number of nodes. It can check duplicate nodes.


Depth-First Search
•It is implemented using recursion with a LIFO stack data structure. It creates the same set of nodes as the breadth-first method, only in a different order.
•As only the nodes on the single path from the root to a leaf are stored in each iteration, the space requirement to store nodes is linear. With branching factor b and depth m, the storage space is bm.
•Disadvantage − This algorithm may not terminate and may go on infinitely along one path. The solution to this issue is to choose a cut-off depth. If the ideal cut-off is d, and the chosen cut-off is less than d, then this algorithm may fail. If the chosen cut-off is more than d, then execution time increases.
•Its complexity depends on the number of paths. It cannot check duplicate nodes.

Bidirectional Search
•It searches forward from the initial state and backward from the goal state until both meet at a common state.
•The path from the initial state is concatenated with the inverse path from the goal state. Each search is done only up to half of the total path.
Uniform Cost Search
•Sorting is done in increasing cost of the path to a node. It always expands the least-cost node. It is identical to breadth-first search if each transition has the same cost.
•It explores paths in increasing order of cost.
•Disadvantage − There can be multiple long paths with cost ≤ C*. Uniform cost search must explore them all.
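Uniform cost search can be sketched with a priority queue (`heapq`) ordered by path cost; the `(neighbor, step_cost)` edge encoding is an assumption:

```python
import heapq

def uniform_cost_search(start, goal, edges):
    """Expand the least-cost frontier node first. edges[s] is a list of
    (neighbor, step_cost) pairs. Returns (cost, path) or None."""
    frontier = [(0, [start])]        # priority queue ordered by path cost
    explored = set()
    while frontier:
        cost, path = heapq.heappop(frontier)
        state = path[-1]
        if state == goal:
            return cost, path        # first pop of the goal is the cheapest
        if state in explored:
            continue
        explored.add(state)
        for nbr, step in edges.get(state, []):
            if nbr not in explored:
                heapq.heappush(frontier, (cost + step, path + [nbr]))
    return None

edges = {'A': [('B', 1), ('C', 5)], 'B': [('C', 1)], 'C': []}
print(uniform_cost_search('A', 'C', edges))  # (2, ['A', 'B', 'C'])
```

Note how the direct arc A-C of cost 5 loses to the two-step path of cost 2: paths are explored strictly in order of cost.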

Iterative Deepening Depth-First Search
•It performs depth-first search to level 1, starts over, executes a complete depth-first search to level 2, and continues in this way until the solution is found.
•It never creates a node until all lower nodes are generated. It only saves a stack of nodes. The algorithm ends when it finds a solution at depth d. The number of nodes created at depth d is b^d, and at depth d-1 it is b^(d-1).
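The idea can be sketched as a depth-limited DFS wrapped in a loop over increasing limits; the names and the `max_depth` cap are illustrative:

```python
def depth_limited(state, goal, children, limit, path):
    """DFS that never descends more than `limit` arcs below `state`."""
    if state == goal:
        return path
    if limit == 0:
        return None
    for child in children.get(state, []):
        found = depth_limited(child, goal, children, limit - 1, path + [child])
        if found is not None:
            return found
    return None

def iterative_deepening(start, goal, children, max_depth=20):
    """Run depth-limited search with limit 0, 1, 2, ... until a goal is found."""
    for limit in range(max_depth + 1):
        result = depth_limited(start, goal, children, limit, [start])
        if result is not None:
            return result
    return None

graph = {'A': ['B', 'C', 'D'], 'B': ['E', 'F'], 'C': ['G'], 'D': [],
         'E': ['H', 'I'], 'F': ['J'], 'G': [], 'H': [], 'I': [], 'J': []}
print(iterative_deepening('A', 'G', graph))  # ['A', 'C', 'G']
```

Like BFS it finds a shallowest goal (here at depth 2), while storing only one path at a time like DFS.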

Comparison of Various Algorithms' Complexities
•Let us see the performance of the algorithms based on various criteria, where the number of nodes created at depth d is represented as b^d.

Criterion      Breadth-First  Depth-First  Bidirectional  Uniform Cost  Iterative Deepening
Time           b^d            b^m          b^(d/2)        b^d           b^d
Space          b^d            b^m          b^(d/2)        b^d           b^d
Optimality     Yes            No           Yes            Yes           Yes
Completeness   Yes            No           Yes            Yes           Yes

Informed (Heuristic) Search Strategies
•To solve large problems with large number of possible states, problem-specific
knowledge needs to be added to increase the efficiency of search algorithms.
Heuristic Evaluation Functions
•They calculate the cost of optimal path between two states. A heuristic function for
sliding-tiles games is computed by counting number of moves that each tile makes
from its goal state and adding these number of moves for all tiles.
Pure Heuristic Search
•It expands nodes in the order of their heuristic values. It creates two lists, a closed
list for the already expanded nodes and an open list for the created but unexpanded
nodes.
•In each iteration, a node with a minimum heuristic value is expanded, all its child
nodes are created and placed in the closed list. Then, the heuristic function is applied
to the child nodes and they are placed in the open list according to their heuristic
value. The shorter paths are saved and the longer ones are disposed.
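The sliding-tile heuristic described above is the Manhattan distance: the sum, over all tiles, of how many moves (row plus column steps) each tile is from its goal position. A sketch for the 8-puzzle; the flat-tuple board encoding, with 0 for the blank, is an assumption:

```python
def manhattan(state, goal, width=3):
    """Sum over all tiles of the row + column distance from each tile's
    position in `state` to its position in `goal`. 0 denotes the blank."""
    total = 0
    for tile in state:
        if tile == 0:
            continue  # the blank does not count
        i, j = state.index(tile), goal.index(tile)
        total += abs(i // width - j // width) + abs(i % width - j % width)
    return total

goal = (1, 2, 3, 4, 5, 6, 7, 8, 0)
state = (1, 2, 3, 4, 5, 6, 7, 0, 8)  # tile 8 is one column from its goal cell
print(manhattan(state, goal))        # 1
```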

Informed (Heuristic) Search Strategies
A* Search
•It is the best-known form of best-first search. It avoids expanding paths that are already expensive, and expands the most promising paths first.
•f(n) = g(n) + h(n), where
  g(n) = the cost (so far) to reach the node,
  h(n) = the estimated cost to get from the node to the goal,
  f(n) = the estimated total cost of the path through n to the goal.
•It is implemented using a priority queue ordered by increasing f(n).
Greedy Best-First Search
•It expands the node that is estimated to be closest to the goal, i.e. it expands nodes based on f(n) = h(n). It is implemented using a priority queue.
•Disadvantage − It can get stuck in loops. It is not optimal.
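An A* sketch with the priority queue ordered by f(n) = g(n) + h(n); the edge encoding and the heuristic-as-dict are assumptions for illustration:

```python
import heapq

def a_star(start, goal, edges, h):
    """A*: expand the node with the lowest f(n) = g(n) + h(n).
    edges[s] lists (neighbor, step_cost); h maps state -> heuristic value."""
    frontier = [(h[start], 0, [start])]  # (f, g, path), ordered by f
    best_g = {start: 0}                  # cheapest g found so far per state
    while frontier:
        f, g, path = heapq.heappop(frontier)
        state = path[-1]
        if state == goal:
            return g, path
        for nbr, step in edges.get(state, []):
            g2 = g + step
            if g2 < best_g.get(nbr, float('inf')):
                best_g[nbr] = g2
                heapq.heappush(frontier, (g2 + h[nbr], g2, path + [nbr]))
    return None

# Hypothetical graph and admissible heuristic (illustrative only).
edges = {'A': [('B', 1), ('C', 3)], 'B': [('D', 4)], 'C': [('D', 1)], 'D': []}
h = {'A': 3, 'B': 4, 'C': 1, 'D': 0}
print(a_star('A', 'D', edges, h))  # (4, ['A', 'C', 'D'])
```

Here the cheap-looking first step to B (g = 1) is not pursued first, because f(B) = 1 + 4 = 5 exceeds f(C) = 3 + 1 = 4: A* avoids expanding a path that is already expensive overall.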

Local Search Algorithms
They start from a prospective solution and then move to a
neighboring solution. They can return a valid solution even if it is
interrupted at any time before they end.

Local Search Algorithms
Hill-Climbing Search
•It is an iterative algorithm that starts with an arbitrary solution to a problem and attempts to find a better solution by incrementally changing a single element of the solution. If the change produces a better solution, the incremental change is taken as the new solution. This process is repeated until there are no further improvements.
•function Hill-Climbing(problem) returns a state that is a local maximum.
Disadvantage − This algorithm is neither complete nor optimal.
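A minimal hill-climbing sketch on a toy one-dimensional objective; the objective function and the integer-neighbor rule are illustrative, not from the lecture:

```python
def hill_climbing(initial, neighbours, score, max_steps=1000):
    """Repeatedly move to the best-scoring neighbour until no neighbour
    improves on the current solution (a local maximum)."""
    current = initial
    for _ in range(max_steps):
        best = max(neighbours(current), key=score, default=current)
        if score(best) <= score(current):
            return current  # no further improvement: local maximum
        current = best
    return current

# Toy example: maximise f(x) = -(x - 7)**2 over the integers.
f = lambda x: -(x - 7) ** 2
step = lambda x: [x - 1, x + 1]
print(hill_climbing(0, step, f))  # 7
```

On this single-peak objective the climb always reaches the global maximum; with several peaks it would stop at whichever local maximum is nearest to the starting point, which is exactly the incompleteness noted above.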

Local Search Algorithms
Local Beam Search
•This algorithm holds k states at any given time. At the start, these states are generated randomly. The successors of these k states are computed with the help of an objective function. If any of these successors attains the maximum value of the objective function, the algorithm stops.
•Otherwise, the initial k states and the successors of those states are placed in a pool. The pool is then sorted numerically, and the highest k states are selected as the new initial states. This process continues until a maximum value is reached.
•function BeamSearch(problem, k) returns a solution state.

Local Search Algorithms
Simulated Annealing
•Annealing is the process of heating and cooling a metal to change its internal structure and thereby modify its physical properties. When the metal cools, its new structure is fixed, and the metal retains its newly obtained properties. In the simulated annealing process, the temperature is kept variable.
•We initially set the temperature high and then allow it to 'cool' slowly as the algorithm proceeds. When the temperature is high, the algorithm is allowed to accept worse solutions with high frequency.
Start
1.Initialize k = 0; L = integer number of variables.
2.From i → j, compute the performance difference Δ.
3.If Δ <= 0 then accept; else if exp(-Δ/T(k)) > random(0,1) then accept.
4.Repeat steps 2 and 3 for L(k) steps.
5.k = k + 1.
Repeat steps 2 through 5 until the stopping criterion is met.
End
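The steps above can be sketched as follows; the geometric cooling schedule and all parameter values (t0, cooling rate, steps per temperature) are illustrative assumptions, as is the toy quadratic objective:

```python
import math
import random

def simulated_annealing(initial, neighbour, cost, t0=10.0, cooling=0.95,
                        steps_per_temp=20, t_min=1e-3):
    """Accept any improving move; accept a worsening move (cost difference
    delta > 0) with probability exp(-delta / T), lowering T as we go."""
    current, T = initial, t0
    best = current
    while T > t_min:
        for _ in range(steps_per_temp):       # L(k) steps at this temperature
            candidate = neighbour(current)
            delta = cost(candidate) - cost(current)
            if delta <= 0 or math.exp(-delta / T) > random.random():
                current = candidate           # accept the move
            if cost(current) < cost(best):
                best = current                # remember the best state seen
        T *= cooling                          # cool: k = k + 1
    return best

# Toy example: minimise f(x) = (x - 3)**2 starting from x = 0.
random.seed(0)
f = lambda x: (x - 3) ** 2
result = simulated_annealing(0.0, lambda x: x + random.uniform(-1, 1), f)
print(result)
```

At high T the exp(-Δ/T) term is close to 1, so worse solutions are accepted frequently, letting the search escape local minima; as T falls the behaviour approaches plain hill climbing.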