Smaller fully-functional bidirectional BWT indexes

Fabio Cunial, 41 slides, Oct 14, 2020

About This Presentation

Burrows-Wheeler indexes that support both extending and contracting any substring of the text $T$ of length $n$ on which they are built, in any direction, provide substantial flexibility in traversing the text and can be used to implement several algorithms. The practical appeal of such indexes is c...


Slide Content

Smaller fully-functional bidirectional BWT indexes
Djamal Belazzougui (1), Fabio Cunial (2,3)
1 DTISI, CERIST, Algiers.
2 Max Planck Institute for Molecular Cell Biology and Genetics, Dresden.
3 Center for Systems Biology Dresden.

Constant-space descriptor of a string W: [...]
Bidirectional index (synchronous)
Application: variable-order de Bruijn graph encoding all orders and all frequencies simultaneously

[Table with columns Time, Practical, Space; entries not recoverable]

[1] Belazzougui, Cunial. Fully-functional bidirectional Burrows-Wheeler indexes and infinite-order de Bruijn graphs. CPM 2019.
[2] Gagie, Navarro, Prezza. Fully functional suffix trees and optimal text searching in BWT-runs bounded space. JACM 2020.

Maximal repeats
[Figure: suffix tree ST of the example text; annotated value = 8]
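The notion of maximal repeat used on this slide can be made concrete with a brute-force sketch. It assumes the usual definition (a substring that occurs at least twice and is both left-maximal and right-maximal, with the text treated as circular and terminated by `#`), and it skips the empty string. The function name and the example text are illustrative only and do not come from the slides, whose indexes never enumerate substrings this way.

```python
from collections import defaultdict

def maximal_repeats(text, sentinel="#"):
    """Brute-force maximal repeats of `text` (illustration only, cubic time).

    A nonempty substring w is reported when it is preceded by at least two
    distinct characters and followed by at least two distinct characters.
    The text is terminated by `sentinel` and treated as circular, so the
    character before position 0 is the sentinel.
    """
    t = text + sentinel
    n = len(t)
    left = defaultdict(set)   # substring -> characters seen just before it
    right = defaultdict(set)  # substring -> characters seen just after it
    for i in range(n):
        for j in range(i + 1, n):
            w = t[i:j]
            if sentinel in w:
                break
            left[w].add(t[i - 1])  # t[-1] is the sentinel when i == 0
            right[w].add(t[j])     # j < n, so this index always exists
    return {w for w in left if len(left[w]) >= 2 and len(right[w]) >= 2}

print(sorted(maximal_repeats("banana")))  # ['a', 'ana']
```

In "banana", the substring "an" is not reported because it is always followed by "a", and "na" is not reported because it is always preceded by "a".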

Right-extensions of maximal repeats
[Figure: suffix tree ST of the example text]

Depth of the maximal repeat tree
[Figure: suffix tree ST of the example text; annotated value = 3]

String depth of the maximal repeat tree
[Figure: suffix tree ST of the example text; annotated value = 6]

Frontier maximal repeats
[Figure: suffix tree ST of the example text]

Rightmost maximal repeats
[Figure: suffix tree ST of the example text]

Background

Left-contraction from right-maximal
[Figure: suffix tree ST of the example text]

Suffix link
[Figure: suffix tree ST of the example text]
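As a point of reference for the left-contraction and suffix-link slides, the sketch below computes the operation naively on an uncompressed suffix array: the interval of W is mapped to the interval of W without its first character. The helper names (`suffix_array`, `interval`, `contract_left`) and the text "banana#" are assumptions made for illustration. The slides obtain the same result in compressed space with a different machinery.

```python
def suffix_array(t):
    """Naive suffix array: start positions sorted by the suffix they begin."""
    return sorted(range(len(t)), key=lambda i: t[i:])

def interval(t, sa, w):
    """Half-open range [lo, hi) of suffix-array ranks whose suffix starts with w."""
    hits = [k for k, i in enumerate(sa) if t.startswith(w, i)]
    return (hits[0], hits[-1] + 1) if hits else (0, 0)

def contract_left(t, sa, w):
    """Interval of w with its first character removed (naive recomputation)."""
    return interval(t, sa, w[1:])

t = "banana#"
sa = suffix_array(t)
print(interval(t, sa, "nana"))       # (6, 7): one occurrence
print(contract_left(t, sa, "nana"))  # (2, 4): the interval of "ana", two occurrences
```

The example shows why contraction is harder than extension: the interval can grow, so it cannot be derived from the old interval by narrowing alone.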

Suffix link
[Figure: panels (a) and (b), with nodes labelled CG, CGCG, G and GG; remaining labels not recoverable]

Left-contraction from non-right-maximal
[Figure: suffix tree ST of the example text]
Belazzougui, Cunial. Fully-functional bidirectional Burrows-Wheeler indexes and infinite-order de Bruijn graphs. CPM 2019.


Left-contraction from non-right-maximal
Checking if a node is a maximal repeat
Length of a maximal repeat
Weighted level ancestor queries on the maximal repeat subgraph


Pruning the ST topology
[Figure: ST and its pruned topology]

Size of the pruned topology
Run-length encoded [...] takes [...] words.
There are at most [...] red nodes.
CCCCCCCCCCCCCCCCCCCCCCCAAAAAAAAAACCCAAAAA
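The space bounds on this slide are stated for run-length encoded data. As a minimal sketch of what that means, the code below run-length encodes the example string shown on the slide and counts its runs. Reading that string as a BWT is an assumption, and the function name is illustrative.

```python
from itertools import groupby

def run_length_encode(s):
    """Collapse each maximal run of equal characters into (character, length)."""
    return [(c, len(list(g))) for c, g in groupby(s)]

# Example string copied from the slide (assumed here to be a BWT).
bwt = "CCCCCCCCCCCCCCCCCCCCCCCAAAAAAAAAACCCAAAAA"
runs = run_length_encode(bwt)
print(len(runs))  # number of runs
print(runs)
```

The string has four runs (C, A, C, A), so its run-length encoding stores four pairs, however long each run is.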

Suffix link with the pruned topology
[Figure: panels (a) and (b), with nodes labelled CG, CGCG, G and GG; remaining labels not recoverable]


Left-contraction from non-right-maximal
Suffix link
Checking if a node is a maximal repeat
Length of a maximal repeat
Weighted level ancestor queries on the maximal repeat subgraph
Lowest maximal repeat ancestor

Computing the reverse interval
Reduces to implementing [...] on the reverse RLBWT (from a maximal repeat W), and exploiting the isomorphism of subtrees of ST.
[...] in [...] can be implemented via [...].
[...] from a maximal repeat can also be implemented in [...].
Nontrivial only when W is left-maximal but not right-maximal.
By traversing the [...] maximal repeat tree top-down: [...] time, [...] words.
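For orientation, the sketch below shows what the "reverse interval" is in a bidirectional index: alongside the interval of W over the text, the index keeps the interval of the reverse of W over the reversed text, and the two always have the same size. The code recomputes both from scratch on naive suffix arrays. All names and the example text are illustrative assumptions, and this is not the compressed procedure the slide describes.

```python
def suffix_array(t):
    """Naive suffix array: start positions sorted by the suffix they begin."""
    return sorted(range(len(t)), key=lambda i: t[i:])

def interval(t, sa, w):
    """Half-open range [lo, hi) of suffix-array ranks whose suffix starts with w."""
    hits = [k for k, i in enumerate(sa) if t.startswith(w, i)]
    return (hits[0], hits[-1] + 1) if hits else (0, 0)

def synchronized_intervals(text, w, sentinel="#"):
    """Interval of w over text, and interval of reversed w over reversed text."""
    fwd = text + sentinel
    rev = text[::-1] + sentinel
    return (interval(fwd, suffix_array(fwd), w),
            interval(rev, suffix_array(rev), w[::-1]))

forward, reverse = synchronized_intervals("banana", "an")
print(forward, reverse)  # (2, 4) (5, 7): both intervals have size 2
```

Keeping the two intervals synchronized after a contraction is the hard part, because the reverse interval has to be updated without rescanning the text.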


Compressing the pruned topology

CCCAAACCCCGTTTCAAAAACCCAAACCCCC
[...] might still contain unary paths of black nodes. Every node in such a path can be charged to a distinct run boundary.
The deepest nodes in [...] (black) are rightmost maximal repeats, so they can be charged to distinct run boundaries.
Thus, there are [...] black nodes with more than one maximal repeat child.
Every blue node can be charged to its black descendant.
Red nodes can still be charged to a constant number of runs or of blue/black nodes.
Run-length encoded [...] takes [...] words.

Suffix link with the compressed topology
After the LCA, we might end up in a blue node, but we want the interval of the highest node in ST with string depth at least [...].
Let [...] denote this problem with [...].

Several other cases are possible
[...] straddles multiple blue paths
[...] straddles itself
It can be shown that all cases can be handled with [...].
After at most H Weiner links, every maximal repeat loses its right-maximality permanently.
So the length of the ST path between the first interval of an instance and the solution interval becomes zero.

Suffix link with the compressed topology
After the LCA from interval [...], we might end in a red node of [...] that is the child of a blue node, but we want the interval of the red node of [...] that contains [...], and of its blue parent.
Can be solved with a similar recursive procedure.

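Computing the reverse interval
Nontrivial only when W is left-maximal but not right-maximal and the locus of W is a blue node of [...].
Simple solution: issue a known number of [...] queries on the reverse index. [...] time.
Still needs [...] on the reverse index from a maximal repeat W. [...] time.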

Unidirectional operations
[Table with columns Time, Space (words), Forward direction; entries not recoverable]
(needs ID of longest left-maximal suffix)
amortized
Application: variable-order de Bruijn graph that supports just one direction!
