Burrows-Wheeler indexes that support both extending and contracting any substring of the text $T$ of length $n$ on which they are built, in any direction, provide substantial flexibility in traversing the text and can be used to implement several algorithms. The practical appeal of such indexes is contingent on them being compact, and current designs that are sensitive to the compressibility of the input take either $O(e+\REV{e})$ words of space, where $e$ and $\REV{e}$ are the number of right and left extensions of the maximal repeats of $T$, or $O(r\log(n/r)+\REV{r}\log(n/\REV{r}))$ words, where $r$ and $\REV{r}$ are the number of runs in the Burrows-Wheeler transform of $T$ and of its reverse. In this paper we describe a fully-functional bidirectional index that takes $O(m+r+\REV{r})$ words, where $m$ is the number of maximal repeats of $T$, as well as a variant that takes $O(r+\REV{r})$ words.
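The quantities in these bounds can be computed by brute force on a toy text. The following is a minimal sketch, not the index itself. It assumes a text terminated by a unique sentinel `#`, treats the text as circular so that the first position has `#` as its left character, leaves out the empty string, and counts `#` as a valid extension character. The function names `bwt`, `count_runs` and `maximal_repeats` are hypothetical helpers introduced only for illustration.

```python
from collections import defaultdict


def bwt(text):
    """BWT of a sentinel-terminated text, built naively from its sorted rotations."""
    rotations = sorted(text[i:] + text[:i] for i in range(len(text)))
    return "".join(rotation[-1] for rotation in rotations)


def count_runs(s):
    """Number of maximal runs of equal characters in s."""
    return sum(1 for i in range(len(s)) if i == 0 or s[i] != s[i - 1])


def maximal_repeats(text):
    """Map each nonempty maximal repeat to its set of right-extension characters.

    Assumption: a substring is a maximal repeat when its occurrences are
    preceded by at least two distinct characters and followed by at least
    two distinct characters (the sentinel counts as a character).
    """
    left, right = defaultdict(set), defaultdict(set)
    n = len(text)
    for i in range(n - 1):                # start positions, sentinel excluded
        for j in range(i + 1, n):         # text[i:j] never contains the sentinel
            w = text[i:j]
            left[w].add(text[i - 1])      # text[-1] is the sentinel when i == 0
            right[w].add(text[j])
    return {w: right[w] for w in left
            if len(left[w]) >= 2 and len(right[w]) >= 2}


text = "banana#"
r = count_runs(bwt(text))                           # runs in the BWT of T
r_rev = count_runs(bwt(text[:-1][::-1] + "#"))      # runs in the BWT of the reverse of T
repeats = maximal_repeats(text)
m = len(repeats)                                    # number of maximal repeats
e = sum(len(chars) for chars in repeats.values())   # right extensions of maximal repeats
print(r, r_rev, m, e)  # prints: 5 4 2 4
```

On this toy input the maximal repeats are `a` and `ana`. The enumeration is quadratic in the number of substrings and is meant only to make the definitions concrete.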
[Table with columns "Time", "Practical" and "Space", citing [1] and [2]; entries not recoverable.]
[1] Belazzougui, Cunial. Fully-functional bidirectional Burrows-Wheeler indexes and infinite-order de Bruijn graphs. CPM 2019.
[2] Gagie, Navarro, Prezza. Fully functional suffix trees and optimal text searching in BWT-runs bounded space. JACM 2020.
Bidirectional index
[Figure: example on the suffix tree (ST); node and edge labels not recoverable.]
Maximal repeats
[Figure: the maximal repeats highlighted on the suffix tree (ST); the slide reports the value 8.]
Right-extensions of maximal repeats
[Figure: right-extensions of maximal repeats highlighted on the suffix tree (ST); the accompanying inequalities are not recoverable.]
Depth of the maximal repeat tree
[Figure: same example on the suffix tree (ST); the slide reports the value 3.]
String depth of the maximal repeat tree
[Figure: same example on the suffix tree (ST); the slide reports the value 6.]
Frontier maximal repeats
[Figure: same example on the suffix tree (ST); labels not recoverable.]
Rightmost maximal repeats
[Figure: same example on the suffix tree (ST); labels not recoverable.]
Left-contraction from right-maximal
[Figure, panels (a) and (b): example on the suffix tree (ST) with arrows labelled "Suffix link"; remaining labels not recoverable.]
Left-contraction from non-right-maximal [1]
[Figure: example on the suffix tree (ST), shown in two animation steps; labels not recoverable.]
[...] might still contain unary paths of black nodes. Every node in such a path can be charged to a distinct run boundary.
The deepest nodes in [...] (black) are rightmost maximal repeats, so they can be charged to distinct run boundaries.
Thus, there are [...] black nodes with more than one maximal repeat child.
Every blue node can be charged to its black descendant.
Red nodes can still be charged to a constant number of runs or of blue/black nodes.
Run-length encoded [...] takes [...] words.
Suffix link with the compressed topology
After the LCA, we might end up in a blue node, but we want the interval of the highest node in ST with string depth at least [...].
Let [...] denote this problem with tuple [...].
[Figure: sequence of diagrams on ST, each annotated with a space bound in words; labels not recoverable.]
Several other cases are possible
[...] straddles multiple blue paths
[...] straddles itself
It can be shown that all cases can be handled with [...].
After at most $H$ Weiner links, every maximal repeat loses its right-maximality permanently.
So the length of the ST path between the first interval of an instance and the solution interval becomes zero.
Suffix link with the compressed topology
After the LCA from interval [...], we might end up in a red node of [...] that is the child of a blue node, but we want the interval of the red node of [...] that contains [...], and of its blue parent.
Can be solved with a similar recursive procedure.
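Computing the reverse interval
Nontrivial only when $W$ is left-maximal but not right-maximal and the locus of $W$ is a blue node of [...].
Simple solution: issue a known number of [...] queries on the reverse index. [...] time.
Still needs [...] on the reverse index from a maximal repeat $W$. [...] time.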
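To make the notion of reverse interval concrete, here is a brute-force sketch rather than the procedure described on the slide. It assumes that the interval of $W$ is the range of sorted rotations of the sentinel-terminated text that start with $W$, and that the reverse interval is the corresponding range for the reverse of $W$ in the reversed text. The names `sorted_rotations`, `interval` and `bidirectional_interval` are hypothetical.

```python
def sorted_rotations(text):
    """All rotations of text in lexicographic order (row k corresponds to BWT position k)."""
    return sorted(text[i:] + text[:i] for i in range(len(text)))


def interval(rotations, pattern):
    """Half-open range [lo, hi) of sorted rotations that start with pattern."""
    hits = [k for k, rotation in enumerate(rotations) if rotation.startswith(pattern)]
    return (hits[0], hits[-1] + 1) if hits else (0, 0)


def bidirectional_interval(text, pattern):
    """Interval of pattern in the index of text, and of its reverse in the index of the reversed text."""
    forward = interval(sorted_rotations(text + "#"), pattern)
    backward = interval(sorted_rotations(text[::-1] + "#"), pattern[::-1])
    return forward, backward


print(bidirectional_interval("banana", "an"))  # prints: ((2, 4), (5, 7))
```

Both ranges always have the same length, since each counts the occurrences of the pattern in the text. A real bidirectional index keeps the two ranges synchronized at every extension or contraction instead of recomputing them by scanning.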
Unidirectional operations
[Table with columns "Time" and "Space (words)" for the forward direction; entries not recoverable apart from the notes "(needs ID of longest left-maximal suffix)" and "amortized".]
Application: variable-order de Bruijn graph
that supports just one direction!