Partitioning Multitape Transducers
Fran¸cois Barth´elemy
CNAM-C´edric, 292 rue Saint-Martin, F-75003 Paris, France
and INRIA, domaine de Voluceau, F-78153 Rocquencourt cedex, France
[email protected]
Abstract.In this paper, we define a class of transducers closed under
intersection and complementation, which are the operations used for
contextual rule compilation. This class of transducers is not theoreti-
cally more powerful than the Epsilon-Free Letter Transducers most com-
monly used. But they are more convenient for morphological description
whenever the correspondence between lexical and surface forms is not a
symbol-to-symbol matching.
A complete set of operations on transducers is defined, including some
operations (projection and join) which change the number of tapes of the
transducers.
1 Introduction
Finite State morphology use contextual rules (rewrite rules or two-level rules)
which are compiled into finite-state transducers. The key idea of compilation, due
to Kaplan and Kay [3] is subtractive: first, compute a superset of the specified
language, namely the concatenation closure of the rule centers; then retract what
is not allowed by the contexts specified in the rules. At the operational level, the
fundamental operations are complementation and subtraction.
Finite State Transducers are not closed under complementation and subtrac-
tion in general. The subclass of transducers where the transitions are labeled by
exactly one symbol on each tape is closed under complementation, subtraction
and intersection. It is calledEpsilon-Free Letter Transducer[5]. This subclass is
used for Finite State morphology. A special symbol 0 is artificially introduced to
represent the empty string using an ordinary symbol. Introduction and removal
of 0s are done as pre and post-processing when executing a transducer.
In this paper, we present another subclass of transducers that are closed under
complementation, intersection and subtraction, called the Partitioning Multi-
tape Transducers (PMTs). The relations defined are not necessarily same-length
relations. The transitions are labeled by an independent regular expression for
each tape. The same-length constraint does not apply on symbols but on parti-
tion components, which are possibly empty symbol sequences (i.e. strings). Each
transition defines a partition.
PMTs have the same expressive power as Epsilon-Free Letter Transducer
and regular languages. We give a compilation algorithm which compiles a PMT
into a Finite State Automaton. We also define a computation framework with
operations which change the number of tapes of the transducers.
A. Yli-Jyr¨a, L. Karttunen, and J. Karhum¨aki (Eds.): FSMNLP 2005, LNAI 4002, pp. 11–20, 2006.
cffiSpringer-Verlag Berlin Heidelberg 2006