[ Tcllib Home | Main Table Of Contents | Table Of Contents | Keyword Index | Categories | Modules | Applications ]

grammar::fa::op - Operations on finite automatons

- package require
**Tcl 8.4** - package require
**snit** - package require
**struct::list** - package require
**struct::set** - package require
**grammar::fa::op ?0.4.1?**

**::grammar::fa::op::constructor***cmd***::grammar::fa::op::reverse***fa***::grammar::fa::op::complete***fa*?*sink*?**::grammar::fa::op::remove_eps***fa***::grammar::fa::op::trim***fa*?*what*?**::grammar::fa::op::determinize***fa*?*mapvar*?**::grammar::fa::op::minimize***fa*?*mapvar*?**::grammar::fa::op::complement***fa***::grammar::fa::op::kleene***fa***::grammar::fa::op::optional***fa***::grammar::fa::op::union***fa**fb*?*mapvar*?**::grammar::fa::op::intersect***fa**fb*?*mapvar*?**::grammar::fa::op::difference***fa**fb*?*mapvar*?**::grammar::fa::op::concatenate***fa**fb*?*mapvar*?**::grammar::fa::op::fromRegex***fa**regex*?*over*?**::grammar::fa::op::toRegexp***fa***::grammar::fa::op::toRegexp2***fa***::grammar::fa::op::toTclRegexp***regexp**symdict***::grammar::fa::op::simplifyRegexp***regexp*

This package provides a number of complex operations on finite
automatons (Short: FA),
as provided by the package **grammar::fa**.
The package does not provide the ability to create and/or manipulate
such FAs, nor the ability to execute a FA for a stream of symbols.
Use the packages **grammar::fa**
and **grammar::fa::interpreter** for that.
Another package related to this is **grammar::fa::compiler**
which turns a FA into an executor class which has the definition of
the FA hardwired into it.

For more information about what a finite automaton is see section
*FINITE AUTOMATONS* in package **grammar::fa**.

The package exports the API described here. All commands modify their
first argument. I.e. whatever FA they compute is stored back into
it. Some of the operations will construct an automaton whose states
are all new, but related to the states in the source
automaton(s). These operations take variable names as optional
arguments where they will store mappings which describe the
relationship(s).
The operations can be loosely partitioned into structural and language
operations. The latter are defined in terms of the language the
automaton(s) accept, whereas the former are defined in terms of the
structural properties of the involved automaton(s). Some operations
are both.
*Structure operations*

**::grammar::fa::op::constructor***cmd*This command has to be called by the user of the package before any other operations is performed, to establish a command which can be used to construct a FA container object. If this is not done several operations will fail as they are unable to construct internal and transient containers to hold state and/or partial results.

Any container class using this package for complex operations should set its own class command as the constructor. See package

**grammar::fa**for an example.**::grammar::fa::op::reverse***fa*Reverses the

*fa*. This is done by reversing the direction of all transitions and swapping the sets of*start*and*final*states. The language of*fa*changes unpredictably.**::grammar::fa::op::complete***fa*?*sink*?Completes the

*fa**complete*, but nothing is done if the*fa*is already*complete*. This implies that only the first in a series of multiple consecutive complete operations on*fa*will perform anything. The remainder will be null operations.The language of

*fa*is unchanged by this operation.This is done by adding a single new state, the

*sink*, and transitions from all other states to that sink for all symbols they have no transitions for. The sink itself is made complete by adding loop transitions for all symbols.Note: When a FA has epsilon-transitions transitions over a symbol for a state S can be indirect, i.e. not attached directly to S, but to a state in the epsilon-closure of S. The symbols for such indirect transitions count when computing completeness of a state. In other words, these indirectly reached symbols are

*not*missing.The argument

*sink*provides the name for the new state and most not be present in the*fa*if specified. If the name is not specified the command will name the state "sink**n**", where**n**is set so that there are no collisions with existing states.Note that the sink state is

*not useful*by definition. In other words, while the FA becomes complete, it is also*not useful*in the strict sense as it has a state from which no final state can be reached.**::grammar::fa::op::remove_eps***fa*Removes all epsilon-transitions from the

*fa*in such a manner the the language of*fa*is unchanged. However nothing is done if the*fa*is already*epsilon-free*. This implies that only the first in a series of multiple consecutive complete operations on*fa*will perform anything. The remainder will be null operations.*Note:*This operation may cause states to become unreachable or not useful. These states are not removed by this operation. Use**::grammar::fa::op::trim**for that instead.**::grammar::fa::op::trim***fa*?*what*?Removes unwanted baggage from

*fa*. The legal values for*what*are listed below. The command defaults to**!reachable|!useful**if no specific argument was given.**!reachable**Removes all states which are not reachable from a start state.

**!useful**Removes all states which are unable to reach a final state.

**!reachable&!useful****!(reachable|useful)**Removes all states which are not reachable from a start state and are unable to reach a final state.

**!reachable|!useful****!(reachable&useful)**Removes all states which are not reachable from a start state or are unable to reach a final state.

**::grammar::fa::op::determinize***fa*?*mapvar*?Makes the

*fa*deterministic without changing the language accepted by the*fa*. However nothing is done if the*fa*is already*deterministic*. This implies that only the first in a series of multiple consecutive complete operations on*fa*will perform anything. The remainder will be null operations.The command will store a dictionary describing the relationship between the new states of the resulting dfa and the states of the input nfa in

*mapvar*, if it has been specified. Keys of the dictionary are the handles for the states of the resulting dfa, values are sets of states from the input nfa.*Note*: An empty dictionary signals that the command was able to make the*fa*deterministic without performing a full subset construction, just by removing states and shuffling transitions around (As part of making the FA epsilon-free).*Note*: The algorithm fails to make the FA deterministic in the technical sense if the FA has no start state(s), because determinism requires the FA to have exactly one start states. In that situation we make a best effort; and the missing start state will be the only condition preventing the generated result from being*deterministic*. It should also be noted that in this case the possibilities for trimming states from the FA are also severely reduced as we cannot declare states unreachable.**::grammar::fa::op::minimize***fa*?*mapvar*?Creates a FA which accepts the same language as

*fa*, but has a minimal number of states. Uses Brzozowski's method to accomplish this.The command will store a dictionary describing the relationship between the new states of the resulting minimal fa and the states of the input fa in

*mapvar*, if it has been specified. Keys of the dictionary are the handles for the states of the resulting minimal fa, values are sets of states from the input fa.*Note*: An empty dictionary signals that the command was able to minimize the*fa*without having to compute new states. This should happen if and only if the input FA was already minimal.*Note*: If the algorithm has no start or final states to work with then the result might be technically minimal, but have a very unexpected structure. It should also be noted that in this case the possibilities for trimming states from the FA are also severely reduced as we cannot declare states unreachable.

*Language operations*
All operations in this section require that all input FAs have at
least one start and at least one final state. Otherwise the language of
the FAs will not be defined, making the operation senseless (as it
operates on the languages of the FAs in a defined manner).

**::grammar::fa::op::complement***fa*Complements

*fa*. This is possible if and only if*fa*is*complete*and*deterministic*. The resulting FA accepts the complementary language of*fa*. In other words, all inputs not accepted by the input are accepted by the result, and vice versa.The result will have all states and transitions of the input, and different final states.

**::grammar::fa::op::kleene***fa*Applies Kleene's closure to

*fa*. The resulting FA accepts all strings**S**for which we can find a natural number**n**(0 inclusive) and strings**A1**...**An**in the language of*fa*such that**S**is the concatenation of**A1**...**An**. In other words, the language of the result is the infinite union over finite length concatenations over the language of*fa*.The result will have all states and transitions of the input, and new start and final states.

**::grammar::fa::op::optional***fa*Makes the

*fa*optional. In other words it computes the FA which accepts the language of*fa*and the empty the word (epsilon) as well.The result will have all states and transitions of the input, and new start and final states.

**::grammar::fa::op::union***fa**fb*?*mapvar*?Combines the FAs

*fa*and*fb*such that the resulting FA accepts the union of the languages of the two FAs.The result will have all states and transitions of the two input FAs, and new start and final states. All states of

*fb*which exist in*fa*as well will be renamed, and the*mapvar*will contain a mapping from the old states of*fb*to the new ones, if present.It should be noted that the result will be non-deterministic, even if the inputs are deterministic.

**::grammar::fa::op::intersect***fa**fb*?*mapvar*?Combines the FAs

*fa*and*fb*such that the resulting FA accepts the intersection of the languages of the two FAs. In other words, the result will accept a word if and only if the word is accepted by both*fa*and*fb*. The result will be useful, but not necessarily deterministic or minimal.The command will store a dictionary describing the relationship between the new states of the resulting fa and the pairs of states of the input FAs in

*mapvar*, if it has been specified. Keys of the dictionary are the handles for the states of the resulting fa, values are pairs of states from the input FAs. Pairs are represented by lists. The first element in each pair will be a state in*fa*, the second element will be drawn from*fb*.**::grammar::fa::op::difference***fa**fb*?*mapvar*?Combines the FAs

*fa*and*fb*such that the resulting FA accepts the difference of the languages of the two FAs. In other words, the result will accept a word if and only if the word is accepted by*fa*, but not by*fb*. This can also be expressed as the intersection of*fa*with the complement of*fb*. The result will be useful, but not necessarily deterministic or minimal.The command will store a dictionary describing the relationship between the new states of the resulting fa and the pairs of states of the input FAs in

*mapvar*, if it has been specified. Keys of the dictionary are the handles for the states of the resulting fa, values are pairs of states from the input FAs. Pairs are represented by lists. The first element in each pair will be a state in*fa*, the second element will be drawn from*fb*.**::grammar::fa::op::concatenate***fa**fb*?*mapvar*?Combines the FAs

*fa*and*fb*such that the resulting FA accepts the cross-product of the languages of the two FAs. I.e. a word W will be accepted by the result if there are two words A and B accepted by*fa*, and*fb*resp. and W is the concatenation of A and B.The result FA will be non-deterministic.

**::grammar::fa::op::fromRegex***fa**regex*?*over*?Generates a non-deterministic FA which accepts the same language as the regular expression

*regex*. If the*over*is specified it is treated as the set of symbols the regular expression and the automaton are defined over. The command will compute the set from the "S" constructors in*regex*when*over*was not specified. This set is important if and only if the complement operator "!" is used in*regex*as the complementary language of an FA is quite different for different sets of symbols.The regular expression is represented by a nested list, which forms a syntax tree. The following structures are legal:

- {S x}
Atomic regular expression. Everything else is constructed from these. Accepts the

**S**ymbol "x".- {. A1 A2 ...}
Concatenation operator. Accepts the concatenation of the regular expressions

**A1**,**A2**, etc.*Note*that this operator accepts zero or more arguments. With zero arguments the represented language is*epsilon*, the empty word.- {| A1 A2 ...}
Choice operator, also called "Alternative". Accepts all input accepted by at least one of the regular expressions

**A1**,**A2**, etc. In other words, the union of**A1**,**A2**.*Note*that this operator accepts zero or more arguments. With zero arguments the represented language is the*empty*language, the language without words.- {& A1 A2 ...}
Intersection operator, logical and. Accepts all input accepted which is accepted by all of the regular expressions

**A1**,**A2**, etc. In other words, the intersection of**A1**,**A2**.- {? A}
Optionality operator. Accepts the empty word and anything from the regular expression

**A**.- {* A}
Kleene closure. Accepts the empty word and any finite concatenation of words accepted by the regular expression

**A**.- {+ A}
Positive Kleene closure. Accepts any finite concatenation of words accepted by the regular expression

**A**, but not the empty word.- {! A}
Complement operator. Accepts any word not accepted by the regular expression

**A**. Note that the complement depends on the set of symbol the result should run over. See the discussion of the argument*over*before.

**::grammar::fa::op::toRegexp***fa*This command generates and returns a regular expression which accepts the same language as the finite automaton

*fa*. The regular expression is in the format as described above, for**::grammar::fa::op::fromRegex**.**::grammar::fa::op::toRegexp2***fa*This command has the same functionality as

**::grammar::fa::op::toRegexp**, but uses a different algorithm to simplify the generated regular expressions.**::grammar::fa::op::toTclRegexp***regexp**symdict*This command generates and returns a regular expression in Tcl syntax for the regular expression

*regexp*, if that is possible.*regexp*is in the same format as expected by**::grammar::fa::op::fromRegex**.The command will fail and throw an error if

*regexp*contains complementation and intersection operations.The argument

*symdict*is a dictionary mapping symbol names to pairs of*syntactic type*and Tcl-regexp. If a symbol occurring in the*regexp*is not listed in this dictionary then single-character symbols are considered to designate themselves whereas multiple-character symbols are considered to be a character class name.**::grammar::fa::op::simplifyRegexp***regexp*This command simplifies a regular expression by applying the following algorithm first to the main expression and then recursively to all sub-expressions:

Convert the expression into a finite automaton.

Minimize the automaton.

Convert the automaton back to a regular expression.

Choose the shorter of original expression and expression from the previous step.

This document, and the package it describes, will undoubtedly contain
bugs and other problems.
Please report such in the category *grammar_fa* of the
Tcllib Trackers.
Please also report any ideas for enhancements you may have for either
package and/or documentation.

automaton, finite automaton, grammar, parsing, regular expression, regular grammar, regular languages, state, transducer

Grammars and finite automata

Copyright © 2004-2008 Andreas Kupries <andreas_kupries@users.sourceforge.net>