See: Description
Interface | Description |
---|---|
CharSource |
defines the type from which a DfaRun reads the data.
|
FaAction |
describes the interface to objects implementing actions associated
with stop states of a finite automaton.
|
Formatter |
defines the interface to classes which rearrange text parts stored
in a
TextStore and/or in a java.util.Map
append them to a StringBuilder . |
NfaParserView | |
ReParser |
defines the interface needed by an
Nfa to "fill" itself
with a regular expression. |
ReParserFactory |
defines the factory interface to create instances of
ReParser . |
TextSplitter |
defines the interface of a method which separates a string into
parts and stores them in a
TextStore . |
Class | Description |
---|---|
AbstractFaAction |
implements the prototype of an
FaAction with initial
priority zero so that actions can be easily written as anonymous
classes. |
ByteCharSource |
implements a
CharSource which reads bytes and converts
them to characters of a given character set while keeping track of
byte positions of converted characters within the input stream. |
CharSequenceCharSource |
is a
CharSource wrapper around a CharSequence such that it can be used as an input source
of a DfaRun . |
Dfa |
implements a deterministic finite automaton.
|
DfaRun |
A
DfaRun is used to apply a Dfa to a
stream of characters. |
DfaRun.FailedMatchBehaviour |
defines typed enumerated values which describe
what a
DfaRun shall do in its read() and filter()
functions, if
no match can be found. |
EmptyCharSource |
represents a minimal implementation of the
CharSource
interface mainly providing the ability to push characters back into
the stream for later reading. |
FaToDot |
contains a static function to output a finite automaton as a graph
to be printed by the fine dot
utility.
|
Intervals<D> |
Used to construct
CharTrans objects of different implementations. |
IntervalsFaState |
Used to construct
CharTrans objects of different implementations. |
Misc |
Miscellaneous static functions.
|
Nfa |
models a non-deterministic finite automaton.
|
PlainSet<E> |
an implementation of
Set based on a simple hashing
scheme. |
PrintfFormatter | |
ReaderCharSource |
wraps Reader or an input stream into a CharSource.
|
ReClassicParser |
implements the regular
expression syntax classically used by
monq.jfa . |
Regexp |
convenience class for matching regular expressions.
|
RegexpSplitter |
implements a
TextSplitter to split text according to a
regular expression. |
SimpleFormatters |
This class is merely a container to wrap some very simple
Formatter s into one source file. |
SimpleFormatters.FixedString | |
SimpleFormatters.GetVar | |
SimpleFormatters.Part | |
SimpleFormatters.PartLen | |
SimpleFormatters.PartSeq | |
Statistics | |
TextStore |
stores a text and parts of it in a (hopefully) efficient
manner.
|
Xml |
provides static fields and methods which help to create regular
expressions needed to parse XML.
|
Exception | Description |
---|---|
CallbackException |
is the exception originally thrown by
FaAction.invoke(java.lang.StringBuilder, int, monq.jfa.DfaRun) . |
CompileDfaException |
represents things that go wrong during compilation of an Nfa into a
Dfa.
|
NomatchException |
is the exception thrown by methods in
DfaRun whenever no
match can be found at the current input position. |
ReSyntaxException |
is thrown whenever a syntax error is found in a regular
expression.
|
UnavailablePositionException |
is thrown by
FileCharSource.position() if the position of a character is
requested for which this position is no longer or not yet
available. |
Implements Finite Automata, deterministic and nondeterministic, with an engine for high throughput filtering of text. There is a complete example program available the main parts of which are described below.
A typical use of this package involves the following steps.
Nfa
comprising one or more regular expressions to which
FaAction
callbacks are attached. The
Nfa
defined below matches identifiers and positive
integers. It has actions attached to tag them with <id>
and <num> respectively.
import monq.jfa.*; import monq.jfa.actions.*; Nfa nfa = new Nfa("[A-Za-z_][A-Za-z0-9_]*", new Printf("<id>%0</id>")) .or("[0-9]+", new Printf("<num>%0</num>"));
Nfa
into a Dfa
.
Dfa dfa = nfa.compile(DfaRun.UNMATCHED_COPY
);
A Dfa
is a data structure that allows to match
the regular expessions which where stuck into the
Nfa
. The running time depends only linearly on the
length of the input string. The parameter given to
compile()
specifies how non-matching input shall
be treated. In this case it is copied unchanged to the output.
Dfa
, it must be wrapped in a
DfaRun
which applies the Dfa
to input
data, e.g. the standard input stream.
DfaRun run = new DfaRun(dfa, new ReaderCharSource(System.in));
DfaRun
is asked to filter to System.out
:
run.filter(System.out);There are also methods to
read()
from a
DfaRun
.
Now you may want to compile and run the slightly more elaborate example.
DfaRun
can be used as the input source for
another DfaRun
:DfaRun a = new DfaRun(..., new ReaderCharSource(System.in)); DfaRun b = new DfaRun(..., a);
FaAction
is allowed to change the Dfa
of
a DfaRun
on the fly. This allows for pieces of text to
be operated upon by different Dfa
s. A general
action to change the Dfa
is SwitchDfa
.
monq.jfa.actions
.
DfaRun
with DfaRun.unskip()
. This in particular allows matches with a
trailing context. You only have push back the context in the
action callback.
The model of operation implemented by this package was inspired by LEX — Lexical Analyzer Generator described in 1975 by M.E.Lesk and E.Schmidt in the UNIX Programmer's Manual and its more recent GNU incarnation flex. This means in particular that the software operates as a filter which changes text as it passes through the machinery. Filtering, i.e. changing text, is initiated whenever a regular expression encoded in the machinery matches. The filter operation itself is carried out by callbacks invoked for every matching piece of text. Apart from changing the text, the callback may count occurences, record things in a database, or do whatever can be done with a piece of java code.
The interface of such a callback is FaAction
. Normally you just need to extend AbstractFaAction
and implement the invoke()
method.
Commonly used callbacks can be found in the package monq.jfa.actions
.
While this implementation uses fast deterministic finite automata (DFA) for matching, it nevertheless implements the tracking of submatches. A description of the formal behaviour of the method used shows that it simply does not work in the general case, but it works quite well in many practical applications. Nevertheless care must be taken when using submatches, as sometimes results are a little surprising. For more information see the description of subexpressions as well as what to do if they don't work.