public class Nfa
extends java.lang.Object
models a non-deterministic finite automaton.
The most frequent use involves several calls to or(CharSequence, FaAction)
to bind regular expressions to
actions. After having added all necessary re/action
pairs, compile()
into a Dfa
and operate it
by a DfaRun
to filter text. Filtering means that
all regular expressions which were bound to actions compete for
matches. The regular expression with the longest match is selected
and its associated action will be invoked.
The regular expression syntax depends on the ReParser
implementation in use, which can be inspeced and changed with
getDefaultParserFactory()
and
setDefaultParserFactory()
. At the
time of this writing, the default parser is a ReClassicParser
.
Modifier and Type | Field and Description |
---|---|
static monq.jfa.Nfa.EmptyNfaType |
EPSILON
value of an enumeration type to be passed to
Nfa(Nfa.EmptyNfaType) . |
static monq.jfa.Nfa.EmptyNfaType |
NOTHING
value of an enumeration type to be passed to
Nfa(Nfa.EmptyNfaType) . |
Constructor and Description |
---|
Nfa()
creates an automaton that recognizes nothing but is suitable to be filled
with calls to any of the
or() methods. |
Nfa(java.lang.CharSequence regex)
calls
Nfa(CharSequence,FaAction) with the 2nd
parameter set to null . |
Nfa(java.lang.CharSequence regex,
FaAction a)
creates an
Nfa from the given
regular expression and
assigns the given action to the stop state. |
Nfa(monq.jfa.Nfa.EmptyNfaType type)
creates an
Nfa which does either not match
anything or only the empty string. |
Modifier and Type | Method and Description |
---|---|
Nfa |
addAction(FaAction a)
adds an action to the Nfa (expert only).
|
void |
allPrefixes()
converts the automaton into one which matches all non-empty prefixes of
the given automaton.
|
Dfa |
compile(DfaRun.FailedMatchBehaviour fmb)
compiles
this into a Dfa with the given
behaviour for non-matching input. |
Dfa |
compile(DfaRun.FailedMatchBehaviour fmb,
FaAction eofAction)
compiles this non-deterministic finite automaton into a
deterministic finite automaton.
|
Nfa |
completeToSkip(FaAction noMatchAction)
extends the automaton with an additonal automaton which matches
everything but all non-empty prefixes of this automaton.
|
Nfa |
copy()
creates a copy of this Nfa where all the states are duplicated, but the
actions are kept such that both Nfas referernce the same actions.
|
java.lang.String |
escape(java.lang.String s)
escapes special characters in
s according to the
rules of the current ReParser set for
this . |
void |
escape(java.lang.StringBuilder out,
java.lang.CharSequence in,
int startAt)
is only a pass-through to
getReParser().escape(...) |
static ReParserFactory |
getDefaultParserFactory()
returns the default parser factory used by
Nfa s
to obtain a default ReParser . |
double |
getMemoryForSpeedTradeFactor() |
ReParser |
getReParser()
returns the regular expression parser currently used by this
Nfa . |
Nfa |
invert()
inverts the automaton, such that the resulting automaton will
match the set-complement of the set of strings matched by the
given automaton.
|
boolean |
markAsSub()
marks this automaton as a reporting subautomaton.
|
Nfa |
not()
implements the request match everthing but X.
|
Nfa |
optional()
transforms this finite automaton to match the empty string
too.
|
Nfa |
or(java.lang.CharSequence re)
adds the given regular expression to the automaton (expert
only).
|
Nfa |
or(java.lang.CharSequence regex,
FaAction action)
adds the given regular expression
re to the
automaton and associates it with the given action. |
Nfa |
or(Nfa other)
joins the other
Nfa into this automaton while
keeping all stop states and assigned actions. |
Nfa |
plus()
applies the
+ operator to the NFA. |
Nfa |
seq(java.lang.CharSequence s)
calls
seq(CharSequence,FaAction) with
null as the 2nd parameter. |
Nfa |
seq(java.lang.CharSequence regex,
FaAction a)
extend the current automaton to recognize
regex
as a suffix of the strings already recognized and arrange for the
given action to be called if the suffix was recognized. |
Nfa |
seq(Nfa other)
Concatenates
other to this such that a
sequence of characters vw is matched where v is
matched by this and w is matched by
other . |
static void |
setDefaultParserFactory(ReParserFactory rpf)
sets the default parser factory used by
Nfa s to
obtain a default ReParser . |
void |
setMemoryForSpeedTradeFactor(float f)
when creating DFA transitions during NFA to DFA compilation, the
transition tables use different implementations, depending on how dense
the transition table is.
|
Nfa |
setReParser(ReParser reParser)
sets the parser to be used by this
Nfa . |
Nfa |
shortest()
transforms the Nfa into an Nfa which only matches the
shortest matches of the given Nfa.
|
java.lang.String |
specialChars()
is a shortcut for
.getReParser().specialChars() |
Nfa |
star()
applies the Kleene closure (
* ) operator to the NFA. |
void |
toDot(java.io.PrintStream out) |
void |
toDot(java.lang.String filename) |
public static final monq.jfa.Nfa.EmptyNfaType NOTHING
Nfa(Nfa.EmptyNfaType)
.public static final monq.jfa.Nfa.EmptyNfaType EPSILON
Nfa(Nfa.EmptyNfaType)
.public Nfa()
creates an automaton that recognizes nothing but is suitable to be filled
with calls to any of the or()
methods.
public Nfa(monq.jfa.Nfa.EmptyNfaType type)
public Nfa(java.lang.CharSequence regex, FaAction a) throws ReSyntaxException
creates an Nfa
from the given
regular expression and
assigns the given action to the stop state. More regex/action
pairs are then normally added with or(CharSequence,FaAction)
.
ReSyntaxException
public Nfa(java.lang.CharSequence regex) throws ReSyntaxException
calls Nfa(CharSequence,FaAction)
with the 2nd
parameter set to null
.
ReSyntaxException
public static void setDefaultParserFactory(ReParserFactory rpf)
sets the default parser factory used by Nfa
s to
obtain a default ReParser
.
setReParser(monq.jfa.ReParser)
public static ReParserFactory getDefaultParserFactory()
returns the default parser factory used by Nfa
s
to obtain a default ReParser
.
setReParser(monq.jfa.ReParser)
public java.lang.String escape(java.lang.String s)
escapes special characters in s
according to the
rules of the current ReParser
set for
this
.
public void escape(java.lang.StringBuilder out, java.lang.CharSequence in, int startAt)
is only a pass-through to
getReParser().escape(...)
.
public java.lang.String specialChars()
is a shortcut for .getReParser().specialChars()
public void setMemoryForSpeedTradeFactor(float f)
when creating DFA transitions during NFA to DFA compilation, the transition tables use different implementations, depending on how dense the transition table is. The fastest transition table uses an array that is indexed by the transition character. For sparse tables, this is a significant memory overhead, therefore a denser table can be used which uses a binary search for lookup.
The factor given defines how much larger the fast table may be, before the denser table is used. If the factor is 1.0, the implementation with the smaller estimated memory footprint is used. The larger the factor is, the more likely is it that even sparse tables still use the faster implementation. With a factor of several thousand, nearly all transition tables will be fast, possibly using up a lot of heap space.
The default is 1.0.public double getMemoryForSpeedTradeFactor()
public ReParser getReParser()
returns the regular expression parser currently used by this
Nfa
. If none was explicitely set, getDefaultParserFactory()
is used to set the parser, before
it is returned.
public Nfa setReParser(ReParser reParser)
sets the parser to be used by this Nfa
.
this
public Nfa addAction(FaAction a)
adds an action to the Nfa (expert only). This is done by adding an epsilon transition of the last state of this Nfa to a new last state which will carry the action.
If the automaton has free subgraphs, these will be bound to this action.
Note:If the automaton has already one or more actions (i.e. stop states), this operation may easily make the automaton uncompilable due to conflicting actions.
public void toDot(java.io.PrintStream out)
public void toDot(java.lang.String filename)
public boolean markAsSub()
marks this automaton as a reporting subautomaton. It will of
course only become a subautomaton, if further operators
are applied, or if this automaton is combined with another
Nfa
.
Marking subgraphs for reporting matching substrings is
expensive, therefore the number of subgraphs per action is
limited to a single byte. This method returns null
if 256 subgraph markings are exceeded before an action is added
to this Nfa
. In this case, the subgraph marking
requested was not performed. The Nfa
is not touched
in any way in this case.
this
if the subgraph marking could be
performed or null
if the marking could not be
performed, because the number of possible subgraphs was exceeded.public Nfa optional()
transforms this finite automaton to match the empty string
too. All actions defining stop states are kept as they are.
This method implements the operator ?
.
this
.public Nfa star()
applies the Kleene closure (*
) operator to the NFA.
public Nfa plus()
applies the +
operator to the NFA.
public Nfa completeToSkip(FaAction noMatchAction) throws CompileDfaException
extends the automaton with an additonal automaton which matches everything but all non-empty prefixes of this automaton.
For example, if this automaton matches "abc", the addional automaton matches every string that does not contain "a", "ab" or "abc". The additional automaton will have the given action attached.
The resulting, combined automaton will nearly never hit a no-match situation when used as a filter. The only exception is when the input ends in a true prefix of the original automaton. Informally, the operation performed is as follows:
this.or(this.copy().allPrefixes().not())
noMatchAction
- the action to call when the additional automaton
runs into a match, i.e. when the this
does not match.CompileDfaException
- if this
automaton does not compile.public void allPrefixes() throws CompileDfaException
CompileDfaException
- if the internally called compile operation
throws this exceptionpublic Nfa invert() throws CompileDfaException
inverts the automaton, such that the resulting automaton will match the set-complement of the set of strings matched by the given automaton.
Note: Except for rare circumstances you rather want to
use not()
instead. The behaviour of this method, while
theoretically sound, does not implement the request match
everthing but X for some regular expression X.
If the automaton has no actions assigned yet, its natural stop state defines the set of strings matched. If it has already actions assigned, those add to the set of recognized strings and no difference is made between them. The resulting automaton will not have any actions.
this
CompileDfaException
- if the automaton can not be
compiled. Compilation is a necessary step to invert the automaton.public Nfa not() throws CompileDfaException
implements the request match everthing but X. While
invert()
implements the more theoretical set complement
on the set of recognized strings of an automaton, this method
implements what most people intuitively expect when asking to
match anything but the strings recognized by the given
automaton.
As an example consider the regular expression
re="[A-Z]+"
which matches all strings
which are completely uppercase.After application of
invert()
, the result, namely
"([A-Z]+)~"
, will match all strings
which are not completely uppercase.
In particular "xHallo"
or "Hallo"
will
be matched.
To the contrary, "([A-Z]+)^
,
i.e. not()
applied to re
, will not
match them. In fact "(re)^"
is a shortcut for
"((.*re.*)?)~"
, which matches neither the
empty string nor any string containing re
.
The following is an example application of
not()
. It shows a way to enclose words
separated by white space with angle brackets:
Nfa n = new Nfa("([ \t\n\r]+)^", new AbstractFaAction.Printf("[%0]"))
The caret ("^"
) is the operator applying
not()
. The resulting automaton will call the
callback for every longest prefix of the input string which does
not contain a white space character. The callback will then
enclose the matching text in brackets.
CompileDfaException
public Nfa shortest() throws CompileDfaException
If this
has not stop state yet, the operation
is actually not well defined. Therefore and to avoid
counterintuitive results, the last state of the Nfa is treated
like a stop state, even if it is none.
CompileDfaException
public Nfa copy()
public Nfa seq(java.lang.CharSequence regex, FaAction a) throws ReSyntaxException
extend the current automaton to recognize regex
as a suffix of the strings already recognized and arrange for the
given action to be called if the suffix was recognized. If the
action is null
, no action will be added.
ReSyntaxException
public Nfa seq(java.lang.CharSequence s) throws ReSyntaxException
calls seq(CharSequence,FaAction)
with
null
as the 2nd parameter.
ReSyntaxException
public Nfa seq(Nfa other)
Concatenates other
to this
such that a
sequence of characters vw is matched where v is
matched by this
and w is matched by
other
. The other
Nfa
is
initialized to the same state as if just constructed with Nfa()
.
FIX ME: unassigned reporting subexpressions in other may create a mess
public Nfa or(java.lang.CharSequence regex, FaAction action) throws ReSyntaxException
adds the given regular expression re
to the
automaton and associates it with the given action.
ReSyntaxException
public Nfa or(java.lang.CharSequence re) throws ReSyntaxException
adds the given regular expression to the automaton (expert
only). To make it effective, addAction(monq.jfa.FaAction)
should be called
before compilation into a Dfa
.
ReSyntaxException
public Nfa or(Nfa other)
joins the other Nfa
into this automaton while
keeping all stop states and assigned actions. The
other
Nfa
is initialized to the same
state as if just created constructed with Nfa()
.
FIX ME: unassigned reporting subexpressions in other may create a mess
public Dfa compile(DfaRun.FailedMatchBehaviour fmb, FaAction eofAction) throws CompileDfaException
compiles this non-deterministic finite automaton into a deterministic finite automaton. The given automaton is not changed.
Note: under most circumstances it is not a good idea to
compile an Nfa
which has no actions added,
i.e. which has no stop state, because this automaton matches
nothing, not even the empty string. Nevertheless, it is not
forbidden to compile such an automaton into a
Dfa
. It can even be used within a
DfaRun
but will never produce any output.
fmb
- describes the initial behaviour of a DfaRun
which operates the Dfa
created hereeofAction
- describes the action to run at the end of input
when the resulting Dfa is used with a DfaRun
.CompileDfaException
- if stop states with differing actions
are found that can not be merged by their mergeWith()
methods. To get rid of the
exception, either change the regular expressions such that they
do not have matches in common, or make sure the actions involved
can be merged. One way of merging is to use extend AbstractFaAction
and prioritize the assigned actions.public Dfa compile(DfaRun.FailedMatchBehaviour fmb) throws CompileDfaException
compiles this
into a Dfa
with the given
behaviour for non-matching input.
CompileDfaException
compile(DfaRun.FailedMatchBehaviour,FaAction)