public class DfaRun extends EmptyCharSource implements java.io.Serializable
A DfaRun is used to apply a Dfa to a stream of characters. After creation of a DfaRun object, invoke one of its read() or filter() methods to filter the input data according to the patterns encoded in the Dfa and the FaAction callback objects attached to them.
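As a rough mental model of how the automaton, the callbacks and the run object divide the work, the following sketch imitates the idea with java.util.regex instead of a compiled Dfa. The names Action and MiniRun are hypothetical. The longest-match rule and the one-character copying of unmatched input are simplifying assumptions, and this is not how monq implements DfaRun.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical stand-in for FaAction: edits the match, which starts at 'start' in 'out'.
interface Action {
    void invoke(StringBuilder out, int start);
}

// Hypothetical miniature of the DfaRun idea, built on java.util.regex instead of a Dfa.
final class MiniRun {
    private final List<Pattern> patterns = new ArrayList<>();
    private final List<Action> actions = new ArrayList<>();

    MiniRun add(String regex, Action action) {
        patterns.add(Pattern.compile(regex));
        actions.add(action);
        return this;
    }

    // Copies unmatched characters unchanged; for each longest non-empty match,
    // appends the match to 'out' and lets the attached action rewrite it in place.
    String filter(String in) {
        StringBuilder out = new StringBuilder();
        int pos = 0;
        while (pos < in.length()) {
            int bestEnd = pos;
            int bestRule = -1;
            for (int i = 0; i < patterns.size(); i++) {
                Matcher m = patterns.get(i).matcher(in).region(pos, in.length());
                if (m.lookingAt() && m.end() > bestEnd) {
                    bestEnd = m.end();
                    bestRule = i;
                }
            }
            if (bestRule < 0) {
                out.append(in.charAt(pos++));  // unmatched: copy one character
                continue;
            }
            int start = out.length();
            out.append(in, pos, bestEnd);               // ship the match ...
            actions.get(bestRule).invoke(out, start);   // ... and let its callback edit it
            pos = bestEnd;
        }
        return out.toString();
    }
}
```

The callback receives the output buffer plus the start of its match, so it may rewrite, extend or delete the matched text without knowing anything about the surrounding output.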
The default behaviour of the machine on non-matching input is initialized from whatever was specified when the Dfa was compiled. Initialization happens in the constructor as well as every time setIn() or setDfa() is called. The method setOnFailedMatch() should normally only be used in FaAction callbacks.
Use field clientData to store data that different action callbacks need to communicate. Don't let your action callbacks communicate via a common object allocated alongside the Dfa, because this prevents sharing the Dfa between threads.
Set field collect to true in an action callback to prevent the read() methods from returning. This keeps data already filtered from being shipped, so that further action callbacks can still change it. Eventually, however, an action callback should set collect to false again to allow the read() methods to finally ship the filtered data.
A Dfa that matches the empty string should not be used in a DfaRun, because this is usually a bug in the regular expressions used. As soon as only the empty string matches, methods like filter() enter an infinite loop because they keep matching without reading input. Use Dfa.matchesEmpty() if you are unsure whether your Dfa is safe to use.
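The looping hazard is easiest to see in a scanner that advances by the length of each match. The sketch below uses java.util.regex as a stand-in for the Dfa and treats p.matcher("").matches() as a hypothetical analogue of Dfa.matchesEmpty(). The class EmptyMatchGuard is illustrative only.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

final class EmptyMatchGuard {
    // Regex analogue of Dfa.matchesEmpty(): does the pattern accept the empty string?
    static boolean matchesEmpty(Pattern p) {
        return p.matcher("").matches();
    }

    // Counts matches while advancing by the match length. With a pattern that
    // accepts "", 'pos' could stop advancing, so such patterns are rejected first.
    static int countMatches(Pattern p, String in) {
        if (matchesEmpty(p)) {
            throw new IllegalArgumentException("pattern matches the empty string");
        }
        Matcher m = p.matcher(in);
        int pos = 0;
        int count = 0;
        while (pos < in.length()) {
            m.region(pos, in.length());
            // The end > pos check also guards against zero-width constructs
            // such as \b, which java.util.regex supports.
            if (m.lookingAt() && m.end() > pos) {
                count++;
                pos = m.end();
            } else {
                pos++;  // skip one unmatched character
            }
        }
        return count;
    }
}
```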
It is safe to change the Dfa with setDfa() at any time within an action callback. This is particularly useful to parse different parts of input with different automata.
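A sketch of switching automata from inside a callback, with a java.util.regex rule list standing in for the Dfa and a hypothetical setRules() standing in for setDfa(). All class names are invented for the illustration, and the first-matching-rule policy is a simplification.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Locale;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical callback type; it receives the run so that it can swap the rule set.
interface SwitchAction {
    void invoke(StringBuilder out, int start, SwitchingRun run);
}

// Hypothetical rule: a pattern plus the callback attached to it.
final class Rule {
    final Pattern pattern;
    final SwitchAction action;

    Rule(String regex, SwitchAction action) {
        this.pattern = Pattern.compile(regex);
        this.action = action;
    }
}

final class SwitchingRun {
    private List<Rule> rules;  // plays the role of the Dfa currently operated

    SwitchingRun(List<Rule> initial) { this.rules = initial; }

    void setRules(List<Rule> rules) { this.rules = rules; }  // analogue of setDfa()

    String filter(String in) {
        StringBuilder out = new StringBuilder();
        int pos = 0;
        while (pos < in.length()) {
            Rule hit = null;
            int end = pos;
            for (Rule r : rules) {  // simplification: first matching rule wins
                Matcher m = r.pattern.matcher(in).region(pos, in.length());
                if (m.lookingAt() && m.end() > pos) {
                    hit = r;
                    end = m.end();
                    break;
                }
            }
            if (hit == null) {
                out.append(in.charAt(pos++));  // unmatched: copy
                continue;
            }
            int start = out.length();
            out.append(in, pos, end);
            pos = end;
            hit.action.invoke(out, start, this);  // may switch the rule set
        }
        return out.toString();
    }
}

// Two rule sets: outside <b>...</b> only "<b>" is recognized; inside, words are upper-cased.
final class Modes {
    static final List<Rule> OUTSIDE = new ArrayList<>();
    static final List<Rule> INSIDE = new ArrayList<>();

    static {
        OUTSIDE.add(new Rule("<b>", (out, start, run) -> run.setRules(INSIDE)));
        INSIDE.add(new Rule("</b>", (out, start, run) -> run.setRules(OUTSIDE)));
        INSIDE.add(new Rule("[a-z]+", (out, start, run) ->
                out.replace(start, out.length(), out.substring(start).toUpperCase(Locale.ROOT))));
    }
}
```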
Note: This class is not synchronized. Objects of this class should only be used within one thread at a time. However, the Dfa operated may be shared between threads, given that the FaAction callbacks in the Dfa contain no internal state. For the callbacks to communicate, use clientData.
Hint: For maximum speed, try to complete your set of regular expressions such that every piece of input is matched. Don't rely on DfaRun's feature to handle unmatched input. Handling unmatched input is less efficient than handling matches.
Modifier and Type | Class and Description |
---|---|
static class | DfaRun.FailedMatchBehaviour: defines typed enumerated values which describe what a DfaRun shall do in its read() and filter() functions if no match can be found. |
Modifier and Type | Field and Description |
---|---|
java.lang.Object | clientData: Room for an arbitrary piece of data. |
boolean | collect: set this field to true from an FaAction callback to prevent the machinery from shipping the filtered data. |
static java.lang.String | EEPSMATCHER: is the error text used in an IllegalArgumentException if a DfaRun shall be created with a Dfa that matches the empty string. |
static FaAction | EOF: returned by next() on EOF. |
int | maxCopy: defines the maximum number of unmatched characters handled in one chunk when the machinery is operating in UNMATCHED_COPY mode. |
static DfaRun.FailedMatchBehaviour | UNMATCHED_COPY: requests the DfaRun object to copy input not matched by the DFA to the output. |
static DfaRun.FailedMatchBehaviour | UNMATCHED_DROP: requests the DfaRun object to drop (delete) input not matched by the DFA. |
static DfaRun.FailedMatchBehaviour | UNMATCHED_THROW: requests the DfaRun to throw an exception if it encounters input not matched by the DFA. |
Constructor and Description |
---|
DfaRun(Dfa dfa): creates a DfaRun with empty initial input. |
DfaRun(Dfa dfa, CharSource in): creates a DfaRun object to operate the given Dfa. |
Modifier and Type | Method and Description |
---|---|
void | filter(): runs the machine until EOF is hit. |
void | filter(java.io.PrintStream out): reads and filters input, copying it to the output until EOF is hit. |
java.lang.String | filter(java.lang.String sin): reads and filters the given input and returns the filtered result. |
void | filter(java.lang.StringBuilder out): reads and filters input, copying it to the output until EOF is hit. |
Dfa | getDfa(): returns the Dfa operated by this. |
DfaRun.FailedMatchBehaviour | getFailedMatchBehaviour(): returns the currently active behaviour for unmatched input. |
CharSource | getIn(): returns the currently active input source. |
int | matchStart(): is a helper function which should only be called immediately after calling next() or read(StringBuilder) to get the position where the match starts. |
FaAction | next(java.lang.StringBuilder out): finds the next match in the current input, appends it to out and returns the FaAction associated with the match. |
int | read(): reads and filters input until at least one character is available or EOF is hit. |
boolean | read(java.lang.StringBuilder out): delivers filtered data in naturally occurring chunks by appending to out. |
boolean | read(java.lang.StringBuilder out, int count): reads and filters input until out is grown by count characters. |
void | setDfa(Dfa dfa): changes the Dfa to run. |
void | setIn(CharSource in): changes the input source. |
void | setOnFailedMatch(DfaRun.FailedMatchBehaviour b): changes how unmatched input is handled. |
int | skip(): reads one character immediately from the input source and returns it without filtering. |
TextStore | submatches(java.lang.StringBuilder txt, int start): may be called by a callback to retrieve submatches. |
void | unskip(java.lang.String s): shoves back characters into the input of the DfaRun. |
void | unskip(java.lang.StringBuilder s, int startAt): shoves back characters into the input of the DfaRun while deleting them from the given StringBuilder. |
void | unskip(TextStore ts, int start): shoves back characters into the input of the DfaRun. |
Methods inherited from class EmptyCharSource: pop, pop, pushBack
public static final DfaRun.FailedMatchBehaviour UNMATCHED_COPY
requests the DfaRun object to copy input not matched by the DFA to the output.
public static final DfaRun.FailedMatchBehaviour UNMATCHED_DROP
requests the DfaRun object to drop (delete) input not matched by the DFA.
public static final DfaRun.FailedMatchBehaviour UNMATCHED_THROW
requests the DfaRun to throw an exception if it encounters input not matched by the DFA.
public static final java.lang.String EEPSMATCHER
is the error text used in an IllegalArgumentException if a DfaRun shall be created with a Dfa that matches the empty string.
public boolean collect
set this field to true from an FaAction callback to prevent the machinery from shipping the filtered data. This allows action callbacks invoked later to be sure that their first argument still contains previously filtered data. Make sure this field is set to false by some other action callback as soon as possible, because otherwise filtered data will pile up unnecessarily in memory.
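The following sketch models the collect protocol under simplifying assumptions: every character counts as one match, the two "callbacks" are hard-wired, and an unchecked exception replaces the java.io.EOFException that read(StringBuilder) is documented to throw. CollectingRun and its members are hypothetical.

```java
// Hypothetical miniature of read(StringBuilder) with a collect flag. The "callback"
// for "(" starts holding back output; the one for ")" deletes everything held back
// since "(" and then releases the data again.
final class CollectingRun {
    boolean collect;        // analogue of DfaRun.collect
    private final String in;
    private int pos;
    private int heldFrom;   // where the held-back text starts in 'out'

    CollectingRun(String in) { this.in = in; }

    // Appends one naturally occurring chunk, but keeps filtering while collect is true.
    // Returns false only when all input is processed and 'out' was not changed.
    boolean read(StringBuilder out) {
        if (pos >= in.length() && !collect) {
            return false;
        }
        do {
            if (pos >= in.length()) {
                // DfaRun throws java.io.EOFException here; unchecked keeps the sketch short.
                throw new IllegalStateException("EOF while collect == true");
            }
            step(out);
        } while (collect);
        return true;
    }

    // One match plus the application of its callback; every character is a "match" here.
    private void step(StringBuilder out) {
        char c = in.charAt(pos++);
        if (c == '(') {
            collect = true;           // hold back everything from here on
            heldFrom = out.length();
            out.append(c);
        } else if (c == ')') {
            out.setLength(heldFrom);  // callbacks may edit or delete held-back data
            collect = false;          // ... and must eventually release it
        } else {
            out.append(c);
        }
    }
}
```

Note that the call which processes "(bc)" appends nothing visible, matching the remark that the text delivered by read(StringBuilder) may be empty.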
public int maxCopy
defines the maximum number of unmatched characters handled in one chunk when the machinery is operating in UNMATCHED_COPY mode. When operating on a stretch of text that contains no match at all, the machine runs in a tight inner loop to find the next match as fast as possible. While doing so, no output is delivered by the filter() and read() methods, because they call next(), the method that runs the tight inner loop.
To guard against memory overflow on really long stretches of non-matching text, maxCopy puts an upper limit on the characters collected before next() forcibly returns, even if no match has been found yet. Except in very special cases there should be no need to ever change this value from its default of 8192. Any value ≤ 1 results in single-character delivery by next(). For the filter() methods, this seems to cause a performance penalty of 30% compared to sufficiently large values.
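To illustrate how maxCopy bounds the chunks that next() returns in UNMATCHED_COPY mode, here is a hypothetical miniature built on java.util.regex. The class CopyChunker and its Result enum are invented. Returning null for a full chunk of unmatched text mirrors the documented return value of next(), but nothing else here is taken from monq.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical miniature of next() in UNMATCHED_COPY mode with a maxCopy limit.
final class CopyChunker {
    enum Result { MATCH, EOF }

    private final Matcher matcher;
    private final int maxCopy;
    private final String in;
    private int pos;

    CopyChunker(Pattern pattern, int maxCopy, String in) {
        this.matcher = pattern.matcher(in);
        this.maxCopy = maxCopy;
        this.in = in;
    }

    // Returns MATCH if a match was appended (possibly after some unmatched text),
    // null if maxCopy unmatched characters were copied without finding a match,
    // and EOF once the input is exhausted (unmatched text before EOF is still copied).
    Result next(StringBuilder out) {
        int limit = Math.max(1, maxCopy);  // values <= 1 deliver one character at a time
        int copied = 0;
        while (pos < in.length()) {
            matcher.region(pos, in.length());
            if (matcher.lookingAt() && matcher.end() > pos) {
                out.append(in, pos, matcher.end());
                pos = matcher.end();
                return Result.MATCH;
            }
            if (copied == limit) {
                return null;  // hand back control before the chunk grows any further
            }
            out.append(in.charAt(pos++));
            copied++;
        }
        return Result.EOF;
    }
}
```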
public java.lang.Object clientData
Room for an arbitrary piece of data. If the callbacks of the Dfa want to communicate with each other, even if only to count instances in the input stream, this field should be used to store the data so that the Dfa itself is kept thread safe. Storing, e.g., counts in the callback object itself would make the Dfa no longer thread safe.
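A sketch of the recommended pattern, assuming java.util.regex in place of a Dfa: the pattern and the callback are shared objects that keep no state of their own, while the counter lives in each run's clientData. CountingRun, SharedRules and CountingAction are hypothetical names.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical callback type mirroring invoke(out, start, run).
interface CountingAction {
    void invoke(StringBuilder out, int start, CountingRun run);
}

// Shared "automaton": one pattern plus a callback that keeps no fields,
// so the same instances may serve any number of runs (and threads).
final class SharedRules {
    static final Pattern WORD = Pattern.compile("[A-Za-z]+");
    static final CountingAction COUNT_WORD = (out, start, run) -> {
        int[] counter = (int[]) run.clientData;  // per-run state lives in clientData
        counter[0]++;
    };
}

// Hypothetical per-thread run object; only this object carries mutable state.
final class CountingRun {
    Object clientData;  // analogue of DfaRun.clientData
    private final String in;

    CountingRun(String in) { this.in = in; }

    String filter() {
        StringBuilder out = new StringBuilder();
        Matcher m = SharedRules.WORD.matcher(in);
        int last = 0;
        while (m.find()) {
            out.append(in, last, m.start());  // copy unmatched text
            int start = out.length();
            out.append(m.group());
            SharedRules.COUNT_WORD.invoke(out, start, this);
            last = m.end();
        }
        out.append(in, last, in.length());
        return out.toString();
    }
}
```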
public DfaRun(Dfa dfa, CharSource in)
creates a DfaRun object to operate the given Dfa. The behaviour on unmatched input and on EOF is initialized from the Dfa.
Because in nearly all cases it is a mistake to run a Dfa that matches the empty string, such a Dfa is not allowed: the constructor throws an IllegalArgumentException. In the rare case that a Dfa matching the empty string must be run, you have to first create a DfaRun with a proper Dfa and then replace it with setDfa(monq.jfa.Dfa). It is a hassle, but this is intended.
Parameters:
dfa - is the automaton to operate initially. Callbacks may change it.
in - is the initial input source.
Throws:
java.lang.IllegalArgumentException - if the given dfa matches the empty string, i.e. if Dfa.matchesEmpty() returns true.
See Also:
setOnFailedMatch(monq.jfa.DfaRun.FailedMatchBehaviour)
public DfaRun(Dfa dfa)
creates a DfaRun with empty initial input. This constructor calls the two-parameter constructor with an empty CharSource.
See Also:
DfaRun(Dfa,CharSource)
public void setIn(CharSource in)
changes the input source. Within a thread, this is permissible at all times because a DfaRun object does not buffer input data between calls to any of its methods.
Apart from (re)initializing the input source, this method initializes two other parameters. One of them is the behaviour on unmatched input, which is reset to the value found in the Dfa operated (see setOnFailedMatch()).
public CharSource getIn()
returns the currently active input source.
public void setDfa(Dfa dfa)
changes the Dfa to run. In addition, the way to handle unmatched input is (re)initialized from the given Dfa.
If the given Dfa matches the empty string, reading and filtering methods may enter an infinite loop. Either check with Dfa.matchesEmpty() or know what you are doing.
public void setOnFailedMatch(DfaRun.FailedMatchBehaviour b)
changes how unmatched input is handled. Any of the values UNMATCHED_COPY, UNMATCHED_DROP or UNMATCHED_THROW may be used. The behaviour is automatically (re)set by setIn() and by setDfa() to the value found in the Dfa operated.
The purpose of this method is rather to allow callbacks of the Dfa to change the handling of unmatched input temporarily.
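The three behaviours, and callbacks that change them temporarily, can be sketched as follows. OnFail, NoMatchError and ModeRun are hypothetical stand-ins for DfaRun.FailedMatchBehaviour, NomatchException and DfaRun, and java.util.regex replaces the Dfa. The two hard-wired callbacks switch to dropping inside <!-- ... --> and restore the configured behaviour afterwards.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical analogue of DfaRun.FailedMatchBehaviour.
enum OnFail { COPY, DROP, THROW }

// Hypothetical stand-in for NomatchException (unchecked here for brevity).
final class NoMatchError extends RuntimeException {
    NoMatchError(int pos) { super("no match at position " + pos); }
}

// "<!--" switches to DROP, "-->" restores the configured behaviour; both delete their match.
final class ModeRun {
    private static final Pattern OPEN = Pattern.compile("<!--");
    private static final Pattern CLOSE = Pattern.compile("-->");
    private final OnFail initial;  // what the Dfa was compiled with
    private OnFail onFail;

    ModeRun(OnFail initial) { this.initial = initial; }

    String filter(String in) {
        onFail = initial;  // reset on new input, as setIn() does
        StringBuilder out = new StringBuilder();
        Matcher open = OPEN.matcher(in);
        Matcher close = CLOSE.matcher(in);
        int pos = 0;
        while (pos < in.length()) {
            if (open.region(pos, in.length()).lookingAt()) {
                onFail = OnFail.DROP;  // callback: temporary change
                pos = open.end();
            } else if (close.region(pos, in.length()).lookingAt()) {
                onFail = initial;      // callback: restore
                pos = close.end();
            } else {
                switch (onFail) {
                    case COPY:
                        out.append(in.charAt(pos));
                        break;
                    case DROP:
                        break;
                    case THROW:
                        throw new NoMatchError(pos);
                }
                pos++;
            }
        }
        return out.toString();
    }
}
```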
public DfaRun.FailedMatchBehaviour getFailedMatchBehaviour()
returns the currently active behaviour for unmatched input.
public int matchStart()
is a helper function which should only be called immediately after calling next() or read(StringBuilder) to get the position where the match starts. This is only needed when the machine is in UNMATCHED_COPY mode, because otherwise the match will be the first thing appended to the StringBuilder given to next() or read().
Hint: When using this method together with read(StringBuilder), be aware that the callback handling the match is in principle allowed to delete characters even before the value returned here, rendering the returned value completely useless. Know your callbacks!
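In UNMATCHED_COPY mode the StringBuilder receives unmatched text first and the match last, so the match position has to be remembered separately. The hypothetical StartTracker below sketches this with java.util.regex. It is not monq's implementation and assumes the pattern never matches the empty string.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical miniature of next() in UNMATCHED_COPY mode plus matchStart().
// Assumes the pattern does not match the empty string.
final class StartTracker {
    private final Matcher matcher;
    private final String in;
    private int pos;
    private int matchStart = -1;

    StartTracker(Pattern pattern, String in) {
        this.matcher = pattern.matcher(in);
        this.in = in;
    }

    // Appends unmatched text followed by the next match; returns false (after
    // copying any trailing unmatched text) when no further match exists.
    boolean next(StringBuilder out) {
        matcher.region(pos, in.length());
        if (!matcher.find()) {
            out.append(in, pos, in.length());
            pos = in.length();
            matchStart = -1;
            return false;
        }
        out.append(in, pos, matcher.start());  // unmatched prefix, copied
        matchStart = out.length();             // the match begins here within 'out'
        out.append(in, matcher.start(), matcher.end());
        pos = matcher.end();
        return true;
    }

    // Only meaningful immediately after next() returned true.
    int matchStart() { return matchStart; }
}
```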
public int skip() throws java.io.IOException
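reads one character immediately from the input source and returns it without filtering. If filtered characters from a previous call to read() are still pending, these are not touched and will be used in the next call to one of the read() functions.
Throws:
java.io.IOException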
public void unskip(java.lang.StringBuilder s, int startAt)
shoves back characters into the input of the DfaRun while deleting them from the given StringBuilder. The characters will be the first to be read when the machine performs the next match, e.g. when read(java.lang.StringBuilder) is called.
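A minimal sketch of the pushback contract, assuming a plain in-memory input. PushbackSource is a hypothetical class, and its unskip() only mirrors the documented semantics of moving the tail of a StringBuilder back to the front of the input.

```java
// Hypothetical input source with pushback, illustrating unskip(StringBuilder, int).
final class PushbackSource {
    private final StringBuilder pending = new StringBuilder();  // pushed-back chars come first
    private final String in;
    private int pos;

    PushbackSource(String in) { this.in = in; }

    // Returns the next character, or -1 at EOF.
    int read() {
        if (pending.length() > 0) {
            char c = pending.charAt(0);
            pending.deleteCharAt(0);
            return c;
        }
        return pos < in.length() ? in.charAt(pos++) : -1;
    }

    // Moves s[startAt..] to the front of the input and deletes it from s.
    void unskip(StringBuilder s, int startAt) {
        pending.insert(0, s, startAt, s.length());
        s.setLength(startAt);
    }
}
```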
public void unskip(java.lang.String s)
shoves back characters into the input of the DfaRun.
Warning: Do not use this method in time-critical applications. It calls the other unskip method with a freshly created StringBuilder.
See Also:
unskip(StringBuilder, int)
public void unskip(TextStore ts, int start)
shoves back characters into the input of the DfaRun. This method simply applies TextStore.drain() to the input of this. Consequently, start may be negative to indicate a suffix of ts to be pushed back.
public TextStore submatches(java.lang.StringBuilder txt, int start)
may be called by a callback to retrieve submatches. Retrieving submatches must be done before the match is changed in any way. A typical call within an FaAction looks like
public void invoke(StringBuilder out, int start, DfaRun r) throws CallbackException { TextStore ts = r.submatches(out, start); ... }
Parameter txt is not changed in any way.
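For readers used to java.util.regex, capturing groups are the closest familiar analogue: group 0 is the whole match and the numbered groups follow, much like the parts of the returned TextStore. The helper below is a hypothetical sketch on that basis. It returns a List<String> rather than a TextStore and does not reflect how a Dfa tracks submatches.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

final class Submatches {
    // Analogue of submatches(txt, start) using capturing groups: part 0 is the
    // whole match starting at 'start', parts 1..n are the groups. 'txt' is not modified.
    static List<String> of(Pattern p, CharSequence txt, int start) {
        Matcher m = p.matcher(txt).region(start, txt.length());
        if (!m.lookingAt()) {
            throw new IllegalArgumentException("no match at position " + start);
        }
        List<String> parts = new ArrayList<>();
        for (int i = 0; i <= m.groupCount(); i++) {
            parts.add(m.group(i));  // null if the group did not take part in the match
        }
        return parts;
    }
}
```

As with the real method, the parts should be copied out before a callback edits the match.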
Parameters:
txt - must contain the full match starting at position start. It may contain more characters.
start - is the position where the full match starts within txt.
Returns:
a TextStore that contains the whole match as part 0 and submatches as subsequent parts. The return value is private to this and its contents may only be used locally in a callback. After returning from the callback, the contents of the result may soon change.
public FaAction next(java.lang.StringBuilder out) throws java.io.IOException
finds the next match in the current input, appends it to out and returns the FaAction associated with the match. Input is read until a match is found, maxCopy is reached or EOF is hit. Non-matching input is handled according to setOnFailedMatch(). In particular:
UNMATCHED_COPY: up to maxCopy non-matching characters may be appended to out in front of the match. If maxCopy is reached before the match, no matching text is returned, only the non-matching characters. In this case the return value is null, and should maxCopy be ≤ 1, then 1 character is always delivered. If a match is found before maxCopy is reached, the match is appended to out. To find out where the match actually starts, call matchStart().
UNMATCHED_DROP: non-matching input is dropped, and only the match is appended to out.
UNMATCHED_THROW: non-matching input causes a NomatchException to be thrown. No text will be appended to out, and the offending text will still be available in the CharSource serving as input to this.
Hint: Use this method if you are interested only in a simple tokenization of the input. The actions returned may serve as the token type. If, however, you want to apply the actions returned immediately to the match, rather use one of the read or filter methods. If you find yourself using if statements on the FaAction returned, you are definitely doing something wrong.
Returns:
the FaAction associated with the match found. On EOF, if the Dfa operated has an action set for EOF which is not null, this action is returned (see Nfa.compile()).
EOF, if the eofAction was already delivered or is null. The output may contain non-matching input that was found just before EOF.
null, if UNMATCHED_COPY is active and maxCopy non-matching characters were found before a match was encountered.
Throws:
java.io.IOException
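A sketch of tokenization with a next()-style method, where the returned value only names the token type. It assumes java.util.regex in place of a Dfa, uses a hypothetical Tok enum where DfaRun would return FaAction objects, and drops unmatched characters as in UNMATCHED_DROP mode.

```java
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical token types, playing the role of the FaAction objects returned by next().
enum Tok { NUMBER, WORD, EOF }

// Hypothetical miniature of next() in UNMATCHED_DROP mode.
final class Tokenizer {
    private final Map<Pattern, Tok> rules = new LinkedHashMap<>();
    private final String in;
    private int pos;

    Tokenizer(String in) {
        this.in = in;
        rules.put(Pattern.compile("[0-9]+"), Tok.NUMBER);
        rules.put(Pattern.compile("[A-Za-z]+"), Tok.WORD);
    }

    // Drops unmatched characters, appends the longest match to 'out' and
    // returns its token type; returns EOF once the input is exhausted.
    Tok next(StringBuilder out) {
        while (pos < in.length()) {
            Tok best = null;
            int bestEnd = pos;
            for (Map.Entry<Pattern, Tok> rule : rules.entrySet()) {
                Matcher m = rule.getKey().matcher(in).region(pos, in.length());
                if (m.lookingAt() && m.end() > bestEnd) {
                    best = rule.getValue();
                    bestEnd = m.end();
                }
            }
            if (best != null) {
                out.append(in, pos, bestEnd);
                pos = bestEnd;
                return best;
            }
            pos++;  // UNMATCHED_DROP: discard one unmatched character
        }
        return Tok.EOF;
    }
}
```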
public boolean read(java.lang.StringBuilder out) throws java.io.IOException
delivers filtered data in naturally occurring chunks by appending to out. As long as collect is false, the naturally occurring chunk is determined by one call to next() and the application of the returned callback. The data may be prefixed with filtered data not yet delivered by a previous call to read(StringBuilder,int). Because the callback may delete the matching text, the text appended may be empty.
If an FaAction.invoke() callback switches to collect==true, this function keeps filtering until collect is reset to false by another action callback. This allows the action callbacks to hold back data from being delivered in cases where several action callbacks cooperate in the decision about shipping the data. The action callbacks have access to all the filtered data held back and may treat it as needed. In particular, the data can be deleted before collect is switched back to false.
Hint: This method can be used to tokenize the input. If the machine is put into UNMATCHED_DROP mode, every call to this method will return exactly one match, treated by the action bound to it.
Returns:
true, if some input was read and filtered. It also means that this method should be called again, because there might be more input waiting to be processed. Only a return value of false means that all input is completely processed, and in that case out was not changed.
Throws:
java.io.EOFException - if EOF is hit while collect==true.
CallbackException - if a callback throws this exception.
java.io.IOException
public boolean read(java.lang.StringBuilder out, int count) throws java.io.IOException
reads and filters input until out is grown by count characters. Fewer characters are returned if all input was processed. The field collect is honoured in the same way as by read(StringBuilder).
Returns:
true, if at least one character can be delivered or if count==0. A return of false signals that all input was processed.
Throws:
java.io.IOException
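A minimal sketch of the count-limited read, showing why filtered data that does not fit into the requested count has to be kept for the next call. CountedReader is hypothetical, and its "filter" (every digit is doubled) is invented only so that the output length differs from the input length.

```java
// Hypothetical miniature of read(StringBuilder, int count). Each input character is
// one "match" whose "callback" doubles digits, so filtered text must be buffered.
final class CountedReader {
    private final StringBuilder pending = new StringBuilder();  // filtered, not yet delivered
    private final String in;
    private int pos;

    CountedReader(String in) { this.in = in; }

    private void step() {
        char c = in.charAt(pos++);
        pending.append(c);
        if (Character.isDigit(c)) {
            pending.append(c);
        }
    }

    // Grows 'out' by 'count' characters, or fewer if the input runs out.
    // Returns true if at least one character was delivered or count == 0.
    boolean read(StringBuilder out, int count) {
        if (count == 0) {
            return true;
        }
        while (pending.length() < count && pos < in.length()) {
            step();
        }
        int n = Math.min(count, pending.length());
        out.append(pending, 0, n);
        pending.delete(0, n);  // leftovers prefix the next call's output
        return n > 0;
    }
}
```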
public int read() throws java.io.IOException
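reads and filters input until at least one character is available or EOF is hit. The field collect is honoured in the same way as by read(StringBuilder).
Specified by:
read in interface CharSource
Overrides:
read in class EmptyCharSource
Returns:
the next filtered character as an int, or -1 to signal EOF.
Throws:
java.io.IOException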
public void filter(java.lang.StringBuilder out) throws java.io.IOException
reads and filters input, copying it to the output until EOF is hit.
Throws:
java.io.IOException
public void filter(java.io.PrintStream out) throws java.io.IOException
reads and filters input, copying it to the output until EOF is hit.
Throws:
java.io.IOException
public java.lang.String filter(java.lang.String sin) throws java.io.IOException
reads and filters the given input and returns the filtered result.
Throws:
java.io.IOException
public void filter() throws java.io.IOException
runs the machine until EOF is hit. This is useful when the callbacks don't produce output text but rather perform other work.
Note: This method sets up a StringBuilder into which filtered data is dumped. The buffer is regularly cleared, in particular after each match. To prevent this from happening, use collect as for the other filter methods.
Throws:
java.io.IOException