Skip navigation links
monq-2.0.0

Package monq.jfa

Implements Finite Automata, deterministic and nondeterministic, with an engine for high throughput filtering of text.

See: Description

Package monq.jfa Description

Implements Finite Automata, deterministic and nondeterministic, with an engine for high throughput filtering of text. There is a complete example program available the main parts of which are described below.

A typical use of this package involves the following steps.

  1. Creation of an Nfa comprising one or more regular expressions to which FaAction callbacks are attached. The Nfa defined below matches identifiers and positive integers. It has actions attached to tag them with <id> and <num> respectively.
    import monq.jfa.*;
    import monq.jfa.actions.*;
    Nfa nfa = 
      new Nfa("[A-Za-z_][A-Za-z0-9_]*", new Printf("<id>%0</id>"))
          .or("[0-9]+", new Printf("<num>%0</num>"));
  2. Compilation of the Nfa into a Dfa.
    Dfa dfa = nfa.compile(DfaRun.UNMATCHED_COPY); 
    A Dfa is a data structure that allows to match the regular expessions which where stuck into the Nfa. The running time depends only linearly on the length of the input string. The parameter given to compile() specifies how non-matching input shall be treated. In this case it is copied unchanged to the output.
  3. To operate the Dfa, it must be wrapped in a DfaRun which applies the Dfa to input data, e.g. the standard input stream.
    DfaRun run = new DfaRun(dfa, new ReaderCharSource(System.in));
  4. Finally, the DfaRun is asked to filter to System.out:
    run.filter(System.out);
    There are also methods to read() from a DfaRun.

Now you may want to compile and run the slightly more elaborate example.

Feature Highlights

Computational Model

The model of operation implemented by this package was inspired by LEX — Lexical Analyzer Generator described in 1975 by M.E.Lesk and E.Schmidt in the UNIX Programmer's Manual and its more recent GNU incarnation flex. This means in particular that the software operates as a filter which changes text as it passes through the machinery. Filtering, i.e. changing text, is initiated whenever a regular expression encoded in the machinery matches. The filter operation itself is carried out by callbacks invoked for every matching piece of text. Apart from changing the text, the callback may count occurences, record things in a database, or do whatever can be done with a piece of java code.

The interface of such a callback is FaAction. Normally you just need to extend AbstractFaAction and implement the invoke() method. Commonly used callbacks can be found in the package monq.jfa.actions.

While this implementation uses fast deterministic finite automata (DFA) for matching, it nevertheless implements the tracking of submatches. A description of the formal behaviour of the method used shows that it simply does not work in the general case, but it works quite well in many practical applications. Nevertheless care must be taken when using submatches, as sometimes results are a little surprising. For more information see the description of subexpressions as well as what to do if they don't work.

Skip navigation links
monq-2.0.0