public class ByteCharSource extends EmptyCharSource
implements a CharSource which reads bytes and converts them to characters of a given character set while keeping track of the byte positions of converted characters within the input stream. Because many character encodings use a variable number of coding bytes for one character, counting characters is not sufficient to keep track of the exact position of a character or string in a byte-oriented input stream. Objects of this class can be used as an input source to a DfaRun in cases where the exact byte positions of matches are needed. Byte positions of characters delivered are kept in a sliding window and can be retrieved with position().
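To illustrate why counting characters is not enough, the following minimal sketch uses only the JDK to compute the byte offset at which each character of a UTF-8 string starts. The class name ByteOffsetDemo and the method byteOffsets are hypothetical and are not part of this library.

```java
import java.nio.charset.StandardCharsets;
import java.util.Arrays;

// Hypothetical helper, not part of this library: maps character index to byte offset.
final class ByteOffsetDemo {

    /** Returns, for each code point of text, the offset of its first byte in UTF-8. */
    static long[] byteOffsets(String text) {
        int[] codePoints = text.codePoints().toArray();
        long[] offsets = new long[codePoints.length];
        long pos = 0;
        for (int i = 0; i < codePoints.length; i++) {
            offsets[i] = pos;
            // Encode the single code point to learn how many bytes it occupies.
            String single = new String(Character.toChars(codePoints[i]));
            pos += single.getBytes(StandardCharsets.UTF_8).length;
        }
        return offsets;
    }

    public static void main(String[] args) {
        // 'a' takes 1 byte, U+00E9 takes 2, U+20AC takes 3, 'b' takes 1.
        long[] offsets = byteOffsets("a\u00e9\u20acb");
        System.out.println(Arrays.toString(offsets)); // prints [0, 1, 3, 6]
    }
}
```

The fourth character has character index 3 but starts at byte 6, which is the kind of difference the sliding window of byte positions accounts for.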
One particular problem is posed by the pushBack() method required by the CharSource interface. Its contract does not require the pushed back characters to be the same as those previously read. This class, however, does require it. Since Dfa.match() obeys this rule, a ByteCharSource can safely be used as the source for a DfaRun as long as the callbacks employed don't violate the rule.
Constructor and Description |
---|
ByteCharSource(java.io.InputStream in): creates the same setup as ByteCharSource(ReadableByteChannel). |
ByteCharSource(java.io.RandomAccessFile in): creates the same setup as ByteCharSource(ReadableByteChannel). |
ByteCharSource(java.nio.channels.ReadableByteChannel source): creates a ByteCharSource to read from the given channel. |
ByteCharSource(java.lang.String filename): creates the same setup as ByteCharSource(ReadableByteChannel). |
Modifier and Type | Method and Description |
---|---|
void | close(): closes the underlying ReadableByteChannel. |
long | position(int charNo): returns the byte position counted from the start of the input stream for the character referenced by charNo. |
void | pushBack(java.lang.StringBuilder buf, int start): while satisfying CharSource.pushBack(java.lang.StringBuilder, int), this method also adjusts the internal reference to the byte position of the most recently delivered character. |
int | read(): returns a single character or -1 to indicate end of file. |
ByteCharSource | setDecoder(java.nio.charset.CharsetDecoder dec): sets the decoder to be used to decode the incoming bytes into char. |
ByteCharSource | setInputBufferSize(int size): sets the size of the input buffer. |
ByteCharSource | setSource(java.nio.channels.ReadableByteChannel source): sets the source to be read by this. |
ByteCharSource | setWindowSize(int size): sets the size of the sliding window which keeps byte positions for characters recently delivered by read(). |
Methods inherited from class EmptyCharSource: pop, pop
public ByteCharSource(java.nio.channels.ReadableByteChannel source)
creates a ByteCharSource to read from the given channel. The decoder will be taken from the system property "file.encoding" or default to "UTF-8" (see setDecoder()). The input buffer size is set to 4096 (see setInputBufferSize()) and the window keeping track of file positions of characters will have a size of 1000 (see setWindowSize()).
public ByteCharSource(java.io.InputStream in)
creates the same setup as ByteCharSource(ReadableByteChannel).
public ByteCharSource(java.io.RandomAccessFile in)
creates the same setup as ByteCharSource(ReadableByteChannel).
public ByteCharSource(java.lang.String filename) throws java.io.FileNotFoundException
creates the same setup as ByteCharSource(ReadableByteChannel).
Throws: java.io.FileNotFoundException
public ByteCharSource setSource(java.nio.channels.ReadableByteChannel source)
sets the source to be read by this.
Hint: A ReadableByteChannel can be created for any InputStream with java.nio.channels.Channels.newChannel().
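The hint can be sketched with JDK classes alone. The example below wraps an in-memory InputStream in a ReadableByteChannel and drains it. The class ChannelFromStream and its method countBytes are hypothetical names chosen for illustration; a channel obtained this way would be the kind of object passed to setSource().

```java
import java.io.ByteArrayInputStream;
import java.io.IOException;
import java.io.InputStream;
import java.io.UncheckedIOException;
import java.nio.ByteBuffer;
import java.nio.channels.Channels;
import java.nio.channels.ReadableByteChannel;
import java.nio.charset.StandardCharsets;

// Hypothetical demo class, not part of this library.
final class ChannelFromStream {

    /** Wraps the stream in a channel and counts the bytes it delivers. */
    static int countBytes(InputStream in) {
        try (ReadableByteChannel channel = Channels.newChannel(in)) {
            ByteBuffer buffer = ByteBuffer.allocate(16);
            int total = 0;
            int n;
            while ((n = channel.read(buffer)) != -1) {
                total += n;
                buffer.clear(); // make room for the next read
            }
            return total;
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        byte[] data = "h\u00e9llo".getBytes(StandardCharsets.UTF_8);
        // Five characters, but U+00E9 needs two bytes in UTF-8.
        System.out.println(countBytes(new ByteArrayInputStream(data))); // prints 6
    }
}
```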
Returns: this, to make it easy to call more configuration functions right away.
public ByteCharSource setWindowSize(int size)
sets the size of the sliding window which keeps byte positions for characters recently delivered by read().
Returns: this, to make it easy to call more configuration functions right away.
public ByteCharSource setInputBufferSize(int size)
sets the size of the input buffer. Requests for very small sizes (1 or 2 bytes) are adjusted upwards to make sure the buffer can hold at least the bytes encoding one character of the character set being decoded.
Returns: this, to make it easy to call more configuration functions right away.
public ByteCharSource setDecoder(java.nio.charset.CharsetDecoder dec)
sets the decoder to be used to decode the incoming bytes into char.
Hint: For the name of a supported character set, a decoder can be obtained with Charset.forName(chsetName).newDecoder().
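As a hedged illustration of the hint, the sketch below obtains a CharsetDecoder from a character set name using only JDK calls. Note that Charset.forName() returns a Charset, from which the decoder is created with newDecoder(), and that it throws an UnsupportedCharsetException for names the JVM does not know. The class DecoderDemo and its methods are hypothetical, and replacing malformed input is an assumption made for this example, not documented behavior of this class.

```java
import java.nio.ByteBuffer;
import java.nio.charset.CharacterCodingException;
import java.nio.charset.Charset;
import java.nio.charset.CharsetDecoder;
import java.nio.charset.CodingErrorAction;

// Hypothetical demo class, not part of this library.
final class DecoderDemo {

    /** Creates a decoder for the named character set. */
    static CharsetDecoder decoderFor(String charsetName) {
        return Charset.forName(charsetName)
                .newDecoder()
                // Assumption for this sketch: substitute bad input instead of failing.
                .onMalformedInput(CodingErrorAction.REPLACE)
                .onUnmappableCharacter(CodingErrorAction.REPLACE);
    }

    /** Decodes a complete byte array with a fresh decoder. */
    static String decode(byte[] bytes, String charsetName) {
        try {
            return decoderFor(charsetName).decode(ByteBuffer.wrap(bytes)).toString();
        } catch (CharacterCodingException e) {
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        byte[] utf8 = {(byte) 0xC3, (byte) 0xA9};   // U+00E9 in UTF-8
        byte[] latin1 = {(byte) 0xE9};              // U+00E9 in ISO-8859-1
        System.out.println(decode(utf8, "UTF-8").equals(decode(latin1, "ISO-8859-1"))); // prints true
    }
}
```

A decoder created like this is what would be handed to setDecoder().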
Returns: this, to make it easy to call more configuration functions right away.
public void close() throws java.io.IOException
closes the underlying ReadableByteChannel. Normally you want to call this if you passed in a file name to the constructor.
Throws: java.io.IOException
public void pushBack(java.lang.StringBuilder buf, int start)
while satisfying CharSource.pushBack(java.lang.StringBuilder, int), this method also adjusts the internal reference to the byte position of the most recently delivered character. To do so, this method only looks at the number of characters pushed back, thereby assuming that they are the very same characters previously delivered by read().
Specified by: pushBack in interface CharSource
Overrides: pushBack in class EmptyCharSource
public long position(int charNo) throws UnavailablePositionException
returns the byte position counted from the start of the input stream for the character referenced by charNo. Parameter charNo is interpreted relative to the first character not yet delivered. In particular, the character most recently delivered by read() has charNo==-1. In general, to find the start of a recently read string of N characters, call position(-N). Calling position(0) is also always possible and denotes the byte position of the next character to be delivered.
Because only a sliding window of character positions is kept, this method may fail with an UnavailablePositionException when charNo points outside this window. If the window size is N and pushBack() is never called, the safe range for charNo is -N+1 to 0. If, however, m characters are pushed back, this must be adjusted to -N+1+m to +m. Subsequent reads move forward again in the window, neutralizing the effect of the pushback after m characters have been read.
Throws: UnavailablePositionException
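The indexing rules for charNo can be modelled with a small ring buffer. The following is only a sketch of the described behavior, not the implementation used by this class: PositionWindowSketch and its methods are hypothetical, and IndexOutOfBoundsException stands in for UnavailablePositionException. Like the pushBack() described above, it counts pushed back characters and assumes they are the same ones delivered before.

```java
// Hypothetical model of the sliding window, not the library's implementation.
final class PositionWindowSketch {
    private final long[] window; // ring buffer of byte positions, indexed by character index
    private long furthest = 0;   // highest character index whose byte position is known
    private long cursor = 0;     // character index of the next character to deliver

    PositionWindowSketch(int size) {
        if (size < 1) throw new IllegalArgumentException("size must be positive");
        window = new long[size]; // slot 0 holds 0: the first character starts at byte 0
    }

    /** Records delivery of one character that was encoded in byteLen bytes. */
    void delivered(int byteLen) {
        if (cursor == furthest) {
            // A fresh character: the following one starts byteLen bytes later.
            long next = window[slot(furthest)] + byteLen;
            furthest++;
            window[slot(furthest)] = next;
        }
        // Otherwise a pushed back character is delivered again; its position is known.
        cursor++;
    }

    /** Moves the cursor back by count characters, looking only at the count. */
    void pushBack(int count) {
        if (count < 0 || !inWindow(cursor - count)) {
            throw new IllegalArgumentException("cannot push back " + count + " characters");
        }
        cursor -= count;
    }

    /** Byte position of the character at charNo, relative to the next character to deliver. */
    long position(int charNo) {
        long index = cursor + charNo;
        if (!inWindow(index)) {
            throw new IndexOutOfBoundsException("charNo " + charNo + " is outside the window");
        }
        return window[slot(index)];
    }

    private boolean inWindow(long index) {
        return index >= 0 && index <= furthest && index > furthest - window.length;
    }

    private int slot(long index) {
        return (int) (index % window.length);
    }

    public static void main(String[] args) {
        PositionWindowSketch w = new PositionWindowSketch(4);
        for (int len : new int[] {1, 2, 3, 1}) w.delivered(len); // byte lengths of four characters
        System.out.println(w.position(0));  // prints 7, the next character to deliver
        System.out.println(w.position(-1)); // prints 6, the most recently delivered character
        w.pushBack(2);
        System.out.println(w.position(0));  // prints 3, the first pushed back character
        System.out.println(w.position(2));  // prints 7, reachable because m = 2
    }
}
```

With a window size of 4 and no pushback, charNo values from -3 to 0 succeed in this model and -4 fails; after pushing back 2 characters the usable range shifts to -1 through 2.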
public int read() throws java.io.IOException
returns a single character or -1 to indicate end of file.
Specified by: read in interface CharSource
Overrides: read in class EmptyCharSource
Throws: java.io.IOException