EncodingDetector (monq packages)

java.lang.Object
- monq.stuff.EncodingDetector

```
public class EncodingDetector
extends java.lang.Object
```
provides static methods to guess the character encoding used in an InputStream supposedly containing XML or HTML.

Note: Only parts of the recommendation mentioned below are implemented.

Version:

$Revision: 1.3 $, $Date: 2005-07-08 13:01:26 $

Author:

© Harald Kirsch

See Also:

XML recommendation on guessing the encoding

Field Summary

Fields
| Modifier and Type | Field and Description |
|---|---|
| `static java.lang.String` | `defaultEnc`: the platform's default encoding, determined by opening an `InputStreamReader` on `System.in` and asking for its encoding. |

Method Summary

| Modifier and Type | Method and Description |
|---|---|
| `static java.lang.String` | `detect(java.io.InputStream in)`: tries to detect the character encoding used in the given `InputStream` within the first 1000 bytes. |
| `static java.lang.String` | `detect(java.io.InputStream in, int limit, java.lang.String deflt)`: reads up to `limit` bytes from the given input stream to determine the character encoding used. |

Methods inherited from class java.lang.Object
equals, getClass, hashCode, notify, notifyAll, toString, wait, wait, wait

- Field Detail
  - defaultEnc
```
public static final java.lang.String defaultEnc
```
    the platform's default encoding determined by opening an InputStreamReader on System.in and asking for its encoding.
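
    The lookup described for defaultEnc can be reproduced with standard JDK calls alone. The sketch below is an illustration, not monq's source; the class DefaultEncodingDemo and the method platformDefaultEncoding are hypothetical names. Note that InputStreamReader.getEncoding() returns the historical name of the encoding (for example "UTF8"), which may differ from the canonical name reported by Charset.defaultCharset().

```java
import java.io.InputStreamReader;
import java.nio.charset.Charset;

public class DefaultEncodingDemo {
    // Hypothetical helper mirroring the documented idea: open a reader on
    // System.in and ask it which encoding it uses.
    static String platformDefaultEncoding() {
        // Deliberately not closed: closing the reader would close System.in too.
        InputStreamReader probe = new InputStreamReader(System.in);
        // Returns the historical name if one exists, e.g. "UTF8" instead of "UTF-8".
        return probe.getEncoding();
    }

    public static void main(String[] args) {
        System.out.println("InputStreamReader.getEncoding(): " + platformDefaultEncoding());
        // The same default charset, reported under its canonical name.
        System.out.println("Charset.defaultCharset():        " + Charset.defaultCharset().name());
    }
}
```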
- Method Detail
  - detect
```
public static java.lang.String detect(java.io.InputStream in,
                                      int limit,
                                      java.lang.String deflt)
                               throws java.io.IOException
```
    reads up to limit bytes from the given input stream to find out the character encoding used. If no encoding can be derived, the given default value is returned.
    
    This method implements a partial and slightly modified version of the recommendations described in the XML specification.
    
    With a Byte Order Mark: The byte order mark rules of the XML specification appear to apply to HTML as well. If one of these marks is recognized, this method returns immediately with the corresponding encoding name. For UCS-4 with an unusual octet order (2143 or 3412), however, deflt is returned, since no useful Java encoding name exists for it. Apart from deflt, the possible return values in this case are UTF-32BE, UTF-32LE, UTF-16BE, UTF-16LE and UTF-8.
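
    A minimal sketch of the byte order mark check described above, assuming the first four bytes of the stream have already been read into an array. BomSniffer and fromBom are hypothetical names and this is not monq's code; the byte patterns follow the autodetection table in Appendix F of the XML specification. A null result stands for "no BOM present", after which the '<'-based heuristic would take over.

```java
public class BomSniffer {
    /**
     * Returns the encoding implied by a byte order mark at the start of head,
     * deflt for the two unusual UCS-4 octet orders, or null if there is no BOM.
     */
    static String fromBom(byte[] head, String deflt) {
        int b0 = at(head, 0), b1 = at(head, 1), b2 = at(head, 2), b3 = at(head, 3);
        // Four-byte marks first: FF FE 00 00 also starts with the UTF-16LE mark FF FE.
        if (b0 == 0x00 && b1 == 0x00 && b2 == 0xFE && b3 == 0xFF) return "UTF-32BE";
        if (b0 == 0xFF && b1 == 0xFE && b2 == 0x00 && b3 == 0x00) return "UTF-32LE";
        // UCS-4 in octet order 2143 or 3412: no useful Java name, so use the default.
        if (b0 == 0x00 && b1 == 0x00 && b2 == 0xFF && b3 == 0xFE) return deflt;
        if (b0 == 0xFE && b1 == 0xFF && b2 == 0x00 && b3 == 0x00) return deflt;
        if (b0 == 0xFE && b1 == 0xFF) return "UTF-16BE";
        if (b0 == 0xFF && b1 == 0xFE) return "UTF-16LE";
        if (b0 == 0xEF && b1 == 0xBB && b2 == 0xBF) return "UTF-8";
        return null; // no BOM: continue with the '<'-based heuristic
    }

    /** Unsigned value of head[i], or -1 if the array is shorter than i + 1. */
    private static int at(byte[] head, int i) {
        return i < head.length ? head[i] & 0xFF : -1;
    }
}
```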
    
    Without a Byte Order Mark: To cover HTML as well, the method first locates the '<' byte (0x3C) within the first four bytes. Its position is taken as an indication of how many bytes make up one character and which of them carries the ASCII equivalent of the character, an assumption that only needs to hold until the encoding name has been found. If no 0x3C is found, deflt is returned immediately. In particular, this means that EBCDIC is not handled by this implementation.
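
    One possible way to derive the byte layout from the first four bytes is sketched below. It is a hypothetical heuristic, not necessarily what EncodingDetector does: it locates 0x3C and infers the width of a character from the number of zero bytes among the first four (none for 8-bit encodings, two for UTF-16, three for UCS-4), then takes the position of '<' modulo that width as the offset of the ASCII-carrying byte. ByteLayout, Layout and guess are illustrative names.

```java
public class ByteLayout {
    /** Bytes per character, and index of the byte that carries the ASCII value. */
    static final class Layout {
        final int width;
        final int offset;

        Layout(int width, int offset) {
            this.width = width;
            this.offset = offset;
        }
    }

    /**
     * Hypothetical heuristic: find 0x3C ('<') in the first four bytes and infer
     * the character width from how many of those four bytes are zero.
     * Returns null if there is no '<' or the zero pattern is not recognized.
     */
    static Layout guess(byte[] head) {
        if (head.length < 4) {
            return null; // all four bytes are needed to judge the pattern
        }
        int lt = -1;
        int zeros = 0;
        for (int i = 0; i < 4; i++) {
            if (head[i] == 0x3C && lt < 0) {
                lt = i;
            }
            if (head[i] == 0x00) {
                zeros++;
            }
        }
        if (lt < 0) {
            return null; // no '<', e.g. EBCDIC, which is not handled
        }
        int width;
        switch (zeros) {
            case 0: width = 1; break; // 8-bit: ASCII, UTF-8, ISO-8859-x, ...
            case 2: width = 2; break; // UTF-16, either byte order
            case 3: width = 4; break; // UCS-4 / UTF-32 (only one character visible)
            default: return null;     // unrecognized zero pattern
        }
        // The ASCII-carrying byte sits at the same position within every character.
        return new Layout(width, lt % width);
    }
}
```

    With leading whitespace in UTF-16LE (20 00 3C 00), for instance, '<' is at index 2, two bytes are zero, and the result is width 2 with offset 0.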
    
    After the byte layout has been guessed, the input stream is scanned for up to limit bytes to find either an XML declaration or an HTML meta tag that describes the content type and character set. The possible return values are whatever the file declares as encoding (XML) or as charset (HTML).
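
    Once width and offset are known, keeping only the ASCII-carrying byte of each character turns the prefix into plain text that can be searched. The following sketch does this and then applies two simplified regular expressions; the class name DeclarationScanner, its methods and both patterns are assumptions rather than monq's implementation, and a more careful scanner would also have to skip comments and other markup. As documented, the name is returned verbatim, so it need not be one that Java recognizes.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class DeclarationScanner {
    // encoding="..." inside an XML declaration <?xml ... ?> (case-sensitive, as in XML)
    private static final Pattern XML_DECL = Pattern.compile(
        "<\\?xml[^>]*?\\bencoding\\s*=\\s*[\"']([A-Za-z][A-Za-z0-9._-]*)[\"']");
    // charset=... inside a <meta ...> tag, quoted or unquoted (HTML is case-insensitive)
    private static final Pattern META_CHARSET = Pattern.compile(
        "<meta\\b[^>]*?\\bcharset\\s*=\\s*[\"']?([A-Za-z0-9._-]+)",
        Pattern.CASE_INSENSITIVE);

    /** Keeps one byte per character (at offset) from the first limit bytes, as Latin-1 text. */
    static String asciiProjection(byte[] data, int limit, int width, int offset) {
        int end = Math.min(limit, data.length);
        StringBuilder sb = new StringBuilder();
        for (int i = offset; i < end; i += width) {
            sb.append((char) (data[i] & 0xFF));
        }
        return sb.toString();
    }

    /** Returns the declared encoding or charset, or null if none was found. */
    static String findDeclared(String ascii) {
        Matcher m = XML_DECL.matcher(ascii);
        if (m.find()) {
            return m.group(1);
        }
        m = META_CHARSET.matcher(ascii);
        return m.find() ? m.group(1) : null;
    }
}
```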
    
    Under all circumstances, the InputStream is reset to its start before this method returns. This requires that the InputStream supports mark(), i.e. that its markSupported() method returns true.
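
    The mark/reset requirement amounts to the pattern sketched below: check markSupported(), mark with the read limit, read, and reset in a finally block. PeekSupport and peek are hypothetical names used only for illustration; wrapping an arbitrary stream in a java.io.BufferedInputStream is the usual way to obtain mark() support.

```java
import java.io.BufferedInputStream;
import java.io.ByteArrayInputStream;
import java.io.IOException;
import java.io.InputStream;
import java.nio.charset.StandardCharsets;
import java.util.Arrays;

public class PeekSupport {
    /** Reads up to limit bytes, then rewinds the stream to where it was. */
    static byte[] peek(InputStream in, int limit) throws IOException {
        if (!in.markSupported()) {
            // Same contract as documented for detect(): reject streams without mark().
            throw new IllegalArgumentException("stream does not support mark()");
        }
        in.mark(limit);
        try {
            byte[] buf = new byte[limit];
            int total = 0;
            int n;
            // read() may return fewer bytes than requested, so loop until EOF or limit.
            while (total < limit && (n = in.read(buf, total, limit - total)) != -1) {
                total += n;
            }
            return Arrays.copyOf(buf, total);
        } finally {
            in.reset(); // rewind even if reading failed
        }
    }

    public static void main(String[] args) throws IOException {
        byte[] doc = "<?xml version=\"1.0\"?>".getBytes(StandardCharsets.US_ASCII);
        InputStream in = new BufferedInputStream(new ByteArrayInputStream(doc));
        byte[] head = peek(in, 5);
        System.out.println(new String(head, StandardCharsets.US_ASCII)); // <?xml
        System.out.println((char) in.read()); // < (the stream was rewound)
    }
}
```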
    
    Parameters:
    
    in - the input stream to read
    
    limit - maximum number of bytes to read for guessing
    
    deflt - a default value to return when nothing can be guessed; consider passing in defaultEnc
    
    Returns:
    
    the encoding guessed or the given deflt.
    
    Throws:
    
    java.lang.IllegalArgumentException - if in does not support the mark() method.
    
    java.io.IOException - if an I/O error occurs while reading from or resetting in.
  - detect
```
public static java.lang.String detect(java.io.InputStream in)
                               throws java.io.IOException
```
    tries to detect the character encoding used in the given InputStream within the first 1000 bytes. It returns defaultEnc if no encoding can be guessed.
    
    Throws:
    
    java.io.IOException - if an I/O error occurs while reading from or resetting in.
    
    See Also:
    
    detect(InputStream,int,String)
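
    Because the detected name is copied from the document, a caller may want to validate it before decoding. The sketch below assumes a hypothetical EncodingGuesser interface whose shape matches the one-argument detect method (so EncodingDetector::detect could presumably be passed in), relies on the guesser leaving the stream at its start as documented, and falls back to a caller-supplied charset when the name is malformed or unknown to Java. ReaderFactory and openReader are likewise illustrative names.

```java
import java.io.BufferedInputStream;
import java.io.IOException;
import java.io.InputStream;
import java.io.InputStreamReader;
import java.io.Reader;
import java.nio.charset.Charset;
import java.nio.charset.IllegalCharsetNameException;

public class ReaderFactory {
    /** Hypothetical stand-in for a detector such as the one-argument detect method. */
    interface EncodingGuesser {
        String guess(InputStream in) throws IOException;
    }

    /**
     * Ensures mark() support, asks the guesser for an encoding name and opens a
     * Reader with it; unknown or malformed names fall back to the given charset.
     * Assumes the guesser resets the stream to its start before returning.
     */
    static Reader openReader(InputStream raw, EncodingGuesser guesser, Charset fallback)
            throws IOException {
        InputStream in = raw.markSupported() ? raw : new BufferedInputStream(raw);
        String name = guesser.guess(in);
        Charset cs = fallback;
        try {
            if (name != null && Charset.isSupported(name)) {
                cs = Charset.forName(name);
            }
        } catch (IllegalCharsetNameException e) {
            // e.g. a garbled charset attribute in an HTML meta tag: keep the fallback
        }
        return new InputStreamReader(in, cs);
    }
}
```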
