java.lang.Object
  moj.util.TagStripper
public class TagStripper
A small utility class that contains methods for stripping tags from documents. It tries to retain the structure of the original text by converting line breaks, paragraphs, headings, etc. to ordinary (hard) single and double line breaks. It also strips comments and converts entities. The design of this class is heavily influenced by an example in "Web Client Programming with Java" by Elliotte Rusty Harold.
Field Summary | |
---|---|
static java.util.regex.Pattern | aamp |
static java.util.regex.Pattern | agt |
static java.util.regex.Pattern | alt |
static java.util.regex.Pattern | aquot |
static java.util.regex.Pattern | aring |
static java.util.regex.Pattern | Aring |
static java.util.regex.Pattern | auml |
static java.util.regex.Pattern | Auml |
static java.util.regex.Pattern | ouml |
static java.util.regex.Pattern | Ouml |
static java.util.regex.Pattern | xamp |
static java.util.regex.Pattern | xaring |
static java.util.regex.Pattern | xAring |
static java.util.regex.Pattern | xauml |
static java.util.regex.Pattern | xAuml |
static java.util.regex.Pattern | xgt |
static java.util.regex.Pattern | xlt |
static java.util.regex.Pattern | xouml |
static java.util.regex.Pattern | xOuml |
static java.util.regex.Pattern | xquot |
Constructor Summary |
---|
TagStripper() |
Method Summary | |
---|---|
static java.lang.String | fromControlCodes(java.lang.String text): Replaces all C0/C1 control codes in text with their abbreviated names. |
static java.lang.String | fromXMLentities(java.lang.String text): Converts XML entities (&lt; &gt; &amp; &quot;) to markup characters (< > & "). |
static java.lang.String | removeControlCodes(java.lang.String text): Replaces all C0/C1 control codes in text with whitespace. |
java.lang.String | stripHTML(java.io.Reader reader): Strips HTML tags and comments from the given stream and returns a string, while trying to retain the structure of the original text by converting HTML line breaks, paragraphs, headings, etc. to ordinary (hard) single and double line breaks. |
java.lang.String | stripHTML(java.lang.String htmltext): Strips HTML tags and comments from the given string and returns a string, while trying to retain the structure of the original text by converting HTML line breaks, paragraphs, headings, etc. to ordinary (hard) single and double line breaks. |
static java.lang.String | toXMLentities(java.lang.String text): Converts markup characters (< > & ") to XML entities (&lt; &gt; &amp; &quot;). |
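The two entity conversions summarized above can be illustrated with a minimal sketch. It is not the TagStripper source: the class name EntitySketch and its method names are hypothetical, and it uses plain String.replace rather than the precompiled Pattern fields this class exposes. It covers only the four markup characters named in the summary.

```java
class EntitySketch {
    // Escape the four markup characters. '&' is replaced first so that the
    // ampersands introduced by the later replacements are not escaped again.
    static String toXmlEntities(String text) {
        return text.replace("&", "&amp;")
                   .replace("<", "&lt;")
                   .replace(">", "&gt;")
                   .replace("\"", "&quot;");
    }

    // Reverse mapping. "&amp;" is replaced last so that an escaped entity
    // such as "&amp;lt;" decodes to the text "&lt;" and not to "<".
    static String fromXmlEntities(String text) {
        return text.replace("&lt;", "<")
                   .replace("&gt;", ">")
                   .replace("&quot;", "\"")
                   .replace("&amp;", "&");
    }

    public static void main(String[] args) {
        String raw = "<a href=\"x\">Tom & Jerry</a>";
        String escaped = toXmlEntities(raw);
        // prints: &lt;a href=&quot;x&quot;&gt;Tom &amp; Jerry&lt;/a&gt;
        System.out.println(escaped);
        // prints: true
        System.out.println(fromXmlEntities(escaped).equals(raw));
    }
}
```

The order of the replacements matters in both directions; handling the ampersand at the wrong end would double-escape or double-decode the text.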
Methods inherited from class java.lang.Object |
---|
equals, getClass, hashCode, notify, notifyAll, toString, wait, wait, wait |
Field Detail |
---|
public static final java.util.regex.Pattern xlt
public static final java.util.regex.Pattern xgt
public static final java.util.regex.Pattern xamp
public static final java.util.regex.Pattern xquot
public static final java.util.regex.Pattern xaring
public static final java.util.regex.Pattern xauml
public static final java.util.regex.Pattern xouml
public static final java.util.regex.Pattern xAring
public static final java.util.regex.Pattern xAuml
public static final java.util.regex.Pattern xOuml
public static final java.util.regex.Pattern alt
public static final java.util.regex.Pattern agt
public static final java.util.regex.Pattern aamp
public static final java.util.regex.Pattern aquot
public static final java.util.regex.Pattern aring
public static final java.util.regex.Pattern auml
public static final java.util.regex.Pattern ouml
public static final java.util.regex.Pattern Aring
public static final java.util.regex.Pattern Auml
public static final java.util.regex.Pattern Ouml
Constructor Detail |
---|
public TagStripper()
Method Detail |
---|
public java.lang.String stripHTML(java.lang.String htmltext)
Parameters: htmltext - the string from which HTML is to be stripped
public java.lang.String stripHTML(java.io.Reader reader)
Parameters: reader - the stream from which HTML is to be stripped
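As a rough illustration of the behavior described for stripHTML, the sketch below removes comments and tags with regular expressions and maps line-break and block-level tags to single and double line breaks. It is an approximation under stated assumptions, not the actual implementation: the class name StripSketch, the choice of tags treated as block-level (p and h1 to h6), and the regex-based approach are all hypothetical, and entity conversion is left out.

```java
import java.io.IOException;
import java.io.Reader;
import java.util.regex.Pattern;

class StripSketch {
    // Assumed patterns for this sketch; the real class may work differently.
    private static final Pattern COMMENT =
            Pattern.compile("<!--.*?-->", Pattern.DOTALL);
    private static final Pattern LINE_BREAK =
            Pattern.compile("<br\\s*/?>", Pattern.CASE_INSENSITIVE);
    private static final Pattern BLOCK =
            Pattern.compile("</?(p|h[1-6])\\b[^>]*>", Pattern.CASE_INSENSITIVE);
    private static final Pattern TAG = Pattern.compile("<[^>]+>");
    private static final Pattern EXTRA_BREAKS = Pattern.compile("\n{3,}");

    static String stripHtml(String html) {
        String s = COMMENT.matcher(html).replaceAll("");   // drop comments
        s = LINE_BREAK.matcher(s).replaceAll("\n");        // <br> -> single break
        s = BLOCK.matcher(s).replaceAll("\n\n");           // <p>, <h1>.. -> double break
        s = TAG.matcher(s).replaceAll("");                 // drop remaining tags
        s = EXTRA_BREAKS.matcher(s).replaceAll("\n\n");    // collapse runs of breaks
        return s.trim();
    }

    // Reader overload: read the whole stream, then reuse the String version.
    static String stripHtml(Reader reader) throws IOException {
        StringBuilder sb = new StringBuilder();
        char[] buf = new char[4096];
        int n;
        while ((n = reader.read(buf)) != -1) {
            sb.append(buf, 0, n);
        }
        return stripHtml(sb.toString());
    }

    public static void main(String[] args) {
        String html = "<h1>Title</h1><!-- note --><p>First<br>second</p>"
                + "<p>Third <b>bold</b></p>";
        System.out.println(stripHtml(html));
    }
}
```

For the sample input, the heading and the two paragraphs come out separated by blank lines, and "First" and "second" end up on adjacent lines. A regex pass like this misreads a bare < or > in running text as part of a tag, which is one reason a real HTML parser is usually the safer choice.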
public static java.lang.String toXMLentities(java.lang.String text)
Parameters: text - String (possibly) containing markup characters
Returns: String with markup characters converted to XML entities
public static java.lang.String fromXMLentities(java.lang.String text)
Parameters: text - String (possibly) containing XML entities
Returns: String with XML entities converted to markup characters
public static java.lang.String removeControlCodes(java.lang.String text)
Replaces all C0/C1 control codes in text with whitespace.
Parameters: text - String to remove C0/C1 control codes from
Returns: String with C0/C1 control codes removed
public static java.lang.String fromControlCodes(java.lang.String text)
Replaces all C0/C1 control codes in text with their abbreviated names.
Parameters: text - String to replace C0/C1 control codes in
Returns: String with C0/C1 control codes replaced
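The two control-code methods can be sketched as follows. This is a hedged illustration only: the class name ControlCodeSketch is hypothetical, the angle-bracket output format is an assumption (the documentation does not say how an abbreviated name is rendered), and the sketch treats every character accepted by Character.isISOControl as a control code, which includes tab, line feed, and DEL (U+007F). C1 codes are written by code point here to keep the name table short.

```java
class ControlCodeSketch {
    // Abbreviated names of the 32 C0 control codes, indexed by code point.
    private static final String[] C0_NAMES = {
        "NUL", "SOH", "STX", "ETX", "EOT", "ENQ", "ACK", "BEL",
        "BS",  "HT",  "LF",  "VT",  "FF",  "CR",  "SO",  "SI",
        "DLE", "DC1", "DC2", "DC3", "DC4", "NAK", "SYN", "ETB",
        "CAN", "EM",  "SUB", "ESC", "FS",  "GS",  "RS",  "US"
    };

    // Replace each control code (U+0000..U+001F, U+007F..U+009F) with a space.
    static String removeControlCodes(String text) {
        StringBuilder out = new StringBuilder(text.length());
        for (int i = 0; i < text.length(); i++) {
            char c = text.charAt(i);
            out.append(Character.isISOControl(c) ? ' ' : c);
        }
        return out.toString();
    }

    // Replace each control code with a visible label (assumed format: <NAME>).
    static String fromControlCodes(String text) {
        StringBuilder out = new StringBuilder(text.length());
        for (int i = 0; i < text.length(); i++) {
            char c = text.charAt(i);
            if (!Character.isISOControl(c)) {
                out.append(c);
            } else if (c < C0_NAMES.length) {
                out.append('<').append(C0_NAMES[c]).append('>');
            } else if (c == 0x7F) {
                out.append("<DEL>");
            } else {
                // C1 range: shown by code point in this sketch.
                out.append(String.format("<U+%04X>", (int) c));
            }
        }
        return out.toString();
    }

    public static void main(String[] args) {
        String sample = "a\tb" + (char) 0x1B + "c";
        System.out.println(removeControlCodes(sample)); // prints: a b c
        System.out.println(fromControlCodes(sample));   // prints: a<HT>b<ESC>c
    }
}
```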