Static Public Member Functions
static String clean (String html)
static String removePunctuation (String text)
Static Package Functions
static String decodeEntities (String encoded)
Static Package Attributes
static Matcher punctuationMatcher = Pattern.compile("(^[^\\p{N}\\p{L}])|([^\\p{N}\\p{L}]$)").matcher("")
static Matcher entityMatcher = Pattern.compile("&[^;]+;").matcher("")
Definition at line 13 of file HTMLUtils.java.
static String org.hfbk.util.HTMLUtils.clean (String html) [static]
Cleans HTML to plain text.
Strips all tags, tabs and newlines. Decodes hex entities to UTF-8.
Parameters: html
Definition at line 25 of file HTMLUtils.java.
References org.hfbk.util.HTMLUtils.decodeEntities().
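The three steps named in the description (strip tags, strip tabs and newlines, decode hex entities) could be chained roughly as follows. This is a minimal sketch, not the code in HTMLUtils.java: the class name CleanSketch, the tag pattern, the choice to replace tabs and newlines with a single space, and the stand-in decodeHexEntities helper are all assumptions made for illustration.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

final class CleanSketch {
    // Assumed patterns; the originals in HTMLUtils.java are not documented here.
    private static final Pattern TAG = Pattern.compile("<[^>]*>");
    private static final Pattern TABS_NEWLINES = Pattern.compile("[\\t\\r\\n]+");
    private static final Pattern HEX_ENTITY = Pattern.compile("&#[xX]([0-9a-fA-F]{1,6});");

    static String clean(String html) {
        // 1. Drop anything that looks like a tag.
        String text = TAG.matcher(html).replaceAll("");
        // 2. Assumption: runs of tabs/newlines become one space so words do not fuse.
        text = TABS_NEWLINES.matcher(text).replaceAll(" ");
        // 3. Decode entities last, so a decoded '<' is never mistaken for a tag.
        return decodeHexEntities(text);
    }

    // Hypothetical stand-in for decodeEntities(): handles only &#x...; references.
    static String decodeHexEntities(String encoded) {
        Matcher m = HEX_ENTITY.matcher(encoded);
        StringBuffer out = new StringBuffer();
        while (m.find()) {
            int codePoint = Integer.parseInt(m.group(1), 16);
            String decoded = Character.isValidCodePoint(codePoint)
                    ? new String(Character.toChars(codePoint))
                    : m.group(); // leave out-of-range references untouched
            // quoteReplacement keeps '$' and '\' in the decoded text literal.
            m.appendReplacement(out, Matcher.quoteReplacement(decoded));
        }
        m.appendTail(out);
        return out.toString();
    }
}
```

Decoding after tag removal is a deliberate ordering in this sketch: an input such as "a &#x3C; b" yields "a < b" instead of losing text to the tag pattern.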
static String org.hfbk.util.HTMLUtils.removePunctuation (String text) [static]
Removes punctuation around words.
Definition at line 41 of file HTMLUtils.java.
References org.hfbk.util.HTMLUtils.punctuationMatcher.
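To illustrate what the punctuationMatcher pattern does on its own, here is a minimal sketch that applies it with replaceAll. The class name PunctuationSketch is hypothetical, and the actual method body in HTMLUtils.java may differ (for example, it may reuse the shared Matcher or apply the pattern repeatedly).

```java
import java.util.regex.Pattern;

final class PunctuationSketch {
    // Same pattern as punctuationMatcher: a single character that is neither a
    // letter (\p{L}) nor a digit (\p{N}), at the very start or very end of the input.
    private static final Pattern EDGE_PUNCTUATION =
            Pattern.compile("(^[^\\p{N}\\p{L}])|([^\\p{N}\\p{L}]$)");

    static String removePunctuation(String text) {
        // A fresh Matcher per call avoids sharing mutable matcher state.
        return EDGE_PUNCTUATION.matcher(text).replaceAll("");
    }
}
```

Applied once like this, the pattern removes at most one character from each end of the string: "(hello)" becomes "hello", while "\"word\"," becomes "word\"" because only the trailing comma sits at the end. Punctuation inside the text, such as the apostrophe in "don't", is left alone.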
static String org.hfbk.util.HTMLUtils.decodeEntities (String encoded) [static, package]
Definition at line 48 of file HTMLUtils.java.
References org.hfbk.util.HTMLUtils.entityMatcher.
Referenced by org.hfbk.util.HTMLUtils.clean().
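Since this function carries no description, the following is only a sketch of how decoding could work with the entityMatcher pattern, based on the note under clean() that hex entities are decoded. The class name EntityDecodeSketch and the helper decodeOne are hypothetical, and the handling of non-hex entities (left unchanged) is an assumption.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

final class EntityDecodeSketch {
    // Same pattern as entityMatcher: '&', one or more non-';' characters, then ';'.
    private static final Pattern ENTITY = Pattern.compile("&[^;]+;");

    static String decodeEntities(String encoded) {
        Matcher m = ENTITY.matcher(encoded);
        StringBuilder out = new StringBuilder();
        int last = 0;
        while (m.find()) {
            out.append(encoded, last, m.start()); // text before the entity
            out.append(decodeOne(m.group()));     // the entity or its decoded form
            last = m.end();
        }
        out.append(encoded, last, encoded.length()); // remaining tail
        return out.toString();
    }

    // Assumption: only hexadecimal references such as "&#x41;" are decoded;
    // named entities and anything unparsable are returned unchanged.
    private static String decodeOne(String entity) {
        if (entity.startsWith("&#x") || entity.startsWith("&#X")) {
            try {
                int codePoint = Integer.parseInt(
                        entity.substring(3, entity.length() - 1), 16);
                if (Character.isValidCodePoint(codePoint)) {
                    return new String(Character.toChars(codePoint));
                }
            } catch (NumberFormatException e) {
                // Not a hex number: fall through and keep the original text.
            }
        }
        return entity;
    }
}
```

Note that the pattern is permissive: in "a & b; c" it matches "& b;", which is why the sketch falls back to the original text whenever the match is not a valid hex reference.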
Matcher org.hfbk.util.HTMLUtils.punctuationMatcher = Pattern.compile("(^[^\\p{N}\\p{L}])|([^\\p{N}\\p{L}]$)").matcher("") [static, package]
Definition at line 39 of file HTMLUtils.java.
Referenced by org.hfbk.util.HTMLUtils.removePunctuation().
Matcher org.hfbk.util.HTMLUtils.entityMatcher = Pattern.compile("&[^;]+;").matcher("") [static, package]
Definition at line 46 of file HTMLUtils.java.
Referenced by org.hfbk.util.HTMLUtils.decodeEntities().
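Both attributes are created with matcher(""), which suggests the Matcher objects are meant to be reused by resetting them onto each new input. The sketch below shows that reuse idiom under this assumption; the class SharedMatcherSketch and the method countEntities are hypothetical and exist only to demonstrate reset(). Because a Matcher holds mutable state and is not safe for concurrent use, the sketch serializes access with synchronized.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

final class SharedMatcherSketch {
    // Compiled once; matcher("") just creates a reusable Matcher over empty input.
    private static final Matcher ENTITY_MATCHER =
            Pattern.compile("&[^;]+;").matcher("");

    // Hypothetical helper: counts pattern matches using the shared Matcher.
    static synchronized int countEntities(String text) {
        ENTITY_MATCHER.reset(text); // point the shared matcher at the new input
        int count = 0;
        while (ENTITY_MATCHER.find()) {
            count++;
        }
        return count;
    }
}
```

Reusing one Matcher saves an allocation per call; creating a new Matcher from a shared Pattern on each call is the simpler alternative when the methods may run on several threads.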