Java simple sentence parser
Based on @Jarrod Roberson's answer, I created a utility method that uses BreakIterator and returns a list of sentences.
import java.text.BreakIterator;
import java.util.ArrayList;
import java.util.List;
import java.util.Locale;

public static List<String> tokenize(String text, String language, String country) {
    List<String> sentences = new ArrayList<String>();
    Locale currentLocale = new Locale(language, country);
    BreakIterator sentenceIterator = BreakIterator.getSentenceInstance(currentLocale);
    sentenceIterator.setText(text);
    // Walk consecutive boundaries; each pair delimits one sentence.
    // A sentence keeps its trailing whitespace; call trim() if that is unwanted.
    int lastBoundary = sentenceIterator.first();
    int boundary = sentenceIterator.next();
    while (boundary != BreakIterator.DONE) {
        sentences.add(text.substring(lastBoundary, boundary));
        lastBoundary = boundary;
        boundary = sentenceIterator.next();
    }
    return sentences;
}
Author: Admin
Updated on June 27, 2022
Comments
- Admin, almost 2 years
Is there a simple way to create a sentence parser in plain Java, without adding any libraries or JARs?
The parser should not just split on the blanks between words; it should be smarter, handling . ! ? and recognizing where a sentence ends.
After parsing, only the real words should be stored in a database or file, without any special characters.
Thank you all very much in advance :)
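The last requirement in the question, keeping only real words and dropping punctuation, can probably be met with the same JDK class through BreakIterator.getWordInstance. The following is a minimal sketch, not part of the original answer: the class name WordExtractor and the method extractWords are hypothetical, and it assumes that a "real word" is any token whose first character is a letter or digit.

```java
import java.text.BreakIterator;
import java.util.ArrayList;
import java.util.List;
import java.util.Locale;

public class WordExtractor {

    // Hypothetical helper: returns the word tokens of `text`,
    // skipping whitespace and punctuation tokens.
    public static List<String> extractWords(String text, Locale locale) {
        List<String> words = new ArrayList<>();
        BreakIterator wordIterator = BreakIterator.getWordInstance(locale);
        wordIterator.setText(text);

        int start = wordIterator.first();
        for (int end = wordIterator.next();
                end != BreakIterator.DONE;
                start = end, end = wordIterator.next()) {
            String token = text.substring(start, end);
            // Assumption: a token counts as a word if it starts with a letter or digit.
            if (Character.isLetterOrDigit(token.codePointAt(0))) {
                words.add(token);
            }
        }
        return words;
    }

    public static void main(String[] args) {
        // prints [Hello, world, How, are, you]
        System.out.println(extractWords("Hello, world! How are you?", Locale.ENGLISH));
    }
}
```

The word iterator reports whitespace and punctuation as separate tokens, so the filter on the first character is what discards them. The resulting list can then be written to a file or database as needed.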