How to get numbers out of a string?
Solution 1
StreamTokenizer is outdated; it is better to use Scanner. Here is sample code for your problem:
import java.util.Scanner;

String s = "$23.24 word -123";
Scanner fi = new Scanner(s);
// Anything other than alphanumeric characters,
// comma, dot or minus sign is treated as a delimiter and skipped.
fi.useDelimiter("[^\\p{Alnum},\\.-]");
while (true) {
    if (fi.hasNextInt())
        System.out.println("Int: " + fi.nextInt());
    else if (fi.hasNextDouble())
        System.out.println("Double: " + fi.nextDouble());
    else if (fi.hasNext())
        System.out.println("word: " + fi.next());
    else
        break;
}
If you want the comma to be treated as the decimal separator, call fi.useLocale(Locale.FRANCE); before reading any tokens.
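To illustrate how the locale changes what Scanner does with a comma, here is a minimal sketch. The class name LocaleScanDemo and the helper readDoubles are made up for this example and are not part of the original answer. It reads the same text under two locales: under Locale.US the comma should be taken as a grouping separator, under Locale.FRANCE as the decimal separator.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Locale;
import java.util.Scanner;

class LocaleScanDemo {
    // Hypothetical helper: collects every whitespace-separated token that
    // the Scanner accepts as a double under the given locale.
    static List<Double> readDoubles(String input, Locale locale) {
        List<Double> values = new ArrayList<>();
        try (Scanner sc = new Scanner(input)) {
            sc.useLocale(locale);
            while (sc.hasNext()) {
                if (sc.hasNextDouble()) {
                    values.add(sc.nextDouble());
                } else {
                    sc.next(); // skip tokens that are not numbers
                }
            }
        }
        return values;
    }

    public static void main(String[] args) {
        // Comma as grouping separator: one number, ten thousand five hundred sixty-seven.
        System.out.println(readDoubles("10,567", Locale.US));
        // Comma as decimal separator: one number, a little over ten and a half.
        System.out.println(readDoubles("10,567", Locale.FRANCE));
    }
}
```

Setting the locale explicitly also keeps the result independent of the default locale of the machine the code runs on.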
Solution 2
Try this:
String sanitizedText = text.replaceAll("[^\\w\\s\\.]", "");
sanitizedText will contain only word characters (letters, digits and underscore), whitespace and dots; tokenizing it after that should be a breeze.
EDIT
Edited to retain the decimal point as well (at the end of the bracket). The "." is normally special in a regexp; inside a character class it is already literal, so the backslash escape is optional but harmless.
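As a rough sketch of the sanitize-then-tokenize idea, the following strips the unwanted characters and then splits on whitespace. The class SanitizeDemo and the method sanitizeAndSplit are hypothetical names chosen for this example, and the sample text is an assumption based on the values mentioned in the question.

```java
import java.util.Arrays;
import java.util.List;

class SanitizeDemo {
    // Hypothetical helper: removes everything except word characters,
    // whitespace and dots, then splits the result on runs of whitespace.
    static List<String> sanitizeAndSplit(String text) {
        String sanitized = text.replaceAll("[^\\w\\s.]", "");
        return Arrays.asList(sanitized.trim().split("\\s+"));
    }

    public static void main(String[] args) {
        // "$", "-" and "," are removed, so 10,567 becomes 10567.
        System.out.println(sanitizeAndSplit("$678.00 and -87 total 10,567"));
    }
}
```

Note that a dot ending a sentence is kept too, so a token such as "end." keeps its trailing dot and may need extra handling.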
Solution 3
This worked for me:
String onlyNumericText = text.replaceAll("\\D", "");
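A small sketch of what this replacement does in practice, using a hypothetical class DigitsOnlyDemo and method digitsOnly. It also shows a caveat: because \D matches every non-digit, the decimal point is removed along with the currency sign and the comma.

```java
class DigitsOnlyDemo {
    // Hypothetical helper: removes every character that is not a digit.
    static String digitsOnly(String text) {
        return text.replaceAll("\\D", "");
    }

    public static void main(String[] args) {
        System.out.println(digitsOnly("$1,222"));   // comma and "$" are dropped
        System.out.println(digitsOnly("$678.00"));  // the "." is dropped as well
    }
}
```

This is fine for whole numbers, but values such as $678.00 lose their decimal point, so another approach is needed when the fraction matters.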
Solution 4
String str = "1,222";
StringBuilder sb = new StringBuilder();
for (int i = 0; i < str.length(); i++)
{
    // Keep only the digit characters.
    if (Character.isDigit(str.charAt(i)))
        sb.append(str.charAt(i));
}
return sb.toString();
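The same loop can be adapted to keep the decimal point, which the question asks for in the case of $678.00. The following is only a sketch under that assumption: the class DigitFilterDemo and the method keepDigitsAndDot are hypothetical names, and the check is limited to ASCII digits, whereas Character.isDigit also accepts digits from other scripts.

```java
class DigitFilterDemo {
    // Hypothetical helper: keeps ASCII digits and '.', drops everything else.
    static String keepDigitsAndDot(String str) {
        StringBuilder sb = new StringBuilder(str.length());
        for (int i = 0; i < str.length(); i++) {
            char c = str.charAt(i);
            if ((c >= '0' && c <= '9') || c == '.') {
                sb.append(c);
            }
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        System.out.println(keepDigitsAndDot("$678.00"));
        System.out.println(keepDigitsAndDot("-87"));
        System.out.println(keepDigitsAndDot("1,222"));
    }
}
```

The result is still a String; pass it to Double.parseDouble or new BigDecimal(...) if a numeric value is needed.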
Mr Morgan
Updated on April 03, 2020
Mr Morgan about 4 years
I'm using a Java StreamTokenizer to extract the various words and numbers of a String but have run into a problem where numbers which include commas are concerned, e.g. 10,567 is being read as 10.0 and ,567.
I also need to remove all non-numeric characters from numbers where they might occur, e.g. $678.00 should be 678.00 or -87 should be 87.
I believe these can be achieved via the whiteSpace and wordChars methods but does anyone have any idea how to do it?
The basic StreamTokenizer code at present is:
BufferedReader br = new BufferedReader(new StringReader(text));
StreamTokenizer st = new StreamTokenizer(br);
st.parseNumbers();
st.wordChars(44, 46); // ASCII comma, - , dot.
st.wordChars(48, 57); // ASCII 0 - 9.
st.wordChars(65, 90); // ASCII upper case A - Z.
st.wordChars(97, 122); // ASCII lower case a - z.
while (st.nextToken() != StreamTokenizer.TT_EOF) {
    if (st.ttype == StreamTokenizer.TT_WORD) {
        System.out.println("String: " + st.sval);
    } else if (st.ttype == StreamTokenizer.TT_NUMBER) {
        System.out.println("Number: " + st.nval);
    }
}
br.close();
Or could someone suggest a REGEXP to achieve this? I'm not sure if a REGEXP is useful here, given that any parsing would take place after the tokens are read from the string.
Thanks
Mr Morgan.