How to get numbers out of string?

Solution 1

StreamTokenizer is outdated; it is better to use Scanner. Here is sample code for your problem:

    String s = "$23.24 word -123";
    Scanner fi = new Scanner(s);
    //anything other than alphanumeric characters,
    //comma, dot or negative sign is skipped
    fi.useDelimiter("[^\\p{Alnum},\\.-]"); 
    while (true) {
        if (fi.hasNextInt())
            System.out.println("Int: " + fi.nextInt());
        else if (fi.hasNextDouble())
            System.out.println("Double: " + fi.nextDouble());
        else if (fi.hasNext())
            System.out.println("word: " + fi.next());
        else
            break;
    }

If you want the comma to be treated as the decimal separator, call fi.useLocale(Locale.FRANCE);
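To make the locale point concrete, here is a minimal sketch that pins the locale explicitly instead of relying on the system default. The class name and the `numbers` helper are made up for illustration, and the delimiter has a `+` added so that runs of skipped characters do not produce empty tokens.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Locale;
import java.util.Scanner;

class ScannerLocaleSketch {
    // Hypothetical helper: returns every token the Scanner accepts as a double.
    static List<Double> numbers(String text, Locale locale) {
        List<Double> found = new ArrayList<>();
        Scanner sc = new Scanner(text).useLocale(locale);
        // Same character set as above; '+' collapses consecutive delimiters.
        sc.useDelimiter("[^\\p{Alnum},\\.-]+");
        while (sc.hasNext()) {
            if (sc.hasNextDouble()) {
                found.add(sc.nextDouble());
            } else {
                sc.next(); // not a number in this locale, skip it
            }
        }
        return found;
    }

    public static void main(String[] args) {
        // In Locale.US the comma is a grouping separator and the dot is the decimal separator.
        System.out.println(numbers("$678.00 and 10,567", Locale.US));   // [678.0, 10567.0]
        // In Locale.FRANCE the comma is the decimal separator.
        System.out.println(numbers("3,14 et 42", Locale.FRANCE));       // [3.14, 42.0]
    }
}
```

Negative tokens such as -87 keep their sign with this delimiter; drop the `-` from the character class or apply Math.abs if the sign should go.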

Solution 2

Try this:

    String sanitizedText = text.replaceAll("[^\\w\\s\\.]", "");

sanitizedText will contain only word characters (letters, digits, underscore), whitespace and dots; tokenizing it after that should be a breeze.

EDIT

Edited to retain the decimal point as well (at the end of the bracket). Outside brackets, . is "special" to regexp and needs a backslash escape; inside brackets the escape is optional but harmless.
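As a rough illustration of the sanitize-then-tokenize idea, the sketch below splits the cleaned text on whitespace. The class and method names are invented for this example, and splitting on `\\s+` is just one possible way to tokenize.

```java
import java.util.ArrayList;
import java.util.List;

class SanitizeSketch {
    // Drops everything except word characters, whitespace and dots.
    static String sanitize(String text) {
        return text.replaceAll("[^\\w\\s.]", "");
    }

    // Hypothetical helper: sanitizes, then splits on runs of whitespace.
    static List<String> tokens(String text) {
        List<String> out = new ArrayList<>();
        for (String t : sanitize(text).trim().split("\\s+")) {
            if (!t.isEmpty()) {
                out.add(t);
            }
        }
        return out;
    }

    public static void main(String[] args) {
        // '$', ',' and '!' are removed before tokenizing.
        System.out.println(tokens("Paid $678.00 for 10,567 items!"));
        // [Paid, 678.00, for, 10567, items]
    }
}
```

Note that a sentence-ending period stays attached to its word, and a minus sign is removed, so -87 becomes 87.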

Solution 3

This worked for me:

    String onlyNumericText = text.replaceAll("\\D", "");
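One thing to watch with `\\D` is that it also strips the decimal point. The sketch below contrasts it with a variant that keeps the dot; both method names are made up for this example.

```java
class DigitsOnlySketch {
    // Removes every non-digit character, including '.' and '-'.
    static String digitsOnly(String text) {
        return text.replaceAll("\\D", "");
    }

    // Assumed variant: keeps digits and the dot, removes everything else.
    static String digitsAndDot(String text) {
        return text.replaceAll("[^\\d.]", "");
    }

    public static void main(String[] args) {
        System.out.println(digitsOnly("10,567"));     // 10567
        System.out.println(digitsOnly("$678.00"));    // 67800 (decimal point lost)
        System.out.println(digitsAndDot("$678.00"));  // 678.00
        System.out.println(digitsAndDot("-87"));      // 87
    }
}
```

The second variant should be applied per token rather than to a whole sentence, otherwise unrelated dots and digits run together.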

Solution 4

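    String str = "1,222";
    // StringBuilder is enough here; StringBuffer's synchronization is not needed
    StringBuilder sb = new StringBuilder();
    for (int i = 0; i < str.length(); i++)
    {
        // keep digits only, so the comma is dropped
        if (Character.isDigit(str.charAt(i)))
            sb.append(str.charAt(i));
    }
    return sb.toString(); // "1222"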
Author by

Mr Morgan

Updated on April 03, 2020

Comments

  • Mr Morgan about 4 years

    I'm using a Java StreamTokenizer to extract the various words and numbers of a String but have run into a problem where numbers which include commas are concerned, e.g. 10,567 is being read as 10.0 and ,567.

    I also need to remove all non-numeric characters from numbers where they might occur, e.g. $678.00 should be 678.00 or -87 should be 87.

    I believe these can be achieved via the whitespaceChars and wordChars methods, but does anyone have any idea how to do it?

    The basic StreamTokenizer code at present is:

            BufferedReader br = new BufferedReader(new StringReader(text));
            StreamTokenizer st = new StreamTokenizer(br);
            st.parseNumbers();
            st.wordChars(44, 46); // ASCII comma, - , dot.
            st.wordChars(48, 57); // ASCII 0 - 9.
            st.wordChars(65, 90); // ASCII upper case A - Z.
            st.wordChars(97, 122); // ASCII lower case a - z.
            while (st.nextToken() != StreamTokenizer.TT_EOF) {
                if (st.ttype == StreamTokenizer.TT_WORD) {                    
                    System.out.println("String: " + st.sval);
                }
                else if (st.ttype == StreamTokenizer.TT_NUMBER) {
                    System.out.println("Number: " + st.nval);
                }
            }
            br.close(); 
    

    Or could someone suggest a REGEXP to achieve this? I'm not sure if REGEXP is useful here given that any parsing would take place after the tokens are read from the string.

    Thanks

    Mr Morgan.