Verify if String matches a format String


Solution 1

I don't know of a library that does that. Here is an example of how to convert a format pattern into a regex. Note that Pattern.quote is essential: it keeps regex metacharacters that happen to appear in the literal parts of the format string from being interpreted as regex syntax.

import java.util.regex.Matcher;
import java.util.regex.Pattern;

// copied from java.util.Formatter
// %[argument_index$][flags][width][.precision][t]conversion
private static final String formatSpecifier
    = "%(\\d+\\$)?([-#+ 0,(\\<]*)?(\\d+)?(\\.\\d+)?([tT])?([a-zA-Z%])";

private static final Pattern formatToken = Pattern.compile(formatSpecifier);

public Pattern convert(final String format) {
    final StringBuilder regex = new StringBuilder();
    final Matcher matcher = formatToken.matcher(format);
    int lastIndex = 0;
    regex.append('^');
    while (matcher.find()) {
        regex.append(Pattern.quote(format.substring(lastIndex, matcher.start())));
        regex.append(convertToken(matcher.group(1), matcher.group(2), matcher.group(3), 
                                  matcher.group(4), matcher.group(5), matcher.group(6)));
        lastIndex = matcher.end();
    }
    regex.append(Pattern.quote(format.substring(lastIndex, format.length())));
    regex.append('$');
    return Pattern.compile(regex.toString());
}

Of course, implementing convertToken will be a challenge. Here is something to start with:

private static String convertToken(String index, String flags, String width, String precision, String temporal, String conversion) {
    if (conversion.equals("s")) {
        return "\\w*"; // \w already includes digits
    } else if (conversion.equals("d")) {
        // width is null for a plain %d; otherwise String.format treats it as a minimum length
        return width == null ? "\\d+" : "\\d{" + width + ",}";
    }
    throw new IllegalArgumentException("Unsupported conversion: " + conversion);
}
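
For reference, the pieces above can be assembled into one runnable class. This is a minimal sketch rather than a complete converter. The class name FormatMatcher and the matches helper are hypothetical, and it makes several assumptions: only %s, %d, zero-padded %0Nd and %% are handled, %s accepts any characters, and zero-padded integers are non-negative.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

final class FormatMatcher {
    // Specifier syntax copied from java.util.Formatter, as in the answer above.
    private static final Pattern FORMAT_TOKEN = Pattern.compile(
            "%(\\d+\\$)?([-#+ 0,(\\<]*)?(\\d+)?(\\.\\d+)?([tT])?([a-zA-Z%])");

    // Builds a regex from a format string: literal text is quoted, specifiers are translated.
    static Pattern convert(String format) {
        StringBuilder regex = new StringBuilder();
        Matcher token = FORMAT_TOKEN.matcher(format);
        int lastIndex = 0;
        while (token.find()) {
            regex.append(Pattern.quote(format.substring(lastIndex, token.start())));
            regex.append(convertToken(token.group(2), token.group(3), token.group(6)));
            lastIndex = token.end();
        }
        regex.append(Pattern.quote(format.substring(lastIndex)));
        return Pattern.compile(regex.toString());
    }

    // Translates one specifier; anything outside the supported subset is rejected.
    static String convertToken(String flags, String width, String conversion) {
        boolean zeroPadded = flags != null && flags.contains("0");
        switch (conversion) {
            case "s":
                if (width == null) return ".*";      // assumption: any characters
                break;
            case "d":
                if (width == null) return "-?\\d+";
                // Assumption: non-negative value, so the width is a minimum digit count.
                if (zeroPadded) return "\\d{" + Integer.parseInt(width) + ",}";
                break;
            case "%":
                return "%";                          // %% formats as a literal percent sign
            default:
                break;
        }
        throw new IllegalArgumentException("Unsupported specifier, conversion: " + conversion);
    }

    // Hypothetical helper with the signature asked for in the question.
    static boolean matches(String formatted, String format) {
        // Matcher.matches() requires the whole input to match, so no ^ or $ is needed.
        return convert(format).matcher(formatted).matches();
    }

    public static void main(String[] args) {
        System.out.println(matches("hello world!", "hello %s!"));
        System.out.println(matches("song001.mp3", "song%03d.mp3"));
        System.out.println(matches("potato", "song%03d.mp3"));
    }
}
```

Rejecting unsupported specifiers with an exception, instead of guessing a pattern, keeps the supported subset explicit.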

Solution 2

You can use Java regular expressions - please see http://www.vogella.de/articles/JavaRegularExpressions/article.html

Thanks...

Solution 3

Since you do not know the format in advance, you will have to write a method that converts a format string into a regexp. Not trivial, but possible. Here is a simple example for the two test cases you have given:

public static String getRegexpFromFormatString(String format)
{
    String toReturn = format;

    // escape some special regexp chars
    toReturn = toReturn.replaceAll("\\.", "\\\\.");
    toReturn = toReturn.replaceAll("\\!", "\\\\!");

    if (toReturn.indexOf("%") >= 0)
    {
        toReturn = toReturn.replaceAll("%s", "[\\\\w]+"); //accepts 0-9 A-Z a-z _

        while (toReturn.matches(".*%([0-9]+)[d]{1}.*"))
        {
            String digitStr = toReturn.replaceFirst(".*%([0-9]+)[d]{1}.*", "$1");
            int numDigits = Integer.parseInt(digitStr);
            toReturn = toReturn.replaceFirst("(.*)(%[0-9]+[d]{1})(.*)", "$1[0-9]{" + numDigits + "}$3");
        }
    }

    return "^" + toReturn + "$";
}

and some test code:

public static void main(String[] args) throws Exception
{
    String[] formats = {"hello %s!", "song%03d.mp3", "song%03d.mp3"};
    for (int i=0; i<formats.length; i++)
    {
        System.out.println("Format in [" + i + "]: " + formats[i]);
        System.out.println("Regexp out[" + i + "]: " + getRegexpFromFormatString(formats[i]));
    }

    String[] words = {"hello world!", "song001.mp3", "potato"};
    for (int i=0; i<formats.length; i++)
    {
        System.out.println("Word [" + i + "]: " + words[i] +
            " : matches=" + words[i].matches(getRegexpFromFormatString(formats[i])));
    }
}
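
The method above escapes only "." and "!", so a format string containing other regex metacharacters (parentheses, "+", "[", and so on) would still be misread. The following sketch illustrates the difference that Pattern.quote makes for a literal fragment. The class QuoteDemo and the helper literalMatches are hypothetical names used only for this illustration.

```java
import java.util.regex.Pattern;

final class QuoteDemo {
    // Hypothetical helper: match a candidate against a literal fragment,
    // either used raw as a regex or wrapped with Pattern.quote.
    static boolean literalMatches(String literal, String candidate, boolean quote) {
        String regex = quote ? Pattern.quote(literal) : literal;
        return Pattern.matches(regex, candidate);
    }

    public static void main(String[] args) {
        String literal = "track(1).mp3";

        // Raw: "(1)" is read as a capturing group and "." as "any character",
        // so the literal does not even match itself, yet it matches "track1.mp3".
        System.out.println(literalMatches(literal, "track(1).mp3", false));
        System.out.println(literalMatches(literal, "track1.mp3", false));

        // Quoted: every character is taken literally.
        System.out.println(literalMatches(literal, "track(1).mp3", true));
        System.out.println(literalMatches(literal, "track1.mp3", true));
    }
}
```

Quoting each literal segment between specifiers avoids having to enumerate the special characters by hand.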

Solution 4

There is no simple way to do this. A straightforward way would be to write some code that converts format strings (or a simpler subset of them) to regular expressions and then match those using the standard regular expression classes.

A better way is probably to rethink/refactor your code. Why do you want this?
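
If refactoring is not an option, the "simpler subset" route can be made stricter by round-tripping: capture the candidate value with a loose regex, then call String.format again and compare the result with the input. The sketch below is only an illustration under strong assumptions: the format string contains exactly one integer specifier (%d, %Nd or %0Nd) and no other percent sign, and the value fits in a long. RoundTripCheck is a hypothetical name.

```java
import java.util.Locale;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

final class RoundTripCheck {
    // Assumption: a single integer specifier with an optional (zero-padded) width.
    private static final Pattern INT_SPEC = Pattern.compile("%(?:0?[1-9]\\d*)?d");

    static boolean matches(String formatted, String format) {
        Matcher spec = INT_SPEC.matcher(format);
        if (!spec.find()) {
            // No supported specifier: fall back to a plain comparison.
            return formatted.equals(format);
        }
        // Quote the literal text on both sides and capture the number loosely.
        String regex = Pattern.quote(format.substring(0, spec.start()))
                + "\\s*(-?\\d+)"
                + Pattern.quote(format.substring(spec.end()));
        Matcher m = Pattern.compile(regex).matcher(formatted);
        if (!m.matches()) {
            return false;
        }
        try {
            long value = Long.parseLong(m.group(1));
            // Let String.format decide padding and sign placement.
            return String.format(Locale.ROOT, format, value).equals(formatted);
        } catch (NumberFormatException e) {
            return false; // the digits do not fit in a long
        }
    }

    public static void main(String[] args) {
        System.out.println(matches("song001.mp3", "song%03d.mp3"));
        System.out.println(matches("song1.mp3", "song%03d.mp3"));
        System.out.println(matches("potato", "song%03d.mp3"));
    }
}
```

The regex only has to locate the value here; the exact padding rules are delegated to String.format itself, which is why the capture can stay loose.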

Author: hpique

iOS, Android & Mac developer. Founder of Robot Media. @hpique

Updated on August 26, 2020

Comments

  • hpique
    hpique over 3 years

    In Java, how can you determine if a String matches a format string (e.g. song%03d.mp3)?

    In other words, how would you implement the following function?

    /**
    * @return true if formatted equals String.format(format, something), false otherwise.
    **/
    boolean matches(String formatted, String format);
    

    Examples:

    matches("hello world!", "hello %s!"); // true
    matches("song001.mp3", "song%03d.mp3"); // true
    matches("potato", "song%03d.mp3"); // false
    

    Maybe there's a way to convert a format string into a regex?

    Clarification

    The format String is a parameter. I don't know it in advance. song%03d.mp3 is just an example. It could be any other format string.

    If it helps, I can assume that the format string will only have one parameter.

  • hpique
    hpique over 12 years
    The format String is a parameter. I don't know it in advance. song%03d.mp3 was just an example.
  • hpique
    hpique over 12 years
    And how do you convert a generic format string into a pattern?
  • hpique
    hpique over 12 years
    I'd love to use regular expressions, but what I'm given is a format string.
  • Yhn
    Yhn over 12 years
    Hence my comment about replacing format codes like %03d with their regular expression equivalent :). The page you linked completely defines the possible codes and prefixes; you'd need to write a function that searches for those codes and replaces them. A %d would be replaced with \d+; %03d could become \d{3}\d* (to ensure a minimum of 3, but possibly "infinite" digits).
  • hpique
    hpique over 12 years
    That's what I would like to avoid. I didn't write the whole code.
  • dtech
    dtech over 12 years
    There is definitely no native way to do this. So you either need to rethink your input/code, write your own converter, or find a converter that does this. E.g., why exactly do you need to use a format string?
  • hpique
    hpique over 12 years
    We choose format strings because they're pretty much the same across all platforms, unlike regex.
  • Mario Duarte
    Mario Duarte over 12 years
    Well, if you're giving it as a parameter for a java application why don't you just use Java regexps?
  • Vishwas Mehra
    Vishwas Mehra over 12 years
    @hgpc ok I've modified my answer appropriately. It's more than I would usually do for a SO answer but I was intrigued. :) You would have to perfect/complete this for production use but it is an idea for how to approach this if necessary.
  • rascio
    rascio over 12 years
    But you have to write a regex... this is what I don't understand. How is the regex created? Do you need something that creates the regex automatically, or something that checks if a string contains a regex?
  • hpique
    hpique over 12 years
    Because the Java app is one of the many clients that receive this input.
  • hpique
    hpique over 12 years
    +1 This is more or less what I'm doing right now. Thanks for posting code.
  • dtech
    dtech over 12 years
    Perl5 compatible regular expressions are implemented in nearly all programming languages. But if you have to do this the only thing you can do is write a converter. It's not that hard. Also note that you're using the wrong format for the job. Format strings are only intended for data -> string(s). Regular expressions are broader. What you're doing now is basically re-inventing an impractical regex notation.
  • Cephalopod
    Cephalopod over 12 years
    If you want to be a hero, you can publish your code as open source.
  • hpique
    hpique over 12 years
    I'm no stranger to publishing open-source code, but this is too specific to publish.