SimpleDateFormat.parse() ignores the number of characters in pattern


Solution 1

SimpleDateFormat has serious problems. The default lenient setting can produce garbage answers, and I cannot think of a case where lenient has any benefit. It is not a reliable way to produce reasonable interpretations of human-entered date variations, and it should never have been the default.

Use DateTimeFormatter instead if you can; see Ole V.V.'s answer. This newer API is superior, and its instances are immutable and thread-safe. A SimpleDateFormat instance shared between threads can produce garbage results without any error or exception. Sadly, my suggested implementation inherits this bad behavior.

Disabling lenient is only part of the solution. You can still end up with garbage results that are hard to catch in testing. See the comments in the code below for examples.
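
As a rough illustration of what still slips through with setLenient(false), here is a minimal sketch. The class and method names (NonLenientGaps, accepts, consumed) are made up for this example, and Locale.ROOT is an assumption chosen only to keep the behavior independent of the default locale.

```java
import java.text.ParseException;
import java.text.ParsePosition;
import java.text.SimpleDateFormat;
import java.util.Locale;

class NonLenientGaps {

    /** True if a non-lenient SimpleDateFormat accepts the text via parse(String). */
    static boolean accepts(String pattern, String text) {
        SimpleDateFormat sdf = new SimpleDateFormat(pattern, Locale.ROOT);
        sdf.setLenient(false);
        try {
            sdf.parse(text);
            return true;
        } catch (ParseException e) {
            return false;
        }
    }

    /** Number of characters parse() consumed, or -1 if parsing failed. */
    static int consumed(String pattern, String text) {
        SimpleDateFormat sdf = new SimpleDateFormat(pattern, Locale.ROOT);
        sdf.setLenient(false);
        ParsePosition pos = new ParsePosition(0);
        return sdf.parse(text, pos) == null ? -1 : pos.getIndex();
    }

    public static void main(String[] args) {
        // A one-digit day is accepted even though the pattern says "dd".
        System.out.println(accepts("yyyy/MM/dd", "2010/01/5"));
        // Trailing text after a complete match is silently ignored.
        System.out.println(accepts("yyyyMMdd", "20100901andGarbage"));
        // The ParsePosition shows where parsing stopped, which exposes the leftover text.
        System.out.println(consumed("yyyyMMdd", "20100901andGarbage"));
    }
}
```

Both accepts(...) calls should report true, and comparing the consumed index with the text length is one cheap way to notice trailing garbage.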

Here is an extension of SimpleDateFormat that enforces a strict pattern match. This should have been the default behavior for that class.

import java.text.DateFormatSymbols;
import java.text.ParseException;
import java.text.ParsePosition;
import java.text.SimpleDateFormat;
import java.util.Date;
import java.util.Locale;

/**
 * Extension of SimpleDateFormat that implements strict matching.
 * parse(text) will only return a Date if text exactly matches the
 * pattern. 
 * 
 * This is needed because SimpleDateFormat does not enforce strict 
 * matching. First there is the lenient setting, which is true
 * by default. This allows text that does not match the pattern and
 * garbage to be interpreted as valid date/time information. For example,
 * parsing "2010-09-01" using the format "yyyyMMdd" yields the date 
 * 2009/12/09! Is this bizarre interpretation the ninth day of the  
 * zeroth month of 2010? If you are dealing with inputs that are not 
 * strictly formatted, you WILL get bad results. You can override lenient  
 * with setLenient(false), but this strangeness should not be the default. 
 *
 * Second, setLenient(false) still does not strictly interpret the pattern. 
 * For example "2010/01/5" will match "yyyy/MM/dd". And data disagreement like 
 * "1999/2011" for the pattern "yyyy/yyyy" is tolerated (yielding 2011). 
 *
 * Third, setLenient(false) still allows garbage after the pattern match. 
 * For example: "20100901" and "20100901andGarbage" will both match "yyyyMMdd". 
 * 
 * This class restricts this undesirable behavior, and makes parse() and 
 * format() functional inverses, which is what you would expect. Thus
 * text.equals(format(parse(text))) when parse returns a non-null result.
 * 
 * @author zobell
 *
 */
public class StrictSimpleDateFormat extends SimpleDateFormat {

    protected boolean strict = true;

    public StrictSimpleDateFormat() {
        super();
        setStrict(true);
    }

    public StrictSimpleDateFormat(String pattern) {
        super(pattern);
        setStrict(true);
    }

    public StrictSimpleDateFormat(String pattern, DateFormatSymbols formatSymbols) {
        super(pattern, formatSymbols);
        setStrict(true);
    }

    public StrictSimpleDateFormat(String pattern, Locale locale) {
        super(pattern, locale);
        setStrict(true);
    }

    /**
     * Set the strict setting. If strict == true (the default)
     * then parsing requires an exact match to the pattern. Setting
     * strict = false will tolerate text after the pattern match. 
     * @param strict
     */
    public void setStrict(boolean strict) {
        this.strict = strict;
        // strict with lenient does not make sense. Really lenient does
        // not make sense in any case.
        if (strict)
            setLenient(false); 
    }

    public boolean getStrict() {
        return strict;
    }

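    /**
     * Parse text to a Date. Exact match of the pattern is required.
     * Parse and format are now inverse functions, so this is
     * required to be true for valid text date information:
     * text.equals(format(parse(text)))
     * @param text the text to parse
     * @param pos the position to start parsing from; on failure its index
     *            is left unchanged and its error index is set
     * @return the parsed Date, or null if text does not exactly match
     */
    @Override
    public Date parse(String text, ParsePosition pos) {
        // Remember where parsing starts; super.parse() moves pos to the end of the match.
        int start = pos.getIndex();
        Date d = super.parse(text, pos);
        if (strict && d != null) {
           String format = this.format(d);
           if (start + format.length() != text.length() ||
                 !text.endsWith(format)) {
              // Not an exact match. Report failure the way DateFormat expects,
              // so that parse(String) throws ParseException instead of returning null.
              pos.setErrorIndex(start);
              pos.setIndex(start);
              d = null;
           }
        }
        return d;
    }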
}
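
The same round-trip idea can also be tried without subclassing. The following is only a sketch under stated assumptions: parseExact and RoundTripCheck are hypothetical names, not JDK API, and the fixed UTC time zone and Locale.ROOT are choices made here to keep the check deterministic.

```java
import java.text.ParsePosition;
import java.text.SimpleDateFormat;
import java.util.Date;
import java.util.Locale;
import java.util.TimeZone;

class RoundTripCheck {

    /**
     * Returns the parsed Date only if formatting it again reproduces the
     * input exactly; otherwise returns null. (Hypothetical helper.)
     */
    static Date parseExact(String pattern, String text) {
        SimpleDateFormat sdf = new SimpleDateFormat(pattern, Locale.ROOT);
        sdf.setTimeZone(TimeZone.getTimeZone("UTC")); // assumption: fixed zone for repeatable results
        sdf.setLenient(false);

        // parse(String, ParsePosition) returns null on failure instead of throwing.
        Date d = sdf.parse(text, new ParsePosition(0));
        if (d == null || !sdf.format(d).equals(text)) {
            return null; // parse failed, or text is not what format() would produce
        }
        return d;
    }

    public static void main(String[] args) {
        System.out.println(parseExact("yyyyMMdd", "20100901") != null);           // exact match
        System.out.println(parseExact("yyyyMMdd", "20100901andGarbage") != null); // trailing text
        System.out.println(parseExact("yyyy/MM/dd", "2010/01/5") != null);        // missing zero padding
    }
}
```

Only the first call should yield a Date. Like the class above, this creates no thread-safety guarantees of its own; it simply avoids sharing a formatter by building one per call.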

Solution 2

java.time

java.time is the modern Java date and time API and behaves the way you had expected. So it’s a matter of a simple translation of your code:

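import java.time.LocalDate;
import java.time.format.DateTimeFormatter;
import java.time.format.DateTimeParseException;

private static final DateTimeFormatter formatter1 = DateTimeFormatter.ofPattern("dd.MM.yyyy");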
private static final DateTimeFormatter formatter2 = DateTimeFormatter.ofPattern("dd-MM-yyyy");
private static final DateTimeFormatter formatter3 = DateTimeFormatter.ofPattern("yyyy-MM-dd");

public static LocalDate parseDate(String dateString) {
    LocalDate parsedDate;
    try {
        parsedDate = LocalDate.parse(dateString, formatter1);
    } catch (DateTimeParseException dtpe1) {
        try {
            parsedDate = LocalDate.parse(dateString, formatter2);
        } catch (DateTimeParseException dtpe2) {
            parsedDate = LocalDate.parse(dateString, formatter3);
        }
    }
    return parsedDate;
}

(I put the formatters outside your method so they are not created anew for each call. You can put them inside if you prefer.)

Let’s try it out:

    LocalDate date = parseDate("2013-01-31");
    System.out.println(date);

Output is:

2013-01-31

For numbers, DateTimeFormatter.ofPattern takes the number of pattern letters to be the minimum field width. It furthermore assumes that the day of month is never more than two digits. So when trying the pattern dd-MM-yyyy on 2013-01-31, it parsed 20 as the day of month and then threw a DateTimeParseException because the next character was not a hyphen (dash). The method then went on to try the next formatter.
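
To see this rejection per formatter, and as one possible way to avoid the nested try blocks, here is a small sketch. The names FormatterWidths, tryParse and parseAny are invented for this example, and returning Optional instead of throwing is a design choice of the sketch, not part of the code above.

```java
import java.time.LocalDate;
import java.time.format.DateTimeFormatter;
import java.time.format.DateTimeParseException;
import java.util.Arrays;
import java.util.List;
import java.util.Optional;

class FormatterWidths {

    // The same three patterns as above, tried in this order.
    static final List<DateTimeFormatter> FORMATTERS = Arrays.asList(
            DateTimeFormatter.ofPattern("dd.MM.yyyy"),
            DateTimeFormatter.ofPattern("dd-MM-yyyy"),
            DateTimeFormatter.ofPattern("yyyy-MM-dd"));

    /** Parses with a single formatter; empty if the text does not fit it. */
    static Optional<LocalDate> tryParse(String text, DateTimeFormatter formatter) {
        try {
            return Optional.of(LocalDate.parse(text, formatter));
        } catch (DateTimeParseException e) {
            return Optional.empty();
        }
    }

    /** Returns the result of the first formatter that accepts the text. */
    static Optional<LocalDate> parseAny(String text) {
        for (DateTimeFormatter formatter : FORMATTERS) {
            Optional<LocalDate> result = tryParse(text, formatter);
            if (result.isPresent()) {
                return result;
            }
        }
        return Optional.empty();
    }

    public static void main(String[] args) {
        // dd-MM-yyyy rejects the ISO-style string instead of misreading it.
        System.out.println(tryParse("2013-01-31", FORMATTERS.get(1)).isPresent());
        // The third formatter picks it up.
        System.out.println(parseAny("2013-01-31"));
        // Unpadded fields fit none of the three patterns.
        System.out.println(parseAny("1-1-2013"));
    }
}
```

A list of formatters keeps the fallback order in one place, so adding a fourth format does not require another level of nesting.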

What went wrong in your code

The SimpleDateFormat class that you tried to use is notoriously troublesome and fortunately long outdated. You met but one of the many problems with it. Repeating the important sentence from the documentation of how it handles numbers from the answer by Teetoo:

For parsing, the number of pattern letters is ignored unless it's needed to separate two adjacent fields.

So new SimpleDateFormat("dd-MM-yyyy") happily parses 2013 as the day of month, 01 as the month and 31 as the year. Next we should have expected it to throw an exception because there aren’t 2013 days in January year 31. But a SimpleDateFormat with default settings doesn’t do that. It just keeps counting days through the following months and years and ends up at July 5 year 36, five and a half years later, the result you observed.
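
The rollover can be reproduced in isolation. This is a minimal sketch; LenientRollover and reformat are names made up here, and Locale.ROOT is an assumption used to avoid locale-specific calendars or digits.

```java
import java.text.ParseException;
import java.text.SimpleDateFormat;
import java.util.Date;
import java.util.Locale;

class LenientRollover {

    /**
     * Parses text with inPattern and prints the result as dd.MM.yyyy.
     * Returns null if parsing fails. (Hypothetical helper.)
     */
    static String reformat(String text, String inPattern, boolean lenient) {
        SimpleDateFormat in = new SimpleDateFormat(inPattern, Locale.ROOT);
        in.setLenient(lenient);
        try {
            Date d = in.parse(text);
            return new SimpleDateFormat("dd.MM.yyyy", Locale.ROOT).format(d);
        } catch (ParseException e) {
            return null;
        }
    }

    public static void main(String[] args) {
        // Lenient (the default): "day 2013" of January, year 31, is rolled forward.
        System.out.println(reformat("2013-01-31", "dd-MM-yyyy", true));
        // Non-lenient: day 2013 is out of range, so parsing fails.
        System.out.println(reformat("2013-01-31", "dd-MM-yyyy", false));
    }
}
```

The lenient call should give 05.07.0036, the value reported in the question, while the non-lenient call returns null because a ParseException is thrown.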

Link

Oracle tutorial: Date Time explaining how to use java.time.

Solution 3

A workaround could be to test the yyyy-MM-dd format with a regex:

import java.text.ParseException;
import java.text.SimpleDateFormat;
import java.util.Date;

public static Date parseDate(String dateString) throws ParseException {
    SimpleDateFormat sdf = new SimpleDateFormat("dd.MM.yyyy");
    SimpleDateFormat sdf2 = new SimpleDateFormat("dd-MM-yyyy");
    SimpleDateFormat sdf3 = new SimpleDateFormat("yyyy-MM-dd");

    // Check the shape of the string first: only a four-digit year followed by
    // two-digit month and day goes to the yyyy-MM-dd parser. Without this check
    // the lenient dd-MM-yyyy parser would accept the string and return a wrong date.
    if (dateString.matches("\\d{4}-\\d{2}-\\d{2}")) {
        return sdf3.parse(dateString);
    }

    // Otherwise fall back to the two day-first formats.
    try {
        return sdf2.parse(dateString);
    } catch (ParseException ex) {
        return sdf.parse(dateString);
    }
}

Solution 4

It is documented in the SimpleDateFormat javadoc:

For formatting, the number of pattern letters is the minimum number of digits, and shorter numbers are zero-padded to this amount. For parsing, the number of pattern letters is ignored unless it's needed to separate two adjacent fields.
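
A short sketch of what that sentence means in practice. PatternLetterCount and reformat are hypothetical names for this example, and Locale.ROOT is an assumption to keep the output independent of the default locale.

```java
import java.text.ParseException;
import java.text.SimpleDateFormat;
import java.util.Date;
import java.util.Locale;

class PatternLetterCount {

    /** Parses text with inPattern and formats the result with outPattern. */
    static String reformat(String text, String inPattern, String outPattern) {
        try {
            Date d = new SimpleDateFormat(inPattern, Locale.ROOT).parse(text);
            return new SimpleDateFormat(outPattern, Locale.ROOT).format(d);
        } catch (ParseException e) {
            throw new IllegalArgumentException("Cannot parse: " + text, e);
        }
    }

    public static void main(String[] args) {
        // Parsing: "dd" and "MM" accept one-digit values; formatting pads them back to two.
        System.out.println(reformat("5.1.2013", "dd.MM.yyyy", "dd.MM.yyyy"));
        // Formatting: with single letters no padding is added.
        System.out.println(reformat("05.01.2013", "dd.MM.yyyy", "d.M.y"));
        // Adjacent fields: here the letter counts are needed to split the digits.
        System.out.println(reformat("20130131", "yyyyMMdd", "dd.MM.yyyy"));
    }
}
```

This is also why the patterns d.M.y, d-M-y and y-M-d from the question behave the same as the longer ones when parsing: with separators between the fields, the letter count plays no role.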

Author by

das Keks

Computer science student and passionate software developer.

Updated on June 04, 2022

Comments

  • das Keks
    das Keks almost 2 years

    I'm trying to parse a date String which can have three different formats. Even though the String should not match the second pattern, it somehow does and therefore returns a wrong date.

    That's my code:

    import java.text.ParseException;
    import java.text.SimpleDateFormat;
    import java.util.Date;
    
    public class Start {
    
        public static void main(String[] args) {
            SimpleDateFormat sdf = new SimpleDateFormat("dd.MM.yyyy");
            try{
                System.out.println(sdf.format(parseDate("2013-01-31")));
            } catch(ParseException ex){
                System.out.println("Unable to parse");
            }
        }
    
        public static Date parseDate(String dateString) throws ParseException{
            SimpleDateFormat sdf = new SimpleDateFormat("dd.MM.yyyy");
            SimpleDateFormat sdf2 = new SimpleDateFormat("dd-MM-yyyy");
            SimpleDateFormat sdf3 = new SimpleDateFormat("yyyy-MM-dd");
    
            Date parsedDate;
            try {
                parsedDate = sdf.parse(dateString);
            } catch (ParseException ex) {
                try{
                    parsedDate = sdf2.parse(dateString);
                } catch (ParseException ex2){
                    parsedDate = sdf3.parse(dateString);    
                }
            }
            return parsedDate;
        }
    }
    

    With the input 2013-01-31 I get the output 05.07.0036.

    If I try to parse 31-01-2013 or 31.01.2013 I get 31.01.2013 as expected.

    I noticed that the program gives me exactly the same output if I set the patterns like this:

    SimpleDateFormat sdf = new SimpleDateFormat("d.M.y");
    SimpleDateFormat sdf2 = new SimpleDateFormat("d-M-y");
    SimpleDateFormat sdf3 = new SimpleDateFormat("y-M-d");
    

    Why does it ignore the number of chars in my pattern?