decoding \u200e to string


You have a U+200E LEFT-TO-RIGHT MARK character in your input. It's a non-printing typesetting directive that tells whatever is displaying the text to switch to left-to-right mode. When the string is printed to a console that already displays text left-to-right (e.g. the vast majority of terminals in the western world), it looks no different from the same string printed without the marker.
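To check for a hidden character like this yourself, look at the string's length or its ascii() form, and ask the standard unicodedata module what the character is. A small sketch using the sample value from the question (the variable names are just for illustration):

```python
import unicodedata

date_registered = '\u200e07-30-200702:38 PM'

# print() shows nothing odd, but the string is one character longer than it looks
print(len(date_registered))          # 19, not 18

# ascii() escapes every non-ASCII character, which makes the hidden mark visible
print(ascii(date_registered))        # prints '\u200e07-30-200702:38 PM' (quotes included)

# Look up the Unicode name and general category of the first character
first = date_registered[0]
print(unicodedata.name(first))       # LEFT-TO-RIGHT MARK
print(unicodedata.category(first))   # Cf, i.e. an invisible "format" character
```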

Since it is not part of the date, you could just strip such characters (str.strip() removes them only from the start and end of the string):

datetime.strptime(dateRegistered.strip('\u200e'), '%m-%d-%Y%I:%M %p')

Or, if it is always present, add it explicitly to the format you are parsing, just like the -, : and space characters that are already part of your format:

datetime.strptime(dateRegistered, '\u200e%m-%d-%Y%I:%M %p')

Demo:

>>> from datetime import datetime
>>> dateRegistered = '\u200e07-30-200702:38 PM'
>>> datetime.strptime(dateRegistered.strip('\u200e'), '%m-%d-%Y%I:%M %p')
datetime.datetime(2007, 7, 30, 14, 38)
>>> datetime.strptime(dateRegistered, '\u200e%m-%d-%Y%I:%M %p')
datetime.datetime(2007, 7, 30, 14, 38)
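Both approaches above target U+200E specifically, and strip() only looks at the ends of the string. If the input might carry other invisible marks (for example U+200F RIGHT-TO-LEFT MARK) or place them mid-string, one option is to drop every character in the Unicode Cf ("format") category before parsing. This is a sketch under the assumption that no Cf character is ever meaningful in these date strings; remove_format_chars and the sample values are hypothetical:

```python
import unicodedata
from datetime import datetime


def remove_format_chars(text):
    """Drop every Unicode 'format' character (general category Cf).

    Cf covers invisible directives such as U+200E LEFT-TO-RIGHT MARK and
    U+200F RIGHT-TO-LEFT MARK, and they are removed wherever they appear,
    not just at the ends as with str.strip().
    """
    return ''.join(ch for ch in text if unicodedata.category(ch) != 'Cf')


# Hypothetical inputs: a mark at the start, one in the middle, and a right-to-left mark
samples = [
    '\u200e07-30-200702:38 PM',
    '07-30-2007\u200e02:38 PM',
    '\u200f07-30-200702:38 PM',
]

for raw in samples:
    parsed = datetime.strptime(remove_format_chars(raw), '%m-%d-%Y%I:%M %p')
    print(parsed)  # 2007-07-30 14:38:00 for each sample
```

Filtering by category is broader than naming U+200E explicitly. That is the point, but also the risk: it would silently remove any other Cf character too, such as U+00AD SOFT HYPHEN.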
Author: M24

Updated on June 04, 2022

Comments

  • M24 (almost 2 years ago)

    In Python 3, I receive the following error message:

    ValueError: time data '\u200e07-30-200702:38 PM' does not match format '%m-%d-%Y%I:%M %p'

    from datetime import datetime
    
    dateRegistered = '\u200e07-30-200702:38 PM'
    # dateRegistered = '07-30-200702:38 PM'
    # Raises ValueError while the input contains the invisible U+200E character
    dateRegistered = datetime.strptime(dateRegistered, '%m-%d-%Y%I:%M %p')
    print(dateRegistered)
    

    The code above reproduces the issue. It works if I use the commented-out assignment instead. It seems the string I am receiving is encoded, but I could not find out which encoding it uses. Or do I have a non-printable character in my string?

    >>> print('\u200e07-30-200702:38 PM')
    07-30-200702:38 PM
    
  • Corvax (almost 2 years ago)
    Be careful, this might work incorrectly if your date starts with 20:

    >>> dateRegistered = '\u200e2007-30-200702:38 PM'
    >>> dateRegistered.strip('\u200e')
    '7-30-200702:38 PM'

    It's safer to use dateRegistered.replace('\u200e', '').
  • Martijn Pieters (almost 2 years ago)
    @Corvax: that's not how Python 3 string syntax works. '\u200e' is a single character. There are no 2 or 0 characters in that string, so they won't be stripped either:

    >>> dateRegistered = '\u200e2007-30-200702:38 PM'
    >>> dateRegistered.strip('\u200e')
    '2007-30-200702:38 PM'
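The disagreement in the two comments above comes down to whether the \u200e escape is interpreted. A minimal Python 3 sketch: in an ordinary literal it is one character, while in a raw string (as in a Python 2 byte-string literal, where \u is not an escape sequence) it stays six characters, and strip() then also removes the leading 200 of the year. That is presumably how the '7-30-...' output arose. The sketch also shows a separate, real difference: strip() only touches the ends of a string, whereas replace() removes the mark wherever it occurs.

```python
# In Python 3, '\u200e' in an ordinary string literal is ONE character,
# so strip() removes only that character, and only from the two ends.
value = '\u200e2007-30-200702:38 PM'
print(value.strip('\u200e'))        # 2007-30-200702:38 PM

# In a raw string the escape is NOT interpreted: r'\u200e' is the six
# characters \ u 2 0 0 e.
raw_value = r'\u200e2007-30-200702:38 PM'
print(len(r'\u200e'))               # 6
# strip() treats its argument as a set of characters, so it also removes
# the leading "200" of the year, matching the output quoted in the comment.
print(raw_value.strip(r'\u200e'))   # 7-30-200702:38 PM

# strip() ignores marks in the middle of a string; replace() removes all of them.
middle = '07-30-2007\u200e02:38 PM'
print(middle.strip('\u200e') == middle)   # True
print(middle.replace('\u200e', ''))       # 07-30-200702:38 PM
```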