Parse a Pandas column to Datetime when importing table from SQL database and filtering rows by date
Solution 1
pandas already reads that as a datetime
object! So what you want is to select rows between two dates and you can do that by masking:
df_masked = df[(df.date > '2012-04-01') & (df.date < '2012-04-04')]
Because you said that you were getting an error from the string for some reason, try this:
df_masked = df[(df.date > datetime.date(2012,4,1)) & (df.date < datetime.date(2012,4,4))]
Solution 2
Pandas is aware of the object datetime but when you use some of the import functions it is taken as a string. So what you need to do is make sure the column is set as the datetime type not as a string. Then you can make your query.
df['date'] = pd.to_datetime(df['date'])
df_masked = df[(df['date'] > datetime.date(2012,4,1)) & (df['date'] < datetime.date(2012,4,4))]
Solution 3
You probably need apply
, so something like:
df['date'] = df['date'].apply(dateutil.parser.parse)
Without an example of the column I can't guarantee this will work, but something in that direction should help you to carry on.
Solution 4
Don't confuse datetime.date
with Pandas pd.Timestamp
A "Pandas datetime
series" contains pd.Timestamp
elements, not datetime.date
elements. The recommended solution for Pandas:
s = pd.to_datetime(s) # convert series to Pandas
mask = s > '2018-03-10' # calculate Boolean mask against Pandas-compatible object
The top answers have issues:
- @RyanSaxe's accepted answer's first attempt doesn't work; the second answer is inefficient.
- As of Pandas v0.23.0, @Keith's highly upvoted answer doesn't work; it gives
TypeError
.
Any good Pandas solution must ensure:
- The series is a Pandas
datetime
series, notobject
dtype. - The
datetime
series is compared to a compatible object, e.g.pd.Timestamp
, or string in the correct format.
Here's a demo with benchmarking, demonstrating that the one-off cost of conversion can be immediately offset by a single operation:
from datetime import date
L = [date(2018, 1, 10), date(2018, 5, 20), date(2018, 10, 30), date(2018, 11, 11)]
s = pd.Series(L*10**5)
a = s > date(2018, 3, 10) # accepted solution #2, inefficient
b = pd.to_datetime(s) > '2018-03-10' # more efficient, including datetime conversion
assert a.equals(b) # check solutions give same result
%timeit s > date(2018, 3, 10) # 40.5 ms
%timeit pd.to_datetime(s) > '2018-03-10' # 33.7 ms
s = pd.to_datetime(s)
%timeit s > '2018-03-10' # 2.85 ms
Solution 5
You should iterate over the items and parse them independently, then construct a new list.
df['date'] = [dateutil.parser.parse(x) for x in df['date']]
Nyxynyx
Hello :) I have no formal education in programming :( And I need your help! :D These days its web development: Node.js Meteor.js Python PHP Laravel Javascript / jQuery d3.js MySQL PostgreSQL MongoDB PostGIS
Updated on October 02, 2020Comments
-
Nyxynyx over 3 years
I have a
DataFrame
with column nameddate
. How can we convert/parse the 'date' column to aDateTime
object?I loaded the date column from a Postgresql database using
sql.read_frame()
. An example of thedate
column is2013-04-04
.What I am trying to do is to select all rows in a dataframe that has their date columns within a certain period, like after
2013-04-01
and before2013-04-04
.My attempt below gives the error
'Series' object has no attribute 'read'
Attempt
import dateutil df['date'] = dateutil.parser.parse(df['date'])
Error
AttributeError Traceback (most recent call last) <ipython-input-636-9b19aa5f989c> in <module>() 15 16 # Parse 'Date' Column to Datetime ---> 17 df['date'] = dateutil.parser.parse(df['date']) 18 19 # SELECT RECENT SALES C:\Python27\lib\site-packages\dateutil\parser.pyc in parse(timestr, parserinfo, **kwargs) 695 return parser(parserinfo).parse(timestr, **kwargs) 696 else: --> 697 return DEFAULTPARSER.parse(timestr, **kwargs) 698 699 C:\Python27\lib\site-packages\dateutil\parser.pyc in parse(self, timestr, default, ignoretz, tzinfos, **kwargs) 299 default = datetime.datetime.now().replace(hour=0, minute=0, 300 second=0, microsecond=0) --> 301 res = self._parse(timestr, **kwargs) 302 if res is None: 303 raise ValueError, "unknown string format" C:\Python27\lib\site-packages\dateutil\parser.pyc in _parse(self, timestr, dayfirst, yearfirst, fuzzy) 347 yearfirst = info.yearfirst 348 res = self._result() --> 349 l = _timelex.split(timestr) 350 try: 351 C:\Python27\lib\site-packages\dateutil\parser.pyc in split(cls, s) 141 142 def split(cls, s): --> 143 return list(cls(s)) 144 split = classmethod(split) 145 C:\Python27\lib\site-packages\dateutil\parser.pyc in next(self) 135 136 def next(self): --> 137 token = self.get_token() 138 if token is None: 139 raise StopIteration C:\Python27\lib\site-packages\dateutil\parser.pyc in get_token(self) 66 nextchar = self.charstack.pop(0) 67 else: ---> 68 nextchar = self.instream.read(1) 69 while nextchar == '\x00': 70 nextchar = self.instream.read(1) AttributeError: 'Series' object has no attribute 'read'
df['date'].apply(dateutil.parser.parse)
gives me the errorAttributeError: 'datetime.date' object has no attribute 'read'
df['date'].truncate(after='2013/04/01')
gives the errorTypeError: can't compare datetime.datetime to long
df['date'].dtype
returnsdtype('O')
. Is it already adatetime
object? -
Nyxynyx almost 11 yearsThanks, I tried
df['date'].apply(dateutil.parser.parse)
and it gave ethe error.AttributeError: 'datetime.date' object has no attribute 'read'
. An example of the column is2013-04-04
. The entire dataframe was loaded from a PostgreSQL database usingsql.readframe()
. -
Nyxynyx almost 11 years
df = df[df.date > '2012-01-01']
gives me an errorTypeError: can't compare datetime.date to str
. -
Ryan Saxe almost 11 yearsI use this all the time! That's very odd...your question is very similar to one I asked and I was given this answer and it worked. See it here
-
Nyxynyx almost 11 yearsYes.. it works when I created the dataframe manually... but if I create the dataframe from a SQL database using
sql.read_frame
,'2012-01-01'
gets treated as a string? -
Nyxynyx almost 11 yearsTrying
df[df.date > dateutil.parser.parse('2013-01-01') ]
gives meTypeError: can't compare datetime.datetime to datetime.date
-
Nyxynyx almost 11 yearsdf.date is type object, but i think
2013-01-01
is treated as a string. The error changes from having astr
to having adatetime.date
when I useddateutil.parser.parse()
as in above comments -
Ryan Saxe almost 11 yearsI added something, try that!