TypeError: unsupported operand type(s) for -: 'str' and 'str' in python 3.x Anaconda
Solution 1
I think you need to change to header=0 so that the first row is read as the header; the column names are then replaced by the list cols.
If there is still a problem, use to_numeric, because some values in StartTime and StopTime are strings. These are parsed to NaN, replaced by 0, and finally the column is converted to int:
import pandas as pd
import numpy as np

cols = ['UserId', 'UserMAC', 'HotspotID', 'StartTime', 'StopTime']
df = pd.read_csv('canada_mini_unixtime.csv', header=0, names=cols)
#print (df)
df['StartTime'] = pd.to_numeric(df['StartTime'], errors='coerce').fillna(0).astype(int)
df['StopTime'] = pd.to_numeric(df['StopTime'], errors='coerce').fillna(0).astype(int)
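To see why header=0 and to_numeric matter, here is a minimal sketch on a made-up two-row sample. The CSV text and the 'unknown' value are assumptions for illustration only, not the asker's actual data.

```python
import io
import pandas as pd

# Hypothetical sample; the real file's contents are an assumption.
csv_text = (
    "UserId,UserMAC,HotspotID,StartTime,StopTime\n"
    "u1,aa:bb,h1,1262304000,1262305800\n"
    "u2,cc:dd,h2,1262307600,unknown\n"
)
cols = ['UserId', 'UserMAC', 'HotspotID', 'StartTime', 'StopTime']

# header=None keeps the header row as data, so every value in the
# column is read as text and arithmetic on it fails.
raw = pd.read_csv(io.StringIO(csv_text), header=None, names=cols)
print(raw['StartTime'].tolist())

# header=0 consumes the first row as the header; names=cols replaces it.
df = pd.read_csv(io.StringIO(csv_text), header=0, names=cols)

# 'unknown' cannot be parsed, so it becomes NaN, then 0, then an int.
df['StopTime'] = (
    pd.to_numeric(df['StopTime'], errors='coerce').fillna(0).astype(int)
)
print(df[['StartTime', 'StopTime']])
```

Note that fillna(0) turns unparseable stop times into 0, which may or may not be what you want for the later interval arithmetic; dropping those rows is another option.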
The next part is unchanged:
df['m'] = df.StopTime + df.StartTime
df['d'] = df.StopTime - df.StartTime
start = pd.to_datetime(df.StartTime.min(), unit='s').date()
end = pd.to_datetime(df.StopTime.max(), unit='s').date() + pd.Timedelta(days=1)
freq = '1H' # 1 Hour frequency
idx = pd.date_range(start, end, freq=freq)
r = pd.DataFrame(index=idx)
# pd.Timestamp replaces the deprecated pd.datetime alias
r['start'] = (r.index - pd.Timestamp(1970, 1, 1)).total_seconds().astype(np.int64)
# 1 hour in seconds, minus one second (so that we will not count it twice)
interval = 60*60 - 1
r['LogCount'] = 0
r['UniqueIDCount'] = 0
ix is deprecated in recent versions of pandas, so use loc, with the column name passed inside the []:
for i, row in r.iterrows():
    # intervals overlap test
    # https://en.wikipedia.org/wiki/Interval_tree#Overlap_test
    # i've slightly simplified the calculations of m and d
    # by getting rid of division by 2,
    # because it can be done eliminating common terms
    u = df.loc[np.abs(df.m - 2*row.start - interval) < df.d + interval, 'UserId']
    r.loc[i, ['LogCount', 'UniqueIDCount']] = [len(u), u.nunique()]
r['Date'] = pd.to_datetime(r.start, unit='s').dt.date
# weekday_name was removed in pandas 1.0; day_name() is its replacement
r['Day'] = pd.to_datetime(r.start, unit='s').dt.day_name().str[:3]
r['StartTime'] = pd.to_datetime(r.start, unit='s').dt.time
r['EndTime'] = pd.to_datetime(r.start + interval + 1, unit='s').dt.time
print (r)
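The overlap test in the loop compares doubled centers and doubled half-widths, which is why m and d are a plain sum and difference. A small sketch in plain Python (the function names are made up for illustration) suggests it agrees with the more familiar "each interval starts before the other ends" check:

```python
def overlaps_direct(a0, a1, b0, b1):
    """Overlap check: each interval starts before the other one ends."""
    return a0 < b1 and b0 < a1


def overlaps_center(a0, a1, b0, b1):
    """Same check written the way the loop above writes it."""
    m = a1 + a0         # twice the center of [a0, a1]
    d = a1 - a0         # twice the half-width of [a0, a1]
    interval = b1 - b0  # twice the half-width of [b0, b1]
    # 2*b0 + interval is twice the center of [b0, b1]
    return abs(m - 2 * b0 - interval) < d + interval


# Compare both forms on every pair of small integer intervals.
agree = all(
    overlaps_direct(a0, a1, b0, b1) == overlaps_center(a0, a1, b0, b1)
    for a0 in range(6) for a1 in range(a0, 6)
    for b0 in range(6) for b1 in range(b0, 6)
)
print(agree)
```

Because the comparison is strict, intervals that only touch at an endpoint are not counted as overlapping.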
Solution 2
df['d'] = df.StopTime - df.StartTime is attempting to subtract a string from another string. I don't know what your data looks like, but chances are that you want to parse StopTime and StartTime as dates. Try df = pd.read_csv(fn, header=None, names=cols, parse_dates=[3,4]) instead of df = pd.read_csv(fn, header=None, names=cols).
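A minimal sketch of the failure itself, using made-up values stored as text (object dtype stands in for a column that was read as strings). It converts with pd.to_numeric rather than parse_dates, on the assumption that the columns hold Unix seconds, as the unit='s' calls in the question suggest:

```python
import pandas as pd

# Hypothetical values; object dtype mimics a column read as text.
start = pd.Series(["1262304000", "1262307600"], dtype=object)
stop = pd.Series(["1262305800", "1262311200"], dtype=object)

# Subtracting text from text is not defined, so pandas raises TypeError.
try:
    stop - start
    raised = False
except TypeError as exc:
    raised = True
    print("TypeError:", exc)

# After converting both columns to numbers the subtraction works.
duration = pd.to_numeric(stop) - pd.to_numeric(start)
print(duration.tolist())
```

If the columns are parsed into datetimes instead, the later integer arithmetic on m and d no longer applies as written, which matches the datetime/timedelta error reported in the comments.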
Sitz Blogz
Updated on May 15, 2020
Comments
-
Sitz Blogz about 4 years
I am trying to count some instances per hour in a large dataset. The code below seems to work fine on Python 2.7, but I had to upgrade it to the latest 3.x version of Python with all updated packages on Anaconda. When I try to execute the program, I get the following str error.
Code:
import pandas as pd
from datetime import datetime,time
import numpy as np
fn = r'00_input.csv'
cols = ['UserId', 'UserMAC', 'HotspotID', 'StartTime', 'StopTime']
df = pd.read_csv(fn, header=None, names=cols)
df['m'] = df.StopTime + df.StartTime
df['d'] = df.StopTime - df.StartTime
# 'start' and 'end' for the reporting DF: `r`
# which will contain equal intervals (1 hour in this case)
start = pd.to_datetime(df.StartTime.min(), unit='s').date()
end = pd.to_datetime(df.StopTime.max(), unit='s').date() + pd.Timedelta(days=1)
# building reporting DF: `r`
freq = '1H' # 1 Hour frequency
idx = pd.date_range(start, end, freq=freq)
r = pd.DataFrame(index=idx)
r['start'] = (r.index - pd.datetime(1970,1,1)).total_seconds().astype(np.int64)
# 1 hour in seconds, minus one second (so that we will not count it twice)
interval = 60*60 - 1
r['LogCount'] = 0
r['UniqueIDCount'] = 0
for i, row in r.iterrows():
    # intervals overlap test
    # https://en.wikipedia.org/wiki/Interval_tree#Overlap_test
    # i've slightly simplified the calculations of m and d
    # by getting rid of division by 2,
    # because it can be done eliminating common terms
    u = df[np.abs(df.m - 2*row.start - interval) < df.d + interval].UserID
    r.ix[i, ['LogCount', 'UniqueIDCount']] = [len(u), u.nunique()]
r['Date'] = pd.to_datetime(r.start, unit='s').dt.date
r['Day'] = pd.to_datetime(r.start, unit='s').dt.weekday_name.str[:3]
r['StartTime'] = pd.to_datetime(r.start, unit='s').dt.time
r['EndTime'] = pd.to_datetime(r.start + interval + 1, unit='s').dt.time
#r.to_csv('results.csv', index=False)
#print(r[r.LogCount > 0])
#print (r['StartTime'], r['EndTime'], r['Day'], r['LogCount'], r['UniqueIDCount'])
rout = r[['Date', 'StartTime', 'EndTime', 'Day', 'LogCount', 'UniqueIDCount'] ]
#print rout
rout.to_csv('o_1_hour.csv', index=False, header=False)
Where do I make changes to get an error-free execution?
Error:
File "C:\Program Files\Anaconda3\lib\site-packages\pandas\core\ops.py", line 686, in <lambda>
    lambda x: op(x, rvalues))
TypeError: unsupported operand type(s) for -: 'str' and 'str'
Appreciate the Help, Thanks in advance
-
Sitz Blogz about 7 years
Thank you for the answer. When I changed the statement, I got this error:
TypeError: Can't convert 'int' object to str implicitly
-
Sitz Blogz about 7 years
Sure, thank you. I'll make the changes, try to execute the program, and let you know. Thanks again.
-
Sitz Blogz about 7 years
I have a bunch of code to migrate from 2.7 to 3.x. I hope this works for me. One down from the pending heap also counts. Thanks a bunch.
-
jezrael about 7 years
Yes, I started with Python 2 too, so I know how it is. Good luck!
-
Sitz Blogz about 7 years
It gives me an error in another place: ` in <module> u = df[np.abs(df.StartTime - 2*row.start - interval) < df.StopTime + interval].UserID`
TypeError: incompatible type for a datetime/timedelta operation [__sub__]
-
jezrael about 7 years
Hmmm, I was wrong. There is another problem. I edited the answer.
-
Sitz Blogz about 7 years
I made all the changes you mentioned in the answer, but unfortunately the outputs are 0. This is a program to find the log of connections every hour and the unique connections in one hour.
-
jezrael about 7 years
Are you sure the data are OK? Is it possible to test in Python 2 whether the output is the same?
-
Sitz Blogz about 7 years
Yes, I am sure the Python 2 version works, as I have generated outputs for datasets with it. I will run the Python 2 code on the same dataset and let you know.
-
Sitz Blogz about 7 years
The Python 2.7 execution output is in the Dropbox, but rolling back and working with this dataset and the same program isn't working: dropbox.com/s/fcx683fgqctsqte/o_1_hour.csv?dl=0
-
jezrael about 7 years
Thank you, but I am offline now, only on my phone. Is it possible to send me your Python 2 code with some sample data by email? I still don't understand the difference; the code should work the same way in Python 2 and Python 3. If the data are confidential, is it possible to anonymize them?
-
Sitz Blogz about 7 years
Sure, I can share the code and input samples from a folder on Dropbox; this data is already anonymous. Can you please share your email ID?
-
Sitz Blogz about 7 years
Shared the Dropbox repository.
-
jezrael about 7 years
Thank you for sharing; I think you can see the new file 0_time_split_jez.py. But I am a bit confused: how is it possible to compare outputs if the inputs are not the same? Also, I think the problem is in something else, because there is no Python 2 to Python 3 conversion involved, like map, print... What do you think?
-
Sitz Blogz about 7 years
The 'dart' input and output are both given; the dart input is a mini version, and the Canada mini input is what I want to work on now. It might not be only Python 2 vs 3 but also the supporting packages like pandas, numpy and others. I did try to downgrade them as well and check, but getting the same combination of working versions looks like a time-consuming process. Hence I wanted to upgrade the code.
-
jezrael about 7 years
OK, I tried the Canada mini input, but how can I check whether the output is correct if it is impossible to compare it with o_1_hour.csv? At a minimum, there is a difference in length and in dates (2010 vs 2004).
-
Sitz Blogz about 7 years
Can I add you on Google Hangouts chat, please?
-
jezrael about 7 years
Invitation sent.
-
Sitz Blogz about 7 years
I did it from my Gmail; you have to accept it: "phani.lav()gmail()com"
-
jezrael about 7 years
OK, no problem.