Python Script slowing down as it progresses?
Solution 1
This would be a good time to look at a profiler. You can profile the code to determine where the time is being spent. It seems likely that your issue is in the simulation code, but without being able to see that code, the best help you're likely to get is going to be vague.
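As a rough illustration of the profiling step, here is a minimal sketch using the standard library's cProfile and pstats modules. The simulate function is a hypothetical stand-in for the real simulation (it is not the asker's code), and profile_call is just a helper name chosen for this example.

```python
import cProfile
import io
import pstats


def simulate(periods=200):
    """Hypothetical stand-in for a simulation: re-sums a growing list each step."""
    history = []
    total = 0.0
    for t in range(periods):
        history.append(t)
        total += sum(history)  # this cost grows with len(history)
    return total


def profile_call(func, *args, top=5):
    """Run func under cProfile and return (result, report text)."""
    profiler = cProfile.Profile()
    result = profiler.runcall(func, *args)
    buffer = io.StringIO()
    stats = pstats.Stats(profiler, stream=buffer)
    # Sort by cumulative time and keep only the `top` most expensive entries.
    stats.sort_stats("cumulative").print_stats(top)
    return result, buffer.getvalue()


if __name__ == "__main__":
    result, report = profile_call(simulate, 200)
    print(report)
```

The report lists the most expensive calls first, which is usually enough to see whether the time goes into copying, summing, or the random number generation.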
Edit in light of added code:
You're doing a fair amount of list copying, which, while not terribly expensive for any single copy, can add up to a lot of time when it happens inside nested loops.
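To get a feel for the cost, the sketch below times full slice copies of lists of different lengths with timeit. The copy_tail helper is a hypothetical alternative that only makes sense if the inner simulation needs just the most recent entries; whether that holds for your model is an assumption you would have to check.

```python
import timeit


def time_full_copy(length, repeats=200):
    """Return the seconds spent making `repeats` full slice copies of a list."""
    data = [0] * length
    return timeit.timeit(lambda: data[:], number=repeats)


def copy_tail(data, keep):
    """Copy only the last `keep` items (hypothetical alternative, see lead-in)."""
    return data[-keep:] if keep > 0 else []


if __name__ == "__main__":
    # The time per copy should grow roughly in proportion to the list length.
    for length in (1_000, 10_000, 100_000):
        print(length, time_full_copy(length))
```

If the copied lists grow by one element per time period and are copied hundreds of times per period, the total copying work grows with the period number.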
I agree that your code is probably unnecessarily confusing, and I would advise you to clean it up. Changing the confusing names to meaningful ones may help you find where the problem is.
Finally, it may be the case that your simulation is simply computationally expensive. You might want to look into SciPy, pandas, or another Python mathematics package to get better performance and perhaps better tools for expressing the model you're simulating.
Solution 2
I experienced a similar problem with a Python 3.x script I wrote. The script randomly generated 1,000,000 (one million) JSON objects, writing them out to a file.
My problem was that the program was growing progressively slower as time proceeded. Here is a timestamp trace every 10,000 objects:
So far: Mar23-17:56:46: 0
So far: Mar23-17:56:48: 10000 ( 2 seconds)
So far: Mar23-17:56:50: 20000 ( 2 seconds)
So far: Mar23-17:56:55: 30000 ( 5 seconds)
So far: Mar23-17:57:01: 40000 ( 6 seconds)
So far: Mar23-17:57:09: 50000 ( 8 seconds)
So far: Mar23-17:57:18: 60000 ( 8 seconds)
So far: Mar23-17:57:29: 70000 (11 seconds)
So far: Mar23-17:57:42: 80000 (13 seconds)
So far: Mar23-17:57:56: 90000 (14 seconds)
So far: Mar23-17:58:13: 100000 (17 seconds)
So far: Mar23-17:58:30: 110000 (17 seconds)
So far: Mar23-17:58:51: 120000 (21 seconds)
So far: Mar23-17:59:12: 130000 (21 seconds)
So far: Mar23-17:59:35: 140000 (23 seconds)
As can be seen, the script takes progressively longer to generate groups of 10,000 records.
In my case it turned out to be the way I was generating unique ID numbers, each in the range of 10250000000000-10350000000000. To avoid regenerating the same ID twice, I stored a newly generated ID in a list, checking later that the ID does not exist in the list:
trekIdList = []
def GenerateRandomTrek ():
    global trekIdList
    while True:
        r = random.randint (10250000000000, 10350000000000)
        if not r in trekIdList:
            trekIdList.append (r)
            return r
The problem is that an unsorted list takes O(n) to search. As newly generated IDs are appended to the list, the time needed to traverse/search the list grows.
The solution was to switch to a dictionary (or map):
trekIdList = {}
. . .
def GenerateRandomTrek ():
    global trekIdList
    while True:
        r = random.randint (10250000000000, 10350000000000)
        if not r in trekIdList:
            trekIdList [r] = 1
            return r
The improvement was immediate:
So far: Mar23-18:11:30: 0
So far: Mar23-18:11:30: 10000
So far: Mar23-18:11:31: 20000
So far: Mar23-18:11:31: 30000
So far: Mar23-18:11:31: 40000
So far: Mar23-18:11:32: 50000
So far: Mar23-18:11:32: 60000
So far: Mar23-18:11:32: 70000
So far: Mar23-18:11:33: 80000
So far: Mar23-18:11:33: 90000
So far: Mar23-18:11:33: 100000
So far: Mar23-18:11:34: 110000
So far: Mar23-18:11:34: 120000
So far: Mar23-18:11:34: 130000
So far: Mar23-18:11:35: 140000
The reason is that looking up a key in a dictionary/map/hash is O(1) on average.
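The difference is easy to measure. The following is a small sketch, not the original script: it times membership tests against a list and a dict of the same size with timeit. The helper name membership_time and the sizes are made up for illustration.

```python
import timeit


def membership_time(container, probe, repeats=1000):
    """Return the seconds taken by `repeats` tests of `probe in container`."""
    return timeit.timeit(lambda: probe in container, number=repeats)


if __name__ == "__main__":
    for n in (1_000, 10_000, 100_000):
        as_list = list(range(n))
        as_dict = dict.fromkeys(as_list, 1)
        missing = -1  # worst case for the list: every element gets compared
        print(n, membership_time(as_list, missing), membership_time(as_dict, missing))
```

The list timings should grow roughly with n, while the dict timings should stay about the same.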
Moral: When dealing with large numbers of items, use a dictionary/map (or a binary search over a sorted list) rather than searching an unordered list.
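Since only the keys matter here, a set would work just as well as a dictionary. Below is a minimal sketch of both options mentioned in the moral: a set-based ID generator and a binary search with the standard bisect module. The names make_unique_id and contains_sorted are hypothetical, and the ID range is the one quoted above.

```python
import bisect
import random

LOW, HIGH = 10250000000000, 10350000000000  # ID range from the answer above


def make_unique_id(seen, rng=random):
    """Draw IDs until one is unused, record it in the set `seen`, return it."""
    while True:
        candidate = rng.randint(LOW, HIGH)
        if candidate not in seen:  # average O(1) lookup in a set
            seen.add(candidate)
            return candidate


def contains_sorted(sorted_ids, value):
    """Binary search, O(log n), for `value` in an already sorted list."""
    index = bisect.bisect_left(sorted_ids, value)
    return index < len(sorted_ids) and sorted_ids[index] == value


if __name__ == "__main__":
    seen = set()
    ids = [make_unique_id(seen) for _ in range(100_000)]
    print(len(ids), len(set(ids)))
    print(contains_sorted(sorted(ids), ids[0]))
```

Note that keeping a list sorted while inserting into it still costs O(n) per insertion, so for this particular use case the set (or dictionary) is probably the simpler choice.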
Austin Wismer
Updated on June 14, 2022
Austin Wismer almost 2 years
I have a simulation running that has this basic structure:
from time import time

def CSV(*args):
    # write *args to .CSV file
    return

def timeleft(a,L,period):
    print(#details on how long last period took, ETA#)

for L in range(0,6,2):
    for a in range(1,100):
        timeA = time()
        for t in range(1,1000):
            ## Manufacturer in Supply Chain ##
            inventory_accounting_lists.append(#simple calculations#)
            # Simulation to determine the optimal B-value (Basestock level)
            for B in range(1,100):
                for tau in range(1,1000):
                    ## simple inventory accounting operations##
            ## Distributor in Supply Chain ##
            inventory_accounting_lists.append(#simple calculations#)
            # Simulation to determine the optimal B-value (Basestock level)
            for B in range(1,100):
                for tau in range(1,1000):
                    ## simple inventory accounting operations##
            ## Wholesaler in Supply Chain ##
            inventory_accounting_lists.append(#simple calculations#)
            # Simulation to determine the optimal B-value (Basestock level)
            for B in range(1,100):
                for tau in range(1,1000):
                    ## simple inventory accounting operations##
            ## Retailer in Supply Chain ##
            inventory_accounting_lists.append(#simple calculations#)
            # Simulation to determine the optimal B-value (Basestock level)
            for B in range(1,100):
                for tau in range(1,1000):
                    ## simple inventory accounting operations##
        CSV(Simulation_Results)
        timeB = time()
        timeleft(a,L,timeB-timeA)
As the script continues, it seems to get slower and slower. Here is how long one period takes for these values (the time increases linearly as a increases):
L = 0, a = 1: 1.15 minutes
L = 0, a = 99: 1.7 minutes
L = 2, a = 1: 2.7 minutes
L = 2, a = 99: 5.15 minutes
L = 4, a = 1: 4.5 minutes
L = 4, a = 15: 4.95 minutes (this is the latest value it has reached)
Why would each iteration take longer? Each iteration of the loop essentially resets everything except for a master global list, which is being added to each time. However, loops inside each "period" aren't accessing this master list -- they are accessing the same local list every time.
EDIT 1: I will post the simulation code here, in case anyone wants to wade through it, but I warn you, it is rather long, and the variable names are probably unnecessarily confusing.
#########
a = 0.01
L = 0
total = 1000
sim = 500
inv_cost = 1
bl_cost = 4
#########

# Functions
import random
from time import time
time0 = time()

# function to report ETA etc.
def timeleft(a,L,period_time):
    if L==0:
        periods_left = ((1-a)*100)-1+2*99
    if L==2:
        periods_left = ((1-a)*100)-1+99
    if L==4:
        periods_left = ((1-a)*100)-1+0*99
    minute_time = period_time/60
    minutes_left = (periods_left*period_time)/60
    hours_left = (periods_left*period_time)/3600
    percentage_complete = 100*((297-periods_left)/297)
    print("Time for last period = ","%.2f" % minute_time," minutes")
    print("%.2f" % percentage_complete,"% complete")
    if hours_left<1:
        print("%.2f" % minutes_left," minutes left")
    else:
        print("%.2f" % hours_left," hours left")
    print("")
    return

def dcopy(inList):
    if isinstance(inList, list):
        return list( map(dcopy, inList) )
    return inList

# Save values to .CSV file
def CSV(a,L,I_STD_1,I_STD_2,I_STD_3,I_STD_4,O_STD_0,
        O_STD_1,O_STD_2,O_STD_3,O_STD_4):
    pass

# Initialization
# These are the global, master lists of data
I_STD_1 = [[0],[0],[0]]
I_STD_2 = [[0],[0],[0]]
I_STD_3 = [[0],[0],[0]]
I_STD_4 = [[0],[0],[0]]
O_STD_0 = [[0],[0],[0]]
O_STD_1 = [[0],[0],[0]]
O_STD_2 = [[0],[0],[0]]
O_STD_3 = [[0],[0],[0]]
O_STD_4 = [[0],[0],[0]]

for L in range(0,6,2):
    # These are local lists that are appended to at the end of every period
    I_STD_1_L = []
    I_STD_2_L = []
    I_STD_3_L = []
    I_STD_4_L = []
    O_STD_0_L = []
    O_STD_1_L = []
    O_STD_2_L = []
    O_STD_3_L = []
    O_STD_4_L = []
    test = []
    for n in range(1,100): # THIS is the start of the 99 value loop
        a = n/100
        print ("L=",L,", alpha=",a)
        # Initialization for each Period
        F_1 = [0,10] # Forecast
        F_2 = [0,10]
        F_3 = [0,10]
        F_4 = [0,10]
        R_0 = [10] # Items Received
        R_1 = [10]
        R_2 = [10]
        R_3 = [10]
        R_4 = [10]
        for i in range(L):
            R_1.append(10)
            R_2.append(10)
            R_3.append(10)
            R_4.append(10)
        I_1 = [10] # Final Inventory
        I_2 = [10]
        I_3 = [10]
        I_4 = [10]
        IP_1 = [10+10*L] # Inventory Position
        IP_2 = [10+10*L]
        IP_3 = [10+10*L]
        IP_4 = [10+10*L]
        O_1 = [10] # Items Ordered
        O_2 = [10]
        O_3 = [10]
        O_4 = [10]
        BL_1 = [0] # Backlog
        BL_2 = [0]
        BL_3 = [0]
        BL_4 = [0]
        OH_1 = [20] # Items on Hand
        OH_2 = [20]
        OH_3 = [20]
        OH_4 = [20]
        OR_1 = [10] # Order received from customer
        OR_2 = [10]
        OR_3 = [10]
        OR_4 = [10]
        Db_1 = [10] # Running Average Demand
        Db_2 = [10]
        Db_3 = [10]
        Db_4 = [10]
        var_1 = [0] # Running Variance in Demand
        var_2 = [0]
        var_3 = [0]
        var_4 = [0]
        B_1 = [IP_1[0]+10] # Optimal Basestock
        B_2 = [IP_2[0]+10]
        B_3 = [IP_3[0]+10]
        B_4 = [IP_4[0]+10]
        D = [0,10] # End constomer demand
        for i in range(total+1):
            D.append(9)
            D.append(12)
            D.append(8)
            D.append(11)
        period = [0]
        from time import time
        timeA = time()
        # 1000 time periods t
        for t in range(1,total+1):
            period.append(t)

            #### MANUFACTURER ####
            # Manufacturing order from previous time period put into production
            R_4.append(O_4[t-1])
            #recieve shipment from supplier, calculate items OH HAND
            if I_4[t-1]<0:
                OH_4.append(R_4[t])
            else:
                OH_4.append(I_4[t-1]+R_4[t])
            # Recieve and dispatch order, update Inventory and Backlog for time t
            if (O_3[t-1] + BL_4[t-1]) <= OH_4[t]: # No Backlog
                I_4.append(OH_4[t] - (O_3[t-1] + BL_4[t-1]))
                BL_4.append(0)
                R_3.append(O_3[t-1]+BL_4[t-1])
            else:
                I_4.append(OH_4[t] - (O_3[t-1] + BL_4[t-1])) # Backlogged
                BL_4.append(-I_4[t])
                R_3.append(OH_4[t])
            # Update Inventory Position
            IP_4.append(IP_4[t-1] + O_4[t-1] - O_3[t-1])
            # Use exponential smoothing to forecast future demand
            future_demand = (1-a)*F_4[t] + a*O_3[t-1]
            F_4.append(future_demand)
            # Calculate D_bar(t) and Var(t)
            Db_4.append((1/t)*sum(O_3[0:t]))
            s = 0
            for i in range(0,t):
                s+=(O_3[i]-Db_4[t])**2
            if t==1:
                var_4.append(0) # var(1) = 0
            else:
                var_4.append((1/(t-1))*s)
            # Simulation to determine B(t)
            S_BC_4 = [10000000000]*10
            Run_4 = [0]*10
            for B in range(10,500):
                S_OH_4 = OH_4[:]
                S_I_4 = I_4[:]
                S_R_4 = R_4[:]
                S_BL_4 = BL_4[:]
                S_IP_4 = IP_4[:]
                S_O_4 = O_4[:]
                # Update O(t)(the period just before the simulation begins)
                # using the B value for the simulation
                if B - S_IP_4[t] > 0:
                    S_O_4.append(B - S_IP_4[t])
                else:
                    S_O_4.append(0)
                c = 0
                for i in range(t+1,t+sim+1):
                    S_R_4.append(S_O_4[i-1])
                    #simulate demand
                    demand = -1
                    while demand <0:
                        demand = random.normalvariate(F_4[t+1],(var_4[t])**(.5))
                    # Receive simulated shipment, calculate simulated items on hand
                    if S_I_4[i-1]<0:
                        S_OH_4.append(S_R_4[i])
                    else:
                        S_OH_4.append(S_I_4[i-1]+S_R_4[i])
                    # Receive and send order, update Inventory and Backlog (simulated)
                    owed = (demand + S_BL_4[i-1])
                    S_I_4.append(S_OH_4[i] - owed)
                    if owed <= S_OH_4[i]: # No Backlog
                        S_BL_4.append(0)
                        c += inv_cost*S_I_4[i]
                    else:
                        S_BL_4.append(-S_I_4[i]) # Backlogged
                        c += bl_cost*S_BL_4[i]
                    # Update Inventory Position
                    S_IP_4.append(S_IP_4[i-1] + S_O_4[i-1] - demand)
                    # Update Order, Upstream member dispatches goods
                    if (B-S_IP_4[i]) > 0:
                        S_O_4.append(B - S_IP_4[i])
                    else:
                        S_O_4.append(0)
                # Log Simulation costs for that B-value
                S_BC_4.append(c)
                # If the simulated costs are increasing, stop
                if B>11:
                    dummy = []
                    for i in range(0,10):
                        dummy.append(S_BC_4[B-i]-S_BC_4[B-i-1])
                    Run_4.append(sum(dummy)/float(len(dummy)))
                    if Run_4[B-3] > 0 and B>20:
                        break
                else:
                    Run_4.append(0)
            # Use minimum cost as new B(t)
            var = min((val, idx) for (idx, val) in enumerate(S_BC_4))
            optimal_B = var[1]
            B_4.append(optimal_B)
            # Calculate O(t)
            if B_4[t] - IP_4[t] > 0:
                O_4.append(B_4[t] - IP_4[t])
            else:
                O_4.append(0)

            #### DISTRIBUTOR ####
            #recieve shipment from supplier, calculate items OH HAND
            if I_3[t-1]<0:
                OH_3.append(R_3[t])
            else:
                OH_3.append(I_3[t-1]+R_3[t])
            # Recieve and dispatch order, update Inventory and Backlog for time t
            if (O_2[t-1] + BL_3[t-1]) <= OH_3[t]: # No Backlog
                I_3.append(OH_3[t] - (O_2[t-1] + BL_3[t-1]))
                BL_3.append(0)
                R_2.append(O_2[t-1]+BL_3[t-1])
            else:
                I_3.append(OH_3[t] - (O_2[t-1] + BL_3[t-1])) # Backlogged
                BL_3.append(-I_3[t])
                R_2.append(OH_3[t])
            # Update Inventory Position
            IP_3.append(IP_3[t-1] + O_3[t-1] - O_2[t-1])
            # Use exponential smoothing to forecast future demand
            future_demand = (1-a)*F_3[t] + a*O_2[t-1]
            F_3.append(future_demand)
            # Calculate D_bar(t) and Var(t)
            Db_3.append((1/t)*sum(O_2[0:t]))
            s = 0
            for i in range(0,t):
                s+=(O_2[i]-Db_3[t])**2
            if t==1:
                var_3.append(0) # var(1) = 0
            else:
                var_3.append((1/(t-1))*s)
            # Simulation to determine B(t)
            S_BC_3 = [10000000000]*10
            Run_3 = [0]*10
            for B in range(10,500):
                S_OH_3 = OH_3[:]
                S_I_3 = I_3[:]
                S_R_3 = R_3[:]
                S_BL_3 = BL_3[:]
                S_IP_3 = IP_3[:]
                S_O_3 = O_3[:]
                # Update O(t)(the period just before the simulation begins)
                # using the B value for the simulation
                if B - S_IP_3[t] > 0:
                    S_O_3.append(B - S_IP_3[t])
                else:
                    S_O_3.append(0)
                c = 0
                for i in range(t+1,t+sim+1):
                    #simulate demand
                    demand = -1
                    while demand <0:
                        demand = random.normalvariate(F_3[t+1],(var_3[t])**(.5))
                    S_R_3.append(S_O_3[i-1])
                    # Receive simulated shipment, calculate simulated items on hand
                    if S_I_3[i-1]<0:
                        S_OH_3.append(S_R_3[i])
                    else:
                        S_OH_3.append(S_I_3[i-1]+S_R_3[i])
                    # Receive and send order, update Inventory and Backlog (simulated)
                    owed = (demand + S_BL_3[i-1])
                    S_I_3.append(S_OH_3[i] - owed)
                    if owed <= S_OH_3[i]: # No Backlog
                        S_BL_3.append(0)
                        c += inv_cost*S_I_3[i]
                    else:
                        S_BL_3.append(-S_I_3[i]) # Backlogged
                        c += bl_cost*S_BL_3[i]
                    # Update Inventory Position
                    S_IP_3.append(S_IP_3[i-1] + S_O_3[i-1] - demand)
                    # Update Order, Upstream member dispatches goods
                    if (B-S_IP_3[i]) > 0:
                        S_O_3.append(B - S_IP_3[i])
                    else:
                        S_O_3.append(0)
                # Log Simulation costs for that B-value
                S_BC_3.append(c)
                # If the simulated costs are increasing, stop
                if B>11:
                    dummy = []
                    for i in range(0,10):
                        dummy.append(S_BC_3[B-i]-S_BC_3[B-i-1])
                    Run_3.append(sum(dummy)/float(len(dummy)))
                    if Run_3[B-3] > 0 and B>20:
                        break
                else:
                    Run_3.append(0)
            # Use minimum cost as new B(t)
            var = min((val, idx) for (idx, val) in enumerate(S_BC_3))
            optimal_B = var[1]
            B_3.append(optimal_B)
            # Calculate O(t)
            if B_3[t] - IP_3[t] > 0:
                O_3.append(B_3[t] - IP_3[t])
            else:
                O_3.append(0)

            #### WHOLESALER ####
            #recieve shipment from supplier, calculate items OH HAND
            if I_2[t-1]<0:
                OH_2.append(R_2[t])
            else:
                OH_2.append(I_2[t-1]+R_2[t])
            # Recieve and dispatch order, update Inventory and Backlog for time t
            if (O_1[t-1] + BL_2[t-1]) <= OH_2[t]: # No Backlog
                I_2.append(OH_2[t] - (O_1[t-1] + BL_2[t-1]))
                BL_2.append(0)
                R_1.append(O_1[t-1]+BL_2[t-1])
            else:
                I_2.append(OH_2[t] - (O_1[t-1] + BL_2[t-1])) # Backlogged
                BL_2.append(-I_2[t])
                R_1.append(OH_2[t])
            # Update Inventory Position
            IP_2.append(IP_2[t-1] + O_2[t-1] - O_1[t-1])
            # Use exponential smoothing to forecast future demand
            future_demand = (1-a)*F_2[t] + a*O_1[t-1]
            F_2.append(future_demand)
            # Calculate D_bar(t) and Var(t)
            Db_2.append((1/t)*sum(O_1[0:t]))
            s = 0
            for i in range(0,t):
                s+=(O_1[i]-Db_2[t])**2
            if t==1:
                var_2.append(0) # var(1) = 0
            else:
                var_2.append((1/(t-1))*s)
            # Simulation to determine B(t)
            S_BC_2 = [10000000000]*10
            Run_2 = [0]*10
            for B in range(10,500):
                S_OH_2 = OH_2[:]
                S_I_2 = I_2[:]
                S_R_2 = R_2[:]
                S_BL_2 = BL_2[:]
                S_IP_2 = IP_2[:]
                S_O_2 = O_2[:]
                # Update O(t)(the period just before the simulation begins)
                # using the B value for the simulation
                if B - S_IP_2[t] > 0:
                    S_O_2.append(B - S_IP_2[t])
                else:
                    S_O_2.append(0)
                c = 0
                for i in range(t+1,t+sim+1):
                    #simulate demand
                    demand = -1
                    while demand <0:
                        demand = random.normalvariate(F_2[t+1],(var_2[t])**(.5))
                    # Receive simulated shipment, calculate simulated items on hand
                    S_R_2.append(S_O_2[i-1])
                    if S_I_2[i-1]<0:
                        S_OH_2.append(S_R_2[i])
                    else:
                        S_OH_2.append(S_I_2[i-1]+S_R_2[i])
                    # Receive and send order, update Inventory and Backlog (simulated)
                    owed = (demand + S_BL_2[i-1])
                    S_I_2.append(S_OH_2[i] - owed)
                    if owed <= S_OH_2[i]: # No Backlog
                        S_BL_2.append(0)
                        c += inv_cost*S_I_2[i]
                    else:
                        S_BL_2.append(-S_I_2[i]) # Backlogged
                        c += bl_cost*S_BL_2[i]
                    # Update Inventory Position
                    S_IP_2.append(S_IP_2[i-1] + S_O_2[i-1] - demand)
                    # Update Order, Upstream member dispatches goods
                    if (B-S_IP_2[i]) > 0:
                        S_O_2.append(B - S_IP_2[i])
                    else:
                        S_O_2.append(0)
                # Log Simulation costs for that B-value
                S_BC_2.append(c)
                # If the simulated costs are increasing, stop
                if B>11:
                    dummy = []
                    for i in range(0,10):
                        dummy.append(S_BC_2[B-i]-S_BC_2[B-i-1])
                    Run_2.append(sum(dummy)/float(len(dummy)))
                    if Run_2[B-3] > 0 and B>20:
                        break
                else:
                    Run_2.append(0)
            # Use minimum cost as new B(t)
            var = min((val, idx) for (idx, val) in enumerate(S_BC_2))
            optimal_B = var[1]
            B_2.append(optimal_B)
            # Calculate O(t)
            if B_2[t] - IP_2[t] > 0:
                O_2.append(B_2[t] - IP_2[t])
            else:
                O_2.append(0)

            #### RETAILER ####
            #recieve shipment from supplier, calculate items OH HAND
            if I_1[t-1]<0:
                OH_1.append(R_1[t])
            else:
                OH_1.append(I_1[t-1]+R_1[t])
            # Recieve and dispatch order, update Inventory and Backlog for time t
            if (D[t] +BL_1[t-1]) <= OH_1[t]: # No Backlog
                I_1.append(OH_1[t] - (D[t] + BL_1[t-1]))
                BL_1.append(0)
                R_0.append(D[t]+BL_1[t-1])
            else:
                I_1.append(OH_1[t] - (D[t] + BL_1[t-1])) # Backlogged
                BL_1.append(-I_1[t])
                R_0.append(OH_1[t])
            # Update Inventory Position
            IP_1.append(IP_1[t-1] + O_1[t-1] - D[t])
            # Use exponential smoothing to forecast future demand
            future_demand = (1-a)*F_1[t] + a*D[t]
            F_1.append(future_demand)
            # Calculate D_bar(t) and Var(t)
            Db_1.append((1/t)*sum(D[1:t+1]))
            s = 0
            for i in range(1,t+1):
                s+=(D[i]-Db_1[t])**2
            if t==1: # Var(1) = 0
                var_1.append(0)
            else:
                var_1.append((1/(t-1))*s)
            # Simulation to determine B(t)
            S_BC_1 = [10000000000]*10
            Run_1 = [0]*10
            for B in range(10,500):
                S_OH_1 = OH_1[:]
                S_I_1 = I_1[:]
                S_R_1 = R_1[:]
                S_BL_1 = BL_1[:]
                S_IP_1 = IP_1[:]
                S_O_1 = O_1[:]
                # Update O(t)(the period just before the simulation begins)
                # using the B value for the simulation
                if B - S_IP_1[t] > 0:
                    S_O_1.append(B - S_IP_1[t])
                else:
                    S_O_1.append(0)
                c=0
                for i in range(t+1,t+sim+1):
                    #simulate demand
                    demand = -1
                    while demand <0:
                        demand = random.normalvariate(F_1[t+1],(var_1[t])**(.5))
                    S_R_1.append(S_O_1[i-1])
                    # Receive simulated shipment, calculate simulated items on hand
                    if S_I_1[i-1]<0:
                        S_OH_1.append(S_R_1[i])
                    else:
                        S_OH_1.append(S_I_1[i-1]+S_R_1[i])
                    # Receive and send order, update Inventory and Backlog (simulated)
                    owed = (demand + S_BL_1[i-1])
                    S_I_1.append(S_OH_1[i] - owed)
                    if owed <= S_OH_1[i]: # No Backlog
                        S_BL_1.append(0)
                        c += inv_cost*S_I_1[i]
                    else:
                        S_BL_1.append(-S_I_1[i]) # Backlogged
                        c += bl_cost*S_BL_1[i]
                    # Update Inventory Position
                    S_IP_1.append(S_IP_1[i-1] + S_O_1[i-1] - demand)
                    # Update Order, Upstream member dispatches goods
                    if (B-S_IP_1[i]) > 0:
                        S_O_1.append(B - S_IP_1[i])
                    else:
                        S_O_1.append(0)
                # Log Simulation costs for that B-value
                S_BC_1.append(c)
                # If the simulated costs are increasing, stop
                if B>11:
                    dummy = []
                    for i in range(0,10):
                        dummy.append(S_BC_1[B-i]-S_BC_1[B-i-1])
                    Run_1.append(sum(dummy)/float(len(dummy)))
                    if Run_1[B-3] > 0 and B>20:
                        break
                else:
                    Run_1.append(0)
            # Use minimum as your new B(t)
            var = min((val, idx) for (idx, val) in enumerate(S_BC_1))
            optimal_B = var[1]
            B_1.append(optimal_B)
            # Calculate O(t)
            if B_1[t] - IP_1[t] > 0:
                O_1.append(B_1[t] - IP_1[t])
            else:
                O_1.append(0)

        ### Calculate the Standard Devation of the last half of time periods ###
        def STD(numbers):
            k = len(numbers)
            mean = sum(numbers) / k
            SD = (sum([dev*dev for dev in [x-mean for x in numbers]])/(k-1))**.5
            return SD

        start = (total//2)+1 # Only use the last half of the time periods to calculate the standard deviation
        I_STD_1_L.append(STD(I_1[start:]))
        I_STD_2_L.append(STD(I_2[start:]))
        I_STD_3_L.append(STD(I_3[start:]))
        I_STD_4_L.append(STD(I_4[start:]))
        O_STD_0_L.append(STD(D[start:]))
        O_STD_1_L.append(STD(O_1[start:]))
        O_STD_2_L.append(STD(O_2[start:]))
        O_STD_3_L.append(STD(O_3[start:]))
        O_STD_4_L.append(STD(O_4[start:]))
        from time import time
        timeB = time()
        timeleft(a,L,timeB-timeA)

    I_STD_1[L//2] = I_STD_1_L[:]
    I_STD_2[L//2] = I_STD_2_L[:]
    I_STD_3[L//2] = I_STD_3_L[:]
    I_STD_4[L//2] = I_STD_4_L[:]
    O_STD_0[L//2] = O_STD_0_L[:]
    O_STD_1[L//2] = O_STD_1_L[:]
    O_STD_2[L//2] = O_STD_2_L[:]
    O_STD_3[L//2] = O_STD_3_L[:]
    O_STD_4[L//2] = O_STD_4_L[:]
    CSV(a,L,I_STD_1,I_STD_2,I_STD_3,I_STD_4,O_STD_0,
        O_STD_1,O_STD_2,O_STD_3,O_STD_4)

from time import time
timeE = time()
print("Run Time: ",(timeE-time0)/3600," hours")