Switch from Microsoft's STL to STLport

Solution 1

I haven't compared the performance of STLport to MSVC's implementation, but I'd be surprised if there were a significant difference (in release mode, of course - debug builds are likely to be quite different). Unfortunately, the link you provided - and every other comparison I've seen - is too light on details to be useful.

Before even considering changing standard library providers, I recommend you profile your code heavily to determine where the bottlenecks are. This is standard advice: always profile before attempting any performance improvements!

Even if profiling does reveal performance issues in standard library containers or algorithms, I'd suggest you first analyse how you're using them. Algorithmic improvements and appropriate container selection, especially considering Big-O costs, are far more likely to yield significant performance gains.

Solution 2

Before making the switch, be sure to test the MS (in fact, Dinkumware) library with checked iterators turned off. For some weird reason, they are turned on by default even in release builds, and that makes a big difference when it comes to performance.

Solution 3

In a project I worked on that made quite heavy use of the STL, switching to STLport got things done in half the time they took with Microsoft's STL implementation. It's no proof, but it's a good sign of better performance, I guess. I believe it's partly due to STLport's advanced memory management system.

I do remember getting some warnings when making this change, but nothing that couldn't be worked around quickly. As a drawback, I'd add that debugging with STLport is harder in Visual Studio's debugger than with Microsoft's STL (Update: it seems there is a way to teach the debugger how to display STLport containers - thanks, Jalf!).

The latest version dates back to October 2008, so there are still people working on it. See here for downloading it.

Solution 4

I did the exact opposite a year ago, and here is why:

  • StlPort is updated very rarely (as far as I know, only one developer is working on it; you can take a look at their commit history).
  • You run into build problems whenever you switch to a new Visual Studio release. You wait for the new makefile, or you create it yourself, but sometimes you can't build at all because of some configuration option you're using. Then you wait for them to fix the build.
  • When you submit a bug report, you wait forever, so there is basically no support (maybe if you pay). You usually end up fixing it yourself, if you know how.
  • The STL in Visual Studio has checked iterator and debug iterator support that is much better than StlPort's. This is where most of the slowdown comes from, especially in debug builds. Checked iterators are enabled in both debug and release, and not everybody knows this (you have to disable them yourself).
  • The STL in Visual Studio 2008 SP1 comes with TR1, which you don't get with StlPort.
  • The STL in Visual Studio 2010 uses rvalue references from C++0x, and this is where you get a real performance benefit.

Solution 5

If you use STLport, you enter a world where every STL-based third-party library you use has to be recompiled against STLport as well to avoid problems...

STLport does have a different memory strategy, but if allocation is your bottleneck, then the path to better performance is changing the allocator (switching to Hoard, for example), not changing the STL implementation.

Author: Antoine T

Updated on June 07, 2022

Comments

  • Antoine T, almost 2 years ago

    I have dozens of dataframes I would like to merge with a "reference" dataframe. I want to merge the columns when they exist in both dataframes, or conversely, create a new one when they don't already exist. I have the feeling that this is closely related to this topic, but I cannot figure out how to make it work in my case. Also, note that the key used for merging never contains duplicates.

    # Reference dataframe
    df = pd.DataFrame({'date_time':['2018-06-01 00:00:00','2018-06-01 00:30:00','2018-06-01 01:00:00','2018-06-01 01:30:00']})
    
    # Dataframes to merge to reference dataframe
    df1 = pd.DataFrame({'date_time':['2018-06-01 00:30:00','2018-06-01 01:00:00'],
                        'potato':[13,21]})
    
    df2 = pd.DataFrame({'date_time':['2018-06-01 01:30:00','2018-06-01 02:00:00','2018-06-01 02:30:00'],
                        'carrot':[14,8,32]})
    
    df3 = pd.DataFrame({'date_time':['2018-06-01 01:30:00','2018-06-01 02:00:00'],
                        'potato':[27,31]})
    
    
    df = df.merge(df1, how='left', on='date_time')
    df = df.merge(df2, how='left', on='date_time')
    df = df.merge(df3, how='left', on='date_time')
    

    The result is :

                  date_time  potato_x  carrot  potato_y
    0  2018-06-01 00:00:00       NaN     NaN       NaN
    1  2018-06-01 00:30:00      13.0     NaN       NaN
    2  2018-06-01 01:00:00      21.0     NaN       NaN
    3  2018-06-01 01:30:00       NaN    14.0      27.0 
    

    While I would like :

                  date_time  potato  carrot 
    0  2018-06-01 00:00:00       NaN     NaN  
    1  2018-06-01 00:30:00      13.0     NaN   
    2  2018-06-01 01:00:00      21.0     NaN 
    3  2018-06-01 01:30:00      27.0    14.0 
    

    Edit (following @sammywemmy's answer): I have no idea what the dataframe column names will be before importing them (in a loop). Usually, the dataframes that are merged with my reference dataframe contain about 100 columns, of which 90%-95% are common with the other dataframes.

    • sammywemmy, over 4 years ago
      so the final dataframe will have about 100 columns?
    • Antoine T, over 4 years ago
      Every new dataframe to be merged contains about 100 columns. Among these 100 columns, there might be 10 whose names are not present in the previous dataframes. So, assuming I want to merge 15 dataframes, at the end I will have 100 + 15*10 = 250 columns.
    • sammywemmy, over 4 years ago
      It seems the other columns are food names (potato, carrot, ...) and the common key is date_time. 100 columns is a lot, and I don't see how you can keep track of that. I suggest you write code that melts every dataframe, using date_time as your id_vars, then perform the merge.
  • Antoine T, over 4 years ago
    I have no idea what the column names will be before importing the dataframes. To be more precise, every new dataframe contains about 100 columns, of which 90%-95% are common with the other dataframes. I edited my question to add this information.
  • Antoine T, over 4 years ago
    I think your solution works only if the merged/concatenated dataframe is either fully similar to or fully different from df. For example, I wouldn't know how to deal with a dataframe such as: df3 = pd.DataFrame({'date_time':['2018-06-01 01:30:00', '2018-06-01 02:00:00'],'potato':[27,31], 'zucchini':[11,1]})