Monday, February 15, 2010

STL Strings and reference counting ( again ! )

In the last post, I mentioned that I was (un)fortunate enough to get bitten by some trickiness that some folks put into their stl implementations :). Other than the common data structures and iterators, we (for some definition of "we" which mostly includes my team in my company ) hardly use any stl features. And the one data structure that is most ubiquitously used - well, because I can't imagine programming without it - is "The fun begins when stl implementations begin implementing strings using reference counting. I wrote about some nice (or not so nice) stuff happening related to reference counting some time back, though I guess some may argue that it may not be a common scenario. This post is going to speak about a more common scenario though.

It all starts when one plans to use a reference counted string in a multi-threaded application. The STL implementation has to guarantee that the reference counting is thread safe. The part of reference counting that is not thread safe by default is incrementing/decrementing the reference counters. Strings shared across threads can cause enough problems, and this is well documented and researched in many places. This document goes through how this objective is achieved in some STL implementations found across platforms we use.

The simplest way in which one can guard the reference count is to guard it using a mutex. And this is exactly what the Roguewave implementation in HP - UX ( the version shipped with HP UX is 2.1 ) does when the application is compiled with -mt option. This is the one of the best remedies to kill the performance of an application. As one of our profiling results show, one third of the time was spent in mutexes, a large portion of which was for these mutexes which were guarding the reference counts of strings ! And the worst part is that hardly any strings were being shared across threads - so it was a pure waste.

We had half a mind to actually write our own string classes from scratch with char*. But more specifically, what we were looking for was effectively something as mentioned in this blog post + the comments on that post : a way to enable/disable thread safety on the string class depending on the usage.


* HP - UX ( Roguewave 2.1 )

** Disable locks on reference counts}
Roguewave 2.1 does provide an option to disable locks on reference counts altogether. To achieve that, one has to compile the application with the following pre-processor definition -
__HPACC_SKIP_REFCNT_MUTEX. Now that is sort of ok - but not exactly what we were looking for as we do want to control thread safety in certain cases.

Update: It seems the roguewave 2.1 always uses atomic reference counts internally (on itanium) which is wrapped around by a mutex call. So compiling with the above flag will provide atomic reference counts provided by itanium's processor directives ! This should suffice in most cases, I guess :)

** Disable Reference Counts
Roguewave 2.1 also provides an option to disable reference counting altogether via the preprocessor definition - _RWSTD_NO_STRING_REF_COUNT . Now this one seems pretty interesting as one can always wrap a reference counted pointer implementation around it !

** Solaris ( StlPort )
On the Solaris machines that we use, the STL implementation used is STLPort (version 4). It seems there is no reference counted implementation in this version (at least), so no worries about thread safety.

** AIX
The STL implementation used on AIX is the one written by PJ Plauger ( Dinkumware ). It is shipped with customizations ( so says this page) by both IBM and MS ). There are many known issues with the string class shipped with Microsoft Visual C++ 6.0 at least ( documented pretty comprehensively with workaround et all here ).

On the AIX implementation that we use - the reference counting has been disabled altogether using preprocessor tricks :)

** Linux ( On Intel processors )
The STL implementation used is the one provided by GNU. Reference counting is enabled and is provided by means atomic builtins provided by gcc, which internally use the atomic functions provided by Intel

Atomic Builtins
Now, it does seem that atomic increments/decrements will be faster than the standard mutex drama that other solutions provide. These functions are normally implemented using assembly directly, and all platforms ( HP , AIX , Solaris ) do provide some form of it.

I didn't get much performance statistics on the internet from those who have used these operations. The only one I could find related to stl is the one provided by SUN, but the fact that atomic operations facilitated by the processor are faster in some cases and slower in some other doesn't sound nice. One other article I found, which had a pretty nifty way to implement atomic functions by hand using PowerPC aseembly had pretty good performance figures.

Edit: A comprehensive, but bit dated, performance anaylsis of the string class with various implementations (plain alloc, fast alloc, atomic increments, critical sections etc ) can be found here

Solution ?
So, I don't know what is the best solution :). As most of the implementations that we use do not have reference counting at all ( either naturally - in case of Solaris/AIX, or by force on Roguewave STL ), a good option seems to be wrapping the stl string class under hand made reference counted pointer implementations - thread safe and non-thread safe ones - the thread safe ones with atomic builtins perhaps.

To neutralize any changes in the implementation on multiple platforms, I guess we should be using one implementation on all platforms ( STLPort seems very attractive for this ) - but some things are not in my hand :)

Update: provides a very good comparative analysis of different approaches to implement a string class using actual numbers !


Anonymous Anonymous said...

We are struggling with the same situation where we have to use HP-UX aCC and have to use their RW STL which by default they say is thread-safe but we need to provide the locks at the application level anyway. So, instead of double locking - one at application level and one at the STL level, we want to get non-thread-safe version of the same RW STL for HP-UX. I would appreciate if you can give some insight of using any macros to achieve the same without refcount and locks in STL.

Monday, May 23, 2011 8:07:00 pm  

Post a Comment

<< Home