Monday, February 15, 2010

STL Strings and reference counting ( again ! )

In the last post, I mentioned that I was (un)fortunate enough to get bitten by some trickiness that some folks put into their stl implementations :). Other than the common data structures and iterators, we (for some definition of "we" which mostly includes my team in my company ) hardly use any stl features. And the one data structure that is most ubiquitously used - well, because I can't imagine programming without it - is "The fun begins when stl implementations begin implementing strings using reference counting. I wrote about some nice (or not so nice) stuff happening related to reference counting some time back, though I guess some may argue that it may not be a common scenario. This post is going to speak about a more common scenario though.

It all starts when one plans to use a reference counted string in a multi-threaded application. The STL implementation has to guarantee that the reference counting is thread safe. The part of reference counting that is not thread safe by default is incrementing/decrementing the reference counters. Strings shared across threads can cause enough problems, and this is well documented and researched in many places. This document goes through how this objective is achieved in some STL implementations found across platforms we use.

The simplest way in which one can guard the reference count is to guard it using a mutex. And this is exactly what the Roguewave implementation in HP - UX ( the version shipped with HP UX is 2.1 ) does when the application is compiled with -mt option. This is the one of the best remedies to kill the performance of an application. As one of our profiling results show, one third of the time was spent in mutexes, a large portion of which was for these mutexes which were guarding the reference counts of strings ! And the worst part is that hardly any strings were being shared across threads - so it was a pure waste.

We had half a mind to actually write our own string classes from scratch with char*. But more specifically, what we were looking for was effectively something as mentioned in this blog post + the comments on that post : a way to enable/disable thread safety on the string class depending on the usage.

Implementations

* HP - UX ( Roguewave 2.1 )

** Disable locks on reference counts}
Roguewave 2.1 does provide an option to disable locks on reference counts altogether. To achieve that, one has to compile the application with the following pre-processor definition -
__HPACC_SKIP_REFCNT_MUTEX. Now that is sort of ok - but not exactly what we were looking for as we do want to control thread safety in certain cases.

Update: It seems the roguewave 2.1 always uses atomic reference counts internally (on itanium) which is wrapped around by a mutex call. So compiling with the above flag will provide atomic reference counts provided by itanium's processor directives ! This should suffice in most cases, I guess :)

** Disable Reference Counts
Roguewave 2.1 also provides an option to disable reference counting altogether via the preprocessor definition - _RWSTD_NO_STRING_REF_COUNT . Now this one seems pretty interesting as one can always wrap a reference counted pointer implementation around it !

** Solaris ( StlPort )
On the Solaris machines that we use, the STL implementation used is STLPort (version 4). It seems there is no reference counted implementation in this version (at least), so no worries about thread safety.

** AIX
The STL implementation used on AIX is the one written by PJ Plauger ( Dinkumware ). It is shipped with customizations ( so says this page) by both IBM and MS ). There are many known issues with the string class shipped with Microsoft Visual C++ 6.0 at least ( documented pretty comprehensively with workaround et all here ).

On the AIX implementation that we use - the reference counting has been disabled altogether using preprocessor tricks :)

** Linux ( On Intel processors )
The STL implementation used is the one provided by GNU. Reference counting is enabled and is provided by means atomic builtins provided by gcc, which internally use the atomic functions provided by Intel

Atomic Builtins
Now, it does seem that atomic increments/decrements will be faster than the standard mutex drama that other solutions provide. These functions are normally implemented using assembly directly, and all platforms ( HP , AIX , Solaris ) do provide some form of it.

I didn't get much performance statistics on the internet from those who have used these operations. The only one I could find related to stl is the one provided by SUN, but the fact that atomic operations facilitated by the processor are faster in some cases and slower in some other doesn't sound nice. One other article I found, which had a pretty nifty way to implement atomic functions by hand using PowerPC aseembly had pretty good performance figures.

Edit: A comprehensive, but bit dated, performance anaylsis of the string class with various implementations (plain alloc, fast alloc, atomic increments, critical sections etc ) can be found here

Solution ?
So, I don't know what is the best solution :). As most of the implementations that we use do not have reference counting at all ( either naturally - in case of Solaris/AIX, or by force on Roguewave STL ), a good option seems to be wrapping the stl string class under hand made reference counted pointer implementations - thread safe and non-thread safe ones - the thread safe ones with atomic builtins perhaps.

To neutralize any changes in the implementation on multiple platforms, I guess we should be using one implementation on all platforms ( STLPort seems very attractive for this ) - but some things are not in my hand :)

Update: http://www.gotw.ca/gotw/045.htm provides a very good comparative analysis of different approaches to implement a string class using actual numbers !

Monday, December 21, 2009

The joys of reference counting

What is the output of the below program ?


#include <iostream>
#include <string>

using namespace std;

struct ABCD {
string internalString;
};

int main() {

string foobar("foobar");

ABCD* abcd1 = new ABCD;
abcd1->internalString = foobar;

ABCD* abcd2 = abcd1;

delete abcd1;
delete abcd2;

string barfoo("foo");

cout<<foobar;
}


On the machine that I ran it - the output was "foo" :)

The Problem

One of the applications I was working with has a singleton class which stores some reference data for global access. The singleton contains a map whose key is a string which represents some code and the value is information about the code. The problem was that after processing some data, some of the codes used to go missing from the map. There were some fascinating things to note about the problem :
  • The map had the code as the key of the map. This key was stored by value.
  • The entry for the code that went missing was not removed from the map.
  • The value of the key where the code earlier used to reside, changed to a valid, beautiful looking ( well, it didn’t look beautiful when I was debugging the problem :) ) string. No corruptions. No dingbats.
As we all know, most stl implementations of map are a balanced binary tree implemented using red black trees. Which means searching the map by the key will involve comparing the value to be searched with other members of the map. The other members decide whether to go right, left or wherever to continue searching in the red black tree. Now, if the value to be searched for happened to be anywhere near the sphere of influence of the which went missing earlier, then the probability that this value to be searched will be reported as missing goes up ten folds. And it did happen. So, we had a beautiful problem to solve ( it’s easier to
say this after it’s solved :) )

Through the Looking-Glass, and What a C++ Developer Found There

I don’t care whether they call Richard Stallman an idiot, a fanatic , non-pragmatic buffoon or whatever. First he created emacs to save us from $sucky editors. and later he created gdb to make Bjarne Stroustrup’s creation more tolerable. And as HP took the liberty to add run time checking (RTC) to help solve all the chaos inflicted by C++ - there still was some hope ( The problem was reported on HP Itanium )

I ran the program from within gdb with honest intentions to find memory corruptions. And I found a double delete ! A confused C++ developer commented out all related portions of the code which was deleting stuff to check what happens - and all was well in the world again. What the .. ? The only plausible explanation to the case of the missing code - which was stored by value - seemed to be related to reference counting.
Thus the sample program was born.

The Annotated C++ Program

STL strings are inherently reference counted. Every time it’s copied - the reference count is incremented. Every time it is destroyed - the reference count is decremented. Once the reference count goes down to 0, the memory is freed.


#include <iostream>
#include <string>

using namespace std;

struct ABCD {
string internalString;
};

int main() {

string foobar("foobar"); // reference count is 1

ABCD* abcd1 = new ABCD;
abcd1->internalString = foobar; // reference count is incremented to 2

ABCD* abcd2 = abcd1; // no change in reference count here

delete abcd1; // reference count decremented to 1
delete abcd2; // So does it crash ? Does it crash ?
// No !!!! reference count decremented to 0 and
// memory released to the wild to be reallocated

string barfoo("foo"); // aha, reallocation algorithm finds a small enough
// place for this in the same location where foobar
// had its string

cout<<foobar; // so, now can you guess what this will print ?
}



All those who have perfect 20:20 vision, will ask whether we have a double free in hand when the string foobar goes out of scope. The answer is a big YES.

This can be explained by the following two snippets of code. The code is specific to Roguewave STL - the implementation of STL used in HP Itanium in my office.


#define private public // to access all the private stuff
// of std::string
#include <iostream>
#include <string>

using namespace std;

struct ABCD {
string internalString;
};

int main() {

string foobar("foobar");

ABCD* abcd1 = new ABCD;
abcd1->internalString = foobar;

ABCD* abcd2 = abcd1;

delete abcd1;

// access the internal reference count
cout<<"abcd2's reference count is :"
<<abcd2->internalString._C_pref()->__references()
<<endl;

delete abcd2;

cout<<"foobar's reference count is :"
<<foobar._C_pref()->__references()
<<endl;
// if foobar had gone out of scope here ,
// we'd be having a double delete condition.

string barfoo("foo");

// As both foobar and barfoo will be pointing to the same thing
// and as barfoo "captured" foobar's memory
// both of the below reference counts will be 1
cout<<"barfoo's reference count is :"
<<barfoo._C_pref()->__references()
<<endl;
cout<<"foobar's reference count is :"
<<foobar._C_pref()->__references()
<<endl;

cout<<foobar;

// When either of foobar goes out of scope
// it will cause a double delete.
}



The output is :

abcd2's reference count is : 1
foobar's reference count is : 0
barfoo's reference count is : 1
foobar's reference count is : 1
foo


Now, on destruction, the destructor of the basic_string template needs to decide whether to deallocate the memory or not. It decides this by evaluating the following code. Again, this is specific to Roguewave STL


if( _C_pref()->__references() == 0 ||
_C_pref()->__removeReference() == 0)


The second part of the or clause decrements the reference count and checks f it has reached 0 - this happens when we deleted abcd2. The first part happens when foobar is auto destroyed.

One size doesn’t fit all

When all explanations fail, only one prevails - Implementation Specific :). As with so many things which involve STL and C++, even the above problem is platform/implementation specific.
  • When the same program runs on Linux it segfaults on the line delete abcd2.
  • The program runs smoothly on AIX. The stl code was too cryptic for me to understand. It seemed that the problem should have occured in AIX too from what I understood from the stl code, but it didn’t happen. There can be two explanations to this. 1. Either I didn’t understand the code, which I’m almost sure of. or 2. The memory allocated for barfoo didn’t occupy the same as the one for foobar. I even tried to allocate 10000 strings just to check, but the problem failed to appear
  • The scenario was reproduced in Solaris as the STL implementation used in Solaris is Roguewave STL too (in the machines I tried on). The same behaviour was not observed when I changed the implementation to stlport4 though. Yet to check why.
  • When the program is compiled with -DRWSTD_NO_STRING_REF_COUNT i.e reference counting for strings is disabled in Itanium, everything works like a charm, as expected.
Memory related problems never fail to create enough drama to prevent documenting it. Every line in the program explained was atleast 10 call stacks away from each other in the actual code in which the problem happened. Once upon a time with C, only human errors seemed to cause tricky problems to solve. With C++, and more so with STL, the extra level of abstractions and automatic memory management baggage have added the ++ to the trickiness too, it seems.

Wednesday, March 11, 2009

Ah, Virtual Memory ...

I seem to be on some sort of a hot treasure trail finding gold at the speed of thought. I found this no-words-to-praise explanation of how the kernel handles virtual memory/paging and all that niceties , written by Jeff Berryman, University of British Columbia, more than 3 decades ago, and lying in many places on the interweb now.
THE PAGING GAME

Rules

1. Each player gets several million things.

2. Things are kept in crates that hold 2048 things each. Things in
the same crate are called crate-mates.

3. Crates are stored either in the workshop or warehouse. The workshop is
almost always too small to hold all the crates.

4. There is only one workshop but there may be several warehouses.
Everybody shares them.

5. Each thing has its own thing number.

6. What you do with a thing is to zark it. Everybody takes turns zarking.

7. You can only zark your things, not anybody else's.

8. Things can only be zarked when they are in the workshop.

9. Only the Thing King knows whether a thing is in the workshop or in a
warehouse.

10. The longer a thing goes without being zarked, the grubbier it
is said to become.

11. The way you get things is to ask the Thing King. He only gives out
things in multiples of eight. This is to keep the royal overhead down.

12. The way you zark a thing is to give it thing number. If you give the
number of a thing that happens to be in a workshop it gets zarked right
away. If it is in a warehouse, the Thing King packs the crate containing
your thing back into the workshop. If there is no room in the workshop, he
first finds the grubbiest crate in the workshop, whether it be yours or
somebody else's, and packs it off with all its crate-mates to a warehouse.
In its place he puts the crate containing your thing. Your thing then gets
zarked and you never knew that it wasn't in the workshop all along.

13. Each player's stock of things have the same numbers as everybody else's.
The Thing King always knows who owns what thing and whose turn it is, so you
can't ever accidentally zark somebody else's thing even if it has the same
number as one of yours. (VS/2)

Notes

1. Traditionally, the Thing King sits at a large, segmented table and is
attended to by pages (the so-called "table pages") whose job it is to help
the king remember where all the things are and who they belong to.

2. One consequence of Rule 13 is that everybody's thing numbers will be
similar from game to game, regardless of the number of players.

3. The Thing King has a few things of his own, some of which move back and
forth between workshop and warehouse just like anybody else's, but some of
which are just too heavy to move out of the workshop.

4. With the given set of rules, oft-zarked things tend to get kept
mostly in the workshop while little-zarked things stay mostly in a warehouse.
This is efficient stock control.

5. Sometimes even the warehouses get full. The Thing King then has to start
piling things on the dump out back. This makes the game slower because it
takes a long time to get things off the dump when they are needed in the
workshop. A forthcoming change in the rules will allow the Thing King to
select the grubbiest things in the warehouses and send them to the dump in
his spare time, thus keeping the warehouses from getting too full. This
means that the most infrequently-zarked things will end up in the dump so
the Thing King won't have to get things from the dump so often. This should
speed up the game when there are a lot of players and the warehouses are
getting full. (Not applicable to VS/1)

LONG LIVE THE THING KING

De-bug-ging

Seemingly, with nothing original that I can think of myself, I am finding lots of things I find here and there pretty interesting. Especially this one on some mailing list that I found via reddit, which describes one of the best ways of debugging ;)

We called it the Rubber Duck method of debugging. It goes like this:

1) Beg, borrow, steal, buy, fabricate or otherwise obtain a rubber duck
(bathtub variety)
2) Place rubber duck on desk and inform it you are just going to go over
some code with it, if that's all right.
3) Explain to the duck what you code is supposed to do, and then go into
detail and explain things line by line
4) At some point you will tell the duck what you are doing next and then
realise that that is not in fact what you are actually doing. The duck
will sit there serenely, happy in the knowledge that it has helped you
on your way.

Works every time.

To think of it, I missed an opportunity to use my cat more effectively ;)

Sunday, March 08, 2009

Sharia, anyone ?

Vir Sanghvi had writen this nice article (as usual) in the Hindustan Times editorial page, regarding the sorry state of Pakistan. There was this small part about the imposition of the Sharia Law in the infamous Swat valley, and Imran Khan's comments on it.

Imran Khan (Keble College, Oxford, 1973-76) even declared that sharia law would be better because justice would be dispensed more swiftly! (I know this is politically incorrect but the Loin of the Punjab’s defence of sharia law reminded me of the famous Private Eye cover when his marriage to Jemima Goldsmith was announced. The Eye carried a picture of Khan speaking to Jemima’s father. “Can I have your daughter’s hand?” Imran was supposedly asking James Goldsmith. “Why? Has she been caught shoplifting?” Goldsmith replied. So much for sharia law.)

Ah, Sharia, how funny you are.

Sunday, February 03, 2008

Sons of Soil

The weekly spoof news of CNN-IBN had an awesome joke on the `sons of soil` stupidity.
It went something like,

Along with other cities in India, Mumbai is facing record low temperatures. This can be attributed to the cold winds blowing from North India. Shiv Sena is, hence, protesting ....

Monday, June 18, 2007

It's that time of the year ...

It's that time of the year again...

When I have to shift my tent again. Worry whether the cat will run away during the shifting process. Get a new internet connection ( hopefully not ).
Hopefully the next home will not be as dark and gloomy as the current one.

Why am I shifting ? Because the tenants gave some foobar reason why they want their house back.
I wish searching for a new rented house was as easy as a google search.

Thursday, December 14, 2006

Reliance Rocks !

It's been about 6 months since I moved from my previous home. The old building is no more now. All i can see there is air.
But well, Reliance can make money out of thin air !! How else can they send us an electricity bill amounting to Rs. 2000 when there's no meter to read !!
I was laughing when my previous neighbour got a similar bill 2 months back. Last month we got it, and this month one more neighbour was sent a BIG bill.

It's a pity (for Reliance) that there were only 8 houses in my previous building. Not too much scope for them.