Monday, December 21, 2009

The joys of reference counting

What is the output of the below program ?


#include <iostream>
#include <string>

using namespace std;

struct ABCD {
string internalString;
};

int main() {

string foobar("foobar");

ABCD* abcd1 = new ABCD;
abcd1->internalString = foobar;

ABCD* abcd2 = abcd1;

delete abcd1;
delete abcd2;

string barfoo("foo");

cout<<foobar;
}


On the machine that I ran it - the output was "foo" :)

The Problem

One of the applications I was working with has a singleton class which stores some reference data for global access. The singleton contains a map whose key is a string which represents some code and the value is information about the code. The problem was that after processing some data, some of the codes used to go missing from the map. There were some fascinating things to note about the problem :
  • The map had the code as the key of the map. This key was stored by value.
  • The entry for the code that went missing was not removed from the map.
  • The value of the key where the code earlier used to reside, changed to a valid, beautiful looking ( well, it didn’t look beautiful when I was debugging the problem :) ) string. No corruptions. No dingbats.
As we all know, most stl implementations of map are a balanced binary tree implemented using red black trees. Which means searching the map by the key will involve comparing the value to be searched with other members of the map. The other members decide whether to go right, left or wherever to continue searching in the red black tree. Now, if the value to be searched for happened to be anywhere near the sphere of influence of the which went missing earlier, then the probability that this value to be searched will be reported as missing goes up ten folds. And it did happen. So, we had a beautiful problem to solve ( it’s easier to
say this after it’s solved :) )

Through the Looking-Glass, and What a C++ Developer Found There

I don’t care whether they call Richard Stallman an idiot, a fanatic , non-pragmatic buffoon or whatever. First he created emacs to save us from $sucky editors. and later he created gdb to make Bjarne Stroustrup’s creation more tolerable. And as HP took the liberty to add run time checking (RTC) to help solve all the chaos inflicted by C++ - there still was some hope ( The problem was reported on HP Itanium )

I ran the program from within gdb with honest intentions to find memory corruptions. And I found a double delete ! A confused C++ developer commented out all related portions of the code which was deleting stuff to check what happens - and all was well in the world again. What the .. ? The only plausible explanation to the case of the missing code - which was stored by value - seemed to be related to reference counting.
Thus the sample program was born.

The Annotated C++ Program

STL strings are inherently reference counted. Every time it’s copied - the reference count is incremented. Every time it is destroyed - the reference count is decremented. Once the reference count goes down to 0, the memory is freed.


#include <iostream>
#include <string>

using namespace std;

struct ABCD {
string internalString;
};

int main() {

string foobar("foobar"); // reference count is 1

ABCD* abcd1 = new ABCD;
abcd1->internalString = foobar; // reference count is incremented to 2

ABCD* abcd2 = abcd1; // no change in reference count here

delete abcd1; // reference count decremented to 1
delete abcd2; // So does it crash ? Does it crash ?
// No !!!! reference count decremented to 0 and
// memory released to the wild to be reallocated

string barfoo("foo"); // aha, reallocation algorithm finds a small enough
// place for this in the same location where foobar
// had its string

cout<<foobar; // so, now can you guess what this will print ?
}



All those who have perfect 20:20 vision, will ask whether we have a double free in hand when the string foobar goes out of scope. The answer is a big YES.

This can be explained by the following two snippets of code. The code is specific to Roguewave STL - the implementation of STL used in HP Itanium in my office.


#define private public // to access all the private stuff
// of std::string
#include <iostream>
#include <string>

using namespace std;

struct ABCD {
string internalString;
};

int main() {

string foobar("foobar");

ABCD* abcd1 = new ABCD;
abcd1->internalString = foobar;

ABCD* abcd2 = abcd1;

delete abcd1;

// access the internal reference count
cout<<"abcd2's reference count is :"
<<abcd2->internalString._C_pref()->__references()
<<endl;

delete abcd2;

cout<<"foobar's reference count is :"
<<foobar._C_pref()->__references()
<<endl;
// if foobar had gone out of scope here ,
// we'd be having a double delete condition.

string barfoo("foo");

// As both foobar and barfoo will be pointing to the same thing
// and as barfoo "captured" foobar's memory
// both of the below reference counts will be 1
cout<<"barfoo's reference count is :"
<<barfoo._C_pref()->__references()
<<endl;
cout<<"foobar's reference count is :"
<<foobar._C_pref()->__references()
<<endl;

cout<<foobar;

// When either of foobar goes out of scope
// it will cause a double delete.
}



The output is :

abcd2's reference count is : 1
foobar's reference count is : 0
barfoo's reference count is : 1
foobar's reference count is : 1
foo


Now, on destruction, the destructor of the basic_string template needs to decide whether to deallocate the memory or not. It decides this by evaluating the following code. Again, this is specific to Roguewave STL


if( _C_pref()->__references() == 0 ||
_C_pref()->__removeReference() == 0)


The second part of the or clause decrements the reference count and checks f it has reached 0 - this happens when we deleted abcd2. The first part happens when foobar is auto destroyed.

One size doesn’t fit all

When all explanations fail, only one prevails - Implementation Specific :). As with so many things which involve STL and C++, even the above problem is platform/implementation specific.
  • When the same program runs on Linux it segfaults on the line delete abcd2.
  • The program runs smoothly on AIX. The stl code was too cryptic for me to understand. It seemed that the problem should have occured in AIX too from what I understood from the stl code, but it didn’t happen. There can be two explanations to this. 1. Either I didn’t understand the code, which I’m almost sure of. or 2. The memory allocated for barfoo didn’t occupy the same as the one for foobar. I even tried to allocate 10000 strings just to check, but the problem failed to appear
  • The scenario was reproduced in Solaris as the STL implementation used in Solaris is Roguewave STL too (in the machines I tried on). The same behaviour was not observed when I changed the implementation to stlport4 though. Yet to check why.
  • When the program is compiled with -DRWSTD_NO_STRING_REF_COUNT i.e reference counting for strings is disabled in Itanium, everything works like a charm, as expected.
Memory related problems never fail to create enough drama to prevent documenting it. Every line in the program explained was atleast 10 call stacks away from each other in the actual code in which the problem happened. Once upon a time with C, only human errors seemed to cause tricky problems to solve. With C++, and more so with STL, the extra level of abstractions and automatic memory management baggage have added the ++ to the trickiness too, it seems.

2 Comments:

Blogger Swati said...

you haven't written what happens on Mac! :(

Tuesday, January 05, 2010 8:30:00 pm  
Blogger Swati said...

I'm yet to give the next blog a second reading and understand. Meanwhile this post was excellent. thanks for the piece of info.

keep blogging. often.

And why am I not able to comment on your other post?

Sunday, January 10, 2010 11:17:00 pm  

Post a Comment

<< Home