Wednesday, May 23, 2007

Virtual functions in C++




"In order to implement the concept of Polymorphism which is a corner-stone of OOP the C++ compiler has to find a way to make it possible."


Let's see how the story begins.

Derived classes inherit the member functions of their base class. When an inherited member function is not quite appropriate for a derived class, that class should provide its own version of the function to override the immediate base class's version and make its own objects happy. Then, if such a function is called for a derived object, the compiler calls the derived class's version of that function.

This works fine when the types of the objects are known at compile time, so that the compiler knows which function to call for each particular object. The compiler knows where to find each class's copy of the function, so the addresses used for these function calls are settled at compile time (static binding).

Suppose that we have many derived objects at different levels of an inheritance hierarchy that share a common base class, and that they need to be instantiated at run time. Here the compiler does not know in advance which derived class objects to expect. These objects would be dynamically allocated, and the code that handles them should be able to deal with all of them.

It is perfectly legitimate to use base class pointers to point to these objects, but then the compiler handles them exactly as it would handle base class objects. It calls the base class versions of the member functions, and none of the member functions specific to the derived class are accessible through the pointer.

To solve this problem, virtual functions are used to allow dynamic binding.

"...It seems that our friend, the compiler of course, is very resourceful."


To support polymorphism at run time, compilers typically build virtual function tables (vtables) at compile time. Each class with one or more virtual functions has a vtable that contains pointers to the appropriate virtual functions to be called for objects of that class. Each object of a class with virtual functions contains a pointer to the vtable for that class, which is usually placed at the beginning of the object.

The compiler then generates code that will:

1. dereference the base class pointer to access the derived class object.
2. dereference its vtable pointer to access its class vtable.
3. add the appropriate offset to the vtable pointer to reach the desired function pointer.
4. dereference the function pointer to execute the appropriate function.

This allows dynamic binding, as the call to a virtual function is routed at run time to the version of the function appropriate for the object's class.

Impressive, isn't it?

Well, that made me try, just for fun, to write code that would do these steps in place of the compiler.

But as I did this, another question arose.

How do member functions get their "this" pointer? (the pointer to the object the function is called for)

I know that the compiler should implicitly pass 'this' as an argument to the member function so that it can use it to access data of the object it is called for.

In my example I used a single virtual function that takes no arguments and returns void. So at first I tried calling the destination virtual function with no arguments. The function was indeed called, but the results showed that it had used a bogus value for 'this' that pointed somewhere other than the object, and so it gave the wrong results.

So I tried calling the function and passing it the pointer to the object, and it seemingly worked just fine. Keep in mind that this trick depends on the compiler's object layout and calling convention, so it is not portable C++.

Here's the code I tried...


#include <iostream>

using std::cout;
using std::endl;

class Parent {
public:
   Parent( int = 0, int = 0 );  // default constructor
   void setxy( int, int );
   int getx() const { return x; }
   int gety() const { return y; }
   virtual void print();
private:
   int x;
   int y;
};

Parent::Parent( int a, int b )
{
   setxy( a, b );
}

void Parent::setxy( int a, int b )
{
   x = ( a >= 0 ? a : 0 );
   y = ( b >= 0 ? b : 0 );
}

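void Parent::print()
{
   cout << " [ x: " << x << ", y: " << y << "]";
}

class Child : public Parent {
public:
   Child( int = 0, int = 0, int = 0, int = 0 );  // default constructor
   virtual void print();
private:
   int z;
   int t;
};

Child::Child( int a, int b, int c, int d )
   : Parent( a, b )
{
   z = ( c >= 0 ? c : 0 );
   t = ( d >= 0 ? d : 0 );
}

void Child::print()
{
   Parent::print();
   cout << " [ z: " << z << ", t: " << t << "]";
}

class GrandChild : public Child {
public:
   GrandChild( int = 0, int = 0, int = 0, int = 0, int = 0 );
   virtual void print();
private:
   int e;
};

GrandChild::GrandChild( int a, int b, int c, int d, int num )
   : Child( a, b, c, d )
{
   e = ( num >= 0 ? num : 0 );
}

void GrandChild::print()
{
   Child::print();
   cout << " [ e: " << e << " ]";
}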

int main()
{
   Parent parentObj( 7, 8 );
   Child childObj( 56, 23, 6, 12 );
   GrandChild grandchildObj( 4, 64, 34, 98, 39 );
   
   // declare an array of pointers to Parent
   Parent *parentPtr[ 3 ];          

   cout << "size of Parent = " << sizeof( Parent ) << " bytes\n";
   cout << "size of Child = " << sizeof( Child ) << " bytes\n";
   cout << "size of GrandChild = "
        << sizeof( GrandChild ) << " bytes\n";


   parentPtr[ 0 ] = &parentObj;      // direct assignment
   parentPtr[ 1 ] = &childObj;       // implicit casting
   parentPtr[ 2 ] = &grandchildObj;  // implicit casting

   cout << "\nThe Derived objects accessed by"
         " an array of pointers to Parent:\n\n";

   for ( int i = 0; i < 3; i++ ) {
      cout << "Object " << i + 1 << " : ";
      cout << "\tvtable ptr (" << *( ( void ** ) parentPtr[ i ] ) << ")\n" ;
                  // vtable ptr at the beginning of the object
    
      // initialize pointer to function
      void (* funptr ) ( Parent * ) = NULL;       

      // assign to it pointer to function in vtable    
      funptr = *( *( ( void (*** ) ( Parent * ) ) parentPtr[ i ] ) );
    
      cout << "\t\tpointer 1 in vtable is (" << ( void * ) funptr
           << ")\n\t\t( pointer to virtual function 1 'print()' )";

      cout << "\n\n\t\tdata: ";

      funptr( parentPtr[ i ] ); // call the 1st function in vtable
                               // and passing ( this ) to it
                              // without using parentPtr[ i ]->print();
      cout << "\n" << endl;
   }

   return 0;
}

The output should look like this (the addresses and sizes depend on the compiler and platform):


size of Parent = 12 bytes
size of Child = 20 bytes
size of GrandChild = 24 bytes

The Derived objects accessed by an array of pointers to Parent:

Object 1 : vtable ptr (0043FD90)
pointer 1 in vtable is (00401480)
( pointer to virtual function 1 'print()' )

data: [ x: 7, y: 8]

Object 2 : vtable ptr (0043FD80)
pointer 1 in vtable is (004015B8)
( pointer to virtual function 1 'print()' )

data: [ x: 56, y: 23] [ z: 6, t: 12]

Object 3 : vtable ptr (0043FD70)
pointer 1 in vtable is (004016E6)
( pointer to virtual function 1 'print()' )

data: [ x: 4, y: 64] [ z: 34, t: 98] [ e: 39 ]



To reach the pointer to the desired function ( print() ), parentPtr[ i ], which normally points to the beginning of the object, had to be cast to type pointer to pointer to pointer to function. Dereferencing it once gives the vtable pointer, and dereferencing it again gives the first function pointer in the vtable.

Polymorphism uses virtual functions in another interesting way. Virtual functions enable us to create special classes for which we never intend to instantiate any objects. These classes are called abstract classes, and they are only used to provide an appropriate base class that passes a common interface and/or implementation to its derived classes.

Abstract classes are not specific enough to define objects. Concrete classes, on the other hand, have the specifics needed to create a real object. To make a base class abstract, it must have one or more pure virtual functions, which are those with = 0 added at the end of the function prototype.

virtual void draw() const = 0;


All of these pure virtual functions must be overridden in a derived class for it to be a concrete one; otherwise the derived class is an abstract class too.

Suppose we have a base class Hardware. We can never draw it, or print its production date or price, unless we know the exact type of hardware we're talking about. So it looks like class Hardware would make a good example of an abstract base class.

Another example could be class Furniture and it might look something like this:

class Furniture {
public:
// ...
   virtual double getVolume() const = 0; // a pure virtual function
   virtual void draw() const = 0;        // another one here
// ...
};

Here the definition of class Furniture contains only the interface and implementation to be inherited.
It does not even contain any data members.

That's it.
Hope you liked this article.

I will be happy to receive your comments.