Variants, Safearrays, and BSTRs

VARIANTs, SAFEARRAYs, and BSTRs, Oh My!


or, an Introduction to Common OLE Data Types for the C++ Programmer

by Rob Locher

 

So far, all the textbooks I have seen that talk about VARIANTs,
SAFEARRAYs, and BSTRs tend to lead the C++ programmer learning COM
down the primrose path, by implying that wrapper classes such as CComVariant
or _bstr_t are going to solve all your problems.  Maybe such a
class will, if you have a simple case, but what if you have to pass a VARIANT
holding a SAFEARRAY of BSTRs?  Well, then you need to understand
what VARIANTs, SAFEARRAYs, and BSTRs really are, unless you
want to wait around for somebody to create a wrapper class for VARIANTs
holding SAFEARRAYs of BSTRs.

For the rest of this article I will drop the capitalization of VARIANT,
SAFEARRAY, and BSTR for readability purposes most of the time, and I might call
them “OLE types” or “VB types”.

Why are variants, safearrays, and bstrs so hard to use?  The problem is that
if you are a C++ programmer, nothing manages them for you.  If you are coding
in Visual Basic, then the VB runtime takes care of that particular complexity
for you.  (Even then I’ve heard that OLE types can be tricky sometimes.)  If
you as a C++ programmer are going to use them, you will have to bear the same
burden as the coders of the VB runtime.  In other words, you will have to
understand what the VB types actually are, and follow the poorly documented
rules about how to use them.  Otherwise, you will find yourself causing memory
leaks, or possibly even using invalid pointers.

In the discussion that follows, I will first talk about the fundamental types
and how to use them, and then I will discuss the shortcuts (helper classes) and
their limitations.

 

BSTRs

What is a BSTR?  A bstr is a pointer to a string of wide characters (OLECHAR,
which is wchar_t on Windows, not char).  The length of the string, counted in
bytes and not including the terminator, is stored as a four-byte integer just
before the first character, and the bstr points to the first character, not to
the length.  The characters are also followed by a null terminator, but because
a bstr may contain embedded null characters, it is the stored length, not the
terminator, that defines the string.  Note that this is not how you would do it
in C++; in C++, the pointer would be to the first member of the structure, the
length, and not to the second.  You might think that you could create a bstr
the C++ way, by creating a structure and then returning a pointer to the second
element, cast to bstr, but you should never do this.  Bstrs are allocated by the
system (OLE Automation), so that any component, including the COM marshaller,
can free one with SysFreeString().  (When a bstr crosses a process boundary,
COM marshals it by copying the characters into a new bstr on the other side;
the pointer value itself means nothing in another process.)  Create a bstr with
the function SysAllocString(), and destroy it properly with the function
SysFreeString().  In fact, you should use only platform SDK functions (or
helper classes that use them internally) to manipulate bstrs; see the platform
SDK help topic “String Manipulation Functions”.
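To make the layout concrete, here is a portable C++ model of the block of memory a bstr points into. It is only an illustration under stated assumptions: the names ModelBstr, model_alloc(), model_byte_len(), model_len() and model_free() are hypothetical, char16_t stands in for the 2-byte OLECHAR, and malloc() stands in for the system allocator. As the paragraph above says, never build a real bstr this way.

```cpp
#include <cassert>
#include <cstdint>
#include <cstdlib>
#include <cstring>

// ILLUSTRATION ONLY: a portable model of the bstr memory layout.
// Real bstrs must come from SysAllocString() and be freed with SysFreeString();
// never pass a pointer built like this to a COM API.
// char16_t stands in for OLECHAR (a 2-byte wchar_t on Windows).
using ModelBstr = char16_t*;

// Hypothetical allocator: [4-byte byte count][characters][2-byte null].
// Returns a pointer to the first character, not to the start of the block.
ModelBstr model_alloc(const char16_t* src, std::uint32_t cch) {
    const std::uint32_t cb = cch * static_cast<std::uint32_t>(sizeof(char16_t));
    auto* block = static_cast<unsigned char*>(
        std::malloc(sizeof(std::uint32_t) + cb + sizeof(char16_t)));
    if (block == nullptr) return nullptr;
    std::memcpy(block, &cb, sizeof cb);            // length prefix, in bytes
    auto* chars = reinterpret_cast<char16_t*>(block + sizeof cb);
    if (cch != 0) std::memcpy(chars, src, cb);     // may include embedded nulls
    chars[cch] = u'\0';                            // terminator, not counted
    return chars;
}

// Reads the prefix stored just before the first character.
// A null pointer is treated as a legal empty string.
std::uint32_t model_byte_len(ModelBstr b) {
    if (b == nullptr) return 0;
    std::uint32_t cb = 0;
    std::memcpy(&cb, reinterpret_cast<unsigned char*>(b) - sizeof cb, sizeof cb);
    return cb;
}

// Length in characters, derived from the byte count.
std::uint32_t model_len(ModelBstr b) {
    return model_byte_len(b) / static_cast<std::uint32_t>(sizeof(char16_t));
}

// Frees the whole block, which starts 4 bytes before the pointer.
void model_free(ModelBstr b) {
    if (b != nullptr) {
        std::free(reinterpret_cast<unsigned char*>(b) - sizeof(std::uint32_t));
    }
}
```

On Windows, the real counterparts are SysAllocStringLen(), SysStringByteLen(), SysStringLen() and SysFreeString().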

It is a convention that a null pointer is a legal bstr that represents an
empty string.  A bstr must always either be a null pointer or point to an
actual allocated bstr; it should never be a random uninitialized pointer.
There are also functions to reallocate or change a bstr, such as
SysReAllocString() and SysReAllocStringLen().  Most of the string manipulation
functions normally taken for granted, such as finding a substring, are
missing, although VarBstrCmp() compares two bstrs and VarBstrCat() concatenates
them.  There is no dedicated function to copy a bstr, but
SysAllocStringLen(b, SysStringLen(b)) makes a copy that keeps the full length,
including any embedded nulls.  There are lots of functions to convert things to
and from bstrs; see the platform SDK help topic “Data Type Conversion APIs”,
the help topic “ATL and MFC String Conversion Macros”, and the help topics for
the functions ConvertBSTRToString() and ConvertStringToBSTR().
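Because a bstr can contain embedded nulls, a copy should be sized from the stored length rather than by scanning for a terminator; on Windows, SysAllocStringLen(b, SysStringLen(b)) does that. The portable sketch below only illustrates the difference, using std::u16string as a stand-in; copy_by_terminator() and copy_by_length() are hypothetical names.

```cpp
#include <cassert>
#include <cstddef>
#include <string>

// Hypothetical helpers for illustration; char16_t stands in for OLECHAR.

// Scans for the first null, the way wcslen()/wcscpy() would:
// the copy is truncated at an embedded null.
std::u16string copy_by_terminator(const char16_t* s) {
    return s ? std::u16string(s) : std::u16string();
}

// Copies exactly len characters, as SysAllocStringLen(b, SysStringLen(b)) does:
// embedded nulls survive, and a null pointer is treated as an empty string.
std::u16string copy_by_length(const char16_t* s, std::size_t len) {
    return s ? std::u16string(s, len) : std::u16string();
}
```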

When you pass a bstr across a COM boundary, you must give careful thought as
to whether the client or the server should be responsible for allocating and
deallocating the bstr.  Certain rules have been established by convention
that determine who should create the bstr, and who should free it.
Unfortunately, these rules are poorly documented; see the platform SDK help
topic “Allocating and Releasing Memory for a BSTR” for guidelines.  Generally
speaking, the client is responsible for the bstr.  If the client is passing a
read-only string to the server, then usually it is passed as a bstr marked
[in]: the client allocates it before the call and frees it afterward, and the
server only reads it.  If the server is expected to return or change a string,
then the client passes a pointer-to-bstr.  For an [out] parameter, the server
allocates the bstr and the client frees it; for an [in, out] parameter, the
client allocates it, the server may free it and store a replacement, and the
client frees whatever comes back.  If there is any confusion, you should
probably test the server carefully against Visual Basic to ensure that the
rules are being followed.  (You can also import the type library into a C++
project and examine the wrapper classes generated.)
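The conventional rules can be exercised without Windows in a small model. In this sketch alloc_str() and free_str() are hypothetical stand-ins for SysAllocString() and SysFreeString() that count live strings, and the server_* functions are hypothetical methods, one per parameter direction; a real interface would take BSTR parameters and return HRESULT.

```cpp
#include <cassert>
#include <cstddef>
#include <string>

// Hypothetical stand-ins for SysAllocString()/SysFreeString() that count
// live strings, so the ownership rules can be checked without Windows.
static int g_live = 0;

char16_t* alloc_str(const char16_t* s) {
    const std::size_t n = std::char_traits<char16_t>::length(s);
    char16_t* p = new char16_t[n + 1];
    std::char_traits<char16_t>::copy(p, s, n + 1);  // includes the terminator
    ++g_live;
    return p;
}

void free_str(char16_t* p) {
    if (p != nullptr) {
        delete[] p;
        --g_live;
    }
}

// [in]: the caller allocates and frees; the server only reads.
std::size_t server_length(const char16_t* in) {
    return in ? std::char_traits<char16_t>::length(in) : 0;
}

// [out]: the server allocates; the caller frees.
void server_get_name(char16_t** out) {
    *out = alloc_str(u"server");
}

// [in, out]: the caller allocates; the server may free the old string and
// store a replacement; the caller frees whatever comes back.
void server_shout(char16_t** inout) {
    std::u16string s = *inout ? std::u16string(*inout) : std::u16string();
    s += u'!';
    free_str(*inout);
    *inout = alloc_str(s.c_str());
}
```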

There are two helper classes available when using bstrs: the “native COM
support” class _bstr_t, and the ATL class CComBSTR.  CComBSTR
has a helpful CopyTo() method that fills a pointer-to-bstr
properly, which is very useful for [out] parameters.  Otherwise,
the two classes are very similar.  Both will take care of allocating and
deallocating the wrapped bstr in the class constructor and destructor.
They also can be used to take charge of an existing bstr with the Attach()
method, or can be made to abandon the bstr with the Detach() method.
They have helpful operators and methods to compare two strings, check for
equality, copy a string, and so on.  In many cases, _bstr_t and CComBSTR
instances can be passed as bstr substitutes, but please read the help pages on
those classes carefully, because improperly attempting to use the classes as
bstr substitutes can cause memory leaks.
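The Attach()/Detach()/CopyTo() contract can be sketched with a hypothetical class StrHolder. It mimics the ownership behavior described above but is not CComBSTR or _bstr_t: new[]/delete[] stand in for the system allocator, and a counter tracks live strings. In this sketch Attach() frees any string the holder already owned; treat that as an assumption to check against the documentation for whichever class you actually use.

```cpp
#include <cassert>
#include <cstddef>
#include <string>

// Hypothetical allocation helpers standing in for SysAllocString()/SysFreeString().
static int g_live_strings = 0;

char16_t* dup_str(const char16_t* s) {
    if (s == nullptr) return nullptr;              // null stays null (empty string)
    const std::size_t n = std::char_traits<char16_t>::length(s);
    char16_t* p = new char16_t[n + 1];
    std::char_traits<char16_t>::copy(p, s, n + 1);
    ++g_live_strings;
    return p;
}

void release_str(char16_t* p) {
    if (p != nullptr) {
        delete[] p;
        --g_live_strings;
    }
}

// Hypothetical RAII holder mimicking the contract of bstr wrapper classes.
class StrHolder {
public:
    explicit StrHolder(const char16_t* s = nullptr) : str_(dup_str(s)) {}
    ~StrHolder() { release_str(str_); }
    StrHolder(const StrHolder&) = delete;             // keep the sketch simple
    StrHolder& operator=(const StrHolder&) = delete;

    // Take ownership of an existing string, freeing whatever was held before.
    void Attach(char16_t* s) {
        if (s != str_) {
            release_str(str_);
            str_ = s;
        }
    }

    // Give up ownership without freeing; the caller must free the result.
    char16_t* Detach() {
        char16_t* s = str_;
        str_ = nullptr;
        return s;
    }

    // Fill an [out] parameter with a fresh copy; the holder keeps its own string.
    void CopyTo(char16_t** out) const { *out = dup_str(str_); }

    const char16_t* get() const { return str_; }

private:
    char16_t* str_;
};
```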

VARIANTs

What the heck is a variant?  A variant is a structure containing a union
member, and an unsigned integer member that describes which member of the union
is currently being used.  (I’m oversimplifying a bit I think, but the
oversimplification has gotten me through so far.)  If you don’t know
what a union is, read about it first, and then come back.  (But don’t feel bad,
because I’ve never used a union except as a variant and I don’t know of anybody
else having done so either.)  You might want to look up the system header
file oaidl.h or the documentation topic “VARIANT and VARIANTARG”, which shows
the juicy bits of the header file.  All the typedefs and conditional defines in
the header file are quite confusing, but what really matters is that there is a
member vt which shows what member of the union is being used, and then one of
the union members (llVal, lVal, bVal, iVal, etc.) is actually holding the
data, or a pointer to the data.  By the way, VARIANT and VARIANTARG are
interchangeable.
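A stripped-down model of that layout may help. The tag values VT_EMPTY, VT_I4, VT_R8 and VT_BSTR match wtypes.h, but model::Variant and try_get_double() are hypothetical simplifications: the real VARIANT has many more union members plus reserved fields. The point is the discipline: always check vt before reading a union member.

```cpp
#include <cassert>
#include <cstdint>

namespace model {
// These tag values match wtypes.h; everything else is a simplified stand-in.
enum : std::uint16_t { VT_EMPTY = 0, VT_I4 = 3, VT_R8 = 5, VT_BSTR = 8 };

struct Variant {
    std::uint16_t vt = VT_EMPTY;  // which union member is live
    union {
        std::int32_t lVal;        // valid when vt == VT_I4
        double dblVal;            // valid when vt == VT_R8
        char16_t* bstrVal;        // valid when vt == VT_BSTR
    };
};

// Inspect vt before touching the union: reading the wrong member is a bug.
bool try_get_double(const Variant& v, double* out) {
    switch (v.vt) {
        case VT_R8: *out = v.dblVal; return true;
        case VT_I4: *out = v.lVal;   return true;
        default:    return false;
    }
}
}  // namespace model
```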

The unsigned integer member that tells you what type the variant is actually
holding, vt, is itself a bit confusing.  The various legal values
that it can hold are enumerated in the system header file wtypes.h.
If you look at the enumeration VARENUM in that file, you will see that
it is possible to combine certain values.  (The bitwise or operator “|” is
used to do the combining.  Addition gives the same result only when the values
share no bits, for example when no flag is added twice, so “|” is the safer
habit.)  The common values that I know of that are combined with other values
are VT_ARRAY and VT_BYREF.  If vt equals VT_ARRAY | (something), then it means
that the variant contains a safearray of (something).  For example,
(vt == (VT_ARRAY | VT_BSTR)) means that you are passing a safearray of bstr;
the inner parentheses matter, because in C++ “==” binds more tightly than “|”.
Similarly, if vt equals VT_BYREF | (something), it means that you are passing
(something) by reference.  In this case, you are explicitly passing a
type-safe pointer, where (something) indicates the type pointed to.
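A small sketch of testing vt values, with the VARENUM constants copied into a hypothetical namespace sketch so that it compiles without Windows headers (the numeric values match wtypes.h; VT_TYPEMASK is the mask that strips the modifier flags). The helper names is_array_of(), is_byref() and base_type() are hypothetical.

```cpp
#include <cassert>
#include <cstdint>

namespace sketch {
// VARENUM values as defined in wtypes.h (only the ones used here).
constexpr std::uint16_t VT_R8       = 5;
constexpr std::uint16_t VT_BSTR     = 8;
constexpr std::uint16_t VT_TYPEMASK = 0x0FFF;
constexpr std::uint16_t VT_ARRAY    = 0x2000;
constexpr std::uint16_t VT_BYREF    = 0x4000;

// The parentheses matter: without them, "vt == VT_ARRAY | elem" parses as
// "(vt == VT_ARRAY) | elem", which is nonzero (true) for any nonzero elem.
constexpr bool is_array_of(std::uint16_t vt, std::uint16_t elem) {
    return vt == (VT_ARRAY | elem);
}

// Likewise "&" binds more loosely than "!=", so parenthesize the mask test.
constexpr bool is_byref(std::uint16_t vt) {
    return (vt & VT_BYREF) != 0;
}

// Strip the modifier flags to get the base type.
constexpr std::uint16_t base_type(std::uint16_t vt) {
    return static_cast<std::uint16_t>(vt & VT_TYPEMASK);
}
}  // namespace sketch
```

Note also why “|” is preferred over “+”: VT_ARRAY | VT_ARRAY is still VT_ARRAY, but VT_ARRAY + VT_ARRAY is 0x4000, which is VT_BYREF.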

By now, hopefully we can infer that the idea of a variant is to provide Visual
Basic with a generic variable type that doesn’t waste too much space, that
can hold just about anything, including a pointer to or an array of just
about anything.  Although a variant is general-purpose, it is still
possible for a variant to be marshalled, because it is always possible to
determine how much space it uses.  Or, if the variant holds a pointer to
something, or a safearray of something, it is still possible to determine how
much space the value pointed to takes up, because that might have to be
marshalled too.

Thanks to the way variants support weakly-typed languages, it is legal to
change the type of the variant when it is holding a value.  This is known
as type coercion.  For example, you could coerce a variant holding the
bstr “3.0” to be type VT_R8 (double).  When a variant is
coerced, what is happening inside the variant is that vt is being
changed, and also the data is being converted internally.  In the example,
the bstr pointed to by the bstrVal member would be freed, and an
eight-byte floating-point representation of the number 3.0 would be placed in
the dblVal member.  The reason I mention coercion is that if you
have a COM object written in C++ that accepts a variant from a Visual Basic
client, the variant may have to be coerced to the type that you expect.
The functions VariantChangeType() and VariantChangeTypeEx() are
very useful if you have to do type coercion.
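On Windows, an in-place coercion is a single call such as VariantChangeType(&v, &v, 0, VT_R8). To show what happens inside, here is a hypothetical, much simplified model: sketch::Variant and coerce_bstr_to_r8() are invented for illustration, handle only BSTR to R8, ignore locale (which VariantChangeTypeEx() lets you specify), accept only ASCII text, and use new[]/delete[] in place of SysAllocString()/SysFreeString().

```cpp
#include <cassert>
#include <cstdint>
#include <cstdlib>
#include <string>

namespace sketch {
enum : std::uint16_t { VT_R8 = 5, VT_BSTR = 8 };  // values match wtypes.h

struct Variant {
    std::uint16_t vt;
    union {
        double dblVal;      // VT_R8
        char16_t* bstrVal;  // VT_BSTR, allocated with new[] in this sketch
    };
};

// Simplified in-place coercion from VT_BSTR to VT_R8. Returns false and
// leaves the variant untouched if the text is not a complete ASCII number.
bool coerce_bstr_to_r8(Variant& v) {
    if (v.vt != VT_BSTR) return false;
    std::string ascii;
    for (const char16_t* p = v.bstrVal; p && *p; ++p) {
        if (*p > 0x7F) return false;
        ascii.push_back(static_cast<char>(*p));
    }
    char* end = nullptr;
    const double d = std::strtod(ascii.c_str(), &end);
    if (ascii.empty() || *end != '\0') return false;
    delete[] v.bstrVal;  // the variant owned the string, so coercion frees it
    v.dblVal = d;        // the number now lives in dblVal...
    v.vt = VT_R8;        // ...and vt says so
    return true;
}
}  // namespace sketch
```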

If a variant is holding a bstr, then the variant owns the bstr, and properly
deallocating the variant will result in the bstr being deallocated.  If a
variant holds a pointer, that is to say vt is VT_BYREF | (something),
then the variant is being used to explicitly pass the pointer, and the variant
does not own the memory pointed to.  If a variant is holding a
safearray, then the variant owns the safearray, and properly deallocating the
variant will result in the safearray being deallocated.  See the help
topic “Variant Manipulation API Functions”.  That topic also mentions the API
functions that are provided to use variants.

If you want to create and manipulate a variant yourself, without benefit of a
helper class, then here is how you do it (mostly adapted from the book The
COM and COM+ Programming Primer, by Alan Gordon, published by Prentice
Hall PTR):

  1. If your variant is going to explicitly pass a pointer to something,
    declare and set the memory to which the pointer will point.
  2. Declare a variant, or use the new operator to allocate one.
  3. Initialize the variant by calling VariantInit().
  4. Set vt in the variant to the proper value.
  5. Set the member variable in the variant that corresponds to the type in vt.
  6. Use or pass your variant.
  7. If the variant was used to explicitly pass a pointer, i.e.
    ((vt & VT_BYREF) != 0), then free the memory pointed to if you allocated
    it; VariantClear() will not free it for you.
  8. Use VariantClear() to free any resources that the variant owns.
  9. Let the variant go out of scope, or use the delete operator on its
    pointer, the same as you would for any other variable.
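The steps above, and the ownership rules before them, can be modeled portably. Here sketch::Variant, variant_init(), variant_set_bstr() and variant_clear() are hypothetical simplifications of VARIANT, VariantInit() and VariantClear(); only the tag values come from wtypes.h, and new[]/delete[] stand in for SysAllocString()/SysFreeString(). The clear function frees an owned bstr but, like the real VariantClear(), leaves the target of a VT_BYREF variant alone.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <string>

namespace sketch {
enum : std::uint16_t { VT_EMPTY = 0, VT_R8 = 5, VT_BSTR = 8, VT_BYREF = 0x4000 };

struct Variant {
    std::uint16_t vt;
    union {
        double dblVal;      // VT_R8
        double* pdblVal;    // VT_BYREF | VT_R8: memory the caller owns
        char16_t* bstrVal;  // VT_BSTR: owned by the variant
    };
};

int g_live_bstrs = 0;  // strings currently allocated by variant_set_bstr()

// Step 3: put the variant into a known empty state.
void variant_init(Variant& v) { v.vt = VT_EMPTY; }

// Steps 4 and 5: fill in the bstr member and the matching vt. The string is
// allocated first, so a failed allocation leaves the variant unchanged.
void variant_set_bstr(Variant& v, const char16_t* s) {
    const std::size_t n = std::char_traits<char16_t>::length(s);
    char16_t* copy = new char16_t[n + 1];
    std::char_traits<char16_t>::copy(copy, s, n + 1);
    ++g_live_bstrs;
    v.bstrVal = copy;
    v.vt = VT_BSTR;
}

// Step 8: free what the variant owns and reset it to VT_EMPTY.
// A VT_BYREF variant does not own its target, so nothing is freed then.
void variant_clear(Variant& v) {
    if (v.vt == VT_BSTR) {
        delete[] v.bstrVal;
        --g_live_bstrs;
    }
    v.vt = VT_EMPTY;
}
}  // namespace sketch
```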

If that seems to you to be a lot of work, I agree with you.  Fortunately,
there are two helper classes, CComVariant and _variant_t,
that you can use to make things easier.  You can simply pass the pointer
or value that you wish to wrap to the constructor, and the class will create a
variant to wrap the value or pointer, and take care of the variant.
Please note though that since a variant that wraps a pointer is not responsible
for the memory to which it points, neither is a _variant_t or CComVariant
instance.  Both classes support Attach() and Detach() methods
similar to the bstr wrapper classes.  Attach() lets an instance
of the class take charge of a pre-existing variant.  Detach() forces
the instance to abandon its variant.  Both classes also have a ChangeType()
method that can be used to coerce the wrapped variant.

I would be remiss if I didn’t mention an
excellent article
pointed out to me, written by Microsoft’s Bruce
McKinney in 1996 and still appropriate, that talks about variants from both the
Visual C++ and the Visual Basic points of view.  It apparently was written
before CComVariant and _variant_t came around, and does an
excellent job of describing how variants really work.

 

SAFEARRAYs

If you understand bstrs and variants, then you shouldn’t have much trouble
with safearrays.  SAFEARRAY was created to suit the needs of Visual Basic
and other weakly-typed languages for a type-safe array of one or more
dimensions of arbitrary bounds.  Note that you cannot pass a safearray by
itself through IDispatch::Invoke(); every argument in a late-bound call travels
inside a VARIANTARG, so for automation purposes a safearray has to be wrapped
by a variant.  (The ATL wizard won’t let you do it.)  If you are wrapping a
safearray with a variant, then the member vt of the variant should be VT_ARRAY
bitwise-or’ed (“|”) with (something), where (something) corresponds to
the type of the elements of the array.  A safearray is responsible
for its contents; properly deallocating a safearray will result in its contents
being safely deallocated.  Since a safearray should always be wrapped by a
variant, and a variant is responsible for its contents as long as the variant
doesn’t wrap a pointer, then properly deallocating the variant holding the
safearray will properly take care of the variant, the safearray, and the
contents of the safearray.
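The chain of ownership can be sketched with two hypothetical types: BstrArray plays the role of a safearray of bstrs, and ArrayVariant the role of a variant with vt == (VT_ARRAY | VT_BSTR). Neither is a real API; new[]/delete[] stand in for SysAllocString()/SysFreeString(), and a counter shows that a single clear() releases the array and every string in it.

```cpp
#include <cassert>
#include <cstddef>
#include <string>
#include <vector>

// Counts strings allocated by make_str() and not yet freed.
static int g_live = 0;

// Hypothetical stand-in for SysAllocString().
char16_t* make_str(const char16_t* s) {
    const std::size_t n = std::char_traits<char16_t>::length(s);
    char16_t* p = new char16_t[n + 1];
    std::char_traits<char16_t>::copy(p, s, n + 1);
    ++g_live;
    return p;
}

// Plays the role of a safearray of bstrs: it owns its elements,
// the way destroying a real safearray frees its bstrs.
struct BstrArray {
    std::vector<char16_t*> elems;
    BstrArray() = default;
    BstrArray(const BstrArray&) = delete;
    BstrArray& operator=(const BstrArray&) = delete;
    ~BstrArray() {
        for (char16_t* p : elems) {
            delete[] p;
            --g_live;
        }
    }
};

// Plays the role of a variant holding VT_ARRAY | VT_BSTR: it owns the array,
// the way VariantClear() destroys a wrapped safearray.
struct ArrayVariant {
    BstrArray* parray = nullptr;
    ArrayVariant() = default;
    ArrayVariant(const ArrayVariant&) = delete;
    ArrayVariant& operator=(const ArrayVariant&) = delete;
    ~ArrayVariant() { clear(); }
    void clear() {
        delete parray;  // destroys the array, which frees each string
        parray = nullptr;
    }
};
```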

A safearray can only hold one type at a time, as you might
guess; however, a safearray can hold variants, so that rule really isn’t
much of a restriction.  In fact, if you wanted to, you could have a
multi-dimensional safearray holding all different kinds of variants, some of
which could themselves be safearrays, which in turn could hold other things…
but I recommend not making things that complicated if it can be avoided.

So, by now you probably are wondering what a safearray really is. Here is
some code from oaidl.h:

```c
typedef struct tagSAFEARRAYBOUND
{
    ULONG cElements;
    LONG  lLbound;
} SAFEARRAYBOUND;

typedef struct tagSAFEARRAY
{
    USHORT         cDims;
    USHORT         fFeatures;
    ULONG          cbElements;
    ULONG          cLocks;
    PVOID          pvData;
    SAFEARRAYBOUND rgsabound[ 1 ];
} SAFEARRAY;
```

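The SAFEARRAYBOUND structure is simple: it describes how many elements
there are, and what the lower bound is, for a dimension.  (The first
element of a VB array doesn’t have to be number zero.)  As for the SAFEARRAY
structure itself, cDims has the number of dimensions, and rgsabound
is an array of SAFEARRAYBOUND with one element per dimension in the
safearray; the declaration shows only one element, but the structure is
allocated with room for cDims of them.  pvData of course points to the actual
data.  cLocks holds a lock count, which SafeArrayLock() and
SafeArrayAccessData() increment, and cbElements holds the size of an element
in bytes.  fFeatures holds FADF_* flags that describe how the data is stored,
and therefore how it can be freed: for example, FADF_BSTR, FADF_VARIANT,
FADF_UNKNOWN and FADF_DISPATCH record the element type, so that destroying the
array frees or releases each element, while FADF_AUTO, FADF_STATIC and
FADF_EMBEDDED record how the array itself was allocated.  See the help topic
“SAFEARRAY Data Type” to see more about the fFeatures member.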
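The bounds make the arithmetic straightforward. As a sketch, assuming a hypothetical Bound struct that mirrors SAFEARRAYBOUND, the total element count is the product of the cElements values, and in a one-dimensional array the element numbered i starts (i - lLbound) * cbElements bytes into pvData. The sketch stops at one dimension; in real code let SafeArrayPtrOfIndex() or SafeArrayGetElement() do the indexing for you.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// Mirrors SAFEARRAYBOUND from oaidl.h (ULONG and LONG are 32-bit on Windows).
struct Bound {
    std::uint32_t cElements;
    std::int32_t lLbound;
};

// Total number of elements: the product of the per-dimension counts.
// The result does not depend on the order in which the bounds are stored.
std::size_t element_count(const std::vector<Bound>& bounds) {
    if (bounds.empty()) return 0;
    std::size_t n = 1;
    for (const Bound& b : bounds) n *= b.cElements;
    return n;
}

// Byte offset of element `index` in a one-dimensional array whose first
// element is numbered lLbound; returns false if index is out of range.
bool offset_1d(const Bound& b, std::int32_t index, std::uint32_t cbElements,
               std::size_t* offset) {
    const std::int64_t rel = static_cast<std::int64_t>(index) - b.lLbound;
    if (rel < 0 || rel >= static_cast<std::int64_t>(b.cElements)) return false;
    *offset = static_cast<std::size_t>(rel) * cbElements;
    return true;
}
```

For example, a VB array declared with bounds 1 To 5 holding doubles has lLbound 1, cElements 5 and cbElements 8, so element 5 starts 32 bytes into pvData.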

There is a surprising number of API functions to deal with safearrays; see the
help topic “Array Manipulation API Functions”.  You should not attempt to
manipulate a safearray or access an element manually; you should instead
either use the API functions, or much better yet use the helper class
CComSafeArray.  (There doesn’t seem to be a _safearray_t class.)

CComSafeArray is a template class; you provide the type of an element
as one of the template parameters.  (The example code in the MSDN
documentation for CComSafeArray is lousy, if you ask me.  If you can’t figure
it out, check out the ATLSafeArray sample.)  CComSafeArray provides Attach()
and Detach() methods to take over an existing safearray or abandon the
wrapped safearray, respectively, as you would expect.  Perhaps the best
feature is the overloaded operator[], which allows you to access an element
of a one-dimensional array safely and easily.

Well, that’s it for this article.  I hope it helped you get pointed in
the right direction.  (Getting pointed there myself was the main reason I
wrote the article.)  If it is incorrect anywhere, or you have any
suggestions, please email me.

 
