The following sections provide information on how to achieve persistence using dbstl.
Each container has a begin() method which produces an iterator. These begin methods take a boolean parameter, directdb_get, which controls the caching behavior of the iterator. The default value of this parameter is true.
If directdb_get is true, then the persistent object is fetched anew from the database each time the iterator is dereferenced as a pointer by use of the star operator (*iterator) or the arrow operator (iterator->member). If directdb_get is false, then the first dereference of the iterator fetches the object from the database, but later dereferences can return cached data.
With directdb_get set to true, if you call:
    (*iterator).datamember1 = new_value1;
    (*iterator).datamember2 = new_value2;
then the assignment to datamember1 will be lost, because the second dereference of the iterator causes the cached copy of the object to be overwritten by the object's persistent data from the database.
You can also use the arrow operator like this:
    iterator->datamember1 = new_value1;
    iterator->datamember2 = new_value2;
This works exactly the same way as iterator::operator*, so the same caching rules apply to the arrow operator as to the star operator.
One way to avoid this problem is to create a reference to the object and use it to access the object:
    container::value_type &ref = *iterator;
    ref.datamember1 = new_value1;
    ref.datamember2 = new_value2;
    // ... more member function calls and data member assignments
    ref._DB_STL_StoreElement();
The code above does not lose the newly assigned value of ref.datamember1 the way the previous example did.
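To make the difference between the two patterns concrete, here is a toy model that runs without Berkeley DB. It is a minimal sketch under stated assumptions, not dbstl code: Rec, ToyDb, ToyIterator, store(), and the two helper functions are hypothetical names, a std::map stands in for the database, and store() only loosely plays the role of _DB_STL_StoreElement(). It models just the caching rule described above.

```cpp
#include <map>

// Hypothetical element type with two data members, as in the text.
struct Rec {
    int datamember1 = 0;
    int datamember2 = 0;
};

// Hypothetical stand-in for the underlying database.
using ToyDb = std::map<int, Rec>;

// Toy iterator (NOT the dbstl API) that models only the directdb_get rule.
class ToyIterator {
public:
    ToyIterator(ToyDb &db, int key, bool directdb_get)
        : db_(db), key_(key), directdb_get_(directdb_get) {}

    // Returns the iterator's cached copy. With directdb_get true the cache is
    // refilled from the "database" on every dereference; with false, only on
    // the first dereference.
    Rec &operator*() {
        if (directdb_get_ || !fetched_) {
            cache_ = db_[key_];
            fetched_ = true;
        }
        return cache_;
    }

    // Behaves the same as operator*, so the same caching rule applies.
    Rec *operator->() { return &**this; }

    // Writes the cached copy back; loosely analogous to _DB_STL_StoreElement().
    void store() { db_[key_] = cache_; }

private:
    ToyDb &db_;
    int key_;
    bool directdb_get_;
    bool fetched_ = false;
    Rec cache_;
};

// Pattern 1: dereference the iterator once per assignment.
Rec assign_through_iterator(bool directdb_get) {
    ToyDb db;
    db[1] = Rec{};
    ToyIterator it(db, 1, directdb_get);
    (*it).datamember1 = 10;
    (*it).datamember2 = 20;  // with directdb_get true, this refetch drops datamember1
    it.store();
    return db[1];
}

// Pattern 2: dereference once, then work through a reference to the cache.
Rec assign_through_reference(bool directdb_get) {
    ToyDb db;
    db[1] = Rec{};
    ToyIterator it(db, 1, directdb_get);
    Rec &ref = *it;
    ref.datamember1 = 10;
    ref.datamember2 = 20;
    it.store();
    return db[1];
}
```

In this model, assign_through_iterator(true) ends with datamember1 still 0 and datamember2 equal to 20, while assign_through_reference(true) keeps both values, matching the behavior the text describes.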
Alternatively, to avoid these complications, you can assign another object of the same type to the object referenced by an iterator, like this:
    container::value_type obj2;
    obj2.datamember1 = new_value1;
    obj2.datamember2 = new_value2;
    *iterator = obj2;
This code snippet causes the new values in obj2 to be stored in the underlying database.
If you have two iterators going through the same container like this:
    for (iterator1 = v.begin(), iterator2 = v.begin(); iterator1 != v.end(); ++iterator1, ++iterator2) {
        *iterator1 = new_value;
        print(*iterator2);
    }
then the printed value depends on the value of directdb_get with which the iterators were created. If directdb_get is false, then the original, persistent value is printed, because iterator2 returns its own cached copy. Otherwise, the newly assigned value is printed. This happens because each iterator has its own cached copy of the persistent object, and with directdb_get set to true, dereferencing iterator2 refreshes iterator2's copy from the database, retrieving the value stored by the assignment to *iterator1.
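The following toy model, again not dbstl code, mirrors that loop with int elements. It is a sketch under two loudly stated assumptions: an iterator loads an element into its own cache when it is positioned on it (which is what the described outcome for directdb_get set to false implies), and assigning through an iterator writes straight to the store, as *iterator1 = new_value does in dbstl. ToyIter, get(), set(), and two_iterators() are hypothetical names, and a std::vector<int> stands in for the database.

```cpp
#include <cstddef>
#include <vector>

// Toy iterator (NOT the dbstl API) with its own cached copy of one element.
class ToyIter {
public:
    ToyIter(std::vector<int> &db, std::size_t pos, bool directdb_get)
        : db_(db), pos_(pos), directdb_get_(directdb_get) { load(); }

    // Assumption: moving onto an element loads it into this iterator's cache.
    ToyIter &operator++() {
        ++pos_;
        load();
        return *this;
    }

    bool at_end() const { return pos_ >= db_.size(); }

    // Read: refetch from the "database" when directdb_get is true,
    // otherwise return the cached copy.
    int get() {
        if (directdb_get_) load();
        return cache_;
    }

    // Assignment through the iterator updates the cache and the "database".
    void set(int v) {
        cache_ = v;
        db_[pos_] = v;
    }

private:
    void load() {
        if (!at_end()) cache_ = db_[pos_];
    }

    std::vector<int> &db_;
    std::size_t pos_;
    bool directdb_get_;
    int cache_ = 0;
};

// Mirrors the loop in the text; returns what print(*iterator2) would show.
std::vector<int> two_iterators(bool directdb_get, int new_value) {
    std::vector<int> db{1, 2, 3};
    std::vector<int> printed;
    ToyIter it1(db, 0, directdb_get);
    ToyIter it2(db, 0, directdb_get);
    for (; !it1.at_end(); ++it1, ++it2) {
        it1.set(new_value);
        printed.push_back(it2.get());
    }
    return printed;
}
```

With a store of {1, 2, 3} and a new_value of 9, two_iterators(true, 9) returns {9, 9, 9}, while two_iterators(false, 9) returns the original {1, 2, 3}.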
Alternatively, you can set directdb_get to false and call iterator2->refresh() immediately before dereferencing iterator2, so that iterator2's cached value is refreshed.
If directdb_get is false, a few of the tests in dbstl's test kit will fail. This is because the contrived case above appears in several C++ STL tests. Consequently, the default value of the directdb_get parameter in the container::begin() methods is true. If your use cases avoid such unusual usage of iterators, you can set it to false, which makes iterator read operations faster.
If you modify the object to which an iterator refers by using either of the following:
    (*iterator).member_function_call()
or
    (*iterator).data_member = new_value
then you should call iterator->_DB_STL_StoreElement() to store the change. Otherwise the change is lost after the iterator moves on to other elements.
If you are storing a sequence and you modify part of it, you should also call iterator->_DB_STL_StoreElement() before moving the iterator.
In both cases, if directdb_get is true (the default), you should call _DB_STL_StoreElement() after the change and before the next iterator movement or the next dereference of the iterator by the star or arrow operator (iterator::operator* or iterator::operator->). Otherwise, you will lose the change.
If you update the element by assigning to a dereferenced iterator like this:
    *iterator = new_element;
then you never have to call _DB_STL_StoreElement(), because the change is stored in the database automatically.
Dbstl is an interface to Berkeley DB, so it is used to store data persistently. This purpose differs from that of the regular C++ STL, and the difference has implications for expected object lifetime. In standard STL, when you store an object A of type ID into a C++ STL vector V using V.push_back(A), if a proper copy constructor is provided in A's class type, then the copy of A (call it B) and everything in B, such as another object C pointed to by B's data member B.c_ptr, will be stored in V and will live as long as B is still in V and V is alive. B will be destroyed when V is destroyed or when B is erased from V.
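For comparison with what follows, here is the standard STL behavior in plain C++, with no dbstl involved. It is a small sketch: Holder (playing the role of A's class type) and Payload (playing the role of C) are hypothetical names. Holder's copy constructor deep-copies the Payload, so the copy stored in the vector owns its own Payload and outlives the original.

```cpp
#include <string>
#include <vector>

// Hypothetical type playing the role of C in the text.
struct Payload {
    std::string text;
};

// Hypothetical type playing the role of A's class; it owns a Payload via c_ptr.
struct Holder {
    Payload *c_ptr;

    explicit Holder(const std::string &s) : c_ptr(new Payload{s}) {}

    // Deep-copying copy constructor: the copy gets its own Payload.
    Holder(const Holder &other) : c_ptr(new Payload(*other.c_ptr)) {}

    Holder &operator=(const Holder &other) {
        if (this != &other) {
            Payload *fresh = new Payload(*other.c_ptr);
            delete c_ptr;
            c_ptr = fresh;
        }
        return *this;
    }

    ~Holder() { delete c_ptr; }
};

// Stores a copy of A in V, lets A go out of scope, then reads the copy.
std::string copy_outlives_original() {
    std::vector<Holder> V;
    {
        Holder A("hello");
        V.push_back(A);  // V now holds B, a deep copy of A
    }                    // A and the Payload it owns are destroyed here
    return V[0].c_ptr->text;  // B and its own Payload are still alive
}
```

The copy lives exactly as long as it stays in V; once V is destroyed, so are B and its Payload.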
This is not true for dbstl, which copies A's data and stores it in the underlying database. The copy is by default a shallow copy, but users can register their own object marshalling and unmarshalling functions using the DbstlElemTraits class template. So if A is passed to a db_vector container, dv, by using dv.push_back(A), then dbstl copies A's data using the registered functions and stores the data in the underlying database. Consequently, A's data remains valid even if the container is destroyed, because it is stored in the database.
If the copy is simply a shallow copy and A is later destroyed, then the pointer stored in the database becomes invalid. The next time we use the retrieved object, we will be using an invalid pointer, which will probably result in errors. To avoid this, store the referred-to object C rather than the pointer member A.c_ptr itself, by registering appropriate marshalling and unmarshalling functions with DbstlElemTraits.
For example, consider the following class declaration:
    class ID
    {
    public:
        string Name;
        int Score;
    };
Here, the class ID has a data member Name, which refers to the memory address of the actual characters in the string. If we simply shallow copy an object, id, of class ID to store it, then the stored data, idd, is invalid once id is destroyed. This is because idd and id refer to a common memory address, the base address of the memory space storing all characters in the string, and this memory space is released when id is destroyed. So idd will be referring to an invalid address. The next time we retrieve idd and use it, there will probably be memory corruption.
To store id correctly, write a marshalling/unmarshalling function pair, together with a function that reports the marshalled size, like this:
    // Layout: the bytes of Score, then the characters of Name and a terminating '\0'.
    void copy_id(void *dest, const ID &elem)
    {
        memcpy(dest, &elem.Score, sizeof(elem.Score));
        char *p = ((char *)dest) + sizeof(elem.Score);
        strcpy(p, elem.Name.c_str());
    }

    // Reads the layout written by copy_id back into dest.
    void restore_id(ID &dest, const void *srcdata)
    {
        memcpy(&dest.Score, srcdata, sizeof(dest.Score));
        const char *p = ((const char *)srcdata) + sizeof(dest.Score);
        dest.Name = p;
    }

    // Number of bytes copy_id writes.
    size_t size_id(const ID &elem)
    {
        return sizeof(elem.Score) + elem.Name.size() + 1; // store the '\0' char.
    }
Then register the above functions before storing any instance of ID:
    DbstlElemTraits<ID>::instance()->set_copy_function(copy_id);
    DbstlElemTraits<ID>::instance()->set_size_function(size_id);
    DbstlElemTraits<ID>::instance()->set_restore_function(restore_id);
This way, the actual data of instances of ID are stored, and so the data will persist even if the container itself is destroyed.
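Because copy_id, restore_id, and size_id are ordinary functions, they can be sanity-checked without Berkeley DB by marshalling an ID into a byte buffer and restoring it into a separate object. The sketch below repeats the functions so that it compiles on its own. round_trip() is a hypothetical helper, and its order of calls (size the buffer, copy into it, restore from it) is an assumption about how the three registered functions fit together; the std::vector<char> only stands in for the bytes stored in the database.

```cpp
#include <cstddef>
#include <cstring>
#include <string>
#include <vector>

class ID {
public:
    std::string Name;
    int Score;
};

// Writes Score, then the characters of Name including the terminating '\0'.
void copy_id(void *dest, const ID &elem) {
    std::memcpy(dest, &elem.Score, sizeof(elem.Score));
    char *p = static_cast<char *>(dest) + sizeof(elem.Score);
    std::strcpy(p, elem.Name.c_str());
}

// Reads the layout written by copy_id back into dest.
void restore_id(ID &dest, const void *srcdata) {
    std::memcpy(&dest.Score, srcdata, sizeof(dest.Score));
    const char *p = static_cast<const char *>(srcdata) + sizeof(dest.Score);
    dest.Name = p;
}

// Number of bytes copy_id writes: Score, Name's characters, and the '\0'.
std::size_t size_id(const ID &elem) {
    return sizeof(elem.Score) + elem.Name.size() + 1;
}

// Hypothetical helper: marshal elem into a buffer of size_id(elem) bytes,
// then unmarshal the buffer into a separate ID object.
ID round_trip(const ID &elem) {
    std::vector<char> buf(size_id(elem));
    copy_id(buf.data(), elem);
    ID out;
    restore_id(out, buf.data());
    return out;
}
```

One consequence of this layout is that, because the name is written with strcpy and read back as a '\0'-terminated string, a Name containing an embedded '\0' character would be truncated when restored.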