A database cursor refers to a single key/data pair in the database. It supports traversal of the database and is the only way to access individual duplicate data items. Cursors are used for operating on collections of records, for iterating over a database, and for saving handles to individual records, so that they can be modified after they have been read.
The DB->cursor() method opens a cursor into a database. Upon return the cursor is uninitialized; cursor positioning occurs as part of the first cursor operation.
Once a database cursor has been opened, records may be retrieved (DBC->get()), stored (DBC->put()), and deleted (DBC->del()).
Additional operations supported by the cursor handle include duplication (DBC->dup()), equality join (DB->join()), and a count of duplicate data items (DBC->count()). Cursors are eventually closed using DBC->close().
For more information on the operations supported by the cursor handle, see the Database Cursors and Related Methods section in the Berkeley DB C API Reference Guide.
The DBC->get() method retrieves records from the database using a cursor. The DBC->get() method takes a flag which controls how the cursor is positioned within the database and returns the key/data item associated with that positioning. Similar to DB->get(), DBC->get() may also take a supplied key and retrieve the data associated with that key from the database. There are several flags that you can set to customize retrieval.
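For example, the DB_GET_BOTH flag positions the cursor only if both the supplied key and the supplied data item match a record in the database. When no record matches the requested positioning, DBC->get() returns the DB_NOTFOUND error.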
In all cases, the cursor is repositioned by a DBC->get() operation to point to the newly-returned key/data pair in the database.
The following is a code example showing a cursor walking through a database and displaying the records it contains to the standard output:

    /* progname is assumed to be a global program-name string defined elsewhere. */
    int
    display(char *database)
    {
        DB *dbp;
        DBC *dbcp;
        DBT key, data;
        int close_db, close_dbc, ret, tret;

        close_db = close_dbc = 0;

        /* Create the database handle. */
        if ((ret = db_create(&dbp, NULL, 0)) != 0) {
            fprintf(stderr,
                "%s: db_create: %s\n", progname, db_strerror(ret));
            return (1);
        }
        close_db = 1;

        /* Turn on additional error output. */
        dbp->set_errfile(dbp, stderr);
        dbp->set_errpfx(dbp, progname);

        /* Open the database. */
        if ((ret = dbp->open(dbp,
            NULL, database, NULL, DB_UNKNOWN, DB_RDONLY, 0)) != 0) {
            dbp->err(dbp, ret, "%s: DB->open", database);
            goto err;
        }

        /* Acquire a cursor for the database. */
        if ((ret = dbp->cursor(dbp, NULL, &dbcp, 0)) != 0) {
            dbp->err(dbp, ret, "DB->cursor");
            goto err;
        }
        close_dbc = 1;

        /* Initialize the key/data return pair. */
        memset(&key, 0, sizeof(key));
        memset(&data, 0, sizeof(data));

        /* Walk through the database and print out the key/data pairs. */
        while ((ret = dbcp->get(dbcp, &key, &data, DB_NEXT)) == 0)
            printf("%.*s : %.*s\n",
                (int)key.size, (char *)key.data,
                (int)data.size, (char *)data.data);
        if (ret != DB_NOTFOUND) {
            dbp->err(dbp, ret, "DBcursor->get");
            goto err;
        }
        /* DB_NOTFOUND only marks the end of the database. */
        ret = 0;

    err:
        /* Close handles without losing the first error seen. */
        if (close_dbc && (tret = dbcp->close(dbcp)) != 0) {
            dbp->err(dbp, tret, "DBcursor->close");
            if (ret == 0)
                ret = tret;
        }
        if (close_db && (tret = dbp->close(dbp, 0)) != 0) {
            fprintf(stderr,
                "%s: DB->close: %s\n", progname, db_strerror(tret));
            if (ret == 0)
                ret = tret;
        }
        return (ret == 0 ? 0 : 1);
    }

The DBC->put() method stores records into the database using a cursor. In general, DBC->put() takes a key and inserts the associated data into the database, at a location controlled by a specified flag.
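There are several flags that you can set to customize storage, such as DB_KEYFIRST and DB_KEYLAST, which store the data item as the first or last of the duplicate data items for its key.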
In all cases, the cursor is repositioned by a DBC->put() operation to point to the newly inserted key/data pair in the database.
The following is a code example showing a cursor storing two data items in a database that supports duplicate data items:

    int
    store(DB *dbp)
    {
        DBC *dbcp;
        DBT key, data;
        int ret, tret;

        /*
         * The DB handle for a Btree database supporting duplicate data
         * items is the argument; acquire a cursor for the database.
         * On failure there is no cursor to close, so return directly.
         */
        if ((ret = dbp->cursor(dbp, NULL, &dbcp, 0)) != 0) {
            dbp->err(dbp, ret, "DB->cursor");
            return (ret);
        }

        /* Initialize the key. */
        memset(&key, 0, sizeof(key));
        key.data = "new key";
        key.size = strlen(key.data) + 1;

        /* Initialize the data to be the first of two duplicate records. */
        memset(&data, 0, sizeof(data));
        data.data = "new key's data: entry #1";
        data.size = strlen(data.data) + 1;

        /* Store the first of the two duplicate records. */
        if ((ret = dbcp->put(dbcp, &key, &data, DB_KEYFIRST)) != 0) {
            dbp->err(dbp, ret, "DBcursor->put");
            goto err;
        }

        /* Initialize the data to be the second of two duplicate records. */
        data.data = "new key's data: entry #2";
        data.size = strlen(data.data) + 1;

        /*
         * Store the second of the two duplicate records. No duplicate
         * record sort function has been specified, so we explicitly
         * store the record as the last of the duplicate set.
         */
        if ((ret = dbcp->put(dbcp, &key, &data, DB_KEYLAST)) != 0)
            dbp->err(dbp, ret, "DBcursor->put");

    err:
        /* Close the cursor without losing an earlier error. */
        if ((tret = dbcp->close(dbcp)) != 0) {
            dbp->err(dbp, tret, "DBcursor->close");
            if (ret == 0)
                ret = tret;
        }
        return (ret);
    }

If you are using the Heap access method and you are creating a new record in the database, then the key that you provide to the DBC->put() method should be empty. The DBC->put() method will return the record's ID (RID) in the key. The RID is automatically created for you when Heap database records are created.
The DBC->del() method deletes records from the database using a cursor. The DBC->del() method deletes the record to which the cursor currently refers. In all cases, the cursor position is unchanged after a delete.
Once a cursor has been initialized (for example, by a call to DBC->get()), it can be thought of as identifying a particular location in a database. The DBC->dup() method permits an application to create a new cursor that has the same locking and transactional information as the cursor from which it is copied, and which optionally refers to the same position in the database.
In order to maintain a cursor position when an application is using locking, locks are maintained on behalf of the cursor until the cursor is closed. In cases when an application is using locking without transactions, cursor duplication is often required to avoid self-deadlocks. For further details, refer to Berkeley DB Transactional Data Store locking conventions.
Berkeley DB supports "equality" (also known as "natural") joins on secondary indices. An equality join is a method of retrieving data from a primary database using criteria stored in a set of secondary indices. It requires that the data be organized as a primary database, which contains the primary key and primary data field, and a set of secondary indices. Each of the secondary indices is indexed by a different secondary key, and, for each key in a secondary index, there is a set of duplicate data items that match the primary keys in the primary database.
For example, let's assume the need for an application that will return the names of stores in which one can buy fruit of a given color. We would first construct a primary database that lists types of fruit as the key item, and the store where you can buy them as the data item:
Primary key: | Primary data: |
---|---|
apple | Convenience Store |
blueberry | Farmer's Market |
peach | Shopway |
pear | Farmer's Market |
raspberry | Shopway |
strawberry | Farmer's Market |
We would then create a secondary index with the key color, and, as the data items, the names of fruits of different colors.
Secondary key: | Secondary data: |
---|---|
blue | blueberry |
red | apple |
red | raspberry |
red | strawberry |
yellow | peach |
yellow | pear |
This secondary index would allow an application to look up a color, and then use the data items to look up the stores where the colored fruit could be purchased. For example, by first looking up blue, the data item blueberry could be used as the lookup key in the primary database, returning Farmer's Market.
To use the DB->join() method, your data must be organized as described above: a primary database, plus secondary indices in which each secondary key has a set of duplicate data items that are primary keys. These duplicate entries should be sorted for performance reasons, although it is not required. For more information, see the DB_DUPSORT flag to the DB->set_flags() method.
The DB->join() method reviews a list of secondary keys and, when it finds a data item that appears as a data item for all of the secondary keys, uses that data item as a lookup key into the primary database and returns the associated data item.
If there were another secondary index that had as its key the cost of the fruit, a similar lookup could be done on stores where inexpensive fruit could be purchased:
Secondary key: | Secondary data: |
---|---|
expensive | blueberry |
expensive | peach |
expensive | pear |
expensive | strawberry |
inexpensive | apple |
inexpensive | pear |
inexpensive | raspberry |
The DB->join() method provides equality join functionality. While not strictly cursor functionality, in that it is not a method of a cursor handle, it is more closely related to the cursor operations than to the standard DB operations.
It is also possible to do lookups based on multiple criteria in a single operation. For example, it is possible to look up fruits that are both red and expensive in a single operation. If the same fruit appeared as a data item in both the color and expense indices, then that fruit name would be used as the key for retrieval from the primary index, and would then return the store where expensive, red fruit could be purchased.
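The multiple-criteria lookup can be modeled in plain C using the fruit tables above. This is only an in-memory illustration of the equality join idea, not how DB->join() is implemented; the struct, array, and function names are all hypothetical:

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* One key/data pair; stands in for a database record. */
struct pair {
    const char *key;
    const char *data;
};

#define NELEM(a) (sizeof(a) / sizeof((a)[0]))

/* The tables from the text, held in memory for illustration only. */
static const struct pair primary[] = {
    { "apple", "Convenience Store" }, { "blueberry", "Farmer's Market" },
    { "peach", "Shopway" },           { "pear", "Farmer's Market" },
    { "raspberry", "Shopway" },       { "strawberry", "Farmer's Market" },
};
static const struct pair by_color[] = {
    { "blue", "blueberry" }, { "red", "apple" },    { "red", "raspberry" },
    { "red", "strawberry" }, { "yellow", "peach" }, { "yellow", "pear" },
};
static const struct pair by_cost[] = {
    { "expensive", "blueberry" },  { "expensive", "peach" },
    { "expensive", "pear" },       { "expensive", "strawberry" },
    { "inexpensive", "apple" },    { "inexpensive", "pear" },
    { "inexpensive", "raspberry" },
};

/* Return 1 if the index holds the pair (key, data), else 0. */
static int
has_pair(const struct pair *idx, size_t n, const char *key, const char *data)
{
    size_t i;

    for (i = 0; i < n; i++)
        if (strcmp(idx[i].key, key) == 0 && strcmp(idx[i].data, data) == 0)
            return (1);
    return (0);
}

/* Return the primary data stored under key, or NULL if absent. */
static const char *
primary_lookup(const char *key)
{
    size_t i;

    for (i = 0; i < NELEM(primary); i++)
        if (strcmp(primary[i].key, key) == 0)
            return (primary[i].data);
    return (NULL);
}

/*
 * Equality join on color and cost: every fruit listed under color that
 * is also listed under cost is used as a key into the primary table.
 * Stores up to max store names in out; returns the number stored.
 */
size_t
join_color_cost(const char *color, const char *cost,
    const char **out, size_t max)
{
    const char *store;
    size_t i, n;

    assert(out != NULL);
    for (i = 0, n = 0; i < NELEM(by_color) && n < max; i++) {
        if (strcmp(by_color[i].key, color) != 0)
            continue;
        if (!has_pair(by_cost, NELEM(by_cost), cost, by_color[i].data))
            continue;
        if ((store = primary_lookup(by_color[i].data)) != NULL)
            out[n++] = store;
    }
    return (n);
}
```

With these tables, joining red with expensive matches only strawberry and so yields Farmer's Market, while red with inexpensive matches apple and raspberry.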
Consider the following three databases:
Consider the following query:
Return the personnel records of all people named smith with the job title manager.
This query finds all the records in the primary database (personnel) for which the criteria lastname=smith and job title=manager are both true.
Assume that all databases have been properly opened and have the handles: pers_db, name_db, job_db. We also assume that we have an active transaction to which the handle txn refers.

    DBC *name_curs, *job_curs, *join_curs;
    DBC *carray[3];
    DBT key, data;
    int ret, tret;

    /* Every cursor is tested against NULL at err:, so start them all NULL. */
    name_curs = NULL;
    job_curs = NULL;
    join_curs = NULL;

    memset(&key, 0, sizeof(key));
    memset(&data, 0, sizeof(data));

    /* Position a cursor on the first duplicate for lastname "smith". */
    if ((ret = name_db->cursor(name_db, txn, &name_curs, 0)) != 0)
        goto err;
    key.data = "smith";
    key.size = sizeof("smith");
    if ((ret = name_curs->get(name_curs, &key, &data, DB_SET)) != 0)
        goto err;

    /* Position a cursor on the first duplicate for job title "manager". */
    if ((ret = job_db->cursor(job_db, txn, &job_curs, 0)) != 0)
        goto err;
    key.data = "manager";
    key.size = sizeof("manager");
    if ((ret = job_curs->get(job_curs, &key, &data, DB_SET)) != 0)
        goto err;

    /* Build the NULL-terminated cursor list and create the join cursor. */
    carray[0] = name_curs;
    carray[1] = job_curs;
    carray[2] = NULL;

    if ((ret = pers_db->join(pers_db, carray, &join_curs, 0)) != 0)
        goto err;
    while ((ret = join_curs->get(join_curs, &key, &data, 0)) == 0) {
        /* Process record returned in key/data. */
    }

    /*
     * If we exited the loop because we ran out of records,
     * then it has completed successfully.
     */
    if (ret == DB_NOTFOUND)
        ret = 0;

    err:
    if (join_curs != NULL &&
        (tret = join_curs->close(join_curs)) != 0 && ret == 0)
        ret = tret;
    if (name_curs != NULL &&
        (tret = name_curs->close(name_curs)) != 0 && ret == 0)
        ret = tret;
    if (job_curs != NULL &&
        (tret = job_curs->close(job_curs)) != 0 && ret == 0)
        ret = tret;

    return (ret);

The name cursor is positioned at the beginning of the duplicate list for smith and the job cursor is placed at the beginning of the duplicate list for manager. The join cursor is returned from the join method. This code then loops over the join cursor getting the personnel records of each one until there are no more.
Once a cursor has been initialized to refer to a particular key in the database, it can be used to determine the number of data items that are stored for that key. The DBC->count() method returns this number of data items. The returned value is always one, unless the database supports duplicate data items, in which case it may be any number of items.
The DBC->close() method closes the DBC cursor, after which the cursor may no longer be used. Although cursors are implicitly closed when the database they point to is closed, it is good programming practice to explicitly close cursors. In addition, in transactional systems, cursors may not exist outside of a transaction and so must be explicitly closed.