
CXXR's approach to garbage collection is based primarily on reference counting. Consequently, in many cases, an object will be destroyed, and its memory reused, almost as soon as it becomes unreachable, rather than remaining in existence until the next mark-sweep garbage collection. This more aggressive approach is merciless in exposing memory protection bugs, i.e. bugs arising from a failure to use PROTECT(), UNPROTECT() and kindred functions (or their CXXR replacements such as the GCStackRoot smart pointers) correctly. Bugs of this kind that can lie dormant in CR, possibly for years, manifest themselves as hard failures in CXXR, and this applies both to bugs in the interpreter itself and to bugs in the C/C++ code of add-on packages.

This section gives advice on how to diagnose the source of such bugs.

Preliminaries. Build CXXR with full debugging options. Remove all C++ compiler optimisation flags, and define the following preprocessor variables within src/include/CXXR/config.hpp: AGGRESSIVE_GC, CELLFIFO, CHECK_EXPOSURE, FILL55 and GCID. (For an explanation of these and other preprocessor variables, see the doxygen documentation for file config.hpp.)
Is the problem due to premature garbage collection? Run CXXR under a debugger, and repeat the sequence of commands that caused the problem. Here we'll assume that the debugger is gdb, and that the problematic sequence of commands is embodied in a problem script. A memory protection error will typically manifest itself as a segmentation fault, or as an unexpected call to Rf_error() or Rf_errorcall(). When this happens, scan the gdb backtrace for objects derived from class GCNode in which fields that should contain meaningful data are in fact filled with 0x55 bytes. This is obvious in the case of pointers, but it can also show up as 32-bit integers with the value 1431655765, 16-bit integers with the value 21845, or the ASCII character 'U'. If these 0x55 bytes are found, then the problem is very probably due to premature garbage collection. Make a note of the address of the GCNode object where the problem was discovered. In what follows, we'll assume for illustration that the address is 0x85d4860.
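To see why those particular values betray a trashed node, it may help to work through the arithmetic in code. The following is a minimal, self-contained sketch, not CXXR code: the names kFillByte, filled_value() and looks_trashed() are hypothetical and invented for illustration. It simply overwrites a value with 0x55 bytes and inspects the result.

```cpp
#include <cstdint>
#include <cstring>
#include <type_traits>

// The fill byte discussed in the text. The name is hypothetical.
constexpr unsigned char kFillByte = 0x55;

// Return a T whose every byte is kFillByte, i.e. what a field of type T
// would look like once the memory holding it had been overwritten.
template <typename T>
T filled_value() {
    static_assert(std::is_trivially_copyable<T>::value,
                  "only meaningful for plain data fields");
    T value;
    std::memset(&value, kFillByte, sizeof value);
    return value;
}

// Hypothetical helper: true if every byte of 'field' equals kFillByte.
template <typename T>
bool looks_trashed(const T& field) {
    static_assert(std::is_trivially_copyable<T>::value,
                  "only meaningful for plain data fields");
    unsigned char bytes[sizeof(T)];
    std::memcpy(bytes, &field, sizeof(T));
    for (unsigned char b : bytes) {
        if (b != kFillByte) return false;
    }
    return true;
}
```

Under these assumptions, filled_value<std::int32_t>() is 0x55555555 (decimal 1431655765), filled_value<std::int16_t>() is 0x5555 (decimal 21845), and filled_value<char>() is 'U'. A single field holding one of these values may be a coincidence; several adjacent fields all matching the pattern are much stronger evidence.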
Find the ID number of the problem node. When the preprocessor variable GCID is defined, each object of a class inheriting from GCNode is given a unique ID number. Unfortunately this ID number will be trashed by the premature garbage collection, so the next step is to discover what it previously was. For this we need to set up two breakpoints in the file GCNode.cpp, one within the function GCNode::initialize() on the line commented BREAKPOINT A, and the other within the function GCNode::watch() on the line commented BREAKPOINT B. Restart CXXR within the debugger, and rerun the problem script. On arrival at breakpoint A, set s_watch_addr to the address of the problem node, as follows:
```
(gdb) p s_watch_addr=0x85d4860
```
Then continue running the program. The program will subsequently stop at breakpoint B at key points in the lifecycle of any GCNode created at address 0x85d4860. We want to discover the ID numbers of such nodes, which can be determined by:
```
(gdb) p m_id
```
Beware that more than one GCNode may be created at the relevant address before the problem manifests itself. Usually the ID number we want is that of the last node created there, though sometimes an earlier occupant of the address is the source of the problem. For illustration, we'll suppose that the ID number in question is 53278.
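The reason an address alone does not identify a node can be illustrated with a toy model. The sketch below is not CXXR's allocator: ToyNode, OneSlotPool and reuse_demo() are hypothetical names, and the pool deliberately has a single slot so that address reuse is guaranteed rather than merely likely.

```cpp
#include <vector>
#include <new>

// Toy node carrying a serial number, loosely analogous to the ID that
// GCID gives each GCNode. Hypothetical; not a CXXR class.
struct ToyNode {
    unsigned id;
    explicit ToyNode(unsigned i) : id(i) {}
};

// A pool with exactly one slot, so every node lands at the same address.
class OneSlotPool {
public:
    // Construct a node in the slot and record its ID as the new occupant.
    ToyNode* create() {
        ToyNode* node = new (m_slot) ToyNode(++m_next_id);
        m_occupants.push_back(node->id);
        return node;
    }

    // End the node's lifetime; the slot becomes free for reuse.
    void destroy(ToyNode* node) { node->~ToyNode(); }

    // IDs of every node that has lived in the slot, oldest first.
    const std::vector<unsigned>& occupants() const { return m_occupants; }

private:
    alignas(ToyNode) unsigned char m_slot[sizeof(ToyNode)];
    unsigned m_next_id = 0;
    std::vector<unsigned> m_occupants;
};

struct ReuseResult {
    bool same_address;          // did every node share the watched address?
    std::vector<unsigned> ids;  // successive occupants of that address
};

// Create three nodes in succession and watch the address of the first.
ReuseResult reuse_demo() {
    OneSlotPool pool;
    ToyNode* node = pool.create();
    const void* watched = node;
    bool same = true;
    for (int i = 0; i < 2; ++i) {
        pool.destroy(node);
        node = pool.create();
        same = same && (static_cast<const void*>(node) == watched);
    }
    pool.destroy(node);
    return ReuseResult{same, pool.occupants()};
}
```

In this toy, three distinct nodes occupy the watched address in turn, so watching the address reports three different IDs. In the real procedure the candidate to examine first is the last ID reported before the failure.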
Discover the life history of the problem node. We now want to study the life history of the problem node, to discover why it was prematurely garbage-collected. Restart CXXR within the debugger, and on arrival at breakpoint A, set s_watch_id to the ID number of the problem node:
```
(gdb) p s_watch_id=53278
```
Then continue running the program. The program will subsequently stop at breakpoint B at the following events: (a) node #53278 is constructed, (b) this node is 'made moribund' (i.e. becomes newly eligible for garbage collection because its reference count has fallen to zero), and (c) the node is deleted. Note that the node may be made moribund more than once, because its reference count may increase again from zero. (This happens when the program leaves one protection scope and subsequently enters another: the reference count of a protected node may fall to zero on leaving the first scope, and rise again on entering the next. As long as no GCNode is allocated in the interregnum, the node will not be garbage-collected.) What is of particular interest is the program context in which the node last becomes moribund, and the program context in which it is deleted: establishing these is usually sufficient to identify the memory protection gap.
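The sequence of events described above can be mimicked by a small model. The following sketch is a deliberately simplified illustration and makes no claim to reflect CXXR's actual implementation: ToyHeap and all of its members are hypothetical, and the only behaviour assumed from the text is that a node becomes moribund when its reference count falls to zero and that moribund nodes are reclaimed when another node is allocated.

```cpp
#include <map>
#include <set>
#include <string>
#include <vector>

// Toy reference-counted heap that records the life history of one node,
// much as watching a node ID does in the procedure above. Hypothetical.
class ToyHeap {
public:
    explicit ToyHeap(unsigned watch_id) : m_watch_id(watch_id) {}

    // Allocation is the point at which moribund nodes are reclaimed.
    unsigned allocate() {
        collect();
        unsigned id = ++m_last_id;
        m_refcount[id] = 0;
        note(id, "constructed");
        return id;
    }

    // Entering a protection scope: the node gains a reference.
    void inc_ref(unsigned id) { ++m_refcount.at(id); }

    // Leaving a protection scope: the node loses a reference and becomes
    // moribund if none remain.
    void dec_ref(unsigned id) {
        unsigned& count = m_refcount.at(id);
        if (count > 0 && --count == 0) {
            m_moribund.insert(id);
            note(id, "made moribund");
        }
    }

    // Events recorded for the watched node, oldest first.
    const std::vector<std::string>& events() const { return m_events; }

private:
    // Delete every moribund node whose count is still zero; nodes whose
    // count has risen again in the meantime survive.
    void collect() {
        for (unsigned id : m_moribund) {
            if (m_refcount.at(id) == 0) {
                m_refcount.erase(id);
                note(id, "deleted");
            }
        }
        m_moribund.clear();
    }

    void note(unsigned id, const std::string& what) {
        if (id == m_watch_id) m_events.push_back(what);
    }

    unsigned m_watch_id;
    unsigned m_last_id = 0;
    std::map<unsigned, unsigned> m_refcount;  // live node ID -> count
    std::set<unsigned> m_moribund;
    std::vector<std::string> m_events;
};

// Node 1 passes through two protection scopes, then loses protection
// just before another allocation takes place.
std::vector<std::string> life_history_demo() {
    ToyHeap heap(1);
    unsigned node = heap.allocate();  // node 1 constructed
    heap.inc_ref(node);               // enter first scope
    heap.dec_ref(node);               // leave it: moribund
    heap.inc_ref(node);               // enter second scope: rescued
    heap.dec_ref(node);               // leave it: moribund again
    heap.allocate();                  // allocation reclaims node 1
    return heap.events();
}
```

In this model the watched node is reported as constructed, made moribund twice, and then deleted. The first moribund event is harmless because no allocation intervenes; it is the last one, together with the allocation that follows it, that marks the protection gap.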

$Id: memory.html 1288 2012-04-18 16:33:27Z arr $

CXXR: Diagnosing Memory Protection Bugs