CXXR's approach to garbage collection is based primarily on reference
counting. Consequently, in many cases, an object will be destroyed, and its
memory reused, almost as soon as it becomes unreachable, rather than
remaining in existence until the next mark-sweep garbage collection. This more
aggressive approach is merciless in exposing memory protection bugs, i.e. bugs
arising from a failure to use PROTECT(), UNPROTECT() and kindred functions
(or their CXXR replacements, such as the GCStackRoot smart pointers)
correctly. Bugs of this kind can lie dormant in CR, possibly for years, but
manifest themselves as hard failures in CXXR; this applies both to bugs in
the interpreter itself and to bugs in the C/C++ code of add-on packages.

This section gives advice on how to diagnose the source of such bugs.
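The behaviour described above, destruction as soon as the last reference disappears, can be illustrated with a minimal sketch. It uses std::shared_ptr purely as a stand-in for reference counting; Node and destroyedImmediately() are hypothetical names for illustration, and nothing here is CXXR's actual mechanism.

```cpp
#include <cassert>
#include <memory>

// Hypothetical stand-in for a reference-counted node; not a CXXR class.
struct Node {
    explicit Node(bool& alive) : m_alive(alive) { m_alive = true; }
    ~Node() { m_alive = false; }
    bool& m_alive;  // flag owned by the caller, cleared on destruction
};

// Returns true if the node was destroyed at the moment its last owner let go.
bool destroyedImmediately() {
    bool alive = false;
    std::shared_ptr<Node> owner = std::make_shared<Node>(alive);
    Node* raw = owner.get();  // "unprotected" pointer: holds no reference
    (void)raw;
    owner.reset();            // count reaches zero: the destructor runs here
    // 'raw' now dangles; dereferencing it would be a memory protection bug.
    return !alive;
}
```

Calling destroyedImmediately() from main() should return true. Under a pure mark-sweep collector the equivalent dangling pointer could go on appearing to work until the next collection, which is why such bugs can stay hidden.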
Build CXXR with the following preprocessor variables defined in
src/include/CXXR/config.hpp: AGGRESSIVE_GC, CELLFIFO, CHECK_EXPOSURE, FILL55
and GCID. (For an explanation of these and other preprocessor variables, see
the doxygen documentation for file config.hpp.)

We assume that CXXR is being run under gdb, and that
the problematic sequence of commands is embodied in a problem
script. A memory protection error will typically manifest itself
as a segmentation fault, or as an unexpected call to Rf_error() or
Rf_errorcall(). When this happens, scan along the gdb backtrace looking for
instances of objects derived from class GCNode in which fields that should
contain meaningful data are in fact filled with 0x55 bytes. This is obvious
in the case of pointers, but it can also show up as 32-bit integers with the
value 1431655765, 16-bit integers with the value 21845, or the ASCII
character 'U'. If these 0x55 bytes are found, then the problem is very
probably due to premature garbage collection. Take a note of the address of
the GCNode object where the problem was discovered. In what follows, we'll
assume for illustration that the address is 0x85d4860.

When GCID is defined, each object of a class inheriting from GCNode is given
a unique ID number.
Unfortunately this ID number will be trashed by the premature garbage
collection, so the next step is to discover what it previously was. For
this we need to set up two breakpoints in the file GCNode.cpp, one within
the function GCNode::initialize() on the line commented BREAKPOINT A, and
the other within the function GCNode::watch() on the line commented
BREAKPOINT B. Restart CXXR within the debugger, and rerun the problem script.
On arrival at breakpoint A, set s_watch_addr to the address of the problem
node, as follows:
(gdb) p s_watch_addr=0x85d4860
Then continue running the program. The program will subsequently stop at
breakpoint B at key points in the lifecycle of any GCNode
created at address 0x85d4860. We want to discover the ID numbers of such
nodes, which can be determined by:
(gdb) p m_id
Beware that more than one GCNode may be created at the relevant address
before the problem manifests itself. What we want to discover is usually the
ID number of the last one created, though sometimes it is an earlier
occupant of the address that is the source of the problem. For illustration,
we'll suppose that the ID number in question is 53278.
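Why the ID has to be captured while the node is still alive, and why one address can have several occupants, can be shown in miniature. The sketch below is not CXXR code: ToyNode, s_cell, idInCell() and occupantsOfCell() are hypothetical names, and the 0x55 fill on deletion is an assumption about what a FILL55 build does, modelled here with memset.

```cpp
#include <cassert>
#include <cstdint>
#include <cstring>
#include <new>
#include <vector>

// Hypothetical miniature of a node carrying an ID; not the real GCNode.
struct ToyNode {
    std::uint32_t m_id;
};

// One reusable cell, standing in for a recycled heap address.
alignas(ToyNode) static unsigned char s_cell[sizeof(ToyNode)];

// Read whatever ID value currently sits in the cell's bytes.
std::uint32_t idInCell() {
    std::uint32_t id;
    std::memcpy(&id, s_cell, sizeof id);
    return id;
}

// Create 'count' successive nodes in the same cell, recording each ID while
// the node is alive (the role the watch breakpoint plays), then overwrite the
// cell with 0x55 bytes to model the assumed fill-on-deletion behaviour.
std::vector<std::uint32_t> occupantsOfCell(std::uint32_t first_id, int count) {
    std::vector<std::uint32_t> seen;
    for (int i = 0; i < count; ++i) {
        ToyNode* node =
            new (s_cell) ToyNode{first_id + static_cast<std::uint32_t>(i)};
        seen.push_back(node->m_id);
        node->~ToyNode();
        std::memset(s_cell, 0x55, sizeof s_cell);
    }
    return seen;
}
```

After occupantsOfCell(53276, 3), the recorded IDs are 53276, 53277 and 53278, while idInCell() yields 1431655765 (0x55555555): the ID is no longer recoverable from the dead node itself, only from what was recorded earlier.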
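Restart CXXR within the debugger once more, and rerun the problem script. On
arrival at breakpoint A, set s_watch_id to the ID number of the problem node: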
(gdb) p s_watch_id=53278
Then continue running the program. The program will subsequently stop at
breakpoint B at the following events: (a) node #53278 is constructed, (b)
this node is 'made moribund' (i.e. becomes newly eligible for garbage
collection because its reference count has fallen to zero), and (c) the
node is deleted. Note that the node may be made moribund more than once,
because its reference count may increase from zero. (This happens when the
program leaves one protection scope and subsequently enters another: the
reference count of a protected node may fall to zero on leaving the first
scope and rise again on entering the next. As long as no GCNode is allocated
in the interregnum, the node will not be garbage-collected.) What is of
particular interest is the program context in which the node last becomes
moribund, and the program context in which it is deleted: establishing these
is usually sufficient to identify the memory protection gap.
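The sequence of events described above (constructed, moribund, possibly moribund again, deleted) can be sketched with a toy reference-counted node that logs each event. ToyNode and lifecycle() are hypothetical illustrations, not the real GCNode or its watch mechanism, and scope exit stands in for the collector deleting the node.

```cpp
#include <cassert>
#include <string>
#include <vector>

// Hypothetical toy node that logs the kinds of event reported at the watch
// breakpoint; it is not the real GCNode.
class ToyNode {
public:
    explicit ToyNode(std::vector<std::string>& log)
        : m_log(log), m_refcount(0) {
        m_log.push_back("constructed");
    }
    ~ToyNode() { m_log.push_back("deleted"); }
    void incRef() { ++m_refcount; }
    void decRef() {
        // A count of zero makes the node eligible for collection.
        if (--m_refcount == 0) m_log.push_back("moribund");
    }
private:
    std::vector<std::string>& m_log;
    unsigned m_refcount;
};

// Replay the two-scope scenario and return the event log.
std::vector<std::string> lifecycle() {
    std::vector<std::string> log;
    {
        ToyNode node(log);
        node.incRef();  // enter the first protection scope
        node.decRef();  // leave it: count is zero, node is moribund
        node.incRef();  // enter the next scope before any allocation
        node.decRef();  // leave it: moribund again, now unprotected
    }   // scope exit stands in for the collector deleting the node
    return log;
}
```

The log returned by lifecycle() is constructed, moribund, moribund, deleted. In a real session it is the context of the last "moribund" entry and of the "deleted" entry that points to the protection gap.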