This page describes the phases so far completed within the CXXR project
to refactor the R engine into C++. Each phase is placed within the
Subversion tags
directory, with a name of the form 0.00-2.5.0
,
where 0.00
indicates the phase, and 2.5.0
indicates the R release to which that phase is intended to correspond.
0.00-2.5.0
In this phase all .c
files within src/main
are renamed to .cpp
, with the following exceptions:
complex.c
: This file uses the C99 complex types, which
are not (under the current C++ standard) understood by a C++ compiler;
gram.c
: This file is automatically generated by
yacc/bison;
regex.c
: The source of this file is very insistent that
it is C, not C++: it gives a #warning
if you attempt to
compile it with a C++ compiler.
(Subsequently, RNG.c
was also reverted to C, to respect
Knuth's copyright statement.)
The result of this phase does not build correctly; however, it is useful as a baseline for seeing the subsequent changes.
0.01-2.5.0
Make such changes to the result of Phase 0 as to enable the .cpp
files to compile without warning using -Wall
with gcc-4.1.3
,
retaining C linkage conventions for everything defined in .h
files. Ensure that the whole of R will build correctly and pass make
check
.
A desirable side effect of enforcing C linkage was that the linkage
editor picked up several instances where the source file implementing a
function failed to #include
the appropriate header file, and
consequently generated a function with C++ linkage: see below.
This needed to address the following issues:
Rboolean
is different from C++ bool
. Rboolean
is an enumeration with elements FALSE=0
and TRUE=1
;
bool
is a primitive type, with values false
and true
. (Also, there are #define
s of FALSE
to 0 and TRUE to 1 lurking around in the R code, just to confuse
matters.) In particular an Rboolean
is a different size
from a bool
. It was necessary to introduce many explicit
conversions from bool
(resulting in C++ from evaluating
Boolean expressions) or integer types to Rboolean
.
In connection with this, a macro RBOOL(x)
was defined within Rinlinedfuns.h;
it expands to x
in C
and Rboolean(x)
in C++.
class
, new
, private
and this
were used as identifiers; these had to be
renamed, e.g. class
changed to connclass
.
In connections.cpp
, a void*
was implicitly converted to another type of pointer. These conversions
were made explicit, and flagged /*CCAST*/
.
datetime.cpp
and memory.cpp
used statements
of the form i -= d;
where i
is of integer
type and d
is an expression evaluating to a floating point
type. This was converted to the form i = int(i - (d));
to
avoid a compiler warning. This interpretation complies with
sec. 6.5.16.2 (compound assignment) of the C99 standard, ISO/IEC
9899:1999.
NewDevDesc
defined in GraphicsDevice.h
contains a number of pointers to functions as members, and the types of
these functions were specified without giving the number and types of
the function arguments. This was rectified. It was also necessary to
give this structure a tag (_NewDevDesc
) because most of
these functions included a pointer to a NewDevDesc
among
their arguments.
Some material was moved from R_ext/GraphicsEngine.h
,
in particular the definition of R_GE_context
, into a new
header file R_ext/GraphicsContext.h
, to avoid reciprocal
dependencies between GraphicsEngine.h
and GraphicsDevice.h
.
CCODE
, defined in Defn.h
,
was redefined to make the number and type of its arguments explicit, as
follows:
typedef SEXP (*CCODE)(SEXP, SEXP, SEXP, SEXP);
If __MAIN__
is defined, libextern.h
#define
d
extern
to the empty string, which could play havoc with the
extern "C"
used in C++ to enforce C-style linkage. This #define
was commented out, and instead a new macro extern1
was #define
d
within Defn.h
.
Introduced reinterpret_cast
s in
various places in memory.cpp
, scan.cpp
, serialize.cpp
and vfonts.cpp
. (In future it is the intention to get rid
of as many of these as possible, as well as getting rid of all C-style
casts.)
In Defn.h
, the whole declaration extern FUNTAB
R_FunTab[];
was made #ifndef __R_Names__
, not
just the word extern
.
const
pointer. We can
expect much more of this later, but this may have been premature.
sysutils.cpp
(conditionally) contained an extern
declaration of environ
; the compiler considered this to
have C++ linkage, conflicting with the C-linkage definition in unistd.h
(subsequently #include
d into sysutils.cpp
).
This extern
declaration has been itself replaced by a
(conditional) #include
of unistd.h
.
Stray function prototypes were found in eval.cpp
,
format.cpp
, memory.cpp
, platform.cpp
,
printutils.cpp
, and library/methods/src/methods_list_dispatch.c
;
they were commented out, and flagged with the comment "Use header
files!". Needed prototypes that didn't appear in any header file were
generally placed at the end of Defn.h
.
A particularly obscure example of this kind concerns R_CHAR
.
This is declared as a pointer to a function in Rinternals.h
,
and implemented in memory.cpp
. Now memory.cpp
does #include
Rinternals.h
, but it does so
with USE_RINTERNALS
defined, as a result of which the R_CHAR
declaration in the header file isn't seen by the compiler, and so the
implemented function got C++ linkage. I modified the header file by
moving the R_CHAR
declaration outside the #ifndef
USE_RINTERNALS
.
The definitions in print.cpp
of functions intended to be
called from FORTRAN needed to be surrounded by extern "C"{
... }
.
deparse.cpp
:1191 used &
where &&
was surely intended; character.cpp
:738 similarly used |
instead of ||
.
-Wall
complains about attempts to compare signed with
unsigned. This required explicit conversions in numerous places.
Generally (but not always) I did this by converting unsigned to signed.
In other places it was clear that the same effect could be achieved
without deleterious side effect by changing the type of a variable.
In connection with this, the macro AGE_NODE
in memory.cpp
had to be changed to make an__g__
unsigned.
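The Rboolean issue described above can be illustrated with a minimal sketch. The enumeration and the C++ expansion of RBOOL(x) follow the description on this page; the helper is_positive is hypothetical and exists only to show why the explicit conversion is needed. It assumes that no conflicting FALSE or TRUE macros are in scope.

```cpp
#include <cassert>

// Mirrors the enumeration described above: FALSE = 0, TRUE = 1.
// (Assumption: no FALSE/TRUE preprocessor macros are defined here.)
enum Rboolean { FALSE = 0, TRUE = 1 };

// As described above: RBOOL(x) is x in C and Rboolean(x) in C++.
#ifdef __cplusplus
#define RBOOL(x) Rboolean(x)
#else
#define RBOOL(x) (x)
#endif

// Hypothetical helper, for illustration only.
inline Rboolean is_positive(int n)
{
    // 'n > 0' has type bool in C++, and bool does not convert
    // implicitly to an enumeration type, so 'return n > 0;' would
    // be rejected by the compiler. RBOOL makes the conversion explicit.
    return RBOOL(n > 0);
}
```

In C the same source line compiles unchanged, because there the macro expands to its bare argument.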
0.02-2.5.0
In subsequent phases (possibly starting in Phase 3) it is our objective
to replace the SEXPREC union by a hierarchy of C++ classes. This phase
prepares for that by reorganising the material in the header files in src/include
.
This involves creating a new subdirectory src/include/CXXR
,
and within that creating a new header file RObject.h
(ultimately to include a base class RObject
for the new
hierarchy), and further header files RClosure.h
, REnvironment.h
,
RInternalFunction.h
, RPairList.h
, RPromise.h
,
RSymbol.h
and RVector.h
, corresponding
respectively to closxp_struct
, envsxp_struct
,
primsxp_struct
, listsxp_struct
, promsxp_struct
,
symsxp_struct
and vecsxp_struct
, which will
eventually be derived classes. The material in these new headers comes
predominantly from Rinternals.h
, but to some extent (in the
case of RInternalFunction.h
) from Defn.h
. All
of the new header files, with the exception of RInternalFunction.h
,
are also installed in $(rincludedir)/CXXR
.
Function prototypes moved into the new header files are documented using
doxygen. Where it was clearly
consistent with the semantics, some of the argument types of the functions
were changed, either by adding const
, or by converting int
into Rboolean
(however, see the issues below regarding the
latter).
The following are implementational details and issues that arose:
The definition of SEXPREC
(though still the
unchanged C code) was made visible only to C++ programs. This is to get
advance warning of potential problems when the implementation is
changed to C++.
In CR, many accessors were implemented as a macro if USE_RINTERNALS
was defined, and otherwise as a function. It has been the intention in
this phase to replace the macros with C++ inline functions: these would
automatically also generate a non-inlined form, so the separate
definition (usually in memory.cpp
) could be dispensed
with.
This was all very well where the function form was implemented in CR simply by invoking the macro; however in some cases the function form carried out some error checking before invoking the macro. Trying to convert the macro to an inline function would then result in two distinct functions with the same name, which the compiler and/or linker would certainly reject.
In the end it was decided to leave the macros in place for the time being: they'll have to be changed when the C++ implementation rolls out anyway.
I considered abolishing the USE_RINTERNALS
compilation conditions, but decided to retain them to mark out material
(usually currently in the form of macro definitions) that will in the
future need privileged access to a C++ class. Only memory.cpp
now #define
s USE_RINTERNALS
.
Rinternals.h
contained many #define
s of
function names to the same name prefixed by Rf_
: this
appears to correspond in C++ terms to putting these functions in a
namespace. I split these #define
s out into a separate
header file Rf_namespace.h
, which is #include
d
by RObject.h
(which is in turn included by the other new
headers). There are various similar #define
s scattered
around other CR header files, which may need to be moved into Rf_namespace.h
in due course.
It was unclear whether to name the new header RInternalFunction.h
or RPrimitiveFunction.h
. Usage in the CR code (e.g. primsxp_struct
)
suggests the latter, and the R Internals document speaks of internal and
primitive functions as being mutually exclusive, but fails to give a
more general name covering any function handled via R_FunTab
.
But it seems to be reasonable to regard primitive functions as a special
case of an internal function, hence the eventual choice of RInternalFunction.h
.
Rdynload.cpp
and dotcode.cpp
each give compiler warnings under -pedantic
because they
attempt to cast function pointers to void*
. The source
code of the former already contains a comment saying that it's illegal
even in C. This is not easy to fix, so it has been left for now.
It is arguable that logical vectors (LGLSXP
)
should contain items of type Rboolean
rather than of type
int
, and consequently that the macro/function LOGICAL(SEXP)
should return Rboolean*
rather than int*
. I
made some attempt to do this, but backed out of it for the following
reasons:
The .C
interface expects these vectors to contain int
s;
the size of an enumeration is implementation-dependent (gcc
happens
to use int
for Rboolean
);
in the absence of a MAYBE
value
in the enumeration, perhaps Rboolean
is best thought
of as 'bool
for C', rather than having any capability
to handle NAs.
Possible new policy: within functions visible from C, use Rboolean
as a substitute for C++ bool
, possibly constrained to be
32 bits long to avoid the enum
implementation
dependencies noted above. However, R logical vectors will continue to
be represented using int
s. (One day we might define an Rlogical
class - a wrapper round an int
- to handle logical
vectors within C++, while C programs simply see typedef int
Rlogical;
.)
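The Rlogical idea floated above might look something like the following. This is purely a sketch of a class that does not yet exist: the member names are invented, and the use of INT_MIN as the NA code is an assumption made for the example.

```cpp
#include <cassert>
#include <climits>

// Hypothetical wrapper round an int, so that an array of Rlogical has
// the same layout as the array of int seen by C code.
class Rlogical {
public:
    Rlogical(bool b) : m_value(b ? 1 : 0) {}

    // Assumption for this sketch: NA is encoded as INT_MIN.
    static Rlogical na()
    {
        Rlogical ans(false);
        ans.m_value = INT_MIN;
        return ans;
    }

    bool isNA() const { return m_value == INT_MIN; }
    bool isTrue() const { return !isNA() && m_value != 0; }
    int asInt() const { return m_value; }

private:
    int m_value;
};

// The wrapper must add no storage beyond the int it wraps.
static_assert(sizeof(Rlogical) == sizeof(int),
              "Rlogical must be layout-compatible with int");
```

Unlike Rboolean, a type of this kind has room for a third state, which is what a logical vector needs.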
0.03-2.5.0
The primary objective of this phase was to redefine R_NilValue
as a null (i.e. zero) pointer of type SEXP
. R_NilValue
is widely used within CR as a stub, i.e. to signify that something that
might be present is absent, in much the same way that a null pointer is
used within C or C++. However, in CR it is actually implemented in effect
as an element of a pairlist (i.e. struct listsxp
), whose
CAR, CDR, TAG and attributes all point to itself. This would cause
difficulties in CXXR when we reimplement the SEXPREC
union
as a type hierarchy, because pairlist elements will need to be of a
specific type within the hierarchy. If R_NilValue
were given
this type, it would preclude its use as a general-purpose stub. But zero
is a possible value for a pointer of any type, so if we equate R_NilValue
to zero this will sidestep the problem.
Another disadvantage of the CR definition of R_NilValue
is
that it needlessly introduces a cyclic data structure.
The following are implementational details and issues that arose in carrying out this change:
CR code often invokes CAR
,
CDR
, TAG
and ATTRIB
on a SEXP
that may in fact be R_NilValue
, expecting in this case for
each of these functions to return R_NilValue
. These
functions were reimplemented to preserve this behaviour: i.e. each of
them returns a null pointer if passed a null pointer. At the same time
the macro forms were abolished: they are now implemented as inline
functions for C++, and ordinary functions if called from C.
OBJECT
and IS_S4_OBJECT
have been reimplemented to return FALSE
if passed a zero
pointer. They too are now implemented as inline functions for C++, and
ordinary functions if called from C.
NAMED
: the policy here
is that the calling code should be modified as necessary to prevent it
being invoked for a null pointer. Deal similarly with invocations of SET_NAMED
,
PRINTNAME
, NODE_IS_MARKED
, SET_ATTRIB
,
SET_OBJECT
, and LENGTH
. (This last case is
interesting because LENGTH
is meant to be applied to
vector objects, i.e. components of the SEXPREC
union
different from struct listsxp
.) The calling sites
concerned were determined by running make check
at
top-level: doubtless many have slipped through the net!
Many macros in memory.cpp
were replaced by inline functions.
A secondary objective of this phase was to get rid of C-style casts within the C++ code, wherever the appropriate remedy was reasonably obvious and straightforward. The following kinds of C-style casts were left in place pending further work:
DL_FUNC
);
DevDesc
and GEDevDesc
);
(void*)(-1)
;
R_varloc_t
;
Addendum 2007/08/06: although make check
works with this
release, make check-devel
doesn't.
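The null-tolerant accessors described in this phase can be sketched as follows. The Node structure is a simplified stand-in rather than the CXXR definition, and list_length is a hypothetical helper added to show how list traversal terminates on a null pointer.

```cpp
#include <cassert>

// Simplified stand-in for a pairlist element; not the real SEXPREC.
struct Node {
    Node* car;
    Node* cdr;
    Node* tag;
};
typedef Node* SEXP;

// In this sketch R_NilValue is simply the null pointer.
const SEXP R_NilValue = 0;

// Each accessor returns a null pointer if passed a null pointer, which
// preserves the CR behaviour that CAR(R_NilValue) is R_NilValue.
inline SEXP CAR(SEXP x) { return x ? x->car : R_NilValue; }
inline SEXP CDR(SEXP x) { return x ? x->cdr : R_NilValue; }
inline SEXP TAG(SEXP x) { return x ? x->tag : R_NilValue; }

// Hypothetical helper: counts elements until the null terminator.
inline int list_length(SEXP x)
{
    int n = 0;
    while (x != R_NilValue) {
        ++n;
        x = CDR(x);
    }
    return n;
}
```

Because the terminator is no longer a node, no self-referential element is needed to end a list.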
0.04-2.5.1
The primary objective of this phase was to update the program to parallel
release 2.5.1 of R. This proved to be straightforward, except that it was
necessary to install a later version of svn_load_dirs.pl
to
cope with filenames containing @
signs. (However, I was
surprised to discover that svn merge
doesn't track renames.)
Other changes were as follows:
The failures in make check-devel
were fixed. In general
this was done by modifying certain functions to behave reasonably if
passed a null pointer, namely LENGTH
(returns 0), NAMED
(returns 0) and SET_NAMED
(does nothing). These changes
obviated some of the changes made leading up to svn revision 49 (see
Phase 3 above), and these changes were accordingly reversed. make
check-all
also now works, but it was time-consuming to run and
revealed no bugs.
I got autoconf
working properly, and
accordingly backed out of some configuration kludges I had made
previously.
0.05-2.5.1
The aim of this phase was to create a branch entitled const
,
to explore to what extent the R code is amenable to 'constifying': i.e.
converting pointers and C++ references wherever possible to const
pointers. Two preliminary steps, carried out in the trunk, were as
follows:
Similar changes were made to the header files under src/include
:
however, the pattern here was to convert a macro to an inline function
if the header file was #include
d into a C++ file, and
to an out-of-line call to the same function if the header file was #include
d
into a C file.
This macro conversion was contraindicated in the following circumstances:
##
#define INC(x) ++(x)
(Using C++ reference arguments to get round this is not as straightforward as it might seem.)
SEXPREC
was defined along the following lines:
typedef struct SEXPREC { ... } SEXPREC;
with the first occurrence of SEXPREC
being what in C
would have been a structure tag. This has now been changed to:
typedef struct RObject { ... } SEXPREC;
exploiting the fact that in C++ RObject
is a
fully-fledged class name. The header files in src/include/CXXR
now generally refer to RObject
rather than SEXPREC
.
Having established the const
branch, constification was set
in train by the brute force measure of redefining SEXP
to
mean const RObject*
rather than simply RObject*
;
a new typedef
mapped vSEXP
onto plain RObject*
.
In the same spirit 'v
' variants of many of the accessor
functions were introduced: for example now CAR
takes a SEXP
argument and returns a SEXP
, while vCAR
takes
and returns a vSEXP
. (Since these accessor functions are
required to be callable from C, we can't simply overload CAR
.)
I then attempted to recompile various files, inserting 'v
's
wherever the compiler demanded it. It quickly became apparent that these 'v
's
were highly contagious: for example, both NA_STRING
and R_EmptyEnv
had to be declared as vSEXP
s rather than SEXP
s.
This led me to the conclusion that it was premature to attempt
constification until I understand the evaluation process better.
At the time of tagging this release, the following files compile without
warnings in the const
branch: memory.cpp
, envir.cpp
and names.cpp
. eval.cpp
gives one compilation
error, when do_function
attempts a non-const operation on
its op
argument: fixing this would mean changing the
signature of all the do_
functions.
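The 'v' scheme used in the const branch can be sketched as below. The RObject layout and the payload field are invented for the example, and bump_first is a hypothetical function showing how the need for a 'v' variant spreads to callers. The real accessors must be callable from C, which is why they carry distinct names; linkage is omitted here for brevity.

```cpp
#include <cassert>

// Simplified stand-in; 'payload' is a hypothetical field.
struct RObject {
    RObject* car;
    int payload;
};

typedef const RObject* SEXP;  // read-only view, as in the const branch
typedef RObject* vSEXP;       // 'v' variant: permits modification

// Two names rather than one overloaded name.
inline SEXP CAR(SEXP x) { return x ? x->car : 0; }
inline vSEXP vCAR(vSEXP x) { return x ? x->car : 0; }

// Hypothetical caller. It modifies the object reached through CAR, so
// it must take a vSEXP and use vCAR: writing
//     CAR(list)->payload += 1;
// would be rejected, since CAR yields a pointer to const.
inline void bump_first(vSEXP list)
{
    vCAR(list)->payload += 1;
}
```

Any function that calls bump_first must in turn hold a vSEXP, which is the contagion noted above.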
0.06-2.5.1
In CR, each SEXPREC
has a node class in the range 0 to 7.
Nodes of non-vector SEXPTYPE
(i.e. not of types CHARSXP
,
LGLSXP
, INTSXP
, REALSXP
, CPLXSXP
,
STRSXP
, VECSXP
, EXPRSXP
, WEAKREFSXP
or RAWSXP
) are all in class 0, and are 28 bytes long. Class
7 is used for vector nodes whose vector data amount to more than
128 bytes; the remaining classes are used for smaller vectors, classified
according to their size. Nodes of class 7 are allocated directly using malloc
;
nodes of the remaining classes are allocated from 'pages' about 2 kB in
size, with each node class having its own pages. In CXXR it is intended to
replace SEXPREC
s with an extensible class hierarchy (rooted
at RObject
), so it will not be feasible to put a tight upper
bound on the size of non-vector nodes.
Another feature of CR is that in vector nodes, a single block of memory
contains the data of the vector preceded by a SEXPREC
and
information about the length of the header. This is quite incompatible
with the design philosophy of C++, which is that the size of an object
must be deducible from its (C++) type: in particular ::operator
delete
relies on this.
The purpose of Phase 6 was to circumvent these problems, and at the same time to endeavour to decouple the code for allocating memory from the code managing garbage collection. This comprised the following changes:
A new class CXXR::Heap
was created to handle allocation
and deallocation of blocks of memory. This parallels CR to the extent
that requests for large blocks are passed on directly to ::operator
new
, while requests for small blocks are satisfied by
allocating fixed-sized cells carved out of 'superblocks'. However, this
is an implementational detail and is not visible to the remainder of
CXXR: only the total number of bytes and the total number of blocks
allocated via CXXR::Heap
are visible (using
static member functions).
It is intended that CXXR::Heap
will serve as a back-end
to implementations of operator new and to an STL-compatible Allocator
class. Note in particular that the blocks allocated from CXXR::Heap
are not exclusively used to create RObject
s, but may be
used for any purpose where rapid allocation/deallocation of small
blocks is required.
CXXR::Heap
. (CR
deallocates only large vector nodes.)
CXXR::Heap
; a data member m_data
of RObject
(in due course to be factored out into a derived class) points to this
block. For non-vector objects, and vectors of size zero, m_data
is a null pointer. (CR appears to allocate at least 8 bytes of vector
data even when the nominal size of the vector is zero.)
CXXR::Heap
, divided by 8.
I was strongly tempted to base GC exclusively on (b), and to ignore the number of nodes - after all, we're talking about a single resource here: memory. I'd welcome opinions about this.
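The general shape of such an allocator might be sketched as follows. This is not the CXXR::Heap code: the class name, the single cell size, the superblock size and the function names are all assumptions chosen to keep the example short, and superblocks are never returned to the system.

```cpp
#include <cassert>
#include <cstddef>
#include <new>
#include <vector>

// Hypothetical stand-in for the allocator described above.
class HeapSketch {
public:
    static void* allocate(std::size_t bytes)
    {
        void* p;
        if (bytes > s_cell_size) {
            // Large request: pass straight on to ::operator new.
            p = ::operator new(bytes);
        } else {
            // Small request: hand out a fixed-size cell.
            if (s_free.empty())
                addSuperblock();
            p = s_free.back();
            s_free.pop_back();
        }
        ++s_blocks;
        s_bytes += bytes;
        return p;
    }

    // The caller supplies the size originally requested.
    static void deallocate(void* p, std::size_t bytes)
    {
        if (!p)
            return;
        if (bytes > s_cell_size)
            ::operator delete(p);
        else
            s_free.push_back(p);
        --s_blocks;
        s_bytes -= bytes;
    }

    // Only these totals are visible to the rest of the program.
    static std::size_t blocksAllocated() { return s_blocks; }
    static std::size_t bytesAllocated() { return s_bytes; }

private:
    static constexpr std::size_t s_cell_size = 64;
    static constexpr std::size_t s_cells_per_superblock = 32;
    static std::vector<void*> s_free;
    static std::size_t s_blocks;
    static std::size_t s_bytes;

    // Carve a superblock into cells and add them to the free list.
    static void addSuperblock()
    {
        char* sb = static_cast<char*>(
            ::operator new(s_cell_size * s_cells_per_superblock));
        for (std::size_t i = 0; i < s_cells_per_superblock; ++i)
            s_free.push_back(sb + i * s_cell_size);
    }
};

std::vector<void*> HeapSketch::s_free;
std::size_t HeapSketch::s_blocks = 0;
std::size_t HeapSketch::s_bytes = 0;
```

A garbage collector layered on top of this would consult only bytesAllocated() and blocksAllocated(), which is the decoupling this phase was aiming for.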
0.07-2.5.1
The purpose of this phase was to encapsulate all the garbage-collection
logic within C++ classes. Five such classes were introduced, namely GCManager
,
GCNode
, GCEdge
, GCRoot
and WeakRef
,
as now described.
GCManager
, as the name implies, carries out
high-level management of garbage collection. It has no non-static data
or methods. When CXXR::Heap
indicates (via a
callback) that it is on the point of requesting additional memory from
the operating system, method GCManager::gc()
decides
whether to carry out a garbage collection, and if so how many
generations to collect. As contemplated at tag 0.06-2.5.1, this decision
is now based only on the total memory allocated via CXXR::Heap
,
and not on the number of nodes allocated. If GCManager
decides to carry out a garbage collection, this is carried out by
calling GCNode::gc()
, specifying the number of generations
to be collected.
GCNode
is intended to be the base class for all
objects subject to garbage collection; RObject
is now
derived from GCNode
. All GCNode
s are
threaded on circular doubly-linked lists according to their generation,
managed via the static private vector s_genpeg
.
Element 0 of this vector represents the 'new' generation of nodes that
have not yet been exposed to the garbage collector; nodes that survive
garbage collection are moved into successively higher generations.
GCEdge<T>
, where T
(defaulting to RObject*
) is a pointer to a class type
derived from GCNode
, represents a directed edge within the
directed graph whose nodes are the GCNode
s. Whenever an
object of a type derived from GCNode
wishes to refer to
another such object, it should do so by incorporating a GCEdge
encapsulating an appropriate pointer, rather than by incorporating the
pointer directly. The class provides for GCEdge<T>
to be implicitly converted to T
in contexts which require
this.
GCEdge
contains the logic for ensuring that a node in a
higher generation never includes a reference to an object in a younger
generation. If any attempt is made to direct a GCEdge
from an older node to a younger node, that younger node is immediately
promoted to the generation of the older node, and this change is
propagated through the outgoing GCEdge
s of the younger
node, and so on recursively. (In other words, it implements the EXPEL_OLD_TO_NEW
logic that can be configured into CR (but is not the default for CR).)
GCRoot<T>
, where T
(defaulting to RObject*
) is a pointer to a class type
derived from GCNode
, is intended to protect GCNode
s
from the garbage-collector. A GCNode
pointed to by a GCRoot
will not be garbage collected for as long as the GCRoot
object exists. The constructor and destructor of this class therefore
perform similar functions to the PROTECT
/UNPROTECT
macros of CR, but within a C++ idiom, in which the programmer is spared
the need to check that PROTECT
s are balanced by UNPROTECT
s.
(However, PROTECT
and UNPROTECT
continue,
and will continue, to be available within CXXR.) The class provides for
GCRoot<T>
to be implicitly converted to T
in contexts which require this.
The implementation of GCRoot
uses an internal stack,
and consequently requires (and checks) that GCRoot
s are
destroyed in the reverse order of their creation. This should cause no
problem as long as only variables with automatic or static storage
duration are declared as GCRoot
s.
Despite successful experiments, the deployment of this class has been
deferred, pending the replacement of setjmp
/longjmp
within CXXR by C++ exceptions. This is because destructors of C++
automatic variables are not called when the stack is unwound by longjmp
(see ISO14882:2003 sec. 18.7); they are when the stack is unwound by a
C++ exception.
WeakRef
implements weak references (SEXPTYPE
WEAKREFSXP
) in a way intended to be functionally identical to
CR. Each weak reference has a key and, optionally, a value and/or a
finalizer. The finalizer may either be a C/C++ function or an R object.
The garbage collector will consider the value and finalizer to be reachable provided the key is reachable. If, during a garbage collection, the key is found not to be reachable then the finalizer (if any) will be run, and the weak reference object will be 'tombstoned', so that subsequent calls to key() and value() will return null pointers. A weak reference object with a reachable key will not be garbage collected even if the weak reference object is not itself reachable.
Note that, in CXXR, weak references are not implemented as
four-element vectors, and the class has separate, appropriately typed
fields for R and C/C++ finalizers (though at most one of these fields
may be used in any particular WeakRef
object).
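The constructor/destructor protection idiom described above for GCRoot can be sketched as follows. This is a simplified illustration, not the CXXR class: RootSketch, GCNodeSketch and protectedCount are invented names, and the 'stack' is just a vector of pointers with no collector attached.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Stand-in for a garbage-collected node.
struct GCNodeSketch {
    int id;
};

// Hypothetical root class: the constructor protects a node by pushing
// it onto an internal stack, and the destructor unprotects it.
template <class T = GCNodeSketch*>
class RootSketch {
public:
    explicit RootSketch(T node) : m_index(stack().size())
    {
        stack().push_back(node);
    }

    ~RootSketch()
    {
        // Roots must be destroyed in the reverse order of creation.
        assert(m_index + 1 == stack().size());
        stack().pop_back();
    }

    RootSketch(const RootSketch&) = delete;
    RootSketch& operator=(const RootSketch&) = delete;

    // Implicit conversion to the encapsulated pointer.
    operator T() const { return stack()[m_index]; }

    static std::size_t protectedCount() { return stack().size(); }

private:
    std::size_t m_index;

    static std::vector<T>& stack()
    {
        static std::vector<T> s;
        return s;
    }
};
```

With automatic variables the reverse-order requirement is met by the language itself, so there is nothing for the programmer to balance by hand.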
0.08-2.5.1
All uses of setjmp
and longjmp
(and sigsetjmp
and siglongjmp
) within directory main
have
been removed, and replaced by using JMPException
, a C++
exception class designed as far as possible to be a drop-in replacement
for setjmp
/longjmp
. This is to ensure that
the destructors of C++ objects are invoked as the stack is unwound
following an exceptional condition.
Use of JMPException
should be regarded as an interim
measure. Normal C++ coding practice is for throw
simply
to report the exceptional condition that has arisen, rather than - as
with JMPException
- in effect requesting a specific
subsequent flow of control.
The preferred way of protecting GCNode
s from
the garbage collector is now to use the templated class GCRoot
.
GCRoot
's constructor will protect the GCNode
in question, and its destructor will unprotect it; there is therefore no
need for the programmer to remember to balance out the use of PROTECT
and UNPROTECT
as in CR.
The facilities of CR's pointer protection stack (using e.g. PROTECT
and UNPROTECT
) remain available, but the underlying
implementation has been rewritten in C++ as part of the GCRootBase
class. CXXR makes the additional requirement that when UNPROTECT
or REPROTECT
are applied to a pointer, this is carried
out in the same context (RCNTXT
) as that in which the
pointer was PROTECT
ed. This is to help pick up
mispairing between PROTECT
and UNPROTECT
.
Several CR header files, notably Rinternals.h
and Defn.h
,
contain macro definitions of the form
#define func Rf_func
These serve to avoid name clashes (at least at the linker level) with
third-party packages; a similar purpose would be achieved in C++ by
placing the function func
in a namespace Rf
.
(In Phase 2 these macros were generally shifted into a separate
header file Rf_namespace.h
, but this change has now been
reversed.) Using the preprocessor to modify program tokens in this way
is something that many C++ programmers will shun, especially since some
of the tokens concerned (e.g. length
) are likely to be
widely used. However abolishing these macros altogether would break
much existing code. Nevertheless, reliance on them is now deprecated
within CXXR, and in particular all header files within src/include
have been modified as necessary to include the Rf_
prefix explicitly where it is needed.
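The reason for replacing longjmp by an exception, as described at the start of this phase, can be seen in a small sketch. JumpSketch is a hypothetical stand-in and makes no claim about the interface of JMPException; Guard plays the part of any object whose destructor must run, such as a GCRoot.

```cpp
#include <cassert>

// Hypothetical exception carrying a notional jump target.
struct JumpSketch {
    int target;
};

// Counts destructor calls, standing in for 'unprotect' operations.
int g_unprotected = 0;

struct Guard {
    ~Guard() { ++g_unprotected; }
};

void inner()
{
    Guard g;               // destroyed as the exception leaves inner()
    throw JumpSketch{7};
}

int run()
{
    try {
        Guard g;           // destroyed as the exception leaves the try block
        inner();
    } catch (const JumpSketch& e) {
        return e.target;   // control resumes here, as after a longjmp
    }
    return 0;
}
```

Had inner() used longjmp to reach the same point, neither Guard destructor would have been called.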
0.09-2.6.1
The primary objective of this phase was to update the program to parallel release 2.6.1 of R.
Other changes were as follows:
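The earlier policy was to move function prototypes out of the CR header files into header files under include/CXXR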
,
and at the same time to add doxygen documentation. This has now been
modified into a policy of copying the prototypes into the
relevant CXXR header file, and adding documentation there, but leaving
the prototype also in the CR header file. This will make it easier to
track changes in function signatures when we upgrade to future releases
of R. To this end a script allincludes.pl
has been produced. This generates an (otherwise trivial) C++ source file
that #include
s all the header files under src/main
and src/include
; compiling this file checks that the
prototypes in the CXXR header files are consistent with those in the CR
headers.
In the light of this change, the policy regarding the Rf_
prefix described under Phase 8 has been modified. Whilst all header
files in the CXXR
directory should use the Rf_
prefix explicitly, header files derived from CR (e.g. Rinternals.h
and Defn.h
) should normally omit the prefix if the
corresponding CR file does so.
CXXR
directory.
SIG_DFL
are defined as macros in terms of C-style casts, so main.cpp
still gives warnings if compiled using gcc
with -Wold-style-cast
.)
0.10-2.6.1
The primary objective of this phase was to reimplement all vector data
types as C++ classes derived (directly or indirectly) from RObject
,
rather than using vecsxp_struct
within the RObject::u
union. vecsxp_struct
has not yet been eliminated entirely,
however, because of some straggling uses of truelength
.
Other changes were as follows:
Memory blocks allocated by R_alloc
and kindred
functions are no longer implemented as objects inheriting from RObject
.
Instead these blocks are managed separately via a new class RAllocStack
.
When the stack size is reduced using vmaxset
, the memory
blocks are released immediately, rather than being left to the garbage
collector.
A GCNode
is immune from garbage
collection while it is being constructed, leading to considerable
simplification.
GCEdge
was abolished: it was felt
that the advantage of encapsulating the write barrier within a single
class was outweighed by various knock-on obscurities.
HASHASH
, SET_HASHASH
and SET_HASHVALUE
abolished: the new class CXXR::String
will compute and
cache hash values automatically on demand.
0.11-2.6.2
The primary objective of this phase was to update the program to parallel
release 2.6.2 of R. Errors and warnings given by make check-devel
were also corrected.
0.12-2.6.2
The primary objective of this phase was to eliminate the RObject::u
union completely, replacing its remaining elements with classes derived
from RObject
. This entailed the creation of the following
classes: BuiltInFunction
, ByteCode
, Closure
,
DottedArgs
, Environment
, Expression
,
ExternalPointer
, PairList
, Promise
,
SpecialSymbol
and Symbol
. Several loose ends
remain to be tied up, however; in particular, the remaining data members
of RObject
ought all to be private.
Other changes were as follows:
CXXR::Heap
has been renamed CXXR::MemoryBank
to avoid confusion with standard data structures called heaps.
GCNode
and GCRootBase
are now
initialized using a Schwarz counter, thus enabling certain standard
objects (e.g. the 'not available' string, and the global environment) to
be declared as static class members: it is no longer necessary to wait
until InitMemory()
has been called before creating them.
This in turn simplifies the implementation of the garbage collection
algorithm, which no longer has to treat these objects specially.
Concomitant with this change, the R interpreter now terminates by
throwing an exception of class ExitException
, which
ensures that all GCRoot
objects are destroyed in the
reverse order of their creation.
String
objects now belong to one of two subclasses, CachedString
and UncachedString
, with the former being the preferred
implementation. At any time, at most one CachedString
with
given text and encoding will exist; to enforce this, the class
constructor is private, and instead clients use the static method obtain()
(accessible from C via the function mkChar()
) to get a
pointer to a CachedString
object with specified text and
encoding. The implementation of the cache is different from that used in
CR, and is based on the C++ standard library; it has the advantage that
cached strings do not need any special handling by the garbage
collector. There are no facilities for modifying the text or encoding of
a CachedString
once it has been created; in particular the
function CHAR_RW()
can be used only on UncachedString
objects.
0.13-2.6.2
This phase was an attempt - less successful than was hoped! - to close the gap in speed between CR and CXXR. Principal changes were:
CellHeap
. CellHeap
differs from CellPool
(used previously for this purpose)
in that whenever a memory block is requested from a CellHeap
,
the allocated block will always be the one with the lowest address among
the available blocks. This is achieved using a skew heap data structure,
and is intended to increase the spatial localisation of successively
allocated blocks. Where the underlying OS provides posix_memalign()
,
the superblocks from which memory blocks are allocated are aligned with
memory pages.
MemoryBank
now uses CellHeap
s with more
closely spaced block sizes than were used previously, to avoid wasting
space in cache lines.
When a GCNode
object has its generation changed as a
result of write barrier enforcement or by being exposed to the garbage
collector, it is no longer immediately shifted to the list appropriate
to its new generation. Instead this is deferred until the sweep phase of
a garbage collection visits the node. This avoids pulling nodes into the
processor cache unnecessarily, and paves the way for the following
change.
The lists with which GCNode
manages garbage
collection are now singly-linked rather than doubly-linked. This and
other changes mean that the size of a PairList
node (cons
cell) has been reduced (on 32-bit architecture) from 40 bytes to
32 bytes. DumbVector
nodes have been reduced in size by
12 bytes.
CellHeap
works particularly efficiently if memory
blocks are released in decreasing address order.
The way in which GCNode
objects are
exposed to the garbage collector has been simplified and streamlined to
avoid pulling nodes into the cache unnecessarily. First, GCNode::expose()
exposes only the node for which it is invoked; it does not look for
unexposed descendants of this node. Secondly, protecting a node from the
garbage collector (e.g. using GCRoot<T>
or PROTECT()
)
no longer automatically exposes the node. (However, write barrier
enforcement will continue to expose nodes if an exposed node is modified
to refer to an unexposed node, and this exposure will propagate to
descendants: this falls out automatically from the write barrier
enforcement algorithm.)
GCNode::operator new
no longer zeroes the memory it
allocates.
0.14-2.7.1
The objective of this phase was to update CXXR to parallel release 2.7.1 of R. However, other changes are:
Removal of dynamic_cast
from the
'glue layer' between code inherited from CR and new CXXR code. (dynamic_cast
can be surprisingly slow.)
SET_TYPEOF()
has been abolished.
R_NilValue
is now defined as a macro expanding to NULL
(which will in turn typically expand to (void*)0
in C and
simply to 0
in C++). Previously it was defined as
SEXP R_NilValue = 0;
which entailed unnecessary memory fetches.
0.15-2.7.1
The objective of this phase was to tidy up the class hierarchy rooted at
RObject
, and in particular to give RObject
itself a more distinctive class identity, i.e. for it to be less of a
ragbag for things that hadn't yet been accommodated elsewhere. Principal
changes were:
RObject
now controls attributes more closely. The
attributes (if present) must now be a PairList
, each of
whose elements must have a distinct symbol as its tag. No attribute may
have a null value. The m_has_class
field is automatically
set according to whether or not there is a class attribute; consequently
SET_OBJECT()
has been abolished. However, the class
interface does not yet enforce all necessary consistency conditions on
attributes; these are still applied by the code in attrib.cpp
.
The m_debug
field of RObject
has been
abolished. Instead the Closure
and Environment
classes each contain a field controlling debugging.
The m_trace
field of RObject
has been
moved to a new class FunctionBase
, from which the Closure
and BuiltinFunction
classes are now derived.
The m_flags
field of RObject
, which
replaced the gp
('general purpose') field within sxpinfo_struct
,
has been abolished. It has been replaced by various special-purpose
fields, placed as far down the class hierarchy as is practical at
present. A virtual function packGPBits()
is used to
reconstitute the old gp ('levels') word for the sole purpose of
serialization; virtual function unpackGPBits()
is
correspondingly used during deserialization. (However, not all of the
fields that have replaced m_flags
need to be
serialized/deserialized.)
HandlerEntry
, defined locally within errors.cpp
,
is used to handle error handler entries, rather than using a ListVector
for this purpose. This avoids the former use of the m_flags
field here.
const
pointers to objects that really ought to be immutable, R_UnboundValue
for example. To counter this, RObject
now has a Boolean
field m_frozen
: non-const
member functions in
the RObject
hierarchy can now apply a run-time check that
their object has not been frozen. In particular, attempting to change
the attributes of a frozen object gives rise to an error.
String
is now an abstract class. CachedString
objects are now frozen by the constructor. R_NaString
is
also frozen.
SpecialSymbol
has now been merged into Symbol
.
Entities such as R_UnboundValue
, which were formerly
implemented as SpecialSymbol
objects, are now implemented
as frozen Symbol
s.
0.16-2.7.2
The objective of this phase was to update CXXR to parallel release 2.7.2 of R.
uncxxr.pl
. Where a source file inherited from
CR - foo.c
, say - has been adapted for CXXR (and changed
into a C++ file foo.cpp
in the process), this script
endeavours as far as possible to reverse systematic changes (e.g. the
conversion of C-style casts into C++ casts) to generate a quasi-C file foo.bakc
.
(We say 'quasi-C' file because the resulting file may not be
syntactically correct C: it is intended for human eyes only.) Updating
to a new release of R is facilitated by using a 3-way visual diff
between the release of foo.c
currently shadowed by CXXR,
the new release of foo.c
, and foo.bakc
. This
helps to highlight where the significant changes are in the new release
of foo.c
, and where they might conflict with changes made
in CXXR. (A similar 3-way comparison using foo.cpp
instead
of foo.bakc
throws up too much 'noise'.)
uncxxr.pl
.
However, this has so far only been done for C++ source files that needed
to be changed in any case as part of the upgrade to 2.7.2.
0.17-2.7.2
The primary purpose of this phase was to reimplement the functionality of
duplicate1()
in duplicate.cpp
using class copy
constructors and a virtual function RObject::clone()
,
reimplemented as necessary in derived classes. The following changes were
associated with this:
GCNode::expose()
is once again recursive in effect, thus
reversing a change made in Phase 13. Cloning a node often requires
cloning an entire subgraph of the node graph, via recursive
calls of clone()
to copy subobjects. The approach taken is
that while the copy subgraph is under construction, none of its
constituent nodes is exposed to the garbage collector: in particular clone()
itself does not expose the objects it creates to the collector. Only
when the copy subgraph is complete is the whole subgraph exposed, and to
do this the code that called the 'topmost' clone()
must
then apply the newly-recursive expose()
function to the
pointer that clone()
returned. (Trying to expose nodes
individually as the construction proceeded meant that they were at risk
of being snatched away by the garbage collector before the subgraph was
complete: it is difficult to work around this in a way that sits easily
with C++ programming idioms.)
GCNode::devolveAge()
, used in enforcing the write
barrier, has been renamed propagateAge()
, and this
function remains recursive in effect. However, at the time of call, propagateAge(const
GCNode* node)
changes the generation number only of node
(if necessary); the recursive propagation of this change is deferred
until the start of the next garbage collection. (Unfortunately the same
technique cannot be applied to expose()
for a reason
explained in its documentation.)
Not all types derived from RObject
are clonable, and
for unclonable types, clone()
returns a null pointer. When
a copy constructor copies a pattern object containing a subobject of an
unclonable type, the object constructed will at the appropriate point
simply contain a pointer to the subobject of the pattern object, rather
than to a clone of that subobject. This copying logic is encapsulated in
a templated 'smart pointer' type RObject::Handle<T>
,
and for example the 'car' pointer of a PairList
object is
now a Handle<RObject>
. Similarly, the former
templated class EdgeVector<T>
has been replaced by HandleVector<T>
which - as the name suggests - is implemented using a std::vector<CXXR::RObject::Handle<T> >
.
0.18-2.8.1
The objective of this phase was to update CXXR to parallel release 2.8.1 of R.
The uncxxr.pl
script (see Phase 16) has been somewhat
further developed, and a larger number of C++ files derived directly
from CR have been tweaked so that uncxxr.pl
can
back-convert them more accurately to their CR form.
reinterpret_cast
has been replaced by static_cast
wherever this is possible
without artifice. This has been facilitated by the introduction of a
function CXXR_alloc
, which does the same job as R_alloc
,
but - like malloc
but unlike R_alloc
-
returns void*
rather than char*
. (uncxxr.pl
converts CXXR_alloc
back to R_alloc
.)
0.19-2.8.1
The primary purpose of this phase was to refactor environments, to pave the way for introducing provenance-tracking features into R. The following changes were associated with this:
The Symbol
class now enforces the requirement that
(except for certain special Symbol
s), there is at most one
Symbol
with a given name. (CR enforces a similar
requirement, but less comprehensively, using the install()
function.) To facilitate this, it is now a requirement that a Symbol
's
name be a CachedString
object, rather than any String
object.
In CR, SYMSXP
objects contained a
pointer to an arbitrary object, which was considered to be the Symbol
's
value within R's base environment and base namespace. Objects of the C++
Symbol
class no longer contain such a pointer, and the base
environment and base namespace are implemented in exactly the same way
as other Environment
objects.
In CR, SYMSXP
objects
contained a pointer to an R object of a function type, which was used
when the Symbol
was used as the name of a function invoked
via R's .Internal()
interface. Objects of the C++ Symbol
class no longer contain such a pointer; instead the relevant mapping is
defined by the C++ class DotInternalTable
.
Environment
s on the search path
has been abolished, at least for the time being.
A new class Frame
has been introduced, inheriting
from GCNode
but not from RObject
. A Frame
defines a mapping from Symbol
objects to arbitrary RObject
s.
Each Environment
object now contains a pointer to a Frame
object, which defines its 'local frame'. The base environment and the
base namespace have the same Frame
.
Frame
itself is an abstract class, allowing different
implementations along the lines provided by the RObjectTables
package to be achieved simply by class inheritance. In most cases,
however, the concrete class StdFrame
is used, in which the
mapping from Symbol
s to RObject
s is provided
by a hash table, implemented using class unordered_map
from the TR1 extensions to the C++ standard library. This
implementational detail is not made visible to R code.
MemoryBank::allocate()
has been changed
to allow the caller to specify that the call shall not result in a
garbage collection. Class CXXR::Allocator
uses this to
ensure that manipulations of standard containers using CXXR::Allocator
do not result in reentrant calls to the standard library code, which
might otherwise happen if the garbage collector attempted to delete
objects handled by the container.
0.20-2.8.1
The purpose of this phase was extensively to reengineer garbage
collection. This was to pave the way to experimentation with
reference-counting approaches to garbage collection; however, release 0.20-2.8.1
itself still uses generational mark-sweep. A major change has been in the
way of implementing 'infant immunity', whereby nodes that are under
construction are not liable to garbage collection; the following is a
summary of the way in which this has evolved. The phrase 'infant nodes'
means nodes that are either under construction, or whose construction is
complete but which have not yet been exposed to garbage collection by
calling GCNode::expose()
.
PairList
copy
constructor, for example, the copied list was created working forwards
along the pattern list, but then the whole structure of the copied list
would need to be traversed again to expose its nodes to garbage
collection. (This was achieved by having GCNode::expose()
automatically recurse to subobjects.)
0.20-2.8.1 was to regard infant nodes as reachable during mark-sweep. So, during a
mark-sweep garbage collection, all the infant nodes and their
descendants would automatically be marked. So the PairList
copy constructor can expose the second and subsequent nodes of the
copied list immediately it has created them, leaving only the head of
the list unexposed, and thus conferring immunity from garbage collection
on the whole structure. There is no longer any need for expose()
to recurse to subobjects. The snag with this approach was that during
the mark phase, the Marker
visitor could invoke the visitReferents()
method of objects whose construction is not yet complete, and which may
therefore contain junk pointers. Obviously, if a visitor was directed to
a junk address, that would probably crash the interpreter. The
workaround for this was to have GCNode::operator new
zero
out the memory it allocated for new GCNode
objects, so
that instead of junk pointers, an object under construction would
contain null pointers, which visitReferents()
could
readily detect. However, this zeroing of memory was time consuming (and
wouldn't immediately be portable to some strange hardware architectures
in which null pointers are not represented by binary zero).
GCNode
to keep a count of the number of infant nodes, and not to initiate a
mark-sweep garbage collection while any infant nodes exist. This has the
advantages of the second approach, but without the disadvantage: visitReferents()
will never be called for a node whose construction is incomplete, and
there is consequently no need for zeroing memory. It also simplifies the
handling of the case where an exception is thrown within the constructor
of an object derived from GCNode
.
Other changes are as follows:
GCEdge<T>
(which was abolished at
Phase 10) has been reinstated, and encapsulates the write barrier. RObject::Handle<T>
now inherits from GCEdge<T>
.
GCRoot<T>
has been renamed GCStackRoot<T>
,
and its implementation simplified. These objects remain subject to the
restriction that they must be destroyed in the reverse order of their
creation, and are therefore best suited to declaration as automatic
variables (i.e. variables on the processor stack). A new templated class
GCRoot<T>
has been introduced: this does a similar
job to GCStackRoot
(i.e. it is a smart pointer providing
protection from garbage collection), but is not subject to
creation/destruction order restrictions. However, construction and
destruction of GCRoot
s is more time consuming than for GCStackRoot
s,
so the latter should be preferred where possible. CR's 'precious list'
has been reimplemented as part of the base class of GCRoot
.
The ExitException
class has been abolished, since the new
GCRoot
s make it unnecessary.
MemoryBank
no longer contains any logic related to
garbage collection, and in particular there are no callbacks from MemoryBank
into the garbage-collection code. The decision about whether to initiate
a mark-sweep collection is now taken in GCNode::operator new
.
0.21-2.8.1
This phase changes the approach used for garbage collection. Previous
phases used a generational mark-sweep collector, like CR itself. As of
Phase 21, the principal method of garbage collection is reference
counting. The principal motivation for this is to make better use of the
processor caches: with reference counting, the memory occupied by objects
that become garbage is quickly recycled into productive use, very likely
while this memory is still mapped in cache.
To implement reference counting, each GCNode
object
contains a one-byte reference count, which is automatically adjusted by
the GCEdge<T>
, GCRoot<T>
and GCStackRoot<T>
smart pointers, and by the traditional CR PROTECT
/UNPROTECT
mechanism. (If a node's reference count ever reaches 255, it sticks at
that value, and that node can only be garbage-collected by the mark-sweep
mechanism.) When a GCNode
's reference count falls to zero,
it is declared 'moribund'. When GCNode::operator new
is
called upon to allocate memory for a new GCNode
object, it
first looks through class GCNode
's internal list of moribund
nodes. Any nodes on the list which still have a reference count of zero
are deleted; nodes whose reference count has risen back above zero -
accounting for about one in four of the nodes on the moribund list - are
returned to the 'live' list.
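The bookkeeping described in this paragraph can be illustrated with a minimal sketch. The names `Node`, `incRef()`, `decRef()` and `reclaimMoribund()` are hypothetical and are not the CXXR interface; the sketch assumes a single-threaded program, keeps the moribund nodes in a `std::vector` purely for brevity, and models only the saturating one-byte count and the re-check of moribund nodes.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical stand-in for a garbage-collected node; not the CXXR GCNode.
class Node {
public:
    static const unsigned char SATURATED = 255;

    Node() : m_refcount(0), m_on_moribund_list(false) {}

    // Increment the count unless it has already stuck at 255.
    void incRef() {
        if (m_refcount != SATURATED) ++m_refcount;
    }

    // Decrement the count unless it is saturated. A node whose count
    // reaches zero is queued as moribund rather than deleted at once.
    void decRef() {
        if (m_refcount == SATURATED) return;
        assert(m_refcount > 0);
        if (--m_refcount == 0 && !m_on_moribund_list) {
            m_on_moribund_list = true;
            moribund().push_back(this);
        }
    }

    unsigned char refCount() const { return m_refcount; }

    // Delete moribund nodes whose count is still zero; nodes that have
    // been referenced again simply leave the list. Returns the number
    // of nodes deleted.
    static std::size_t reclaimMoribund() {
        std::vector<Node*> pending;
        pending.swap(moribund());
        std::size_t deleted = 0;
        for (Node* n : pending) {
            n->m_on_moribund_list = false;
            if (n->m_refcount == 0) {
                delete n;
                ++deleted;
            }
        }
        return deleted;
    }

private:
    static std::vector<Node*>& moribund() {
        static std::vector<Node*> list;
        return list;
    }

    unsigned char m_refcount;
    bool m_on_moribund_list;
};
```

In this sketch a node whose count has saturated is never queued, so it could be reclaimed only by some separate mechanism, which corresponds to the role the text assigns to mark-sweep.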
To cope with cycles in the node graph (i.e. the directed graph whose
nodes are GCNode
s and whose edges are GCEdge
s),
this reference counting scheme is backed up by a simple (i.e.
non-generational) mark-sweep scheme. However, this runs much more rarely
than CR's garbage collections, and uses a simpler logic to manipulate the
threshold at which mark-sweep collection takes place. Not having node
generations means that there is no longer a need to implement the 'write
barrier'; this in turn means that the GCEdge<T>
templated class can have a C++ assignment operator defined, which enables
it to be more freely used in connection with the container types in the
C++ standard library.
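To illustrate why an assignment operator matters here, the following is a minimal sketch of an edge type that adjusts reference counts on copy and assignment and can therefore be stored in a standard container. `Counted` and `Edge<T>` are hypothetical names chosen for this example; they are not the actual `GCEdge<T>` interface.

```cpp
#include <cassert>
#include <vector>

// Hypothetical reference-counted target; not the CXXR GCNode.
struct Counted {
    int refs = 0;
};

// Hypothetical edge type. Copying or assigning an Edge adjusts the
// counts of the old and new targets, so Edge meets the copyability
// requirements that standard containers place on their elements.
template <typename T>
class Edge {
public:
    Edge(T* target = nullptr) : m_target(target) { retain(m_target); }
    Edge(const Edge& other) : m_target(other.m_target) { retain(m_target); }
    ~Edge() { release(m_target); }

    Edge& operator=(const Edge& other) {
        // Retain the new target before releasing the old one, so that
        // self-assignment cannot drop the count to zero in between.
        retain(other.m_target);
        release(m_target);
        m_target = other.m_target;
        return *this;
    }

    T* get() const { return m_target; }

private:
    static void retain(T* t) { if (t) ++t->refs; }
    static void release(T* t) { if (t) --t->refs; }

    T* m_target;
};
```

A `std::vector<Edge<Counted> >` can then be resized, reassigned and destroyed with the counts staying consistent, with no per-element bookkeeping in the calling code.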
Weak reference (WeakRef
) objects need special handling
during garbage collection, and consequently each WeakRef
object now includes a pointer to itself, to stop it being deleted by the
reference counting mechanism.
0.22-2.9.1
The purpose of this phase was to update CXXR to parallel release 2.9.1 of CR. (Unfortunately, it was overtaken by release 2.9.2 of CR.)
uncxxr.h
now defines a macro CXXRconvert(type,
expr)
, which expands to type(expr)
, but which uncxxr.pl
replaces simply by expr
. This macro is now widely used in
code inherited from CR in cases where C++ requires an explicit type
conversion but C does not.
0.23-2.9.2
The purpose of this phase was to update CXXR to parallel release 2.9.2 of CR. This proved straightforward.
0.24-2.9.2
This phase represented the first stage of refactoring the interpreter's evaluation logic into C++, and included the following principal changes:
CXXR::Evaluator
has been introduced to carry out
general services and housekeeping in support of evaluation. Rf_eval()
is now simply a wrapper round Evaluator::evaluate()
.
RObject
now defines a virtual function evaluate()
,
which Evaluator::evaluate()
uses to evaluate a particular
object. By default this simply returns a pointer to the RObject
for which it was invoked, but this behaviour is overridden in various
classes (e.g. Expression
, Symbol
and Promise
)
to provide substantive functionality.
FunctionBase
now defines an abstract
virtual function apply()
, which is invoked by Expression::evaluate()
to apply a function to a specific set of actual arguments.
BuiltInFunction
now has subclasses OrdinaryBuiltInFunction
(corresponding to SEXPTYPE
BUILTINSXP
) and SpecialBuiltInFunction
(SPECIALSXP
). (It is possible that these classes will be
abolished in the future, with their respective functionalities - which
differ only slightly - being moved into BuiltInFunction
.)
BuiltInFunction::apply()
, through
to the invocation of the appropriate do_
function, is now
fully handled within the CXXR core. do_internal()
has also
been absorbed into the CXXR core. For the time being, however, Closure::apply()
is simply a wrapper round CR's Rf_applyClosure()
.
The function table, called R_FunTab
in CR, is now a private
static data member of class BuiltInFunction
. This class
now uses a Schwarz counter, which automatically initialises the function
table on program start-up.
0.25-2.9.2
This phase continued with refactoring the interpreter's evaluation logic into C++, and comprised the following principal changes:
Closure::apply()
has now been reimplemented within the
CXXR core, making use of a new class ArgMatcher
to carry
out argument matching. For the time being the function Rf_applyClosure()
remains in existence, but it is now used only in connection with method
dispatch.
OrdinaryBuiltInFunction
and SpecialBuiltInFunction
have been abolished, and their
functionalities absorbed into BuiltInFunction
.
RObject
,
has been defined and put into practice regarding the use of const
T*
, where T
is RObject
or a class
inheriting from it. This policy aims to resolve as far as possible an
inherent tension between the way CR is implemented and the
'const-correctness' that forms part of C++ programming style.
The handling of weak reference (WeakRef
) objects has
been improved and tidied up in various ways. In particular, when the key
object of a WeakRef
is found to be unreachable, it is now
guaranteed that the weak reference's finalizer (if any) will be run as
part of the same mark-sweep garbage collection that collects the key.
0.26-2.10.1
The purpose of this phase was to update CXXR to parallel release 2.10.1 of CR.
0.27-2.10.1
This phase comprised the following principal changes:
SET_ENCLOS()
has been superseded by new mechanisms for
manipulating the enclosing relationships of Environment
s,
which ensure that acyclicity is preserved.
Symbol
bindings found along the
search list has been introduced, similar to that used in CR.
R_isMissing()
has been reimplemented as CXXR::isMissingArgument()
;
unlike the previous CXXR implementation, it no longer requires any
memory allocations.
The GCNode
class can now optionally include diagnostic
code to identify cycles within the GCNode
/GCEdge
graph.
0.28-2.10.1
This phase was concerned with refactoring contexts (CR's RCNTXT
),
and involved teasing apart the numerous distinct functions that this struct
plays in CR:
Evaluator::Context
.
longjmp
targets
from the destination to the point where longjmp
is called.
C's setjmp
and longjmp
are incompatible with
C++ exception handling, and were removed from CXXR at Phase 8. At that
stage, however, they were simply replaced by an exception class JMPException
,
which was designed simply to ape the behaviour previously achieved with
longjmp
. JMPException
has now itself been
abolished, and replaced with three exception classes LoopException
(servicing R functions break
and next
), ReturnException
(which services the R function return
and various other
indirect flows of control) and CommandTerminated
(raised
in response to unhandled errors or user interrupts). These new exception
classes are used in a way consistent as far as possible with C++
programming idioms; in particular, the class Evaluator::Context
plays no direct role in controlling their propagation, and the CR
function findcontext()
no longer exists.
longjmp
).
For the time being, this save/restore functionality has been retained
within the Evaluator::Context
class, though in some cases
the functionality is achieved by incorporating an object of some other
class, such as ProtectStack::Scope
or RAllocStack::Scope
,
within an Evaluator::Context
object.
In all cases this save/restore functionality is now achieved,
following a standard C++ idiom, by the constructor of a stack-based
object saving state, and then its destructor restoring it. This
automatically copes both with the normal flow of control and with
exceptions, so there is now no need for CR's R_restore_globals()
function.
In the future, it is likely that some of the save/restore functions
now carried out by the Evaluator::Context
class will be
factored out into new classes with more specific responsibilities.
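The constructor-saves, destructor-restores idiom referred to above can be sketched as follows. `PointerStack`, its nested `Scope` and `evaluateSomething()` are hypothetical names used only for illustration; they are loosely modelled on the description of `ProtectStack::Scope` but make no claim to match its actual interface.

```cpp
#include <cassert>
#include <cstddef>
#include <stdexcept>
#include <vector>

// Hypothetical protection stack; not the CXXR ProtectStack.
class PointerStack {
public:
    void push(const void* p) { m_items.push_back(p); }
    std::size_t size() const { return m_items.size(); }

    // Records the stack depth on construction and trims the stack
    // back to that depth on destruction.
    class Scope {
    public:
        explicit Scope(PointerStack& stack)
            : m_stack(stack), m_saved_size(stack.size()) {}
        ~Scope() { m_stack.m_items.resize(m_saved_size); }
        Scope(const Scope&) = delete;
        Scope& operator=(const Scope&) = delete;

    private:
        PointerStack& m_stack;
        std::size_t m_saved_size;
    };

private:
    std::vector<const void*> m_items;
};

// Pushes two entries and then optionally fails. Either way, the Scope
// destructor restores the depth that was seen on entry.
inline void evaluateSomething(PointerStack& stack, bool fail) {
    PointerStack::Scope scope(stack);
    int a = 0, b = 0;
    stack.push(&a);
    stack.push(&b);
    if (fail) throw std::runtime_error("evaluation failed");
}
```

Because the restoration happens in a destructor, the same code path serves both a normal return and stack unwinding after an exception, which is why no separate restore function is needed.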
on.exit
expressions. This
function is now also encapsulated within the Evaluator::Context
class. Any on.exit
expressions attached to a Context
object are evaluated automatically by the object's destructor. This
automatically copes both with the normal flow of control and with
exceptions, so there is now no need for CR's R_run_onexits()
function.
break
, next
and return
) are
used only in circumstances where there is an appropriate destination. In
CXXR this is now accomplished using the classes Environment::LoopScope
and Environment::ReturnScope
.
Browser
.
Other changes in this phase were:
Promise
stack has been abolished, the
necessary functionality now being achieved with C++ try-catch logic.
R_RestartToken
,
R_ReturnedValue
and R_Toplevel
. (CR's TOPLEVEL
contexts have been replaced by Evaluator
objects.)
0.29-2.10.1
The primary purpose of this release was to define the baseline for the results on add-on packages reported at useR! 2010. The changes are mainly bugfixes, but with the following more substantive changes:
RObject
hierarchy may evaluate R expressions. This
has entailed a change to the implementation of PairList::construct()
,
which was previously not reentrant; in the new implementation, this
function never gives rise to garbage collection.
The member functions of RObject
concerned with setting and
examining attributes are all now either virtual or implemented via
calls to virtual functions. This means that classes within the RObject
hierarchy can apply their own consistency checks to attribute settings,
and also override or augment the way in which attribute values are
stored within the C++ object.
0.30-2.11.1
The primary purpose of this phase was to update CXXR to parallel release 2.11.1 of CR. This included the following corrections to significant preexisting bugs:
Each of the functions COMPLEX()
, INTEGER()
,
LOGICAL()
, RAW()
, REAL()
, R_CHAR()
,
STRING_ELT()
, SET_STRING_ELT()
, VECTOR_ELT()
,
SET_VECTOR_ELT()
, XVECTOR_ELT()
and SET_XVECTOR_ELT()
now verifies not only that its vector argument is a pointer to an RObject
of the correct type, but also that this argument is not a null pointer.
SET_STRING_ELT()
also now verifies that the pointer to the
new String
value is not null. These changes bring the
behaviour of these functions back into line with CR. These non-null
checks are applied even if CXXR is built with the preprocessor variable
UNCHECKED_SEXP_DOWNCAST
defined (which causes the type
checks to be elided).
Changes to ensure that do_browser()
correctly saves and restores the restart handler stack, and to ensure
that the browser can be invoked at top-level. (There is however still a
problem that typing Q
into the browser does not work as
described in the manual page: it simply returns to the browser prompt.)
0.31-2.11.1
This phase included extensive changes:
The Evaluator::Context
class is
now the root of a hierarchy of classes. A Context object of some kind is
now created for every R function invocation (this no longer depends on
whether profiling is in progress), but the intention is that these
Context objects are lightweight, and contain only information relevant
to the particular function invocation.
In CR, the return
and break
functions are handled by C setjmp
/longjmp
.
Since these are incompatible with the orderly stack unwinding that C++
requires, at Phase 8 CXXR everywhere replaced invocations of longjmp
by throwing C++ exceptions. Unfortunately the propagation of C++
exceptions incurs a considerable overhead.
An R function such as return
is now implemented so that
it creates an object of a class inheriting from Bailout
.
The basic idea is that this object is then passed as a return value up
the chain from called function to caller, until it reaches the
intended destination of the indirect flow of control. However, this
passing up the call chain happens only if the caller has indicated, by
wrapping its call in a BailoutContext
, that it is able
to propagate the Bailout
object correctly. If that is
not the case, then the called function will invoke the throwException()
method of the Bailout
object, which - as the name
suggests - will complete the indirect flow of control by throwing a
C++ exception.
This change has greatly reduced the number of C++ exceptions that are thrown, with corresponding benefits for performance.
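The idea of returning a bailout where the caller can accept it, and throwing only as a fallback, can be sketched as below. This is a simplified illustration under stated assumptions, not the CXXR implementation: `Outcome`, `BailoutScope`, `ReturnSignal`, `doReturn()`, `callFunction()` and `throwCount()` are all hypothetical names, the "bailout" is reduced to a flag plus an integer value, and the fallback throws directly rather than through a `throwException()` method.

```cpp
#include <cassert>
#include <stdexcept>

// Hypothetical exception used only when a bailout cannot be returned.
struct ReturnSignal : std::runtime_error {
    explicit ReturnSignal(int v) : std::runtime_error("return"), value(v) {}
    int value;
};

// Counts how many exceptions the sketch has thrown, for demonstration.
inline int& throwCount() {
    static int n = 0;
    return n;
}

// Hypothetical evaluation result: either an ordinary value, or a request
// to return 'value' from the enclosing function call.
struct Outcome {
    bool bailout;
    int value;
};

// Hypothetical counterpart of a BailoutContext: while one is alive,
// callees may hand a bailout back as an ordinary return value.
class BailoutScope {
public:
    BailoutScope() : m_previous(flag()) { flag() = true; }
    ~BailoutScope() { flag() = m_previous; }
    BailoutScope(const BailoutScope&) = delete;
    BailoutScope& operator=(const BailoutScope&) = delete;
    static bool active() { return flag(); }

private:
    static bool& flag() {
        static bool f = false;
        return f;
    }
    bool m_previous;
};

// Models an R-level return(v): the cheap path is taken when the caller
// has declared that it can propagate a bailout, the exception otherwise.
inline Outcome doReturn(int v) {
    if (BailoutScope::active()) return Outcome{true, v};
    ++throwCount();
    throw ReturnSignal(v);
}

// Models a function call that is the destination of the return.
inline int callFunction(bool canPropagate) {
    try {
        if (canPropagate) {
            BailoutScope scope;
            Outcome r = doReturn(42);
            return r.value;
        }
        return doReturn(42).value;  // no scope: doReturn() throws
    } catch (const ReturnSignal& e) {
        return e.value;
    }
}
```

Both paths deliver the same value to the destination; the difference is that the wrapped path never enters the exception machinery.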
ArgList
.
Rf_applyClosure()
and R_execClosure()
have
been abolished, their functionality now being incorporated into the Closure
class. However, much remains to be done.
MemoryBank
and
CellPool
) using Valgrind client requests. This
instrumentation was controlled by the preprocessor variable VALGRIND_LEVEL
.
Unfortunately the instrumented CXXR ran under Valgrind with glacial
slowness, making it useless for practical purposes. Under the new
approach, VALGRIND_LEVEL
has been abolished. Instead, when
Valgrind (+memcheck) is to be used, the file MemoryBank.cpp
should be recompiled with the preprocessor variable NO_CELLPOOLS
defined, and CXXR rebuilt. (Only this one file needs to be recompiled.)
When NO_CELLPOOLS
is defined, class MemoryBank
routes all requests for memory blocks directly to ::operator new
(which no doubt in turn calls malloc()
). This means that
Valgrind's internal malloc()
substitute comes into play,
and the result runs at an entirely usable speed.
CXXR has also been changed to carry out a more thorough clean-up at
program exit; in particular all objects of a class derived from GCNode
are deleted, and the tables of Symbol
s and CachedString
s
are deleted. This suppresses a lot of the 'possibly lost' reports that
Valgrind's leak check would otherwise report.
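The effect of a compile-time switch like `NO_CELLPOOLS` can be sketched as follows. `Bank` is a hypothetical, much simplified allocator front end written for this example and is not the CXXR `MemoryBank`; the cell size and superblock size are arbitrary assumptions, and superblocks are never returned to the system.

```cpp
#include <cassert>
#include <cstddef>
#include <new>

// Hypothetical allocator front end; not the CXXR MemoryBank.
class Bank {
public:
    static const std::size_t CELL_SIZE = 64;  // assumed cell size

    static void* allocate(std::size_t bytes) {
#ifndef NO_CELLPOOLS
        // Normal build: small requests are served from pooled cells.
        if (bytes <= CELL_SIZE) return takeCell();
#endif
        // With NO_CELLPOOLS defined, every request reaches the global
        // allocator, so a memory checker sees each block individually.
        return ::operator new(bytes);
    }

    static void deallocate(void* p, std::size_t bytes) {
#ifndef NO_CELLPOOLS
        if (bytes <= CELL_SIZE) {
            giveCell(p);
            return;
        }
#endif
        ::operator delete(p);
    }

private:
    struct Cell {
        Cell* next;
    };

    static Cell*& freeList() {
        static Cell* head = nullptr;
        return head;
    }

    static void* takeCell() {
        Cell*& head = freeList();
        if (!head) {
            // Carve a fresh superblock into cells and thread them onto
            // the free list.
            const std::size_t cells = 32;
            char* block = static_cast<char*>(::operator new(cells * CELL_SIZE));
            for (std::size_t i = 0; i < cells; ++i) {
                Cell* c = new (block + i * CELL_SIZE) Cell;
                c->next = head;
                head = c;
            }
        }
        Cell* c = head;
        head = c->next;
        return c;
    }

    static void giveCell(void* p) {
        Cell* c = new (p) Cell;
        c->next = freeList();
        freeList() = c;
    }
};
```

Only the translation unit containing `allocate()` and `deallocate()` depends on the macro, which is consistent with the remark above that a single file needs recompiling.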
0.32-2.11.1
This phase consisted of changes to improve the speed of CXXR. The principal changes were as follows:
When the reference count of a GCNode
falls to zero, it
is designated as 'moribund'. Previously moribund nodes were moved onto a
separate doubly-linked list of nodes (and moved back again if the
reference count was found subsequently to have risen). Now instead the GCNode
class maintains a vector of pointers to moribund nodes. Also, the
moribund flag within a GCNode
object is now incorporated
into the same byte as the saturating reference count.
PairList
objects have now been squeezed into 32 bytes (on
32-bit architecture) - with some resulting inelegances in encapsulation
- and Frame::Binding
objects have been reduced to 16 bytes
(again on 32-bit architecture). Class CellPool
now
allocates its 'superblocks' on 4096-byte boundaries. These changes make
for better utilisation of the processor caches.
VectorFrame
has been introduced, and used to
implement the local Environment
s of Closure
calls instead of the StdFrame
s used previously. As the
name suggests, VectorFrame
is an implementation of the Frame
abstract type which holds its constituent Frame::Binding
s
as a vector. Although look-up time is asymptotically linear in the number
of Bindings, as compared with the logarithmic performance of StdFrame
,
it has a shorter construction and destruction time than StdFrame
,
and is better localised in memory. These factors make VectorFrame
more efficient in implementing small Frame
s with a short
lifetime.
0.33-2.12.1
The purpose of this phase was to update CXXR to parallel release 2.12.1
of CR. In the course of this, the use of UncachedString
objects was largely replaced by the use of CachedString
objects, a change that has lagged behind the corresponding change in CR.
0.34-2.12.1
This phase was marked by a wider use of C++ generic programming techniques, both to simplify the internal code, and to make this code available in a flexible form to add-on packages. In particular:
FixedVector
.
Subscripting
and
associated functions.)
VectorOps
.
ElementTraits
namespace.
0.35-2.12.1
This release is intended to clear the decks prior to an upgrade to R 2.13.1, and includes only small changes in the development trunk:
Subscripting
has now been extended to cover
subassignment to matrices and arrays.
GCNode
has been modified,
reducing its administrative data to a single byte.
(The main activity in the period leading up to this release has been the
introduction of the lazycopy
branch, which is exploring
methods for managing object duplication automatically via the RHandle
smart pointer, and eliminating the need for NAMED()
and SET_NAMED()
.
Verdict so far is mixed: it basically works, but has performance issues,
and breaks somewhat more existing code than I'd like. A plus point is that
it better achieves C++ 'const correctness' than the development trunk.)
0.36-2.13.1
The purpose of this phase was to upgrade CXXR to parallel release 2.13.1
of CR. This includes making bytecode interpretation available in CXXR for
the first time, though not yet in the 'threaded code' implementation
(which is the CR default when using gcc
).
The code also now builds correctly when configured with --enable-memory-profiling
.
(Thanks to Doug Bates for pointing out that previously it didn't.)
However, the functionality of tracemem
and kindred R
functions (untracemem
and retracemem
) is
currently unavailable in CXXR even when it is configured with memory
profiling enabled.
0.37-2.13.1
This release contains only minor changes:
The functionality of tracemem
and kindred functions has
been reinstated.
gcc
(as in CR).
0.38-2.13.1
This release clears the decks prior to an upgrade of CXXR to R 2.14.1.
The principal change concerns garbage collection. The reference-counted approach to garbage collection primarily used by CXXR can bring speed advantages when dealing with large datasets, but the housekeeping involved in adjusting reference counts up and down as required is surprisingly time-consuming. This is a major contributor to the speed penalty of CXXR compared with CR when dealing with small datasets, a penalty that has grown greater with the advent of the bytecode interpreter. This release incorporates the following changes:
Previously, garbage collection (GCNode::gclite()) was invoked
on every call to GCNode::operator new. This is still the
case if CXXR is built with the preprocessor variable AGGRESSIVE_GC
defined (as is the case in the default configuration), but otherwise gclite()
is invoked only when the number of bytes allocated has risen by a
certain margin (currently 10,000) since the previous call of gclite().
Objects of the GCStackRoot
class template are
now in either a non-protecting or protecting state, with newly created GCStackRoots
being non-protecting. Only if a GCStackRoot is in the
protecting state does it increment the reference count of its target. GCNode::gclite()
switches all GCStackRoots into the protecting state before
starting garbage collection. Taken in conjunction with the first change,
this means that many GCStackRoot pointers will complete
their lifecycle without ever being switched into the protecting state.
Similar changes have been made to the pointer protection stack (ProtectStack) and the bytecode
interpreter's node stack, both of which are now implemented using the new
class NodeStack.
A side effect of the above changes is that when AGGRESSIVE_GC
is defined, CXXR's garbage collection is even more aggressive
than it was in previous releases, and this has revealed a number of
GC-protection gaps (e.g. in code inherited from CR) that had previously
'slipped through the net'.
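The two garbage-collection changes listed above (collection triggered by an allocation margin, and stack roots that protect lazily) can be sketched roughly as follows. This is a simplified model under stated assumptions, not CXXR's code: Node, Collector and StackRoot are hypothetical stand-ins for GCNode, the gclite() machinery and GCStackRoot, and the actual reclamation of unreferenced nodes is omitted.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical stand-in for a reference-counted node (not CXXR's GCNode).
struct Node {
    int refcount = 0;
};

class StackRoot;

// Hypothetical collector that tracks stack roots and allocation volume.
class Collector {
public:
    Collector(std::size_t margin, bool aggressive)
        : margin_(margin), aggressive_(aggressive) {}

    // Called on every allocation: collects at once in aggressive mode,
    // otherwise only after 'margin' bytes have accumulated.
    void noteAllocation(std::size_t bytes)
    {
        bytesSinceCollection_ += bytes;
        if (aggressive_ || bytesSinceCollection_ >= margin_)
            collectLite();
    }

    // Switch every live root into the protecting state, then collect.
    void collectLite();

    std::size_t collections() const { return collections_; }

private:
    friend class StackRoot;
    std::vector<StackRoot*> roots_;  // live roots, in stack (LIFO) order
    std::size_t margin_;
    bool aggressive_;
    std::size_t bytesSinceCollection_ = 0;
    std::size_t collections_ = 0;
};

// Hypothetical stack root: starts non-protecting and touches the
// reference count only if a collection happens during its lifetime.
class StackRoot {
public:
    StackRoot(Collector& gc, Node* target) : gc_(gc), target_(target)
    {
        gc_.roots_.push_back(this);
    }

    ~StackRoot()
    {
        if (protecting_ && target_)
            --target_->refcount;
        gc_.roots_.pop_back();  // roots are destroyed in LIFO order
    }

    StackRoot(const StackRoot&) = delete;
    StackRoot& operator=(const StackRoot&) = delete;

    void protect()
    {
        if (!protecting_) {
            if (target_)
                ++target_->refcount;
            protecting_ = true;
        }
    }

    bool protecting() const { return protecting_; }

private:
    Collector& gc_;
    Node* target_;
    bool protecting_ = false;
};

inline void Collector::collectLite()
{
    for (StackRoot* root : roots_)
        root->protect();
    // A real collector would now reclaim nodes whose count is zero.
    ++collections_;
    bytesSinceCollection_ = 0;
}
```

In this model, a StackRoot created and destroyed between two collections never modifies its target's reference count, which is the saving the changes aim for. Constructing the Collector with aggressive set to true mimics the AGGRESSIVE_GC behaviour, in which every allocation triggers a collection.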
Another significant change is that the CXXR distribution no longer holds
the 'Recommended' packages in compressed tar form (.tar.gz),
but instead contains the untarred package directories themselves. This
will make it easier to carry forward any CXXR-specific tweaks to these
packages from one R release to the next. (Such tweaks are rare, and often
due to a latent GC-protection bug in the CR package code.)
0.39-2.14.1
The purpose of this phase was to upgrade CXXR to parallel release 2.14.1 of CR. This entailed substantial changes to the bytecode interpreter, both to track changes in CR and to correct errors in the previous CXXR implementation. In the course of preparing this release, numerous GC-protection gaps were discovered in the CR code (including the Recommended packages) and corrected within CXXR.
CXXR's bytecode interpreter does not yet implement the cache of symbol bindings used in CR.
0.40-2.15.1
The purpose of this phase was to upgrade CXXR to parallel release 2.15.1
of CR. In the course of this upgrade, the class UncachedString
was abolished, and the functionality of class CachedString
was merged into its parent class CXXR::String.
0.41-2.15.1
In this phase, the experimental provenance-tracking facilities and the
experimental XML-based serialization facilities, both formerly in the provenance
branch, have been merged into the development trunk. Beware that the
documentation and, in particular, the testing of these features are still not
up to standard, and there are known gaps in the serialization capability.
Moreover, the interfaces of both are likely to change. To enable
provenance tracking, it is necessary to define PROVENANCE_TRACKING
within src/include/CXXR/config.hpp before building the
program, as the documentation of this file explains.
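A minimal sketch of what defining the macro and testing for it might look like is given below. Only the macro name PROVENANCE_TRACKING and the file src/include/CXXR/config.hpp come from the text above; the helper function provenanceTrackingEnabled() is hypothetical and is not part of CXXR, and the real config.hpp may be organised differently.

```cpp
#include <cassert>

// The kind of definition one would add to src/include/CXXR/config.hpp
// before building, in order to switch provenance tracking on.
#define PROVENANCE_TRACKING

// Hypothetical helper (not part of CXXR) showing how code elsewhere
// could be conditionally compiled on the macro.
constexpr bool provenanceTrackingEnabled()
{
#ifdef PROVENANCE_TRACKING
    return true;
#else
    return false;
#endif
}
```

Because the macro is tested by the preprocessor, a change to the definition takes effect only after the program is rebuilt.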
0.42-2.15.1
This phase saw various extensions and corrections to the XML-based
serialization facilities, including the introduction of automated tests,
but beware that these are still subject to change. The release
incorporates work by Chris Silles on adapting the autoconf-based
configuration facilities to CXXR: this addresses in particular the location of a
suitable installation of Boost, and the enabling or disabling of provenance
tracking. Previously there were some difficulties in building CXXR
anywhere other than in its source directory: these have now, it is hoped, been
removed.