As distributed for compilation with gcc, CXXR specifies no C++ optimisation flags by default, and includes extensive consistency checking code. The resulting executable is considerably slower than CR, by a factor of as much as eight.
In particular, the default installation carries out thorough run-time type checking on the interface between code inherited from CR and 'native' CXXR code. For example, in a code fragment such as the following, where x is of type RObject* (i.e. SEXP):
    double sum = 0.0;
    unsigned int i;
    for (i = 0; i < LENGTH(x); i++)
        sum += REAL(x)[i];
CXXR will check on each iteration that x actually points to a RealVector (a class derived from RObject) and cast it accordingly. (CR will apply an equivalent check, but only if REAL() is invoked from outside the main part of the interpreter.)
In fact CXXR will make two type checks on each iteration, one for REAL() and one for LENGTH().
In newly written C++ code, the above fragment would preferably be rewritten along the following lines:
    double sum = 0.0;
    const RealVector& rv = *SEXP_downcast<RealVector*>(x);
    for (unsigned int i = 0; i < rv.size(); ++i)
        sum += rv[i];
(or better still using std::accumulate()). The SEXP_downcast will normally report an error if x does not actually point to a RealVector, but this check is done only once, not every time around the loop.
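The idea of checking the type once and then summing with std::accumulate() can be sketched in a self-contained form as follows. This is only an illustration under stated assumptions: the RObject and RealVector classes here are minimal hypothetical stand-ins, and checked_downcast and sum_reals are invented names, not CXXR's actual SEXP_downcast or any CXXR function.

```cpp
#include <cassert>
#include <numeric>
#include <stdexcept>
#include <utility>
#include <vector>

// Hypothetical stand-ins for CXXR's RObject and RealVector; the real
// classes are considerably richer than this.
struct RObject {
    virtual ~RObject() = default;
};

struct RealVector : RObject {
    explicit RealVector(std::vector<double> v) : data(std::move(v)) {}
    std::vector<double> data;
};

// Hypothetical checked downcast: throws if obj is null or does not
// point to an object of the requested type.
template <typename PtrOut>
PtrOut checked_downcast(RObject* obj)
{
    PtrOut p = dynamic_cast<PtrOut>(obj);
    if (!p)
        throw std::invalid_argument("object is not of the expected type");
    return p;
}

// The type check happens once, before the summation, rather than on
// every element access.
double sum_reals(RObject* x)
{
    const RealVector& rv = *checked_downcast<RealVector*>(x);
    return std::accumulate(rv.data.begin(), rv.data.end(), 0.0);
}
```

Calling sum_reals() on an object that is not a RealVector raises an exception before any element is touched, which mirrors the single up-front check described above.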
Unfortunately it would not be practical to go through all the code inherited from CR, replacing the first idiom with the second: not only would this change be very time-consuming, it would also make it extremely difficult to update CXXR subsequently to reflect a new release of CR.
A cruder approach is to suppress altogether the run-time checks made by functions such as LENGTH() and REAL(). This can be accomplished by defining the preprocessor variable UNCHECKED_SEXP_DOWNCAST within src/include/CXXR/config.hpp when building CXXR.
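One plausible way for a preprocessor variable to switch such checks on and off is sketched below. This is an assumption about the general technique, not a copy of CXXR's source: sketch_downcast and the minimal class definitions are hypothetical, and only the macro name UNCHECKED_SEXP_DOWNCAST is taken from the text above.

```cpp
#include <cassert>
#include <stdexcept>

// Minimal hypothetical stand-ins for CXXR classes.
struct RObject { virtual ~RObject() = default; };
struct RealVector : RObject { double first = 0.0; };
struct IntVector : RObject { int first = 0; };

// Hypothetical downcast whose checking is controlled at compile time.
template <typename PtrOut>
PtrOut sketch_downcast(RObject* obj)
{
#ifdef UNCHECKED_SEXP_DOWNCAST
    // Unchecked build: trust the caller, no run-time cost.
    return static_cast<PtrOut>(obj);
#else
    // Checked build: verify the dynamic type on every call.
    // A null pointer is passed through unchanged.
    PtrOut p = dynamic_cast<PtrOut>(obj);
    if (obj && !p)
        throw std::invalid_argument("object is not of the expected type");
    return p;
#endif
}
```

With the macro undefined, a mismatched cast throws; with it defined, the same call compiles to a plain static_cast and a type error goes undetected, which is the trade-off the cruder approach accepts.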
To build CXXR for maximum speed it is recommended that definitions of UNCHECKED_SEXP_DOWNCAST and NDEBUG be added to config.hpp, and that the definition of CHECK_EXPOSURE be removed. (See here for other build options.) The C++ optimisation level should be raised to -O2: this can be accomplished by setting CXXFLAGS to -O2 in the shell from which make is invoked. (On 32-bit Intel I have also found the compiler flag --param inline-unit-growth=100 helpful.)
When built in this way, the performance of CXXR depends very much on the R script being run. In particular, CXXR at present has a higher overhead than CR in setting up R function calls (whether to built-in functions or closures), and in the housekeeping related to garbage collection. Consequently CXXR fares particularly badly running scripts that involve many R manipulations on small datasets; many of the R test scripts are of this kind, and on such scripts CXXR currently runs at about three-quarters the speed of CR, and sometimes slower still. (But this is a considerable improvement on earlier releases.) On the other hand, CXXR's more aggressive approach to garbage collection means that it has leaner memory requirements, and makes more effective use of the processor caches. Consequently it tends to come into its own when working with larger datasets, and in many such cases CXXR runs somewhat faster than CR. Indeed Jens Oehlschlägel has produced an example where CXXR runs up to three times as fast as CR.
What is the difference between the .h and the .hpp files in the CXXR directory?
The .hpp header files are intended to be #included only into C++ source files, and will probably give compilation errors if #included into a C source file. The .h files may be #included into both C++ and C source files, though C++ files will normally see additional content.
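A header that serves both languages while showing extra content to C++ typically relies on the standard __cplusplus macro. The sketch below simulates such a header inside a single C++ file; ExampleVector and example_length are hypothetical names chosen for illustration and do not come from CXXR.

```cpp
#include <cassert>
#include <cstddef>

// ---- contents of a hypothetical dual-language header "Example.h" ----
#ifdef __cplusplus
// Additional content seen only by C++ translation units.
class ExampleVector {
public:
    explicit ExampleVector(std::size_t n) : m_size(n) {}
    std::size_t size() const { return m_size; }
private:
    std::size_t m_size;
};

extern "C" {
#else
/* C translation units see only an opaque type. */
typedef struct ExampleVector ExampleVector;
#endif

/* C interface to the class, visible to both languages. */
int example_length(const ExampleVector* v);

#ifdef __cplusplus
}  // extern "C"
#endif
// ---- end of hypothetical header ----

// Implementation, as it might appear in a .cpp file.
int example_length(const ExampleVector* v)
{
    return static_cast<int>(v->size());
}
```

A C source file including such a header would see just the opaque typedef and the function prototype, whereas a C++ file also sees the full class definition.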
From the CXXR point of view, many of the functions in the R API can be considered to provide a C interface to the facilities of a particular C++ class. Consequently it makes sense to gather together the prototypes that relate to a particular class into the header file for that class, and to include documentation for these C interface functions alongside the documentation for the class methods and for other class-related functions available only to C++ programs: all concentrated within the class's header file in the CXXR directory.
If that were the only consideration, Rinternals.h (which defines the R API) could simply be modified to #include all the relevant header files from the CXXR directory, and contain very little content of its own. However, that would cause problems when it came to updating CXXR to reflect a new release of CR, because changes within CR to the prototypes of API functions would not automatically be picked up. Consequently, the approach taken is to keep CXXR's Rinternals.h as close as possible to its CR version, but to have it also #include the relevant header files from the CXXR directory. Then, if the prototype of an API function changes from one release of CR to the next, the change will be picked up in CXXR's Rinternals.h using svn merge, and then the compiler will immediately flag up the inconsistency between Rinternals.h and the relevant header file in the CXXR directory. Similar considerations apply to the other 'omnibus' header file, Defn.h.
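The reason duplicated prototypes act as a safety net can be illustrated in a single file. In this sketch the two declarations stand in for the copy in an omnibus header and the copy in a class header; the names RObject, SEXP and example_length are simplified hypothetical stand-ins, and the assumption is that the functions concerned have C linkage.

```cpp
#include <cassert>

// Simplified hypothetical stand-ins for the real types.
struct RObject { int length; };
typedef RObject* SEXP;

extern "C" {

// As the prototype might appear in the omnibus header that tracks CR.
int example_length(SEXP x);

// As the same prototype might appear in the class's own header.
// Repeating an identical declaration is perfectly legal.
int example_length(SEXP x);

// If a merge changed only the first copy, for example to
//     long example_length(SEXP x);
// the two declarations would conflict and compilation would fail,
// drawing attention to the header that still needs updating.

}  // extern "C"

// Single definition matching both declarations.
extern "C" int example_length(SEXP x)
{
    return x->length;
}
```

Because the declarations have C linkage, a change of parameter types would also be rejected as a conflicting declaration rather than silently introducing an overload.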
Why is the implementation of a method of class Foo not always to be found in Foo.cpp?
Obviously one reason may be that the method is inline, in which case its implementation is to be found in Foo.h or Foo.hpp.
Even for non-inline methods, however, the implementation may not be located in Foo.cpp, though in this case there will usually be a comment in Foo.cpp to say where it is to be found.
The most common reason for this is that the implementation of the method inherits a substantial amount of code from CR. In that case, it can make sense to leave that code in the same place in the same source file (subject to renaming from .c to .cpp) as its location within CR. Then there is a good chance that, when CXXR is updated to reflect a new release of CR, corrections and enhancements to that code will be automatically picked up during the code merge.
What are uncxxr.h and uncxxr.pl, and why the weird spacing?
Where a source file inherited from CR (foo.c, say) has been adapted for CXXR (and changed into a C++ file foo.cpp in the process), the script uncxxr.pl endeavours as far as possible to reverse systematic changes (e.g. the conversion of C-style casts into C++ casts, and casts that C++ requires to be explicit but C does not) to generate a quasi-C file foo.bakc. (We say 'quasi-C' because the resulting file may not be syntactically correct C: it is intended for human eyes only.) Updating to a new release of R is facilitated by using a 3-way visual diff between the release of foo.c currently shadowed by CXXR, the new release of foo.c, and foo.bakc. This helps to highlight where the significant changes are in the new release of foo.c, and where they might conflict with changes made in CXXR. (A similar 3-way comparison using foo.cpp instead of foo.bakc throws up too much 'noise'.)
Some changes have been made to the program text of files such as foo.cpp, so that uncxxr.pl can make a better job of recovering the form of the file in CR. This includes inserting additional whitespace and redundant brackets, and the use of various macros defined in uncxxr.h.
Why are the constructors of GCNode declared explicit?
In C++, a constructor that is capable of being called with a single argument defines an implicit conversion from the type of that argument to the class being constructed. This default behaviour can be prevented by qualifying the constructor with the keyword explicit.
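The effect of the keyword can be seen in a small standalone example. The classes Implicit and Explicit and the two read functions are hypothetical and unrelated to GCNode; they exist only to contrast the two behaviours.

```cpp
#include <cassert>
#include <type_traits>

// A single-argument constructor without 'explicit' doubles as an
// implicit conversion from int.
struct Implicit {
    Implicit(int v) : value(v) {}
    int value;
};

// The same constructor marked 'explicit' must be invoked by name.
struct Explicit {
    explicit Explicit(int v) : value(v) {}
    int value;
};

int read_implicit(Implicit x) { return x.value; }
int read_explicit(Explicit x) { return x.value; }

// The compiler itself can confirm the difference.
static_assert(std::is_convertible<int, Implicit>::value,
              "int converts implicitly to Implicit");
static_assert(!std::is_convertible<int, Explicit>::value,
              "int does not convert implicitly to Explicit");
```

Here read_implicit(42) compiles because the int is silently converted, whereas read_explicit(42) would be rejected and must be written read_explicit(Explicit(42)).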
Perspicacious readers may have noticed that since GCNode and all classes derived from it have (or should have) private or protected destructors, it will be impossible for the compiler to create temporary objects of these classes, and hence no implicit conversions can be carried out anyway. However, we consider it good practice to declare constructors explicit in cases where, even if an implicit conversion were feasible, it would not be desired. (This in fact covers the majority of constructors callable with a single argument.) Moreover, following this practice may lead to clearer compiler error messages, because the compiler need not even consider using implicit conversions.