This page gives guidance on porting R code and C packages to work with CXXR, and on some coding practices that may affect the portability of CXXR to different platforms.
In general, if R code behaves differently under CXXR and under the standard C-based R (CR; specifically the release of CR on which this release of CXXR is based), then that is a bug in CXXR: please report it. However, differences in timing and in space consumption are to be expected. The following differences are intentional:
asS4()) for an object which is of type S4 (S4SXP). Update: this restriction was applied in release 0.17-2.7.2, but is no longer applied as of release 0.18-2.8.1 because of an issue in the methods package; it may be reapplied at some time in the future.
Setting attributes on a symbol (SYMSXP) is strongly discouraged, and may be forbidden in future. Attributes may not be set on string (CHARSXP) objects.
If an error occurs in an on.exit expression, the error is reported, but it does not result in an immediate return to top-level: the function to which the on.exit relates returns in the same way as if the on.exit expression had not resulted in an error. This behaviour is nonconformant to Sec. 8.3 of the R Language Definition, but fixing this is not seen as a priority. Let me know if this causes you problems (or if you actually prefer this behaviour!).
If x is a one-dimensional array with dimnames, then in CXXR evaluating x[] preserves the dimnames; in CR they are discarded.
tracemem (and untracemem and retracemem) is currently unavailable in CXXR, even if CXXR is configured with --enable-memory-profiling. Let me know if this is a serious nuisance: otherwise fixing it is of low priority.
Also, there may be differences in the behaviour of R functions which probe into the implementation of the interpreter. In particular:
gc() now returns different quantities, in the form of a vector with three rows and two columns: see the revised help page. The reporting enabled by gcinfo(TRUE) in CR is not currently implemented in CXXR; in other words, the function gcinfo() is a no-op. Likewise gctorture() is a no-op in CXXR, whose aggressive approach to garbage collection will generally manifest memory-protection bugs without help!
Although mem.limits(nsize, vsize) retains the same interface, the interpretation of the quantities involved has changed somewhat; likewise the corresponding command-line options to R. See the revised Memory help page for details.
Rprofmem, in package utils, has not (yet) been properly reengineered for CXXR, and the relevant code has not been tested. Do not rely on the CR behaviour to persist.
memory.profile() performs no useful function in CXXR: it simply returns a vector of zeroes.
The configure option --with-valgrind-instrumentation is not used. If valgrind is to be used with the memcheck tool, it is recommended that MemoryBank.cpp be recompiled with the preprocessor variable NO_CELLPOOLS defined. Then CXXR will allocate all memory blocks directly via C++'s ::operator new (rather than using CXXR's internal memory pools implemented by class CellPool), and such memory blocks will therefore be monitored by memcheck.
The hash parameter to new.env() is ignored, and env.profile() always returns NULL.
Moreover, while CXXR is at an alpha development stage, internal logic errors (i.e. errors due to bugs in the interpreter rather than to bugs in the R code it is interpreting) will sometimes cause the interpreter itself to terminate, even in circumstances where CR would manage to recover to the top-level prompt. This is intentional in order to 'preserve the scene of crime' for debugging. Also, typing Control-C will currently cause the CXXR interpreter to terminate, rather than returning to the top-level prompt: if this is a nuisance, change the setting of R_SignalHandlers at around line 809 in main.cpp from 0 to 1, and rebuild.
C and C++ code that appears to work under CR can often contain latent memory protection bugs that will only manifest themselves when a garbage collection occurs at a particular point in execution. Such bugs are very likely to become hard failures under CXXR: refer to this page for advice on diagnosing such bugs.
Code using R.h or S.h:
With gcc this can be achieved by specifying the compiler flag -fexceptions. The most likely case where this will be necessary is in code that calls error() (aka Rf_error); without this change, the R interpreter may not return correctly to the top-level R prompt following an error. (It is intended to remove this requirement in a future release.)
In CR, R_alloc() and kindred functions always return a memory block containing at least one more byte than the number requested. This cannot be relied upon in CXXR.
Code using Rinternals.h:
The following changes are required in addition to those listed above for code using R.h or S.h:
R_NilValue in CXXR is simply a macro expanding to NULL (which will typically be further macro-expanded to (void*)0 in C, or plain 0 or nullptr in C++), rather than a pointer to a real object. To smooth this change, the following changes have been made to accessor functions:
CAR(), CDR(), TAG() and ATTRIB() each return a null pointer if passed a null pointer.
OBJECT() and IS_S4_OBJECT() each return FALSE if passed a null pointer.
LENGTH(), NAMED() and TRACE() each return 0 if passed a null pointer.
SET_NAMED() is a no-op if the first argument is a null pointer.
However, other accessor functions are likely to crash if invoked for R_NilValue: the calling code should introduce appropriate checks and workarounds.
In existing C/C++ code, it occasionally happens that a function is designed to return R_NilValue to signify that the result is an R NULL, and to return a null pointer to signify some other eventuality such as an error. Note that in CXXR these return values are indistinguishable, and such functions must be redesigned.
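One possible redesign, sketched here purely as an illustration, is to report success or failure separately from the result pointer. The names LookupStatus and lookup_item are hypothetical and are not part of R or CXXR; SEXPREC, SEXP and R_NilValue are minimal local stand-ins (not the real Rinternals.h declarations) so that the fragment compiles on its own.

```cpp
#include <cassert>
#include <cstddef>

// Local stand-ins so the sketch is self-contained; real package code would
// include Rinternals.h instead of declaring these.
struct SEXPREC { int payload; };
typedef SEXPREC* SEXP;
#define R_NilValue NULL   // mirrors the CXXR definition described above

// Hypothetical status code, introduced only for this illustration.
enum LookupStatus { LOOKUP_OK, LOOKUP_FAILED };

// Hypothetical helper. The status is the return value and the result travels
// through an out-parameter, so a legitimate R_NilValue result can no longer
// be confused with an error.
LookupStatus lookup_item(SEXP const table[], int size, int index, SEXP* result)
{
    assert(result != NULL);
    if (index < 0 || index >= size)
        return LOOKUP_FAILED;      // error: *result is left untouched
    *result = table[index];        // may legitimately be R_NilValue
    return LOOKUP_OK;
}
```

A caller would test the returned status first, and only afterwards compare the result against R_NilValue.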
SEXPTYPE is now an enumeration, rather than being typedefed to unsigned int, but the numerical values of particular SEXPTYPEs are unchanged. (This change appears to have been under contemplation within CR.) This may necessitate some explicit conversions or changes in the types of variables.
VECTOR_ELT() and SET_VECTOR_ELT() can only be applied to SEXPs of type VECSXP (implemented internally using class CXXR::ListVector). In particular, these functions cannot be applied to EXPRSXPs, for which the new functions XVECTOR_ELT() and SET_XVECTOR_ELT() should be used instead.
ConsCell object (i.e. LISTSXP, LANGSXP, DOTSXP or BCODESXP) must be of class PairList (i.e. LISTSXP specifically).
SET_TYPEOF() has been abolished.
allocSExp() can be used to create objects only of ConsCell types.
SET_FORMALS() and SET_BODY() are no longer available: the formals and body of a closure must be set at the time the closure is created (e.g. using mkCLOSXP()).
SET_PRENV() and SET_PRCODE() are no longer available; the environment and code of a Promise must be set at the time it is created (e.g. using mkPROMISE()).
SET_PRVALUE() is still available (for the time being), and will automatically null the environment pointer if the value is set to anything other than R_UnboundValue.
LENGTH() points to a vector object. CXXR does check this (unless UNCHECKED_SEXP_DOWNCAST is defined); however, a null pointer argument is acceptable, in which case (as noted above) LENGTH() simply returns 0.
SET_ATTRIB(x, v) does not simply plug its (list) argument v into the attribute field of x; instead it presents the elements of the list in sequence to RObject::setAttribute(), which verifies that class invariants are preserved. Consequently altering v after calling SET_ATTRIB() - as is currently done in model.c in CR - may well not have the desired effect, and is deprecated. (Although in CXXR - unlike CR - SET_ATTRIB() may allocate memory, it has been engineered so that this will never result in a mark-sweep garbage collection; this is to avoid breaking existing code.)
Applying SET_ATTRIB() to a cached CHARSXP raises an error: such objects should be regarded as immutable. Applying SET_ATTRIB() to a SYMSXP is strongly discouraged, and may raise an error in future.
SET_OBJECT() has been abolished. The m_has_class field of RObject is maintained automatically by the RObject class interface, according to whether or not a class attribute is set.
RDEBUG() and SET_RDEBUG() are applicable only to closures; use ENV_DEBUG() and SET_ENV_DEBUG() to query/control single-stepping within environments.
TRACE() and SET_TRACE() are applicable only to FunctionBase objects (i.e. CLOSXP, SPECIALSXP and BUILTINSXP) and will raise an error if used otherwise. However, as noted above, TRACE() may also be applied to a null pointer, in which case it returns 0.
PRSEEN() and SET_PRSEEN() are no longer available. The relevant code should instead use the interface of class Promise directly.
ENVFLAGS(), HASHTAB(), SET_ENCLOS(), SET_ENVFLAGS(), SET_FRAME() and SET_HASHTAB() are no longer available. The relevant code should instead use the interface of class Environment directly.
FRAME() produces on the fly a representation of an Environment's frame as a PairList; it is no longer a simple accessor function. Consequently its return value will need specific protection from garbage collection: you cannot rely for this on the fact that the Environment itself is protected.
LEVELS() and SETLEVELS() should be used only during serialization and deserialization respectively. This reflects the fact that the 'general purpose' field of CR (field gp of sxpinfo_struct) has been replaced by various special-purpose fields, each placed as far down the RObject class hierarchy as is practical.
S4Object (S4SXP). (According to the 'R Internals' document, CR apparently does, but this doesn't appear to be implemented consistently: for example - at least as of CR 2.7.2 - duplicate() doesn't duplicate the tag field.)
RObject::setS4Object(false)) for an object of type S4Object (S4SXP). (This restriction is currently in abeyance.)
Rf_mkSYMSXP() is no longer available. The relevant code should instead use Symbol::obtain() to obtain a pointer to a symbol (and this enforces the requirement that there should be at most one standard symbol with a given name).
Symbol::makeDDSymbol() is also available to create a dot-dot symbol.
BINDING_IS_LOCKED(), LOCK_BINDING(), UNLOCK_BINDING() and SET_ACTIVE_BINDING_BIT() are now applicable only to objects of type PairList (LISTSXP), and should be used only in connection with the serialization and deserialization of environments. Similarly, IS_ACTIVE_BINDING() is now only applicable to objects of a type derived from ConsCell. In particular, these functions are not applicable to symbols (SYMSXP). This reflects the fact that the base environment is now a regular environment, rather than being implemented via the contents of Symbol objects. For the same reason, SYMVALUE() and SET_SYMVALUE() now simply look up or set the value of a symbol in the base environment.
In CR it is an error to apply Rf_eval() to a CHARSXP; in CXXR it is not an error, and doing so returns a pointer to the CHARSXP (String) object itself, with its NAMED field increased if necessary to 2, i.e. this case is handled in the same way as REALSXP etc.
When CR calls a BUILTINSXP function, it coerces any tags in the argument list to be SYMSXP objects; CXXR does not. (But preferred practice in CXXR is for tags always to be Symbol (SYMSXP) objects anyway.)
R_RestartToken does not exist in CXXR.
Rf_countContexts() is not available in CXXR.
Rf_applyClosure() no longer exists; instead use the interface of class Closure directly.
R_bcEncode() and R_bcDecode() no longer exist. (When 'threaded code' is in use, CXXR stores the threaded form of the bytecode inside a ByteCode object alongside the unthreaded form; this threaded form is created automatically by the class constructor, and is not visible outside the class.) Typically, in CXXR R_bcEncode(x) can be replaced simply by x, and similarly for R_bcDecode(x).
class, this, new or private, for example (all of which occur in CR!). But remember to use extern "C" appropriately in your header files, to prevent C++ mangling of the names of functions intended to be visible to C code.
const qualifier is being discarded.
const qualifiers, especially to function arguments, wherever appropriate.
Do not assume that SYMSXP objects (i.e. symbols) are automatically protected against garbage collection (though for the present they are).
With gfortran this can apparently be achieved by specifying the compiler flag -fexceptions (though it is not well documented). The most likely case where this will be necessary is in code that calls rexit(); without this change, the R interpreter may not return correctly to the top-level R prompt following an error. (It is intended to remove this requirement in a future release.)
It is a central intention of CXXR to make a wide range of functionality, over and above that offered by CR, available to C++ packages via the $(R_HOME)/include/CXXR API. However, package authors should be aware that this API is currently in a state of considerable flux from release to release. If you are exploiting this API please let me know, so that I can take your requirements on board, and forewarn you of upcoming changes.
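To illustrate the earlier remarks about C++ keywords, extern "C" and const qualifiers, here is a minimal sketch of a declaration that stays usable from C while compiling cleanly as C++. The function is_known_class and the parameter name klass are hypothetical, chosen only for this example; the header content is inlined so that the fragment is self-contained.

```cpp
#include <cassert>
#include <cstddef>
#include <cstring>

// Hypothetical header content. The guard gives the function C linkage when
// compiled as C++, so its name is not mangled and C code can still call it.
#ifdef __cplusplus
extern "C" {
#endif

/* A parameter named 'class' would be rejected by a C++ compiler because it
   is a keyword; 'klass' is an arbitrary replacement. The const qualifier
   lets callers pass string literals without discarding a qualifier. */
int is_known_class(const char* klass);

#ifdef __cplusplus
}
#endif

// The definition inherits C linkage from the declaration above.
int is_known_class(const char* klass)
{
    assert(klass != NULL);
    return std::strcmp(klass, "numeric") == 0
        || std::strcmp(klass, "character") == 0;
}
```

The two class names tested inside the function are placeholders; the point of the sketch is only the linkage guard, the renamed parameter and the const-qualified argument.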
The following are areas where portability has been traded for efficiency, simplicity and/or clarity in new code generated for CXXR. There may be other unportabilities in code inherited from CR.
In using std::list, it may in places be assumed that if a node is spliced from one list to another, an iterator pointing to that node remains valid as an iterator (though it is now an iterator within the destination list rather than within the source list). This assumption is specifically contrary to ISO14882:1998 and ISO14882:2003, though this has been identified as a defect in the standard, rectified in the draft 'C++ 0x' standard.
make are available:
make variables using +=.
include directive; the included files are remade by make automatically if necessary.
% as a placeholder).
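The std::list assumption described earlier in this section can be checked on a given platform with a small test along the following lines. This is only a sketch: the function name iterator_survives_splice is made up for the illustration. Under C++11 and later the behaviour it tests is guaranteed by the standard; under the 1998 and 2003 standards it depends on the library implementation.

```cpp
#include <cassert>
#include <list>

// Splices one node from 'source' to the end of 'destination' and reports
// whether the iterator that pointed at the node before the splice still
// designates it afterwards, now as a position within 'destination'.
bool iterator_survives_splice()
{
    std::list<int> source;
    source.push_back(1);
    source.push_back(2);
    source.push_back(3);

    std::list<int> destination;
    destination.push_back(10);

    std::list<int>::iterator it = source.begin();
    ++it;                       // now refers to the node holding 2
    assert(*it == 2);

    // Move just that node to the end of 'destination'.
    destination.splice(destination.end(), source, it);

    // If the assumption holds, 'it' is now the last position in 'destination'.
    std::list<int>::iterator after = it;
    ++after;
    return *it == 2
        && after == destination.end()
        && source.size() == 2
        && destination.size() == 2;
}
```

A port to a new platform could run such a check once at build or test time, rather than relying on the assumption silently.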