By Eric Burden | April 8, 2021
I’ve mentioned in other posts that I’ve been learning the Rust programming language, with the explicit intention of implementing some R functionality in Rust for performance gains. I discuss this rationale in more detail in THIS blog post. In that post, I discuss the rustbind
project, where I explore strategies for calling Rust code in an R package. The goal of that project is to provide a straightforward set of patterns that I (and other developers) can leverage for integrating Rust into future R projects. That work is made much simpler by the inclusion of the extendr-api
Rust crate and the great work of the extendr
project. Here, I will walk through the process of adding a simple Rust function to an R package.
Bubble Sort
Consider the Bubble Sort algorithm. Briefly, Bubble Sorting requires traversing a list of elements and exchanging them pair-wise until the entire list is in order, as demonstrated in the below pseudocode.
# Not real code
function bubble_sort(arr) {
n = len(arr)
// Traverse through all array elements
for i in range from 0 to n {
for j in range from 0 to (n-i-1) {
// For each element `j`, swap with element `j+1` if element `j` is
// larger
if arr[j] > arr[j+1] :
swap(arr[j], arr[j+1])
}
}
}
This is not an efficient way to sort a list (or vector), but it is an algorithm that can be run much faster by implementing it in Rust instead of R alone.
Adding the Rust Implementation
To keep the code nicely separated, start by creating a file rustbind/src/rust/src/bubble_sort.rs
with the following code:
// Rust Code
use extendr_api::prelude::{Real, RobjItertools};
pub(crate) fn bubble_sort_fn(input: Real) -> Real {
let mut nvec: Vec<_> = input.collect();
let len = nvec.len();
for idx in 0..(len - 1) {
let last_idx = len - idx - 1;
for inner_idx in 0..last_idx {
if nvec[inner_idx] > nvec[inner_idx + 1] {
nvec.swap(inner_idx, inner_idx + 1)
}
}
}
nvec.iter().collect_robj().as_real_iter().unwrap()
}
In order to generate the wrappers and bindings, add the following to rustbind/src/rust/src/lib.rs
:
// Rust Code
use extendr_api::prelude::*;
mod bubble_sort;
// Other modules here...
/// Bubble Sort a vector of doubles
///
/// Demonstrates using Rust to perform a Bubble Sort on a vector of doubles
///
/// @params input A double vector to sort
/// @return a sorted vector of doubles
///
/// @examples bubble_sort(runif(1000))
///
/// @export
#[extendr]
fn bubble_sort(input: Real) -> Real {
bubble_sort::bubble_sort_fn(input)
}
extendr_module! {
mod rustbind;
// Other functions to export go here...
fn bubble_sort;
}
And that’s it! After sourcing the build.R
script1 in the package to build, install, and load your package, you will be able to call yourpackagename::bubble_sort(n)
and take advantage of Rust’s blazing speed in your R code. Also, note the format of that doc comment. If you’ve written R packages before, you’ll recognize those as roxygen2
comments, which are used to build the documentation for your package, as well as control whether or not the function is available outside your package (via package_name::function_name) through the ‘@export’ tag.
Was it Worth It?
Now, that’s definitely a relatively more complicated way to just write Rust code instead of R code, was it worth it? I would argue that for any type of exploratory or ad hoc analysis, the answer is most definitely no. But, if you’re writing R code that will be used in production to run the same type of algorithm over and over again, the speed gains are tremendous. Consider the analogous R implementation of the Bubble Sort algorithm:
# R Code
#' Bubble Sorting - R Implementation
#'
#' @param nums numeric vector to sort
#' @return a sorted integer vector
#' @export
bubble_sort_r <- function(nums) {
nums <- if (missing(nums)) { stats::runif(1000) }
n <- length(nums)
for (i in 1:(n-1)) {
for (j in 1:(n - i)) {
if (nums[j] > nums[j + 1]) {
temp <- nums[j]
nums[j] <- nums[j + 1]
nums[j + 1] <- temp
}
}
}
nums
}
Notwithstanding the slight modification needed because R doesn’t provide a convenient way to swap values without using a temporary variable, this is nearly identical to our Rust implementation. But, if we benchmark the two functions…
test replications elapsed relative user.self
2 bubble_sort_r(input) 100 384.256 24.789 384.045
1 bubble_sort(input) 100 15.501 1.000 15.502
We see that, for vectors of 10k random numbers, the Rust implementation is nearly 25x faster than the R implementation, for the same task. That’s a pretty massive speedup, considering we’ve used a fairly naive implementation of this algorithm (we’re copying the vector at least twice). So, there will definitely be situations where you will save a huge amount of processing time by implementing functions in Rust (just like if you were implementing an underlying function in C or C++), with the added safety guarantees of Rust. You may find you are even able to perform calculations that simply aren’t feasible (at least not in any reasonable amount of time) in pure R. So, happy coding!
Extending extendr
You may encounter situations in which extendr
does not behave as expected or support your use case (yet). As of the date this article was written, extendr
v0.2.0 doesn’t support (so far as I can tell) passing in character vectors that may contain NA’s or correctly giving back integer vectors with NA’s (they get converted to 0). There are at least two different ways to address these issues as they arise:
Wrapping Rust Calls in R
One strategy is to ‘wrap’ the functions automatically generated by extendr
to address these issues by modifying the input or output on the R side. For an example of this, see R/r-wrappers.R
.
Implementing R <-> Rust Conversions in Rust
Another strategy is to create structs and types in Rust with appropriate trait implementations (particularly FromRobj
and From<T> for Robj
) in your Rust module. This has the added benefit of plugging directly into the extendr
infrastructure. For an example of this, see src/rust/src/structs/char_vec.rs
.
Future Work
One of the really compelling things about R is the incredible support for working with tabular data through built-in data frames or data frame derivatives from other R packages. The Apache Arrow project provides a data frame implementation that can be accessed and manipulated across different languages and runtimes. I’d like to leverage this project to pass data frames back and forth from R to Rust, and manipulate the data in either.
-
The RStudio IDE provides built-in build tools, but the
build.R
script included inrustbind
incorporates adding the Rust wrapper functions to theR/extendr-wrappers.R
file. ↩︎