Rerun 0.9 gives access to the underlying ECS

Written by Nikolaus West 7 months ago

We’re building a fast and easy to use general framework for handling and visualizing streams of multimodal data. This is a big undertaking, and the way we’re getting there is by starting with a fast and easy to use visualizer for computer vision, and then making it more capable and extensible piece by piece.

Rerun 0.9.0 arrives two months after 0.8.0, but it has been in the works for even longer. It includes the foundation of the coming C++ SDK, by far our most requested feature. In order to maintain great and consistent APIs across Rust, Python, C++, and any future languages, we’ve rebuilt much of Rerun's data infrastructure around a new code generation framework.

0.9 also adds support for logging markdown, a new in-viewer getting started experience, and as always, a bunch of performance improvements.

Load example recordings, including descriptions of how they were made, directly in the viewer.

This has all been a huge lift, but what I’m most excited about are our redesigned APIs and what they pave the way for in future releases. From this release on, we’ll start to expose more and more of Rerun's underlying infrastructure, starting with the core data model, a hierarchical and time varying Entity Component System (ECS).

To ease the transition for Python users, we've marked the old APIs as deprecated in 0.9 with migration instructions in the warning messages. The old API will be removed completely in 0.10. Check out the migration guide for more details on updating your code.

A more type centric logging API

At the heart of Rerun is the ability to handle streams of multimodal data, e.g. images, tensors, point clouds, and text. To get data out of your programs and ready to be visualized, you log it with the Rerun SDK. Rerun handles everything needed to make that work. It doesn't matter if the data source and visualization are in the same process or the data is coming in real-time from multiple devices.

The ease of use, expressiveness, and extensibility of these APIs are core to the usefulness of Rerun. At first glance, the API changes introduced in 0.9 are very small. For example, here is how you might log a single colored point cloud, represented by two Nx3 numpy arrays of positions and colors.

Old Python API

rr.log_points("example/points", positions, colors=colors)

New Python API

rr.log("example/points", rr.Points3D(positions, colors=colors))

Both log calls take the same user data. The difference is where the data type information lives: it moves from the function name, rr.log_points, to a type, rr.Points3D, that wraps the logged data. This new structure allows both more direct control of the underlying ECS and more ergonomic logging of your own objects.

Lower level control of Entity Components

Rerun comes with a set of built-in archetypes like rr.Points3D, rr.Image, and rr.Tensor. An archetype defines a bundle of component batches that the Rerun Viewer knows how to interpret, such that Rerun will just do the right thing™️ when you log it. In this case, that’s one component batch for positions and one for colors.

rr.log("example/points", rr.Points3D(positions, colors=colors))
# is equivalent to
rr.log("example/points", rr.Points3D(positions, colors=colors).as_component_batches())
# which in this case is the same as
rr.log("example/points", [rr.Points3D.indicator(),
                          rr.components.Position3DBatch(positions),
                          rr.components.ColorBatch(rgb=colors)])
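
To make the idea of an archetype as a bundle of component batches more concrete, here is a toy model in plain Python. It is a sketch and not Rerun's implementation: `ToyBatch`, `ToyPoints3D`, and the component names are hypothetical and made up for illustration.

```python
from dataclasses import dataclass
from typing import List, Sequence, Tuple


@dataclass
class ToyBatch:
    """Hypothetical stand-in for a component batch: a name plus a list of values."""

    component_name: str
    values: list


@dataclass
class ToyPoints3D:
    """Hypothetical stand-in for an archetype: it only knows how to bundle batches."""

    positions: Sequence[Tuple[float, float, float]]
    colors: Sequence[Tuple[int, int, int]] = ()

    def as_component_batches(self) -> List[ToyBatch]:
        # The indicator carries no data; it only marks which archetype was logged.
        batches = [ToyBatch("ToyPoints3DIndicator", [])]
        batches.append(ToyBatch("Position3D", list(self.positions)))
        if self.colors:  # optional components are left out when unset
            batches.append(ToyBatch("Color", list(self.colors)))
        return batches


points = ToyPoints3D(
    positions=[(0.0, 0.0, 0.0), (1.0, 0.0, 0.0)],
    colors=[(255, 0, 0), (0, 255, 0)],
)
batch_names = [batch.component_name for batch in points.as_component_batches()]
print(batch_names)
```

In this toy version, logging the archetype and logging the list returned by `as_component_batches()` carry exactly the same information, which mirrors the equivalence shown above.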

Partial updates using the component level API

In most cases, you’ll want to stick to the high level archetype API, but setting single components directly gives you finer control, which can matter. For instance, a common use case is a mesh where only the vertex positions change over time. Logging the whole mesh for each change adds a lot of overhead. With the component level API, you can log the full mesh once and then update only the positions:

import numpy as np
import rerun as rr  # pip install rerun-sdk

rr.init("rerun_example_mesh3d_partial_updates", spawn=True)

vertex_positions = np.array([[-1.0, 0.0, 0.0], [1.0, 0.0, 0.0], [0.0, 1.0, 0.0]], dtype=np.float32)

# Log the initial state of our triangle
rr.set_time_sequence("frame", 0)
rr.log(
    "triangle",
    rr.Mesh3D(
        vertex_positions=vertex_positions,
        vertex_normals=[0.0, 0.0, 1.0],
        vertex_colors=[[255, 0, 0], [0, 255, 0], [0, 0, 255]],
    ),
)

# Only update its vertices' positions each frame
factors = np.abs(np.sin(np.arange(1, 300, dtype=np.float32) * 0.04))
for i, factor in enumerate(factors):
    rr.set_time_sequence("frame", i)
    rr.log("triangle", [rr.components.Position3DBatch(vertex_positions * factor)])
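
One way to see why logging only the positions is enough: each component can be thought of as living on its own timeline, and a query at a given frame picks up the latest value of each component independently. The toy store below sketches that idea in plain Python. `ToyStore`, `log`, and `latest_at` are hypothetical names for this illustration and are far simpler than Rerun's actual data store.

```python
from bisect import bisect_right
from collections import defaultdict
from typing import Any, Dict, List, Optional, Tuple


class ToyStore:
    """Hypothetical store: (entity, component) -> frame-sorted list of (frame, value)."""

    def __init__(self) -> None:
        self._rows: Dict[Tuple[str, str], List[Tuple[int, Any]]] = defaultdict(list)

    def log(self, entity: str, frame: int, components: Dict[str, Any]) -> None:
        # Only the components passed in are written; everything else is untouched.
        for name, value in components.items():
            rows = self._rows[(entity, name)]
            rows.append((frame, value))
            rows.sort(key=lambda row: row[0])

    def latest_at(self, entity: str, component: str, frame: int) -> Optional[Any]:
        # Return the most recent value logged at or before `frame`, if any.
        rows = self._rows.get((entity, component), [])
        index = bisect_right([f for f, _ in rows], frame)
        return rows[index - 1][1] if index > 0 else None


store = ToyStore()
# Frame 0: log the full "mesh" (positions and colors).
store.log(
    "triangle",
    0,
    {"positions": [(-1, 0, 0), (1, 0, 0), (0, 1, 0)], "colors": ["red", "green", "blue"]},
)
# Frame 5: partial update, positions only.
store.log("triangle", 5, {"positions": [(-2, 0, 0), (2, 0, 0), (0, 2, 0)]})

positions_at_7 = store.latest_at("triangle", "positions", 7)
colors_at_7 = store.latest_at("triangle", "colors", 7)
print(positions_at_7)
print(colors_at_7)
```

At frame 7 the query returns the positions from frame 5 together with the colors from frame 0, so the colors never had to be logged again.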

Interpret the same data in several ways using the component level API

The component level API gives you the ability to interpret the same data as multiple types. We do that by logging multiple indicator components, which tell the Rerun Viewer "hey, this entity should be interpreted as type X". In this example we interpret an entity as both a colored triangle and as three colored points.

import rerun as rr

rr.init("rerun_example_manual_indicator", spawn=True)

# Specify both a Mesh3D and a Points3D indicator component so that
# the data is shown as both a 3D mesh _and_ a point cloud by default.
rr.log(
    "points_and_mesh",
    [
        rr.Points3D.indicator(),
        rr.Mesh3D.indicator(),
        rr.components.Position3DBatch([[0.0, 0.0, 0.0], [10.0, 0.0, 0.0], [0.0, 10.0, 0.0]]),
        rr.components.ColorBatch([[1.0, 0.0, 0.0], [0.0, 1.0, 0.0], [0.0, 0.0, 1.0]]),
        rr.components.RadiusBatch([1.0]),
    ],
)
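
As a rough mental model of what indicators do, the sketch below picks interpretations for an entity purely from the indicator names found among its components. This is an assumption-laden toy and not how the Rerun Viewer is implemented: `TOY_VISUALIZERS`, `interpretations`, and the component names are hypothetical.

```python
from typing import Dict, List, Set

# Hypothetical mapping from indicator component name to a visualization.
TOY_VISUALIZERS: Dict[str, str] = {
    "Points3DIndicator": "point cloud",
    "Mesh3DIndicator": "mesh",
}


def interpretations(component_names: Set[str]) -> List[str]:
    """Return every visualization whose indicator is present on the entity."""
    return sorted(
        visual
        for indicator, visual in TOY_VISUALIZERS.items()
        if indicator in component_names
    )


# The same positions and colors are shared by both interpretations.
entity_components = {"Points3DIndicator", "Mesh3DIndicator", "Position3D", "Color", "Radius"}
print(interpretations(entity_components))  # → ['mesh', 'point cloud']
```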

Using your own types with Rerun

The new type oriented API also makes logging data from your own objects more ergonomic. For example, you might have your very own point cloud class.

from dataclasses import dataclass
from typing import List
import numpy as np

@dataclass
class LabeledPoints:
    points: np.ndarray
    labels: List[str]

All you need to do is implement as_component_batches() and you can pass instances of your class directly to rr.log. The simplest possible way is to use the matching Rerun archetype’s as_component_batches method like below, but you can also get as fancy as you like with custom components and archetypes. Check out the guide on using Rerun with custom data for more details.

from dataclasses import dataclass
from typing import Iterable, List

import numpy as np
import rerun as rr

@dataclass
class LabeledPoints:
    points: np.ndarray
    labels: List[str]

    def as_component_batches(self) -> Iterable[rr.ComponentBatchLike]:
        # Delegate to the matching built-in archetype
        return rr.Points3D(positions=self.points, labels=self.labels).as_component_batches()

...
# Somewhere deep in my code
classified = my_points_classifier(...)  # type: LabeledPoints
rr.log("points/classified", classified)
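
The pattern at work here is duck typing: a log function only needs its argument to expose as_component_batches(). The sketch below shows that pattern with a structural protocol in plain Python, so it runs without Rerun installed. `ToyAsComponents`, `ToyLabeledPoints`, and `toy_log` are hypothetical stand-ins, and batches are simplified to (name, values) pairs.

```python
from dataclasses import dataclass
from typing import Iterable, List, Protocol, Tuple, runtime_checkable


@runtime_checkable
class ToyAsComponents(Protocol):
    """Hypothetical protocol: anything exposing as_component_batches() can be logged."""

    def as_component_batches(self) -> Iterable[Tuple[str, list]]: ...


@dataclass
class ToyLabeledPoints:
    points: List[Tuple[float, float, float]]
    labels: List[str]

    def as_component_batches(self) -> Iterable[Tuple[str, list]]:
        # Each batch is modeled as a (component name, values) pair.
        return [("Position3D", list(self.points)), ("Text", list(self.labels))]


def toy_log(entity_path: str, entity: object) -> List[str]:
    """Hypothetical log function: accepts any object that satisfies the protocol."""
    if not isinstance(entity, ToyAsComponents):
        raise TypeError(f"{type(entity).__name__} does not implement as_component_batches()")
    return [
        f"{entity_path}:{name}[{len(values)}]"
        for name, values in entity.as_component_batches()
    ]


logged = toy_log(
    "points/classified",
    ToyLabeledPoints(points=[(0.0, 0.0, 0.0)], labels=["cat"]),
)
print(logged)
```

Note that `ToyLabeledPoints` never inherits from the protocol class; having the method is enough, which is why your own classes need no Rerun base class in the example above.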

The main takeaway here is that with 0.9 and the new type oriented API, it becomes a lot easier to use Rerun with your own data types.

How it paves the way for the future

Although this release brings a lot of great updates, it's perhaps the future features it paves the way for that are the most exciting.

C++ SDK

Getting data from C++ environments into Rerun was the motivating factor behind the move to our own code generation framework. A large number of production systems in robotics, computer vision, and gaming are built in C++, and we're incredibly excited to soon bring Rerun to all those developers.

Building visualizations inline

Rerun started out by making the hard case easy: streaming data out of multiple processes and visualizing it live. The downside so far has been that in simpler cases, like in Jupyter notebooks, using Rerun is more convoluted than it should be.

Even when time is not a factor and you have all your data right there and just want to draw it, you currently have to go through the indirection of logging it first.

The new APIs introduced in 0.9 pave the way for a clean way of just drawing data inline without logging. We'll start rolling that out, together with the ability to control layout and visualization options from the SDK, later in the year once C++ has landed.

Generating your own Rerun SDK extensions

Our new code generation framework is still a bit immature, but it's been a design goal from the start to let users use it to generate their own standalone extensions to the Rerun SDK. We hope that once it's had time to mature, it will be useful both to teams with their own proprietary data formats and to other projects that want to make interfacing with Rerun as easy as possible for their users.

Let us know what you think

We're incredibly excited to hear what you think about these changes. Join us on GitHub or Discord and let us know how 0.9 works for you and what you'd like to see in the future.

If you're an existing Rerun user and have any questions or need any help migrating to the new APIs, send us a ping on Discord or elsewhere and we'll be happy to get on a call and help you out.