The Anatomy of a Driving Scenario
By Arpan Chakraborty, Learned Driving @ Ridecell.
TL;DR Driving scenarios provide a robust framework for testing and validation of automated driving systems. This article introduces the fundamental elements and spatiotemporal relationships that characterize scenarios, and incrementally builds an expressive language for representing them.
Driving is commonly described as the controlled operation of a motor vehicle, typically for transportation of people or goods, but often enjoyed as an activity in itself. It requires mastering several skills such as following a lane, merging into traffic, stopping at a red light, yielding to pedestrians, etc. Each of these skills is related to some situation or “scenario” we find ourselves in while on the road, and which requires us to react in an appropriate manner. Identifying and characterizing these driving scenarios is important for several processes that enable the development and safe operation of autonomous vehicles.
When faced with a daunting technical challenge, we often dive into finding a solution without fully considering the problem space. At first glance, autonomous driving seems like a classic robotics problem, with perception, planning and controls being the key ingredients. While it is true that these functions form the foundation for any solution to this problem, you quickly realize that the complexity of the problem arises from the variability in the conditions under which your system must operate. If you don’t spend the time to map out the different scenarios you may encounter on the road, and deliberately test your system against those, you run the risk of developing a solution that only works in a narrow set of situations.
Figuring out the broader universe of possible scenarios helps us take a more holistic approach — understand how big and diverse the problem is, carve out a meaningful subset of the problem that we are prepared to address, and then design an appropriate solution that meets the needs. This also helps us figure out what capabilities we need to develop within the desired system and plan accordingly.
Identifying the scenarios applicable to your chosen domain is the first step in this process. For example, if you’re building an autonomous shuttle to operate within a closed campus, the set of relevant scenarios would be different from those you might expect on public roads. You can then organize the scenarios based on factors that are important to you, for instance lane-driving vs. intersections, yield vs. stop signs, lead/follow vs. cut-in etc. This way of cataloging scenarios might be sufficient for certain use cases, such as testing and validating your system against a verification plan. But you can do much more by decomposing each scenario into its constituent elements.
Representing Scenario Elements
Prior work suggests organizing the elements of a scenario into different “layers” such as road or map information, actors in the scene, weather, and other environmental conditions. A distinction is also typically made between generic descriptions or “logical” scenarios, and more “concrete” variants that represent specific instances. These are some guiding principles that we adopt in our approach to scenario modeling as well. But our focus is on a different aspect — how do we best represent the relationships between the different elements?
The answer may depend on your end goal. For instance, you may want to reproduce a scenario exactly as it played out (commonly referred to as “resimulation”), or you may want to generate different potential variations of a scenario (some call this “scenario fuzzing”). Or maybe you just want to cluster related scenarios together to understand the problem space. Whatever your present use case, we assume that a scenario representation capable of meeting all of these needs is likely to be more expressive, and more robust to future uses as well.
A common scenario representation that addresses different use cases, including extraction, resimulation and fuzzing, will be more expressive and robust.
This line of thinking leads us to an axiomatic approach to scenario modeling, where we try to characterize each scenario in terms of its salient features, without overfitting to a specific end application. Let’s start with a simple example — a single vehicle driving on a stretch of road on a sunny day.
Consider the following:
- What are the different elements involved in this scenario? Well, there’s the vehicle (say, V1), and there’s a road (R1).
- How are these elements related? Clearly, vehicle V1 is positioned on the road R1.
- And what is going on in this scenario? V1 is driving (on R1), at some speed (say, 20 m/s).
If we were to come up with a minimal description of this scenario, it might be sufficient to encode the following information.
- weather: sunny
- road: R1
- vehicle: V1
  position: R1
  actions:
    - drive:
        road: R1
        speed: 20
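For illustration, here is a hypothetical in-memory equivalent of this minimal description as a plain Python dict, along with a small consistency check. The field names mirror the YAML above but are not part of any RSL schema.

```python
# Hypothetical in-memory equivalent of the minimal scenario description.
# Field names mirror the YAML above; nothing here is the RSL schema itself.
scenario = {
    "weather": "sunny",
    "road": "R1",
    "vehicle": {
        "id": "V1",
        "position": "R1",
        "actions": [{"drive": {"road": "R1", "speed": 20}}],
    },
}

def referenced_roads(sc):
    """Collect every road id the vehicle's position and actions refer to."""
    roads = {sc["vehicle"]["position"]}
    for action in sc["vehicle"]["actions"]:
        for params in action.values():
            if "road" in params:
                roads.add(params["road"])
    return roads

# Sanity check: the vehicle only references roads that exist in the scenario.
assert referenced_roads(scenario) <= {scenario["road"]}
```

Even a trivial check like this shows one benefit of a structured description: the references between elements (V1 driving on R1) can be validated mechanically.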
Spatial and Topological Relationships
What if we had two vehicles driving on a road, one following the other?
This time, let’s iterate on our format a little bit, and create different sections for global/environmental factors (e.g. weather), map elements (e.g. roads, lanes), and actors (e.g. vehicles, pedestrians), along with their initial positions and actions. Also, now that we have two vehicles on the same road, we need an additional measure to distinguish between their positions along the roadway. This is commonly known as longitudinal position or “station” (s-coordinate).
environment:
  - weather: sunny
map:
  - road: R1
actors:
  - vehicle: V1
    position:
      reference: R1
      station: S1
    actions:
      - drive:
          road: R1
          speed: 20
  - vehicle: V2
    position:
      reference: R1
      station: S2
    actions:
      - drive:
          road: R1
          speed: 20
What we haven’t captured here is the spatial relationship between the positions of the two vehicles, namely the fact that V2 (S2) is ahead of V1 (S1). We could express that in a separate section as follows:
...
relationships:
  - S2 > S1
Alternatively, we could rewrite the position of V2 in terms of S1 (e.g. S2 = S1 + 10). An even cleaner approach is to express the position of one actor directly relative to another.
...
- vehicle: V2
  position:
    reference: V1
    station: +10
  actions:
    - drive:
        road: R1
        speed: 20
This essentially says that whatever the position of V1 might be, V2 should be located 10 units (meters) in front of V1, along the same road element.
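A minimal sketch of how such relative positions might be resolved, assuming a position is a dict with `reference` and `station` fields; the names and structure are illustrative, not the RSL schema.

```python
def resolve_station(position, resolved):
    """Resolve a station value that may be relative to another actor.

    `position` is a dict like {"reference": "V1", "station": "+10"};
    `resolved` maps already-placed actor ids to absolute stations.
    (Field names are illustrative, not the RSL schema.)
    """
    ref, station = position["reference"], position["station"]
    if ref in resolved:
        # Relative placement: offset from the referenced actor's station.
        return resolved[ref] + float(station)
    # Otherwise treat the station as absolute along a map element.
    return float(station)

resolved = {"V1": 100.0}  # say V1 sits at s = 100 m on R1
v2 = {"reference": "V1", "station": "+10"}
print(resolve_station(v2, resolved))  # 110.0
```

Resolving actors in order like this means V2's placement automatically follows wherever V1 ends up, which is exactly what makes the relative form useful for generating variations.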
Now, let’s consider a common “cut-in” scenario, where one vehicle is changing lanes in front of another.
There is a clear spatial relationship between the two vehicles that we need to capture, for instance, we could introduce a lateral offset (l-coordinate) to measure how far away one vehicle is from another. But in this case, it might be more meaningful to encode the topological relationship between the two lanes in which the vehicles are initially driving. Let’s assume we call these lanes R1_1 and R1_2. This will make it easier to generalize the scenario to different situations, e.g. with a curved roadway or lanes that are wider/narrower.
Topological relationships allow us to take advantage of the road structure and generalize to different situations with varying spatial geometry.
Using a graph to specify this road structure seems like a natural choice.
We can express this graph as a collection of nodes and edges in the map section. The notation “X =type=> Y” is just syntactic sugar for expressing an edge with a certain type label. At this point, we should also make a minor change from the more generic “drive” action to “lane_follow” (followed by “lane_change” for V2).
environment:
  - weather: sunny
map:
  - nodes:
      - road: R1
      - lane: R1_1
      - lane: R1_2
  - edges:
      - R1 =contains=> R1_1
      - R1 =contains=> R1_2
      - R1_1 =left=> R1_2
      - R1_2 =right=> R1_1
actors:
  - vehicle: V1
    position:
      reference: R1_2
      station: S1
    actions:
      - lane_follow:
          reference: R1_2
          speed: 20
  - vehicle: V2
    position:
      reference: R1_1
      station: S2
    actions:
      - lane_follow:
          reference: R1_1
          speed: 25
      - lane_change:
          target: R1_2
          station: S3
As before, S3 could be rewritten as “S1 + some constant”, or the target for lane_change could be made relative to V1 (ensuring that V2 tries to cut in ahead of V1). Note that the initial lane-following speed for V2 is intentionally kept slightly higher (25) than V1’s (20). We could make this a relative quantity as well.
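A sketch of this labeled graph in Python, assuming the “X =type=> Y” edges are stored as (source, label, target) triples; the helper name is hypothetical.

```python
from collections import defaultdict

# Edges from the cut-in scenario's map section, as (source, label, target).
edges = [
    ("R1", "contains", "R1_1"),
    ("R1", "contains", "R1_2"),
    ("R1_1", "left", "R1_2"),
    ("R1_2", "right", "R1_1"),
]

# graph[src][label] is the list of targets reachable via that edge label.
graph = defaultdict(lambda: defaultdict(list))
for src, label, dst in edges:
    graph[src][label].append(dst)

def neighbors(node, label):
    """Follow every edge with the given label, e.g. all lanes R1 contains."""
    return graph[node][label]

assert neighbors("R1", "contains") == ["R1_1", "R1_2"]
assert neighbors("R1_1", "left") == ["R1_2"]  # R1_2 lies to the left of R1_1
```

Queries like these are what let a topological description generalize: “the lane to the left” stays valid whether the road is straight, curved, or has lanes of different widths.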
Finally, let’s consider a more complex scenario, where two vehicles are approaching an intersection controlled by stop signs, and need to negotiate with each other in order to cross safely. Vehicle V1 is attempting to turn left, while V2 is following along straight. Oh, and it is starting to get cloudy now!
In this case, the relevant map elements include a junction (J1) and stop signs (SS1, SS2). We also introduce a road “section” as an intermediate container for lane elements, for instance, road R1 is made up of 3 sections (R1_1, R1_2 and R1_3), with each section containing a lane element or “lanelet” (R1_1_1, R1_2_1, R1_3_1). The connections between these elements can be numerous, and it’s easy to make a mistake in building out such a network by hand (partially specified below). Fortunately, this hierarchical structure is adapted from established road map representation formats like OpenDRIVE and Lanelet2, and can be either deduced from simpler representations, or directly extracted from existing maps.
environment:
  - weather: cloudy
actors:
  - vehicle: V1
    position:
      reference: R1_1_1
      station: S1
    actions:
      - lane_follow:
          reference: R1_1_1
          speed: 15
      - turn_left:
          junction: J1
          exit: R2_3_1
  - vehicle: V2
    position:
      reference: R2_1_1
      station: S2
    actions:
      - lane_follow:
          reference: R2_1_1
          speed: 15
      - go_straight:
          junction: J1
          exit: R2_3_1
map:
  - nodes:
      - road: R1
      - section: R1_1
      - ...
      - lanelet: R1_1_1
      - ...
      - road: R2
      - section: R2_1
      - ...
      - lanelet: R2_1_1
      - ...
      - junction: J1
      - stop_sign: SS1
      - stop_sign: SS2
  - edges:
      - R1 =contains=> R1_1
      - ...
      - R2 =contains=> R2_1
      - ...
      - R1 =crosses=> R2
      - R1_1 =enters=> J1
      - ...
      - R1_3 =exits=> J1
      - ...
      - SS1 =controls=> R1_1_1
      - SS2 =controls=> R2_1_1
As you can imagine, different instantiations of this scenario might play out with entirely different outcomes, especially based on the starting position of each actor, their approach speed, aggressiveness, etc. All of these are parameters that can further enrich the scenario description.
Temporal Ordering of Events
Now consider this — what if you wanted to capture a specific ordering of events, for instance, a vehicle (V1) is about to reach a crosswalk (CW1) when a waiting pedestrian (P1) decides to start walking.
Note that this is different from the case where the vehicle has already passed the crosswalk, and then the pedestrian begins walking.
There are several ways of expressing such temporal relationships, for example using a wait condition before the pedestrian’s action, based on some distance to the vehicle. This is helpful when executing scenarios in simulation, but doesn’t really help in other use cases, such as extracting scenarios from real-world data. Thus, we adopt a more generic approach based on Allen’s interval algebra which expresses temporal relationships between events in the same way we might consider spans along any one-dimensional domain.
We begin by labeling every identifiable occurrence within the scenario as an “event”, with a start and end time defining its interval (instantaneous events are assumed to have the same start and end time). This includes the vehicle’s lane_follow action, the pedestrian’s walk action, and the vehicle passing over the crosswalk (which need not be an action per se, but is still an observable event).
Temporal relationships can be used as constraints when looking for matching scenarios, or as actions in an execution plan when simulating a scenario.
Each event may have some associated metadata as well, such as the identifier of the map element that is most relevant to the event (R1, CW1, etc.). A map element can be used to ground two events, for instance in this case we want to say that the crosswalk in the vehicle and pedestrian events are the same, therefore we address it with the same identifier (CW1).
We can then specify the desired temporal relationships as conditions:
...
actors:
  - vehicle: V1
    position:
      reference: R1
      station: S1
    actions:
      - lane_follow:
          id: E1
          reference: R1
          speed: 20
  - pedestrian: P1
    position:
      reference: CW1
      station: S2
    actions:
      - walk:
          id: E2
          reference: CW1
          speed: 2
events:
  - crosswalk:
      id: E3
      actor: V1
      reference: CW1
conditions:
  - E2 =before=> E3
  - E3 =during=> E1
Note that in the above conditions, “X before Y” is interpreted to mean that X must start before Y starts; we intentionally allow some overlap between the events (e.g. the vehicle may begin to cross as soon as it is safe, even before the pedestrian is entirely clear of the crosswalk).
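A sketch of checking these two conditions over event intervals, using the looser reading of “before” described above and the standard containment reading of “during”; the interval values are made up for illustration.

```python
def before(x, y):
    """X =before=> Y, in the looser sense used here:
    X must start before Y starts (overlap between the events is allowed)."""
    return x[0] < y[0]

def during(x, y):
    """X =during=> Y: X's interval lies entirely inside Y's interval."""
    return y[0] <= x[0] and x[1] <= y[1]

# Hypothetical event intervals as (start, end) times in seconds.
E1 = (0.0, 30.0)   # V1 lane_follow
E2 = (5.0, 12.0)   # P1 walk
E3 = (10.0, 11.0)  # V1 passes over crosswalk CW1

assert before(E2, E3)  # pedestrian starts walking before the vehicle crosses
assert during(E3, E1)  # the crossing happens while lane_follow is active
```

Because the relations are just predicates over intervals, the same conditions can serve both as match constraints when searching logged data and as triggers when driving a simulation.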
Ridecell Scenario Language (RSL)
Let us recap what we have discussed so far. We started by identifying the different components of a driving scenario, such as map elements, environmental factors, actors, and their actions. These components can be written down in separate sections of a structured description of a scenario. We then went on to develop our representation in order to capture different kinds of relationships between these components:
- Spatial relationships encode the positions of different components in relation to each other, typically using a numeric distance value.
- Topological relationships help capture logical relationships between elements, especially map elements, such as adjacent lanes.
- Temporal relationships specify ordering of events that are characteristic to a scenario.
This is the vocabulary we need for expressing driving scenarios, and forms the basis for Ridecell Scenario Language (RSL). We are using RSL within our team for scenario modeling, iterating on the language specification itself, and building the tools and libraries around it to support some key features that we have identified, including map-relative coordinates, parameter distributions, and actor behavior trees. In upcoming articles, we will share more details regarding the RSL format and specific use cases that it can enable.
RSL uses map-relative coordinates, parameter distributions, and actor behavior trees to provide a rich vocabulary for scenario modeling.
Those familiar with other scenario description formats may have recognized that this is not structurally very different, and yet includes certain key elements that make it much more flexible and extensible. This is on purpose, as we want to retain the ability to easily convert to other formats as needed, in order to interact with systems that have specific dependencies. For instance, we can export concrete RSL scenarios into OpenSCENARIO v1.0, which is used by many existing simulators. In fact, we also chose YAML as a generic file format to write out RSL templates, as opposed to a domain-specific language, so that there are no barriers to parsing or interpretation.
Scenario Coverage
One application of RSL is to serve as a template against which we can compare real-world driving segments, and find matching scenarios. A matched scenario essentially provides specific values for all the variables specified in the template, while having a compatible structure. With a library of such templates, you can begin to capture your entire operational domain. The next logical question is: how do you quantify scenario coverage? That is, how frequently is each scenario observed in real-world data? An obvious approach is to count the number of occurrences of each individual scenario template. But this can quickly become overwhelming, as the total number of possible scenarios explodes with the number of factors, and combinations of factors, you need to consider.
Expressing scenario coverage in terms of different scenario factors allows us to analyze different slices and dimensions without being overwhelmed by the vast amount of data collected from test or production vehicles.
Instead, we propose quantifying scenario coverage in terms of the factors themselves, so that you can organize their combinations in a meaningful structure and study any “slice” that is pertinent to you. As an example, two important scenario factors to consider are map elements and objects. Map elements include lanes, junctions, crosswalks, stop signs, etc. where a scenario plays out; objects can be classified as per your perception system’s capabilities, such as vehicle, bicycle, pedestrian, etc. If we focus on these two factors or “dimensions” of the scenario space, we can visualize scenario coverage using a two-dimensional histogram or “heatmap”. The color of each cell represents the count or frequency of occurrence of that particular combination, and the margins give you the distribution of each dimension.
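A minimal sketch of this kind of coverage counting over two factors, using a handful of hypothetical matched scenarios; the joint counts are the heatmap cells and the per-factor counts are its margins.

```python
from collections import Counter

# Hypothetical matched scenarios, reduced to two factors:
# the map element they occur on and the other object involved.
matches = [
    ("junction", "vehicle"),
    ("junction", "pedestrian"),
    ("lane", "vehicle"),
    ("lane", "vehicle"),
    ("crosswalk", "pedestrian"),
]

coverage = Counter(matches)                # joint counts (heatmap cells)
by_map = Counter(m for m, _ in matches)    # margin over map elements
by_obj = Counter(o for _, o in matches)    # margin over object types

assert coverage[("lane", "vehicle")] == 2
assert by_map["junction"] == 2
assert by_obj["pedestrian"] == 2
```

Slicing by factors instead of whole templates keeps the bookkeeping linear in the number of factor values, rather than exponential in their combinations.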
Being able to monitor these distributions over time has a number of benefits. If you are collecting data from test fleets, they can reveal gaps in your coverage and help direct your data collection efforts more meaningfully. If you’re building a model of crash or near-crash driving events, these distributions can serve as prior probabilities of scenario occurrence. This scenario-based approach can also help you decide which problems in autonomous driving to focus your efforts on in order to have the greatest impact.
Metrics and Continuous Distributions
Beyond scenario occurrence, the same structural decomposition of scenarios can also help you look at metrics from your system through different lenses. Minimum distance to other actors in a scenario is an important metric to keep an eye on. Here we visualize minimum distance (in meters) with respect to actor type as well as the ego vehicle’s dynamic state. Any unexpected occurrences, such as low minimum distance to pedestrians and bicycles during hard acceleration/deceleration phases could be inspected further, and the corresponding motion control algorithms/parameters improved to avoid such situations.
When we think about the distribution and variability of scenarios, we must take into account such continuous variables in addition to discrete categorizations. Speed is one such variable that can significantly change the nature or outcome of a scenario. Let us take the example of a cut-in scenario, find all the examples of this template from real-world driving logs, and measure the average speed of our ego vehicle when a cut-in occurs.
If we collect the speed samples from all of these scenarios, we can derive a distribution that reflects real-world observations. For example, from our own test vehicles we have observed this to be roughly Normal or Gaussian, with a mean of 11.2 m/s (roughly 25 mph) and a standard deviation of 3.8 m/s. Such a distribution can be encoded in RSL very easily.
...
actors:
  - vehicle: ego
    position:
      reference: R1_2
      station: S1
    actions:
      - lane_follow:
          reference: R1_2
          speed: Gaussian(11.2, 3.8)
...
This doesn’t have much effect when extracting scenarios (although you could come up with a likelihood score to quantify how well an observation matches the distribution), but it plays an important role when generating different variations of the same scenario. It forms the basis for deriving realistic models of actor behavior. For example, here is a concrete scenario generated from the distribution:
...
actors:
  - vehicle: ego
    position:
      reference: R1_2
      station: S1
    actions:
      - lane_follow:
          reference: R1_2
          speed: 8.6
...
Sampling from such distributions and playing out the resulting scenarios in simulation allows you to test your system’s performance more robustly, while avoiding variations that are unlikely given observed data, or entirely implausible. You could also invert this sampling distribution to produce more instances of rare scenarios (still grounded in real-world data). This is especially useful when training systems that use machine learning, since they are notorious for overfitting to the most common occurrences.
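A sketch of what sampling from, and scoring against, such a distribution might look like, assuming a Gaussian(11.2, 3.8) speed parameter as above; the clamping and function names are illustrative, not part of RSL.

```python
import math
import random

MEAN, STD = 11.2, 3.8  # fitted from observed cut-in speeds (m/s)

def sample_speed(rng):
    """Draw one concrete speed, clamped to a plausible non-negative range."""
    return max(0.0, rng.gauss(MEAN, STD))

def likelihood(speed):
    """Normal pdf: scores how typical an observed speed is under the model."""
    z = (speed - MEAN) / STD
    return math.exp(-0.5 * z * z) / (STD * math.sqrt(2.0 * math.pi))

rng = random.Random(42)           # seeded for reproducible variations
speeds = [sample_speed(rng) for _ in range(5)]
assert all(s >= 0.0 for s in speeds)

# Speeds near the mean score higher than speeds in the tails, so the same
# pdf can serve as the likelihood score for matching observations.
assert likelihood(11.2) > likelihood(2.0)
```

The same pdf used for generation doubles as the likelihood score mentioned earlier: observations near the mean of the fitted distribution score as typical, while tail values flag rare instances worth extra attention.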
Continuous distributions computed from real-world data help build realistic actor models, find rare scenarios, utilize importance sampling and prevent overfitting.
In summary, what we have developed in this article is an abstract representation for driving scenarios and a set of processes that can be used to derive value from it. This scenario-based approach to data collection, training, testing and validation can help organize and decompose the development of automated driving systems, which might otherwise seem like a daunting task. Nemo is currently being developed to operationalize these ideas and to address different workflows.
The applications of this approach extend to other domains as well, such as risk modeling for car insurance, fleet monitoring & optimization, anomalous event detection, drive data management, and beyond. We look forward to explaining them in more detail in upcoming articles, diving into case studies with our industry partners.
For now, we leave you with a video collage of interesting scenarios we have extracted from our data collection logs using Nemo’s scenario extraction engine!
Icon credit: Voyage Open Autonomous Safety (OAS) testing toolkit.