Introducing Nemo—Search and Analytics Platform for Automotive Sensor Data

Ridecell - AI and Data
6 min read · Dec 19, 2020

Teams working on autonomous driving collect petabytes of sensor data over thousands of miles. However, finding valuable events and traffic scenarios in these driving logs remains an unsolved problem.

To address this problem, we launched Nemo — our new Search and Analytics Platform. Nemo (https://nemosearch.ai) is an automated metadata tagging and scenario extraction tool that helps find useful events and traffic scenarios, saving up to 30% of developers’ time, reducing data storage costs by 80–85%, and providing insights to the testing and validation teams on product readiness.

Nemo is the result of years of research by Auro, Ridecell’s Autonomous Driving Division, in automated driving technologies and Level 4 (L4) autonomous vehicles (AVs). Over time, we have developed expertise in processing AV and ADAS (Advanced Driver Assistance Systems) sensor data (from cameras, lidar, and radar, in addition to CAN and GPS) and deriving contextual scene understanding from it.

Nemo packages this expertise and the underlying technology to help the new-age automotive industry with data search, filtering, export, and analysis capabilities.

Data explosion in the auto industry

As of May 2018, 92.7% of new vehicles produced in the U.S. offered at least one ADAS feature (link). As a result, cars now generate at least three orders of magnitude more data than ever before, and the level of autonomy keeps increasing with every new production model. There will be more than 100 million semi-autonomous (L2+) and autonomous vehicles by 2025. A single autonomous vehicle can generate up to 4 TB of data per hour, or roughly 20 PB per year, so a fleet of 5,000 such cars would generate about 100 EB per year.
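As a quick sanity check of those figures, the arithmetic below reproduces them; the roughly 5,000 driving hours per vehicle per year is our own assumption, implied by the per-hour and per-year rates above.

```python
# Back-of-the-envelope estimate of fleet data volume (figures are illustrative).
TB_PER_HOUR = 4          # data rate of a single autonomous test vehicle
HOURS_PER_YEAR = 5_000   # assumed driving hours per vehicle per year
FLEET_SIZE = 5_000       # vehicles in the fleet

per_vehicle_pb = TB_PER_HOUR * HOURS_PER_YEAR / 1_000   # TB -> PB
fleet_eb = per_vehicle_pb * FLEET_SIZE / 1_000          # PB -> EB

print(f"Per vehicle: ~{per_vehicle_pb:.0f} PB/year")      # ~20 PB/year
print(f"Fleet of {FLEET_SIZE}: ~{fleet_eb:.0f} EB/year")  # ~100 EB/year
```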

Autonomy-enabled vehicles will be the largest data-collecting agents in the real world, and AV and ADAS companies could dwarf the video-streaming companies (Netflix, YouTube) in the amount of worldwide network bandwidth they consume.

Companies working on automated driving technologies operate fleets of R&D, prototype, Field Operational Test (FOT), and production vehicles spread across multiple geographies. These fleets collect millions of miles of driving logs, generating petabytes of data per month.

But only 5–10% of the data collected is useful for further product development, algorithm training, compliance, and system testing. The rest is just boring miles. The question is: which 5–10%? That’s where Nemo comes in. It finds valuable events and traffic scenarios in petabytes of driving logs.

Challenges of data management in the ADAS and AV industry

The industry lacks an intelligent, automated way of finding events of interest and traffic scenarios in driving logs. At best, teams use semi-automated approaches or tags created by safety drivers to capture an event of interest. However, these methods do not scale as the fleet size grows and are extremely limited in their ability to describe a scenario.

Using databases with SQL-like queries and off-the-shelf analytics frameworks such as Spark, one can only search for and filter basic events such as speed/acceleration, braking power, and steering-rate values. These tools cannot search for and extract scenarios based on metadata and a contextual description of a traffic scenario (for example, all scenarios where a truck made an aggressive cut-in at a highway exit).
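For instance, a conventional pipeline might threshold raw signals as in the sketch below (the file and column names are hypothetical), but it has no notion of “truck”, “cut-in”, or “highway exit”:

```python
import pandas as pd

# Hypothetical table of per-frame vehicle signals decoded from CAN logs.
logs = pd.read_parquet("drive_logs/can_signals.parquet")

# With off-the-shelf tools you can only threshold raw signals ...
hard_braking = logs[(logs["speed_mps"] > 20) & (logs["decel_mps2"] < -4.0)]

# ... but there is no column for "truck", "cut-in", or "highway exit",
# so a contextual scenario query cannot be expressed at this level.
print(len(hard_braking), "hard-braking frames found")
```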

Limited tool capabilities, coupled with the exploding data volumes in the automated driving industry, have resulted in engineering inefficiency and wasted money at multiple points. Nemo aims to solve this by providing its contextual scene-understanding AI and scenario extraction engine, along with domain-specific micro-services to search, catalog, and export scenarios.

Nemo offers the following benefits to the AV and ADAS development teams over traditional approaches:

  • Provides developers instant access to the right data: A manual or semi-automated process of reviewing driving logs can consume up to 30% of engineers’ time, making it prolonged and expensive. With Nemo, they can zero in on the data they need instantly using a simple GUI. Think Google Search for autonomous driving sensor data.
  • Reduces data storage cost by up to 85%: Without an automated tool to separate valuable events and traffic scenarios from boring miles, companies end up storing everything in hot storage (which is roughly 10x costlier than cold storage), costing them tens of millions of dollars in data storage alone. Nemo can automate the data-tiering process based on pre-defined scenario descriptions, reducing storage costs by up to 85% (see the sketch after this list).
  • Provides scenario distribution and coverage analytics: Companies collect millions of miles of driving data for training, testing, and compliance purposes but have no insights on scenario distribution, scenario coverage, and biases in the training/testing datasets. Nemo provides detailed analysis such as heat maps, time-series plots, and frequency plots that offer more in-depth insights into product development and testing.
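As a minimal sketch of the tiering idea mentioned above (the rule format and tag names are assumptions for illustration, not Nemo’s actual policy engine), a pre-defined keep-list of scenario tags could decide which logs stay in hot storage:

```python
# Minimal sketch of scenario-based data tiering (rule format is hypothetical).
KEEP_IN_HOT = {"aggressive_cut_in", "jaywalking_pedestrian", "near_collision"}

def storage_tier(scenario_tags: set[str]) -> str:
    """Route a driving log to hot or cold storage based on its scenario tags."""
    return "hot" if scenario_tags & KEEP_IN_HOT else "cold"

print(storage_tier({"lane_keep", "aggressive_cut_in"}))  # -> "hot"
print(storage_tier({"lane_keep"}))                       # -> "cold" (boring miles)
```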

How it works

Within a typical automotive data-lake architecture, Nemo sits at the pre-processing level, right after data ingestion. Its contextual scene-understanding AI scans the sensor data (from lidars, cameras, radars, and the CAN bus) to generate and tag metadata (shown in Figure 1 below) across objects in the scene, road infrastructure, traffic interactions between different objects, the vehicle’s driving behavior, and the environment.

Figure 1: Metadata tags across the five layers used to describe a scenario — objects in the scene, road infrastructure, traffic interactions, vehicle’s driving behavior and environment
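As a sketch of what a tagged log segment could look like (the field and tag names below are illustrative assumptions, not Nemo’s actual schema), the metadata might be grouped by the five layers from Figure 1:

```python
from dataclasses import dataclass, field

@dataclass
class ScenarioMetadata:
    """Illustrative metadata record grouped by the five layers in Figure 1."""
    objects: list[str] = field(default_factory=list)              # e.g. ["truck"]
    road_infrastructure: list[str] = field(default_factory=list)  # e.g. ["highway_exit"]
    interactions: list[str] = field(default_factory=list)         # e.g. ["aggressive_cut_in"]
    ego_behavior: list[str] = field(default_factory=list)         # e.g. ["hard_braking"]
    environment: list[str] = field(default_factory=list)          # e.g. ["night", "rain"]

# One tagged segment of a driving log (values are made up for illustration).
segment = ScenarioMetadata(
    objects=["truck"],
    road_infrastructure=["highway_exit"],
    interactions=["aggressive_cut_in"],
    ego_behavior=["hard_braking"],
    environment=["daytime", "dry"],
)
```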

Developers can search/filter for scenarios using any combination of these metadata tags, including sub-properties such as distance and count associated with each tag. Here are some example queries (a sketch of how such a query might look programmatically follows the list):

  • Give me “all scenarios where a truck made an aggressive cut-in at a highway exit.”
  • Give me “all scenarios where a pedestrian is jaywalking at a roundabout at night.”
  • Give me “all scenarios where a bicyclist is within 20 meters on a downhill road.”
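As a rough, self-contained sketch of how the first query could be answered programmatically (the record shape, tag names, and helper function are assumptions for illustration, not Nemo’s actual API), matching reduces to checking that a segment carries the required tags in each layer:

```python
# Tagged log segments; shape and tag names are illustrative only.
segments = [
    {"log_id": "drive_0412", "objects": {"truck"}, "interactions": {"aggressive_cut_in"},
     "road": {"highway_exit"}, "environment": {"day"}},
    {"log_id": "drive_0519", "objects": {"sedan"}, "interactions": {"lane_change"},
     "road": {"urban_junction"}, "environment": {"night"}},
]

def matches(seg: dict, **required: set) -> bool:
    """True if the segment contains every required tag in every required layer."""
    return all(tags <= seg[layer] for layer, tags in required.items())

# "All scenarios where a truck made an aggressive cut-in at a highway exit."
hits = [s for s in segments
        if matches(s, objects={"truck"}, interactions={"aggressive_cut_in"},
                   road={"highway_exit"})]
print([s["log_id"] for s in hits])   # -> ['drive_0412']
```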

Figure 2 shows a more detailed set of events and scenarios extracted by Nemo.

Figure 2: Example scenarios extracted by Nemo from driving logs

Figures 3 and 4 below show the intuitive graphical user interface used to perform such searches. A more powerful interface, based on Python notebooks, is also available; it enables end users to write custom analyzers and detailed scenario descriptions.

Figure 3: Querying “Bicycle encounters at junctions”
Figure 4: Querying “Interaction with other vehicles crossing in the ego vehicle path at junctions”
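To give a flavour of the notebook workflow (the exported file and column names below are assumptions, not Nemo’s actual interface), a custom analyzer could summarize the bicycle encounters from Figure 3 by time of day:

```python
import pandas as pd

# Hypothetical export of matched scenarios to a DataFrame; column names are illustrative.
scenarios = pd.read_parquet("nemo_export/bicycle_junction_encounters.parquet")

# Custom analyzer: per-scenario minimum distance to the bicyclist,
# grouped by time of day, as a quick coverage check.
summary = (
    scenarios
    .groupby("time_of_day")["min_distance_to_bicycle_m"]
    .describe()[["count", "mean", "min"]]
)
print(summary)
```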

Nemo has been built with the needs of AV and ADAS organizations in mind, including ML researchers (working on perception, prediction, and motion planning), data-infrastructure teams, and testing/validation teams.

We have partnered with industry-leading cloud and on-premise data infrastructure companies to provide Nemo to their customers, and are working with leading OEMs, Tier 1s, and insurance companies to solve their data search and curation needs.

In subsequent blog posts, we will cover various other aspects of Nemo, including:

  • Automated metadata tagging and the underlying scenario extraction engine
  • End-user interfaces — GUI and Python Notebook
  • Scenario description language — RSL
  • Export to simulation, Scenario Generation, and Scenario Variation
  • Deployment on cloud, on-premise, and hybrid IT infrastructure
  • Micro-services for data tiering, annotation, and insurance
  • Using Nemo to identify corner cases, false positives, and unsafe driving situations

In the meantime, please feel free to drop by https://nemosearch.ai to request a custom demo. We would love to discuss your sensor data strategy and help you along the way.
