Metropolis - Multi-camera Tracking

Application type: Multi-camera Tracking | Domain: Vision AI | Usage: Demo | Terms of use

Demo Overview

Try out the Multi-camera Tracking application leveraging NVIDIA Metropolis Microservices. This application tracks and associates a set number of objects across cameras, maintaining a unique ID for each object. Each object is tracked through its visual appearance embedding rather than any personal biometric information, so privacy is fully maintained. The demo generates output files for each unique object it identifies in the clip. Each output clip highlights only the selected object, with a bounding box and a unique identification number.

The application tracks objects across cameras using object re-identification based on feature embeddings, combined with spatio-temporal association, that is, comparison of object behavior and tracklets across space and time.
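To make the re-identification idea concrete, the sketch below matches a detection's appearance embedding against a gallery of known global IDs using cosine similarity. This is a minimal conceptual illustration, not the demo's actual implementation; the function names, the gallery structure, and the similarity threshold are all assumptions chosen for the example.

```python
import numpy as np


def cosine_similarity(a: np.ndarray, b: np.ndarray) -> float:
    """Cosine similarity between two appearance embedding vectors."""
    return float(np.dot(a, b) / (np.linalg.norm(a) * np.linalg.norm(b)))


def associate(query: np.ndarray, gallery: dict, threshold: float = 0.7):
    """Match a detection's embedding against known global IDs.

    Returns the best-matching global ID, or None when no gallery
    embedding is similar enough (i.e., a new identity appeared).
    The 0.7 threshold is illustrative, not a recommended value.
    """
    best_id, best_sim = None, threshold
    for gid, emb in gallery.items():
        sim = cosine_similarity(query, emb)
        if sim > best_sim:
            best_id, best_sim = gid, sim
    return best_id
```

In a real system the gallery would hold embeddings accumulated per tracklet, and the match score would typically be fused with spatio-temporal cues (camera layout, timing) before an ID is assigned.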

It’s important to note that the application processes all of the input videos at once and produces a set of output videos and tracking information; it is not a real-time application. A mosaic output video, created for each identified/tracked object, is for demonstration purposes only, showing how the object moved across space and camera views.

How to run the demo

Step 1: At the top right of this page, click “Launch App”.

Step 2: Choose one of the two example scenarios - a retail or warehouse environment. Both scenes are synthetically generated using NVIDIA Omniverse.

Step 3: Click “Run App” to run the multi-camera tracking app on the input videos. All four input videos will be processed.

Once processed, the application will generate a set of output video clips for viewing.


The input to the multi-camera tracking application is a set of four videos that users can select from either the retail or the warehouse setting. These are captured from four cameras placed at different heights and angles. Users can view the input videos before sending them for processing.


The output will show metadata such as the number of unique objects identified across all four videos, along with a selection of output videos and their respective tracking IDs.

Each output video is a 2x2 tile of the input videos, with the selected object highlighted by a bounding box and its unique tracking ID. The tracking ID remains constant as the object moves from one camera to another. The input videos are synchronized, with the synchronized frame number shown in the upper-left corner.
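The 2x2 tiling described above can be sketched as a simple frame-composition step: given one synchronized frame from each of the four cameras, stack them into a single mosaic frame. This is an illustrative sketch, assuming all frames share the same resolution; the function name is hypothetical and this is not how the demo itself composes its output.

```python
import numpy as np


def make_mosaic(frames: list) -> np.ndarray:
    """Tile four synchronized camera frames (each H x W x 3, same
    size) into a single 2x2 mosaic frame of shape (2H x 2W x 3)."""
    assert len(frames) == 4, "expected one frame per camera"
    top = np.hstack(frames[:2])      # cameras 1 and 2, side by side
    bottom = np.hstack(frames[2:])   # cameras 3 and 4, side by side
    return np.vstack([top, bottom])  # stack the two rows
```

Because the inputs are frame-synchronized, drawing the bounding box and tracking ID on each camera's frame before tiling keeps the annotations consistent across the mosaic.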

Output video

In addition, users can also view the floorplan of the building with the approximate camera placements within the building.

Technologies used

This application uses a combination of existing NVIDIA technologies, such as DeepStream and pre-trained models, along with newer (soon to be released) IP for re-identification and tracking. Sign up to be notified when the application and microservices are released.

The technologies used include NVIDIA DeepStream and NVIDIA pre-trained models.


This is a standalone demo and not tied to an actual multi-camera tracking application.