Physical AI Glossary

The language of physical AI is still being written. These 24 terms define the core concepts, infrastructure, and challenges at the frontier of training robots and autonomous systems — from simulation-ready assets to real-world deployment.

Data Flywheel

A self-reinforcing loop where more real-world deployment generates more data, which improves model performance, which drives more deployment. For physical AI teams, building a data flywheel requires a scalable infrastructure to ingest, process, and version spatial data continuously — not a one-time data collection effort.

Data Pipeline (Robotics)

The end-to-end system that ingests raw source material — images, scans, videos, CAD files — and transforms it into structured, annotated, simulation-ready training data. In physical AI, the data pipeline is not a solved problem: most teams build it painfully in-house, and its quality directly determines how fast and how well their models train.
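The stages described above can be sketched as three composable functions. This is a minimal illustration only; the function names and the record format are assumptions, not any real framework's API.

```python
# Minimal sketch of a robotics data pipeline: ingest raw sources, annotate
# them with structure, export only what passes validation for the simulator.
# Record fields and labels here are illustrative placeholders.

def ingest(raw_paths):
    """Stage 1: pull raw scans/images/CAD references into uniform records."""
    return [{"source": p, "geometry": None, "labels": {}} for p in raw_paths]

def annotate(records):
    """Stage 2: attach structured annotations (placeholder semantic label)."""
    for r in records:
        r["labels"]["semantic"] = "unlabeled"
    return records

def export_sim_ready(records):
    """Stage 3: keep only records that carry the annotations a simulator needs."""
    return [r for r in records if "semantic" in r["labels"]]

sim_ready = export_sim_ready(annotate(ingest(["scan_001.ply", "scan_002.ply"])))
```

The point of the sketch is the shape, not the logic: each stage takes and returns the same record type, which is what lets teams version, audit, and parallelize the pipeline.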

Domain Randomization

A training technique that varies environmental parameters — lighting, textures, object positions, physics properties — during simulation so the model learns to generalize rather than overfit to a single scene. It only works if your underlying 3D assets are physics-accurate and properly tagged; randomizing low-quality assets produces low-quality variance.
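In code, domain randomization amounts to drawing a fresh scene configuration per training episode. The parameter names and ranges below are illustrative assumptions, not tied to any particular simulator.

```python
import random

# Hypothetical parameter ranges -- names and bounds are illustrative,
# not drawn from any specific simulator API.
RANDOMIZATION_RANGES = {
    "light_intensity": (200.0, 2000.0),  # lux
    "friction_coeff": (0.3, 1.2),
    "object_x_offset": (-0.05, 0.05),    # meters
    "texture_id": (0, 49),               # index into a texture bank
}

def sample_scene_params(ranges, seed=None):
    """Draw one randomized scene configuration for a training episode."""
    rng = random.Random(seed)
    params = {}
    for name, (lo, hi) in ranges.items():
        if isinstance(lo, int) and isinstance(hi, int):
            params[name] = rng.randint(lo, hi)   # discrete choice
        else:
            params[name] = rng.uniform(lo, hi)   # continuous range
    return params

# One fresh draw per episode keeps the model from memorizing a single scene.
episode_params = sample_scene_params(RANDOMIZATION_RANGES, seed=42)
```

Note that the sampler only varies parameters; the caveat in the definition still holds, because the assets those parameters are applied to must be physics-accurate for the variance to be meaningful.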

Edge Case Generation

The deliberate creation of rare, difficult, or failure-prone scenarios in simulation that a robot is unlikely to encounter frequently in the real world but must handle correctly. At scale, edge case generation is only feasible in simulation — and only effective when the simulation environment has sufficient physical realism to make those cases meaningful.

Embodied AI

An AI system that perceives and acts within a physical environment, rather than operating purely on text or images. Robots, autonomous vehicles, and drones are all embodied AI systems — and unlike language models, they cannot be trained on internet data. They require purpose-built spatial datasets that don't yet exist at scale.

Foundation Model (Physical AI)

A large model pre-trained on diverse physical world data — 3D scenes, sensor streams, simulation outputs — that can be fine-tuned for specific robotics tasks. Training a physical AI foundation model requires orders of magnitude more structured spatial data than most teams can produce in-house, which is why data infrastructure has become the central bottleneck.

Ground Truth Data

Verified, labeled data that represents the correct answer a model is being trained to predict. In physical AI, ground truth — precise depth values, object poses, semantic labels, physics properties — is nearly impossible to capture accurately from real-world sensors alone. It has to be generated in simulation, which means the quality of your 3D data infrastructure directly determines the quality of your ground truth.

Isaac Sim / MuJoCo

The two dominant physics simulators used to train physical AI systems. NVIDIA Isaac Sim is the standard for robotics and autonomous systems at scale, built on USD and tightly integrated with NVIDIA's GPU stack. MuJoCo, acquired and open-sourced by DeepMind, is widely used in academic and research settings for manipulation and locomotion tasks. Both require simulation-ready assets with accurate physics properties to produce training data that transfers to the real world.

Large World Model (LWM)

A foundation model trained to simulate or predict the behavior of complex physical environments at scale. Building and fine-tuning large world models depends entirely on having access to diverse, physics-accurate 3D environments — the kind that can’t be sourced from generic asset libraries or assembled manually.

Manipulation Task

A task in which a robot interacts with objects in its environment — grasping, placing, assembling, or sorting. Manipulation tasks are among the hardest to train because they require extremely high-fidelity data: accurate object geometry, realistic material properties, and correct contact physics. Low-quality assets produce models that work in simulation but fail on contact with real objects.

Navigation Task

A task in which a robot must move through an environment to reach a goal, avoid obstacles, and plan a path. Effective navigation training requires large volumes of spatially diverse, physically accurate environments — not a handful of hand-crafted scenes, but thousands of varied layouts generated at scale.

Physics-Based Rendering

A rendering approach that simulates how light physically interacts with surfaces — reflection, refraction, scattering — to produce photorealistic images. In training data generation, physics-based rendering is what closes the visual gap between simulation and the real world, making it possible for models trained on synthetic data to transfer to physical hardware.

Physics Tagging

The process of annotating 3D assets with physical properties — mass, friction, inertia, collision geometry, material behavior — so they behave realistically when loaded into a physics simulator. Without physics tagging, assets are visually usable but physically meaningless. It is one of the most labor-intensive steps in producing simulation-ready training data, and the step most often skipped by generic 3D asset pipelines.
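A physics-tagging step usually pairs a property schema with validation, so untagged or implausibly tagged assets never reach the simulator. The schema and thresholds below are illustrative assumptions, not a real engine's input format.

```python
from dataclasses import dataclass

# Illustrative physics-tag schema; field names mirror common simulator
# inputs (mass, friction, restitution) but are not tied to any engine.

@dataclass
class PhysicsTags:
    mass_kg: float
    static_friction: float
    dynamic_friction: float
    restitution: float = 0.0
    collision_mesh: str = "convex_hull"  # simplified collision geometry

def validate(tags: PhysicsTags) -> list:
    """Return a list of problems; an empty list means the asset is usable."""
    errors = []
    if tags.mass_kg <= 0:
        errors.append("mass must be positive")
    if not (0.0 <= tags.static_friction <= 2.0):
        errors.append("static friction out of plausible range")
    if tags.dynamic_friction > tags.static_friction:
        errors.append("dynamic friction should not exceed static friction")
    return errors

mug = PhysicsTags(mass_kg=0.35, static_friction=0.9, dynamic_friction=0.7)
assert validate(mug) == []  # a well-tagged asset passes cleanly
```

Validation like this is cheap; the labor-intensive part is determining the correct values per asset, which is why the step is so often skipped.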

Point Cloud

A set of data points in 3D space, typically captured by LiDAR or depth sensors, representing the surface geometry of objects or environments. Point clouds are a core format in robotics perception pipelines, and generating accurate synthetic point clouds — aligned with ground truth segmentation and depth — requires a simulation environment built for training, not just visualization.
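Synthetic point clouds are typically produced by back-projecting a rendered depth image through a pinhole camera model, which is why simulation gives you perfectly aligned depth and geometry for free. A minimal sketch, assuming known camera intrinsics:

```python
import numpy as np

def depth_to_point_cloud(depth, fx, fy, cx, cy):
    """Back-project an (H, W) depth image in meters into an (N, 3) point
    cloud using a pinhole camera model; fx, fy, cx, cy are in pixels."""
    h, w = depth.shape
    u, v = np.meshgrid(np.arange(w), np.arange(h))
    z = depth
    x = (u - cx) * z / fx
    y = (v - cy) * z / fy
    points = np.stack([x, y, z], axis=-1).reshape(-1, 3)
    return points[points[:, 2] > 0]  # drop invalid (zero-depth) pixels

# A flat synthetic depth plane 2 m from the camera (toy intrinsics).
depth = np.full((4, 4), 2.0)
cloud = depth_to_point_cloud(depth, fx=100.0, fy=100.0, cx=2.0, cy=2.0)
```

Because every point comes from a known pixel of a known object, per-point segmentation and depth ground truth fall out of the same render pass.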

Scene Graph

A hierarchical data structure that represents the objects, relationships, and spatial arrangement within a 3D environment. Scene graphs are what allow AI models and simulators to reason about a scene semantically — not just render it. Producing training data with valid scene graphs at scale requires a structured data pipeline, not manual scene assembly.
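The semantic reasoning a scene graph enables can be shown with a toy node type: each object carries a label and a local transform, and queries walk the hierarchy rather than touching pixels. Node fields and the example scene are illustrative assumptions.

```python
from dataclasses import dataclass, field

# A minimal scene-graph node: semantic label, local translation, children.
# Real formats (e.g. USD hierarchies) carry far richer transforms and schemas.

@dataclass
class SceneNode:
    name: str
    label: str
    translation: tuple = (0.0, 0.0, 0.0)
    children: list = field(default_factory=list)

    def find_by_label(self, label):
        """Semantic query: collect every node in this subtree with a label."""
        hits = [self] if self.label == label else []
        for child in self.children:
            hits.extend(child.find_by_label(label))
        return hits

room = SceneNode("room", "room", children=[
    SceneNode("table_1", "table", (1.0, 0.0, 0.0), children=[
        SceneNode("mug_1", "mug", (0.1, 0.0, 0.75)),  # mug on the table
    ]),
    SceneNode("mug_2", "mug", (3.0, 2.0, 0.0)),        # mug on the floor
])
mugs = room.find_by_label("mug")  # reasoning about the scene, not rendering it
```

The hierarchy is the point: "the mug on the table" is answerable from parent-child structure alone, which no flat list of meshes can express.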

Semantic Segmentation (3D)

The process of classifying every point in a 3D scene — or every pixel in a rendered image — with a semantic label such as “floor,” “wall,” or “robot arm.” In physical AI training data, semantic segmentation is ground truth that must be generated automatically and at scale — it cannot be hand-labeled for the volume of data required to train robust perception models.
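The reason simulation makes this automatic: the renderer already knows which object produced each pixel, so semantic labels reduce to a lookup from per-pixel instance IDs to class names. IDs and class names below are illustrative.

```python
import numpy as np

# Map per-pixel instance IDs (known exactly to the renderer) to semantic
# classes via a lookup table. The table is an illustrative assumption.
CLASS_OF_INSTANCE = {0: "background", 1: "floor", 2: "wall", 3: "robot_arm"}

def instance_to_semantic(instance_ids, class_of_instance):
    """Convert an (H, W) instance-ID image into an (H, W) label image."""
    lookup = np.vectorize(class_of_instance.get)
    return lookup(instance_ids)

instance_image = np.array([[1, 1, 2],
                           [1, 3, 2],
                           [1, 3, 2]])
semantic_image = instance_to_semantic(instance_image, CLASS_OF_INSTANCE)
```

This is why synthetic segmentation scales where hand-labeling cannot: the label image costs one table lookup per pixel, with zero annotation error.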

Sim-to-Real Transfer

The ability of a model trained in simulation to perform correctly when deployed on a physical robot in the real world. The sim-to-real gap is the central challenge of robotics training — and it is almost entirely a data quality problem. The fidelity of your simulation assets, physics properties, and rendering determines how much of your model’s performance survives contact with the real world.

Simulation Engine

The software environment in which physical AI models are trained — simulating physics, rendering synthetic sensor data, and enabling robots to interact with virtual environments at scale. Leading simulation engines include NVIDIA Isaac Sim, MuJoCo, Gazebo, and Genesis. The quality of training data a simulation engine can produce is bounded entirely by the quality of the 3D assets and physics properties loaded into it.

Simulation-Ready Asset

A 3D object or environment that has been processed, validated, and annotated to work correctly inside a physics simulator — with clean geometry, accurate physics properties, correct material definitions, and format compatibility. The gap between a raw 3D asset and a simulation-ready one is rarely visible but always consequential: it typically represents dozens of hours of manual cleanup per object. Most teams building physical AI either build this pipeline themselves at significant cost, or skip it and accept the sim-to-real gap that follows.

Spatial Data

Data that encodes information about physical space — 3D geometry, position, scale, orientation, and the relationships between objects. Spatial data is the foundational input for physical AI training, and the field’s most acute bottleneck: unlike text or images, high-quality spatial data cannot be scraped from the internet. It has to be produced.

Synthetic Data

Training data generated computationally rather than collected from the real world. In physical AI, synthetic data produced in simulation can be generated at scale, automatically annotated with ground truth, and varied programmatically — overcoming the fundamental limits of real-world data collection. Its quality ceiling is determined entirely by the realism of the simulation pipeline that produces it.

Training Loop

The iterative cycle in which a model processes training data, computes a loss against ground truth, and updates its parameters to improve performance. In physical AI, the training loop often runs simulation in the loop — meaning the data pipeline and the training pipeline are tightly coupled, and bottlenecks in data quality or throughput directly cap how fast a model can improve.
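The coupling between simulation and training can be made concrete with a toy loop: each iteration draws a fresh synthetic batch (standing in for the simulator), gets ground truth for free, and takes a gradient step. A linear model and plain NumPy stand in for the policy and the learning framework; everything here is illustrative.

```python
import numpy as np

# Simulation-in-the-loop sketch: the "simulator" generates a fresh batch
# every iteration, ground truth comes with it, and the model updates.
rng = np.random.default_rng(0)
true_w = np.array([2.0, -1.0])  # the "physics" the model must learn
w = np.zeros(2)                 # model parameters
lr = 0.1

for step in range(200):
    X = rng.normal(size=(32, 2))          # simulate: fresh observations
    y = X @ true_w                        # ground truth is free in simulation
    pred = X @ w
    grad = 2 * X.T @ (pred - y) / len(X)  # MSE gradient
    w -= lr * grad                        # update: close the loop

# If the simulator throttles (smaller or slower batches), learning slows
# in direct proportion -- the data pipeline caps the training loop.
```

The structural point survives the toy setup: the data-generation step sits inside the optimization loop, so its throughput and fidelity bound everything downstream.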

USD / OpenUSD

Universal Scene Description, an open-source 3D file format developed by Pixar and now the standard interchange format for physical AI and robotics simulation. USD supports complex scene hierarchies, physics annotations, and material definitions — and is natively supported by NVIDIA Isaac Sim, the dominant simulation platform for robotics training. If your assets aren’t in USD or easily convertible, they aren’t pipeline-ready.

World Model

An AI model that learns an internal representation of how the physical world works — predicting how objects move, interact, and respond to actions over time. World models are increasingly central to physical AI research because they allow robots to plan and reason about consequences before acting. Training them requires massive, diverse, physics-accurate 3D datasets that reflect real-world complexity.