File structure (tree)

Basic file structure generated in one step by the Isaac Lab template generator (direct workflow):

isaac_lab_tutorial/
├── scripts/
├── source/
│   └── isaac_lab_tutorial/
│       ├── config/
│       │   └── extension.toml
│       ├── docs/
│       ├── isaac_lab_tutorial/
│       │   └── tasks/
│       │       └── direct/
│       │           └── isaac_lab_tutorial/
│       │               ├── agents/
│       │               ├── __init__.py
│       │               ├── isaac_lab_tutorial_env_cfg.py
│       │               └── isaac_lab_tutorial_env.py
│       ├── pyproject.toml
│       └── setup.py
├── LICENSE
└── README.md

The DeepRobotics Lite3 project structure differs in a few ways: it adds some utility functions, and under tasks/manager_based/locomotion/velocity the original layout (an agents folder, an env_cfg.py holding environment/reward configuration, and an env.py file) is replaced by config/quadruped/deeprobotics_lite3/agents, an mdp folder (custom reward, observation, action, and other interfaces), and velocity_env_cfg.py (a generic base class that wires together the official interfaces plus the custom ones in mdp):

.
├── README.md
├── scripts                                   # training and inference scripts (these are where the interfaces are actually invoked; you write them yourself)
│   ├── reinforcement_learning
│   │   ├── rl_utils.py
│   │   └── rsl_rl
│   │       ├── cli_args.py                   # command-line argument parsing (options for training/testing)
│   │       ├── play.py                       # play back / evaluate a trained policy (visualization or performance evaluation)
│   │       └── train.py                      # !!! training entry-point script (starts the RL training pipeline)
│   └── tools                                 # engineering utility scripts (generally files already provided officially)
│       ├── check_robot.py
│       ├── clean_trash.py
│       ├── convert_mjcf.py
│       ├── convert_urdf.py
│       └── list_envs.py
├── source                                    # source-code root
│   └── rl_training                           # main Python package directory (installable via pip install -e)
│       ├── rl_training                       # actual Python module (same name as the package)
│       │   ├── assets                        # robot/scene asset definitions
│       │   │   ├── deeprobotics.py           # DeepRobotics robot asset definitions (joints, mass, friction, etc.)
│       │   │   └── utils
│       │   │       └── usd_converter.py      # utility functions for converting URDF/MJCF to USD
│       │   ├── __init__.py                   # (auto-generated by the template)
│       │   ├── tasks                         # task definition module
│       │   │   ├── __init__.py               # (auto-generated by the template)
│       │   │   ├── manager_based             # manager-based tasks
│       │   │   │   ├── __init__.py           # (auto-generated by the template)
│       │   │   │   ├── locomotion            # legged-robot locomotion tasks
│       │   │   │   │   └── velocity          # velocity-tracking task (goal: track commanded linear/angular velocity)
│       │   │   │       ├── config        # task configuration directory
│       │   │   │       │   ├── quadruped # quadruped robots
│       │   │   │       │   │   ├── deeprobotics_lite3  # DeepRobotics Lite3 platform
│       │   │   │       │   │   │   ├── agents         # policy/algorithm configuration
│       │   │   │       │   │   │   │   ├── __init__.py
│       │   │   │       │   │   │   │   └── rsl_rl_ppo_cfg.py  # PPO hyperparameters (learning rate, discount factor, etc.)
│       │   │   │       │   │   │   ├── flat_env_cfg.py        # flat-terrain environment configuration (scene, physics, sensors)
│       │   │   │       │   │   │   ├── __init__.py            # (key! this file registers the environments with gym)
│       │   │   │       │   │   │   └── rough_env_cfg.py       # rough/complex-terrain environment configuration
│       │   │   │       │   │   ├── __init__.py
│       │   │   │       │   └── wheeled   # wheeled robots
│       │   │   │       │       ├── deeprobotics_m20
│       │   │   │       │       │   ├── agents
│       │   │   │       │       │   │   ├── __init__.py
│       │   │   │       │       │   │   └── rsl_rl_ppo_cfg.py
│       │   │   │       │       │   ├── flat_env_cfg.py
│       │   │   │       │       │   ├── __init__.py
│       │   │   │       │       │   └── rough_env_cfg.py
│       │   │   │       │       ├── __init__.py
│       │   │   │       ├── __init__.py
│       │   │   │       ├── mdp           # interfaces the DeepRobotics team defined for their quadrupeds, supplementing the official mdp interfaces
│       │   │   │       │   ├── commands.py      # custom command interfaces
│       │   │   │       │   ├── curriculums.py   # custom curriculum interfaces
│       │   │   │       │   ├── events.py        # custom event interfaces
│       │   │   │       │   ├── __init__.py
│       │   │   │       │   ├── observations.py  # custom observation interfaces
│       │   │   │       │   └── rewards.py       # custom reward interfaces
│       │   │   │       └── velocity_env_cfg.py  # generic velocity-task environment config base class (flat_env_cfg.py for lite3 / m20 assigns the weights and uses it)
│       │   └── ui_extension_example.py       # Isaac Sim UI extension example (for custom UI widgets)
│       └── setup.py                          # Python package installation script (supports editable install: pip install -e .)

Official structure diagram:

Structure notes:

Project: the top-level structure of the whole project.

Extension: source/isaac_lab_tutorial/ is the root directory of the extension package, containing configuration, docs, and the module.

Module: the isaac_lab_tutorial/ subdirectory is where the Python module lives.

Task: located under tasks/direct/isaac_lab_tutorial/, this is the concrete runnable task module, containing the environment definition and agent logic.

Concepts

USD files

USD is not just a graphics format; it is a programmable, composable scene-description system with physics semantics, well suited to high-fidelity modeling in robot simulation.

  1. USD is a hierarchical scene-description format
    • A scene is made of Prims (primitives); each Prim is a node carrying attributes (position, rotation, material, physics parameters, etc.).
    • Prims are organized in a tree that forms the full Scene Graph, so you can informally think of a USD file as a "Prim tree".
  2. USD supports complex geometry
    • It is not limited to primitive shapes (cubes, spheres); complex custom models can be referenced from external mesh files (STL, OBJ, FBX).
    • The visual model usually comes from those external meshes, while physics/collision properties are defined through USD attributes or attached PhysX metadata.
  3. USD is compositional (Composition)
    • One USD file (Prim tree) can be composed into another, enabling modular construction (e.g., hand + wrist + tool).
    • This mechanism fits the "part assembly" workflow of robot simulation very well.
  4. In Isaac Lab, USD works hand in hand with spawners
    • A spawner instantiates a USD asset (visuals, collision, physics properties) into the simulation environment.
    • It parses the joints, rigid bodies, sensors, etc. in the USD file and hooks them up to the PhysX engine.
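
For intuition, a minimal sketch of how a USD asset is described and then spawned via the spawner API (the usd_path and prim path below are placeholders for illustration; the project's real asset configs appear later in deeprobotics.py):

import isaaclab.sim as sim_utils

# describe the asset: which USD file to load and which extra properties to activate
robot_spawn_cfg = sim_utils.UsdFileCfg(
    usd_path="/path/to/Robot.usd",        # placeholder path, not a real asset
    activate_contact_sensors=True,        # expose contact sensors defined in the USD
)
# the spawner instantiates the Prim tree (visuals + collision + physics) at the given prim path
robot_spawn_cfg.func("/World/Robot", robot_spawn_cfg)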

File-by-file walkthrough

The scripts folder

This folder generally holds the training and inference scripts and is the actual entry point of the whole project: it calls the interfaces created in the other folders and instantiates the classes defined there. Below, using train.py as an example, we walk through the steps the project goes through when it runs.

train.py
# Copyright (c) 2025 Deep Robotics
# SPDX-License-Identifier: BSD 3-Clause

# Copyright (c) 2024-2025 Ziqi Fan
# SPDX-License-Identifier: Apache-2.0

# Copyright (c) 2024-2025, The Isaac Lab Project Developers.
# All rights reserved.
#
# SPDX-License-Identifier: Apache-2.0

"""Script to train an RL agent with RSL-RL."""

"""Launch Isaac Sim Simulator first."""

import argparse
import os
# os.environ["CUDA_VISIBLE_DEVICES"] = "1"
import sys

from isaaclab.app import AppLauncher

# local imports
sys.path.append(os.path.abspath(os.path.join(os.path.dirname(__file__), "..")))
import cli_args

# add argparse arguments
parser = argparse.ArgumentParser(description="Train an RL agent with RSL-RL.")
parser.add_argument("--video", action="store_true", default=False, help="Record videos during training.")
parser.add_argument("--video_length", type=int, default=200, help="Length of the recorded video (in steps).")
parser.add_argument("--video_interval", type=int, default=2000, help="Interval between video recordings (in steps).")
parser.add_argument("--num_envs", type=int, default=None, help="Number of environments to simulate.")
parser.add_argument("--task", type=str, default=None, help="Name of the task.")
parser.add_argument(
    "--agent", type=str, default="rsl_rl_cfg_entry_point", help="Name of the RL agent configuration entry point."
)
parser.add_argument("--seed", type=int, default=None, help="Seed used for the environment.")
parser.add_argument("--max_iterations", type=int, default=None, help="RL Policy training iterations.")
parser.add_argument(
    "--distributed", action="store_true", default=False, help="Run training with multiple GPUs or nodes."
)
# append RSL-RL cli arguments
cli_args.add_rsl_rl_args(parser)
# append AppLauncher cli args
AppLauncher.add_app_launcher_args(parser)
args_cli, hydra_args = parser.parse_known_args()

# always enable cameras to record video
if args_cli.video:
    args_cli.enable_cameras = True

# clear out sys.argv for Hydra
sys.argv = [sys.argv[0]] + hydra_args

# launch the Omniverse app (this must happen first; only then can the later interfaces be used), so everything up to this point is fairly boilerplate code
app_launcher = AppLauncher(args_cli)
simulation_app = app_launcher.app

"""Rest everything follows."""

import gymnasium as gym
import os
import torch
from datetime import datetime

from rsl_rl.runners import OnPolicyRunner

from isaaclab.envs import (
    DirectMARLEnv,
    DirectMARLEnvCfg,
    DirectRLEnvCfg,
    ManagerBasedRLEnvCfg,
    multi_agent_to_single_agent,
)
from isaaclab.utils.dict import print_dict
from isaaclab.utils.io import dump_yaml
from isaaclab_rl.rsl_rl import RslRlOnPolicyRunnerCfg, RslRlVecEnvWrapper
from isaaclab_tasks.utils import get_checkpoint_path
from isaaclab_tasks.utils.hydra import hydra_task_config

import rl_training.tasks  # noqa: F401

torch.backends.cuda.matmul.allow_tf32 = True
torch.backends.cudnn.allow_tf32 = True
torch.backends.cudnn.deterministic = False
torch.backends.cudnn.benchmark = False

# @hydra_task_config(args_cli.task, args_cli.agent) is a custom decorator used in Isaac Lab projects to
# automatically load the YAML configs that correspond to the task and the agent, parse them into Python
# dataclass objects, and pass them into main() as arguments.

# Roughly, the decorator does the following:
# 1. From task_name = args_cli.task
#    -> find the corresponding environment config YAML (e.g., cfg/task/deeprobotics_lite3_flat.yaml)
#    -> load and instantiate it as a ManagerBasedRLEnvCfg (or subclass) object -> passed in as env_cfg
# 2. From agent_name = args_cli.agent
#    -> find the corresponding agent config YAML (e.g., cfg/agent/rsl_rl_ppo.yaml)
#    -> load and instantiate it as an RslRlOnPolicyRunnerCfg object -> passed in as agent_cfg
# 3. Automatically handle config merging, defaults, and type checking (via Hydra + dataclasses)
# 4. Pass both config objects into main() as positional arguments
@hydra_task_config(args_cli.task, args_cli.agent)
def main(env_cfg: ManagerBasedRLEnvCfg | DirectRLEnvCfg | DirectMARLEnvCfg, agent_cfg: RslRlOnPolicyRunnerCfg):
    """Train with RSL-RL agent."""
    # override configurations with non-hydra CLI arguments
    agent_cfg = cli_args.update_rsl_rl_cfg(agent_cfg, args_cli)
    env_cfg.scene.num_envs = args_cli.num_envs if args_cli.num_envs is not None else env_cfg.scene.num_envs
    agent_cfg.max_iterations = (
        args_cli.max_iterations if args_cli.max_iterations is not None else agent_cfg.max_iterations
    )

    # set the environment seed
    # note: certain randomization occurs in the environment initialization, so we set the seed here
    env_cfg.seed = agent_cfg.seed
    env_cfg.sim.device = args_cli.device if args_cli.device is not None else env_cfg.sim.device

    # multi-gpu training configuration
    if args_cli.distributed:
        env_cfg.sim.device = f"cuda:{app_launcher.local_rank}"
        agent_cfg.device = f"cuda:{app_launcher.local_rank}"

        # set a different seed per process to get diversity across processes
        seed = agent_cfg.seed + app_launcher.local_rank
        env_cfg.seed = seed
        agent_cfg.seed = seed

    # specify directory for logging experiments
    log_root_path = os.path.join("logs", "rsl_rl", agent_cfg.experiment_name)
    log_root_path = os.path.abspath(log_root_path)
    print(f"[INFO] Logging experiment in directory: {log_root_path}")
    # specify directory for logging runs: {time-stamp}_{run_name}
    log_dir = datetime.now().strftime("%Y-%m-%d_%H-%M-%S")
    # The Ray Tune workflow extracts the experiment name from the logging line below, hence do not change it (see PR #2346, comment-2819298849)
    print(f"Exact experiment name requested from command line: {log_dir}")
    if agent_cfg.run_name:
        log_dir += f"_{agent_cfg.run_name}"
    log_dir = os.path.join(log_root_path, log_dir)

    # create the Isaac environment
    env = gym.make(args_cli.task, cfg=env_cfg, render_mode="rgb_array" if args_cli.video else None)

    # convert to single-agent instance if required by the RL algorithm
    if isinstance(env.unwrapped, DirectMARLEnv):
        env = multi_agent_to_single_agent(env)

    # save resume path before creating a new log_dir
    if agent_cfg.resume or agent_cfg.algorithm.class_name == "Distillation":
        resume_path = get_checkpoint_path(log_root_path, agent_cfg.load_run, agent_cfg.load_checkpoint)

    # wrap for video recording
    if args_cli.video:
        video_kwargs = {
            "video_folder": os.path.join(log_dir, "videos", "train"),
            "step_trigger": lambda step: step % args_cli.video_interval == 0,
            "video_length": args_cli.video_length,
            "disable_logger": True,
        }
        print("[INFO] Recording videos during training.")
        print_dict(video_kwargs, nesting=4)
        env = gym.wrappers.RecordVideo(env, **video_kwargs)   # this wrapper adds video recording around the environment

    # wrap around environment for rsl-rl
    env = RslRlVecEnvWrapper(env, clip_actions=agent_cfg.clip_actions)

    # create runner from rsl-rl
    runner = OnPolicyRunner(env, agent_cfg.to_dict(), log_dir=log_dir, device=agent_cfg.device)
    # write git state to logs
    runner.add_git_repo_to_log(__file__)
    # load the checkpoint
    if agent_cfg.resume or agent_cfg.algorithm.class_name == "Distillation":
        print(f"[INFO]: Loading model checkpoint from: {resume_path}")
        # load previously trained model
        runner.load(resume_path)

    # dump the configuration into the log directory
    dump_yaml(os.path.join(log_dir, "params", "env.yaml"), env_cfg)
    dump_yaml(os.path.join(log_dir, "params", "agent.yaml"), agent_cfg)
    # run training
    runner.learn(num_learning_iterations=agent_cfg.max_iterations, init_at_random_ep_len=True)

    # close the simulator
    env.close()


if __name__ == "__main__":
    # run the main function
    main()
    # close sim app
    simulation_app.close()

Run steps (launch Isaac Lab + calls into the rsl-rl library):
Command-line arguments
     ↓
Launch Isaac Sim   # these first two steps are fairly boilerplate
     ↓
Load task/agent configs (YAML → dataclass)  # calls the interfaces defined under source
     ↓
Override configs + distributed setup
     ↓
Create log directory & save configs
     ↓
Build the Gym environment → wrap it (video / multi-agent / RSL-RL)
     ↓
Initialize the RSL-RL runner → load checkpoint (optional)
     ↓
runner.learn() → train for N iterations
     ↓
Close the environment & simulator

Question 1: Why is there no code in train.py that saves the trained model?

Answer: in train.py the model weights (.pt files) are not saved explicitly by the main script; saving is handled automatically inside RSL-RL's OnPolicyRunner.learn().

  • Save path: logs/rsl_rl/{experiment_name}/{timestamp}_{run_name}/model_{iteration}.pt
  • Save frequency: controlled by agent_cfg.save_interval (100 iterations in the PPO runner config shown later)
  • Save contents: the full training state, including the policy networks (actor/critic) and optimizer state
  • Resume support: set agent_cfg.resume=True plus load_run/load_checkpoint to load an existing .pt file

So even though train.py contains no explicit save() call, the model is still persisted automatically according to the config, which keeps experiments reproducible and allows training to resume after an interruption.

Question 2: What is a checkpoint?

Answer: a checkpoint is a "save file" of the training process.
How this works with Isaac Lab + RSL-RL:

  • Just set save_interval in the config and the framework saves checkpoints automatically.
  • Use --resume or agent.resume=true to load a checkpoint and continue training (see the example below).
  • The play.py script also relies on checkpoints to play back a trained policy.
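
As a concrete example, resuming the most recent Lite3 run might look like the command below (the resume-related flags come from cli_args.add_rsl_rl_args, so the exact names may differ between Isaac Lab versions; this is a sketch, not copied from the repo):

python scripts/reinforcement_learning/rsl_rl/train.py --task Rough-Deeprobotics-Lite3-v0 --resume True

If a specific run/checkpoint is not selected, get_checkpoint_path (used in train.py above) typically falls back to the most recent run directory and checkpoint file that match its default patterns.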

rl_training/assets/deeprobotics.py

This file configures the properties of the target robot. It specifies which USD file to load, the physical properties (joints, mass, friction, etc.), and the actuator types and parameters:

deeprobotics.py
# Copyright (c) 2025 Deep Robotics
# SPDX-License-Identifier: BSD 3-Clause

# Copyright (c) 2024-2025 Ziqi Fan
# SPDX-License-Identifier: Apache-2.0

import isaaclab.sim as sim_utils
from isaaclab.actuators import DCMotorCfg, DelayedPDActuatorCfg
from isaaclab.assets.articulation import ArticulationCfg

from rl_training.assets import ISAACLAB_ASSETS_DATA_DIR

DEEPROBOTICS_LITE3_CFG = ArticulationCfg(
    spawn=sim_utils.UsdFileCfg(
        usd_path=f"{ISAACLAB_ASSETS_DATA_DIR}/Lite3/Lite3_usd/Lite3.usd",
        activate_contact_sensors=True,
        rigid_props=sim_utils.RigidBodyPropertiesCfg(     # rigid-body properties and limits
            disable_gravity=False,
            retain_accelerations=False,
            linear_damping=0.0,
            angular_damping=0.0,
            max_linear_velocity=1000.0,
            max_angular_velocity=1000.0,
            max_depenetration_velocity=1.0,
        ),
        articulation_props=sim_utils.ArticulationRootPropertiesCfg(  # articulation (dynamics) properties
            enabled_self_collisions=False, solver_position_iteration_count=4, solver_velocity_iteration_count=1
        ),
    ),
    init_state=ArticulationCfg.InitialStateCfg(       # initial state
        pos=(0.0, 0.0, 0.35),
        joint_pos={
            ".*HipX_joint": 0.0,
            ".*HipY_joint": -0.8,
            ".*Knee_joint": 1.6,
        },
        joint_vel={".*": 0.0},
    ),
    soft_joint_pos_limit_factor=0.99,         # soft joint position limit factor
    actuators={                               # actuator configuration
        "Hip": DelayedPDActuatorCfg(          # actuator for the "Hip" joint group
            joint_names_expr=[".*_Hip[X,Y]_joint"],
            effort_limit=24.0,
            velocity_limit=26.2,
            stiffness=30.0,
            damping=1.0,
            friction=0.0,
            armature=0.0,
            min_delay=0,
            max_delay=5,
        ),
        "Knee": DelayedPDActuatorCfg(        # actuator for the "Knee" joint group
            joint_names_expr=[".*_Knee_joint"],
            effort_limit=36.0,
            velocity_limit=17.3,
            stiffness=30.0,
            damping=1.0,
            friction=0.0,
            armature=0.0,
            min_delay=0,
            max_delay=5,
        ),
    },
)

DEEPROBOTICS_M20_CFG = ArticulationCfg(
    spawn=sim_utils.UsdFileCfg(
        usd_path=f"{ISAACLAB_ASSETS_DATA_DIR}/M20/M20_usd/M20.usd",
        activate_contact_sensors=True,
        rigid_props=sim_utils.RigidBodyPropertiesCfg(
            disable_gravity=False,
            retain_accelerations=False,
            linear_damping=0.0,
            angular_damping=0.0,
            max_linear_velocity=1000.0,
            max_angular_velocity=1000.0,
            max_depenetration_velocity=1.0,
        ),
        articulation_props=sim_utils.ArticulationRootPropertiesCfg(
            enabled_self_collisions=False, solver_position_iteration_count=4, solver_velocity_iteration_count=1
        ),
    ),
    init_state=ArticulationCfg.InitialStateCfg(
        pos=(0.0, 0.0, 0.52),
        joint_pos={
            ".*hipx_joint": 0.0,
            "f[l,r]_hipy_joint": -0.6,
            "h[l,r]_hipy_joint": 0.6,
            "f[l,r]_knee_joint": 1.0,
            "h[l,r]_knee_joint": -1.0,
            ".*wheel_joint": 0.0,
        },
        joint_vel={".*": 0.0},
    ),
    soft_joint_pos_limit_factor=0.9,
    actuators={
        "joint": DelayedPDActuatorCfg(
            joint_names_expr=[".*hipx_joint", ".*hipy_joint", ".*knee_joint"],
            effort_limit=76.4,
            velocity_limit=22.4,
            stiffness=80.0,
            damping=2.0,
            friction=0.0,
            armature=0.0,
            min_delay=0,
            max_delay=5,
        ),
        "wheel": DelayedPDActuatorCfg(
            joint_names_expr=[".*_wheel_joint"],
            effort_limit=21.6,
            velocity_limit=79.3,
            stiffness=0.0,
            damping=0.6,
            friction=0.0,
            armature=0.00243216,
            min_delay=0,
            max_delay=5,
        ),
    },
)
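
As shown later in rough_env_cfg.py, these config objects are consumed by the scene configuration of a concrete environment, for example:

self.scene.robot = DEEPROBOTICS_LITE3_CFG.replace(prim_path="{ENV_REGEX_NS}/Robot")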

velocity\mdp

Everything in this folder is reward, observation, and other reward/penalty interface code customized by the DeepRobotics team. It stays compatible with the official Isaac Lab interfaces; it just implements specific functionality of their own.

The interfaces defined in this folder (mdp) exist to be called by the generic base class velocity\velocity_env_cfg.py.

Taking rewards as an example, this file defines many reward/penalty terms that are not in the official documentation:

rewards.py
# Copyright (c) 2025 Deep Robotics
# SPDX-License-Identifier: BSD 3-Clause

# Copyright (c) 2024-2025 Ziqi Fan
# SPDX-License-Identifier: Apache-2.0

from __future__ import annotations

import torch
from typing import TYPE_CHECKING

import isaaclab.utils.math as math_utils
from isaaclab.assets import Articulation, RigidObject
from isaaclab.managers import ManagerTermBase
from isaaclab.managers import RewardTermCfg as RewTerm
from isaaclab.managers import SceneEntityCfg
from isaaclab.sensors import ContactSensor, RayCaster
from isaaclab.utils.math import quat_apply_inverse, yaw_quat

if TYPE_CHECKING:
    from isaaclab.envs import ManagerBasedRLEnv


def track_lin_vel_xy_exp(
    env: ManagerBasedRLEnv, std: float, command_name: str, asset_cfg: SceneEntityCfg = SceneEntityCfg("robot")
) -> torch.Tensor:
    """Reward tracking of linear velocity commands (xy axes) using exponential kernel."""
    # extract the used quantities (to enable type-hinting)
    asset: RigidObject = env.scene[asset_cfg.name]
    # compute the error
    lin_vel_error = torch.sum(
        torch.square(env.command_manager.get_command(command_name)[:, :2] - asset.data.root_lin_vel_b[:, :2]),
        dim=1,
    )
    reward = torch.exp(-lin_vel_error / std**2)
    # reward *= torch.clamp(-env.scene["robot"].data.projected_gravity_b[:, 2], 0, 0.7) / 0.7
    return reward


def track_ang_vel_z_exp(
    env: ManagerBasedRLEnv, std: float, command_name: str, asset_cfg: SceneEntityCfg = SceneEntityCfg("robot")
) -> torch.Tensor:
    """Reward tracking of angular velocity commands (yaw) using exponential kernel."""
    # extract the used quantities (to enable type-hinting)
    asset: RigidObject = env.scene[asset_cfg.name]
    # compute the error
    ang_vel_error = torch.square(env.command_manager.get_command(command_name)[:, 2] - asset.data.root_ang_vel_b[:, 2])
    reward = torch.exp(-ang_vel_error / std**2)
    # reward *= torch.clamp(-env.scene["robot"].data.projected_gravity_b[:, 2], 0, 0.7) / 0.7
    return reward
# ... (omitted)
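
Adding a term of your own follows the same signature convention. A minimal sketch (this joint-velocity penalty is made up for illustration and is not part of the project):

def joint_vel_penalty_exp(
    env: ManagerBasedRLEnv, std: float, asset_cfg: SceneEntityCfg = SceneEntityCfg("robot")
) -> torch.Tensor:
    """Hypothetical example: penalize large joint velocities with an exponential kernel."""
    asset: Articulation = env.scene[asset_cfg.name]
    joint_vel_sq = torch.sum(torch.square(asset.data.joint_vel[:, asset_cfg.joint_ids]), dim=1)
    return 1.0 - torch.exp(-joint_vel_sq / std**2)

Once defined here it can be referenced from a RewardsCfg as RewTerm(func=mdp.joint_vel_penalty_exp, weight=..., params={"std": ...}), exactly like the official terms.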

velocity\velocity_env_cfg.py

This file does three things: (1) it defines the scene; (2) it defines the MDP (rewards, observations, and so on); (3) it combines the scene, the MDP settings, the simulation settings, and a few utility functions into one class (inheriting ManagerBasedRLEnvCfg) that is exposed to the outside world for the scripts to use.

import inspect
import math
import sys
from dataclasses import MISSING

import isaaclab.sim as sim_utils
from isaaclab.assets import ArticulationCfg, AssetBaseCfg
from isaaclab.envs import ManagerBasedRLEnvCfg
from isaaclab.managers import CurriculumTermCfg as CurrTerm
from isaaclab.managers import EventTermCfg as EventTerm
from isaaclab.managers import ObservationGroupCfg as ObsGroup
from isaaclab.managers import ObservationTermCfg as ObsTerm
from isaaclab.managers import RewardTermCfg as RewTerm
from isaaclab.managers import SceneEntityCfg
from isaaclab.managers import TerminationTermCfg as DoneTerm
from isaaclab.scene import InteractiveSceneCfg
from isaaclab.sensors import ContactSensorCfg, RayCasterCfg, patterns
from isaaclab.terrains import TerrainImporterCfg
from isaaclab.utils import configclass
from isaaclab.utils.assets import ISAAC_NUCLEUS_DIR, ISAACLAB_NUCLEUS_DIR
from isaaclab.utils.noise import AdditiveUniformNoiseCfg as Unoise

import rl_training.tasks.manager_based.locomotion.velocity.mdp as mdp

##
# Pre-defined configs
##
from isaaclab.terrains.config.rough import ROUGH_TERRAINS_CFG  # isort: skip


##
# Scene definition
##
@configclass
class MySceneCfg(InteractiveSceneCfg):
    """Configuration for the terrain scene with a legged robot."""

    # ground terrain
    terrain = TerrainImporterCfg(
        prim_path="/World/ground",
        terrain_type="generator",
        terrain_generator=ROUGH_TERRAINS_CFG,
        max_init_terrain_level=5,
        collision_group=-1,
        physics_material=sim_utils.RigidBodyMaterialCfg(
            friction_combine_mode="multiply",
            restitution_combine_mode="multiply",
            static_friction=1.0,
            dynamic_friction=1.0,
            restitution=1.0,
        ),
        visual_material=sim_utils.MdlFileCfg(
            mdl_path=f"{ISAACLAB_NUCLEUS_DIR}/Materials/TilesMarbleSpiderWhiteBrickBondHoned/TilesMarbleSpiderWhiteBrickBondHoned.mdl",
            project_uvw=True,
            texture_scale=(0.25, 0.25),
        ),
        debug_vis=False,
    )
    # robots
    robot: ArticulationCfg = MISSING
    # sensors
    height_scanner = RayCasterCfg(
        prim_path="{ENV_REGEX_NS}/Robot/base",
        offset=RayCasterCfg.OffsetCfg(pos=(0.0, 0.0, 20.0)),
        ray_alignment="yaw",
        pattern_cfg=patterns.GridPatternCfg(resolution=0.1, size=[1.6, 1.0]),
        debug_vis=False,
        mesh_prim_paths=["/World/ground"],
    )
    height_scanner_base = RayCasterCfg(
        prim_path="{ENV_REGEX_NS}/Robot/base",
        offset=RayCasterCfg.OffsetCfg(pos=(0.0, 0.0, 20.0)),
        ray_alignment="yaw",
        pattern_cfg=patterns.GridPatternCfg(resolution=0.05, size=(0.1, 0.1)),
        debug_vis=False,
        mesh_prim_paths=["/World/ground"],
    )
    contact_forces = ContactSensorCfg(prim_path="{ENV_REGEX_NS}/Robot/.*", history_length=3, track_air_time=True)
    # lights
    sky_light = AssetBaseCfg(
        prim_path="/World/skyLight",
        spawn=sim_utils.DomeLightCfg(
            intensity=750.0,
            texture_file=f"{ISAAC_NUCLEUS_DIR}/Materials/Textures/Skies/PolyHaven/kloofendal_43d_clear_puresky_4k.hdr",
        ),
    )


##
# MDP settings
##
@configclass
class CommandsCfg:
    """Command specifications for the MDP."""

    base_velocity = mdp.UniformThresholdVelocityCommandCfg(
        asset_name="robot",
        resampling_time_range=(10.0, 10.0),
        rel_standing_envs=0.02,
        rel_heading_envs=1.0,
        heading_command=True,
        heading_control_stiffness=0.5,
        debug_vis=True,
        ranges=mdp.UniformThresholdVelocityCommandCfg.Ranges(
            lin_vel_x=(-1.0, 1.0), lin_vel_y=(-1.0, 1.0), ang_vel_z=(-1.0, 1.0), heading=(-math.pi, math.pi)
        ),
    )

@configclass
class ActionsCfg:
    """Action specifications for the MDP."""

    joint_pos = mdp.JointPositionActionCfg(
        asset_name="robot", joint_names=[".*"], scale=0.5, use_default_offset=True, clip=None, preserve_order=True
    )

@configclass
class ObservationsCfg:
    """Observation specifications for the MDP."""

    @configclass
    class PolicyCfg(ObsGroup):
        """Observations for policy group."""

        # observation terms (order preserved)
        base_lin_vel = ObsTerm(
            func=mdp.base_lin_vel,
            noise=Unoise(n_min=-0.1, n_max=0.1),
            clip=(-100.0, 100.0),
            scale=1.0,
        )
        base_ang_vel = ObsTerm(
            func=mdp.base_ang_vel,
            noise=Unoise(n_min=-0.2, n_max=0.2),
            clip=(-100.0, 100.0),
            scale=1.0,
        )
        # ... (omitted)

        def __post_init__(self):
            self.enable_corruption = True
            self.concatenate_terms = True

    @configclass
    class CriticCfg(ObsGroup):
        """Observations for critic group."""

        # observation terms (order preserved)
        base_lin_vel = ObsTerm(
            func=mdp.base_lin_vel,
            clip=(-100.0, 100.0),
            scale=1.0,
        )
        base_ang_vel = ObsTerm(
            func=mdp.base_ang_vel,
            clip=(-100.0, 100.0),
            scale=1.0,
        )
        # ... (omitted)

        def __post_init__(self):
            self.enable_corruption = False
            self.concatenate_terms = True

    # observation groups
    policy: PolicyCfg = PolicyCfg()
    critic: CriticCfg = CriticCfg()


@configclass
class EventCfg:   # randomize various quantities (domain randomization) to reduce the sim-to-real gap
    """Configuration for events."""

    # startup (events triggered once at startup)
    randomize_rigid_body_material = EventTerm(
        func=mdp.randomize_rigid_body_material,
        mode="startup",
        params={
            "asset_cfg": SceneEntityCfg("robot", body_names=".*"),
            "static_friction_range": (0.3, 1.0),
            "dynamic_friction_range": (0.3, 0.8),
            "restitution_range": (0.0, 0.4),
            "num_buckets": 1024,
        },
    )

    # ... (omitted)

    # reset (events triggered on reset)
    randomize_apply_external_force_torque = EventTerm(
        func=mdp.apply_external_force_torque,
        mode="reset",
        params={
            "asset_cfg": SceneEntityCfg("robot", body_names=""),
            "force_range": (-10.0, 10.0),
            "torque_range": (-10.0, 10.0),
        },
    )

    # ... (omitted)

    # interval (events triggered at a sampled interval during the episode)
    randomize_push_robot = EventTerm(
        func=mdp.push_by_setting_velocity,
        mode="interval",
        interval_range_s=(10.0, 15.0),
        params={"velocity_range": {"x": (-0.5, 0.5), "y": (-0.5, 0.5)}},
    )


@configclass
class RewardsCfg:
    """Reward terms for the MDP."""

    # General boilerplate penalty terms
    is_terminated = RewTerm(func=mdp.is_terminated, weight=0.0)

    # Root penalties (penalties on the base)
    lin_vel_z_l2 = RewTerm(func=mdp.lin_vel_z_l2, weight=0.0)  # discourage vertical (z) velocity, reduces bobbing
    ang_vel_xy_l2 = RewTerm(func=mdp.ang_vel_xy_l2, weight=0.0)  # reduce pitching/rolling
    flat_orientation_l2 = RewTerm(func=mdp.flat_orientation_l2, weight=0.0)  # keep the base parallel to the ground
    base_height_l2 = RewTerm(                                             # maintain the base height above the ground
        func=mdp.base_height_l2,
        weight=0.0,
        params={
            "asset_cfg": SceneEntityCfg("robot", body_names=""),
            "sensor_cfg": SceneEntityCfg("height_scanner_base"),
            "target_height": 0.0,
        },
    )
    body_lin_acc_l2 = RewTerm(                            # penalize linear acceleration to avoid violent motions
        func=mdp.body_lin_acc_l2,
        weight=0.0,
        params={"asset_cfg": SceneEntityCfg("robot", body_names="")},
    )

    # Joint penalties
    joint_torques_l2 = RewTerm(
        func=mdp.joint_torques_l2, weight=0.0, params={"asset_cfg": SceneEntityCfg("robot", joint_names=".*")}
    )
    joint_vel_l2 = RewTerm(
        func=mdp.joint_vel_l2, weight=0.0, params={"asset_cfg": SceneEntityCfg("robot", joint_names=".*")}
    )
    # ... (omitted)

    # Action penalties
    applied_torque_limits = RewTerm(
        func=mdp.applied_torque_limits,
        weight=0.0,
        params={"asset_cfg": SceneEntityCfg("robot", joint_names=".*")},
    )
    action_rate_l2 = RewTerm(func=mdp.action_rate_l2, weight=0.0)
    # ... (omitted)
    
    # Velocity-tracking rewards
    track_lin_vel_xy_exp = RewTerm(
        func=mdp.track_lin_vel_xy_exp, weight=0.0, params={"command_name": "base_velocity", "std": math.sqrt(0.5)}
    )
    track_ang_vel_z_exp = RewTerm(
        func=mdp.track_ang_vel_z_exp, weight=0.0, params={"command_name": "base_velocity", "std": math.sqrt(0.5)}
    )

    # Others
    feet_slide = RewTerm(
        func=mdp.feet_slide,
        weight=0.0,
        params={
            "sensor_cfg": SceneEntityCfg("contact_forces", body_names=""),
            "asset_cfg": SceneEntityCfg("robot", body_names=""),
        },
    )

    stand_still = RewTerm(
        func=mdp.stand_still_joint_deviation_l1,
        weight=0,
        params={
            "command_name": "base_velocity",
            "asset_cfg": SceneEntityCfg("robot", joint_names=""),
        },
    )  # negative

    feet_height = RewTerm(
        func=mdp.feet_height,
        weight=0.0,
        params={
            "asset_cfg": SceneEntityCfg("robot", body_names=""),
            "tanh_mult": 2.0,
            "target_height": 0.05,
            "command_name": "base_velocity",
        },
    )

    feet_height_body = RewTerm(
        func=mdp.feet_height_body,
        weight=0.0,
        params={
            "asset_cfg": SceneEntityCfg("robot", body_names=""),
            "tanh_mult": 2.0,
            "target_height": -0.3,
            "command_name": "base_velocity",
        },
    )

    feet_distance_y_exp = RewTerm(
        func=mdp.feet_distance_y_exp,
        weight=0.0,
        params={
            "std": math.sqrt(0.25),
            "asset_cfg": SceneEntityCfg("robot", body_names=""),
            "stance_width": float,
        },
    )

# ... (omitted)


@configclass
class TerminationsCfg:
    """Termination terms for the MDP."""

    # MDP terminations
    time_out = DoneTerm(func=mdp.time_out, time_out=True)
    # command_resample
    terrain_out_of_bounds = DoneTerm(
        func=mdp.terrain_out_of_bounds,
        params={"asset_cfg": SceneEntityCfg("robot"), "distance_buffer": 3.0},
        time_out=True,
    )

    # Contact sensor
    illegal_contact = DoneTerm(
        func=mdp.illegal_contact,
        params={"sensor_cfg": SceneEntityCfg("contact_forces", body_names=""), "threshold": 1.0},
    )

    bad_orientation_2 = DoneTerm(func=mdp.bad_orientation_2)
    


@configclass
class CurriculumCfg:
    """Curriculum terms for the MDP."""

    terrain_levels = CurrTerm(func=mdp.terrain_levels_vel)

    command_levels = CurrTerm(
        func=mdp.command_levels_vel,
        params={
            "reward_term_name": "track_lin_vel_xy_exp",
            "range_multiplier": (0.1, 1.0),
        },
    )


##
# Environment configuration: combine everything together.
##


@configclass
class LocomotionVelocityRoughEnvCfg(ManagerBasedRLEnvCfg):
    """Configuration for the locomotion velocity-tracking environment."""

    # Scene settings
    scene: MySceneCfg = MySceneCfg(num_envs=4096, env_spacing=2.5)
    # Basic settings
    observations: ObservationsCfg = ObservationsCfg()
    actions: ActionsCfg = ActionsCfg()
    commands: CommandsCfg = CommandsCfg()
    # MDP settings (the reinforcement-learning part)
    rewards: RewardsCfg = RewardsCfg()
    terminations: TerminationsCfg = TerminationsCfg()
    events: EventCfg = EventCfg()
    curriculum: CurriculumCfg = CurriculumCfg()

    def __post_init__(self):
        """Post initialization."""
        # general settings
        self.decimation = 4
        self.episode_length_s = 20.0
        # simulation settings
        self.sim.dt = 0.005
        self.sim.render_interval = self.decimation
        self.sim.physics_material = self.scene.terrain.physics_material
        self.sim.physx.gpu_max_rigid_patch_count = 10 * 2**15
        self.sim.physx.max_position_iteration_count = 4
        self.sim.physx.max_velocity_iteration_count = 1
        # update sensor update periods
        # we tick all the sensors based on the smallest update period (physics update period)
        if self.scene.height_scanner is not None:
            self.scene.height_scanner.update_period = self.decimation * self.sim.dt
        if self.scene.contact_forces is not None:
            self.scene.contact_forces.update_period = self.sim.dt

        # check if terrain levels curriculum is enabled - if so, enable curriculum for terrain generator
        # this generates terrains with increasing difficulty and is useful for training
        if getattr(self.curriculum, "terrain_levels", None) is not None:
            if self.scene.terrain.terrain_generator is not None:
                self.scene.terrain.terrain_generator.curriculum = True
        else:
            if self.scene.terrain.terrain_generator is not None:
                self.scene.terrain.terrain_generator.curriculum = False

    def disable_zero_weight_rewards(self):
        """If the weight of rewards is 0, set rewards to None"""
        for attr in dir(self.rewards):
            if not attr.startswith("__"):
                reward_attr = getattr(self.rewards, attr)
                if not callable(reward_attr) and reward_attr.weight == 0:
                    setattr(self.rewards, attr, None)


def create_obsgroup_class(class_name, terms, enable_corruption=False, concatenate_terms=True):
    """
    Dynamically create and register a ObsGroup class based on the given configuration terms.

    :param class_name: Name of the configuration class.
    :param terms: Configuration terms, a dictionary where keys are term names and values are term content.
    :param enable_corruption: Whether to enable corruption for the observation group. Defaults to False.
    :param concatenate_terms: Whether to concatenate the observation terms in the group. Defaults to True.
    :return: The dynamically created class.
    """
    # Dynamically determine the module name
    module_name = inspect.getmodule(inspect.currentframe()).__name__

    # Define the post-init function
    def post_init_wrapper(self):
        setattr(self, "enable_corruption", enable_corruption)
        setattr(self, "concatenate_terms", concatenate_terms)

    # Dynamically create the class using ObsGroup as the base class
    terms["__post_init__"] = post_init_wrapper
    dynamic_class = configclass(type(class_name, (ObsGroup,), terms))

    # Custom serialization and deserialization
    def __getstate__(self):
        state = self.__dict__.copy()
        return state

    def __setstate__(self, state):
        self.__dict__.update(state)

    # Add custom serialization methods to the class
    dynamic_class.__getstate__ = __getstate__
    dynamic_class.__setstate__ = __setstate__

    # Place the class in the global namespace for accessibility
    globals()[class_name] = dynamic_class

    # Register the dynamic class in the module's dictionary
    if module_name in sys.modules:
        sys.modules[module_name].__dict__[class_name] = dynamic_class
    else:
        raise ImportError(f"Module {module_name} not found.")

    # Return the class for external instantiation
    return dynamic_class
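
The factory above is presumably used to assemble observation groups dynamically for robot-specific configs. A minimal usage sketch (the term names are illustrative, not taken from the repo):

PolicyObsCfg = create_obsgroup_class(
    "PolicyObsCfg",
    {
        "base_ang_vel": ObsTerm(func=mdp.base_ang_vel, noise=Unoise(n_min=-0.2, n_max=0.2)),
        "joint_pos": ObsTerm(func=mdp.joint_pos_rel),
    },
    enable_corruption=True,
)
policy_obs = PolicyObsCfg()  # behaves like a hand-written ObsGroup subclass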

velocity\config\quadruped\deeprobotics_lite3\agents\rsl_rl_ppo_cfg.py

This file configures the hyperparameters of the core RL algorithm, PPO.

It provides the agent hyperparameter configuration classes that are ultimately registered with gym in velocity\config\quadruped\deeprobotics_lite3\__init__.py.

rsl_rl_ppo_cfg.py
from isaaclab.utils import configclass
from isaaclab_rl.rsl_rl import RslRlOnPolicyRunnerCfg, RslRlPpoActorCriticCfg, RslRlPpoAlgorithmCfg


@configclass
class DeeproboticsLite3RoughPPORunnerCfg(RslRlOnPolicyRunnerCfg):
    num_steps_per_env = 24
    max_iterations = 10000
    save_interval = 100
    experiment_name = "deeprobotics_lite3_rough"
    empirical_normalization = False
    clip_actions = 100
    policy = RslRlPpoActorCriticCfg(
        init_noise_std=1.0,
        noise_std_type="log",
        actor_hidden_dims=[512, 256, 128],
        critic_hidden_dims=[512, 256, 128],
        activation="elu",
    )
    algorithm = RslRlPpoAlgorithmCfg(
        value_loss_coef=1.0,
        use_clipped_value_loss=True,
        clip_param=0.2,
        entropy_coef=0.01,
        num_learning_epochs=5,
        num_mini_batches=4,
        learning_rate=1.0e-3,
        schedule="adaptive",
        gamma=0.99,
        lam=0.95,
        desired_kl=0.01,
        max_grad_norm=1.0,
    )


@configclass
class DeeproboticsLite3FlatPPORunnerCfg(DeeproboticsLite3RoughPPORunnerCfg):
    def __post_init__(self):
        super().__post_init__()

        self.max_iterations = 10000
        self.experiment_name = "deeprobotics_lite3_flat"

velocity\config\quadruped\deeprobotics_lite3\rough_env_cfg.py

This file is the actual, final configuration of the Lite3 training environment. It inherits the reward and other interfaces already set up in the velocity_env_cfg.py base class, assigns a weight to every reward/penalty term, and loads the robot defined earlier in rl_training/assets/deeprobotics.py.

When debugging, this is the main file where you tune parameters!

It is also the environment configuration class registered with gym in velocity\config\quadruped\deeprobotics_lite3\__init__.py.

rough_env_cfg.py
# Copyright (c) 2025 Deep Robotics
# SPDX-License-Identifier: BSD 3-Clause

# Copyright (c) 2024-2025 Ziqi Fan
# SPDX-License-Identifier: Apache-2.0

from isaaclab.utils import configclass

from rl_training.tasks.manager_based.locomotion.velocity.velocity_env_cfg import LocomotionVelocityRoughEnvCfg
# from isaaclab.sensors.ray_caster import GridPatternCfg
##
# Pre-defined configs
##
from rl_training.assets.deeprobotics import DEEPROBOTICS_LITE3_CFG  # isort: skip


@configclass
class DeeproboticsLite3RoughEnvCfg(LocomotionVelocityRoughEnvCfg):
    base_link_name = "TORSO"
    foot_link_name = ".*_FOOT"
    # fmt: off
    joint_names = [
        "FL_HipX_joint", "FL_HipY_joint", "FL_Knee_joint",
        "FR_HipX_joint", "FR_HipY_joint", "FR_Knee_joint",
        "HL_HipX_joint", "HL_HipY_joint", "HL_Knee_joint",
        "HR_HipX_joint", "HR_HipY_joint", "HR_Knee_joint",
    ]

    link_names = [
       'TORSO', 
       'FL_HIP', 'FR_HIP', 'HL_HIP', 'HR_HIP', 
       'FL_THIGH', 'FR_THIGH', 'HL_THIGH', 'HR_THIGH', 
       'FL_SHANK', 'FR_SHANK', 'HL_SHANK', 'HR_SHANK', 
       'FL_FOOT', 'FR_FOOT', 'HL_FOOT', 'HR_FOOT',
    ]
    # fmt: on

    def __post_init__(self):   # post-init: override the inherited default settings
        # post init of parent
        super().__post_init__()

        # ------------------------------Scene------------------------------
        self.scene.robot = DEEPROBOTICS_LITE3_CFG.replace(prim_path="{ENV_REGEX_NS}/Robot")
        self.scene.height_scanner.prim_path = "{ENV_REGEX_NS}/Robot/" + self.base_link_name
        self.scene.height_scanner_base.prim_path = "{ENV_REGEX_NS}/Robot/" + self.base_link_name
        self.scene.height_scanner.pattern_cfg.resolution = 0.07 #  = GridPatternCfg(resolution=0.07, size=[1.6, 1.0]),

        # ------------------------------Observations------------------------------
        self.observations.policy.base_lin_vel = None # type: ignore
        self.observations.policy.height_scan = None # type: ignore
        self.observations.policy.base_ang_vel.scale = 0.25
        self.observations.policy.joint_pos.scale = 1.0
        self.observations.policy.joint_vel.scale = 0.05
        self.observations.policy.joint_pos.params["asset_cfg"].joint_names = self.joint_names
        self.observations.policy.joint_vel.params["asset_cfg"].joint_names = self.joint_names

        # ------------------------------Actions------------------------------
        # reduce action scale
        self.actions.joint_pos.scale = {".*_HipX_joint": 0.125, "^(?!.*_HipX_joint).*": 0.25}
        self.actions.joint_pos.clip = {".*": (-100.0, 100.0)}
        self.actions.joint_pos.joint_names = self.joint_names

        # ------------------------------Events------------------------------
        self.events.randomize_reset_base.params = {
            "pose_range": {
                "x": (-1.0, 1.0),
                "y": (-1.0, 1.0),
                "z": (0.0, 0.0),
                "roll": (-0.3, 0.3),
                "pitch": (-0.3, 0.3),
                "yaw": (-3.14, 3.14),
            },
            "velocity_range": {
                "x": (-0.2, 0.2),
                "y": (-0.2, 0.2),
                "z": (-0.2, 0.2),
                "roll": (-0.05, 0.05),
                "pitch": (-0.05, 0.05),
                "yaw": (-0.0, 0.0),
            },
        }

        # set the concrete parameters and weights
        
        self.events.randomize_rigid_body_mass.params["asset_cfg"].body_names = self.link_names # [self.base_link_name]
        self.events.randomize_rigid_body_mass_base = None
        self.events.randomize_com_positions.params["asset_cfg"].body_names = self.base_link_name # [self.base_link_name]
        # self.events.randomize_com_positions = None
        self.events.randomize_apply_external_force_torque = None
        self.events.randomize_push_robot = None
        self.events.randomize_actuator_gains.params["asset_cfg"].joint_names = self.joint_names

        # set terrain generation probability to 0 for boxes and stairs
        self.scene.terrain.terrain_generator.sub_terrains["random_rough"].proportion = 0.4
        self.scene.terrain.terrain_generator.sub_terrains["hf_pyramid_slope"].proportion = 0.3
        self.scene.terrain.terrain_generator.sub_terrains["hf_pyramid_slope_inv"].proportion = 0.3
        self.scene.terrain.terrain_generator.sub_terrains["boxes"].proportion = 0.0
        self.scene.terrain.terrain_generator.sub_terrains["pyramid_stairs"].proportion = 0.0
        self.scene.terrain.terrain_generator.sub_terrains["pyramid_stairs_inv"].proportion = 0.0
        # scale down the terrains because the robot is small
        # self.scene.terrain.terrain_generator.sub_terrains["boxes"].grid_height_range = (0.025, 0.1)
        # self.scene.terrain.terrain_generator.sub_terrains["boxes"].grid_width = 0.8
        self.scene.terrain.terrain_generator.sub_terrains["random_rough"].noise_range = (0.01, 0.06)
        self.scene.terrain.terrain_generator.sub_terrains["random_rough"].noise_step = 0.01

        # ------------------------------Rewards------------------------------
        self.rewards.action_rate_l2.weight = -0.02 #-0.02
        # self.rewards.smoothness_2.weight = -0.0075

        self.rewards.base_height_l2.weight = -10.0
        self.rewards.base_height_l2.params["target_height"] = 0.35
        self.rewards.base_height_l2.params["asset_cfg"].body_names = [self.base_link_name]

        self.rewards.feet_air_time.weight = 5.0 # 5.0
        self.rewards.feet_air_time.params["threshold"] = 0.5
        self.rewards.feet_air_time.params["sensor_cfg"].body_names = [self.foot_link_name]
        self.rewards.feet_air_time_variance.weight = -8.0 # -8.0
        self.rewards.feet_air_time_variance.params["sensor_cfg"].body_names = [self.foot_link_name]
        self.rewards.feet_slide.weight = -0.05
        self.rewards.feet_slide.params["sensor_cfg"].body_names = [self.foot_link_name]
        self.rewards.feet_slide.params["asset_cfg"].body_names = [self.foot_link_name]
        self.rewards.stand_still.weight = -0.5 # -1.0
        self.rewards.stand_still.params["asset_cfg"].joint_names = self.joint_names
        self.rewards.stand_still.params["command_threshold"] = 0.1
        self.rewards.feet_height_body.weight = -2.5 # -2.5
        self.rewards.feet_height_body.params["target_height"] = -0.35
        self.rewards.feet_height_body.params["asset_cfg"].body_names = [self.foot_link_name]
        self.rewards.feet_height.weight = -0.2 # -0.2
        self.rewards.feet_height.params["asset_cfg"].body_names = [self.foot_link_name]
        self.rewards.feet_height.params["target_height"] = 0.05
        self.rewards.contact_forces.weight = -1e-1 # -2e-2
        self.rewards.contact_forces.params["sensor_cfg"].body_names = [self.foot_link_name]

        self.rewards.lin_vel_z_l2.weight = -2.0 #-2.0
        self.rewards.ang_vel_xy_l2.weight = -0.05 # -0.05

        self.rewards.track_lin_vel_xy_exp.weight = 3.0
        self.rewards.track_ang_vel_z_exp.weight = 1.5

        self.rewards.undesired_contacts.weight = -0.5
        self.rewards.undesired_contacts.params["sensor_cfg"].body_names = [f"^(?!.*{self.foot_link_name}).*"]

        self.rewards.joint_torques_l2.weight = -2.5e-5
        self.rewards.joint_acc_l2.weight = -1e-8
        self.rewards.joint_deviation_l1.weight = -0.5
        self.rewards.joint_deviation_l1.params["asset_cfg"].joint_names = [".*HipX.*"]
        self.rewards.joint_power.weight = -2e-5
        self.rewards.flat_orientation_l2.weight = -5.0

        # add the following rewards to improve the gait
        self.rewards.feet_gait.weight = 0.5
        self.rewards.feet_gait.params["synced_feet_pair_names"] = [
            ["FL_FOOT", "HR_FOOT"],
            ["FR_FOOT", "HL_FOOT"]
        ]

        self.rewards.joint_mirror.weight = -0.05
        self.rewards.joint_mirror.params["mirror_joints"] = [
            ["FL_(HipX|HipY|Knee).*", "HR_(HipX|HipY|Knee).*"],
            ["FR_(HipX|HipY|Knee).*", "HL_(HipX|HipY|Knee).*"],
        ]

        self.rewards.joint_pos_limits.weight = -5.0
        # self.rewards.joint_pos_penalty.weight = -1.0
        self.rewards.feet_contact_without_cmd.weight = 0.1
        self.rewards.feet_contact_without_cmd.params["sensor_cfg"].body_names = [self.foot_link_name]


        # If the weight of rewards is 0, set rewards to None
        if self.__class__.__name__ == "DeeproboticsLite3RoughEnvCfg":
            self.disable_zero_weight_rewards()

        # ------------------------------Terminations------------------------------
        self.terminations.illegal_contact = None
        # self.terminations.bad_orientation_2 = None

        # ------------------------------Curriculums------------------------------
        # self.curriculum.command_levels.params["range_multiplier"] = (0.2, 1.0)
        self.curriculum.command_levels = None

        # ------------------------------Commands------------------------------
        self.commands.base_velocity.ranges.lin_vel_x = (-1.5, 1.5)
        self.commands.base_velocity.ranges.lin_vel_y = (-0.8, 0.8)
        self.commands.base_velocity.ranges.ang_vel_z = (-0.8, 0.8)
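
The accompanying flat_env_cfg.py is not shown in this post. In Isaac Lab projects of this kind it typically just inherits the rough config and swaps the terrain for a plane; a hedged sketch of what it usually looks like (the attribute choices are assumptions, only the class name DeeproboticsLite3FlatEnvCfg is taken from the registration below):

from isaaclab.utils import configclass

from .rough_env_cfg import DeeproboticsLite3RoughEnvCfg


@configclass
class DeeproboticsLite3FlatEnvCfg(DeeproboticsLite3RoughEnvCfg):
    def __post_init__(self):
        super().__post_init__()
        # replace the generated rough terrain with a simple plane
        self.scene.terrain.terrain_type = "plane"
        self.scene.terrain.terrain_generator = None
        # no terrain curriculum is needed on flat ground
        self.curriculum.terrain_levels = None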

velocity\config\quadruped\deeprobotics_lite3\__init__.py

This __init__.py is different from the other __init__.py files, which merely mark a folder as a Python module (plus a few imports). This one defines gym environments: it is the registration entry point for Gym environments in an Isaac Lab project, i.e., the task module's initialization file. It registers the custom robot RL environments with Gymnasium (Gym) so they can be created with the standard gym.make("Env-ID") call.

A registered gym environment consists of:

1. The environment ID (chosen by you, within certain naming conventions)

2. entry_point="isaaclab.envs:ManagerBasedRLEnv" (for manager-based tasks the entry point is always this; in direct mode it is different)

3. disable_env_checker=True

4. kwargs: usually the env_cfg configuration class plus the agent configuration class (a specific algorithm from a specific framework, e.g., PPO from rsl-rl)

__init__.py
import gymnasium as gym
from . import agents
##
# Register Gym environments.
##
gym.register(
    id="Flat-Deeprobotics-Lite3-v0",
    entry_point="isaaclab.envs:ManagerBasedRLEnv",
    disable_env_checker=True,
    kwargs={
        "env_cfg_entry_point": f"{__name__}.flat_env_cfg:DeeproboticsLite3FlatEnvCfg",
        
        "rsl_rl_cfg_entry_point": f"{agents.__name__}.rsl_rl_ppo_cfg:DeeproboticsLite3FlatPPORunnerCfg", 
        # rsl_rl_ppo_cfg is the rsl_rl_ppo_cfg.py file analyzed in the previous section; DeeproboticsLite3FlatPPORunnerCfg is one of the hyperparameter configuration classes defined in that file
        
        "cusrl_cfg_entry_point": f"{agents.__name__}.cusrl_ppo_cfg:DeeproboticsLite3FlatTrainerCfg", 
        # configuration for a different framework, to stay compatible with another RL algorithm library
    },
)

gym.register(
    id="Rough-Deeprobotics-Lite3-v0",
    entry_point="isaaclab.envs:ManagerBasedRLEnv",
    disable_env_checker=True,
    kwargs={
        "env_cfg_entry_point": f"{__name__}.rough_env_cfg:DeeproboticsLite3RoughEnvCfg",
        "rsl_rl_cfg_entry_point": f"{agents.__name__}.rsl_rl_ppo_cfg:DeeproboticsLite3RoughPPORunnerCfg",
        "cusrl_cfg_entry_point": f"{agents.__name__}.cusrl_ppo_cfg:DeeproboticsLite3RoughTrainerCfg",
    },
)
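
Once this module is imported (which is what the import rl_training.tasks line in train.py does), either environment can be created by ID. A minimal sketch (the import path is inferred from the tree above; it assumes the Omniverse app has already been launched via AppLauncher, as in train.py):

import gymnasium as gym

import rl_training.tasks  # noqa: F401  (runs the gym.register calls above)
from rl_training.tasks.manager_based.locomotion.velocity.config.quadruped.deeprobotics_lite3.rough_env_cfg import (
    DeeproboticsLite3RoughEnvCfg,
)

env_cfg = DeeproboticsLite3RoughEnvCfg()
env_cfg.scene.num_envs = 64  # smaller than the default 4096, for a quick test
env = gym.make("Rough-Deeprobotics-Lite3-v0", cfg=env_cfg)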

Tying the development workflow together

All of Isaac Lab is object-oriented. The basic pattern is always the same: first fill in the configuration class (cfg), then hand that cfg to the actual runtime class.

1️⃣ Define the custom mdp (rewards, commands, observations, events, curriculums);

2️⃣ In rl_training/assets/deeprobotics.py, write the robot's USD path, physical properties, actuator configuration, etc., and capture it all in a variable DEEPROBOTICS_LITE3_CFG;

3️⃣ In the generic base class velocity_env_cfg.py, call functions from the official mdp and the custom mdp to build a base class that contains the scene, the MDP configuration, the simulation configuration, and so on;

4️⃣ In the robot-specific rough_env_cfg.py for Lite3 under the config folder: (1) inherit the velocity_env_cfg base class and fill in parameters and weights (this is where you tune later), and (2) load the robot defined in deeprobotics.py;

5️⃣ Define the agent hyperparameters in lite3\agents\rsl_rl_ppo_cfg.py;

6️⃣ Register the gym environment in deeprobotics_lite3\__init__.py, with the kwargs pointing at the class names from rough_env_cfg.py and rsl_rl_ppo_cfg.py;

7️⃣ In the scripts, call the registered gym environment and the rsl-rl library and write the training script (a typical command is shown below).
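
Putting the steps together, a typical training run (the task ID comes from the registration above; --num_envs and --headless are standard train.py / AppLauncher flags) might look like:

python scripts/reinforcement_learning/rsl_rl/train.py --task Rough-Deeprobotics-Lite3-v0 --num_envs 4096 --headless

and the latest checkpoint can then be played back with something like scripts/reinforcement_learning/rsl_rl/play.py --task Rough-Deeprobotics-Lite3-v0.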

Additional notes

Ways to run

Besides registering the environment with gym and then having a script go through gym to train on our registered environment, the official docs also provide another way of running things. It is closer to plain gymnasium usage and is friendlier to beginners.

Creating a Manager-Based RL Environment — Isaac Lab documentation

The env_cfg.py is essentially the same. The difference is that there is no separate training script; instead a run_env.py sits in the same directory as env_cfg.py.

The code of run_cartpole_rl_env.py is shown below:

"""Launch Isaac Sim Simulator first."""  # start the simulation app

import argparse

from isaaclab.app import AppLauncher

# add argparse arguments
parser = argparse.ArgumentParser(description="Tutorial on running the cartpole RL environment.")
parser.add_argument("--num_envs", type=int, default=16, help="Number of environments to spawn.")

# append AppLauncher cli args  
AppLauncher.add_app_launcher_args(parser)
# parse the arguments
args_cli = parser.parse_args()

# launch omniverse app
app_launcher = AppLauncher(args_cli)
simulation_app = app_launcher.app

"""Rest everything follows."""

import torch

from isaaclab.envs import ManagerBasedRLEnv

from isaaclab_tasks.manager_based.classic.cartpole.cartpole_env_cfg import CartpoleEnvCfg


def main():
    """Main function."""
    # create environment configuration (build the env config from env_cfg.py)
    env_cfg = CartpoleEnvCfg()
    env_cfg.scene.num_envs = args_cli.num_envs
    env_cfg.sim.device = args_cli.device
    # setup RL environment (create the environment from the config)
    env = ManagerBasedRLEnv(cfg=env_cfg)

    # simulate physics (step the sim and reset periodically)
    count = 0
    while simulation_app.is_running():
        with torch.inference_mode():
            # reset
            if count % 300 == 0:
                count = 0
                env.reset()
                print("-" * 80)
                print("[INFO]: Resetting environment...")
            # sample random actions
            joint_efforts = torch.randn_like(env.action_manager.action)
            # step the environment
            obs, rew, terminated, truncated, info = env.step(joint_efforts)
            # print current orientation of pole
            print("[Env 0]: Pole joint: ", obs["policy"][0][1].item())
            # update counter
            count += 1

    # close the environment
    env.close()


if __name__ == "__main__":
    # run the main function
    main()
    # close sim app
    simulation_app.close()

Reward function design

Official documentation:

Exploring the RL problem — Isaac Lab documentation

Designing the reward function is a key step, and there are many ways to approach it. For this DeepRobotics project (a quadruped), at the concrete level the team defined reward interfaces such as the distance between the feet and the smoothness of motion (all of which can be assembled from basic physical quantities: linear velocity, joint linear/angular velocities, foot sensors, height sensors, and so on). At a higher level, designing good reward functions also takes some feel for linear algebra; the worked examples in the official documentation illustrate this vividly. (figures omitted)
