
G1-23dof CPG-Flat Task Training Configuration Specification

2026-04-12 · 6 min read

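Document version: V1.0
Robot platform: Unitree G1-23dof
Simulation framework: IsaacLab 2.3.0 + IsaacSim 5.1.0
Training algorithm: RSL-RL PPO
Date written: 2026-04-12
Task ID: Unitree-G1-23dof-CPG-Flat
Related document: G1-23dof Velocity Task Training Configuration Specification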

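1. Task Overview

1.1 Task Goal

The CPG-Flat task builds on the Velocity baseline by adding a Central Pattern Generator (CPG) oscillator module. The CPG generates a periodic reference gait, and the policy learns a residual correction that is added on top of it.

Core goal:

Use a biologically inspired CPG to produce a structured, periodic gait, which narrows the policy's exploration space and improves gait stability and naturalness.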

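1.2 Differences from the Velocity Task

Aspect	Velocity	CPG-Flat
Action interface	Direct joint positions	CPG reference + residual
Gait generation	Learned entirely by the policy	CPG-enforced rhythm + policy correction
Observation space	78D	82D (+ 4D CPG phase)
Terrain	Mixed terrain	Flat only
Policy complexity	Must learn the gait rhythm	Focuses on learning the residual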

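1.3 Task Characteristics

Property	Description
Task type	CPG-guided velocity tracking
Terrain	Flat only
CPG period	0.8 s per gait cycle
Episode length	20 s
Parallel environments	4096 (training) / 32 (inference)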

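2. CPG Oscillator Design

2.1 Biological Background

A Central Pattern Generator is the neural circuitry in biological nervous systems that produces rhythmic movements such as walking, running, and breathing. In mammals, the CPGs for locomotion are located in the spinal cord and can produce periodic muscle-activation patterns autonomously, without high-level control signals.

This project formalizes the CPG concept as a phase oscillator that drives the joints along periodic trajectories.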

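2.2 Mathematical Model

The CPG is implemented as a phase oscillator. Its core variable is a per-environment phase $\phi \in [0, 1)$:

# Phase update (once per control step)
phi = (phi + dt / period) % 1.0

# Left and right legs are half a cycle apart (alternating gait)
phi_left = phi
phi_right = (phi + 0.5) % 1.0

Joint offsets are computed with a sine function: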

offset[joint] = amplitude[joint] × sin(2π × (phi + joint_phase_offset[joint]))
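
The update rule and offset formula above can be tried out in plain Python. This is a minimal sketch, not the repository's `HumanoidCPG` module: the helper names are hypothetical, and the 0.02 s control step is an assumed value (the 0.8 s period is from this spec).

```python
import math

def step_phase(phi: float, dt: float, period: float) -> float:
    """Advance the shared phase by one control step and wrap it into [0, 1)."""
    return (phi + dt / period) % 1.0

def leg_phases(phi: float, left_right_offset: float = 0.5) -> tuple[float, float]:
    """Return (phi_left, phi_right); the right leg is shifted by the offset."""
    return phi, (phi + left_right_offset) % 1.0

def sine_offset(amplitude: float, phi: float, joint_phase_offset: float = 0.0) -> float:
    """Joint offset in radians for one joint at phase phi."""
    return amplitude * math.sin(2.0 * math.pi * (phi + joint_phase_offset))

# Assumed: dt = 0.02 s (50 Hz control). period = 0.8 s comes from the spec.
phi = 0.0
for _ in range(10):  # 10 steps * 0.02 s = 0.2 s, a quarter of the cycle
    phi = step_phase(phi, dt=0.02, period=0.8)

phi_left, phi_right = leg_phases(phi)
hip_left = sine_offset(0.25, phi_left)    # left hip at its positive peak
hip_right = sine_offset(0.25, phi_right)  # right hip at its negative peak
```

After a quarter cycle the two hips sit at opposite extremes, which is what the 0.5 phase offset between the legs is meant to produce.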

2.3 Joint Drive Configuration

Joint	Waveform	Amplitude	Phase offset
`hip_pitch`	sin	0.25 rad	0 or 0.5
`knee`	ReLU(sin)	0.30 rad	0 or 0.5 (active only during swing)
`ankle_pitch`	-sin	0.15 rad	0 or 0.5 (compensates the hip)
`hip_roll`, `hip_yaw`, `ankle_roll`	None	0	-
Arms / waist	None	0	-
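
The three waveforms in the table can be sketched for a single leg as follows. The function name `leg_offsets` is hypothetical, and the sketch only shows the waveform shapes: how the positive half of the sine lines up with the swing phase depends on the per-joint phase convention, which this spec does not spell out.

```python
import math

# Amplitudes in radians, taken from the table above.
HIP_AMP, KNEE_AMP, ANKLE_AMP = 0.25, 0.30, 0.15

def leg_offsets(phi_leg: float) -> dict[str, float]:
    """CPG offsets (rad) for the driven joints of one leg at phase phi_leg."""
    s = math.sin(2.0 * math.pi * phi_leg)
    return {
        "hip_pitch": HIP_AMP * s,          # plain sine
        "knee": KNEE_AMP * max(s, 0.0),    # ReLU(sin): zero on the negative half
        "ankle_pitch": -ANKLE_AMP * s,     # inverted sine, opposes the hip
    }

left = leg_offsets(0.25)   # sine at +1
right = leg_offsets(0.75)  # sine at -1
```

With the legs half a cycle apart, one knee is fully flexed by the CPG while the other receives no CPG offset at all; undriven joints (hip roll/yaw, ankle roll, arms, waist) are simply left at zero.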

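2.4 CPG State Observation (4D)

def get_state(self):
    # Encode each leg's phase as (sin, cos) of the phase angle 2*pi*phi.
    # phi_L / phi_R are the left/right leg phases in [0, 1) from section 2.2.
    ang_L, ang_R = 2 * torch.pi * phi_L, 2 * torch.pi * phi_R
    return torch.stack([torch.sin(ang_L), torch.cos(ang_L), torch.sin(ang_R), torch.cos(ang_R)], dim=-1)

The phase is encoded as a sin/cos pair rather than a single scalar so that the observation stays continuous where the phase wraps from 1 back to 0.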
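
A quick numerical check makes the continuity argument concrete. In this sketch (helper names are hypothetical) two phases on either side of the wrap-around are far apart as raw scalars but close together once encoded.

```python
import math

def encode_phase(phi: float) -> tuple[float, float]:
    """Map a phase in [0, 1) to a point on the unit circle."""
    angle = 2.0 * math.pi * phi
    return math.sin(angle), math.cos(angle)

def cpg_state(phi: float, left_right_offset: float = 0.5) -> list[float]:
    """4D state: [sin L, cos L, sin R, cos R]."""
    phi_right = (phi + left_right_offset) % 1.0
    return [*encode_phase(phi), *encode_phase(phi_right)]

# Phases 0.99 and 0.01 are neighbours on the cycle (0.02 apart across the wrap).
raw_gap = abs(0.99 - 0.01)                                        # large jump
encoded_gap = math.dist(encode_phase(0.99), encode_phase(0.01))   # small distance

state = cpg_state(0.25)
```

The raw scalar jumps by almost a full unit at the wrap, while the encoded points differ by roughly 0.13, so the network sees a smooth input across the cycle boundary.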

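3. CPG-Residual Action Interface

3.1 Principle

final joint command = default pose + CPG reference offset + scale × residual action
                      default_pose + cpg_offset           + 0.25  × residual

The policy network outputs the residual action and the CPG supplies the periodic reference trajectory; the two are summed to form the full joint position command.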
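
As a worked example of this sum, the sketch below combines the three contributions for three joints. The default pose and residual values are made up for illustration; only the 0.25 scale and the CPG amplitudes come from this spec.

```python
def joint_target(default_pose, cpg_offset, residual, scale=0.25):
    """default + CPG offset + scale * residual, element-wise per joint."""
    return [d + c + scale * r for d, c, r in zip(default_pose, cpg_offset, residual)]

# Illustrative values for (hip_pitch, knee, ankle_pitch) of one leg, in rad.
default_pose = [-0.10, 0.30, -0.20]   # hypothetical default joint angles
cpg_offset = [0.25, 0.30, -0.15]      # CPG at the positive peak of its sine
residual = [0.40, -0.20, 0.00]        # raw policy output, before scaling

target = joint_target(default_pose, cpg_offset, residual)
pure_cpg = joint_target(default_pose, cpg_offset, [0.0, 0.0, 0.0])
```

With a zero residual the command reduces to the default pose plus the CPG trajectory, which is the behaviour the residual penalty in section 5 pushes toward.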

3.2 Implementation Class

class CPGResidualJointPositionAction(JointPositionAction):
    def __init__(self, cfg, env):
        super().__init__(cfg, env)
        self._cpg = HumanoidCPG(
            num_envs=env.num_envs,
            num_joints=self._num_joints,
            joint_names=list(self._joint_names),
            period=cfg.cpg_period,
            hip_amplitude=cfg.cpg_hip_amplitude,
            knee_amplitude=cfg.cpg_knee_amplitude,
            ankle_amplitude=cfg.cpg_ankle_amplitude,
            left_right_offset=cfg.cpg_left_right_offset,
            device=env.device,
        )

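    def process_actions(self, actions):
        # Store the raw policy output (the residual) before combining it
        self._raw_actions[:] = actions

        # 1. Advance the CPG phase by one environment step
        self._cpg.step(self._env.step_dt)

        # 2. Compute the CPG joint offsets
        cpg_offsets = self._cpg.get_offsets()

        # 3. Combine: default + CPG + scale × residual
        self._processed_actions = (
            self._raw_actions * self._scale
            + self._offset
            + cpg_offsets
        )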

3.3 Configuration Parameters

CPGResidualJointPositionActionCfg(
    asset_name="robot",
    joint_names=[".*"],
    scale=0.25,                      # residual scale (same as Velocity)
    use_default_offset=True,

    # CPG parameters
    cpg_period=0.8,                  # gait period: 0.8 s
    cpg_hip_amplitude=0.25,          # hip amplitude (~14°)
    cpg_knee_amplitude=0.30,         # knee amplitude (~17°)
    cpg_ankle_amplitude=0.15,        # ankle compensation amplitude (~8.6°)
    cpg_left_right_offset=0.5,       # left/right phase offset (alternating gait)
)
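
The degree values quoted in the comments can be verified with a one-line conversion. The dictionary below is just a plain-Python mirror of the three amplitude fields, not the actual config class.

```python
import math

# Amplitudes in radians, as set in the config above.
amplitudes_rad = {
    "cpg_hip_amplitude": 0.25,
    "cpg_knee_amplitude": 0.30,
    "cpg_ankle_amplitude": 0.15,
}

# Convert to degrees, rounded to one decimal place.
amplitudes_deg = {name: round(math.degrees(a), 1) for name, a in amplitudes_rad.items()}
```

This gives about 14.3°, 17.2° and 8.6°, matching the rounded values in the comments.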

4. Environment Configuration (Differences from Velocity)

4.1 Terrain Configuration

FLAT_TERRAIN_CFG = TerrainGeneratorCfg(
    size=(8.0, 8.0),
    border_width=20.0,
    num_rows=9,
    num_cols=21,
    horizontal_scale=0.1,
    vertical_scale=0.005,
    slope_threshold=0.75,
    sub_terrains={
        "flat": MeshPlaneTerrainCfg(proportion=1.0),  # 100% flat ground
    },
)

Difference from Velocity: Velocity trains on mixed terrain, whereas CPG-Flat uses flat ground only so that the CPG module can be validated in isolation.

4.2 Observation Space (82D vs 78D)

Observation term	Dimension	New / changed
`base_ang_vel`	3D	-
`projected_gravity`	3D	-
`velocity_commands`	3D	-
`joint_pos_rel`	23D	-
`joint_vel_rel`	23D	-
`last_action`	23D	-
`cpg_phase`	4D	New (4D CPG phase encoding)

Critic observations: same as Velocity, 78D (privileged base_lin_vel)
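
The actor dimensions in the table can be summed as a sanity check. This is only bookkeeping over the numbers listed above, not code from the environment config.

```python
# Per-term actor observation sizes, copied from the table above.
ACTOR_OBS_DIMS = {
    "base_ang_vel": 3,
    "projected_gravity": 3,
    "velocity_commands": 3,
    "joint_pos_rel": 23,
    "joint_vel_rel": 23,
    "last_action": 23,
    "cpg_phase": 4,
}

cpg_flat_dim = sum(ACTOR_OBS_DIMS.values())
velocity_dim = cpg_flat_dim - ACTOR_OBS_DIMS["cpg_phase"]
```

The totals come out to 82 for CPG-Flat and 78 without the CPG phase term, in line with the section title.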

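5. Reward Design (CPG-Specific Terms)

5.1 Reward Groups Compared with Velocity

CPG-Flat adds two CPG-specific reward terms to the Velocity rewards. They complement the existing penalty terms, whose weights are unchanged (see the table of retained terms).

New reward terms

Term	Weight	Function	Physical meaning
`cpg_residual`	-0.1	`cpg_residual_penalty`	Penalizes large residuals, encouraging reliance on the CPG
`cpg_phase_alignment`	+0.3	`cpg_gait_phase_alignment`	Rewards foot contact that is in sync with the CPG phase

Retained Velocity reward terms

Group	Retained terms	Weight change
Task	`track_lin_vel_xy`, `track_ang_vel_z`, `alive`	Unchanged
Base	`base_linear_velocity`, `base_angular_velocity`	Unchanged
Joint	`joint_vel`, `joint_acc`, `action_rate`, `dof_pos_limits`, `energy`	Unchanged
Posture	`joint_deviation_arms/waists/legs`, `flat_orientation_l2`, `base_height`	Unchanged
Feet	`gait`, `feet_slide`, `feet_clearance`, `undesired_contacts`	Unchanged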

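5.2 Reward Terms in Detail

`cpg_residual_penalty`

def cpg_residual_penalty(env):
    # Squared L2 norm of the raw (unscaled) residual action, per environment.
    # Assumes the CPG-residual term is the only action term, so that
    # env.action_manager.action holds exactly the residual.
    return torch.sum(torch.square(env.action_manager.action), dim=1)

Design intent: the smaller the residual, the more the policy relies on the periodic reference generated by the CPG instead of learning the gait from scratch. As the residual approaches 0, the robot follows the pure CPG trajectory.

Tuning guidance:

Weight	Effect
-0.5 (too strong)	Suppresses learning; the policy cannot correct CPG errors
-0.1 (default)	Balanced; encourages use of the CPG while leaving room for correction
-0.01 (too weak)	The policy ignores the CPG and reverts to learning the gait from scratch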
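
To get a feel for the three weights, the sketch below evaluates the weighted penalty for one illustrative residual: all 23 joints at a raw value of 0.5. The residual value is made up, and any additional scaling applied by the training framework is left out.

```python
def cpg_residual_penalty(residual):
    """Sum of squared raw residuals for a single environment."""
    return sum(r * r for r in residual)

def weighted_penalty(residual, weight):
    """Contribution of the cpg_residual term before any framework-side scaling."""
    return weight * cpg_residual_penalty(residual)

residual = [0.5] * 23  # illustrative: every joint asks for a 0.5 raw residual

penalty = cpg_residual_penalty(residual)   # 23 * 0.25
by_weight = {w: weighted_penalty(residual, w) for w in (-0.5, -0.1, -0.01)}
```

For this residual the unweighted penalty is 5.75, so the term contributes about -2.9, -0.58 and -0.06 at the three weights. The penalty grows quadratically, so halving the residual cuts it by a factor of four.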

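`cpg_gait_phase_alignment`

def cpg_gait_phase_alignment(env, sensor_cfg, threshold=0.55):
    # Pseudocode: `cpg`, `agreement`, `stance`, and `contact` are placeholders.
    # Read the actual CPG phase
    phase = cpg.phase  # in [0, 1)
    phi_left = phase
    phi_right = (phase + 0.5) % 1.0

    # phase < threshold: the leg should be in stance (foot in contact)
    # phase >= threshold: the leg should be in swing (foot in the air)
    is_stance_left = phi_left < threshold
    is_stance_right = phi_right < threshold

    # Reward agreement between actual foot contact and the expected stance state
    reward = agreement(stance, contact)
    return reward

Difference from feet_gait:

Comparison	`feet_gait`	`cpg_gait_phase_alignment`
Phase source	Computed from episode_length_buf	Actual phase of the CPG oscillator
Synchronization	May drift out of sync with the actions	Strictly in sync with the CPG-driven actions
Role	Induces a periodic gait	Reinforces the timing dictated by the CPG

threshold=0.55 means that phases in [0, 0.55) are stance (55% of the cycle) and phases in [0.55, 1.0) are swing (45%).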
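
The stance/swing logic can be sketched for a single environment as follows. The form of the agreement score is an assumption (here: the fraction of legs whose contact state matches the expected state); the spec only says that matching contact is rewarded. Function names are hypothetical.

```python
def expected_stance(phase, threshold=0.55, left_right_offset=0.5):
    """Return (left_in_stance, right_in_stance) expected at the given CPG phase."""
    phi_left = phase % 1.0
    phi_right = (phase + left_right_offset) % 1.0
    return phi_left < threshold, phi_right < threshold

def alignment_reward(phase, contact_left, contact_right, threshold=0.55):
    """Assumed agreement score: fraction of legs matching the expected state."""
    stance_left, stance_right = expected_stance(phase, threshold)
    matches = (stance_left == contact_left) + (stance_right == contact_right)
    return matches / 2.0

single_support = expected_stance(0.20)   # left stance, right swing
double_support = expected_stance(0.52)   # both legs expected on the ground

good = alignment_reward(0.20, contact_left=True, contact_right=False)
bad = alignment_reward(0.20, contact_left=False, contact_right=True)
```

Because the threshold is slightly above 0.5, both legs are expected in stance for phases in [0, 0.05) and [0.5, 0.55), i.e. a double-support window covering 10% of the cycle.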

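6. Hyperparameters

6.1 CPG-Specific Hyperparameters

Hyperparameter	Default	Tuning range	Effect
`cpg_period`	0.8 s	0.5~1.5 s	Shorter period means higher cadence; 0.8 s ≈ 75 gait cycles/min (150 steps/min)
`cpg_hip_amplitude`	0.25 rad	0.15~0.35 rad	Larger amplitude lifts the leg higher
`cpg_knee_amplitude`	0.30 rad	0.20~0.40 rad	Amount of knee flexion
`cpg_ankle_amplitude`	0.15 rad	0.10~0.20 rad	Amount of ankle compensation
`cpg_left_right_offset`	0.5	Fixed	0.5 = legs in anti-phase (alternating gait); other values such as 0.25 give an asymmetric gait
`cpg_residual` weight	-0.1	-0.5~-0.01	See the tuning guidance in 5.2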
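
The cadence figure for `cpg_period` follows from simple arithmetic. The sketch below assumes one gait cycle contains two steps (one per leg); the function name is hypothetical.

```python
def cadence(period_s: float) -> tuple[float, float]:
    """Return (gait cycles per minute, steps per minute) for a given CPG period."""
    cycles_per_min = 60.0 / period_s
    steps_per_min = 2.0 * cycles_per_min  # assumed: two steps per gait cycle
    return cycles_per_min, steps_per_min

default = cadence(0.8)   # default period
fastest = cadence(0.5)   # lower end of the tuning range
slowest = cadence(1.5)   # upper end of the tuning range
```

Across the 0.5 to 1.5 s tuning range, the rhythm spans 120 down to 40 gait cycles per minute, with the default at 75.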

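6.2 CPG Amplitude and Gait

Symptoms of excessive CPG amplitude (> 0.4 rad):
  → Exaggerated gait, higher energy consumption
  → Hard for the policy to correct; falls are likely early in training

Symptoms of insufficient CPG amplitude (< 0.15 rad):
  → Almost no gait emerges from the CPG; the policy is back to learning from scratch
  → The CPG no longer serves its purpose

6.3 PPO Hyperparameters (Same as Velocity)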

policy=RslRlPpoActorCriticCfg(
    init_noise_std=1.0,           # initial exploration noise (std)
    actor_hidden_dims=[512, 256, 128],
    critic_hidden_dims=[512, 256, 128],
    activation="elu",
)

algorithm=RslRlPpoAlgorithmCfg(
    clip_param=0.2,
    entropy_coef=0.01,
    learning_rate=1e-3,
    gamma=0.99,
    lam=0.95,
    desired_kl=0.01,
    num_learning_epochs=5,
    num_mini_batches=4,
)

7. Training Commands and Monitoring

7.1 Training Command

source ~/miniconda3/etc/profile.d/conda.sh && \
conda activate unitree_sim_env && \
source /home/robot/0_lerobot/IsaacLab/_isaac_sim/setup_conda_env.sh && \
python scripts/rsl_rl/train.py \
    --task Unitree-G1-23dof-CPG-Flat \
    --num_envs 4096 \
    --max_iterations 50000 \
    --headless

7.2 TensorBoard Monitoring

tensorboard --logdir logs/rsl_rl/ --port 6006

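7.3 Key Metrics to Monitor

Metric	Healthy range	Meaning
`Mean reward`	Upward trend	Total reward is converging
`Episode_Reward/cpg_residual`	Magnitude shrinking toward 0	Smaller residual = more reliance on the CPG
`Episode_Reward/cpg_phase_alignment`	Increasing	Foot contact is in sync with the CPG phase
`Episode_Reward/track_lin_vel_xy`	> 0.5	Velocity tracking quality
`Episode_Reward/gait`	> 0.3	Gait cycle synchronization
`Mean episode length`	> 500 steps (= 10 s)	Survival time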
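
The episode-length threshold is expressed in environment steps. The conversion below infers the step duration from the "500 = 10 s" equivalence in the table (0.02 s per step); the actual `step_dt` of the environment is not stated in this document, so treat it as an assumption.

```python
# Inferred from the table: 500 steps correspond to 10 s.
STEP_DT = 10.0 / 500  # seconds per environment step (assumed 50 Hz control)

def steps_to_seconds(steps: int, step_dt: float = STEP_DT) -> float:
    """Convert a step count into simulated seconds."""
    return steps * step_dt

def seconds_to_steps(seconds: float, step_dt: float = STEP_DT) -> int:
    """Convert simulated seconds into a whole number of steps."""
    return round(seconds / step_dt)

full_episode_steps = seconds_to_steps(20.0)   # 20 s episode length from section 1.3
healthy_threshold_s = steps_to_seconds(500)
```

Under this assumption a full 20 s episode lasts 1000 steps, so a mean episode length above 500 means the robot survives more than half of each episode on average.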

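7.4 Troubleshooting Checklist

Symptom	Likely cause	Fix
Robot falls immediately	CPG amplitude too large	Reduce `cpg_hip_amplitude` to 0.15
Reward does not increase	`cpg_residual` weight too strong	Weaken it, e.g. from -0.5 to -0.2 or -0.1
noise_std diverges	Learning rate too high	Lower it to 5e-4
Irregular gait	`cpg_phase_alignment` contributes no positive reward	Raise its weight or check the phase synchronization
Insufficient foot lift	`feet_clearance` weight too low	Increase the weight moderately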

8. Phase 1 Completion Record

8.1 Files Created

File	Purpose
`modules/cpg.py`	CPG oscillator module (pure PyTorch)
`tasks/locomotion/mdp/cpg_actions.py`	CPG-Residual action term
`tasks/locomotion/robots/g1/23dof/cpg_flat_env_cfg.py`	Flat-terrain training environment configuration

8.2 Files Modified

File	Change
`tasks/locomotion/mdp/observations.py`	Adds `cpg_phase_from_action()`
`tasks/locomotion/mdp/rewards.py`	Adds `cpg_residual_penalty()`, `cpg_gait_phase_alignment()`
`tasks/locomotion/mdp/__init__.py`	Exports the CPG action classes
`robots/g1/23dof/__init__.py`	Registers `Unitree-G1-23dof-CPG-Flat`
`modules/__init__.py`	Exports `HumanoidCPG`
`utils/export_deploy_cfg.py`	Fixes the float scale bug

8.3 Phase 2 To-Do

[ ] Make the CPG parameters learnable policy outputs (4D action: omega, A_hip, A_knee, stance_ratio)
[ ] Add height_scanner to the actor observations
[ ] Add terrain curriculum learning (flat → slopes → stairs)
[ ] Warm-start from the Phase 1 checkpoint

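Version History

Version	Date	Changes	Author
V1.0	2026-04-12	Initial version; complete CPG-Flat configuration specification	AI Assistant

This document was compiled with AI assistance from the source code of the unitree_lab_locomotion repository.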
