人形机器人强化学习真机部署

人形强化学习部署

王于之

2383人浏览 · 2025-06-04 14:55:00

王于之 · 2025-06-04 14:55:00 发布

1 前言

仿真训练的开源已经有很多，如何将训练好的策略部署到自己的机器人上，需要自己解决。

当然，也有一些部署或包括部署的开源，例如智元的，迪士尼的。

使用宇树的机器人是比较容易的，因为有丰富的接口，直接调宇树自己的开源就行，一般插网线拿自己电脑做主控运行就行，当然也有直接部署在宇树自己的控制器的方法。

使用自己的机器人时，需要考虑如何将控制器的内容，通过各种规定的通讯协议发到电机，并且是多个电机，同时，要接入遥控器，IMU等，需要对系统的控制和各传感器、电机内容等非常熟悉。

本文尽量使用最简单基础的方法：电机使用CAN通讯，下位机使用STM32，上位机先直接用电脑，等打通了后可以改移植到JETSON类似的主控。电脑与32直接采用UDP通讯。

要部署策略，就需要推理环境，输入输出。表面看起来与sim2sim 类似，甚至可以直接拿过来改下，但实际会有诸多问题（状态切换与控制、真机的关节到电机传动、真机的反馈、传感器的反馈、实时性、安全问题）。

本文基于rl_sar进行修改，是来自B站大七妙妙屋的开源。本文更希望直接简单的解决问题，而不是一层套一层的类和配置。

2 rl_sar解释

1 有real 和gazebo sim 两个部分。

2 sim 使用ROS2环境，接入gazebo节点，并获取反馈。

3 运行逻辑：3个周期线程，（机器人控制、模型推理、键盘），其余的与gazebo沟通靠ROS2帆布订阅。

    //控制RobotControl以dt时间运行
    this->loop_control = std::make_shared<LoopFunc>("loop_control", this->params.dt, std::bind(&RL_Sim::RobotControl, this));
    //RL的控制周期是dt * decimation ,RunModel
    this->loop_rl = std::make_shared<LoopFunc>("loop_rl", this->params.dt * this->params.decimation, std::bind(&RL_Sim::RunModel, this));
    this->loop_control->start();
    this->loop_rl->start();
    // keyboard
    //KeyboardInterface()键盘
    this->loop_keyboard = std::make_shared<LoopFunc>("loop_keyboard", 0.05, std::bind(&RL_Sim::KeyboardInterface, this));
    this->loop_keyboard->start();

先看怎么推理的：就是获得观测，给到模型，模型输出动作（关节位置等），然后算力矩下发。

void RL_Sim::RunModel()
{
    if (this->running_state == STATE_RL_RUNNING && simulation_running)
    {
        //转为策略模型需要的张量形式
        // this->obs.lin_vel NOT USE
        this->obs.ang_vel = torch::tensor(this->robot_state.imu.gyroscope).unsqueeze(0);
        // this->obs.commands = torch::tensor({{this->cmd_vel.linear.x, this->cmd_vel.linear.y, this->cmd_vel.angular.z}});
        this->obs.commands = torch::tensor({{this->control.x, this->control.y, this->control.yaw}});
        this->obs.base_quat = torch::tensor(this->robot_state.imu.quaternion).unsqueeze(0);
        this->obs.dof_pos = torch::tensor(this->robot_state.motor_state.q).narrow(0, 0, this->params.num_of_dofs).unsqueeze(0);
        this->obs.dof_vel = torch::tensor(this->robot_state.motor_state.dq).narrow(0, 0, this->params.num_of_dofs).unsqueeze(0);
        
        //在这里计算策略输出
        torch::Tensor clamped_actions = this->Forward();

        for (int i : this->params.hip_scale_reduction_indices)
        {
            clamped_actions[0][i] *= this->params.hip_scale_reduction;
        }
        //观测需要知道上次的动作
        this->obs.actions = clamped_actions;
        
        //计算下发力矩
        torch::Tensor origin_output_torques = this->ComputeTorques(this->obs.actions);

        // this->TorqueProtect(origin_output_torques);
       //力矩限制，注意:f要看本代码的RL运行时的下发，到底输出了力矩还是位置
        this->output_torques = torch::clamp(origin_output_torques, -(this->params.torque_limits), this->params.torque_limits);
        this->output_dof_pos = this->ComputePosition(this->obs.actions);

直接读文件，加载

        # model
        model_path = os.path.join(get_package_share_directory('rl_sar'), 'models', self.robot_name, self.params.model_name)
        self.model = torch.jit.load(model_path)

输入的维度需要匹配。例如在humanoid-gym的开源中，其对actor网络使用了14个历史信息+当前状态观测，共15*47 = 705维，因此输入需要匹配。

torch::Tensor RL_Sim::Forward()
{
    torch::autograd::GradMode::set_enabled(false);

    torch::Tensor clamped_obs = this->ComputeObservation();

    torch::Tensor actions;
    if (this->params.observations_history.size() != 0)
    {
         // 存入环形缓冲区
        this->history_obs_buf.insert(clamped_obs);
        this->history_obs = this->history_obs_buf.get_obs_vec(this->params.observations_history);
        actions = this->model.forward({this->history_obs}).toTensor();
    }
    //直接推理
    else
    {
        actions = this->model.forward({clamped_obs}).toTensor();
    }
}

机器人控制循环（读状态->控制逻辑操作->下发）

    //控制循环
    if (simulation_running)
    {
        //计数
        this->motiontime++;
        //读取从Gazebo获得的状态，其数据通过订阅的形式，获得为成员变量
        this->GetState(&this->robot_state);
        //状态机运行，里面有关节数据的获取，和数据的下发
        //里面明确了要发送什么数据下去 RL的输出被赋值到output_dof_pos下发
        this->StateController(&this->robot_state, &this->robot_command);
        //这里是给仿真环境发消息
        this->SetCommand(&this->robot_command);
    }

核心与传统机器人差不多，只不过传统的步态调度、规划、各种位置、速度层，基于模型的算法层都被RL策略代替。

看看状态机：

写的很简洁，对单一任务很实用。用当前运行状态和指令来控制。

用快照来保存一些需要记录的位置。

逻辑：机器人上电后，先到一个定义的默认位置去，在这个位置进入RL控制（进入前的一个周期是一个RL初始化准备过程），退出时，直接回到开始的状态。

按时间进行线性插值，如果不顺，可以自己修改多项式轨迹。

小bug: GET——UP阶段， if (this->control.control_state == STATE_RL_INIT), 应该改为else if

同时，对float值修改为区间的等号。

void RL::StateController(const RobotState<double> *state, RobotCommand<double> *command)
{
    static RobotState<double> start_state; // 记录过渡起始状态
    static RobotState<double> now_state; // 当前过渡中的状态
    static float getup_percent = 0.0;// 起身动作完成度 (0~1)
    static float getdown_percent = 0.0;  // 趴下动作完成度 (0~1)

    // waiting，当前状态是
    if (this->running_state == STATE_WAITING)
    {
        //等待状态，下发命令等于反馈
        for (int i = 0; i < this->params.num_of_dofs; ++i)
        {
            command->motor_command.q[i] = state->motor_state.q[i];
        }

        //起身
        if (this->control.control_state == STATE_POS_GETUP)
        {
            this->control.control_state = STATE_WAITING;//控制状态继续改为等待
            getup_percent = 0.0;
            for (int i = 0; i < this->params.num_of_dofs; ++i)
            {
                now_state.motor_state.q[i] = state->motor_state.q[i]; // 当前状态快照
                start_state.motor_state.q[i] = now_state.motor_state.q[i];// 起始状态备份
            }
            this->running_state = STATE_POS_GETUP;//当前运行的状态改为起身
            std::cout << std::endl << LOGGER::INFO << "Switching to STATE_POS_GETUP" << std::endl;
        }
    }
    // stand up (position control)
    //起身状态了，进入RL前的状态
    else if (this->running_state == STATE_POS_GETUP)
    {
        if (getup_percent < 1.0)
        {
            getup_percent += 1 / 500.0;//500次循环起身
            getup_percent = getup_percent > 1.0 ? 1.0 : getup_percent;
            for (int i = 0; i < this->params.num_of_dofs; ++i)
            {
                //这一操作，将关节位置，从当前位置逐渐转为定义的default位置。
                command->motor_command.q[i] = (1 - getup_percent) * now_state.motor_state.q[i] + getup_percent * this->params.default_dof_pos[0][i].item<double>();
                command->motor_command.dq[i] = 0;
                //这里是1行N列的数据，以匹配不同KP
                command->motor_command.kp[i] = this->params.fixed_kp[0][i].item<double>();
                command->motor_command.kd[i] = this->params.fixed_kd[0][i].item<double>();
                command->motor_command.tau[i] = 0;
            }
            std::cout << "\r" << std::flush << LOGGER::INFO << "Getting up " << std::fixed << std::setprecision(2) << getup_percent * 100.0 << std::flush;
        }
        //在起身状态下，控制为初始化RL
        if (this->control.control_state == STATE_RL_INIT)
        {
            //控制命令重置为待机
            this->control.control_state = STATE_WAITING;
            //运行状态出初始
            this->running_state = STATE_RL_INIT;
            std::cout << std::endl << LOGGER::INFO << "Switching to STATE_RL_INIT" << std::endl;
        }
        //命令为趴下，自然状态。
        else if (this->control.control_state == STATE_POS_GETDOWN)
        {
            this->control.control_state = STATE_WAITING;
            getdown_percent = 0.0;
            for (int i = 0; i < this->params.num_of_dofs; ++i)
            {
                now_state.motor_state.q[i] = state->motor_state.q[i];//当前位置快照
            }
            //
            this->running_state = STATE_POS_GETDOWN;
            std::cout << std::endl << LOGGER::INFO << "Switching to STATE_POS_GETDOWN" << std::endl;
        }
    }
    // init obs and start rl loop
    //当运行状态处于RL初始化时，
    else if (this->running_state == STATE_RL_INIT)
    {
        //如果已经处于起身状态（预备）。
        if (getup_percent == 1)
        {
            //这个时候才初始化观测，
            this->InitObservations();//注：这里直接输入的观测位置不是0，是default位置。
            this->InitOutputs();//初始输出也是default位置。
            this->InitControl();//初始命令都是0.
            //运行状态切换为RL模式
            this->running_state = STATE_RL_RUNNING;
            std::cout << std::endl << LOGGER::INFO << "Switching to STATE_RL_RUNNING" << std::endl;
        }
    }
    // rl loop
    else if (this->running_state == STATE_RL_RUNNING)
    {
        //打印指令
        std::cout << "\r" << std::flush << LOGGER::INFO << "RL Controller x:" << this->control.x << " y:" << this->control.y << " yaw:" << this->control.yaw << std::flush;
        for (int i = 0; i < this->params.num_of_dofs; ++i)
        {
            //输出位置
            command->motor_command.q[i] = this->output_dof_pos[0][i].item<double>();
            command->motor_command.dq[i] = 0;
            command->motor_command.kp[i] = this->params.rl_kp[0][i].item<double>();
            command->motor_command.kd[i] = this->params.rl_kd[0][i].item<double>();
            command->motor_command.tau[i] = 0;
        }
        //如果要切换初始趴下，切换状态，记录当前位置快照，然后差值回到
        if (this->control.control_state == STATE_POS_GETDOWN)
        {
            this->control.control_state = STATE_WAITING;
            getdown_percent = 0.0;
            for (int i = 0; i < this->params.num_of_dofs; ++i)
            {
                now_state.motor_state.q[i] = state->motor_state.q[i];//记录当前位置快照
            }
            this->running_state = STATE_POS_GETDOWN;
            std::cout << std::endl << LOGGER::INFO << "Switching to STATE_POS_GETDOWN" << std::endl;
        }
        //如果要回到进入RL前的状态。
        else if (this->control.control_state == STATE_POS_GETUP)
        {
            this->control.control_state = STATE_WAITING;
            getup_percent = 0.0;
            for (int i = 0; i < this->params.num_of_dofs; ++i)
            {
                now_state.motor_state.q[i] = state->motor_state.q[i];
            }
            this->running_state = STATE_POS_GETUP;
            std::cout << std::endl << LOGGER::INFO << "Switching to STATE_POS_GETUP" << std::endl;
        }
    }
    // get down (position control)
    //直接回到开机状态
    else if (this->running_state == STATE_POS_GETDOWN)
    {
        if (getdown_percent < 1.0)
        {
            getdown_percent += 1 / 500.0;
            getdown_percent = getdown_percent > 1.0 ? 1.0 : getdown_percent;
            for (int i = 0; i < this->params.num_of_dofs; ++i)
            {
                command->motor_command.q[i] = (1 - getdown_percent) * now_state.motor_state.q[i] + getdown_percent * start_state.motor_state.q[i];
                command->motor_command.dq[i] = 0;
                command->motor_command.kp[i] = this->params.fixed_kp[0][i].item<double>();
                command->motor_command.kd[i] = this->params.fixed_kd[0][i].item<double>();
                command->motor_command.tau[i] = 0;
            }
            std::cout << "\r" << std::flush << LOGGER::INFO << "Getting down " << std::fixed << std::setprecision(2) << getdown_percent * 100.0 << std::flush;
        }
        //这里注意：如果回到开机状态，把控制指令都初始化一下
        if (getdown_percent == 1)
        {
            this->InitObservations();
            this->InitOutputs();
            this->InitControl();
            this->running_state = STATE_WAITING;
            std::cout << std::endl << LOGGER::INFO << "Switching to STATE_WAITING" << std::endl;
        }
    }
}

自己开发最麻烦的部分在于各个依赖的环境配置，张量的各种转换，强化学习逻辑的充分理解，及需要单独实验一些部分的运行是正确的。这些环节rl_sar已经解决，感谢。

3部署到自己的机器人上（）

先运行该开源，能将狗在gazebo跑起来。foxy和humble都可以。

先部署到gazabo上是有必要的，能够检查各逻辑BUG.

为了避免在gazebo上修改xacro文件浪费时间，本文先将自己的模型人形下肢12关节模型映射到go2的gazebo上。

直接将模型扔到go2_isaacgym里面，接下来修改它的配置文件。以下是按humanoid_gym默认的配置来修改的，默认位置是机器人的蹲姿。

go2_isaacgym:
  model_name: "policy_1.pt"
  framework: "isaacgym"
  rows: 4
  cols: 3
  dt: 0.01
  decimation: 1
  num_observations: 47
  observations: ["phase","commands", "dof_pos", "dof_vel", "actions", "angle_vel_body", "base_ang"]
  observations_history: [0,1,2,3,4,5,6,7,8,9,10,11,12,13,14]
  clip_obs: 18.0
  clip_actions_lower: [-18, -18, -18,
                       -18, -18, -18,
                       -18, -18, -18,
                       -18, -18, -18]
  clip_actions_upper: [18, 18, 18,
                       18, 18, 18,
                       18, 18, 18,
                       18, 18, 18]
  rl_kp: [40, 40, 40,
          40, 40, 40,
          40, 40, 40,
          40, 40, 40]
  rl_kd: [1, 1, 1,
          1, 1, 1,
          1, 1, 1,
          1, 1, 1]
  fixed_kp: [60, 60, 60,
             60, 60, 60,
             60, 60, 60,
             60, 60, 60]
  fixed_kd: [5, 5, 5,
             5, 5, 5,
             5, 5, 5,
             5, 5, 5]
  hip_scale_reduction: 1.0
  hip_scale_reduction_indices: [0, 6]
  num_of_dofs: 12
  action_scale: 0.25
  lin_vel_scale: 2.0
  ang_vel_scale: 0.25
  dof_pos_scale: 1.0
  dof_vel_scale: 0.05
  commands_scale: [2.0, 2.0, 0.25]
  torque_limits: [333.5, 333.5, 333.5,
                  333.5, 333.5, 333.5,
                  333.5, 333.5, 333.5,
                  333.5, 333.5, 333.5]
  default_dof_pos: [ 0.000, 0.0, -0.2848,
                    0.6416, -0.3568, 0.0,
                    0.000, 0.0, -0.2848,
                    0.6416, -0.3568, 0.0]
  joint_controller_names: ["FL_hip_controller", "FL_thigh_controller", "FL_calf_controller",
                           "FR_hip_controller", "FR_thigh_controller", "FR_calf_controller",
                           "RL_hip_controller", "RL_thigh_controller", "RL_calf_controller",
                           "RR_hip_controller", "RR_thigh_controller", "RR_calf_controller"]

其中，尤为重要的是，对齐观测，及缩放等。需要与训练一致。

观测直接在代码里改：

torch::Tensor RL::ComputeObservation()
{
    std::vector<torch::Tensor> obs_list;
    obs_list.push_back(this->obs.phase); 
    obs_list.push_back(this->obs.commands * this->params.commands_scale);         
    obs_list.push_back((this->obs.dof_pos - this->params.default_dof_pos) * this->params.dof_pos_scale);        
    obs_list.push_back(this->obs.dof_vel * this->params.dof_vel_scale);   
    obs_list.push_back(this->obs.actions);            
    obs_list.push_back(this->obs.ang_vel * this->params.ang_vel_scale);
    obs_list.push_back(this->obs.base_ang);   
    torch::Tensor obs = torch::cat(obs_list, 1);
    torch::Tensor clamped_obs = torch::clamp(obs, -this->params.clip_obs, this->params.clip_obs);
    return clamped_obs;
}

在void RL_Sim::RunModel()这里也要对齐一下，

如果使用了相位，自己从进入RL状态利用计数开始算，张量维度对齐就好：

this->obs.phase = torch::tensor({0.0, 0.0}).unsqueeze(0); // [1,2]

如果使用了RPY而不是四元数，则自己转一下，当然还是需要对齐一下。

还是运行原go2仿真，能看到狗在仿真中有节奏的乱动，说明模型换成功。

接下来部署真机：使用ROS2

将go2的real 复制，按sim 改成ROS2 节点的方式，不要gazobo, 也不要宇树的sdk内容。

为了保持通用性，该节点不考虑真机的内容，还按仿真的逻辑：

   // publisher
    this->robot_command_publisher = this->create_publisher<robot_msgs::msg::RobotCommand>(
        this->ros_namespace + "robot_joint_controller/command", rclcpp::SystemDefaultsQoS());
    // subscriber
    this->lunqu_imu_subscriber = this->create_subscription<sensor_msgs::msg::Imu>(
        "/imu", rclcpp::SystemDefaultsQoS(), [this] (const sensor_msgs::msg::Imu::SharedPtr msg) {this->LunquImuCallback(msg);}
    );
    this->robot_state_subscriber = this->create_subscription<robot_msgs::msg::RobotState>(
        this->ros_namespace + "robot_joint_controller/state", rclcpp::SystemDefaultsQoS(),
        [this] (const robot_msgs::msg::RobotState::SharedPtr msg) {this->RobotStateCallback(msg);}
    );

    // loop
    this->loop_keyboard = std::make_shared<LoopFunc>("loop_keyboard", 0.05, std::bind(&RL_Real::KeyboardInterface, this));
    //this->loop_control = std::make_shared<LoopFunc>("loop_control", this->params.dt, std::bind(&RL_Real::RobotControl, this));
    //test
    this->loop_control = std::make_shared<LoopFunc>("loop_control", 0.01, std::bind(&RL_Real::RobotControl, this));
    this->loop_rl = std::make_shared<LoopFunc>("loop_rl", this->params.dt * this->params.decimation, std::bind(&RL_Real::RunModel, this));
    this->loop_keyboard->start();
    this->loop_control->start();
    this->loop_rl->start();

订阅了机器人关节信息，发布了期望的关节信息。同时订阅了IMU的信息。

自定义的消息很全，不需要修改

本文使用的IMU直接将ROS2包放入该项目下，

需要仔细看IMU代码确定，并实验打印观察：XYZ轴是否对齐，正方向是否对齐，四元数的顺序是否符合期望。

注意修改cmakelist

新建一个包，用来控制真机，使用UDP与32通讯。

先订阅RL给的指令，发布获得的真机的关节反馈：

同样开个线程周期控制机器人。

    // publisherRobotState
    this->robot_state_publisher = this->create_publisher<robot_msgs::msg::RobotState>(
    this->ros_namespace + "robot_joint_controller/state", rclcpp::SystemDefaultsQoS());
    // subscriber
    this->robot_command_subscriber = this->create_subscription<robot_msgs::msg::RobotCommand>(
        this->ros_namespace + "robot_joint_controller/command", rclcpp::SystemDefaultsQoS(),
        [this] (const robot_msgs::msg::RobotCommand::SharedPtr msg) {this->RobotCommandCallback(msg);}
    );
    //
    this->loop_control = std::make_shared<LoopFunc>("loop_control", ConfigRealCycleTime, std::bind(&Mera::RobotControl, this));
    this->loop_control->start();

10 UDP

    //===================UDP
    // 初始化本机地址 (192.168.1.105)
    local_addr_.sin_family = AF_INET;
    local_addr_.sin_port = htons(2042);// 系统自动分配端口
    inet_pton(AF_INET, "192.168.1.105", &local_addr_.sin_addr);

    // 初始化目标地址 (192.168.1.142:2042)
    target_addr_.sin_family = AF_INET;
    target_addr_.sin_port = htons(2042);
    inet_pton(AF_INET, "192.168.1.240", &target_addr_.sin_addr);
        // 创建UDP Socket
        if ((sockfd_ = socket(AF_INET, SOCK_DGRAM, 0)) < 0) {
            RCLCPP_FATAL(get_logger(), "Socket creation failed: %s", strerror(errno));
            rclcpp::shutdown();
        }
    
        // 绑定本机地址
        if (bind(sockfd_, (struct sockaddr*)&local_addr_, sizeof(local_addr_)) < 0) {
            RCLCPP_FATAL(get_logger(), "Bind failed: %s", strerror(errno));
            close(sockfd_);
            rclcpp::shutdown();
        }
    
        RCLCPP_INFO(get_logger(), 
            "Socket initialized [Local: %s:%d] [Target: %s:%d]",
            inet_ntoa(local_addr_.sin_addr), ntohs(local_addr_.sin_port),
            inet_ntoa(target_addr_.sin_addr), ntohs(target_addr_.sin_port)
        );
        const char *iface = "enp5s0";
        if (setsockopt(sockfd_, SOL_SOCKET, SO_BINDTODEVICE, iface, strlen(iface)) < 0) {
            RCLCPP_ERROR(get_logger(), "Bind to enp5s0 failed: %s", strerror(errno));
        }
    initUdpReceiver();

按自己定义的格式下发。

例如两个字节表示一个位置，最高位表示正负，那么能表示正负32768的值。

32 按定义格式接收。

下面给出接收调试函数

/**
 * @brief 调试函数：打印所有接收到的UDP原始数据（16进制格式）
 * @param recv_sockfd 已绑定的UDP socket文件描述符
 */
void MYROBOT::debugPrintUdpData(int recv_sockfd) {
    constexpr int BUFFER_SIZE = 1024*10; // 足够大的缓冲区
    uint8_t buffer[BUFFER_SIZE];
    struct sockaddr_in src_addr;
    socklen_t addr_len = sizeof(src_addr);

    // 设置1秒接收超时
    struct timeval tv;
    tv.tv_sec = 1;  // 秒级超时
    tv.tv_usec = 0;
    if (setsockopt(recv_sockfd, SOL_SOCKET, SO_RCVTIMEO, &tv, sizeof(tv)) < 0) {
        RCLCPP_ERROR(get_logger(), "Set timeout failed: %s", strerror(errno));
    }

    while (rclcpp::ok()) {
        // 接收数据（带1秒超时）
        ssize_t recv_len = recvfrom(recv_sockfd, buffer, BUFFER_SIZE,
                                   0, (struct sockaddr*)&src_addr, &addr_len);

        if (recv_len > 0) {
            // 打印来源信息
            RCLCPP_INFO(get_logger(), 
                "Recv %zd bytes from %s:%d", 
                recv_len,
                inet_ntoa(src_addr.sin_addr), 
                ntohs(src_addr.sin_port));

            // 打印16进制数据（每行16字节）
            std::stringstream hex_ss;
            hex_ss << std::hex << std::setfill('0');
            for (int i = 0; i < recv_len; ++i) {
                hex_ss << std::setw(2) << static_cast<int>(buffer[i]) << " ";
                if ((i + 1) % 16 == 0) hex_ss << "\n";
            }

            RCLCPP_INFO(get_logger(), "Hex data:\n%s", hex_ss.str().c_str());
        }
        else if (recv_len < 0) {
            if (errno == EAGAIN || errno == EWOULDBLOCK) {
                RCLCPP_DEBUG(get_logger(), "Receive timeout (1s), retrying...");
            } 
            else if (errno == EBADF) {
                RCLCPP_ERROR(get_logger(), "Socket closed, exiting...");
                break;
            }
            else {
                RCLCPP_ERROR(get_logger(), "Recv error: %s", strerror(errno));
            }
        }
    }
}

注：需要连接网线后修改电脑的IP，发送与接收对齐。

下面给出发送示例：

void MYROBOT::sendUdpData() {
    // ================= 构造56字节数据 =================
    uint8_t send_buffer[56]; // 总数据包
    // (1) 前4字节：独立状态（每个字节1个状态）
    send_buffer[0] = this->low_FSM_cmd;//0x01; // 状态1（示例：0x01表示正常）
    send_buffer[1] = 0x7F; // 状态2（示例：0x02表示电机使能）
    send_buffer[2] = 0x7F; // 状态3（示例：0x03表示急停状态）
    send_buffer[3] = 0x7F; // 状态4（示例：0x04表示通信状态）
    // (2) 后52字节：26个电机位置（每个int16_t，最高位表示正负）
    //std::cout << "cmd_to_stm32: " <<this->low_FSM_back<< std::endl;
    // ================ 后52字节：电机位置 ================
    for (int i = 0; i < 26 && i < MAX_MOTOR_CONFIG_NUM; ++i) {
        // (1) 将浮点位置转换为int16_t（带符号位）
        float pos_float = this->pos[i]; // 获取浮点位置
        //@ care about the number = nan
        if (std::isnan(pos_float)) {
            RCLCPP_WARN(get_logger(), "NaN in pos[%d], using 0.0", i);
            pos_float = this->last_pos_sendLow_NAN[i];
        }
        else{
            this->last_pos_sendLow_NAN[i]=this->pos[i];
        }
        int16_t pos_int;
        // 处理符号和数值范围
        if (pos_float >= 0) {
            pos_int = static_cast<int16_t>(pos_float); 
            pos_int &= 0x7FFF; // 确保最高位为0（正数）
        } else {
            pos_int = static_cast<int16_t>(-pos_float);
            pos_int |= 0x8000; // 设置最高位为1（负数）
        }
        // (2) 网络字节序转换
        pos_int = htons(pos_int); // 主机序→网络序
        // (3) 写入缓冲区
        memcpy(&send_buffer[4 + i*2], &pos_int, sizeof(int16_t));
    }
    // ================= 发送数据 =================
    //  const char send_buffer2[] = {'T', 'E', 'S', 'T'}; 
    //  ssize_t sent = sendto(
    //     sockfd_,
    //     send_buffer2,       // 数据指针
    //     sizeof(send_buffer2), // 固定56字节
    //     0,
    //     (struct sockaddr*)&target_addr_,
    //     sizeof(target_addr_)
    // );
   //=============================
    ssize_t sent = sendto(
        sockfd_,
        send_buffer,       // 数据指针
        sizeof(send_buffer), // 固定56字节
        0,
        (struct sockaddr*)&target_addr_,
        sizeof(target_addr_)
    );

    // ================= 错误处理 =================
    if (sent < 0) {
        RCLCPP_ERROR(get_logger(), "Send failed: %s", strerror(errno));
    } else {
        RCLCPP_DEBUG(get_logger(), 
            "Sent %zd bytes [States: %02X %02X %02X %02X]",
            sent, 
            send_buffer[0], send_buffer[1], 
            send_buffer[2], send_buffer[3]
        );
    }
}

UDP接收是阻塞线程，发送是主动的。

下位机系统需要上层指令进行状态切换，因此，真机节点需要状态机控制32的状态切换。

本文真机控制逻辑如下，

（从下位机获取真机反馈，需要进行电机数据到关节数据的转换）

（发布消息到RL节点）

（从RL节点订阅目标关节信息）

（计算下发到电机的数据，涉及到关节到电机传动、关节限位、电机限位、传动比、信息打包的缩放）

（控制下位机的状态切换）

（发布信息到下位机）

void MYROBOT::RobotControl()
{
    this->motiontime++;
    this->WriteStateFormLow();
    this->sendStateToRL();
    this->SetCommandForLow(); //from RL 

    this->ComputeMotorFromRL();//to stm23 
    this->FSMControl();

    this->sendUdpData();
}

13：

由于变量更新的原因，对于发布订阅，通讯，都使用独立的变量，不要在计算过程中使用这些变量，建立新的变量用于计算。

节点发布订阅、上位机下位机通讯将带来固定的延迟，两节点间应该使用内存共享或同步定时器等手段消除固定延迟。

类似G1机器人能允许策略动作20到40ms的延迟，需要训练时域随机化一下。

键盘控制可以按自己需要的改。

14 ：
连接真机后，打开RL节点，真机节点，IMU节点。直接在RL节点中键入。原代码为1 可实现下蹲，2 进入RL模式（WASD前后左右）3回到初始零点。