2025/12/22 0:43:28

Environment setup

The examples in this post were developed against verl v0.5. For environment setup details, see this blog post.

Data preparation

  1. Download the data
python examples/data_preprocess/gsm8k_tool_agent_loop.py --local-dir <data-path>

A sample record looks like this:

{
  "data_source": "openai/gsm8k",
  "agent_name": "tool_agent",
  "prompt": [
    {
      "content": "You are a math expert. You are given a question and you need to solve it step by step. Reasoning step by step before any tool call. You should use the `calc_gsm8k_reward` tool after step by step solving the question, before generate final answer at least once and refine your answer if necessary. Put your final answer in the format of `#### <answer>`.",
      "role": "system"
    },
    {
      "content": "Natalia sold clips to 48 of her friends in April, and then she sold half as many clips in May. How many clips did Natalia sell altogether in April and May? Let's think step by step and output the final answer after `####`.",
      "role": "user"
    }
  ],
  "ability": "math",
  "reward_model": {"ground_truth": "72", "style": "rule"},
  "extra_info": {
    "answer": "Natalia sold 48/2 = <<48/2=24>>24 clips in May.\nNatalia sold 48+24 = <<48+24=72>>72 clips altogether in April and May.\n#### 72",
    "index": 0,
    "interaction_kwargs": {
      "ground_truth": "72",
      "query": "Natalia sold clips to 48 of her friends in April, and then she sold half as many clips in May. How many clips did Natalia sell altogether in April and May? Let's think step by step and output the final answer after `####`."
    },
    "need_tools_kwargs": true,
    "question": "Natalia sold clips to 48 of her friends in April, and then she sold half as many clips in May. How many clips did Natalia sell altogether in April and May?",
    "split": "train",
    "tools_kwargs": {
      "calc_gsm8k_reward": {"create_kwargs": {"ground_truth": "72"}}
    }
  }
}

Compared with the ordinary chat data format, the main difference is the additional agent_name field.
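
As a minimal sketch of that difference, the hypothetical helper below takes an ordinary chat-style record and adds the agent_name field. The name `to_agent_loop_record` and its default value are assumptions made for illustration; in practice the preprocessing script shown earlier builds the full record.

```python
from typing import Any


def to_agent_loop_record(record: dict[str, Any], agent_name: str = "tool_agent") -> dict[str, Any]:
    """Return a copy of `record` with an `agent_name` field added.

    Hypothetical helper: it only illustrates the one extra field and is
    not part of verl.
    """
    if "prompt" not in record:
        raise ValueError("record must contain a 'prompt' list of chat messages")
    # Build a new dict so the caller's record is left unchanged.
    return {**record, "agent_name": agent_name}


chat_record = {
    "data_source": "openai/gsm8k",
    "prompt": [{"role": "user", "content": "How many clips did Natalia sell altogether?"}],
    "reward_model": {"ground_truth": "72", "style": "rule"},
}
agent_record = to_agent_loop_record(chat_record)
print(agent_record["agent_name"])
```

The copy keeps the original record untouched, so the same source data can still be used for ordinary single-turn training.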

The AgentLoopWorker class uses agent_name to look up the corresponding AgentLoop class and run it.

    async def _run_agent_loop(
        self,
        agent_name: str,
        messages: list[dict[str, Any]],
        sampling_params: dict[str, Any],
        trajectory: dict[str, Any],
    ) -> AgentLoopOutput:
        with rollout_trace_attr(
            step=trajectory["step"],
            sample_index=trajectory["sample_index"],
            rollout_n=trajectory["rollout_n"],
            validate=trajectory["validate"],
            name="agent_loop",
        ):
            assert agent_name in _agent_loop_registry, (
                f"Agent loop {agent_name} not registered, registered agent loops: {_agent_loop_registry.keys()}"
            )
            agent_loop_config = _agent_loop_registry[agent_name]
            agent_loop = hydra.utils.instantiate(
                config=agent_loop_config,
                trainer_config=_DummyConfig(config=self.config),
                server_manager=self.server_manager,
                tokenizer=self.tokenizer,
            )
            output = await agent_loop.run(messages, sampling_params)
            return output

Configuration and starting training

The basic command is:

bash examples/sglang_multiturn/run_qwen2.5-3b_gsm8k_tool_agent_mlflow.sh

A few parameters in this script need to be changed:

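  • actor_rollout_ref.model.path: set this to the path of your own model
  • actor_rollout_ref.rollout.name: change from sglang to vllm (an environment containing both sglang and verl is hard to install; if you can get it running, sglang is faster than vllm)
  • data.train_files: path to the training set
  • data.val_files: path to the validation set
  • actor_rollout_ref.rollout.multi_turn.tool_config_path: set this to your own tool definition file
    An example tool definition: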
tools:
  - class_name: "verl.tools.gsm8k_tool.Gsm8kTool"
    config:
      type: native
    tool_schema:
      type: "function"
      function:
        name: "calc_gsm8k_reward"
        description: "A tool for calculating the reward of gsm8k. (1.0 if parsed answer is correct, 0.0 if parsed answer is incorrect or not correctly parsed)"
        parameters:
          type: "object"
          properties:
            answer:
              type: "string"
              description: "The model's answer to the GSM8K math problem, must be a digits"
          required: ["answer"]
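
The tool description above states that the reward is 1.0 when the parsed answer is correct and 0.0 when it is wrong or cannot be parsed. A minimal sketch of such a scoring rule follows. The parsing logic is an assumption made for illustration; verl's Gsm8kTool may parse and compare answers differently.

```python
import re


def calc_gsm8k_reward(answer: str, ground_truth: str) -> float:
    """Return 1.0 if `answer` parses to the ground truth, else 0.0.

    Hypothetical scoring rule for illustration only.
    """
    # Accept an optional sign, digits with optional thousands separators,
    # and an optional decimal part; anything else counts as unparsable.
    match = re.fullmatch(r"\s*(-?[\d,]+(?:\.\d+)?)\s*", answer)
    if match is None:
        return 0.0  # not correctly parsed
    parsed = match.group(1).replace(",", "")
    return 1.0 if parsed == ground_truth else 0.0


print(calc_gsm8k_reward("72", "72"))
print(calc_gsm8k_reward("seventy-two", "72"))
```

The ground_truth argument corresponds to the value passed through tools_kwargs.calc_gsm8k_reward.create_kwargs in the sample record, while answer is the single required parameter the model supplies in its tool call.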

Running evaluation

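To run the evaluation set through AgentLoop, append the two overrides below to the launch command. trainer.val_before_train=True runs validation before training, and trainer.val_only=True enables validation-only mode. Keep comments off these lines: in a shell script, anything after the trailing backslash breaks the line continuation.

trainer.val_before_train=True \
trainer.val_only=True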
