Applying Intrinsic Self-Critique to MBE

Version: v2.0 (implemented)
Last updated: 2026-01-21
Reference paper: Enhancing LLM Planning Capabilities through Intrinsic Self-Critique (arXiv:2512.24103v1)
Status: ✅ Implemented

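Overview

This document describes how the Intrinsic Self-Critique method proposed by Google DeepMind is applied to the Mises Behavior Engine (MBE) to improve the quality of generated paths and decision recommendations.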

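Implementation status

Module	Status	File
Self-Critique core	✅ Implemented	`src/core/self_critique.py`
Path generation integration	✅ Implemented	`src/core/paths.py`
Engine integration	✅ Implemented	`src/core/engine.py`
Mises domain definition	✅ Implemented	`src/core/self_critique.py`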

Core method of the paper

Core algorithm

The Intrinsic Self-Critique method proposed in the paper repeats three key steps on every iteration:

┌─────────────────────────────────────────────────────────────────┐
│                  Self-Critique iteration loop                   │
├─────────────────────────────────────────────────────────────────┤
│                                                                 │
│   for step = 0 to k:                                            │
│       │                                                         │
│       ▼                                                         │
│   ┌─────────────────────────────────────────────────┐           │
│   │  Step 1: Plan Generation                        │           │
│   │  - Generate a plan from the problem definition  │           │
│   │    and the record of earlier failures           │           │
│   └─────────────────────────────────────────────────┘           │
│       │                                                         │
│       ▼                                                         │
│   ┌─────────────────────────────────────────────────┐           │
│   │  Step 2: Self-Critique                          │           │
│   │  - The LLM evaluates its own plan               │           │
│   │  - Check that each step's preconditions hold    │           │
│   │  - Optional: Self-Consistency sampling and vote │           │
│   └─────────────────────────────────────────────────┘           │
│       │                                                         │
│       ▼                                                         │
│   ┌─────────────────────────────────────────────────┐           │
│   │  Step 3: Revise                                 │           │
│   │  - If the plan passes verification → return it  │           │
│   │  - Otherwise: record the failure and critique,  │           │
│   │    then iterate again                           │           │
│   └─────────────────────────────────────────────────┘           │
│                                                                 │
└─────────────────────────────────────────────────────────────────┘
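
The loop above can be sketched in a few lines. This is a minimal illustration rather than the paper's or MBE's code: `self_critique_loop`, `toy_generate`, and `toy_critique` are hypothetical names, the two callables stand in for LLM calls, and the sketch runs at most `k` rounds.

```python
from typing import Callable, List, Optional, Tuple

Plan = List[str]
Failure = Tuple[Plan, str]  # (failed plan, critique text)


def self_critique_loop(
    generate: Callable[[List[Failure]], Plan],
    critique: Callable[[Plan], Tuple[bool, str]],
    k: int,
) -> Tuple[Optional[Plan], List[Failure]]:
    """Run at most k rounds of generate -> critique -> revise."""
    failures: List[Failure] = []
    plan: Optional[Plan] = None
    for _ in range(k):
        plan = generate(failures)         # Step 1: plan generation
        ok, explanation = critique(plan)  # Step 2: self-critique
        if ok:                            # Step 3: accept the plan...
            return plan, failures
        failures.append((plan, explanation))  # ...or record the failure
    return plan, failures  # budget exhausted: last attempt, unverified


# Toy stand-ins for the two LLM calls (assumptions for illustration only).
def toy_generate(failures: List[Failure]) -> Plan:
    # The first attempt skips a required step; later attempts include it.
    if not failures:
        return ["stack A on B"]
    return ["pick up A", "stack A on B"]


def toy_critique(plan: Plan) -> Tuple[bool, str]:
    if plan[0] != "pick up A":
        return False, "precondition 'holding A' is not met for 'stack A on B'"
    return True, "all preconditions hold"


plan, failures = self_critique_loop(toy_generate, toy_critique, k=3)
print(plan)
print(len(failures))
```

Passing the failure list back into `generate` is what lets the second attempt avoid the mistake recorded in the first.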

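Key elements

Element	Description	Role in the paper
Domain definition	Actions, preconditions, and effects	Supplies the verification rules
Self-critique prompt	Step-by-step verification instructions	Ensures a thorough check
Failure context	Previously failed plans and their critiques	Avoids repeating mistakes
Self-Consistency	Vote over multiple samples	Improves evaluation accuracy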
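
As a rough illustration of the Self-Consistency row, the verdicts of several independently sampled critiques of the same plan can be combined by majority vote. The function name `majority_verdict` and the tie-breaking rule (a tie counts as invalid) are assumptions of this sketch, not details taken from the paper.

```python
from collections import Counter
from typing import List


def majority_verdict(verdicts: List[bool]) -> bool:
    """Return True only if strictly more samples say 'valid' than 'invalid'."""
    counts = Counter(verdicts)
    return counts[True] > counts[False]


# Three sampled critiques of the same plan: two accept it, one rejects it.
samples = [True, False, True]
print(majority_verdict(samples))
```

Treating ties as invalid is the conservative choice: a plan is accepted only when most critiques agree that it holds up.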

Experimental results

Dataset	Without Self-Critique	With Self-Critique	Relative improvement
Blocksworld 3-5	49.8%	89.3%	+79%
Blocksworld 3-7	57.2%	79.5%	+39%
Logistics	60.7%	93.2%	+54%
Mini-Grid	57.7%	75.2%	+30%
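
The last column reads as a relative improvement over the baseline, not a difference in percentage points. The short check below recomputes it from the two accuracy columns; the function name `relative_gain` is introduced here for illustration.

```python
# Accuracy (%) without and with Self-Critique, copied from the table above.
results = {
    "Blocksworld 3-5": (49.8, 89.3),
    "Blocksworld 3-7": (57.2, 79.5),
    "Logistics": (60.7, 93.2),
    "Mini-Grid": (57.7, 75.2),
}


def relative_gain(before: float, after: float) -> float:
    """Relative improvement in percent: (after - before) / before * 100."""
    return (after - before) / before * 100


for name, (before, after) in results.items():
    points = after - before
    print(f"{name}: +{points:.1f} points, +{relative_gain(before, after):.0f}% relative")
```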

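Applying the method to MBE

1. Self-critique for path generation (core application)

MBE is built around the behavioral analysis chain uneasiness → desire → path. Path generation is the stage best suited to Self-Critique: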

┌─────────────────────────────────────────────────────────────────┐
│              Path generation + Self-Critique flow               │
├─────────────────────────────────────────────────────────────────┤
│                                                                 │
│  User input: "I want to change jobs but worry about the risk"   │
│                                                                 │
│  Step 1: Generate initial paths                                 │
│  ┌─────────────────────────────────────────────────────────┐    │
│  │  Path 1: Interview first → resign after an offer        │    │
│  │  Path 2: Save 6 months of expenses → quit, then search  │    │
│  │  Path 3: Transfer internally to reduce risk             │    │
│  └─────────────────────────────────────────────────────────┘    │
│                         ↓                                       │
│  Step 2: Self-Critique                                          │
│  ┌─────────────────────────────────────────────────────────┐    │
│  │  Verify each path:                                      │    │
│  │                                                         │    │
│  │  Path 1 checks:                                         │    │
│  │  ✅ Precondition: time to prepare for interviews?       │    │
│  │     → profile shows frequent overtime ⚠️                │    │
│  │  ✅ Feasibility: is there demand in the industry?       │    │
│  │     → more industry information needed                  │    │
│  │  ✅ Risk: could the job search be discovered?           │    │
│  │     → the user should be warned                         │    │
│  │                                                         │    │
│  │  Result: Path 1 needs added "time management advice"    │    │
│  └─────────────────────────────────────────────────────────┘    │
│                         ↓                                       │
│  Step 3: Revise                                                 │
│  ┌─────────────────────────────────────────────────────────┐    │
│  │  Revised Path 1:                                        │    │
│  │  "Since your work schedule is busy, we suggest:         │    │
│  │   1. Update your resume first (2 weekend hours)         │    │
│  │   2. Set job-site alerts; receive leads passively       │    │
│  │   3. Prefer companies that interview on weekends..."    │    │
│  └─────────────────────────────────────────────────────────┘    │
│                                                                 │
└─────────────────────────────────────────────────────────────────┘

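Implementation

import json


class PathGeneratorWithSelfCritique:
    """Path generator with a self-critique loop."""

    # self.llm (async generate(prompt) -> str) and self._generate_paths
    # are assumed to be defined elsewhere in the class.

    async def generate_paths(
        self,
        uneasiness: str,
        desires: list[str],
        user_profile: dict,
        max_iterations: int = 3
    ) -> list[dict]:
        """
        Generate paths and refine them iteratively.

        Corresponds to Algorithm 1 in the paper.
        """
        context = []  # failed paths and their critiques
        paths = []    # stays empty only if max_iterations < 1

        for step in range(max_iterations):
            # Step 1: generate paths (Plan Generation)
            paths = await self._generate_paths(
                uneasiness=uneasiness,
                desires=desires,
                user_profile=user_profile,
                previous_failures=context
            )

            # Step 2: Self-Critique
            critique = await self._self_critique(
                paths=paths,
                user_profile=user_profile,
                domain_definition=MISES_BEHAVIOR_DOMAIN
            )

            # All paths passed verification
            if critique["all_valid"]:
                return paths

            # Step 3: record the failure for the next iteration
            context.append({
                "paths": paths,
                "critique": critique,
                "issues": critique["issues"]
            })

        return paths  # budget exhausted: return the last generated paths

    async def _self_critique(
        self,
        paths: list[dict],
        user_profile: dict,
        domain_definition: str
    ) -> dict:
        """
        Self-critique: verify every path.

        Follows the paper's Self-Critique prompt (A.2).
        """
        prompt = f"""
You are an expert in Misesian praxeology. Verify the path proposals below.

## Domain definition (Misesian praxeology)
{domain_definition}

## User profile
{json.dumps(user_profile, ensure_ascii=False)}

## Paths to verify
{json.dumps(paths, ensure_ascii=False)}

## Verification steps (for every step of every path):
1. Check whether the step's preconditions are met (consider the user's time, resources, and abilities)
2. Verify that it matches the user's value preferences (time preference, risk preference)
3. Check whether opportunity cost has been considered
4. Assess whether the uncertainty is adequately explained

Verify step by step and do not skip anything. Finish with a verdict:
- "Path feasible" / "Path has issues"
- If there are issues, describe each one and suggest a fix
"""
        raw = await self.llm.generate(prompt)
        return self._parse_critique(raw)

    @staticmethod
    def _parse_critique(raw: str) -> dict:
        """Turn the free-text verdict into the dict that generate_paths reads."""
        has_issues = "Path has issues" in raw
        all_valid = "Path feasible" in raw and not has_issues
        # The whole reply is kept as the issue text so the next round can see it.
        return {"all_valid": all_valid, "issues": [] if all_valid else [raw], "raw": raw}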

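2. Self-critique for desire analysis

During the stage that analyzes the user's desires, Self-Critique verifies the depth and accuracy of the analysis:

DESIRE_CRITIQUE_PROMPT = """
## User statement
{user_input}

## Analysis result
- Surface desire: {surface_desire}
- Deep needs: {deep_needs}
- Root motivation: {root_motivation}

## Verification checklist:
1. Does the surface desire accurately capture the user's direct request?
2. Do the deep needs uncover real needs the user has not stated?
3. Does the root motivation reach what Mises calls "removing felt uneasiness"?
4. Is the causal chain between the needs clear?

Verify each item. If anything is missing or wrong, point it out and give a correction.
"""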

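3. Combining with the TITANS surprise detector

The paper's Self-Critique can strengthen MBE's existing TITANS surprise detector:

class EnhancedSurpriseDetector:
    """Surprise detector augmented with Self-Critique."""

    def compute_surprise(
        self,
        prediction: str,
        actual: str,
        critique_result: dict
    ) -> float:
        """
        surprise = base surprise + self-critique surprise, clamped to [0, 1]
        """
        # Existing surprise computation. cosine_similarity is assumed to embed
        # both texts and return the cosine similarity of the embeddings.
        base_surprise = 1 - cosine_similarity(prediction, actual)

        # Each issue found by Self-Critique adds 0.1 to the surprise.
        critique_surprise = 0.0
        if critique_result["has_issues"]:
            critique_surprise = len(critique_result["issues"]) * 0.1

        # Clamp in both cases: a negative cosine similarity would otherwise
        # push base_surprise above 1.
        return max(0.0, min(1.0, base_surprise + critique_surprise))

Effect: when Self-Critique finds problems in a path, TITANS learns and updates its memory more aggressively.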
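
A few worked numbers make the formula concrete. The sketch below takes the base surprise as an input instead of computing it from embeddings, and it assumes the sum is clamped to the range [0, 1]; `enhanced_surprise` and the `per_issue` parameter are names chosen for illustration.

```python
def enhanced_surprise(base_surprise: float, num_issues: int, per_issue: float = 0.1) -> float:
    """Add a fixed increment per critique issue and clamp the sum to [0, 1]."""
    return max(0.0, min(1.0, base_surprise + per_issue * num_issues))


print(enhanced_surprise(0.35, 0))  # no issues: the base surprise is unchanged
print(enhanced_surprise(0.35, 3))  # three issues add roughly 0.3
print(enhanced_surprise(0.80, 4))  # 0.8 + 0.4 exceeds 1 and is clamped
```

With a step of 0.1 per issue, the critique term alone saturates the score after ten issues, so the base surprise mostly matters when the critique finds few problems.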

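4. Self-Consistency for MoE expert answers

The paper uses Self-Consistency voting to make Self-Critique more accurate. The same idea can be applied to MBE's MoE expert system:

import asyncio
import logging

logger = logging.getLogger(__name__)


class ExpertAnswerWithSelfConsistency:
    """Expert answers with Self-Consistency."""

    async def answer_with_verification(
        self,
        query: str,
        expert_id: str,
        num_samples: int = 3
    ) -> str:
        """
        Sample several answers, critique each one, and return the best valid answer.
        """
        # Sample several answers concurrently
        answers = await asyncio.gather(*[
            self._generate_answer(query, expert_id)
            for _ in range(num_samples)
        ])

        # Run Self-Critique on every answer
        critiques = await asyncio.gather(*[
            self._critique_answer(query, answer, expert_id)
            for answer in answers
        ])

        # Keep only the answers that passed verification
        valid_answers = [
            (answer, critique)
            for answer, critique in zip(answers, critiques)
            if critique["is_valid"]
        ]

        if valid_answers:
            # Return the valid answer with the highest critique score
            return max(valid_answers, key=lambda x: x[1]["score"])[0]

        # None passed: fall back to the first answer and flag it for human review
        logger.warning("No answer passed self-critique; human review needed: %s", query)
        return answers[0]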

5. Complete integration architecture

┌─────────────────────────────────────────────────────────────────────────────┐
│                 MBE + Intrinsic Self-Critique architecture                  │
├─────────────────────────────────────────────────────────────────────────────┤
│                                                                             │
│   User input                                                                │
│      │                                                                      │
│      ▼                                                                      │
│   ┌─────────────────────────────────────────────────────────────────────┐   │
│   │                  Stage 1: Uneasiness identification                 │   │
│   │                                                                     │   │
│   │   Generate analysis → Self-Critique → Revise → Verified             │   │
│   └─────────────────────────────────────────────────────────────────────┘   │
│      │                                                                      │
│      ▼                                                                      │
│   ┌─────────────────────────────────────────────────────────────────────┐   │
│   │                  Stage 2: Desire analysis                           │   │
│   │                                                                     │   │
│   │   Generate desires → Self-Critique → Revise → Verified              │   │
│   │   (Checks: deep needs uncovered? causal chain clear?)               │   │
│   └─────────────────────────────────────────────────────────────────────┘   │
│      │                                                                      │
│      ▼                                                                      │
│   ┌─────────────────────────────────────────────────────────────────────┐   │
│   │                  Stage 3: Path generation ⭐ core application       │   │
│   │                                                                     │   │
│   │   Generate paths → Self-Critique (preconditions) → Revise → ...     │   │
│   │                                                                     │   │
│   │   Iterate until:                                                    │   │
│   │   ✅ All preconditions are satisfied                                │   │
│   │   ✅ Consistent with the user's value preferences                   │   │
│   │   ✅ Opportunity costs are stated                                   │   │
│   │   ✅ Uncertainties are disclosed                                    │   │
│   └─────────────────────────────────────────────────────────────────────┘   │
│      │                                                                      │
│      ▼                                                                      │
│   ┌─────────────────────────────────────────────────────────────────────┐   │
│   │                  TITANS + Self-Critique fusion                      │   │
│   │                                                                     │   │
│   │   surprise = f(prediction error) + g(# of Self-Critique issues)     │   │
│   │                                                                     │   │
│   │   High surprise → update memory → better next time                  │   │
│   └─────────────────────────────────────────────────────────────────────┘   │
│                                                                             │
└─────────────────────────────────────────────────────────────────────────────┘

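Mises domain definition (Domain Definition)

The paper stresses that the domain definition is the key to Self-Critique's success. The Misesian praxeology domain definition designed for MBE is:

MISES_BEHAVIOR_DOMAIN = """
## Misesian praxeology domain definition

### Core axioms
1. Human action is purposeful behavior
2. Action aims to remove felt uneasiness
3. People choose the means they believe will best achieve their ends

### Path verification rules

#### Rule 1: Precondition check
- Every action step must rest on resources the user currently has (time, money, skills)
- If a precondition is not met, a step that acquires the resource must be added first

#### Rule 2: Consistency with value preferences
- Time preference: does the user favor immediate or delayed gratification?
- Risk preference: how much uncertainty does the user tolerate?
- The path must be consistent with the user's preferences

#### Rule 3: Opportunity cost statement
- Choosing A means giving up B
- Every path must state "what you give up if you take this path"

#### Rule 4: Uncertainty disclosure
- The future cannot be fully predicted
- The main sources of uncertainty must be stated for each path

#### Rule 5: Subjective theory of value
- There is no objectively "best" choice
- There is only the choice that suits this user best
"""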

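Self-Critique prompt templates

Path verification prompt

You are an expert in Misesian praxeology. Verify the path proposals below.

## Domain definition (Misesian praxeology)
{domain_definition}

## User profile
- Age: {age}
- Occupation: {occupation}
- Time preference: {time_preference}
- Risk tolerance: {risk_tolerance}
- Current constraints: {constraints}

## Paths to verify
{paths}

## Verification steps (for every step of every path):
1. Check whether the step's preconditions are met (consider the user's time, resources, and abilities)
2. Verify that it matches the user's value preferences (time preference, risk preference)
3. Check whether opportunity cost has been considered
4. Assess whether the uncertainty is adequately explained

Verify step by step and do not skip any step.

Finish with a verdict:
- "Path feasible" or "Path has issues"
- If there are issues, describe each one and suggest a fix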

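Desire analysis verification prompt

Verify the desire analysis result below.

## User's original words
{user_input}

## Analysis result
- Surface desire: {surface_desire}
- Deep needs: {deep_needs}
- Root motivation: {root_motivation}

## Verification checklist:
1. Does the surface desire accurately capture the user's direct request?
2. Do the deep needs uncover real needs the user has not stated?
3. Does the root motivation reach what Mises calls "removing felt uneasiness"?
4. Is the causal chain between the needs clear?

Verify each item. If anything is missing or wrong, point it out and suggest a correction.

Conclusion:
- "Analysis complete" or "Analysis insufficient"
- What needs to be added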
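
Both templates end by asking for one of two verdict labels, so the reply can be reduced to a pass/fail flag with a simple substring check. The sketch below is one hedged way to do that: `Verdict` and `parse_verdict` are hypothetical names, the labels are passed in so they can match whatever wording the template uses, and a reply containing neither label is treated as a failure.

```python
from dataclasses import dataclass


@dataclass
class Verdict:
    passed: bool
    raw: str  # full model reply, kept so a later revision step can read it


def parse_verdict(reply: str, ok_label: str, fail_label: str) -> Verdict:
    """Pass only if the success label is present and the failure label is absent."""
    passed = ok_label in reply and fail_label not in reply
    return Verdict(passed=passed, raw=reply)


# Example reply for the desire analysis template (made-up text).
reply = "Item 2: the deep needs miss financial security.\nConclusion: Analysis insufficient"
verdict = parse_verdict(reply, ok_label="Analysis complete", fail_label="Analysis insufficient")
print(verdict.passed)
```

Substring matching is fragile when the model paraphrases the label; asking for a structured reply such as JSON would be a sturdier alternative, at the cost of changing the prompt.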

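Expected effect

Metric	Current	With Self-Critique (estimated)
Path feasibility score	~70%	85-90%
User adoption rate	~60%	75-80%
Deep-need identification accuracy	~65%	80%
LLM calls	1-2	2-4 (iteration cost)
Response time	3-4 s	5-8 s (acceptable)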
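
The 2-4 call estimate follows from simple arithmetic if each iteration costs one generation call plus one critique call and the iteration cap is 2. Both figures are assumptions of this sketch, and `llm_calls` is a name introduced for illustration.

```python
def llm_calls(iterations_used: int, calls_per_iteration: int = 2) -> int:
    """Total LLM calls when every iteration makes one generation and one critique call."""
    if iterations_used < 1:
        raise ValueError("at least one iteration is needed")
    return iterations_used * calls_per_iteration


max_iterations = 2                      # assumed iteration cap
best_case = llm_calls(1)                # first attempt passes the critique
worst_case = llm_calls(max_iterations)  # every attempt fails until the cap
print(best_case, worst_case)
```

Raising the cap to 3 would put the worst case at 6 calls, and Self-Consistency sampling multiplies the cost further.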

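Implementation phases

Phase 1: Path generation ✅ Done

✅ Implemented the SelfCritique class (src/core/self_critique.py)
✅ Designed the Mises domain definition (MISES_BEHAVIOR_DOMAIN)
✅ Wrote the path verification prompt
✅ Integrated into PathGenerator.generate_with_self_critique()
✅ Integrated into MisesEngine._generate_paths()

Phase 2: Desire analysis ✅ Done

✅ Implemented the DesireCritique class (src/core/self_critique.py)
✅ Added the desire analysis verification prompt (checks deep-need discovery, causal chain completeness, and value conflict identification)
✅ Implemented the DesireAnalyzer.analyze_with_self_critique() method
✅ Implemented the DesireAnalyzer._revise_desires() revision method
✅ Integrated into MisesEngine._handle_uneasiness_confirm()

Phase 3: TITANS enhancement ✅ Done

✅ Implemented the SelfCritiqueWithTITANS class (src/core/self_critique.py)
✅ Fed Self-Critique results into the surprise computation (compute_enhanced_surprise())
✅ Implemented process_with_learning() to trigger HOPE online learning
✅ Integrated into the desire analysis and path generation flows

Phase 4: Self-Consistency ✅ Done

✅ Implemented the SelfConsistencyVoter class (src/core/self_critique.py)
✅ Implemented the vote_on_paths() path voting method (multiple samples + critique + vote)
✅ Implemented the vote_on_desires() desire analysis voting method
✅ Added the SELF_CONSISTENCY_ENABLED configuration switch
✅ Integrated into MisesEngine._generate_paths() (controlled by configuration)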

Usage

Enabled automatically (default)

Self-Critique is enabled by default during path generation and desire analysis:

# Called automatically in engine.py
paths, critique_metadata = await self.path_generator.generate_with_self_critique(
    desires=desires,
    uneasiness=uneasiness,
    user_profile=profile,
    max_iterations=2  # at most 2 iterations by default
)

# Desire analysis enables it automatically as well
desires, desire_metadata = await self.desire_analyzer.analyze_with_self_critique(
    uneasiness=uneasiness,
    user_input=user_input,
    user_profile=profile,
    max_iterations=2
)

Manual invocation

from src.core.paths import PathGenerator
from src.core.desires import DesireAnalyzer
from src.core.self_critique import (
    get_self_critique,
    get_desire_critique,
    get_titans_integration,
    get_consistency_voter
)

# Option 1: use PathGenerator's integrated method
generator = PathGenerator()
paths, metadata = await generator.generate_with_self_critique(
    desires=desires,
    uneasiness=uneasiness,
    user_profile=profile
)

# Option 2: use DesireAnalyzer's integrated method
analyzer = DesireAnalyzer()
desires, metadata = await analyzer.analyze_with_self_critique(
    uneasiness=uneasiness,
    user_input="I want to change jobs",
    user_profile=profile
)

# Option 3: run the path critique on its own
self_critique = get_self_critique()
result = await self_critique.critique_paths(
    paths=paths,
    desires=desires
)
print(f"Valid: {result.is_valid}, Issues: {len(result.issues)}")

# Option 4: run the desire critique on its own
desire_critique = get_desire_critique()
result = await desire_critique.critique_desires(
    desires=desires,
    uneasiness=uneasiness,
    user_input="I want to change jobs"
)

# Option 5: TITANS surprise enhancement
titans_integration = get_titans_integration()
enhanced_surprise = titans_integration.compute_enhanced_surprise(
    critique_result=result,
    base_surprise=0.5
)

# Option 6: Self-Consistency voting (advanced)
voter = get_consistency_voter()
best_paths, vote_metadata = await voter.vote_on_paths(
    desires=desires,
    uneasiness=uneasiness,
    user_profile=profile,
    num_samples=3
)
print(f"Best sample: {vote_metadata['best_index']}, Consensus: {vote_metadata['consensus_rate']:.0%}")

Disabling Self-Critique

Set the flags in src/core/engine.py:

SELF_CRITIQUE_ENABLED = False
SELF_CONSISTENCY_ENABLED = False

Or set them in the configuration:

settings.enable_self_critique = False
settings.enable_self_consistency = False

Configuration parameters

engine.py configuration flags

# Self-Critique configuration flags
SELF_CRITIQUE_ENABLED = True   # enable Self-Critique for path generation and desire analysis
SELF_CONSISTENCY_ENABLED = False  # enable Self-Consistency voting (multiple samples, higher cost)

settings configuration (optional override)

# Set in config.py or through environment variables
settings.enable_self_critique = True        # enable Self-Critique
settings.enable_self_consistency = False    # enable Self-Consistency
settings.self_critique_max_iterations = 2   # maximum number of iterations
settings.self_consistency_samples = 3       # number of Self-Consistency samples
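
The document does not show how these settings are loaded, so the following is only a sketch of one way to read them from environment variables. The `MBE_*` variable names, the `SelfCritiqueSettings` class, and `load_settings` are all hypothetical; only the four field names and their defaults come from the listing above.

```python
import os
from dataclasses import dataclass
from typing import Mapping

_TRUE_VALUES = {"1", "true", "yes", "on"}


@dataclass
class SelfCritiqueSettings:
    # Field names and defaults mirror the settings shown above.
    enable_self_critique: bool = True
    enable_self_consistency: bool = False
    self_critique_max_iterations: int = 2
    self_consistency_samples: int = 3


def load_settings(env: Mapping[str, str] = os.environ) -> SelfCritiqueSettings:
    """Build settings from a mapping, falling back to defaults for unset keys."""
    defaults = SelfCritiqueSettings()

    def flag(name: str, default: bool) -> bool:
        raw = env.get(name)
        return default if raw is None else raw.strip().lower() in _TRUE_VALUES

    def number(name: str, default: int) -> int:
        raw = env.get(name)
        return default if raw is None else int(raw)

    # The MBE_* variable names below are hypothetical.
    return SelfCritiqueSettings(
        enable_self_critique=flag("MBE_ENABLE_SELF_CRITIQUE", defaults.enable_self_critique),
        enable_self_consistency=flag("MBE_ENABLE_SELF_CONSISTENCY", defaults.enable_self_consistency),
        self_critique_max_iterations=number("MBE_SELF_CRITIQUE_MAX_ITERATIONS", defaults.self_critique_max_iterations),
        self_consistency_samples=number("MBE_SELF_CONSISTENCY_SAMPLES", defaults.self_consistency_samples),
    )


# Passing an explicit mapping keeps the example independent of the real environment.
example = load_settings({"MBE_ENABLE_SELF_CRITIQUE": "false", "MBE_SELF_CONSISTENCY_SAMPLES": "5"})
print(example)
```

Accepting any mapping rather than reading `os.environ` directly makes the loader easy to exercise in tests.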

SelfCritiqueConfig detailed configuration

from src.core.self_critique import SelfCritiqueConfig

config = SelfCritiqueConfig(
    # Maximum number of iterations
    max_iterations=3,

    # Number of Self-Consistency samples
    num_samples=3,

    # Whether Self-Consistency is enabled
    enable_self_consistency=False,

    # Minimum score for a result to count as valid
    min_valid_score=0.6,

    # Timeout (seconds)
    timeout=15,

    # Whether to log critiques in detail
    log_critiques=True,
)

Architecture diagram

┌─────────────────────────────────────────────────────────────┐
│                     MisesEngine                             │
├─────────────────────────────────────────────────────────────┤
│                                                             │
│  ┌─────────────┐    ┌─────────────┐    ┌─────────────┐    │
│  │ Uneasiness  │───▶│  Desires    │───▶│   Paths     │    │
│  │  Detector   │    │  Analyzer   │    │  Generator  │    │
│  └─────────────┘    └──────┬──────┘    └──────┬──────┘    │
│                            │                   │           │
│                            ▼                   ▼           │
│                   ┌─────────────────────────────────┐     │
│                   │       Self-Critique Module       │     │
│                   ├─────────────────────────────────┤     │
│                   │ • DesireCritique               │     │
│                   │ • SelfCritique (Paths)         │     │
│                   │ • SelfCritiqueWithTITANS       │     │
│                   │ • SelfConsistencyVoter         │     │
│                   └──────────────┬──────────────────┘     │
│                                  │                        │
│                                  ▼                        │
│                   ┌─────────────────────────────────┐     │
│                   │     TITANS/HOPE Integration      │     │
│                   │ • Surprise enhancement         │     │
│                   │ • Online-learning trigger      │     │
│                   └─────────────────────────────────┘     │
│                                                            │
└─────────────────────────────────────────────────────────────┘

References

Paper: Enhancing LLM Planning Capabilities through Intrinsic Self-Critique (arXiv:2512.24103v1)
Authors: Bernd Bohnet, Pierre-Alexandre Kamienny, Hanie Sedghi, et al. (Google DeepMind)
Published: 2025-12-30
