MBE 文档生成模块设计方案

文档版本: v1.0
创建日期: 2026-01-31
模块名称: Document Generation Engine (DGE)
核心能力: 对话摘要、方案生成、签署文件、多格式导出


一、模块概述

1.1 设计目标

┌─────────────────────────────────────────────────────────────────────────┐
│                    文档生成模块核心目标                                   │
├─────────────────────────────────────────────────────────────────────────┤
│                                                                          │
│  【功能目标】                                                            │
│  ┌─────────────────────────────────────────────────────────────────┐   │
│  │ 1. 对话记录整理:自动提取、结构化、格式化用户对话内容           │   │
│  │ 2. 方案文档生成:基于对话生成专业的建议书、方案书               │   │
│  │ 3. 签署文件生成:生成需要用户确认/签署的正式文件                │   │
│  │ 4. 多格式输出:支持 PDF、Word、HTML、Markdown、JSON             │   │
│  │ 5. 多场景适配:保险、医疗、理财、教育、法律等垂直领域           │   │
│  └─────────────────────────────────────────────────────────────────┘   │
│                                                                          │
│  【设计原则】                                                            │
│  ┌─────────────────────────────────────────────────────────────────┐   │
│  │ • 模板驱动:通过模板实现内容与格式分离                          │   │
│  │ • 插件式架构:新场景通过插件扩展,无需修改核心代码              │   │
│  │ • LLM增强:利用大模型能力进行内容优化和摘要生成                 │   │
│  │ • 合规优先:敏感行业(医疗、金融)严格遵循监管要求              │   │
│  │ • 可追溯性:文档与源对话可关联追溯                              │   │
│  └─────────────────────────────────────────────────────────────────┘   │
│                                                                          │
└─────────────────────────────────────────────────────────────────────────┘

1.2 支持的行业场景

行业 典型文档类型 签署文件 合规要求
保险 保险方案书、需求分析报告、产品对比表 投保单、健康告知书、风险确认书 保险销售可回溯
医疗 问诊摘要、健康评估报告、康复计划 知情同意书、治疗确认书 医疗文书规范、隐私保护
理财 理财建议书、资产配置方案、投资组合报告 风险测评确认书、投资协议 适当性管理、风险揭示
教育 学习报告、课程规划、能力评估 服务协议、课程确认书 教育培训合同规范
法律 法律咨询摘要、案件分析报告 委托协议、授权书 律师执业规范
心理 咨询记录、心理评估报告、干预计划 咨询同意书、保密协议 心理咨询伦理规范

二、系统架构

2.1 整体架构图

┌─────────────────────────────────────────────────────────────────────────────┐
│                         文档生成模块 (Document Generation Engine)            │
├─────────────────────────────────────────────────────────────────────────────┤
│                                                                              │
│  ┌─────────────────────────────────────────────────────────────────────┐   │
│  │                           API Layer                                  │   │
│  │  ┌──────────┐ ┌──────────┐ ┌──────────┐ ┌──────────┐ ┌──────────┐  │   │
│  │  │ Summary  │ │ Document │ │ Signing  │ │  Export  │ │ Template │  │   │
│  │  │   API    │ │Generate  │ │   API    │ │   API    │ │   API    │  │   │
│  │  └────┬─────┘ └────┬─────┘ └────┬─────┘ └────┬─────┘ └────┬─────┘  │   │
│  └───────┼────────────┼────────────┼────────────┼────────────┼────────┘   │
│          │            │            │            │            │             │
│  ┌───────┴────────────┴────────────┴────────────┴────────────┴────────┐   │
│  │                        Core Engine Layer                            │   │
│  │  ┌──────────────────────────────────────────────────────────────┐  │   │
│  │  │                    Document Orchestrator                      │  │   │
│  │  │     (文档编排器:协调各组件完成文档生成流程)                  │  │   │
│  │  └──────────────────────────────────────────────────────────────┘  │   │
│  │                                                                     │   │
│  │  ┌────────────┐ ┌────────────┐ ┌────────────┐ ┌────────────────┐  │   │
│  │  │  Content   │ │  Template  │ │   Format   │ │   Compliance   │  │   │
│  │  │  Extractor │ │   Engine   │ │  Renderer  │ │    Checker     │  │   │
│  │  │ (内容提取) │ │ (模板引擎) │ │(格式渲染器)│ │  (合规检查)    │  │   │
│  │  └────────────┘ └────────────┘ └────────────┘ └────────────────┘  │   │
│  └─────────────────────────────────────────────────────────────────────┘   │
│                                                                              │
│  ┌─────────────────────────────────────────────────────────────────────┐   │
│  │                      Industry Plugin Layer                           │   │
│  │  ┌──────────┐ ┌──────────┐ ┌──────────┐ ┌──────────┐ ┌──────────┐  │   │
│  │  │Insurance │ │ Medical  │ │ Finance  │ │Education │ │  Legal   │  │   │
│  │  │  Plugin  │ │  Plugin  │ │  Plugin  │ │  Plugin  │ │  Plugin  │  │   │
│  │  └──────────┘ └──────────┘ └──────────┘ └──────────┘ └──────────┘  │   │
│  └─────────────────────────────────────────────────────────────────────┘   │
│                                                                              │
│  ┌─────────────────────────────────────────────────────────────────────┐   │
│  │                      Infrastructure Layer                            │   │
│  │  ┌──────────┐ ┌──────────┐ ┌──────────┐ ┌──────────┐ ┌──────────┐  │   │
│  │  │   LLM    │ │ Template │ │  TITANS  │ │  Redis   │ │   OSS    │  │   │
│  │  │ Service  │ │  Store   │ │  Memory  │ │  Cache   │ │ Storage  │  │   │
│  │  └──────────┘ └──────────┘ └──────────┘ └──────────┘ └──────────┘  │   │
│  └─────────────────────────────────────────────────────────────────────┘   │
│                                                                              │
└─────────────────────────────────────────────────────────────────────────────┘

2.2 目录结构

src/document/
├── __init__.py
├── orchestrator.py              # 文档编排器(核心协调器)
├── api/
│   ├── __init__.py
│   ├── summary.py               # 对话摘要API
│   ├── document.py              # 文档生成API
│   ├── signing.py               # 签署文件API
│   ├── export.py                # 批量导出API
│   └── template.py              # 模板管理API
│
├── core/
│   ├── __init__.py
│   ├── content_extractor.py     # 内容提取器
│   ├── template_engine.py       # 模板引擎
│   ├── format_renderer.py       # 格式渲染器
│   ├── compliance_checker.py    # 合规检查器
│   └── llm_enhancer.py          # LLM内容增强
│
├── plugins/
│   ├── __init__.py
│   ├── base.py                  # 插件基类
│   ├── insurance/               # 保险行业插件
│   │   ├── __init__.py
│   │   ├── plugin.py
│   │   ├── extractors.py
│   │   └── validators.py
│   ├── medical/                 # 医疗行业插件
│   │   ├── __init__.py
│   │   ├── plugin.py
│   │   ├── extractors.py
│   │   └── validators.py
│   ├── finance/                 # 理财行业插件
│   ├── education/               # 教育行业插件
│   └── legal/                   # 法律行业插件
│
├── templates/
│   ├── base/                    # 基础模板
│   │   ├── summary.html.j2
│   │   ├── report.html.j2
│   │   └── agreement.html.j2
│   ├── insurance/               # 保险模板
│   │   ├── proposal.html.j2
│   │   ├── application.html.j2
│   │   └── risk_disclosure.html.j2
│   ├── medical/                 # 医疗模板
│   │   ├── consultation_summary.html.j2
│   │   ├── health_assessment.html.j2
│   │   └── informed_consent.html.j2
│   └── finance/                 # 理财模板
│       ├── advisory_report.html.j2
│       └── risk_assessment.html.j2
│
├── renderers/
│   ├── __init__.py
│   ├── pdf_renderer.py          # PDF渲染器
│   ├── word_renderer.py         # Word渲染器
│   ├── html_renderer.py         # HTML渲染器
│   └── markdown_renderer.py     # Markdown渲染器
│
├── models/
│   ├── __init__.py
│   ├── document.py              # 文档数据模型
│   ├── template.py              # 模板数据模型
│   └── signing.py               # 签署文件数据模型
│
└── storage/
    ├── __init__.py
    ├── document_store.py        # 文档存储
    └── template_store.py        # 模板存储

三、核心组件设计

3.1 内容提取器 (Content Extractor)

负责从对话历史中提取结构化信息。

# src/document/core/content_extractor.py

from typing import Dict, List, Optional, Any
from dataclasses import dataclass, field
from enum import Enum
from abc import ABC, abstractmethod

class ExtractionType(Enum):
    """提取类型"""
    SUMMARY = "summary"           # 对话摘要
    NEEDS = "needs"               # 用户需求
    RECOMMENDATIONS = "recommendations"  # 建议
    DECISIONS = "decisions"       # 决策
    ACTIONS = "actions"           # 行动项
    PROFILE = "profile"           # 用户画像
    RISK = "risk"                 # 风险点
    COMPLIANCE = "compliance"     # 合规信息


@dataclass
class ExtractedContent:
    """提取的内容结构"""
    session_id: str
    device_id: str
    extraction_time: str
    
    # 基础信息
    conversation_summary: str = ""
    key_points: List[str] = field(default_factory=list)
    
    # 用户相关
    user_needs: List[Dict] = field(default_factory=list)
    user_profile: Dict = field(default_factory=dict)
    user_concerns: List[str] = field(default_factory=list)
    
    # 专家相关
    expert_recommendations: List[Dict] = field(default_factory=list)
    expert_explanations: List[str] = field(default_factory=list)
    
    # 决策相关
    decisions_made: List[Dict] = field(default_factory=list)
    pending_decisions: List[Dict] = field(default_factory=list)
    
    # 行业特定字段(由插件填充)
    industry_specific: Dict[str, Any] = field(default_factory=dict)
    
    # 元数据
    confidence_score: float = 0.0
    extraction_method: str = ""


class ContentExtractor:
    """内容提取器"""
    
    def __init__(
        self,
        llm_service,
        titans_memory,
        industry_plugins: Dict[str, 'IndustryPlugin'] = None
    ):
        self.llm = llm_service
        self.memory = titans_memory
        self.plugins = industry_plugins or {}
    
    async def extract(
        self,
        session_id: str,
        device_id: str,
        extraction_types: List[ExtractionType] = None,
        industry: str = None
    ) -> ExtractedContent:
        """
        从对话中提取结构化内容
        
        Args:
            session_id: 会话ID
            device_id: 设备ID
            extraction_types: 需要提取的内容类型
            industry: 行业标识(用于加载特定插件)
        """
        # 1. 获取对话历史
        conversation = await self._get_conversation(session_id, device_id)
        
        # 2. 基础提取(使用LLM)
        base_content = await self._extract_base_content(conversation)
        
        # 3. 行业特定提取(使用插件)
        if industry and industry in self.plugins:
            plugin = self.plugins[industry]
            industry_content = await plugin.extract_content(conversation)
            base_content.industry_specific = industry_content
        
        # 4. 质量评分
        base_content.confidence_score = await self._calculate_confidence(base_content)
        
        return base_content
    
    async def _extract_base_content(self, conversation: List[Dict]) -> ExtractedContent:
        """使用LLM提取基础内容"""
        
        prompt = self._build_extraction_prompt(conversation)
        
        response = await self.llm.generate(
            prompt=prompt,
            system_prompt=EXTRACTION_SYSTEM_PROMPT,
            response_format="json"
        )
        
        return self._parse_extraction_response(response)
    
    def _build_extraction_prompt(self, conversation: List[Dict]) -> str:
        """构建提取提示词"""
        
        conv_text = "\n".join([
            f"{'用户' if msg['role'] == 'user' else '专家'}: {msg['content']}"
            for msg in conversation
        ])
        
        return f"""
请从以下对话中提取结构化信息:

【对话内容】
{conv_text}

【提取要求】
请以JSON格式返回以下信息:
1. conversation_summary: 对话整体摘要(100-200字)
2. key_points: 关键要点列表(3-5条)
3. user_needs: 用户需求列表,每项包含 need(需求)、priority(优先级)、status(状态)
4. user_concerns: 用户顾虑列表
5. expert_recommendations: 专家建议列表,每项包含 recommendation(建议)、reason(理由)、confidence(置信度)
6. decisions_made: 已做决策列表
7. pending_decisions: 待决策事项列表
"""


# 提取系统提示词
EXTRACTION_SYSTEM_PROMPT = """
你是一个专业的对话内容分析师,擅长从对话中提取结构化信息。

你的任务是:
1. 准确理解对话的上下文和意图
2. 提取关键信息,不遗漏重要内容
3. 区分事实和观点
4. 识别明确的决策和待定事项
5. 保持客观中立,不添加主观判断

输出要求:
- 使用JSON格式
- 所有字段必须填充(无内容则为空数组/字符串)
- 摘要要简洁但完整
- 关键点要具体可行
"""

3.2 模板引擎 (Template Engine)

基于Jinja2的模板引擎,支持动态内容渲染。

# src/document/core/template_engine.py

from typing import Dict, Any, Optional, List
from pathlib import Path
from jinja2 import Environment, FileSystemLoader, select_autoescape
from dataclasses import dataclass
from enum import Enum
import json

class TemplateType(Enum):
    """模板类型"""
    SUMMARY = "summary"              # 对话摘要
    REPORT = "report"                # 分析报告
    PROPOSAL = "proposal"            # 方案建议书
    AGREEMENT = "agreement"          # 协议/合同
    APPLICATION = "application"      # 申请表
    CONSENT = "consent"              # 同意书
    DISCLOSURE = "disclosure"        # 披露/告知书


@dataclass
class TemplateConfig:
    """模板配置"""
    template_id: str
    template_type: TemplateType
    industry: str
    name: str
    description: str
    version: str
    
    # 模板文件
    html_template: str              # HTML模板路径
    css_style: Optional[str] = None # CSS样式路径
    
    # 必填字段
    required_fields: List[str] = None
    
    # 可选字段
    optional_fields: List[str] = None
    
    # 合规要求
    compliance_rules: List[str] = None
    
    # 输出格式支持
    supported_formats: List[str] = None  # ["pdf", "word", "html"]


class TemplateEngine:
    """模板引擎"""
    
    def __init__(self, template_base_path: str):
        self.template_path = Path(template_base_path)
        self.env = Environment(
            loader=FileSystemLoader(str(self.template_path)),
            autoescape=select_autoescape(['html', 'xml']),
            trim_blocks=True,
            lstrip_blocks=True
        )
        
        # 注册自定义过滤器
        self._register_filters()
        
        # 模板配置缓存
        self.template_configs: Dict[str, TemplateConfig] = {}
        self._load_template_configs()
    
    def _register_filters(self):
        """注册自定义Jinja2过滤器"""
        
        # 日期格式化
        self.env.filters['date_format'] = lambda d, fmt='%Y年%m月%d日': d.strftime(fmt) if d else ''
        
        # 金额格式化
        self.env.filters['currency'] = lambda v: f"¥{v:,.2f}" if v else "¥0.00"
        
        # 百分比格式化
        self.env.filters['percent'] = lambda v: f"{v*100:.1f}%" if v else "0%"
        
        # 列表连接
        self.env.filters['join_list'] = lambda l, sep='、': sep.join(l) if l else ''
        
        # 安全HTML(用于签署文件)
        self.env.filters['safe_text'] = lambda t: t.replace('<', '&lt;').replace('>', '&gt;')
        
        # 电话脱敏
        self.env.filters['mask_phone'] = lambda p: p[:3] + '****' + p[-4:] if p and len(p) >= 11 else p
        
        # 身份证脱敏
        self.env.filters['mask_id'] = lambda i: i[:6] + '********' + i[-4:] if i and len(i) >= 18 else i
    
    def render(
        self,
        template_id: str,
        data: Dict[str, Any],
        locale: str = "zh_CN"
    ) -> str:
        """
        渲染模板
        
        Args:
            template_id: 模板ID
            data: 渲染数据
            locale: 语言设置
            
        Returns:
            渲染后的HTML字符串
        """
        config = self.template_configs.get(template_id)
        if not config:
            raise ValueError(f"Template not found: {template_id}")
        
        # 验证必填字段
        self._validate_required_fields(config, data)
        
        # 加载模板
        template = self.env.get_template(config.html_template)
        
        # 准备渲染上下文
        context = {
            **data,
            "template_config": config,
            "locale": locale,
            "generation_time": datetime.now(),
        }
        
        # 渲染
        return template.render(**context)
    
    def _validate_required_fields(self, config: TemplateConfig, data: Dict):
        """验证必填字段"""
        if not config.required_fields:
            return
            
        missing = [f for f in config.required_fields if f not in data or not data[f]]
        if missing:
            raise ValueError(f"Missing required fields: {missing}")
    
    def get_template_schema(self, template_id: str) -> Dict:
        """获取模板的数据结构定义"""
        config = self.template_configs.get(template_id)
        if not config:
            raise ValueError(f"Template not found: {template_id}")
        
        return {
            "template_id": template_id,
            "template_type": config.template_type.value,
            "required_fields": config.required_fields or [],
            "optional_fields": config.optional_fields or [],
            "supported_formats": config.supported_formats or ["html", "pdf"]
        }
    
    def list_templates(
        self,
        industry: str = None,
        template_type: TemplateType = None
    ) -> List[TemplateConfig]:
        """列出可用模板"""
        templates = list(self.template_configs.values())
        
        if industry:
            templates = [t for t in templates if t.industry == industry]
        
        if template_type:
            templates = [t for t in templates if t.template_type == template_type]
        
        return templates

3.3 格式渲染器 (Format Renderer)

将HTML转换为各种输出格式。

# src/document/core/format_renderer.py

from abc import ABC, abstractmethod
from typing import Optional, Dict, Any
from pathlib import Path
import tempfile
import base64

class OutputFormat(Enum):
    """输出格式"""
    HTML = "html"
    PDF = "pdf"
    WORD = "word"
    MARKDOWN = "markdown"
    JSON = "json"


class BaseRenderer(ABC):
    """渲染器基类"""
    
    @abstractmethod
    async def render(
        self,
        html_content: str,
        options: Dict[str, Any] = None
    ) -> bytes:
        """渲染HTML为目标格式"""
        pass
    
    @property
    @abstractmethod
    def output_format(self) -> OutputFormat:
        """输出格式"""
        pass


class PDFRenderer(BaseRenderer):
    """PDF渲染器"""
    
    def __init__(self, wkhtmltopdf_path: str = None):
        """
        初始化PDF渲染器
        
        支持两种方式:
        1. WeasyPrint(纯Python,推荐)
        2. wkhtmltopdf(需要系统安装)
        """
        self.wkhtmltopdf_path = wkhtmltopdf_path
    
    @property
    def output_format(self) -> OutputFormat:
        return OutputFormat.PDF
    
    async def render(
        self,
        html_content: str,
        options: Dict[str, Any] = None
    ) -> bytes:
        """
        渲染HTML为PDF
        
        Options:
            page_size: 页面大小 (A4, Letter等)
            margin: 边距设置
            header_html: 页眉HTML
            footer_html: 页脚HTML
            watermark: 水印文字
        """
        options = options or {}
        
        try:
            # 优先使用WeasyPrint
            return await self._render_with_weasyprint(html_content, options)
        except ImportError:
            # 回退到wkhtmltopdf
            return await self._render_with_wkhtmltopdf(html_content, options)
    
    async def _render_with_weasyprint(
        self,
        html_content: str,
        options: Dict
    ) -> bytes:
        """使用WeasyPrint渲染"""
        from weasyprint import HTML, CSS
        from weasyprint.text.fonts import FontConfiguration
        
        font_config = FontConfiguration()
        
        # 基础CSS(支持中文)
        base_css = CSS(string="""
            @font-face {
                font-family: 'SimSun';
                src: local('SimSun'), local('宋体');
            }
            body {
                font-family: 'SimSun', 'Microsoft YaHei', sans-serif;
                font-size: 12pt;
                line-height: 1.6;
            }
            @page {
                size: A4;
                margin: 2cm;
            }
        """, font_config=font_config)
        
        # 添加水印CSS
        if options.get('watermark'):
            watermark_css = CSS(string=f"""
                @page {{
                    background-image: url('data:image/svg+xml,...');
                }}
            """)
        
        html = HTML(string=html_content)
        pdf_bytes = html.write_pdf(
            stylesheets=[base_css],
            font_config=font_config
        )
        
        return pdf_bytes


class WordRenderer(BaseRenderer):
    """Word文档渲染器"""
    
    @property
    def output_format(self) -> OutputFormat:
        return OutputFormat.WORD
    
    async def render(
        self,
        html_content: str,
        options: Dict[str, Any] = None
    ) -> bytes:
        """
        渲染HTML为Word文档
        
        Options:
            template: Word模板路径(.dotx)
            styles: 自定义样式映射
        """
        from docx import Document
        from docx.shared import Inches, Pt, RGBColor
        from docx.enum.text import WD_ALIGN_PARAGRAPH
        from bs4 import BeautifulSoup
        import io
        
        options = options or {}
        
        # 解析HTML
        soup = BeautifulSoup(html_content, 'html.parser')
        
        # 创建文档
        if options.get('template'):
            doc = Document(options['template'])
        else:
            doc = Document()
        
        # 设置默认字体
        style = doc.styles['Normal']
        style.font.name = '宋体'
        style.font.size = Pt(12)
        
        # 转换HTML元素到Word
        await self._convert_html_to_docx(soup, doc, options)
        
        # 保存到字节流
        buffer = io.BytesIO()
        doc.save(buffer)
        buffer.seek(0)
        
        return buffer.read()
    
    async def _convert_html_to_docx(
        self,
        soup: BeautifulSoup,
        doc: Document,
        options: Dict
    ):
        """将HTML元素转换为Word元素"""
        
        for element in soup.body.children if soup.body else soup.children:
            if element.name == 'h1':
                doc.add_heading(element.get_text(), level=1)
            elif element.name == 'h2':
                doc.add_heading(element.get_text(), level=2)
            elif element.name == 'h3':
                doc.add_heading(element.get_text(), level=3)
            elif element.name == 'p':
                doc.add_paragraph(element.get_text())
            elif element.name == 'ul':
                for li in element.find_all('li'):
                    doc.add_paragraph(li.get_text(), style='List Bullet')
            elif element.name == 'ol':
                for li in element.find_all('li'):
                    doc.add_paragraph(li.get_text(), style='List Number')
            elif element.name == 'table':
                await self._convert_table(element, doc)
            elif element.name == 'div':
                # 递归处理div
                await self._convert_html_to_docx(element, doc, options)


class MarkdownRenderer(BaseRenderer):
    """Markdown渲染器"""
    
    @property
    def output_format(self) -> OutputFormat:
        return OutputFormat.MARKDOWN
    
    async def render(
        self,
        html_content: str,
        options: Dict[str, Any] = None
    ) -> bytes:
        """将HTML转换为Markdown"""
        import html2text
        
        h = html2text.HTML2Text()
        h.body_width = 0  # 不自动换行
        h.unicode_snob = True  # 保留Unicode字符
        
        markdown = h.handle(html_content)
        return markdown.encode('utf-8')


class FormatRendererFactory:
    """渲染器工厂"""
    
    _renderers: Dict[OutputFormat, BaseRenderer] = {}
    
    @classmethod
    def get_renderer(cls, format: OutputFormat) -> BaseRenderer:
        """获取渲染器实例"""
        if format not in cls._renderers:
            if format == OutputFormat.PDF:
                cls._renderers[format] = PDFRenderer()
            elif format == OutputFormat.WORD:
                cls._renderers[format] = WordRenderer()
            elif format == OutputFormat.MARKDOWN:
                cls._renderers[format] = MarkdownRenderer()
            else:
                raise ValueError(f"Unsupported format: {format}")
        
        return cls._renderers[format]

3.4 合规检查器 (Compliance Checker)

确保生成的文档符合行业监管要求。

# src/document/core/compliance_checker.py

from typing import Dict, List, Any, Optional
from dataclasses import dataclass
from enum import Enum
from abc import ABC, abstractmethod

class ComplianceLevel(Enum):
    """合规级别"""
    PASS = "pass"           # 通过
    WARNING = "warning"     # 警告(可继续,建议修改)
    FAIL = "fail"           # 失败(必须修改)


@dataclass
class ComplianceIssue:
    """合规问题"""
    rule_id: str
    rule_name: str
    level: ComplianceLevel
    message: str
    field: Optional[str] = None
    suggestion: Optional[str] = None


@dataclass 
class ComplianceResult:
    """合规检查结果"""
    passed: bool
    issues: List[ComplianceIssue]
    checked_rules: int
    pass_rate: float
    
    def to_dict(self) -> Dict:
        return {
            "passed": self.passed,
            "issues": [
                {
                    "rule_id": i.rule_id,
                    "level": i.level.value,
                    "message": i.message,
                    "field": i.field,
                    "suggestion": i.suggestion
                }
                for i in self.issues
            ],
            "checked_rules": self.checked_rules,
            "pass_rate": self.pass_rate
        }


class ComplianceRule(ABC):
    """合规规则基类"""
    
    @property
    @abstractmethod
    def rule_id(self) -> str:
        pass
    
    @property
    @abstractmethod
    def rule_name(self) -> str:
        pass
    
    @property
    @abstractmethod
    def industries(self) -> List[str]:
        """适用行业"""
        pass
    
    @abstractmethod
    def check(self, content: Dict[str, Any]) -> Optional[ComplianceIssue]:
        """执行检查,返回None表示通过"""
        pass


# ==================== 通用规则 ====================

class PersonalInfoDisclosureRule(ComplianceRule):
    """个人信息披露规则"""
    
    rule_id = "COMMON_001"
    rule_name = "个人信息脱敏"
    industries = ["*"]  # 所有行业
    
    def check(self, content: Dict[str, Any]) -> Optional[ComplianceIssue]:
        """检查是否包含未脱敏的个人信息"""
        import re
        
        # 检查身份证号
        id_pattern = r'\b\d{17}[\dXx]\b'
        # 检查手机号
        phone_pattern = r'\b1[3-9]\d{9}\b'
        
        text = str(content)
        
        if re.search(id_pattern, text):
            return ComplianceIssue(
                rule_id=self.rule_id,
                rule_name=self.rule_name,
                level=ComplianceLevel.FAIL,
                message="文档包含未脱敏的身份证号",
                suggestion="请使用mask_id过滤器进行脱敏处理"
            )
        
        if re.search(phone_pattern, text):
            return ComplianceIssue(
                rule_id=self.rule_id,
                rule_name=self.rule_name,
                level=ComplianceLevel.WARNING,
                message="文档包含完整手机号,建议脱敏",
                suggestion="请使用mask_phone过滤器进行脱敏处理"
            )
        
        return None


# ==================== 保险行业规则 ====================

class InsuranceRiskDisclosureRule(ComplianceRule):
    """保险风险揭示规则"""
    
    rule_id = "INS_001"
    rule_name = "风险揭示完整性"
    industries = ["insurance"]
    
    REQUIRED_DISCLOSURES = [
        "投保前请仔细阅读保险条款",
        "保险产品存在退保损失",
        "请如实告知健康状况",
        "免责条款"
    ]
    
    def check(self, content: Dict[str, Any]) -> Optional[ComplianceIssue]:
        text = str(content).lower()
        
        missing = []
        for disclosure in self.REQUIRED_DISCLOSURES:
            if disclosure not in text:
                missing.append(disclosure)
        
        if missing:
            return ComplianceIssue(
                rule_id=self.rule_id,
                rule_name=self.rule_name,
                level=ComplianceLevel.FAIL,
                message=f"缺少必要的风险揭示内容: {', '.join(missing)}",
                suggestion="请确保文档包含完整的风险揭示信息"
            )
        
        return None


class InsuranceSuitabilityRule(ComplianceRule):
    """保险适当性规则"""
    
    rule_id = "INS_002"
    rule_name = "产品适当性说明"
    industries = ["insurance"]
    
    def check(self, content: Dict[str, Any]) -> Optional[ComplianceIssue]:
        # 检查是否包含适当性说明
        required_fields = ["risk_tolerance", "insurance_needs", "payment_ability"]
        
        missing = [f for f in required_fields if f not in content]
        
        if missing:
            return ComplianceIssue(
                rule_id=self.rule_id,
                rule_name=self.rule_name,
                level=ComplianceLevel.WARNING,
                message=f"建议补充适当性评估信息: {', '.join(missing)}",
                suggestion="添加用户风险承受能力、保险需求、缴费能力等信息"
            )
        
        return None


# ==================== 医疗行业规则 ====================

class MedicalDisclaimerRule(ComplianceRule):
    """医疗免责声明规则"""
    
    rule_id = "MED_001"
    rule_name = "医疗建议免责声明"
    industries = ["medical"]
    
    REQUIRED_DISCLAIMER = "以上建议仅供参考,不能替代专业医生的诊断和治疗"
    
    def check(self, content: Dict[str, Any]) -> Optional[ComplianceIssue]:
        text = str(content)
        
        if self.REQUIRED_DISCLAIMER not in text and "仅供参考" not in text:
            return ComplianceIssue(
                rule_id=self.rule_id,
                rule_name=self.rule_name,
                level=ComplianceLevel.FAIL,
                message="医疗相关文档必须包含免责声明",
                suggestion=f"请添加免责声明: '{self.REQUIRED_DISCLAIMER}'"
            )
        
        return None


class MedicalPrivacyRule(ComplianceRule):
    """医疗隐私保护规则"""
    
    rule_id = "MED_002"
    rule_name = "医疗隐私保护"
    industries = ["medical"]
    
    SENSITIVE_FIELDS = ["diagnosis", "medical_history", "medication"]
    
    def check(self, content: Dict[str, Any]) -> Optional[ComplianceIssue]:
        # 检查敏感医疗信息是否有保护措施
        has_sensitive = any(f in content for f in self.SENSITIVE_FIELDS)
        has_consent = content.get("privacy_consent", False)
        
        if has_sensitive and not has_consent:
            return ComplianceIssue(
                rule_id=self.rule_id,
                rule_name=self.rule_name,
                level=ComplianceLevel.WARNING,
                message="文档包含敏感医疗信息,建议获取隐私授权",
                field="privacy_consent",
                suggestion="建议在文档中添加隐私保护声明并获取用户同意"
            )
        
        return None


# ==================== 理财行业规则 ====================

class FinanceRiskWarningRule(ComplianceRule):
    """理财风险警示规则"""
    
    rule_id = "FIN_001"
    rule_name = "投资风险警示"
    industries = ["finance"]
    
    REQUIRED_WARNINGS = [
        "投资有风险",
        "过往业绩不代表未来表现",
        "请根据自身风险承受能力"
    ]
    
    def check(self, content: Dict[str, Any]) -> Optional[ComplianceIssue]:
        text = str(content)
        
        missing = [w for w in self.REQUIRED_WARNINGS if w not in text]
        
        if missing:
            return ComplianceIssue(
                rule_id=self.rule_id,
                rule_name=self.rule_name,
                level=ComplianceLevel.FAIL,
                message=f"缺少必要的风险警示: {', '.join(missing)}",
                suggestion="理财建议必须包含完整的风险提示"
            )
        
        return None


# ==================== 合规检查器 ====================

class ComplianceChecker:
    """合规检查器"""
    
    def __init__(self):
        self.rules: List[ComplianceRule] = [
            # 通用规则
            PersonalInfoDisclosureRule(),
            
            # 保险规则
            InsuranceRiskDisclosureRule(),
            InsuranceSuitabilityRule(),
            
            # 医疗规则
            MedicalDisclaimerRule(),
            MedicalPrivacyRule(),
            
            # 理财规则
            FinanceRiskWarningRule(),
        ]
    
    def check(
        self,
        content: Dict[str, Any],
        industry: str
    ) -> ComplianceResult:
        """
        执行合规检查
        
        Args:
            content: 文档内容
            industry: 行业标识
        """
        issues = []
        checked = 0
        
        for rule in self.rules:
            # 检查规则是否适用
            if "*" in rule.industries or industry in rule.industries:
                checked += 1
                issue = rule.check(content)
                if issue:
                    issues.append(issue)
        
        # 判断是否通过
        has_fail = any(i.level == ComplianceLevel.FAIL for i in issues)
        pass_rate = (checked - len(issues)) / checked if checked > 0 else 1.0
        
        return ComplianceResult(
            passed=not has_fail,
            issues=issues,
            checked_rules=checked,
            pass_rate=pass_rate
        )
    
    def add_rule(self, rule: ComplianceRule):
        """添加自定义规则"""
        self.rules.append(rule)

四、行业插件设计

4.1 插件基类

# src/document/plugins/base.py

from abc import ABC, abstractmethod
from typing import Dict, List, Any, Optional
from dataclasses import dataclass

@dataclass
class IndustryDocumentType:
    """行业文档类型定义"""
    type_id: str
    name: str
    description: str
    template_id: str
    required_fields: List[str]
    compliance_rules: List[str]


class IndustryPlugin(ABC):
    """行业插件基类"""
    
    @property
    @abstractmethod
    def industry_id(self) -> str:
        """行业标识"""
        pass
    
    @property
    @abstractmethod
    def industry_name(self) -> str:
        """行业名称"""
        pass
    
    @property
    @abstractmethod
    def document_types(self) -> List[IndustryDocumentType]:
        """支持的文档类型"""
        pass
    
    @abstractmethod
    async def extract_content(
        self,
        conversation: List[Dict]
    ) -> Dict[str, Any]:
        """
        从对话中提取行业特定内容
        
        Args:
            conversation: 对话历史
            
        Returns:
            行业特定的结构化数据
        """
        pass
    
    @abstractmethod
    def get_document_schema(
        self,
        document_type: str
    ) -> Dict[str, Any]:
        """
        获取文档数据结构定义
        
        Args:
            document_type: 文档类型
            
        Returns:
            JSON Schema格式的数据结构定义
        """
        pass
    
    @abstractmethod
    def validate_document_data(
        self,
        document_type: str,
        data: Dict[str, Any]
    ) -> List[str]:
        """
        验证文档数据
        
        Returns:
            错误消息列表,空列表表示验证通过
        """
        pass

4.2 保险行业插件

# src/document/plugins/insurance/plugin.py

from typing import Dict, List, Any
from ..base import IndustryPlugin, IndustryDocumentType

class InsurancePlugin(IndustryPlugin):
    """保险行业插件"""
    
    @property
    def industry_id(self) -> str:
        return "insurance"
    
    @property
    def industry_name(self) -> str:
        return "保险"
    
    @property
    def document_types(self) -> List[IndustryDocumentType]:
        return [
            IndustryDocumentType(
                type_id="insurance_proposal",
                name="保险方案书",
                description="根据用户需求生成的保险配置方案",
                template_id="insurance/proposal",
                required_fields=["user_info", "needs_analysis", "product_recommendations"],
                compliance_rules=["INS_001", "INS_002"]
            ),
            IndustryDocumentType(
                type_id="needs_analysis",
                name="需求分析报告",
                description="用户保险需求详细分析",
                template_id="insurance/needs_analysis",
                required_fields=["user_profile", "risk_assessment", "coverage_gaps"],
                compliance_rules=["INS_002"]
            ),
            IndustryDocumentType(
                type_id="product_comparison",
                name="产品对比表",
                description="多款保险产品对比分析",
                template_id="insurance/product_comparison",
                required_fields=["products", "comparison_dimensions"],
                compliance_rules=["INS_001"]
            ),
            IndustryDocumentType(
                type_id="application_form",
                name="投保单",
                description="保险投保申请表",
                template_id="insurance/application",
                required_fields=["applicant", "insured", "product", "premium"],
                compliance_rules=["INS_001", "INS_002", "COMMON_001"]
            ),
            IndustryDocumentType(
                type_id="health_disclosure",
                name="健康告知书",
                description="投保人健康状况告知",
                template_id="insurance/health_disclosure",
                required_fields=["health_questions", "declarations"],
                compliance_rules=["INS_001"]
            ),
            IndustryDocumentType(
                type_id="risk_confirmation",
                name="风险确认书",
                description="投保风险确认签署文件",
                template_id="insurance/risk_confirmation",
                required_fields=["risk_items", "confirmation"],
                compliance_rules=["INS_001"]
            )
        ]
    
    async def extract_content(
        self,
        conversation: List[Dict]
    ) -> Dict[str, Any]:
        """提取保险相关内容"""
        
        return {
            # 用户信息
            "user_info": await self._extract_user_info(conversation),
            
            # 家庭信息
            "family_info": await self._extract_family_info(conversation),
            
            # 财务状况
            "financial_status": await self._extract_financial_status(conversation),
            
            # 已有保障
            "existing_coverage": await self._extract_existing_coverage(conversation),
            
            # 保障需求
            "insurance_needs": await self._extract_insurance_needs(conversation),
            
            # 产品偏好
            "product_preferences": await self._extract_product_preferences(conversation),
            
            # 顾虑问题
            "concerns": await self._extract_concerns(conversation),
            
            # 推荐产品
            "recommended_products": await self._extract_recommendations(conversation),
        }
    
    async def _extract_user_info(self, conversation: List[Dict]) -> Dict:
        """提取用户基本信息"""
        # 使用LLM从对话中提取
        prompt = """
        从对话中提取用户基本信息,返回JSON格式:
        {
            "age": 年龄(数字),
            "gender": 性别("male"/"female"),
            "occupation": 职业,
            "income_level": 收入水平("low"/"medium"/"high"),
            "location": 所在城市
        }
        """
        # ... LLM调用
        return {}
    
    async def _extract_insurance_needs(self, conversation: List[Dict]) -> List[Dict]:
        """提取保险需求"""
        prompt = """
        从对话中提取用户的保险需求,返回JSON数组:
        [
            {
                "type": 保险类型("life"/"health"/"accident"/"medical"/"education"/"pension"),
                "priority": 优先级(1-5),
                "coverage_amount": 建议保额,
                "reason": 需要原因
            }
        ]
        """
        # ... LLM调用
        return []
    
    def get_document_schema(self, document_type: str) -> Dict[str, Any]:
        """获取文档数据结构"""
        
        schemas = {
            "insurance_proposal": {
                "type": "object",
                "required": ["user_info", "needs_analysis", "product_recommendations"],
                "properties": {
                    "user_info": {
                        "type": "object",
                        "properties": {
                            "name": {"type": "string"},
                            "age": {"type": "integer"},
                            "gender": {"type": "string", "enum": ["male", "female"]},
                            "occupation": {"type": "string"},
                            "annual_income": {"type": "number"}
                        }
                    },
                    "family_info": {
                        "type": "object",
                        "properties": {
                            "marital_status": {"type": "string"},
                            "children_count": {"type": "integer"},
                            "dependents": {"type": "array"}
                        }
                    },
                    "needs_analysis": {
                        "type": "object",
                        "properties": {
                            "risk_tolerance": {"type": "string", "enum": ["low", "medium", "high"]},
                            "coverage_gaps": {"type": "array"},
                            "priority_needs": {"type": "array"}
                        }
                    },
                    "product_recommendations": {
                        "type": "array",
                        "items": {
                            "type": "object",
                            "properties": {
                                "product_name": {"type": "string"},
                                "product_type": {"type": "string"},
                                "coverage_amount": {"type": "number"},
                                "premium": {"type": "number"},
                                "payment_period": {"type": "integer"},
                                "reasons": {"type": "array"}
                            }
                        }
                    },
                    "total_premium": {"type": "number"},
                    "premium_ratio": {"type": "number", "description": "保费占收入比例"}
                }
            },
            # ... 其他文档类型的schema
        }
        
        return schemas.get(document_type, {})
    
    def validate_document_data(
        self,
        document_type: str,
        data: Dict[str, Any]
    ) -> List[str]:
        """验证保险文档数据"""
        errors = []
        
        if document_type == "insurance_proposal":
            # 验证保费占收入比例
            if data.get("premium_ratio", 0) > 0.15:
                errors.append("建议保费不超过年收入的15%")
            
            # 验证必要信息
            if not data.get("user_info", {}).get("age"):
                errors.append("用户年龄信息缺失")
        
        elif document_type == "application_form":
            # 投保单必须有详细个人信息
            required = ["name", "id_number", "phone", "address"]
            applicant = data.get("applicant", {})
            for field in required:
                if not applicant.get(field):
                    errors.append(f"投保人{field}信息缺失")
        
        return errors

4.3 医疗行业插件

# src/document/plugins/medical/plugin.py

from typing import Dict, List, Any
from ..base import IndustryPlugin, IndustryDocumentType

class MedicalPlugin(IndustryPlugin):
    """医疗行业插件"""
    
    @property
    def industry_id(self) -> str:
        return "medical"
    
    @property
    def industry_name(self) -> str:
        return "医疗健康"
    
    @property
    def document_types(self) -> List[IndustryDocumentType]:
        return [
            # ========== 问诊类文档 ==========
            IndustryDocumentType(
                type_id="consultation_summary",
                name="问诊摘要",
                description="在线问诊对话的结构化摘要",
                template_id="medical/consultation_summary",
                required_fields=["chief_complaint", "present_illness", "assessment"],
                compliance_rules=["MED_001", "MED_002"]
            ),
            IndustryDocumentType(
                type_id="symptom_record",
                name="症状记录单",
                description="用户症状详细记录",
                template_id="medical/symptom_record",
                required_fields=["symptoms", "duration", "severity"],
                compliance_rules=["MED_002"]
            ),
            
            # ========== 评估类文档 ==========
            IndustryDocumentType(
                type_id="health_assessment",
                name="健康评估报告",
                description="综合健康状况评估",
                template_id="medical/health_assessment",
                required_fields=["basic_info", "health_indicators", "risk_factors", "recommendations"],
                compliance_rules=["MED_001", "MED_002"]
            ),
            IndustryDocumentType(
                type_id="disease_risk_report",
                name="疾病风险评估",
                description="特定疾病风险预测报告",
                template_id="medical/disease_risk",
                required_fields=["target_disease", "risk_factors", "risk_level", "prevention_advice"],
                compliance_rules=["MED_001"]
            ),
            IndustryDocumentType(
                type_id="nutrition_assessment",
                name="营养评估报告",
                description="个人营养状况评估",
                template_id="medical/nutrition_assessment",
                required_fields=["dietary_habits", "nutrient_analysis", "recommendations"],
                compliance_rules=["MED_001"]
            ),
            
            # ========== 计划类文档 ==========
            IndustryDocumentType(
                type_id="treatment_plan",
                name="治疗/康复计划",
                description="个性化治疗或康复方案",
                template_id="medical/treatment_plan",
                required_fields=["diagnosis", "treatment_goals", "interventions", "timeline"],
                compliance_rules=["MED_001", "MED_002"]
            ),
            IndustryDocumentType(
                type_id="medication_guide",
                name="用药指导",
                description="药物使用说明和注意事项",
                template_id="medical/medication_guide",
                required_fields=["medications", "dosage", "precautions", "interactions"],
                compliance_rules=["MED_001"]
            ),
            IndustryDocumentType(
                type_id="lifestyle_plan",
                name="生活方式管理计划",
                description="饮食、运动、作息等生活管理建议",
                template_id="medical/lifestyle_plan",
                required_fields=["current_habits", "improvement_goals", "action_plan"],
                compliance_rules=["MED_001"]
            ),
            
            # ========== 签署类文档 ==========
            IndustryDocumentType(
                type_id="informed_consent",
                name="知情同意书",
                description="治疗/检查前的知情同意",
                template_id="medical/informed_consent",
                required_fields=["procedure", "risks", "alternatives", "patient_consent"],
                compliance_rules=["MED_001", "MED_002", "COMMON_001"]
            ),
            IndustryDocumentType(
                type_id="privacy_authorization",
                name="隐私授权书",
                description="医疗信息使用授权",
                template_id="medical/privacy_authorization",
                required_fields=["data_scope", "usage_purpose", "authorization_period"],
                compliance_rules=["MED_002", "COMMON_001"]
            ),
            
            # ========== 转诊类文档 ==========
            IndustryDocumentType(
                type_id="referral_letter",
                name="转诊建议书",
                description="建议就医/转诊的说明文档",
                template_id="medical/referral_letter",
                required_fields=["reason", "urgency", "recommended_specialty", "summary"],
                compliance_rules=["MED_001"]
            )
        ]
    
    async def extract_content(
        self,
        conversation: List[Dict]
    ) -> Dict[str, Any]:
        """提取医疗相关内容"""
        
        return {
            # 主诉
            "chief_complaint": await self._extract_chief_complaint(conversation),
            
            # 现病史
            "present_illness": await self._extract_present_illness(conversation),
            
            # 症状详情
            "symptoms": await self._extract_symptoms(conversation),
            
            # 既往史
            "medical_history": await self._extract_medical_history(conversation),
            
            # 家族史
            "family_history": await self._extract_family_history(conversation),
            
            # 过敏史
            "allergies": await self._extract_allergies(conversation),
            
            # 用药史
            "medications": await self._extract_medications(conversation),
            
            # 生活习惯
            "lifestyle": await self._extract_lifestyle(conversation),
            
            # AI评估
            "ai_assessment": await self._extract_ai_assessment(conversation),
            
            # AI建议
            "ai_recommendations": await self._extract_ai_recommendations(conversation),
        }
    
    async def _extract_chief_complaint(self, conversation: List[Dict]) -> str:
        """提取主诉"""
        prompt = """
        从对话中提取用户的主诉(主要症状/问题),用一句话概括:
        格式:"[部位/症状][持续时间][主要特点]"
        例如:"头痛3天,伴有恶心"
        """
        # ... LLM调用
        return ""
    
    async def _extract_symptoms(self, conversation: List[Dict]) -> List[Dict]:
        """提取症状详情"""
        prompt = """
        从对话中提取所有提及的症状,返回JSON数组:
        [
            {
                "symptom": 症状名称,
                "location": 部位,
                "duration": 持续时间,
                "severity": 严重程度(1-10),
                "frequency": 发作频率,
                "triggers": 诱发因素,
                "relieving_factors": 缓解因素,
                "associated_symptoms": 伴随症状
            }
        ]
        """
        # ... LLM调用
        return []
    
    async def _extract_ai_assessment(self, conversation: List[Dict]) -> Dict:
        """提取AI评估结果"""
        prompt = """
        从对话中提取AI/专家的评估,返回JSON:
        {
            "possible_conditions": [可能的情况列表],
            "severity_assessment": 严重程度评估,
            "urgency_level": 紧急程度("routine"/"soon"/"urgent"/"emergency"),
            "confidence_level": 置信度
        }
        """
        # ... LLM调用
        return {}
    
    def get_document_schema(self, document_type: str) -> Dict[str, Any]:
        """获取医疗文档数据结构"""
        
        schemas = {
            "consultation_summary": {
                "type": "object",
                "required": ["chief_complaint", "present_illness", "assessment"],
                "properties": {
                    "patient_info": {
                        "type": "object",
                        "properties": {
                            "age": {"type": "integer"},
                            "gender": {"type": "string"},
                            "height": {"type": "number"},
                            "weight": {"type": "number"}
                        }
                    },
                    "chief_complaint": {"type": "string", "description": "主诉"},
                    "present_illness": {
                        "type": "object",
                        "properties": {
                            "onset": {"type": "string", "description": "起病情况"},
                            "progression": {"type": "string", "description": "病情演变"},
                            "symptoms": {"type": "array", "description": "症状列表"}
                        }
                    },
                    "medical_history": {
                        "type": "object",
                        "properties": {
                            "past_illnesses": {"type": "array"},
                            "surgeries": {"type": "array"},
                            "allergies": {"type": "array"},
                            "current_medications": {"type": "array"}
                        }
                    },
                    "assessment": {
                        "type": "object",
                        "properties": {
                            "possible_conditions": {"type": "array"},
                            "differential_diagnosis": {"type": "array"},
                            "severity": {"type": "string"},
                            "urgency": {"type": "string"}
                        }
                    },
                    "recommendations": {
                        "type": "object",
                        "properties": {
                            "further_tests": {"type": "array"},
                            "lifestyle_changes": {"type": "array"},
                            "follow_up": {"type": "string"},
                            "red_flags": {"type": "array", "description": "警示症状"}
                        }
                    },
                    "disclaimer": {"type": "string", "description": "免责声明"}
                }
            },
            
            "health_assessment": {
                "type": "object",
                "required": ["basic_info", "health_indicators", "recommendations"],
                "properties": {
                    "basic_info": {
                        "type": "object",
                        "properties": {
                            "age": {"type": "integer"},
                            "gender": {"type": "string"},
                            "bmi": {"type": "number"}
                        }
                    },
                    "health_indicators": {
                        "type": "array",
                        "items": {
                            "type": "object",
                            "properties": {
                                "indicator": {"type": "string"},
                                "value": {"type": "string"},
                                "status": {"type": "string", "enum": ["normal", "borderline", "abnormal"]},
                                "reference_range": {"type": "string"}
                            }
                        }
                    },
                    "risk_factors": {
                        "type": "array",
                        "items": {
                            "type": "object",
                            "properties": {
                                "factor": {"type": "string"},
                                "level": {"type": "string"},
                                "impact": {"type": "string"}
                            }
                        }
                    },
                    "overall_score": {"type": "integer", "minimum": 0, "maximum": 100},
                    "recommendations": {"type": "array"}
                }
            },
            
            "informed_consent": {
                "type": "object",
                "required": ["procedure", "risks", "patient_consent"],
                "properties": {
                    "procedure": {
                        "type": "object",
                        "properties": {
                            "name": {"type": "string"},
                            "description": {"type": "string"},
                            "purpose": {"type": "string"}
                        }
                    },
                    "risks": {
                        "type": "array",
                        "items": {
                            "type": "object",
                            "properties": {
                                "risk": {"type": "string"},
                                "probability": {"type": "string"},
                                "severity": {"type": "string"}
                            }
                        }
                    },
                    "benefits": {"type": "array"},
                    "alternatives": {"type": "array"},
                    "patient_questions": {"type": "array"},
                    "patient_consent": {
                        "type": "object",
                        "properties": {
                            "understood": {"type": "boolean"},
                            "consent_given": {"type": "boolean"},
                            "signature_placeholder": {"type": "boolean"}
                        }
                    }
                }
            }
        }
        
        return schemas.get(document_type, {})
    
    def validate_document_data(
        self,
        document_type: str,
        data: Dict[str, Any]
    ) -> List[str]:
        """验证医疗文档数据"""
        errors = []
        
        # 所有医疗文档必须有免责声明
        if not data.get("disclaimer"):
            errors.append("医疗文档必须包含免责声明")
        
        if document_type == "consultation_summary":
            # 验证主诉
            if not data.get("chief_complaint"):
                errors.append("问诊摘要必须包含主诉")
            
            # 验证评估
            assessment = data.get("assessment", {})
            if not assessment.get("possible_conditions"):
                errors.append("问诊摘要应包含可能的情况评估")
        
        elif document_type == "informed_consent":
            # 知情同意必须有签名位置
            consent = data.get("patient_consent", {})
            if not consent.get("signature_placeholder"):
                errors.append("知情同意书必须预留签名位置")
        
        elif document_type == "medication_guide":
            # 用药指导必须有用药禁忌
            if not data.get("contraindications"):
                errors.append("用药指导应包含禁忌症说明")
        
        return errors

4.4 理财行业插件

# src/document/plugins/finance/plugin.py

from typing import Dict, List, Any
from ..base import IndustryPlugin, IndustryDocumentType

class FinancePlugin(IndustryPlugin):
    """理财行业插件"""
    
    @property
    def industry_id(self) -> str:
        return "finance"
    
    @property
    def industry_name(self) -> str:
        return "理财投资"
    
    @property
    def document_types(self) -> List[IndustryDocumentType]:
        return [
            IndustryDocumentType(
                type_id="advisory_report",
                name="理财建议书",
                description="综合理财规划建议",
                template_id="finance/advisory_report",
                required_fields=["client_profile", "financial_goals", "recommendations"],
                compliance_rules=["FIN_001", "FIN_002"]
            ),
            IndustryDocumentType(
                type_id="portfolio_plan",
                name="资产配置方案",
                description="投资组合配置建议",
                template_id="finance/portfolio_plan",
                required_fields=["risk_profile", "asset_allocation", "expected_return"],
                compliance_rules=["FIN_001"]
            ),
            IndustryDocumentType(
                type_id="risk_assessment",
                name="风险测评报告",
                description="投资者风险承受能力评估",
                template_id="finance/risk_assessment",
                required_fields=["questionnaire_results", "risk_level", "suitable_products"],
                compliance_rules=["FIN_002"]
            ),
            IndustryDocumentType(
                type_id="product_analysis",
                name="产品分析报告",
                description="理财产品详细分析",
                template_id="finance/product_analysis",
                required_fields=["product_info", "risk_analysis", "suitability"],
                compliance_rules=["FIN_001"]
            ),
            IndustryDocumentType(
                type_id="risk_disclosure",
                name="风险揭示书",
                description="投资风险确认文件",
                template_id="finance/risk_disclosure",
                required_fields=["risk_items", "investor_confirmation"],
                compliance_rules=["FIN_001", "COMMON_001"]
            ),
            IndustryDocumentType(
                type_id="investment_agreement",
                name="投资协议",
                description="投资服务协议",
                template_id="finance/investment_agreement",
                required_fields=["parties", "service_scope", "fee_structure"],
                compliance_rules=["FIN_001", "FIN_002", "COMMON_001"]
            )
        ]
    
    async def extract_content(
        self,
        conversation: List[Dict]
    ) -> Dict[str, Any]:
        """提取理财相关内容"""
        
        return {
            # 客户基本信息
            "client_profile": await self._extract_client_profile(conversation),
            
            # 财务状况
            "financial_status": await self._extract_financial_status(conversation),
            
            # 投资目标
            "investment_goals": await self._extract_investment_goals(conversation),
            
            # 风险偏好
            "risk_preference": await self._extract_risk_preference(conversation),
            
            # 投资经验
            "investment_experience": await self._extract_investment_experience(conversation),
            
            # 现有投资
            "current_investments": await self._extract_current_investments(conversation),
            
            # 顾虑和问题
            "concerns": await self._extract_concerns(conversation),
            
            # 推荐方案
            "recommendations": await self._extract_recommendations(conversation),
        }
    
    async def _extract_financial_status(self, conversation: List[Dict]) -> Dict:
        """提取财务状况"""
        prompt = """
        从对话中提取用户的财务状况,返回JSON:
        {
            "annual_income": 年收入,
            "monthly_expense": 月支出,
            "total_assets": 总资产,
            "total_liabilities": 总负债,
            "investable_assets": 可投资资产,
            "emergency_fund": 应急资金
        }
        """
        return {}
    
    async def _extract_investment_goals(self, conversation: List[Dict]) -> List[Dict]:
        """提取投资目标"""
        prompt = """
        从对话中提取用户的投资目标,返回JSON数组:
        [
            {
                "goal": 目标描述,
                "target_amount": 目标金额,
                "time_horizon": 时间期限(年),
                "priority": 优先级(1-5)
            }
        ]
        """
        return []
    
    def get_document_schema(self, document_type: str) -> Dict[str, Any]:
        """获取理财文档数据结构"""
        
        schemas = {
            "advisory_report": {
                "type": "object",
                "required": ["client_profile", "financial_goals", "recommendations"],
                "properties": {
                    "client_profile": {
                        "type": "object",
                        "properties": {
                            "age": {"type": "integer"},
                            "occupation": {"type": "string"},
                            "risk_tolerance": {"type": "string", "enum": ["conservative", "moderate", "aggressive"]}
                        }
                    },
                    "financial_status": {
                        "type": "object",
                        "properties": {
                            "income": {"type": "number"},
                            "expenses": {"type": "number"},
                            "assets": {"type": "number"},
                            "liabilities": {"type": "number"}
                        }
                    },
                    "financial_goals": {
                        "type": "array",
                        "items": {
                            "type": "object",
                            "properties": {
                                "goal": {"type": "string"},
                                "amount": {"type": "number"},
                                "timeline": {"type": "string"}
                            }
                        }
                    },
                    "recommendations": {
                        "type": "object",
                        "properties": {
                            "asset_allocation": {"type": "object"},
                            "product_suggestions": {"type": "array"},
                            "action_steps": {"type": "array"}
                        }
                    },
                    "risk_warnings": {"type": "array"},
                    "disclaimer": {"type": "string"}
                }
            },
            
            "portfolio_plan": {
                "type": "object",
                "required": ["risk_profile", "asset_allocation"],
                "properties": {
                    "risk_profile": {
                        "type": "object",
                        "properties": {
                            "risk_level": {"type": "string"},
                            "risk_score": {"type": "integer"},
                            "investment_horizon": {"type": "string"}
                        }
                    },
                    "asset_allocation": {
                        "type": "object",
                        "properties": {
                            "equity": {"type": "number", "description": "股票占比"},
                            "fixed_income": {"type": "number", "description": "固收占比"},
                            "alternatives": {"type": "number", "description": "另类投资占比"},
                            "cash": {"type": "number", "description": "现金占比"}
                        }
                    },
                    "expected_return": {
                        "type": "object",
                        "properties": {
                            "conservative": {"type": "number"},
                            "expected": {"type": "number"},
                            "optimistic": {"type": "number"}
                        }
                    },
                    "rebalancing_strategy": {"type": "string"},
                    "risk_warnings": {"type": "array"}
                }
            }
        }
        
        return schemas.get(document_type, {})
    
    def validate_document_data(
        self,
        document_type: str,
        data: Dict[str, Any]
    ) -> List[str]:
        """验证理财文档数据"""
        errors = []
        
        # 所有理财文档必须有风险警示
        if not data.get("risk_warnings") and not data.get("disclaimer"):
            errors.append("理财文档必须包含风险提示")
        
        if document_type == "advisory_report":
            # 验证风险匹配
            risk_level = data.get("client_profile", {}).get("risk_tolerance")
            recommendations = data.get("recommendations", {})
            
            # 检查推荐产品是否与风险等级匹配
            # ... 具体验证逻辑
        
        elif document_type == "portfolio_plan":
            # 验证资产配置总和为100%
            allocation = data.get("asset_allocation", {})
            total = sum(allocation.values())
            if abs(total - 100) > 0.1:
                errors.append(f"资产配置比例总和应为100%,当前为{total}%")
        
        return errors

五、API设计

5.1 完整API定义

# src/document/api/document.py

from fastapi import APIRouter, HTTPException, BackgroundTasks
from pydantic import BaseModel, Field
from typing import List, Optional, Dict, Any
from enum import Enum
from datetime import datetime

router = APIRouter(prefix="/api/v1/document", tags=["文档生成"])


# ==================== 数据模型 ====================

class OutputFormat(str, Enum):
    PDF = "pdf"
    WORD = "word"
    HTML = "html"
    MARKDOWN = "markdown"
    JSON = "json"


class DocumentStatus(str, Enum):
    PENDING = "pending"
    PROCESSING = "processing"
    COMPLETED = "completed"
    FAILED = "failed"


class GenerateDocumentRequest(BaseModel):
    """生成文档请求"""
    session_id: str = Field(..., description="会话ID")
    device_id: str = Field(..., description="设备ID")
    industry: str = Field(..., description="行业标识")
    document_type: str = Field(..., description="文档类型")
    output_format: OutputFormat = Field(default=OutputFormat.PDF, description="输出格式")
    template_id: Optional[str] = Field(None, description="指定模板ID")
    custom_data: Optional[Dict[str, Any]] = Field(None, description="自定义数据(覆盖提取数据)")
    options: Optional[Dict[str, Any]] = Field(None, description="生成选项")
    
    class Config:
        schema_extra = {
            "example": {
                "session_id": "sess_123456",
                "device_id": "dev_abcdef",
                "industry": "insurance",
                "document_type": "insurance_proposal",
                "output_format": "pdf",
                "custom_data": {
                    "user_info": {"name": "张三"}
                }
            }
        }


class GenerateDocumentResponse(BaseModel):
    """生成文档响应"""
    document_id: str = Field(..., description="文档ID")
    status: DocumentStatus = Field(..., description="生成状态")
    message: str = Field(..., description="状态消息")
    download_url: Optional[str] = Field(None, description="下载链接(完成时)")
    preview_url: Optional[str] = Field(None, description="预览链接")
    expires_at: Optional[datetime] = Field(None, description="链接过期时间")


class DocumentPreviewRequest(BaseModel):
    """文档预览请求"""
    session_id: str
    device_id: str
    industry: str
    document_type: str
    custom_data: Optional[Dict[str, Any]] = None


class DocumentPreviewResponse(BaseModel):
    """文档预览响应"""
    html_content: str = Field(..., description="HTML预览内容")
    extracted_data: Dict[str, Any] = Field(..., description="提取的数据")
    compliance_result: Dict[str, Any] = Field(..., description="合规检查结果")


class BatchExportRequest(BaseModel):
    """批量导出请求"""
    device_id: str
    session_ids: Optional[List[str]] = Field(None, description="指定会话ID列表")
    date_range: Optional[List[str]] = Field(None, description="日期范围 [start, end]")
    industry: str
    document_type: str
    output_format: OutputFormat = OutputFormat.PDF
    include_summary: bool = Field(default=True, description="是否包含摘要")
    merge_to_single: bool = Field(default=False, description="是否合并为单个文件")


class TemplateListRequest(BaseModel):
    """模板列表请求"""
    industry: Optional[str] = None
    document_type: Optional[str] = None


class TemplateInfo(BaseModel):
    """模板信息"""
    template_id: str
    name: str
    description: str
    industry: str
    document_type: str
    supported_formats: List[str]
    required_fields: List[str]
    preview_image: Optional[str] = None


# ==================== API端点 ====================

@router.post("/generate", response_model=GenerateDocumentResponse)
async def generate_document(
    request: GenerateDocumentRequest,
    background_tasks: BackgroundTasks
):
    """
    生成文档
    
    流程:
    1. 从对话中提取内容
    2. 合并自定义数据
    3. 合规检查
    4. 渲染模板
    5. 转换格式
    6. 存储并返回下载链接
    """
    from ..orchestrator import DocumentOrchestrator
    
    orchestrator = DocumentOrchestrator()
    
    # 异步生成
    document_id = await orchestrator.start_generation(
        session_id=request.session_id,
        device_id=request.device_id,
        industry=request.industry,
        document_type=request.document_type,
        output_format=request.output_format,
        template_id=request.template_id,
        custom_data=request.custom_data,
        options=request.options
    )
    
    return GenerateDocumentResponse(
        document_id=document_id,
        status=DocumentStatus.PROCESSING,
        message="文档生成中,请稍候..."
    )


@router.get("/status/{document_id}", response_model=GenerateDocumentResponse)
async def get_document_status(document_id: str):
    """
    查询文档生成状态
    """
    from ..orchestrator import DocumentOrchestrator
    
    orchestrator = DocumentOrchestrator()
    result = await orchestrator.get_status(document_id)
    
    if not result:
        raise HTTPException(status_code=404, detail="文档不存在")
    
    return result


@router.post("/preview", response_model=DocumentPreviewResponse)
async def preview_document(request: DocumentPreviewRequest):
    """
    预览文档
    
    返回HTML预览和提取的数据,不生成最终文件
    """
    from ..orchestrator import DocumentOrchestrator
    
    orchestrator = DocumentOrchestrator()
    
    result = await orchestrator.preview(
        session_id=request.session_id,
        device_id=request.device_id,
        industry=request.industry,
        document_type=request.document_type,
        custom_data=request.custom_data
    )
    
    return result


@router.get("/download/{document_id}")
async def download_document(document_id: str):
    """
    下载文档
    """
    from fastapi.responses import FileResponse
    from ..storage.document_store import DocumentStore
    
    store = DocumentStore()
    doc_info = await store.get(document_id)
    
    if not doc_info:
        raise HTTPException(status_code=404, detail="文档不存在或已过期")
    
    return FileResponse(
        path=doc_info["file_path"],
        filename=doc_info["filename"],
        media_type=doc_info["content_type"]
    )


@router.post("/batch-export")
async def batch_export(
    request: BatchExportRequest,
    background_tasks: BackgroundTasks
):
    """
    批量导出文档
    """
    from ..orchestrator import DocumentOrchestrator
    
    orchestrator = DocumentOrchestrator()
    
    export_id = await orchestrator.start_batch_export(
        device_id=request.device_id,
        session_ids=request.session_ids,
        date_range=request.date_range,
        industry=request.industry,
        document_type=request.document_type,
        output_format=request.output_format,
        include_summary=request.include_summary,
        merge_to_single=request.merge_to_single
    )
    
    return {
        "export_id": export_id,
        "status": "processing",
        "message": "批量导出任务已创建"
    }


@router.get("/templates", response_model=List[TemplateInfo])
async def list_templates(
    industry: Optional[str] = None,
    document_type: Optional[str] = None
):
    """
    列出可用模板
    """
    from ..core.template_engine import TemplateEngine
    
    engine = TemplateEngine("src/document/templates")
    templates = engine.list_templates(industry=industry, template_type=document_type)
    
    return [
        TemplateInfo(
            template_id=t.template_id,
            name=t.name,
            description=t.description,
            industry=t.industry,
            document_type=t.template_type.value,
            supported_formats=t.supported_formats or ["pdf", "html"],
            required_fields=t.required_fields or []
        )
        for t in templates
    ]


@router.get("/templates/{template_id}/schema")
async def get_template_schema(template_id: str):
    """
    获取模板数据结构
    """
    from ..core.template_engine import TemplateEngine
    
    engine = TemplateEngine("src/document/templates")
    return engine.get_template_schema(template_id)


@router.get("/industries")
async def list_industries():
    """
    列出支持的行业
    """
    from ..plugins import get_all_plugins
    
    plugins = get_all_plugins()
    
    return [
        {
            "industry_id": p.industry_id,
            "industry_name": p.industry_name,
            "document_types": [
                {
                    "type_id": dt.type_id,
                    "name": dt.name,
                    "description": dt.description
                }
                for dt in p.document_types
            ]
        }
        for p in plugins
    ]


@router.get("/industries/{industry}/document-types")
async def list_document_types(industry: str):
    """
    列出行业支持的文档类型
    """
    from ..plugins import get_plugin
    
    plugin = get_plugin(industry)
    if not plugin:
        raise HTTPException(status_code=404, detail=f"行业不存在: {industry}")
    
    return [
        {
            "type_id": dt.type_id,
            "name": dt.name,
            "description": dt.description,
            "template_id": dt.template_id,
            "required_fields": dt.required_fields,
            "compliance_rules": dt.compliance_rules
        }
        for dt in plugin.document_types
    ]

5.2 对话摘要API

# src/document/api/summary.py

from fastapi import APIRouter, HTTPException
from pydantic import BaseModel, Field
from typing import List, Optional, Dict, Any

router = APIRouter(prefix="/api/v1/summary", tags=["对话摘要"])


class SummaryRequest(BaseModel):
    """摘要请求"""
    session_id: str
    device_id: str
    summary_type: str = Field(default="comprehensive", description="摘要类型: brief/comprehensive/structured")
    include_recommendations: bool = Field(default=True, description="是否包含建议")
    include_decisions: bool = Field(default=True, description="是否包含决策")
    language: str = Field(default="zh", description="输出语言")


class ConversationSummary(BaseModel):
    """对话摘要"""
    session_id: str
    summary_time: str
    
    # 基础摘要
    brief_summary: str = Field(..., description="简短摘要(50字内)")
    comprehensive_summary: str = Field(..., description="完整摘要(200字内)")
    
    # 结构化信息
    key_points: List[str] = Field(default=[], description="关键要点")
    user_needs: List[Dict] = Field(default=[], description="用户需求")
    expert_recommendations: List[Dict] = Field(default=[], description="专家建议")
    decisions_made: List[str] = Field(default=[], description="已做决策")
    pending_items: List[str] = Field(default=[], description="待处理事项")
    
    # 元数据
    conversation_turns: int = Field(..., description="对话轮数")
    duration_minutes: int = Field(..., description="对话时长")
    satisfaction_score: Optional[float] = Field(None, description="满意度评分")


class MultiSessionSummaryRequest(BaseModel):
    """多会话摘要请求"""
    device_id: str
    session_ids: Optional[List[str]] = None
    date_range: Optional[List[str]] = None
    industry: Optional[str] = None


class MultiSessionSummary(BaseModel):
    """多会话汇总"""
    device_id: str
    period: str
    
    # 汇总统计
    total_sessions: int
    total_turns: int
    total_duration_minutes: int
    
    # 主题分布
    topic_distribution: Dict[str, int]
    
    # 需求汇总
    aggregated_needs: List[Dict]
    
    # 建议汇总
    aggregated_recommendations: List[Dict]
    
    # 决策追踪
    decisions_timeline: List[Dict]
    
    # 会话摘要列表
    session_summaries: List[ConversationSummary]


@router.post("/conversation", response_model=ConversationSummary)
async def get_conversation_summary(request: SummaryRequest):
    """
    获取单次对话摘要
    """
    from ..core.content_extractor import ContentExtractor
    
    extractor = ContentExtractor()
    
    extracted = await extractor.extract(
        session_id=request.session_id,
        device_id=request.device_id
    )
    
    # 生成摘要
    summary = await _generate_summary(extracted, request.summary_type)
    
    return summary


@router.post("/multi-session", response_model=MultiSessionSummary)
async def get_multi_session_summary(request: MultiSessionSummaryRequest):
    """
    获取多会话汇总
    """
    # ... 实现
    pass


@router.get("/export/{session_id}")
async def export_summary(
    session_id: str,
    device_id: str,
    format: str = "markdown"
):
    """
    导出对话摘要
    
    支持格式: markdown, html, json, txt
    """
    # ... 实现
    pass

六、模板示例

6.1 保险方案书模板

<!-- templates/insurance/proposal.html.j2 -->
<!DOCTYPE html>
<html lang="zh-CN">
<head>
    <meta charset="UTF-8">
    <title>保险方案建议书</title>
    <style>
        /* 基础样式 */
        body {
            font-family: "SimSun", "Microsoft YaHei", sans-serif;
            font-size: 12pt;
            line-height: 1.8;
            color: #333;
            max-width: 800px;
            margin: 0 auto;
            padding: 40px;
        }
        
        /* 标题样式 */
        h1 {
            text-align: center;
            font-size: 20pt;
            color: #1a5490;
            border-bottom: 2px solid #1a5490;
            padding-bottom: 10px;
        }
        
        h2 {
            font-size: 14pt;
            color: #1a5490;
            margin-top: 30px;
            border-left: 4px solid #1a5490;
            padding-left: 10px;
        }
        
        /* 信息卡片 */
        .info-card {
            background: #f8f9fa;
            border-radius: 8px;
            padding: 20px;
            margin: 20px 0;
        }
        
        .info-row {
            display: flex;
            margin: 10px 0;
        }
        
        .info-label {
            width: 120px;
            color: #666;
        }
        
        .info-value {
            flex: 1;
            font-weight: bold;
        }
        
        /* 表格样式 */
        table {
            width: 100%;
            border-collapse: collapse;
            margin: 20px 0;
        }
        
        th, td {
            border: 1px solid #ddd;
            padding: 12px;
            text-align: left;
        }
        
        th {
            background: #1a5490;
            color: white;
        }
        
        tr:nth-child(even) {
            background: #f8f9fa;
        }
        
        /* 推荐产品卡片 */
        .product-card {
            border: 1px solid #ddd;
            border-radius: 8px;
            padding: 20px;
            margin: 15px 0;
        }
        
        .product-name {
            font-size: 14pt;
            font-weight: bold;
            color: #1a5490;
        }
        
        .product-type {
            display: inline-block;
            background: #e8f0f8;
            color: #1a5490;
            padding: 2px 8px;
            border-radius: 4px;
            font-size: 10pt;
            margin-left: 10px;
        }
        
        .product-detail {
            margin-top: 15px;
        }
        
        /* 风险提示 */
        .risk-warning {
            background: #fff3cd;
            border: 1px solid #ffc107;
            border-radius: 4px;
            padding: 15px;
            margin: 20px 0;
        }
        
        .risk-warning-title {
            color: #856404;
            font-weight: bold;
        }
        
        /* 签名区 */
        .signature-area {
            margin-top: 50px;
            padding-top: 30px;
            border-top: 1px dashed #ccc;
        }
        
        .signature-row {
            display: flex;
            justify-content: space-between;
            margin-top: 30px;
        }
        
        .signature-box {
            width: 45%;
        }
        
        .signature-line {
            border-bottom: 1px solid #333;
            height: 40px;
            margin-top: 10px;
        }
        
        /* 页脚 */
        .footer {
            margin-top: 50px;
            padding-top: 20px;
            border-top: 1px solid #ddd;
            font-size: 10pt;
            color: #666;
            text-align: center;
        }
        
        /* 打印样式 */
        @media print {
            body {
                padding: 20px;
            }
            .page-break {
                page-break-before: always;
            }
        }
    </style>
</head>
<body>
    <!-- 封面 -->
    <h1>保险方案建议书</h1>
    
    <div style="text-align: center; margin: 30px 0;">
        <p style="font-size: 14pt;">尊敬的 <strong>{{ user_info.name | default('客户') }}</strong></p>
        <p>感谢您对我们的信任,以下是根据您的需求定制的保险方案</p>
        <p style="color: #666;">方案编号: {{ document_id }} | 生成日期: {{ generation_time | date_format }}</p>
    </div>
    
    <!-- 客户信息 -->
    <h2>一、客户基本信息</h2>
    <div class="info-card">
        <div class="info-row">
            <span class="info-label">姓名:</span>
            <span class="info-value">{{ user_info.name | default('-') }}</span>
            <span class="info-label">年龄:</span>
            <span class="info-value">{{ user_info.age | default('-') }} 岁</span>
        </div>
        <div class="info-row">
            <span class="info-label">性别:</span>
            <span class="info-value">{% if user_info.gender == 'male' %}男{% elif user_info.gender == 'female' %}女{% else %}-{% endif %}</span>
            <span class="info-label">职业:</span>
            <span class="info-value">{{ user_info.occupation | default('-') }}</span>
        </div>
        <div class="info-row">
            <span class="info-label">年收入:</span>
            <span class="info-value">{{ user_info.annual_income | currency if user_info.annual_income else '-' }}</span>
            <span class="info-label">家庭状况:</span>
            <span class="info-value">{{ family_info.marital_status | default('-') }}{% if family_info.children_count %},{{ family_info.children_count }}个子女{% endif %}</span>
        </div>
    </div>
    
    <!-- 需求分析 -->
    <h2>二、保障需求分析</h2>
    
    <h3>2.1 风险评估</h3>
    <div class="info-card">
        <div class="info-row">
            <span class="info-label">风险承受能力:</span>
            <span class="info-value">
                {% if needs_analysis.risk_tolerance == 'low' %}保守型
                {% elif needs_analysis.risk_tolerance == 'medium' %}稳健型
                {% elif needs_analysis.risk_tolerance == 'high' %}进取型
                {% else %}-{% endif %}
            </span>
        </div>
        <div class="info-row">
            <span class="info-label">建议保费预算:</span>
            <span class="info-value">年收入的 5%-10%,约 {{ (user_info.annual_income * 0.05) | currency }} - {{ (user_info.annual_income * 0.1) | currency }}</span>
        </div>
    </div>
    
    <h3>2.2 保障缺口分析</h3>
    <table>
        <thead>
            <tr>
                <th>保障类型</th>
                <th>建议保额</th>
                <th>现有保额</th>
                <th>缺口</th>
                <th>优先级</th>
            </tr>
        </thead>
        <tbody>
            {% for gap in needs_analysis.coverage_gaps | default([]) %}
            <tr>
                <td>{{ gap.type }}</td>
                <td>{{ gap.recommended | currency }}</td>
                <td>{{ gap.current | currency }}</td>
                <td style="color: {% if gap.gap > 0 %}#dc3545{% else %}#28a745{% endif %};">
                    {{ gap.gap | currency }}
                </td>
                <td>{{ '★' * gap.priority }}{{ '☆' * (5 - gap.priority) }}</td>
            </tr>
            {% else %}
            <tr>
                <td colspan="5" style="text-align: center;">暂无保障缺口数据</td>
            </tr>
            {% endfor %}
        </tbody>
    </table>
    
    <!-- 产品推荐 -->
    <h2>三、推荐保险方案</h2>
    
    {% for product in product_recommendations | default([]) %}
    <div class="product-card">
        <div>
            <span class="product-name">{{ product.product_name }}</span>
            <span class="product-type">{{ product.product_type }}</span>
        </div>
        
        <div class="product-detail">
            <table>
                <tr>
                    <td width="25%"><strong>保障额度</strong></td>
                    <td>{{ product.coverage_amount | currency }}</td>
                    <td width="25%"><strong>年缴保费</strong></td>
                    <td>{{ product.premium | currency }}</td>
                </tr>
                <tr>
                    <td><strong>缴费期限</strong></td>
                    <td>{{ product.payment_period }} 年</td>
                    <td><strong>保障期限</strong></td>
                    <td>{{ product.coverage_period | default('终身') }}</td>
                </tr>
            </table>
            
            <p><strong>推荐理由:</strong></p>
            <ul>
                {% for reason in product.reasons | default([]) %}
                <li>{{ reason }}</li>
                {% endfor %}
            </ul>
        </div>
    </div>
    {% else %}
    <p>暂无推荐产品</p>
    {% endfor %}
    
    <!-- 保费汇总 -->
    <h2>四、保费汇总</h2>
    <div class="info-card">
        <table>
            <thead>
                <tr>
                    <th>产品名称</th>
                    <th>保障类型</th>
                    <th>年缴保费</th>
                </tr>
            </thead>
            <tbody>
                {% for product in product_recommendations | default([]) %}
                <tr>
                    <td>{{ product.product_name }}</td>
                    <td>{{ product.product_type }}</td>
                    <td>{{ product.premium | currency }}</td>
                </tr>
                {% endfor %}
                <tr style="background: #e8f0f8; font-weight: bold;">
                    <td colspan="2">年缴保费合计</td>
                    <td>{{ total_premium | currency }}</td>
                </tr>
            </tbody>
        </table>
        
        <p>
            <strong>保费占年收入比例:</strong> 
            <span style="color: {% if premium_ratio <= 0.1 %}#28a745{% elif premium_ratio <= 0.15 %}#ffc107{% else %}#dc3545{% endif %};">
                {{ premium_ratio | percent }}
            </span>
            {% if premium_ratio <= 0.1 %}
            (合理范围内)
            {% elif premium_ratio <= 0.15 %}
            (略高,请评估是否可承受)
            {% else %}
            (超出建议范围,建议调整方案)
            {% endif %}
        </p>
    </div>
    
    <!-- 风险提示 -->
    <div class="risk-warning">
        <p class="risk-warning-title">⚠️ 重要提示</p>
        <ol>
            <li><strong>投保前请仔细阅读保险条款</strong>,了解保险责任、责任免除、犹豫期等重要内容。</li>
            <li><strong>请如实告知健康状况</strong>,未如实告知可能影响理赔。</li>
            <li><strong>保险产品存在退保损失</strong>,犹豫期后退保仅退还现金价值。</li>
            <li>本方案仅供参考,具体以保险合同条款为准。</li>
            <li>如有疑问,请咨询专业保险顾问或拨打保险公司客服热线。</li>
        </ol>
    </div>
    
    <!-- 免责条款 -->
    <h2>五、免责声明</h2>
    <div style="font-size: 10pt; color: #666;">
        <p>1. 本保险方案建议书基于您提供的信息生成,如信息有误可能影响方案的适用性。</p>
        <p>2. 本建议书中的产品信息仅供参考,具体保障内容、保费费率以保险公司官方条款为准。</p>
        <p>3. 投保前请充分了解产品特点,根据自身实际需求和经济能力审慎决策。</p>
        <p>4. 本建议书不构成保险合同的组成部分。</p>
    </div>
    
    <!-- 签名区 -->
    <div class="signature-area">
        <h2>六、确认签署</h2>
        <p>本人已阅读并理解上述保险方案建议书的全部内容,了解推荐产品的保障责任、费用及风险提示。</p>
        
        <div class="signature-row">
            <div class="signature-box">
                <p>客户签名:</p>
                <div class="signature-line"></div>
                <p>日期:_____年_____月_____日</p>
            </div>
            <div class="signature-box">
                <p>顾问签名:</p>
                <div class="signature-line"></div>
                <p>日期:_____年_____月_____日</p>
            </div>
        </div>
    </div>
    
    <!-- 页脚 -->
    <div class="footer">
        <p>本文档由 MBE 智能保顾系统生成</p>
        <p>文档编号:{{ document_id }} | 生成时间:{{ generation_time | date_format('%Y-%m-%d %H:%M:%S') }}</p>
        <p>如有疑问,请联系客服</p>
    </div>
</body>
</html>

6.2 医疗问诊摘要模板

<!-- templates/medical/consultation_summary.html.j2 -->
<!DOCTYPE html>
<html lang="zh-CN">
<head>
    <meta charset="UTF-8">
    <title>问诊摘要</title>
    <style>
        body {
            font-family: "SimSun", "Microsoft YaHei", sans-serif;
            font-size: 12pt;
            line-height: 1.8;
            color: #333;
            max-width: 800px;
            margin: 0 auto;
            padding: 40px;
        }
        
        .header {
            border-bottom: 2px solid #2e7d32;
            padding-bottom: 20px;
            margin-bottom: 30px;
        }
        
        .header h1 {
            color: #2e7d32;
            margin: 0;
        }
        
        .header-meta {
            color: #666;
            font-size: 10pt;
            margin-top: 10px;
        }
        
        .section {
            margin: 25px 0;
        }
        
        .section-title {
            font-size: 14pt;
            color: #2e7d32;
            border-left: 4px solid #2e7d32;
            padding-left: 10px;
            margin-bottom: 15px;
        }
        
        .patient-info {
            display: grid;
            grid-template-columns: repeat(4, 1fr);
            gap: 15px;
            background: #f5f5f5;
            padding: 20px;
            border-radius: 8px;
        }
        
        .patient-info-item {
            display: flex;
            flex-direction: column;
        }
        
        .patient-info-label {
            font-size: 10pt;
            color: #666;
        }
        
        .patient-info-value {
            font-weight: bold;
        }
        
        .chief-complaint {
            background: #e8f5e9;
            padding: 15px 20px;
            border-radius: 8px;
            font-size: 14pt;
        }
        
        .symptom-list {
            display: grid;
            grid-template-columns: repeat(2, 1fr);
            gap: 15px;
        }
        
        .symptom-card {
            background: #fff;
            border: 1px solid #ddd;
            border-radius: 8px;
            padding: 15px;
        }
        
        .symptom-name {
            font-weight: bold;
            color: #2e7d32;
        }
        
        .symptom-detail {
            font-size: 10pt;
            color: #666;
            margin-top: 5px;
        }
        
        .severity-bar {
            height: 8px;
            background: #eee;
            border-radius: 4px;
            margin-top: 10px;
            overflow: hidden;
        }
        
        .severity-fill {
            height: 100%;
            border-radius: 4px;
        }
        
        .assessment-box {
            background: #fff3e0;
            border: 1px solid #ff9800;
            border-radius: 8px;
            padding: 20px;
        }
        
        .condition-tag {
            display: inline-block;
            background: #fff;
            border: 1px solid #ddd;
            padding: 5px 15px;
            border-radius: 20px;
            margin: 5px;
        }
        
        .urgency-high { color: #d32f2f; border-color: #d32f2f; }
        .urgency-medium { color: #ff9800; border-color: #ff9800; }
        .urgency-low { color: #4caf50; border-color: #4caf50; }
        
        .recommendation-list {
            list-style: none;
            padding: 0;
        }
        
        .recommendation-item {
            display: flex;
            align-items: flex-start;
            padding: 10px 0;
            border-bottom: 1px dashed #ddd;
        }
        
        .recommendation-icon {
            width: 30px;
            height: 30px;
            background: #e8f5e9;
            border-radius: 50%;
            display: flex;
            align-items: center;
            justify-content: center;
            margin-right: 15px;
            color: #2e7d32;
        }
        
        .red-flags {
            background: #ffebee;
            border: 1px solid #ef5350;
            border-radius: 8px;
            padding: 15px;
            margin: 20px 0;
        }
        
        .red-flags-title {
            color: #c62828;
            font-weight: bold;
        }
        
        .disclaimer {
            background: #f5f5f5;
            border-radius: 8px;
            padding: 20px;
            margin-top: 30px;
            font-size: 10pt;
            color: #666;
        }
        
        .disclaimer-title {
            color: #333;
            font-weight: bold;
            margin-bottom: 10px;
        }
        
        .footer {
            margin-top: 40px;
            text-align: center;
            font-size: 10pt;
            color: #999;
        }
    </style>
</head>
<body>
    <!-- 头部 -->
    <div class="header">
        <h1>📋 在线问诊摘要</h1>
        <div class="header-meta">
            问诊编号:{{ session_id }} | 问诊时间:{{ consultation_time | date_format('%Y-%m-%d %H:%M') }}
        </div>
    </div>
    
    <!-- 患者信息 -->
    <div class="section">
        <h2 class="section-title">患者信息</h2>
        <div class="patient-info">
            <div class="patient-info-item">
                <span class="patient-info-label">年龄</span>
                <span class="patient-info-value">{{ patient_info.age | default('-') }} 岁</span>
            </div>
            <div class="patient-info-item">
                <span class="patient-info-label">性别</span>
                <span class="patient-info-value">{% if patient_info.gender == 'male' %}男{% elif patient_info.gender == 'female' %}女{% else %}-{% endif %}</span>
            </div>
            <div class="patient-info-item">
                <span class="patient-info-label">身高</span>
                <span class="patient-info-value">{{ patient_info.height | default('-') }} cm</span>
            </div>
            <div class="patient-info-item">
                <span class="patient-info-label">体重</span>
                <span class="patient-info-value">{{ patient_info.weight | default('-') }} kg</span>
            </div>
        </div>
    </div>
    
    <!-- 主诉 -->
    <div class="section">
        <h2 class="section-title">主诉</h2>
        <div class="chief-complaint">
            {{ chief_complaint | default('未记录') }}
        </div>
    </div>
    
    <!-- 症状详情 -->
    <div class="section">
        <h2 class="section-title">症状详情</h2>
        <div class="symptom-list">
            {% for symptom in present_illness.symptoms | default([]) %}
            <div class="symptom-card">
                <div class="symptom-name">{{ symptom.symptom }}</div>
                <div class="symptom-detail">
                    {% if symptom.location %}部位:{{ symptom.location }} | {% endif %}
                    {% if symptom.duration %}持续:{{ symptom.duration }} | {% endif %}
                    {% if symptom.frequency %}频率:{{ symptom.frequency }}{% endif %}
                </div>
                {% if symptom.severity %}
                <div class="severity-bar">
                    <div class="severity-fill" style="width: {{ symptom.severity * 10 }}%; background: {% if symptom.severity <= 3 %}#4caf50{% elif symptom.severity <= 6 %}#ff9800{% else %}#f44336{% endif %};"></div>
                </div>
                <div style="font-size: 9pt; color: #999; margin-top: 3px;">严重程度: {{ symptom.severity }}/10</div>
                {% endif %}
            </div>
            {% else %}
            <p>未记录详细症状</p>
            {% endfor %}
        </div>
    </div>
    
    <!-- 病史 -->
    <div class="section">
        <h2 class="section-title">相关病史</h2>
        <table style="width: 100%; border-collapse: collapse;">
            <tr>
                <td style="width: 25%; padding: 10px; background: #f5f5f5; font-weight: bold;">既往病史</td>
                <td style="padding: 10px;">{{ medical_history.past_illnesses | join_list if medical_history.past_illnesses else '无特殊' }}</td>
            </tr>
            <tr>
                <td style="padding: 10px; background: #f5f5f5; font-weight: bold;">过敏史</td>
                <td style="padding: 10px;">{{ medical_history.allergies | join_list if medical_history.allergies else '无已知过敏' }}</td>
            </tr>
            <tr>
                <td style="padding: 10px; background: #f5f5f5; font-weight: bold;">用药情况</td>
                <td style="padding: 10px;">{{ medical_history.current_medications | join_list if medical_history.current_medications else '无' }}</td>
            </tr>
        </table>
    </div>
    
    <!-- AI评估 -->
    <div class="section">
        <h2 class="section-title">AI 健康评估</h2>
        <div class="assessment-box">
            <p><strong>可能的情况:</strong></p>
            <div>
                {% for condition in assessment.possible_conditions | default([]) %}
                <span class="condition-tag">{{ condition }}</span>
                {% endfor %}
            </div>
            
            <p style="margin-top: 20px;"><strong>紧急程度:</strong>
                <span class="condition-tag urgency-{{ assessment.urgency | default('low') }}">
                    {% if assessment.urgency == 'emergency' %}🚨 紧急就医
                    {% elif assessment.urgency == 'urgent' %}⚠️ 尽快就医
                    {% elif assessment.urgency == 'soon' %}📅 建议就医
                    {% else %}✅ 可自我观察
                    {% endif %}
                </span>
            </p>
            
            {% if assessment.differential_diagnosis %}
            <p style="margin-top: 15px;"><strong>鉴别诊断:</strong></p>
            <ul>
                {% for dd in assessment.differential_diagnosis %}
                <li>{{ dd }}</li>
                {% endfor %}
            </ul>
            {% endif %}
        </div>
    </div>
    
    <!-- 警示症状 -->
    {% if recommendations.red_flags %}
    <div class="red-flags">
        <p class="red-flags-title">🚨 请注意以下警示症状</p>
        <p>如果出现以下情况,请立即就医:</p>
        <ul>
            {% for flag in recommendations.red_flags %}
            <li>{{ flag }}</li>
            {% endfor %}
        </ul>
    </div>
    {% endif %}
    
    <!-- 建议 -->
    <div class="section">
        <h2 class="section-title">健康建议</h2>
        <ul class="recommendation-list">
            {% if recommendations.further_tests %}
            <li class="recommendation-item">
                <div class="recommendation-icon">🔬</div>
                <div>
                    <strong>建议检查</strong>
                    <p>{{ recommendations.further_tests | join_list }}</p>
                </div>
            </li>
            {% endif %}
            
            {% if recommendations.lifestyle_changes %}
            <li class="recommendation-item">
                <div class="recommendation-icon">🏃</div>
                <div>
                    <strong>生活方式建议</strong>
                    <ul>
                        {% for change in recommendations.lifestyle_changes %}
                        <li>{{ change }}</li>
                        {% endfor %}
                    </ul>
                </div>
            </li>
            {% endif %}
            
            {% if recommendations.follow_up %}
            <li class="recommendation-item">
                <div class="recommendation-icon">📅</div>
                <div>
                    <strong>随访建议</strong>
                    <p>{{ recommendations.follow_up }}</p>
                </div>
            </li>
            {% endif %}
        </ul>
    </div>
    
    <!-- 免责声明 -->
    <div class="disclaimer">
        <p class="disclaimer-title">⚠️ 重要声明</p>
        <p>{{ disclaimer | default('以上内容由AI健康助手根据您描述的症状生成,仅供参考,不能替代专业医生的诊断和治疗。如有不适,请及时就医。') }}</p>
        <ul>
            <li>AI分析基于您提供的信息,可能存在局限性</li>
            <li>本摘要不构成医疗诊断或治疗建议</li>
            <li>请勿自行用药,需在专业医生指导下进行治疗</li>
            <li>如症状持续或加重,请尽快前往正规医疗机构就诊</li>
        </ul>
    </div>
    
    <!-- 页脚 -->
    <div class="footer">
        <p>本文档由 MBE 智能健康助手生成</p>
        <p>文档编号:{{ document_id }} | 生成时间:{{ generation_time | date_format('%Y-%m-%d %H:%M:%S') }}</p>
    </div>
</body>
</html>

七、实施计划

7.1 开发阶段

阶段 内容 工期 交付物
Phase 1: 基础架构 核心引擎、模板引擎、格式渲染器 2周 可运行的文档生成骨架
Phase 2: 内容提取 LLM提取器、对话摘要 1.5周 对话内容结构化能力
Phase 3: 保险插件 保险行业完整支持 2周 保险文档全流程
Phase 4: 医疗插件 医疗行业完整支持 2周 医疗文档全流程
Phase 5: 理财插件 理财行业完整支持 1.5周 理财文档全流程
Phase 6: 合规&测试 合规检查、全面测试 1周 生产可用版本

总计:约10周

7.2 技术依赖

# requirements.txt 新增

# 模板引擎
Jinja2>=3.1.0

# PDF生成
weasyprint>=60.0
# 或 reportlab>=4.0

# Word生成
python-docx>=1.0.0

# HTML解析
beautifulsoup4>=4.12.0
html2text>=2020.1.16

# JSON Schema验证
jsonschema>=4.0.0

# 中文字体支持
fonttools>=4.0.0

7.3 部署配置

# config/document_generation.yaml

document_generation:
  # 模板路径
  template_path: "src/document/templates"
  
  # 存储配置
  storage:
    type: "oss"  # local / oss / s3
    bucket: "mbe-documents"
    expire_days: 30
  
  # 渲染配置
  rendering:
    pdf:
      engine: "weasyprint"  # weasyprint / wkhtmltopdf
      page_size: "A4"
      default_margins: "2cm"
    word:
      default_font: "宋体"
      default_size: 12
  
  # LLM配置
  llm:
    extraction_model: "gpt-4"
    max_tokens: 4000
    temperature: 0.3
  
  # 合规配置
  compliance:
    strict_mode: true
    fail_on_warning: false
    
  # 行业插件
  plugins:
    enabled:
      - insurance
      - medical
      - finance
      - education

八、集成与配置

8.1 主应用集成

文档生成模块已集成到 MBE 主应用(src/main.py),启动时自动初始化:

# main.py 中的集成代码
from src.document.api.router import router as document_router, setup_document_api
from src.document.llm_adapter import create_document_orchestrator

# 应用启动时初始化
if DOCUMENT_MODULE_AVAILABLE:
    orchestrator = create_document_orchestrator()
    setup_document_api(orchestrator)
    
# 注册路由
app.include_router(document_router, tags=["文档生成"])

8.2 LLM 服务配置

文档生成模块通过 MBELLMAdapter 适配器使用 MBE 的 LLM 服务:

# src/document/llm_adapter.py

class MBELLMAdapter:
    """将 MBE 的 LLM 服务适配到文档生成模块"""
    
    async def generate(self, prompt, system_prompt="", ...):
        # 使用 src.llm.base 中的 LLM 客户端
        client = get_llm_client()
        return await client.chat(system_prompt, prompt, ...)

LLM 配置通过环境变量控制(.env 文件):

# LLM 提供商配置
LLM_PROVIDER=deepseek  # deepseek / qwen / doubao / openrouter
LLM_API_KEY=your_api_key
LLM_MODEL=deepseek-chat

# 弹性客户端配置(推荐启用)
MBE_USE_RESILIENT_LLM=true

8.3 已注册的 API 端点

端点 方法 说明
/api/v1/document/generate POST 生成文档
/api/v1/document/status/{id} GET 查询文档状态
/api/v1/document/preview POST 预览文档内容
/api/v1/document/download/{id} GET 下载文档文件
/api/v1/document/{id} DELETE 删除文档
/api/v1/document/list GET 列出文档
/api/v1/document/industries GET 获取支持的行业列表
/api/v1/document/batch-export POST 批量导出
/api/v1/summary/conversation POST 生成对话摘要
/api/v1/summary/multi-session POST 多会话摘要
/api/v1/summary/export POST 导出摘要
/api/v1/templates GET 获取模板列表
/api/v1/templates/{id} GET 获取模板详情
/api/v1/templates/{id}/schema GET 获取模板数据结构
/api/v1/templates/{id}/preview POST 预览模板效果

8.4 代码使用示例

# 在其他模块中使用文档生成功能
from src.document.llm_adapter import create_document_orchestrator

async def generate_insurance_proposal(conversation: list, user_info: dict):
    """为用户生成保险方案建议书"""
    
    orchestrator = create_document_orchestrator()
    
    result = await orchestrator.generate(
        session_id="session_123",
        device_id="device_abc",
        industry="insurance",
        document_type="insurance_proposal",
        output_format="pdf",
        conversation=conversation,
        custom_data={
            "user_info": user_info,
            "generation_options": {
                "include_disclaimer": True,
                "style": "professional"
            }
        }
    )
    
    return result

# 生成对话摘要
async def summarize_conversation(conversation: list):
    """生成对话摘要"""
    
    from src.document.core.content_extractor import ContentExtractor
    
    extractor = ContentExtractor()
    content = await extractor.extract(
        session_id="session_123",
        conversation=conversation,
        industry="general"
    )
    
    return {
        "summary": content.summary,
        "key_points": content.key_points,
        "user_needs": content.user_needs
    }

九、总结

本文档生成模块设计方案具有以下特点:

  1. 通用性强:通过插件架构支持多行业扩展
  2. 合规优先:内置行业合规检查,降低法律风险
  3. LLM增强:利用大模型能力自动提取和优化内容
  4. 格式灵活:支持PDF、Word、HTML、Markdown等多种输出
  5. 模板驱动:内容与样式分离,易于维护和定制
  6. 可追溯:文档与源对话关联,支持审计
  7. 无缝集成:已集成到MBE主应用,开箱即用

通过本模块,MBE平台将具备完整的"对话→理解→文档"闭环能力,为各行业提供专业的文档生成服务。