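MBE Document Generation Module: Design
Document version: v1.0
Created: 2026-01-31
Module name: Document Generation Engine (DGE)
Core capabilities: conversation summaries, proposal generation, signing documents, multi-format export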
1. Module Overview
1.1 Design Goals
```
┌─────────────────────────────────────────────────────────────────────────┐
│  Document Generation Module: Core Objectives                            │
├─────────────────────────────────────────────────────────────────────────┤
│                                                                         │
│  [Functional Goals]                                                     │
│  ┌───────────────────────────────────────────────────────────────────┐  │
│  │ 1. Conversation records: extract, structure, and format dialogue  │  │
│  │ 2. Proposal generation: professional proposals from the dialogue  │  │
│  │ 3. Signing documents: formal files the user confirms or signs     │  │
│  │ 4. Multi-format output: PDF, Word, HTML, Markdown, JSON           │  │
│  │ 5. Verticals: insurance, medical, finance, education, legal, etc. │  │
│  └───────────────────────────────────────────────────────────────────┘  │
│                                                                         │
│  [Design Principles]                                                    │
│  ┌───────────────────────────────────────────────────────────────────┐  │
│  │ • Template-driven: templates separate content from layout         │  │
│  │ • Plugins: new domains are added as plugins without core changes  │  │
│  │ • LLM enhancement: use the LLM to refine content and summarize    │  │
│  │ • Compliance first: follow regulators strictly (medical, finance) │  │
│  │ • Traceability: every document links back to its source dialogue  │  │
│  └───────────────────────────────────────────────────────────────────┘  │
│                                                                         │
└─────────────────────────────────────────────────────────────────────────┘
```
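One way to make the traceability principle concrete is to store a small provenance record with every generated document. The record links the document to its source session and to the exact template version used. The sketch below assumes that a SHA-256 digest of the rendered bytes is an acceptable fingerprint. `DocumentProvenance`, `make_provenance` and `verify` are hypothetical names, not part of the module's defined API.

```python
import hashlib
from dataclasses import asdict, dataclass
from datetime import datetime, timezone


@dataclass(frozen=True)
class DocumentProvenance:
    """Hypothetical record linking a generated document to its source conversation."""
    document_id: str
    session_id: str
    device_id: str
    template_id: str
    template_version: str
    content_sha256: str  # fingerprint of the rendered output
    generated_at: str    # ISO-8601 UTC timestamp


def make_provenance(document_id: str, session_id: str, device_id: str,
                    template_id: str, template_version: str,
                    rendered: bytes) -> DocumentProvenance:
    """Build a provenance record at generation time."""
    return DocumentProvenance(
        document_id=document_id,
        session_id=session_id,
        device_id=device_id,
        template_id=template_id,
        template_version=template_version,
        content_sha256=hashlib.sha256(rendered).hexdigest(),
        generated_at=datetime.now(timezone.utc).isoformat(),
    )


def verify(record: DocumentProvenance, rendered: bytes) -> bool:
    """True if a stored document still matches the fingerprint taken at generation time."""
    return hashlib.sha256(rendered).hexdigest() == record.content_sha256


if __name__ == "__main__":
    pdf = b"%PDF-1.7 example"
    rec = make_provenance("doc-001", "sess-42", "dev-7", "insurance/proposal", "1.0", pdf)
    print(asdict(rec)["content_sha256"][:12], verify(rec, pdf), verify(rec, pdf + b"x"))
```

Storing the template version alongside the hash lets an auditor re-render a document and compare it, as long as the template itself is kept under version control.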
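1.2 Supported Industry Scenarios
| Industry | Typical documents | Documents to sign | Compliance requirements |
|---|---|---|---|
| Insurance | Insurance proposal, needs analysis report, product comparison table | Application form, health disclosure form, risk confirmation | Insurance sales must be traceable |
| Medical | Consultation summary, health assessment report, rehabilitation plan | Informed consent form, treatment confirmation | Medical record standards; privacy protection |
| Wealth management | Financial advisory report, asset allocation plan, portfolio report | Risk assessment confirmation, investment agreement | Suitability management; risk disclosure |
| Education | Learning report, course plan, ability assessment | Service agreement, course confirmation | Education and training contract standards |
| Legal | Legal consultation summary, case analysis report | Engagement agreement, power of attorney | Lawyers' professional conduct rules |
| Psychology | Counseling notes, psychological assessment report, intervention plan | Counseling consent form, confidentiality agreement | Counseling ethics codes |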
2. System Architecture
2.1 Overall Architecture
```
┌─────────────────────────────────────────────────────────────────────────────┐
│  Document Generation Engine (DGE)                                           │
├─────────────────────────────────────────────────────────────────────────────┤
│                                                                             │
│  ┌───────────────────────────────────────────────────────────────────────┐  │
│  │ API Layer                                                             │  │
│  │ ┌──────────┐ ┌──────────┐ ┌──────────┐ ┌──────────┐ ┌──────────┐      │  │
│  │ │ Summary  │ │ Document │ │ Signing  │ │  Export  │ │ Template │      │  │
│  │ │   API    │ │   API    │ │   API    │ │   API    │ │   API    │      │  │
│  │ └────┬─────┘ └────┬─────┘ └────┬─────┘ └────┬─────┘ └────┬─────┘      │  │
│  └──────┼────────────┼────────────┼────────────┼────────────┼────────────┘  │
│         │            │            │            │            │               │
│  ┌──────┴────────────┴────────────┴────────────┴────────────┴────────────┐  │
│  │ Core Engine Layer                                                     │  │
│  │ ┌───────────────────────────────────────────────────────────────────┐ │  │
│  │ │ Document Orchestrator                                             │ │  │
│  │ │ (coordinates the components through the generation flow)          │ │  │
│  │ └───────────────────────────────────────────────────────────────────┘ │  │
│  │                                                                       │  │
│  │ ┌───────────┐ ┌───────────┐ ┌───────────┐ ┌───────────┐               │  │
│  │ │  Content  │ │ Template  │ │  Format   │ │Compliance │               │  │
│  │ │ Extractor │ │  Engine   │ │ Renderer  │ │  Checker  │               │  │
│  │ └───────────┘ └───────────┘ └───────────┘ └───────────┘               │  │
│  └───────────────────────────────────────────────────────────────────────┘  │
│                                                                             │
│  ┌───────────────────────────────────────────────────────────────────────┐  │
│  │ Industry Plugin Layer                                                 │  │
│  │ ┌──────────┐ ┌──────────┐ ┌──────────┐ ┌──────────┐ ┌──────────┐      │  │
│  │ │Insurance │ │ Medical  │ │ Finance  │ │Education │ │  Legal   │      │  │
│  │ │  Plugin  │ │  Plugin  │ │  Plugin  │ │  Plugin  │ │  Plugin  │      │  │
│  │ └──────────┘ └──────────┘ └──────────┘ └──────────┘ └──────────┘      │  │
│  └───────────────────────────────────────────────────────────────────────┘  │
│                                                                             │
│  ┌───────────────────────────────────────────────────────────────────────┐  │
│  │ Infrastructure Layer                                                  │  │
│  │ ┌──────────┐ ┌──────────┐ ┌──────────┐ ┌──────────┐ ┌──────────┐      │  │
│  │ │   LLM    │ │ Template │ │  TITANS  │ │  Redis   │ │   OSS    │      │  │
│  │ │ Service  │ │  Store   │ │  Memory  │ │  Cache   │ │ Storage  │      │  │
│  │ └──────────┘ └──────────┘ └──────────┘ └──────────┘ └──────────┘      │  │
│  └───────────────────────────────────────────────────────────────────────┘  │
│                                                                             │
└─────────────────────────────────────────────────────────────────────────────┘
```
2.2 Directory Structure
```
src/document/
├── __init__.py
├── orchestrator.py              # Document orchestrator (central coordinator)
├── api/
│   ├── __init__.py
│   ├── summary.py               # Conversation summary API
│   ├── document.py              # Document generation API
│   ├── signing.py               # Signing-document API
│   ├── export.py                # Batch export API
│   └── template.py              # Template management API
│
├── core/
│   ├── __init__.py
│   ├── content_extractor.py     # Content extractor
│   ├── template_engine.py       # Template engine
│   ├── format_renderer.py       # Format renderer
│   ├── compliance_checker.py    # Compliance checker
│   └── llm_enhancer.py          # LLM content enhancement
│
├── plugins/
│   ├── __init__.py
│   ├── base.py                  # Plugin base class
│   ├── insurance/               # Insurance plugin
│   │   ├── __init__.py
│   │   ├── plugin.py
│   │   ├── extractors.py
│   │   └── validators.py
│   ├── medical/                 # Medical plugin
│   │   ├── __init__.py
│   │   ├── plugin.py
│   │   ├── extractors.py
│   │   └── validators.py
│   ├── finance/                 # Wealth-management plugin
│   ├── education/               # Education plugin
│   └── legal/                   # Legal plugin
│
├── templates/
│   ├── base/                    # Base templates
│   │   ├── summary.html.j2
│   │   ├── report.html.j2
│   │   └── agreement.html.j2
│   ├── insurance/               # Insurance templates
│   │   ├── proposal.html.j2
│   │   ├── application.html.j2
│   │   └── risk_disclosure.html.j2
│   ├── medical/                 # Medical templates
│   │   ├── consultation_summary.html.j2
│   │   ├── health_assessment.html.j2
│   │   └── informed_consent.html.j2
│   └── finance/                 # Wealth-management templates
│       ├── advisory_report.html.j2
│       └── risk_assessment.html.j2
│
├── renderers/
│   ├── __init__.py
│   ├── pdf_renderer.py          # PDF renderer
│   ├── word_renderer.py         # Word renderer
│   ├── html_renderer.py         # HTML renderer
│   └── markdown_renderer.py     # Markdown renderer
│
├── models/
│   ├── __init__.py
│   ├── document.py              # Document data model
│   ├── template.py              # Template data model
│   └── signing.py               # Signing-document data model
│
└── storage/
    ├── __init__.py
    ├── document_store.py        # Document storage
    └── template_store.py        # Template storage
```
3. Core Component Design
3.1 Content Extractor
Extracts structured information from the conversation history.
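```python
# src/document/core/content_extractor.py
import json
from dataclasses import dataclass, field
from datetime import datetime
from enum import Enum
from typing import TYPE_CHECKING, Any, Dict, List, Optional

if TYPE_CHECKING:  # type-only import; avoids a circular import at runtime
    from ..plugins.base import IndustryPlugin


class ExtractionType(Enum):
    """Kinds of content that can be extracted."""
    SUMMARY = "summary"                  # conversation summary
    NEEDS = "needs"                      # user needs
    RECOMMENDATIONS = "recommendations"  # recommendations
    DECISIONS = "decisions"              # decisions
    ACTIONS = "actions"                  # action items
    PROFILE = "profile"                  # user profile
    RISK = "risk"                        # risk points
    COMPLIANCE = "compliance"            # compliance information


@dataclass
class ExtractedContent:
    """Structured content extracted from a conversation."""
    session_id: str
    device_id: str
    extraction_time: str

    # Basic information
    conversation_summary: str = ""
    key_points: List[str] = field(default_factory=list)

    # User-related
    user_needs: List[Dict] = field(default_factory=list)
    user_profile: Dict = field(default_factory=dict)
    user_concerns: List[str] = field(default_factory=list)

    # Expert-related
    expert_recommendations: List[Dict] = field(default_factory=list)
    expert_explanations: List[str] = field(default_factory=list)

    # Decision-related
    decisions_made: List[Dict] = field(default_factory=list)
    pending_decisions: List[Dict] = field(default_factory=list)

    # Industry-specific fields (filled in by plugins)
    industry_specific: Dict[str, Any] = field(default_factory=dict)

    # Metadata
    confidence_score: float = 0.0
    extraction_method: str = ""


# System prompt for extraction
EXTRACTION_SYSTEM_PROMPT = """
You are a professional conversation analyst who extracts structured information from dialogues.
Your tasks:
1. Understand the context and intent of the conversation accurately
2. Extract the key information without omitting anything important
3. Distinguish facts from opinions
4. Identify explicit decisions and open items
5. Stay objective and neutral; add no subjective judgments
Output requirements:
- Use JSON
- Every field must be present (use an empty array or string when there is no content)
- Summaries must be concise but complete
- Key points must be specific and actionable
"""

# Fields the LLM is asked to return; any other keys in its answer are ignored
_LLM_FIELDS = (
    "conversation_summary", "key_points", "user_needs", "user_concerns",
    "expert_recommendations", "decisions_made", "pending_decisions",
)


class ContentExtractor:
    """Content extractor."""

    def __init__(
        self,
        llm_service,
        titans_memory,
        industry_plugins: Optional[Dict[str, "IndustryPlugin"]] = None,
    ):
        self.llm = llm_service
        self.memory = titans_memory
        self.plugins = industry_plugins or {}

    async def extract(
        self,
        session_id: str,
        device_id: str,
        extraction_types: Optional[List[ExtractionType]] = None,
        industry: Optional[str] = None,
    ) -> ExtractedContent:
        """
        Extract structured content from a conversation.

        Args:
            session_id: Session ID
            device_id: Device ID
            extraction_types: Content types to extract (not used yet; all types are extracted)
            industry: Industry identifier, used to select an industry plugin
        """
        # 1. Fetch the conversation history
        conversation = await self._get_conversation(session_id, device_id)

        # 2. Base extraction (LLM)
        content = await self._extract_base_content(conversation, session_id, device_id)

        # 3. Industry-specific extraction (plugin)
        if industry and industry in self.plugins:
            plugin = self.plugins[industry]
            content.industry_specific = await plugin.extract_content(conversation)

        # 4. Quality scoring
        content.confidence_score = await self._calculate_confidence(content)
        return content

    async def _get_conversation(self, session_id: str, device_id: str) -> List[Dict]:
        """Load the conversation from TITANS memory (the client method name is an assumption)."""
        return await self.memory.get_conversation(session_id=session_id, device_id=device_id)

    async def _extract_base_content(
        self, conversation: List[Dict], session_id: str, device_id: str
    ) -> ExtractedContent:
        """Extract the base content with the LLM."""
        response = await self.llm.generate(
            prompt=self._build_extraction_prompt(conversation),
            system_prompt=EXTRACTION_SYSTEM_PROMPT,
            response_format="json",
        )
        return self._parse_extraction_response(response, session_id, device_id)

    def _parse_extraction_response(
        self, response: Any, session_id: str, device_id: str
    ) -> ExtractedContent:
        """Map the LLM's JSON answer onto ExtractedContent, keeping only known fields."""
        data = json.loads(response) if isinstance(response, str) else dict(response)
        return ExtractedContent(
            session_id=session_id,
            device_id=device_id,
            extraction_time=datetime.now().isoformat(),
            extraction_method="llm",
            **{k: data[k] for k in _LLM_FIELDS if k in data},
        )

    async def _calculate_confidence(self, content: ExtractedContent) -> float:
        """Placeholder heuristic: share of the core fields that are non-empty."""
        core = [content.conversation_summary, content.key_points,
                content.user_needs, content.expert_recommendations]
        return sum(1 for value in core if value) / len(core)

    def _build_extraction_prompt(self, conversation: List[Dict]) -> str:
        """Build the extraction prompt."""
        conv_text = "\n".join(
            f"{'User' if msg['role'] == 'user' else 'Expert'}: {msg['content']}"
            for msg in conversation
        )
        return f"""
Extract structured information from the following conversation.

[Conversation]
{conv_text}

[Requirements]
Return the following fields as JSON:
1. conversation_summary: overall summary of the conversation (100-200 characters)
2. key_points: list of key points (3-5 items)
3. user_needs: list of user needs; each item has need, priority, and status
4. user_concerns: list of user concerns
5. expert_recommendations: list of expert recommendations; each item has recommendation, reason, and confidence
6. decisions_made: list of decisions already made
7. pending_decisions: list of items still awaiting a decision
"""
```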
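`_parse_extraction_response` has to cope with a model that was asked for JSON but does not return bare JSON. The model may wrap the object in a Markdown code fence or add a sentence before it. The sketch below assumes those are the failure modes worth handling. It strips a surrounding fence, falls back to the outermost `{...}` span, and fills any missing field with an empty default, matching the system prompt's rule that every field must be present. `parse_llm_json` and `DEFAULTS` are illustrative names.

```python
import json
import re
from typing import Any, Dict

FENCE = "`" * 3  # a Markdown code fence, built this way to keep the example readable

# Empty defaults for every field the extraction prompt asks for
DEFAULTS: Dict[str, Any] = {
    "conversation_summary": "",
    "key_points": [],
    "user_needs": [],
    "user_concerns": [],
    "expert_recommendations": [],
    "decisions_made": [],
    "pending_decisions": [],
}


def parse_llm_json(raw: str) -> Dict[str, Any]:
    """Parse a model's JSON answer, tolerating code fences and surrounding prose."""
    text = raw.strip()
    # 1. Strip a surrounding fence such as <fence>json ... <fence>
    if text.startswith(FENCE):
        text = re.sub(r"^`{3}[a-zA-Z]*\s*", "", text)
        text = re.sub(r"\s*`{3}$", "", text)
    try:
        data = json.loads(text)
    except json.JSONDecodeError:
        # 2. Fall back to the outermost {...} span
        start, end = text.find("{"), text.rfind("}")
        if start == -1 or end <= start:
            raise ValueError("no JSON object in model output")
        data = json.loads(text[start:end + 1])
    if not isinstance(data, dict):
        raise ValueError("expected a JSON object")
    # 3. Keep only known keys; missing ones get a fresh empty value of the right type
    return {key: data[key] if key in data else type(default)()
            for key, default in DEFAULTS.items()}
```

For example, `parse_llm_json('Result: {"key_points": ["a"]}')` returns a dict with all seven keys, `key_points == ["a"]`, and every other field empty.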
3.2 Template Engine
A Jinja2-based template engine that renders dynamic content into HTML.
```python
# src/document/core/template_engine.py
import json
from dataclasses import dataclass
from datetime import datetime
from enum import Enum
from pathlib import Path
from typing import Any, Dict, List, Optional

from jinja2 import Environment, FileSystemLoader, select_autoescape
from markupsafe import escape


class TemplateType(Enum):
    """Template types."""
    SUMMARY = "summary"          # conversation summary
    REPORT = "report"            # analysis report
    PROPOSAL = "proposal"        # proposal / recommendation
    AGREEMENT = "agreement"      # agreement / contract
    APPLICATION = "application"  # application form
    CONSENT = "consent"          # consent form
    DISCLOSURE = "disclosure"    # disclosure / notice


@dataclass
class TemplateConfig:
    """Template configuration."""
    template_id: str
    template_type: TemplateType
    industry: str
    name: str
    description: str
    version: str

    # Template files
    html_template: str               # HTML template path (relative to the template root)
    css_style: Optional[str] = None  # CSS stylesheet path

    # Required fields
    required_fields: Optional[List[str]] = None
    # Optional fields
    optional_fields: Optional[List[str]] = None
    # Compliance rule IDs
    compliance_rules: Optional[List[str]] = None
    # Supported output formats, e.g. ["pdf", "word", "html"]
    supported_formats: Optional[List[str]] = None


class TemplateEngine:
    """Template engine."""

    # Assumed location of the template registry inside the template root
    CONFIG_FILE = "templates.json"

    def __init__(self, template_base_path: str):
        self.template_path = Path(template_base_path)
        self.env = Environment(
            loader=FileSystemLoader(str(self.template_path)),
            # Template files end in ".html.j2"; with only "html" listed,
            # select_autoescape would leave autoescaping off for them.
            autoescape=select_autoescape(["html", "xml", "html.j2", "xml.j2"]),
            trim_blocks=True,
            lstrip_blocks=True,
        )

        # Register custom filters
        self._register_filters()

        # Template configuration cache
        self.template_configs: Dict[str, TemplateConfig] = {}
        self._load_template_configs()

    def _register_filters(self):
        """Register custom Jinja2 filters."""
        # Date formatting (Chinese default, e.g. 2026年01月31日)
        self.env.filters["date_format"] = lambda d, fmt="%Y年%m月%d日": d.strftime(fmt) if d else ""
        # Currency (None renders as ¥0.00)
        self.env.filters["currency"] = lambda v: f"¥{(v or 0):,.2f}"
        # Percentage (0.153 -> 15.3%)
        self.env.filters["percent"] = lambda v: f"{(v or 0) * 100:.1f}%"
        # Join a list (default separator is the Chinese enumeration comma)
        self.env.filters["join_list"] = lambda items, sep="、": sep.join(items) if items else ""
        # HTML-escape free text (used in signing documents); the Markup result is not escaped twice
        self.env.filters["safe_text"] = lambda t: escape(t) if t else ""
        # Phone masking: keep the first 3 and last 4 digits
        self.env.filters["mask_phone"] = lambda p: p[:3] + "****" + p[-4:] if p and len(p) >= 11 else p
        # ID-number masking: keep the first 6 and last 4 characters
        self.env.filters["mask_id"] = lambda i: i[:6] + "********" + i[-4:] if i and len(i) >= 18 else i

    def _load_template_configs(self):
        """Load template configurations (assumes a JSON list of TemplateConfig dicts)."""
        config_file = self.template_path / self.CONFIG_FILE
        if not config_file.exists():
            return
        for item in json.loads(config_file.read_text(encoding="utf-8")):
            item["template_type"] = TemplateType(item["template_type"])
            config = TemplateConfig(**item)
            self.template_configs[config.template_id] = config

    def render(
        self,
        template_id: str,
        data: Dict[str, Any],
        locale: str = "zh_CN",
    ) -> str:
        """
        Render a template.

        Args:
            template_id: Template ID
            data: Render data
            locale: Locale
        Returns:
            The rendered HTML string
        """
        config = self.template_configs.get(template_id)
        if not config:
            raise ValueError(f"Template not found: {template_id}")

        # Validate required fields
        self._validate_required_fields(config, data)

        # Load the template
        template = self.env.get_template(config.html_template)

        # Build the render context
        context = {
            **data,
            "template_config": config,
            "locale": locale,
            "generation_time": datetime.now(),
        }

        # Render
        return template.render(**context)

    def _validate_required_fields(self, config: TemplateConfig, data: Dict):
        """Validate required fields (None, empty strings and empty containers count as missing)."""
        if not config.required_fields:
            return
        missing = [f for f in config.required_fields if data.get(f) in (None, "", [], {})]
        if missing:
            raise ValueError(f"Missing required fields: {missing}")

    def get_template_schema(self, template_id: str) -> Dict:
        """Return the data-structure definition of a template."""
        config = self.template_configs.get(template_id)
        if not config:
            raise ValueError(f"Template not found: {template_id}")
        return {
            "template_id": template_id,
            "template_type": config.template_type.value,
            "required_fields": config.required_fields or [],
            "optional_fields": config.optional_fields or [],
            "supported_formats": config.supported_formats or ["html", "pdf"],
        }

    def list_templates(
        self,
        industry: Optional[str] = None,
        template_type: Optional[TemplateType] = None,
    ) -> List[TemplateConfig]:
        """List the available templates."""
        templates = list(self.template_configs.values())
        if industry:
            templates = [t for t in templates if t.industry == industry]
        if template_type:
            templates = [t for t in templates if t.template_type == template_type]
        return templates
```
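The masking and formatting filters are easier to unit-test and to reuse (for example in the compliance checker) as plain functions than as lambdas. The sketch below keeps the masking rules used in `_register_filters`: the first 3 and last 4 digits of an 11-digit mobile number, and the first 6 and last 4 characters of an 18-character ID number. The function names and the `FILTERS` mapping are illustrative.

```python
from typing import Optional


def mask_phone(phone: Optional[str]) -> Optional[str]:
    """Keep the first 3 and last 4 digits of an 11-digit mobile number."""
    if not phone or len(phone) < 11:
        return phone
    return phone[:3] + "****" + phone[-4:]


def mask_id(id_number: Optional[str]) -> Optional[str]:
    """Keep the first 6 (region code) and last 4 characters of an 18-character ID number."""
    if not id_number or len(id_number) < 18:
        return id_number
    return id_number[:6] + "*" * 8 + id_number[-4:]


def currency(value: Optional[float], symbol: str = "¥") -> str:
    """Thousands separators and two decimals; None renders as zero."""
    return f"{symbol}{(value or 0):,.2f}"


def percent(value: Optional[float], digits: int = 1) -> str:
    """0.153 -> '15.3%'; None renders as zero."""
    return f"{(value or 0) * 100:.{digits}f}%"


FILTERS = {
    "mask_phone": mask_phone,
    "mask_id": mask_id,
    "currency": currency,
    "percent": percent,
}
# Registration, assuming the Environment built in TemplateEngine:
#     self.env.filters.update(FILTERS)
```

Because the functions are module-level, a test suite can check them directly without building a Jinja2 environment or loading any templates.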
3.3 Format Renderer
Converts the rendered HTML into each supported output format.
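```python
# src/document/core/format_renderer.py
from __future__ import annotations  # lets bs4/docx types appear in hints without eager imports

import asyncio
import io
from abc import ABC, abstractmethod
from enum import Enum
from typing import TYPE_CHECKING, Any, Dict, Optional

if TYPE_CHECKING:
    from bs4 import Tag
    from docx.document import Document as DocxDocument


class OutputFormat(Enum):
    """Output formats."""
    HTML = "html"
    PDF = "pdf"
    WORD = "word"
    MARKDOWN = "markdown"
    JSON = "json"


class BaseRenderer(ABC):
    """Base class for renderers."""

    @abstractmethod
    async def render(self, html_content: str, options: Optional[Dict[str, Any]] = None) -> bytes:
        """Render HTML into the target format."""

    @property
    @abstractmethod
    def output_format(self) -> OutputFormat:
        """Output format."""


class HTMLRenderer(BaseRenderer):
    """HTML renderer: returns the rendered template as UTF-8 bytes."""

    @property
    def output_format(self) -> OutputFormat:
        return OutputFormat.HTML

    async def render(self, html_content: str, options: Optional[Dict[str, Any]] = None) -> bytes:
        return html_content.encode("utf-8")


class PDFRenderer(BaseRenderer):
    """PDF renderer."""

    def __init__(self, wkhtmltopdf_path: Optional[str] = None):
        """
        Initialize the PDF renderer.
        Two backends are supported:
        1. WeasyPrint (pure Python, recommended)
        2. wkhtmltopdf via pdfkit (requires a system install)
        """
        self.wkhtmltopdf_path = wkhtmltopdf_path

    @property
    def output_format(self) -> OutputFormat:
        return OutputFormat.PDF

    async def render(self, html_content: str, options: Optional[Dict[str, Any]] = None) -> bytes:
        """
        Render HTML to PDF.
        Options:
            page_size: page size (A4, Letter, ...)
            margin: page margin, e.g. "2cm"
            header_html: header HTML
            footer_html: footer HTML
            watermark: watermark text
        """
        options = options or {}
        try:
            # Prefer WeasyPrint
            return await self._render_with_weasyprint(html_content, options)
        except ImportError:
            # Fall back to wkhtmltopdf
            return await self._render_with_wkhtmltopdf(html_content, options)

    async def _render_with_weasyprint(self, html_content: str, options: Dict) -> bytes:
        """Render with WeasyPrint."""
        from weasyprint import CSS, HTML
        from weasyprint.text.fonts import FontConfiguration

        font_config = FontConfiguration()
        page_size = options.get("page_size", "A4")
        margin = options.get("margin", "2cm")

        # Base CSS with CJK fonts
        stylesheets = [CSS(string=f"""
            @font-face {{
                font-family: 'SimSun';
                src: local('SimSun'), local('宋体');
            }}
            body {{
                font-family: 'SimSun', 'Microsoft YaHei', sans-serif;
                font-size: 12pt;
                line-height: 1.6;
            }}
            @page {{
                size: {page_size};
                margin: {margin};
            }}
        """, font_config=font_config)]

        # Watermark CSS (the SVG data URI is elided in this design)
        if options.get("watermark"):
            stylesheets.append(CSS(string="""
                @page {
                    background-image: url('data:image/svg+xml,...');
                }
            """))

        # write_pdf is synchronous and CPU-bound: run it in a worker thread
        html = HTML(string=html_content)
        return await asyncio.to_thread(html.write_pdf, stylesheets=stylesheets, font_config=font_config)

    async def _render_with_wkhtmltopdf(self, html_content: str, options: Dict) -> bytes:
        """Render with wkhtmltopdf through the pdfkit wrapper."""
        import pdfkit

        # An empty path lets pdfkit locate wkhtmltopdf on PATH
        config = pdfkit.configuration(wkhtmltopdf=self.wkhtmltopdf_path or "")
        pdf_options = {"page-size": options.get("page_size", "A4"), "encoding": "UTF-8"}
        # output_path=False makes pdfkit return the PDF as bytes
        return await asyncio.to_thread(
            pdfkit.from_string, html_content, False, options=pdf_options, configuration=config
        )


class WordRenderer(BaseRenderer):
    """Word document renderer."""

    @property
    def output_format(self) -> OutputFormat:
        return OutputFormat.WORD

    async def render(self, html_content: str, options: Optional[Dict[str, Any]] = None) -> bytes:
        """
        Render HTML to a Word document.
        Options:
            template: path to a .docx file used as the style base
            styles: custom style mapping
        """
        from bs4 import BeautifulSoup
        from docx import Document
        from docx.oxml.ns import qn
        from docx.shared import Pt

        options = options or {}

        # Parse the HTML
        soup = BeautifulSoup(html_content, "html.parser")

        # Create the document, optionally from a template
        doc = Document(options["template"]) if options.get("template") else Document()

        # Default font: SimSun (宋体); CJK text also needs the East Asian font slot set
        style = doc.styles["Normal"]
        style.font.name = "宋体"
        style.font.size = Pt(12)
        style.element.rPr.rFonts.set(qn("w:eastAsia"), "宋体")

        # Convert HTML elements into Word elements
        self._convert_html_to_docx(soup.body or soup, doc, options)

        # Save to bytes
        buffer = io.BytesIO()
        doc.save(buffer)
        return buffer.getvalue()

    def _convert_html_to_docx(self, container: Tag, doc: DocxDocument, options: Dict) -> None:
        """Convert HTML elements into Word elements, recursing into <div>."""
        headings = {"h1": 1, "h2": 2, "h3": 3}
        for element in container.children:
            name = getattr(element, "name", None)  # text nodes have no tag name
            if name in headings:
                doc.add_heading(element.get_text(strip=True), level=headings[name])
            elif name == "p":
                doc.add_paragraph(element.get_text())
            elif name == "ul":
                for li in element.find_all("li"):
                    doc.add_paragraph(li.get_text(), style="List Bullet")
            elif name == "ol":
                for li in element.find_all("li"):
                    doc.add_paragraph(li.get_text(), style="List Number")
            elif name == "table":
                self._convert_table(element, doc)
            elif name == "div":
                self._convert_html_to_docx(element, doc, options)

    def _convert_table(self, table_el: Tag, doc: DocxDocument) -> None:
        """Convert an HTML <table> into a Word table (colspan/rowspan are not handled)."""
        rows = table_el.find_all("tr")
        n_cols = max((len(r.find_all(["td", "th"])) for r in rows), default=0)
        if not rows or n_cols == 0:
            return
        table = doc.add_table(rows=len(rows), cols=n_cols)
        table.style = "Table Grid"
        for r_idx, row in enumerate(rows):
            for c_idx, cell in enumerate(row.find_all(["td", "th"])):
                table.cell(r_idx, c_idx).text = cell.get_text(strip=True)


class MarkdownRenderer(BaseRenderer):
    """Markdown renderer."""

    @property
    def output_format(self) -> OutputFormat:
        return OutputFormat.MARKDOWN

    async def render(self, html_content: str, options: Optional[Dict[str, Any]] = None) -> bytes:
        """Convert HTML to Markdown."""
        import html2text

        h = html2text.HTML2Text()
        h.body_width = 0        # no automatic line wrapping
        h.unicode_snob = True   # keep Unicode characters as-is
        markdown = h.handle(html_content)
        return markdown.encode("utf-8")


class FormatRendererFactory:
    """Renderer factory (one cached instance per format)."""

    _renderers: Dict[OutputFormat, BaseRenderer] = {}
    _classes = {
        OutputFormat.HTML: HTMLRenderer,
        OutputFormat.PDF: PDFRenderer,
        OutputFormat.WORD: WordRenderer,
        OutputFormat.MARKDOWN: MarkdownRenderer,
    }

    @classmethod
    def get_renderer(cls, format: OutputFormat) -> BaseRenderer:
        """Return the renderer instance for a format."""
        if format not in cls._renderers:
            renderer_cls = cls._classes.get(format)
            if renderer_cls is None:
                raise ValueError(f"Unsupported format: {format}")
            cls._renderers[format] = renderer_cls()
        return cls._renderers[format]
```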
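Synchronous, CPU-bound converters such as WeasyPrint's `write_pdf` or python-docx's `save` block the event loop when called directly from an `async def`. For the batch Export API, one option is to run each conversion in a worker thread with `asyncio.to_thread` (Python 3.9+) and cap concurrency with a semaphore. The sketch below assumes a synchronous `convert(html, fmt) -> bytes` callable. `export_batch`, `max_concurrency` and `fake_convert` are hypothetical names.

```python
import asyncio
from typing import Callable, Dict, List, Tuple

Converter = Callable[[str, str], bytes]  # (html, format) -> bytes, synchronous


async def export_batch(
    jobs: List[Tuple[str, str, str]],  # (job_id, html, format)
    convert: Converter,
    max_concurrency: int = 4,
) -> Dict[str, bytes]:
    """Run blocking conversions in worker threads, at most max_concurrency at a time."""
    semaphore = asyncio.Semaphore(max_concurrency)

    async def run_one(job_id: str, html: str, fmt: str) -> Tuple[str, bytes]:
        async with semaphore:
            data = await asyncio.to_thread(convert, html, fmt)
            return job_id, data

    results = await asyncio.gather(*(run_one(*job) for job in jobs))
    return dict(results)


def fake_convert(html: str, fmt: str) -> bytes:
    """Stand-in for a real renderer: tags the payload with its format."""
    return f"[{fmt}]{html}".encode("utf-8")


if __name__ == "__main__":
    out = asyncio.run(export_batch([("a", "<p>1</p>", "pdf"), ("b", "<p>2</p>", "word")], fake_convert))
    print(out)
```

The semaphore keeps a large export from starting dozens of PDF conversions at once. Its size should follow available CPU and memory rather than the number of jobs.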
3.4 Compliance Checker
Checks generated documents against industry regulatory requirements.
```python
# src/document/core/compliance_checker.py
import re
from abc import ABC, abstractmethod
from dataclasses import dataclass
from enum import Enum
from typing import Any, Dict, List, Optional


class ComplianceLevel(Enum):
    """Compliance level."""
    PASS = "pass"        # passed
    WARNING = "warning"  # warning: may proceed, changes recommended
    FAIL = "fail"        # failure: must be fixed


@dataclass
class ComplianceIssue:
    """A compliance issue."""
    rule_id: str
    rule_name: str
    level: ComplianceLevel
    message: str
    field: Optional[str] = None
    suggestion: Optional[str] = None


@dataclass
class ComplianceResult:
    """Result of a compliance check."""
    passed: bool
    issues: List[ComplianceIssue]
    checked_rules: int
    pass_rate: float

    def to_dict(self) -> Dict:
        return {
            "passed": self.passed,
            "issues": [
                {
                    "rule_id": i.rule_id,
                    "rule_name": i.rule_name,
                    "level": i.level.value,
                    "message": i.message,
                    "field": i.field,
                    "suggestion": i.suggestion,
                }
                for i in self.issues
            ],
            "checked_rules": self.checked_rules,
            "pass_rate": self.pass_rate,
        }


class ComplianceRule(ABC):
    """Base class for compliance rules.

    Subclasses may satisfy the abstract properties with plain class attributes.
    """

    @property
    @abstractmethod
    def rule_id(self) -> str:
        """Rule identifier."""

    @property
    @abstractmethod
    def rule_name(self) -> str:
        """Human-readable rule name."""

    @property
    @abstractmethod
    def industries(self) -> List[str]:
        """Industries the rule applies to ("*" = all)."""

    @abstractmethod
    def check(self, content: Dict[str, Any]) -> Optional[ComplianceIssue]:
        """Run the check; None means the rule passed."""


# ==================== Common rules ====================
class PersonalInfoDisclosureRule(ComplianceRule):
    """Personal-information disclosure rule."""
    rule_id = "COMMON_001"
    rule_name = "Personal information masking"
    industries = ["*"]  # all industries

    # Digit lookarounds instead of \b: in Python 3, \b treats CJK characters as word
    # characters, so \b would miss a number written directly after Chinese text.
    ID_PATTERN = re.compile(r"(?<!\d)\d{17}[\dXx](?![\dXx])")  # 18-character ID number
    PHONE_PATTERN = re.compile(r"(?<!\d)1[3-9]\d{9}(?!\d)")     # 11-digit mobile number

    def check(self, content: Dict[str, Any]) -> Optional[ComplianceIssue]:
        """Check for unmasked personal information."""
        text = str(content)
        if self.ID_PATTERN.search(text):
            return ComplianceIssue(
                rule_id=self.rule_id,
                rule_name=self.rule_name,
                level=ComplianceLevel.FAIL,
                message="The document contains an unmasked ID number",
                suggestion="Mask it with the mask_id filter",
            )
        if self.PHONE_PATTERN.search(text):
            return ComplianceIssue(
                rule_id=self.rule_id,
                rule_name=self.rule_name,
                level=ComplianceLevel.WARNING,
                message="The document contains a full phone number; masking is recommended",
                suggestion="Mask it with the mask_phone filter",
            )
        return None


# ==================== Insurance rules ====================
class InsuranceRiskDisclosureRule(ComplianceRule):
    """Insurance risk-disclosure rule."""
    rule_id = "INS_001"
    rule_name = "Completeness of risk disclosure"
    industries = ["insurance"]

    # Phrases matched verbatim against the (Chinese) document text
    REQUIRED_DISCLOSURES = [
        "投保前请仔细阅读保险条款",  # "Read the policy terms carefully before applying"
        "保险产品存在退保损失",      # "Surrendering the policy may incur losses"
        "请如实告知健康状况",        # "Disclose your health condition truthfully"
        "免责条款",                  # "Exclusion clauses"
    ]

    def check(self, content: Dict[str, Any]) -> Optional[ComplianceIssue]:
        text = str(content)
        missing = [d for d in self.REQUIRED_DISCLOSURES if d not in text]
        if missing:
            return ComplianceIssue(
                rule_id=self.rule_id,
                rule_name=self.rule_name,
                level=ComplianceLevel.FAIL,
                message=f"Missing required risk disclosures: {', '.join(missing)}",
                suggestion="Make sure the document contains the complete risk disclosure",
            )
        return None


class InsuranceSuitabilityRule(ComplianceRule):
    """Insurance suitability rule."""
    rule_id = "INS_002"
    rule_name = "Product suitability statement"
    industries = ["insurance"]

    def check(self, content: Dict[str, Any]) -> Optional[ComplianceIssue]:
        # Check that the suitability assessment is present
        required_fields = ["risk_tolerance", "insurance_needs", "payment_ability"]
        missing = [f for f in required_fields if f not in content]
        if missing:
            return ComplianceIssue(
                rule_id=self.rule_id,
                rule_name=self.rule_name,
                level=ComplianceLevel.WARNING,
                message=f"Suitability assessment should be completed: {', '.join(missing)}",
                suggestion="Add the user's risk tolerance, insurance needs and ability to pay premiums",
            )
        return None


# ==================== Medical rules ====================
class MedicalDisclaimerRule(ComplianceRule):
    """Medical disclaimer rule."""
    rule_id = "MED_001"
    rule_name = "Medical advice disclaimer"
    industries = ["medical"]

    # "The above advice is for reference only and cannot replace diagnosis
    #  and treatment by a qualified doctor"
    REQUIRED_DISCLAIMER = "以上建议仅供参考,不能替代专业医生的诊断和治疗"
    # Core phrase "for reference only"; the full disclaimer also contains it
    DISCLAIMER_KEYWORD = "仅供参考"

    def check(self, content: Dict[str, Any]) -> Optional[ComplianceIssue]:
        if self.DISCLAIMER_KEYWORD not in str(content):
            return ComplianceIssue(
                rule_id=self.rule_id,
                rule_name=self.rule_name,
                level=ComplianceLevel.FAIL,
                message="Medical documents must include a disclaimer",
                suggestion=f"Add the disclaimer: '{self.REQUIRED_DISCLAIMER}'",
            )
        return None


class MedicalPrivacyRule(ComplianceRule):
    """Medical privacy rule."""
    rule_id = "MED_002"
    rule_name = "Medical privacy protection"
    industries = ["medical"]

    SENSITIVE_FIELDS = ["diagnosis", "medical_history", "medication"]

    def check(self, content: Dict[str, Any]) -> Optional[ComplianceIssue]:
        # Sensitive medical data requires a recorded privacy consent
        has_sensitive = any(f in content for f in self.SENSITIVE_FIELDS)
        has_consent = content.get("privacy_consent", False)
        if has_sensitive and not has_consent:
            return ComplianceIssue(
                rule_id=self.rule_id,
                rule_name=self.rule_name,
                level=ComplianceLevel.WARNING,
                message="The document contains sensitive medical data; privacy consent is recommended",
                field="privacy_consent",
                suggestion="Add a privacy statement to the document and obtain the user's consent",
            )
        return None


# ==================== Wealth-management rules ====================
class FinanceRiskWarningRule(ComplianceRule):
    """Investment risk-warning rule."""
    rule_id = "FIN_001"
    rule_name = "Investment risk warning"
    industries = ["finance"]

    # Phrases matched verbatim against the (Chinese) document text
    REQUIRED_WARNINGS = [
        "投资有风险",              # "Investment involves risk"
        "过往业绩不代表未来表现",  # "Past performance does not indicate future results"
        "请根据自身风险承受能力",  # "Invest according to your own risk tolerance"
    ]

    def check(self, content: Dict[str, Any]) -> Optional[ComplianceIssue]:
        text = str(content)
        missing = [w for w in self.REQUIRED_WARNINGS if w not in text]
        if missing:
            return ComplianceIssue(
                rule_id=self.rule_id,
                rule_name=self.rule_name,
                level=ComplianceLevel.FAIL,
                message=f"Missing required risk warnings: {', '.join(missing)}",
                suggestion="Financial advice must include the complete risk warning",
            )
        return None


# ==================== Compliance checker ====================
class ComplianceChecker:
    """Compliance checker."""

    def __init__(self):
        self.rules: List[ComplianceRule] = [
            # Common rules
            PersonalInfoDisclosureRule(),
            # Insurance rules
            InsuranceRiskDisclosureRule(),
            InsuranceSuitabilityRule(),
            # Medical rules
            MedicalDisclaimerRule(),
            MedicalPrivacyRule(),
            # Wealth-management rules
            FinanceRiskWarningRule(),
        ]

    def check(self, content: Dict[str, Any], industry: str) -> ComplianceResult:
        """
        Run the compliance check.

        Args:
            content: Document content
            industry: Industry identifier
        """
        issues = []
        checked = 0
        for rule in self.rules:
            # Only run rules that apply to this industry
            if "*" in rule.industries or industry in rule.industries:
                checked += 1
                issue = rule.check(content)
                if issue:
                    issues.append(issue)

        # Any FAIL-level issue fails the document
        has_fail = any(i.level == ComplianceLevel.FAIL for i in issues)
        pass_rate = (checked - len(issues)) / checked if checked > 0 else 1.0
        return ComplianceResult(
            passed=not has_fail,
            issues=issues,
            checked_rules=checked,
            pass_rate=pass_rate,
        )

    def add_rule(self, rule: ComplianceRule):
        """Add a custom rule."""
        self.rules.append(rule)
```
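Personal-information detection has a pitfall with regexes that use word boundaries. In Python 3, `\b` is Unicode-aware and Chinese characters count as word characters. A pattern such as `\b\d{17}[\dXx]\b` therefore does not match an ID number written directly after Chinese text, e.g. `身份证号110101199001011234` ("ID number 110101..."). Digit-only lookarounds avoid the problem. The sketch below shows the difference on sample text. The pattern and helper names are illustrative.

```python
import re

WORD_BOUNDARY_ID = re.compile(r"\b\d{17}[\dXx]\b")          # misses IDs next to CJK text
ID_NUMBER = re.compile(r"(?<!\d)\d{17}[\dXx](?![\dXx])")    # 18-character PRC ID number
MOBILE = re.compile(r"(?<!\d)1[3-9]\d{9}(?!\d)")            # 11-digit mainland mobile number


def find_unmasked_pii(text: str) -> dict:
    """Return the unmasked ID and mobile numbers found in text.

    The lookbehind means a mobile match cannot start inside a longer digit run,
    so the 11-digit pattern never fires on a slice of an ID number.
    """
    return {
        "id_numbers": ID_NUMBER.findall(text),
        "phones": MOBILE.findall(text),
    }


if __name__ == "__main__":
    sample = "客户身份证号110101199001011234,手机13812345678"
    print(WORD_BOUNDARY_ID.search(sample))  # None: no boundary between 号 and 1
    print(find_unmasked_pii(sample))
```

Masked values such as `138****5678` no longer contain a long digit run, so they pass both checks.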
4. Industry Plugin Design
4.1 Plugin Base Class
```python
# src/document/plugins/base.py
from abc import ABC, abstractmethod
from dataclasses import dataclass
from typing import Any, Dict, List


@dataclass
class IndustryDocumentType:
    """Definition of an industry document type."""
    type_id: str
    name: str
    description: str
    template_id: str
    required_fields: List[str]
    compliance_rules: List[str]  # IDs of the compliance rules to apply


class IndustryPlugin(ABC):
    """Base class for industry plugins."""

    @property
    @abstractmethod
    def industry_id(self) -> str:
        """Industry identifier."""

    @property
    @abstractmethod
    def industry_name(self) -> str:
        """Industry display name."""

    @property
    @abstractmethod
    def document_types(self) -> List[IndustryDocumentType]:
        """Supported document types."""

    @abstractmethod
    async def extract_content(self, conversation: List[Dict]) -> Dict[str, Any]:
        """
        Extract industry-specific content from a conversation.

        Args:
            conversation: Conversation history
        Returns:
            Industry-specific structured data
        """

    @abstractmethod
    def get_document_schema(self, document_type: str) -> Dict[str, Any]:
        """
        Return the data-structure definition of a document type.

        Args:
            document_type: Document type
        Returns:
            The data structure as a JSON Schema
        """

    @abstractmethod
    def validate_document_data(self, document_type: str, data: Dict[str, Any]) -> List[str]:
        """
        Validate document data.

        Returns:
            A list of error messages; an empty list means the data is valid
        """
```
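The base class leaves two questions open. It does not say how plugins are discovered, and it does not say how each document type's `compliance_rules` list reaches the compliance checker; in the checker above, rules are filtered by industry only. One possible wiring is sketched below with minimal stand-in classes. A registry is keyed by `industry_id`, and a lookup returns the rule IDs a document type declares. `PluginRegistry`, `register`, `rules_for`, `DocType` and `StubPlugin` are hypothetical names.

```python
from dataclasses import dataclass, field
from typing import Dict, List


@dataclass
class DocType:
    """Minimal stand-in for IndustryDocumentType."""
    type_id: str
    compliance_rules: List[str] = field(default_factory=list)


class StubPlugin:
    """Minimal stand-in for an IndustryPlugin subclass."""

    def __init__(self, industry_id: str, document_types: List[DocType]):
        self.industry_id = industry_id
        self.document_types = document_types


class PluginRegistry:
    """Maps industry IDs to plugin instances."""

    def __init__(self) -> None:
        self._plugins: Dict[str, StubPlugin] = {}

    def register(self, plugin: StubPlugin) -> None:
        if plugin.industry_id in self._plugins:
            raise ValueError(f"duplicate industry plugin: {plugin.industry_id}")
        self._plugins[plugin.industry_id] = plugin

    def get(self, industry_id: str) -> StubPlugin:
        try:
            return self._plugins[industry_id]
        except KeyError:
            raise KeyError(f"no plugin registered for industry: {industry_id}") from None

    def rules_for(self, industry_id: str, type_id: str) -> List[str]:
        """Rule IDs declared by a document type."""
        for doc_type in self.get(industry_id).document_types:
            if doc_type.type_id == type_id:
                return list(doc_type.compliance_rules)
        raise KeyError(f"unknown document type {type_id!r} for {industry_id!r}")
```

With such a lookup, the checker could run only the declared rules, e.g. `[r for r in self.rules if r.rule_id in ids]`, instead of every rule for the industry. This is a design option, not behavior the current checker has.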
4.2 Insurance Plugin
```python
# src/document/plugins/insurance/plugin.py
from typing import Any, Dict, List

from ..base import IndustryDocumentType, IndustryPlugin


class InsurancePlugin(IndustryPlugin):
    """Insurance industry plugin."""

    @property
    def industry_id(self) -> str:
        return "insurance"

    @property
    def industry_name(self) -> str:
        return "Insurance"

    @property
    def document_types(self) -> List[IndustryDocumentType]:
        return [
            IndustryDocumentType(
                type_id="insurance_proposal",
                name="Insurance proposal",
                description="Insurance plan built from the user's needs",
                template_id="insurance/proposal",
                required_fields=["user_info", "needs_analysis", "product_recommendations"],
                compliance_rules=["INS_001", "INS_002"],
            ),
            IndustryDocumentType(
                type_id="needs_analysis",
                name="Needs analysis report",
                description="Detailed analysis of the user's insurance needs",
                template_id="insurance/needs_analysis",
                required_fields=["user_profile", "risk_assessment", "coverage_gaps"],
                compliance_rules=["INS_002"],
            ),
            IndustryDocumentType(
                type_id="product_comparison",
                name="Product comparison",
                description="Side-by-side comparison of several insurance products",
                template_id="insurance/product_comparison",
                required_fields=["products", "comparison_dimensions"],
                compliance_rules=["INS_001"],
            ),
            IndustryDocumentType(
                type_id="application_form",
                name="Application form",
                description="Insurance application",
                template_id="insurance/application",
                required_fields=["applicant", "insured", "product", "premium"],
                compliance_rules=["INS_001", "INS_002", "COMMON_001"],
            ),
            IndustryDocumentType(
                type_id="health_disclosure",
                name="Health disclosure form",
                description="Applicant's declaration of health status",
                template_id="insurance/health_disclosure",
                required_fields=["health_questions", "declarations"],
                compliance_rules=["INS_001"],
            ),
            IndustryDocumentType(
                type_id="risk_confirmation",
                name="Risk confirmation",
                description="Signed confirmation of the policy's risks",
                template_id="insurance/risk_confirmation",
                required_fields=["risk_items", "confirmation"],
                compliance_rules=["INS_001"],
            ),
        ]

    async def extract_content(self, conversation: List[Dict]) -> Dict[str, Any]:
        """Extract insurance-specific content."""
        return {
            # Basic user information
            "user_info": await self._extract_user_info(conversation),
            # Family information
            "family_info": await self._extract_family_info(conversation),
            # Financial status
            "financial_status": await self._extract_financial_status(conversation),
            # Existing coverage
            "existing_coverage": await self._extract_existing_coverage(conversation),
            # Coverage needs
            "insurance_needs": await self._extract_insurance_needs(conversation),
            # Product preferences
            "product_preferences": await self._extract_product_preferences(conversation),
            # Concerns
            "concerns": await self._extract_concerns(conversation),
            # Recommended products
            "recommended_products": await self._extract_recommendations(conversation),
        }

    async def _extract_user_info(self, conversation: List[Dict]) -> Dict:
        """Extract basic user information."""
        # Extracted from the conversation with the LLM
        prompt = """
        Extract the user's basic information from the conversation and return JSON:
        {
            "age": age (number),
            "gender": gender ("male"/"female"),
            "occupation": occupation,
            "income_level": income level ("low"/"medium"/"high"),
            "location": city of residence
        }
        """
        # ... LLM call
        return {}

    async def _extract_insurance_needs(self, conversation: List[Dict]) -> List[Dict]:
        """Extract insurance needs."""
        prompt = """
        Extract the user's insurance needs from the conversation and return a JSON array:
        [
            {
                "type": insurance type ("life"/"health"/"accident"/"medical"/"education"/"pension"),
                "priority": priority (1-5),
                "coverage_amount": suggested coverage amount,
                "reason": why it is needed
            }
        ]
        """
        # ... LLM call
        return []

    # The remaining _extract_* helpers (family info, financial status, existing coverage,
    # product preferences, concerns, recommendations) follow the same pattern and are omitted.

    def get_document_schema(self, document_type: str) -> Dict[str, Any]:
        """Return the data structure of a document type."""
        schemas = {
            "insurance_proposal": {
                "type": "object",
                "required": ["user_info", "needs_analysis", "product_recommendations"],
                "properties": {
                    "user_info": {
                        "type": "object",
                        "properties": {
                            "name": {"type": "string"},
                            "age": {"type": "integer"},
                            "gender": {"type": "string", "enum": ["male", "female"]},
                            "occupation": {"type": "string"},
                            "annual_income": {"type": "number"},
                        },
                    },
                    "family_info": {
                        "type": "object",
                        "properties": {
                            "marital_status": {"type": "string"},
                            "children_count": {"type": "integer"},
                            "dependents": {"type": "array"},
                        },
                    },
                    "needs_analysis": {
                        "type": "object",
                        "properties": {
                            "risk_tolerance": {"type": "string", "enum": ["low", "medium", "high"]},
                            "coverage_gaps": {"type": "array"},
                            "priority_needs": {"type": "array"},
                        },
                    },
                    "product_recommendations": {
                        "type": "array",
                        "items": {
                            "type": "object",
                            "properties": {
                                "product_name": {"type": "string"},
                                "product_type": {"type": "string"},
                                "coverage_amount": {"type": "number"},
                                "premium": {"type": "number"},
                                "payment_period": {"type": "integer"},
                                "reasons": {"type": "array"},
                            },
                        },
                    },
                    "total_premium": {"type": "number"},
                    "premium_ratio": {"type": "number", "description": "Premium as a share of annual income"},
                },
            },
            # ... schemas for the other document types
        }
        return schemas.get(document_type, {})

    def validate_document_data(self, document_type: str, data: Dict[str, Any]) -> List[str]:
        """Validate insurance document data."""
        errors = []
        if document_type == "insurance_proposal":
            # Premium-to-income ratio
            if data.get("premium_ratio", 0) > 0.15:
                errors.append("Premiums should not exceed 15% of annual income")
            # Required information
            if not data.get("user_info", {}).get("age"):
                errors.append("User age is missing")
        elif document_type == "application_form":
            # The application form needs the applicant's full personal details
            required = ["name", "id_number", "phone", "address"]
            applicant = data.get("applicant", {})
            for field in required:
                if not applicant.get(field):
                    errors.append(f"Applicant {field} is missing")
        return errors
```
4.3 Medical Plugin
# src/document/plugins/medical/plugin.py
from typing import Dict, List, Any
from ..base import IndustryPlugin, IndustryDocumentType
class MedicalPlugin(IndustryPlugin):
"""医疗行业插件"""
@property
def industry_id(self) -> str:
return "medical"
@property
def industry_name(self) -> str:
return "医疗健康"
@property
def document_types(self) -> List[IndustryDocumentType]:
return [
# ========== 问诊类文档 ==========
IndustryDocumentType(
type_id="consultation_summary",
name="问诊摘要",
description="在线问诊对话的结构化摘要",
template_id="medical/consultation_summary",
required_fields=["chief_complaint", "present_illness", "assessment"],
compliance_rules=["MED_001", "MED_002"]
),
IndustryDocumentType(
type_id="symptom_record",
name="症状记录单",
description="用户症状详细记录",
                template_id="medical/symptom_record",
                required_fields=["symptoms", "duration", "severity"],
                compliance_rules=["MED_002"]
            ),
            # ========== Assessment documents ==========
            IndustryDocumentType(
                type_id="health_assessment",
                name="健康评估报告",
                description="综合健康状况评估",
                template_id="medical/health_assessment",
                required_fields=["basic_info", "health_indicators", "risk_factors", "recommendations"],
                compliance_rules=["MED_001", "MED_002"]
            ),
            IndustryDocumentType(
                type_id="disease_risk_report",
                name="疾病风险评估",
                description="特定疾病风险预测报告",
                template_id="medical/disease_risk",
                required_fields=["target_disease", "risk_factors", "risk_level", "prevention_advice"],
                compliance_rules=["MED_001"]
            ),
            IndustryDocumentType(
                type_id="nutrition_assessment",
                name="营养评估报告",
                description="个人营养状况评估",
                template_id="medical/nutrition_assessment",
                required_fields=["dietary_habits", "nutrient_analysis", "recommendations"],
                compliance_rules=["MED_001"]
            ),
            # ========== Plan documents ==========
            IndustryDocumentType(
                type_id="treatment_plan",
                name="治疗/康复计划",
                description="个性化治疗或康复方案",
                template_id="medical/treatment_plan",
                required_fields=["diagnosis", "treatment_goals", "interventions", "timeline"],
                compliance_rules=["MED_001", "MED_002"]
            ),
            IndustryDocumentType(
                type_id="medication_guide",
                name="用药指导",
                description="药物使用说明和注意事项",
                template_id="medical/medication_guide",
                required_fields=["medications", "dosage", "precautions", "interactions"],
                compliance_rules=["MED_001"]
            ),
            IndustryDocumentType(
                type_id="lifestyle_plan",
                name="生活方式管理计划",
                description="饮食、运动、作息等生活管理建议",
                template_id="medical/lifestyle_plan",
                required_fields=["current_habits", "improvement_goals", "action_plan"],
                compliance_rules=["MED_001"]
            ),
            # ========== Documents requiring signature ==========
            IndustryDocumentType(
                type_id="informed_consent",
                name="知情同意书",
                description="治疗/检查前的知情同意",
                template_id="medical/informed_consent",
                required_fields=["procedure", "risks", "alternatives", "patient_consent"],
                compliance_rules=["MED_001", "MED_002", "COMMON_001"]
            ),
            IndustryDocumentType(
                type_id="privacy_authorization",
                name="隐私授权书",
                description="医疗信息使用授权",
                template_id="medical/privacy_authorization",
                required_fields=["data_scope", "usage_purpose", "authorization_period"],
                compliance_rules=["MED_002", "COMMON_001"]
            ),
            # ========== Referral documents ==========
            IndustryDocumentType(
                type_id="referral_letter",
                name="转诊建议书",
                description="建议就医/转诊的说明文档",
                template_id="medical/referral_letter",
                required_fields=["reason", "urgency", "recommended_specialty", "summary"],
                compliance_rules=["MED_001"]
            )
        ]

    async def extract_content(
        self,
        conversation: List[Dict]
    ) -> Dict[str, Any]:
        """Extract medical content from the conversation."""
        return {
            # Chief complaint
            "chief_complaint": await self._extract_chief_complaint(conversation),
            # History of present illness
            "present_illness": await self._extract_present_illness(conversation),
            # Symptom details
            "symptoms": await self._extract_symptoms(conversation),
            # Past medical history
            "medical_history": await self._extract_medical_history(conversation),
            # Family history
            "family_history": await self._extract_family_history(conversation),
            # Allergy history
            "allergies": await self._extract_allergies(conversation),
            # Medication history
            "medications": await self._extract_medications(conversation),
            # Lifestyle habits
            "lifestyle": await self._extract_lifestyle(conversation),
            # AI assessment
            "ai_assessment": await self._extract_ai_assessment(conversation),
            # AI recommendations
            "ai_recommendations": await self._extract_ai_recommendations(conversation),
        }

    async def _extract_chief_complaint(self, conversation: List[Dict]) -> str:
        """Extract the chief complaint."""
        prompt = """
        从对话中提取用户的主诉(主要症状/问题),用一句话概括:
        格式:"[部位/症状][持续时间][主要特点]"
        例如:"头痛3天,伴有恶心"
        """
        # ... LLM call (elided)
        return ""

    async def _extract_symptoms(self, conversation: List[Dict]) -> List[Dict]:
        """Extract symptom details."""
        prompt = """
        从对话中提取所有提及的症状,返回JSON数组:
        [
            {
                "symptom": 症状名称,
                "location": 部位,
                "duration": 持续时间,
                "severity": 严重程度(1-10),
                "frequency": 发作频率,
                "triggers": 诱发因素,
                "relieving_factors": 缓解因素,
                "associated_symptoms": 伴随症状
            }
        ]
        """
        # ... LLM call (elided)
        return []

    async def _extract_ai_assessment(self, conversation: List[Dict]) -> Dict:
        """Extract the AI/expert assessment."""
        prompt = """
        从对话中提取AI/专家的评估,返回JSON:
        {
            "possible_conditions": [可能的情况列表],
            "severity_assessment": 严重程度评估,
            "urgency_level": 紧急程度("routine"/"soon"/"urgent"/"emergency"),
            "confidence_level": 置信度
        }
        """
        # ... LLM call (elided)
        return {}

    def get_document_schema(self, document_type: str) -> Dict[str, Any]:
        """Return the JSON-Schema-style data structure for a medical document type."""
        schemas = {
            "consultation_summary": {
                "type": "object",
                "required": ["chief_complaint", "present_illness", "assessment"],
                "properties": {
                    "patient_info": {
                        "type": "object",
                        "properties": {
                            "age": {"type": "integer"},
                            "gender": {"type": "string"},
                            "height": {"type": "number"},
                            "weight": {"type": "number"}
                        }
                    },
                    "chief_complaint": {"type": "string", "description": "主诉"},
                    "present_illness": {
                        "type": "object",
                        "properties": {
                            "onset": {"type": "string", "description": "起病情况"},
                            "progression": {"type": "string", "description": "病情演变"},
                            "symptoms": {"type": "array", "description": "症状列表"}
                        }
                    },
                    "medical_history": {
                        "type": "object",
                        "properties": {
                            "past_illnesses": {"type": "array"},
                            "surgeries": {"type": "array"},
                            "allergies": {"type": "array"},
                            "current_medications": {"type": "array"}
                        }
                    },
                    "assessment": {
                        "type": "object",
                        "properties": {
                            "possible_conditions": {"type": "array"},
                            "differential_diagnosis": {"type": "array"},
                            "severity": {"type": "string"},
                            "urgency": {"type": "string"}
                        }
                    },
                    "recommendations": {
                        "type": "object",
                        "properties": {
                            "further_tests": {"type": "array"},
                            "lifestyle_changes": {"type": "array"},
                            "follow_up": {"type": "string"},
                            "red_flags": {"type": "array", "description": "警示症状"}
                        }
                    },
                    "disclaimer": {"type": "string", "description": "免责声明"}
                }
            },
            "health_assessment": {
                "type": "object",
                "required": ["basic_info", "health_indicators", "recommendations"],
                "properties": {
                    "basic_info": {
                        "type": "object",
                        "properties": {
                            "age": {"type": "integer"},
                            "gender": {"type": "string"},
                            "bmi": {"type": "number"}
                        }
                    },
                    "health_indicators": {
                        "type": "array",
                        "items": {
                            "type": "object",
                            "properties": {
                                "indicator": {"type": "string"},
                                "value": {"type": "string"},
                                "status": {"type": "string", "enum": ["normal", "borderline", "abnormal"]},
                                "reference_range": {"type": "string"}
                            }
                        }
                    },
                    "risk_factors": {
                        "type": "array",
                        "items": {
                            "type": "object",
                            "properties": {
                                "factor": {"type": "string"},
                                "level": {"type": "string"},
                                "impact": {"type": "string"}
                            }
                        }
                    },
                    "overall_score": {"type": "integer", "minimum": 0, "maximum": 100},
                    "recommendations": {"type": "array"}
                }
            },
            "informed_consent": {
                "type": "object",
                "required": ["procedure", "risks", "patient_consent"],
                "properties": {
                    "procedure": {
                        "type": "object",
                        "properties": {
                            "name": {"type": "string"},
                            "description": {"type": "string"},
                            "purpose": {"type": "string"}
                        }
                    },
                    "risks": {
                        "type": "array",
                        "items": {
                            "type": "object",
                            "properties": {
                                "risk": {"type": "string"},
                                "probability": {"type": "string"},
                                "severity": {"type": "string"}
                            }
                        }
                    },
                    "benefits": {"type": "array"},
                    "alternatives": {"type": "array"},
                    "patient_questions": {"type": "array"},
                    "patient_consent": {
                        "type": "object",
                        "properties": {
                            "understood": {"type": "boolean"},
                            "consent_given": {"type": "boolean"},
                            "signature_placeholder": {"type": "boolean"}
                        }
                    }
                }
            }
        }
        return schemas.get(document_type, {})

    def validate_document_data(
        self,
        document_type: str,
        data: Dict[str, Any]
    ) -> List[str]:
        """Validate medical document data; returns a list of error messages."""
        errors = []
        # Every medical document must include a disclaimer
        if not data.get("disclaimer"):
            errors.append("医疗文档必须包含免责声明")
        if document_type == "consultation_summary":
            # Check the chief complaint
            if not data.get("chief_complaint"):
                errors.append("问诊摘要必须包含主诉")
            # Check the assessment (`or {}` also covers an explicit None)
            assessment = data.get("assessment") or {}
            if not assessment.get("possible_conditions"):
                errors.append("问诊摘要应包含可能的情况评估")
        elif document_type == "informed_consent":
            # Informed consent must reserve a place for the patient's signature
            consent = data.get("patient_consent") or {}
            if not consent.get("signature_placeholder"):
                errors.append("知情同意书必须预留签名位置")
        elif document_type == "medication_guide":
            # Medication guides must state contraindications
            if not data.get("contraindications"):
                errors.append("用药指导应包含禁忌症说明")
        return errors
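Every `_extract_*` method above elides its LLM call and returns an empty default. Whatever fills that gap has to turn a free-text model reply into the dict or list the method promises. Below is a minimal sketch of one way to do that, written as a standalone function. The name `parse_llm_json` and the assumption that a model may wrap its JSON in a Markdown fence or surrounding prose are illustrative. Neither comes from the plugin's actual code.

```python
import json
import re
from typing import Any

# Three backticks, built indirectly so this source never contains a literal fence.
_FENCE = "`" * 3
# Matches a Markdown code fence with an optional "json" tag and captures its body.
_FENCE_RE = re.compile(_FENCE + r"(?:json)?\s*(.*?)\s*" + _FENCE, re.DOTALL)


def parse_llm_json(reply: str, default: Any) -> Any:
    """Parse JSON out of an LLM reply, falling back to `default`.

    Hypothetical helper: tries the fenced body first, then the whole reply,
    then the outermost {...} and [...] spans.
    """
    candidates = []
    match = _FENCE_RE.search(reply)
    if match:
        candidates.append(match.group(1))
    candidates.append(reply.strip())
    # Fallback: slice from the first opening bracket to the last closing one.
    for open_ch, close_ch in (("{", "}"), ("[", "]")):
        start, end = reply.find(open_ch), reply.rfind(close_ch)
        if start != -1 and end > start:
            candidates.append(reply[start:end + 1])
    for text in candidates:
        try:
            value = json.loads(text)
        except json.JSONDecodeError:
            continue
        # Only accept the container type the caller expects (dict vs. list).
        if isinstance(default, (dict, list)) and not isinstance(value, type(default)):
            continue
        return value
    return default
```

Under this sketch, `_extract_symptoms` would end with `return parse_llm_json(reply, [])` and `_extract_ai_assessment` with `return parse_llm_json(reply, {})`. An unparseable reply then degrades to the same empty default the placeholders already return.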
4.4 Finance Industry Plugin
```python
# src/document/plugins/finance/plugin.py
from typing import Dict, List, Any

from ..base import IndustryPlugin, IndustryDocumentType


class FinancePlugin(IndustryPlugin):
    """Finance (wealth management) industry plugin."""

    @property
    def industry_id(self) -> str:
        return "finance"

    @property
    def industry_name(self) -> str:
        return "理财投资"

    @property
    def document_types(self) -> List[IndustryDocumentType]:
        return [
            IndustryDocumentType(
                type_id="advisory_report",
                name="理财建议书",
                description="综合理财规划建议",
                template_id="finance/advisory_report",
                required_fields=["client_profile", "financial_goals", "recommendations"],
                compliance_rules=["FIN_001", "FIN_002"]
            ),
            IndustryDocumentType(
                type_id="portfolio_plan",
                name="资产配置方案",
                description="投资组合配置建议",
                template_id="finance/portfolio_plan",
                required_fields=["risk_profile", "asset_allocation", "expected_return"],
                compliance_rules=["FIN_001"]
            ),
            IndustryDocumentType(
                type_id="risk_assessment",
                name="风险测评报告",
                description="投资者风险承受能力评估",
                template_id="finance/risk_assessment",
                required_fields=["questionnaire_results", "risk_level", "suitable_products"],
                compliance_rules=["FIN_002"]
            ),
            IndustryDocumentType(
                type_id="product_analysis",
                name="产品分析报告",
                description="理财产品详细分析",
                template_id="finance/product_analysis",
                required_fields=["product_info", "risk_analysis", "suitability"],
                compliance_rules=["FIN_001"]
            ),
            IndustryDocumentType(
                type_id="risk_disclosure",
                name="风险揭示书",
                description="投资风险确认文件",
                template_id="finance/risk_disclosure",
                required_fields=["risk_items", "investor_confirmation"],
                compliance_rules=["FIN_001", "COMMON_001"]
            ),
            IndustryDocumentType(
                type_id="investment_agreement",
                name="投资协议",
                description="投资服务协议",
                template_id="finance/investment_agreement",
                required_fields=["parties", "service_scope", "fee_structure"],
                compliance_rules=["FIN_001", "FIN_002", "COMMON_001"]
            )
        ]

    async def extract_content(
        self,
        conversation: List[Dict]
    ) -> Dict[str, Any]:
        """Extract finance-related content from the conversation."""
        return {
            # Client profile
            "client_profile": await self._extract_client_profile(conversation),
            # Financial status
            "financial_status": await self._extract_financial_status(conversation),
            # Investment goals
            "investment_goals": await self._extract_investment_goals(conversation),
            # Risk preference
            "risk_preference": await self._extract_risk_preference(conversation),
            # Investment experience
            "investment_experience": await self._extract_investment_experience(conversation),
            # Existing investments
            "current_investments": await self._extract_current_investments(conversation),
            # Concerns and open questions
            "concerns": await self._extract_concerns(conversation),
            # Recommended plan
            "recommendations": await self._extract_recommendations(conversation),
        }

    async def _extract_financial_status(self, conversation: List[Dict]) -> Dict:
        """Extract the client's financial status."""
        prompt = """
        从对话中提取用户的财务状况,返回JSON:
        {
            "annual_income": 年收入,
            "monthly_expense": 月支出,
            "total_assets": 总资产,
            "total_liabilities": 总负债,
            "investable_assets": 可投资资产,
            "emergency_fund": 应急资金
        }
        """
        # ... LLM call (elided)
        return {}

    async def _extract_investment_goals(self, conversation: List[Dict]) -> List[Dict]:
        """Extract the client's investment goals."""
        prompt = """
        从对话中提取用户的投资目标,返回JSON数组:
        [
            {
                "goal": 目标描述,
                "target_amount": 目标金额,
                "time_horizon": 时间期限(年),
                "priority": 优先级(1-5)
            }
        ]
        """
        # ... LLM call (elided)
        return []

    def get_document_schema(self, document_type: str) -> Dict[str, Any]:
        """Return the JSON-Schema-style data structure for a finance document type."""
        schemas = {
            "advisory_report": {
                "type": "object",
                "required": ["client_profile", "financial_goals", "recommendations"],
                "properties": {
                    "client_profile": {
                        "type": "object",
                        "properties": {
                            "age": {"type": "integer"},
                            "occupation": {"type": "string"},
                            "risk_tolerance": {"type": "string", "enum": ["conservative", "moderate", "aggressive"]}
                        }
                    },
                    "financial_status": {
                        "type": "object",
                        "properties": {
                            "income": {"type": "number"},
                            "expenses": {"type": "number"},
                            "assets": {"type": "number"},
                            "liabilities": {"type": "number"}
                        }
                    },
                    "financial_goals": {
                        "type": "array",
                        "items": {
                            "type": "object",
                            "properties": {
                                "goal": {"type": "string"},
                                "amount": {"type": "number"},
                                "timeline": {"type": "string"}
                            }
                        }
                    },
                    "recommendations": {
                        "type": "object",
                        "properties": {
                            "asset_allocation": {"type": "object"},
                            "product_suggestions": {"type": "array"},
                            "action_steps": {"type": "array"}
                        }
                    },
                    "risk_warnings": {"type": "array"},
                    "disclaimer": {"type": "string"}
                }
            },
            "portfolio_plan": {
                "type": "object",
                "required": ["risk_profile", "asset_allocation"],
                "properties": {
                    "risk_profile": {
                        "type": "object",
                        "properties": {
                            "risk_level": {"type": "string"},
                            "risk_score": {"type": "integer"},
                            "investment_horizon": {"type": "string"}
                        }
                    },
                    "asset_allocation": {
                        "type": "object",
                        "properties": {
                            "equity": {"type": "number", "description": "股票占比"},
                            "fixed_income": {"type": "number", "description": "固收占比"},
                            "alternatives": {"type": "number", "description": "另类投资占比"},
                            "cash": {"type": "number", "description": "现金占比"}
                        }
                    },
                    "expected_return": {
                        "type": "object",
                        "properties": {
                            "conservative": {"type": "number"},
                            "expected": {"type": "number"},
                            "optimistic": {"type": "number"}
                        }
                    },
                    "rebalancing_strategy": {"type": "string"},
                    "risk_warnings": {"type": "array"}
                }
            }
        }
        return schemas.get(document_type, {})

    def validate_document_data(
        self,
        document_type: str,
        data: Dict[str, Any]
    ) -> List[str]:
        """Validate finance document data; returns a list of error messages."""
        errors = []
        # Every finance document needs a risk notice (risk_warnings or a disclaimer)
        if not data.get("risk_warnings") and not data.get("disclaimer"):
            errors.append("理财文档必须包含风险提示")
        if document_type == "advisory_report":
            # Check that the recommendations fit the client's risk tolerance
            risk_level = (data.get("client_profile") or {}).get("risk_tolerance")
            recommendations = data.get("recommendations") or {}
            # Check whether the recommended products match the risk level
            # ... specific validation logic (elided)
        elif document_type == "portfolio_plan":
            # Allocation percentages must sum to 100%
            allocation = data.get("asset_allocation") or {}
            # Skip unset (None) shares so a missing value does not raise TypeError
            total = sum(v for v in allocation.values() if v is not None)
            if abs(total - 100) > 0.1:
                errors.append(f"资产配置比例总和应为100%,当前为{total}%")
        return errors
```
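Both plugins hand-code their field checks in `validate_document_data`, but `get_document_schema` already lists the mandatory fields under `"required"`. A small generic checker could derive the "missing field" errors from those schema dicts instead of repeating them. The sketch below is one way to do that. `missing_required` is a hypothetical helper: it reads only the `required`/`properties` keys and ignores every other JSON Schema keyword, so it does not replace a full validator.

```python
from typing import Any, Dict, List


def missing_required(schema: Dict[str, Any], data: Any, path: str = "") -> List[str]:
    """Return dotted paths of required fields that are absent from `data`.

    Hypothetical sketch: follows only "required" and nested "properties";
    every other JSON Schema keyword is ignored.
    """
    if schema.get("type") != "object" or not isinstance(data, dict):
        return []
    missing = []
    for name in schema.get("required", []):
        value = data.get(name)
        # Treat None, "", [] and {} as missing, matching the `not data.get(...)` checks.
        if value in (None, "", [], {}):
            missing.append(f"{path}{name}")
    # Recurse into nested objects that are present, in case they declare their own "required".
    for name, sub_schema in schema.get("properties", {}).items():
        if isinstance(data.get(name), dict):
            missing.extend(missing_required(sub_schema, data[name], f"{path}{name}."))
    return missing
```

Inside a plugin, this could be called as `missing_required(self.get_document_schema(document_type), data)` and its output prepended to the industry-specific errors. The schemas above declare `required` only at the top level, so nested checks take effect only if nested `required` lists are added later.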
V. API Design
5.1 Complete API Definition
```python
# src/document/api/document.py
from fastapi import APIRouter, HTTPException, BackgroundTasks
from pydantic import BaseModel, Field
from typing import List, Optional, Dict, Any
from enum import Enum
from datetime import datetime

router = APIRouter(prefix="/api/v1/document", tags=["文档生成"])


# ==================== Data models ====================

class OutputFormat(str, Enum):
    PDF = "pdf"
    WORD = "word"
    HTML = "html"
    MARKDOWN = "markdown"
    JSON = "json"


class DocumentStatus(str, Enum):
    PENDING = "pending"
    PROCESSING = "processing"
    COMPLETED = "completed"
    FAILED = "failed"


class GenerateDocumentRequest(BaseModel):
    """Request body for document generation."""
    session_id: str = Field(..., description="会话ID")
    device_id: str = Field(..., description="设备ID")
    industry: str = Field(..., description="行业标识")
    document_type: str = Field(..., description="文档类型")
    output_format: OutputFormat = Field(default=OutputFormat.PDF, description="输出格式")
    template_id: Optional[str] = Field(None, description="指定模板ID")
    custom_data: Optional[Dict[str, Any]] = Field(None, description="自定义数据(覆盖提取数据)")
    options: Optional[Dict[str, Any]] = Field(None, description="生成选项")

    # Pydantic v1 style; Pydantic v2 renames this to json_schema_extra (via model_config)
    class Config:
        schema_extra = {
            "example": {
                "session_id": "sess_123456",
                "device_id": "dev_abcdef",
                "industry": "insurance",
                "document_type": "insurance_proposal",
                "output_format": "pdf",
                "custom_data": {
                    "user_info": {"name": "张三"}
                }
            }
        }


class GenerateDocumentResponse(BaseModel):
    """Response for document generation and status queries."""
    document_id: str = Field(..., description="文档ID")
    status: DocumentStatus = Field(..., description="生成状态")
    message: str = Field(..., description="状态消息")
    download_url: Optional[str] = Field(None, description="下载链接(完成时)")
    preview_url: Optional[str] = Field(None, description="预览链接")
    expires_at: Optional[datetime] = Field(None, description="链接过期时间")


class DocumentPreviewRequest(BaseModel):
    """Document preview request."""
    session_id: str
    device_id: str
    industry: str
    document_type: str
    custom_data: Optional[Dict[str, Any]] = None


class DocumentPreviewResponse(BaseModel):
    """Document preview response."""
    html_content: str = Field(..., description="HTML预览内容")
    extracted_data: Dict[str, Any] = Field(..., description="提取的数据")
    compliance_result: Dict[str, Any] = Field(..., description="合规检查结果")


class BatchExportRequest(BaseModel):
    """Batch export request."""
    device_id: str
    session_ids: Optional[List[str]] = Field(None, description="指定会话ID列表")
    date_range: Optional[List[str]] = Field(None, description="日期范围 [start, end]")
    industry: str
    document_type: str
    output_format: OutputFormat = OutputFormat.PDF
    include_summary: bool = Field(default=True, description="是否包含摘要")
    merge_to_single: bool = Field(default=False, description="是否合并为单个文件")


class TemplateListRequest(BaseModel):
    """Template list request (optional filters)."""
    industry: Optional[str] = None
    document_type: Optional[str] = None


class TemplateInfo(BaseModel):
    """Template metadata."""
    template_id: str
    name: str
    description: str
    industry: str
    document_type: str
    supported_formats: List[str]
    required_fields: List[str]
    preview_image: Optional[str] = None


# ==================== API endpoints ====================

@router.post("/generate", response_model=GenerateDocumentResponse)
async def generate_document(
    request: GenerateDocumentRequest,
    background_tasks: BackgroundTasks
):
    """
    Generate a document.

    Pipeline:
    1. Extract content from the conversation
    2. Merge custom data
    3. Run compliance checks
    4. Render the template
    5. Convert to the output format
    6. Store the file and return a download link
    """
    from ..orchestrator import DocumentOrchestrator

    orchestrator = DocumentOrchestrator()
    # Start generation asynchronously; clients poll GET /status/{document_id} for the result
    document_id = await orchestrator.start_generation(
        session_id=request.session_id,
        device_id=request.device_id,
        industry=request.industry,
        document_type=request.document_type,
        output_format=request.output_format,
        template_id=request.template_id,
        custom_data=request.custom_data,
        options=request.options
    )
    return GenerateDocumentResponse(
        document_id=document_id,
        status=DocumentStatus.PROCESSING,
        message="文档生成中,请稍候..."
    )


@router.get("/status/{document_id}", response_model=GenerateDocumentResponse)
async def get_document_status(document_id: str):
    """
    Query the generation status of a document.
    """
    from ..orchestrator import DocumentOrchestrator

    orchestrator = DocumentOrchestrator()
    result = await orchestrator.get_status(document_id)
    if not result:
        raise HTTPException(status_code=404, detail="文档不存在")
    return result


@router.post("/preview", response_model=DocumentPreviewResponse)
async def preview_document(request: DocumentPreviewRequest):
    """
    Preview a document.

    Returns the HTML preview and the extracted data without producing the final file.
    """
    from ..orchestrator import DocumentOrchestrator

    orchestrator = DocumentOrchestrator()
    result = await orchestrator.preview(
        session_id=request.session_id,
        device_id=request.device_id,
        industry=request.industry,
        document_type=request.document_type,
        custom_data=request.custom_data
    )
    return result


@router.get("/download/{document_id}")
async def download_document(document_id: str):
    """
    Download a generated document.
    """
    from fastapi.responses import FileResponse
    from ..storage.document_store import DocumentStore

    store = DocumentStore()
    doc_info = await store.get(document_id)
    if not doc_info:
        raise HTTPException(status_code=404, detail="文档不存在或已过期")
    return FileResponse(
        path=doc_info["file_path"],
        filename=doc_info["filename"],
        media_type=doc_info["content_type"]
    )


@router.post("/batch-export")
async def batch_export(
    request: BatchExportRequest,
    background_tasks: BackgroundTasks
):
    """
    Export documents in batch.
    """
    from ..orchestrator import DocumentOrchestrator

    orchestrator = DocumentOrchestrator()
    export_id = await orchestrator.start_batch_export(
        device_id=request.device_id,
        session_ids=request.session_ids,
        date_range=request.date_range,
        industry=request.industry,
        document_type=request.document_type,
        output_format=request.output_format,
        include_summary=request.include_summary,
        merge_to_single=request.merge_to_single
    )
    return {
        "export_id": export_id,
        "status": "processing",
        "message": "批量导出任务已创建"
    }


@router.get("/templates", response_model=List[TemplateInfo])
async def list_templates(
    industry: Optional[str] = None,
    document_type: Optional[str] = None
):
    """
    List available templates.
    """
    from ..core.template_engine import TemplateEngine

    engine = TemplateEngine("src/document/templates")
    templates = engine.list_templates(industry=industry, template_type=document_type)
    return [
        TemplateInfo(
            template_id=t.template_id,
            name=t.name,
            description=t.description,
            industry=t.industry,
            document_type=t.template_type.value,
            supported_formats=t.supported_formats or ["pdf", "html"],
            required_fields=t.required_fields or []
        )
        for t in templates
    ]


@router.get("/templates/{template_id}/schema")
async def get_template_schema(template_id: str):
    """
    Return a template's data schema.
    """
    from ..core.template_engine import TemplateEngine

    engine = TemplateEngine("src/document/templates")
    return engine.get_template_schema(template_id)


@router.get("/industries")
async def list_industries():
    """
    List supported industries.
    """
    from ..plugins import get_all_plugins

    plugins = get_all_plugins()
    return [
        {
            "industry_id": p.industry_id,
            "industry_name": p.industry_name,
            "document_types": [
                {
                    "type_id": dt.type_id,
                    "name": dt.name,
                    "description": dt.description
                }
                for dt in p.document_types
            ]
        }
        for p in plugins
    ]


@router.get("/industries/{industry}/document-types")
async def list_document_types(industry: str):
    """
    List the document types an industry supports.
    """
    from ..plugins import get_plugin

    plugin = get_plugin(industry)
    if not plugin:
        raise HTTPException(status_code=404, detail=f"行业不存在: {industry}")
    return [
        {
            "type_id": dt.type_id,
            "name": dt.name,
            "description": dt.description,
            "template_id": dt.template_id,
            "required_fields": dt.required_fields,
            "compliance_rules": dt.compliance_rules
        }
        for dt in plugin.document_types
    ]
```
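`/generate` returns a `document_id` right away with status `processing`, and `/status/{document_id}` reports progress later. The orchestrator therefore has to track background jobs. `DocumentOrchestrator` is not shown here, so the sketch below is one hypothetical way to do that bookkeeping with `asyncio.create_task`. `InMemoryJobStore` and its methods are illustrative names only. The `work` coroutine stands in for the six-step pipeline and is assumed to return a download URL.

```python
import asyncio
import uuid
from typing import Any, Awaitable, Callable, Dict, Optional


class InMemoryJobStore:
    """Hypothetical job bookkeeping for async document generation.

    start() returns an id immediately and runs the work in a background task;
    get_status() reports pending/processing/completed/failed, mirroring
    DocumentStatus. Single-process only: state is lost on restart.
    """

    def __init__(self) -> None:
        self._jobs: Dict[str, Dict[str, Any]] = {}
        self._tasks: Dict[str, asyncio.Task] = {}

    async def start(self, work: Callable[[], Awaitable[str]]) -> str:
        document_id = f"doc_{uuid.uuid4().hex[:12]}"
        self._jobs[document_id] = {"status": "pending", "download_url": None, "message": ""}
        # Keep a reference so the task is not garbage-collected mid-run.
        self._tasks[document_id] = asyncio.create_task(self._run(document_id, work))
        return document_id

    async def _run(self, document_id: str, work: Callable[[], Awaitable[str]]) -> None:
        job = self._jobs[document_id]
        job["status"] = "processing"
        try:
            # `work` is assumed to run the pipeline and return the download URL.
            job["download_url"] = await work()
            job["status"] = "completed"
        except Exception as exc:  # record the failure so /status can report it
            job["status"] = "failed"
            job["message"] = str(exc)
        finally:
            self._tasks.pop(document_id, None)

    async def get_status(self, document_id: str) -> Optional[Dict[str, Any]]:
        job = self._jobs.get(document_id)
        return None if job is None else {"document_id": document_id, **job}
```

The endpoints above construct a new `DocumentOrchestrator()` on every request. Any job state therefore has to live outside the instance, for example in a module-level store, Redis, or a database. Otherwise `/status` could not see jobs started by `/generate`. An in-process store like this one also only works with a single worker process.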
5.2 Conversation Summary API
```python
# src/document/api/summary.py
from fastapi import APIRouter, HTTPException
from pydantic import BaseModel, Field
from typing import List, Optional, Dict, Any

router = APIRouter(prefix="/api/v1/summary", tags=["对话摘要"])


class SummaryRequest(BaseModel):
    """Summary request."""
    session_id: str
    device_id: str
    summary_type: str = Field(default="comprehensive", description="摘要类型: brief/comprehensive/structured")
    include_recommendations: bool = Field(default=True, description="是否包含建议")
    include_decisions: bool = Field(default=True, description="是否包含决策")
    language: str = Field(default="zh", description="输出语言")


class ConversationSummary(BaseModel):
    """Conversation summary."""
    session_id: str
    summary_time: str
    # Basic summaries
    brief_summary: str = Field(..., description="简短摘要(50字内)")
    comprehensive_summary: str = Field(..., description="完整摘要(200字内)")
    # Structured information
    key_points: List[str] = Field(default_factory=list, description="关键要点")
    user_needs: List[Dict] = Field(default_factory=list, description="用户需求")
    expert_recommendations: List[Dict] = Field(default_factory=list, description="专家建议")
    decisions_made: List[str] = Field(default_factory=list, description="已做决策")
    pending_items: List[str] = Field(default_factory=list, description="待处理事项")
    # Metadata
    conversation_turns: int = Field(..., description="对话轮数")
    duration_minutes: int = Field(..., description="对话时长")
    satisfaction_score: Optional[float] = Field(None, description="满意度评分")


class MultiSessionSummaryRequest(BaseModel):
    """Multi-session summary request."""
    device_id: str
    session_ids: Optional[List[str]] = None
    date_range: Optional[List[str]] = None
    industry: Optional[str] = None


class MultiSessionSummary(BaseModel):
    """Aggregated summary across sessions."""
    device_id: str
    period: str
    # Aggregate statistics
    total_sessions: int
    total_turns: int
    total_duration_minutes: int
    # Topic distribution
    topic_distribution: Dict[str, int]
    # Aggregated needs
    aggregated_needs: List[Dict]
    # Aggregated recommendations
    aggregated_recommendations: List[Dict]
    # Decision timeline
    decisions_timeline: List[Dict]
    # Per-session summaries
    session_summaries: List[ConversationSummary]


@router.post("/conversation", response_model=ConversationSummary)
async def get_conversation_summary(request: SummaryRequest):
    """
    Summarize a single conversation.
    """
    from ..core.content_extractor import ContentExtractor

    extractor = ContentExtractor()
    extracted = await extractor.extract(
        session_id=request.session_id,
        device_id=request.device_id
    )
    # Generate the summary
    summary = await _generate_summary(extracted, request.summary_type)
    return summary


@router.post("/multi-session", response_model=MultiSessionSummary)
async def get_multi_session_summary(request: MultiSessionSummaryRequest):
    """
    Summarize multiple sessions.
    """
    # ... implementation (elided). Fail explicitly: returning None would
    # fail response_model validation with a 500.
    raise HTTPException(status_code=501, detail="Not implemented")


@router.get("/export/{session_id}")
async def export_summary(
    session_id: str,
    device_id: str,
    format: str = "markdown"
):
    """
    Export a conversation summary.

    Supported formats: markdown, html, json, txt
    """
    # ... implementation (elided)
    raise HTTPException(status_code=501, detail="Not implemented")
```
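Two parts of `ConversationSummary` and `MultiSessionSummary` need no LLM at all: the counters (`conversation_turns`, `duration_minutes`) and the `topic_distribution` tally. The sketch below shows both as plain functions. Three inputs are assumptions, not the module's real data model: a message shape (`role` plus an ISO 8601 `timestamp`), the rule that one user message counts as one turn, and pre-tagged topic lists per session.

```python
from collections import Counter
from datetime import datetime
from typing import Dict, List, Tuple


def conversation_metadata(messages: List[Dict[str, str]]) -> Tuple[int, int]:
    """Return (conversation_turns, duration_minutes).

    Assumes each message has "role" ("user"/"assistant") and an ISO 8601
    "timestamp"; one turn is counted per user message.
    """
    turns = sum(1 for m in messages if m.get("role") == "user")
    stamps = [datetime.fromisoformat(m["timestamp"]) for m in messages if m.get("timestamp")]
    if len(stamps) < 2:
        return turns, 0
    seconds = (max(stamps) - min(stamps)).total_seconds()
    return turns, int(seconds // 60)  # whole minutes, rounded down


def topic_distribution(session_topics: List[List[str]]) -> Dict[str, int]:
    """Count how many sessions mention each topic (assumes topics are pre-tagged)."""
    counts: Counter = Counter()
    for topics in session_topics:
        counts.update(set(topics))  # count a topic at most once per session
    return dict(counts)
```

Keeping these outside the LLM call makes the numeric fields deterministic. It also leaves the prompt to fill only the text fields (`brief_summary`, `key_points`, and so on).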
VI. Template Examples
6.1 Insurance Proposal Template
<!-- templates/insurance/proposal.html.j2 -->
<!DOCTYPE html>
<html lang="zh-CN">
<head>
<meta charset="UTF-8">
<title>保险方案建议书</title>
<style>
/* Base styles */
body {
font-family: "SimSun", "Microsoft YaHei", sans-serif;
font-size: 12pt;
line-height: 1.8;
color: #333;
max-width: 800px;
margin: 0 auto;
padding: 40px;
}
/* Heading styles */
h1 {
text-align: center;
font-size: 20pt;
color: #1a5490;
border-bottom: 2px solid #1a5490;
padding-bottom: 10px;
}
h2 {
font-size: 14pt;
color: #1a5490;
margin-top: 30px;
border-left: 4px solid #1a5490;
padding-left: 10px;
}
/* Info cards */
.info-card {
background: #f8f9fa;
border-radius: 8px;
padding: 20px;
margin: 20px 0;
}
.info-row {
display: flex;
margin: 10px 0;
}
.info-label {
width: 120px;
color: #666;
}
.info-value {
flex: 1;
font-weight: bold;
}
/* Table styles */
table {
width: 100%;
border-collapse: collapse;
margin: 20px 0;
}
th, td {
border: 1px solid #ddd;
padding: 12px;
text-align: left;
}
th {
background: #1a5490;
color: white;
}
tr:nth-child(even) {
background: #f8f9fa;
}
/* Recommended product cards */
.product-card {
border: 1px solid #ddd;
border-radius: 8px;
padding: 20px;
margin: 15px 0;
}
.product-name {
font-size: 14pt;
font-weight: bold;
color: #1a5490;
}
.product-type {
display: inline-block;
background: #e8f0f8;
color: #1a5490;
padding: 2px 8px;
border-radius: 4px;
font-size: 10pt;
margin-left: 10px;
}
.product-detail {
margin-top: 15px;
}
/* Risk warning */
.risk-warning {
background: #fff3cd;
border: 1px solid #ffc107;
border-radius: 4px;
padding: 15px;
margin: 20px 0;
}
.risk-warning-title {
color: #856404;
font-weight: bold;
}
/* Signature area */
.signature-area {
margin-top: 50px;
padding-top: 30px;
border-top: 1px dashed #ccc;
}
.signature-row {
display: flex;
justify-content: space-between;
margin-top: 30px;
}
.signature-box {
width: 45%;
}
.signature-line {
border-bottom: 1px solid #333;
height: 40px;
margin-top: 10px;
}
/* Footer */
.footer {
margin-top: 50px;
padding-top: 20px;
border-top: 1px solid #ddd;
font-size: 10pt;
color: #666;
text-align: center;
}
/* Print styles */
@media print {
body {
padding: 20px;
}
.page-break {
page-break-before: always;
}
}
</style>
</head>
<body>
<!-- Cover -->
<h1>保险方案建议书</h1>
<div style="text-align: center; margin: 30px 0;">
<p style="font-size: 14pt;">尊敬的 <strong>{{ user_info.name | default('客户') }}</strong></p>
<p>感谢您对我们的信任,以下是根据您的需求定制的保险方案</p>
<p style="color: #666;">方案编号: {{ document_id }} | 生成日期: {{ generation_time | date_format }}</p>
</div>
<!-- Customer information -->
<h2>一、客户基本信息</h2>
<div class="info-card">
<div class="info-row">
<span class="info-label">姓名:</span>
<span class="info-value">{{ user_info.name | default('-') }}</span>
<span class="info-label">年龄:</span>
<span class="info-value">{{ user_info.age | default('-') }} 岁</span>
</div>
<div class="info-row">
<span class="info-label">性别:</span>
<span class="info-value">{% if user_info.gender == 'male' %}男{% elif user_info.gender == 'female' %}女{% else %}-{% endif %}</span>
<span class="info-label">职业:</span>
<span class="info-value">{{ user_info.occupation | default('-') }}</span>
</div>
<div class="info-row">
<span class="info-label">年收入:</span>
<span class="info-value">{{ user_info.annual_income | currency if user_info.annual_income else '-' }}</span>
<span class="info-label">家庭状况:</span>
<span class="info-value">{{ family_info.marital_status | default('-') }}{% if family_info.children_count %},{{ family_info.children_count }}个子女{% endif %}</span>
</div>
</div>
<!-- Needs analysis -->
<h2>二、保障需求分析</h2>
<h3>2.1 风险评估</h3>
<div class="info-card">
<div class="info-row">
<span class="info-label">风险承受能力:</span>
<span class="info-value">
{% if needs_analysis.risk_tolerance == 'low' %}保守型
{% elif needs_analysis.risk_tolerance == 'medium' %}稳健型
{% elif needs_analysis.risk_tolerance == 'high' %}进取型
{% else %}-{% endif %}
</span>
</div>
<div class="info-row">
<span class="info-label">建议保费预算:</span>
<span class="info-value">年收入的 5%-10%{% if user_info.annual_income %},约 {{ (user_info.annual_income * 0.05) | currency }} - {{ (user_info.annual_income * 0.1) | currency }}{% endif %}</span>
</div>
</div>
<h3>2.2 保障缺口分析</h3>
<table>
<thead>
<tr>
<th>保障类型</th>
<th>建议保额</th>
<th>现有保额</th>
<th>缺口</th>
<th>优先级</th>
</tr>
</thead>
<tbody>
{% for gap in needs_analysis.coverage_gaps | default([]) %}
<tr>
<td>{{ gap.type }}</td>
<td>{{ gap.recommended | currency }}</td>
<td>{{ gap.current | currency }}</td>
<td style="color: {% if gap.gap > 0 %}#dc3545{% else %}#28a745{% endif %};">
{{ gap.gap | currency }}
</td>
<td>{{ '★' * gap.priority }}{{ '☆' * (5 - gap.priority) }}</td>
</tr>
{% else %}
<tr>
<td colspan="5" style="text-align: center;">暂无保障缺口数据</td>
</tr>
{% endfor %}
</tbody>
</table>
<!-- Product recommendations -->
<h2>三、推荐保险方案</h2>
{% for product in product_recommendations | default([]) %}
<div class="product-card">
<div>
<span class="product-name">{{ product.product_name }}</span>
<span class="product-type">{{ product.product_type }}</span>
</div>
<div class="product-detail">
<table>
<tr>
<td width="25%"><strong>保障额度</strong></td>
<td>{{ product.coverage_amount | currency }}</td>
<td width="25%"><strong>年缴保费</strong></td>
<td>{{ product.premium | currency }}</td>
</tr>
<tr>
<td><strong>缴费期限</strong></td>
<td>{{ product.payment_period }} 年</td>
<td><strong>保障期限</strong></td>
<td>{{ product.coverage_period | default('终身') }}</td>
</tr>
</table>
<p><strong>推荐理由:</strong></p>
<ul>
{% for reason in product.reasons | default([]) %}
<li>{{ reason }}</li>
{% endfor %}
</ul>
</div>
</div>
{% else %}
<p>暂无推荐产品</p>
{% endfor %}
<!-- Premium summary -->
<h2>四、保费汇总</h2>
<div class="info-card">
<table>
<thead>
<tr>
<th>产品名称</th>
<th>保障类型</th>
<th>年缴保费</th>
</tr>
</thead>
<tbody>
{% for product in product_recommendations | default([]) %}
<tr>
<td>{{ product.product_name }}</td>
<td>{{ product.product_type }}</td>
<td>{{ product.premium | currency }}</td>
</tr>
{% endfor %}
<tr style="background: #e8f0f8; font-weight: bold;">
<td colspan="2">年缴保费合计</td>
<td>{{ total_premium | currency }}</td>
</tr>
</tbody>
</table>
{% if premium_ratio is number %}
<p>
<strong>保费占年收入比例:</strong>
<span style="color: {% if premium_ratio <= 0.1 %}#28a745{% elif premium_ratio <= 0.15 %}#ffc107{% else %}#dc3545{% endif %};">
{{ premium_ratio | percent }}
</span>
{% if premium_ratio <= 0.1 %}
(合理范围内)
{% elif premium_ratio <= 0.15 %}
(略高,请评估是否可承受)
{% else %}
(超出建议范围,建议调整方案)
{% endif %}
</p>
{% endif %}
</div>
<!-- Risk warning -->
<div class="risk-warning">
<p class="risk-warning-title">⚠️ 重要提示</p>
<ol>
<li><strong>投保前请仔细阅读保险条款</strong>,了解保险责任、责任免除、犹豫期等重要内容。</li>
<li><strong>请如实告知健康状况</strong>,未如实告知可能影响理赔。</li>
<li><strong>保险产品存在退保损失</strong>,犹豫期后退保仅退还现金价值。</li>
<li>本方案仅供参考,具体以保险合同条款为准。</li>
<li>如有疑问,请咨询专业保险顾问或拨打保险公司客服热线。</li>
</ol>
</div>
<!-- Disclaimer -->
<h2>五、免责声明</h2>
<div style="font-size: 10pt; color: #666;">
<p>1. 本保险方案建议书基于您提供的信息生成,如信息有误可能影响方案的适用性。</p>
<p>2. 本建议书中的产品信息仅供参考,具体保障内容、保费费率以保险公司官方条款为准。</p>
<p>3. 投保前请充分了解产品特点,根据自身实际需求和经济能力审慎决策。</p>
<p>4. 本建议书不构成保险合同的组成部分。</p>
</div>
<!-- Signature area -->
<div class="signature-area">
<h2>六、确认签署</h2>
<p>本人已阅读并理解上述保险方案建议书的全部内容,了解推荐产品的保障责任、费用及风险提示。</p>
<div class="signature-row">
<div class="signature-box">
<p>客户签名:</p>
<div class="signature-line"></div>
<p>日期:_____年_____月_____日</p>
</div>
<div class="signature-box">
<p>顾问签名:</p>
<div class="signature-line"></div>
<p>日期:_____年_____月_____日</p>
</div>
</div>
</div>
<!-- Footer -->
<div class="footer">
<p>本文档由 MBE 智能保顾系统生成</p>
<p>文档编号:{{ document_id }} | 生成时间:{{ generation_time | date_format('%Y-%m-%d %H:%M:%S') }}</p>
<p>如有疑问,请联系客服</p>
</div>
</body>
</html>
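The template relies on three filters that are not Jinja2 built-ins: `currency`, `percent` and `date_format`, the last called both with and without a format string. It also reads `total_premium` and `premium_ratio`, which it never computes itself. The `TemplateEngine` that supplies them is not shown. Below is a hedged sketch of plausible implementations. The "¥" symbol, whole-yuan rounding and the default date format are assumptions, and so is `premium_context`, a hypothetical helper.

```python
from datetime import datetime
from typing import Any, Dict, List, Optional, Union

Number = Union[int, float]


def currency(value: Optional[Number], symbol: str = "¥") -> str:
    """Format an amount as e.g. ¥12,345 (rounded to whole yuan); None renders as '-'."""
    if value is None:
        return "-"
    return f"{symbol}{value:,.0f}"


def percent(value: Optional[Number], digits: int = 1) -> str:
    """Format a ratio such as 0.083 as '8.3%'."""
    if value is None:
        return "-"
    return f"{value * 100:.{digits}f}%"


def date_format(value: Union[datetime, str, None], fmt: str = "%Y-%m-%d") -> str:
    """strftime a datetime (or ISO 8601 string); the default format is an assumption."""
    if value is None:
        return "-"
    if isinstance(value, str):
        value = datetime.fromisoformat(value)
    return value.strftime(fmt)


def premium_context(products: List[Dict[str, Any]], annual_income: Optional[Number]) -> Dict[str, Any]:
    """Compute the totals the template reads but does not calculate.

    premium_ratio is None when income is unknown or zero, so the template
    should only compare it after checking that it is a number.
    """
    total = sum(p.get("premium") or 0 for p in products)
    ratio = total / annual_income if annual_income else None
    return {"total_premium": total, "premium_ratio": ratio}
```

With Jinja2, the filters would be registered through the environment's `filters` dict, e.g. `env.filters.update(currency=currency, percent=percent, date_format=date_format)`. The output of `premium_context(...)` would then be merged into the render context alongside `product_recommendations` and `user_info`.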
6.2 Medical Consultation Summary Template
<!-- templates/medical/consultation_summary.html.j2 -->
<!DOCTYPE html>
<html lang="zh-CN">
<head>
<meta charset="UTF-8">
<title>问诊摘要</title>
<style>
body {
font-family: "SimSun", "Microsoft YaHei", sans-serif;
font-size: 12pt;
line-height: 1.8;
color: #333;
max-width: 800px;
margin: 0 auto;
padding: 40px;
}
.header {
border-bottom: 2px solid #2e7d32;
padding-bottom: 20px;
margin-bottom: 30px;
}
.header h1 {
color: #2e7d32;
margin: 0;
}
.header-meta {
color: #666;
font-size: 10pt;
margin-top: 10px;
}
.section {
margin: 25px 0;
}
.section-title {
font-size: 14pt;
color: #2e7d32;
border-left: 4px solid #2e7d32;
padding-left: 10px;
margin-bottom: 15px;
}
.patient-info {
display: grid;
grid-template-columns: repeat(4, 1fr);
gap: 15px;
background: #f5f5f5;
padding: 20px;
border-radius: 8px;
}
.patient-info-item {
display: flex;
flex-direction: column;
}
.patient-info-label {
font-size: 10pt;
color: #666;
}
.patient-info-value {
font-weight: bold;
}
.chief-complaint {
background: #e8f5e9;
padding: 15px 20px;
border-radius: 8px;
font-size: 14pt;
}
.symptom-list {
display: grid;
grid-template-columns: repeat(2, 1fr);
gap: 15px;
}
.symptom-card {
background: #fff;
border: 1px solid #ddd;
border-radius: 8px;
padding: 15px;
}
.symptom-name {
font-weight: bold;
color: #2e7d32;
}
.symptom-detail {
font-size: 10pt;
color: #666;
margin-top: 5px;
}
.severity-bar {
height: 8px;
background: #eee;
border-radius: 4px;
margin-top: 10px;
overflow: hidden;
}
.severity-fill {
height: 100%;
border-radius: 4px;
}
.assessment-box {
background: #fff3e0;
border: 1px solid #ff9800;
border-radius: 8px;
padding: 20px;
}
.condition-tag {
display: inline-block;
background: #fff;
border: 1px solid #ddd;
padding: 5px 15px;
border-radius: 20px;
margin: 5px;
}
.urgency-high { color: #d32f2f; border-color: #d32f2f; }
.urgency-medium { color: #ff9800; border-color: #ff9800; }
.urgency-low { color: #4caf50; border-color: #4caf50; }
.recommendation-list {
list-style: none;
padding: 0;
}
.recommendation-item {
display: flex;
align-items: flex-start;
padding: 10px 0;
border-bottom: 1px dashed #ddd;
}
.recommendation-icon {
width: 30px;
height: 30px;
background: #e8f5e9;
border-radius: 50%;
display: flex;
align-items: center;
justify-content: center;
margin-right: 15px;
color: #2e7d32;
}
.red-flags {
background: #ffebee;
border: 1px solid #ef5350;
border-radius: 8px;
padding: 15px;
margin: 20px 0;
}
.red-flags-title {
color: #c62828;
font-weight: bold;
}
.disclaimer {
background: #f5f5f5;
border-radius: 8px;
padding: 20px;
margin-top: 30px;
font-size: 10pt;
color: #666;
}
.disclaimer-title {
color: #333;
font-weight: bold;
margin-bottom: 10px;
}
.footer {
margin-top: 40px;
text-align: center;
font-size: 10pt;
color: #999;
}
</style>
</head>
<body>
<!-- Header -->
<div class="header">
<h1>📋 在线问诊摘要</h1>
<div class="header-meta">
问诊编号:{{ session_id }} | 问诊时间:{{ consultation_time | date_format('%Y-%m-%d %H:%M') }}
</div>
</div>
<!-- Patient information -->
<div class="section">
<h2 class="section-title">患者信息</h2>
<div class="patient-info">
<div class="patient-info-item">
<span class="patient-info-label">年龄</span>
<span class="patient-info-value">{{ patient_info.age | default('-') }} 岁</span>
</div>
<div class="patient-info-item">
<span class="patient-info-label">性别</span>
<span class="patient-info-value">{% if patient_info.gender == 'male' %}男{% elif patient_info.gender == 'female' %}女{% else %}-{% endif %}</span>
</div>
<div class="patient-info-item">
<span class="patient-info-label">身高</span>
<span class="patient-info-value">{{ patient_info.height | default('-') }} cm</span>
</div>
<div class="patient-info-item">
<span class="patient-info-label">体重</span>
<span class="patient-info-value">{{ patient_info.weight | default('-') }} kg</span>
</div>
</div>
</div>
<!-- Chief complaint -->
<div class="section">
<h2 class="section-title">主诉</h2>
<div class="chief-complaint">
{{ chief_complaint | default('未记录') }}
</div>
</div>
<!-- Symptom details -->
<div class="section">
<h2 class="section-title">症状详情</h2>
<div class="symptom-list">
{% for symptom in present_illness.symptoms | default([]) %}
<div class="symptom-card">
<div class="symptom-name">{{ symptom.symptom }}</div>
<div class="symptom-detail">
{# Join only the fields that are present, so no dangling " | " separator is left behind #}
{{ [
  symptom.location and '部位:' ~ symptom.location,
  symptom.duration and '持续:' ~ symptom.duration,
  symptom.frequency and '频率:' ~ symptom.frequency
] | select | join(' | ') }}
</div>
{% if symptom.severity %}
<div class="severity-bar">
<div class="severity-fill" style="width: {{ symptom.severity * 10 }}%; background: {% if symptom.severity <= 3 %}#4caf50{% elif symptom.severity <= 6 %}#ff9800{% else %}#f44336{% endif %};"></div>
</div>
<div style="font-size: 9pt; color: #999; margin-top: 3px;">严重程度: {{ symptom.severity }}/10</div>
{% endif %}
</div>
{% else %}
<p>未记录详细症状</p>
{% endfor %}
</div>
</div>
<!-- Medical history -->
<div class="section">
<h2 class="section-title">相关病史</h2>
<table style="width: 100%; border-collapse: collapse;">
<tr>
<td style="width: 25%; padding: 10px; background: #f5f5f5; font-weight: bold;">既往病史</td>
<td style="padding: 10px;">{{ medical_history.past_illnesses | join_list if medical_history.past_illnesses else '无特殊' }}</td>
</tr>
<tr>
<td style="padding: 10px; background: #f5f5f5; font-weight: bold;">过敏史</td>
<td style="padding: 10px;">{{ medical_history.allergies | join_list if medical_history.allergies else '无已知过敏' }}</td>
</tr>
<tr>
<td style="padding: 10px; background: #f5f5f5; font-weight: bold;">用药情况</td>
<td style="padding: 10px;">{{ medical_history.current_medications | join_list if medical_history.current_medications else '无' }}</td>
</tr>
</table>
</div>
<!-- AI assessment -->
<div class="section">
<h2 class="section-title">AI 健康评估</h2>
<div class="assessment-box">
<p><strong>可能的情况:</strong></p>
<div>
{% for condition in assessment.possible_conditions | default([]) %}
<span class="condition-tag">{{ condition }}</span>
{% endfor %}
</div>
<p style="margin-top: 20px;"><strong>紧急程度:</strong>
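{# urgency values (emergency / urgent / soon / other) do not match the CSS suffixes (high / medium / low), so map them explicitly #}
<span class="condition-tag urgency-{{ {'emergency': 'high', 'urgent': 'high', 'soon': 'medium'}.get(assessment.urgency, 'low') }}">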
{% if assessment.urgency == 'emergency' %}🚨 紧急就医
{% elif assessment.urgency == 'urgent' %}⚠️ 尽快就医
{% elif assessment.urgency == 'soon' %}📅 建议就医
{% else %}✅ 可自我观察
{% endif %}
</span>
</p>
{% if assessment.differential_diagnosis %}
<p style="margin-top: 15px;"><strong>鉴别诊断:</strong></p>
<ul>
{% for dd in assessment.differential_diagnosis %}
<li>{{ dd }}</li>
{% endfor %}
</ul>
{% endif %}
</div>
</div>
<!-- Red-flag symptoms -->
{% if recommendations.red_flags %}
<div class="red-flags">
<p class="red-flags-title">🚨 请注意以下警示症状</p>
<p>如果出现以下情况,请立即就医:</p>
<ul>
{% for flag in recommendations.red_flags %}
<li>{{ flag }}</li>
{% endfor %}
</ul>
</div>
{% endif %}
<!-- Recommendations -->
<div class="section">
<h2 class="section-title">健康建议</h2>
<ul class="recommendation-list">
{% if recommendations.further_tests %}
<li class="recommendation-item">
<div class="recommendation-icon">🔬</div>
<div>
<strong>建议检查</strong>
<p>{{ recommendations.further_tests | join_list }}</p>
</div>
</li>
{% endif %}
{% if recommendations.lifestyle_changes %}
<li class="recommendation-item">
<div class="recommendation-icon">🏃</div>
<div>
<strong>生活方式建议</strong>
<ul>
{% for change in recommendations.lifestyle_changes %}
<li>{{ change }}</li>
{% endfor %}
</ul>
</div>
</li>
{% endif %}
{% if recommendations.follow_up %}
<li class="recommendation-item">
<div class="recommendation-icon">📅</div>
<div>
<strong>随访建议</strong>
<p>{{ recommendations.follow_up }}</p>
</div>
</li>
{% endif %}
</ul>
</div>
<!-- Disclaimer -->
<div class="disclaimer">
<p class="disclaimer-title">⚠️ 重要声明</p>
<p>{{ disclaimer | default('以上内容由AI健康助手根据您描述的症状生成,仅供参考,不能替代专业医生的诊断和治疗。如有不适,请及时就医。') }}</p>
<ul>
<li>AI分析基于您提供的信息,可能存在局限性</li>
<li>本摘要不构成医疗诊断或治疗建议</li>
<li>请勿自行用药,需在专业医生指导下进行治疗</li>
<li>如症状持续或加重,请尽快前往正规医疗机构就诊</li>
</ul>
</div>
<!-- Footer -->
<div class="footer">
<p>本文档由 MBE 智能健康助手生成</p>
<p>文档编号:{{ document_id }} | 生成时间:{{ generation_time | date_format('%Y-%m-%d %H:%M:%S') }}</p>
</div>
</body>
</html>
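The template above relies on two custom filters, `date_format` and `join_list`, which are not Jinja2 built-ins, so the rendering engine has to register them before loading the template. The sketch below shows one possible implementation. The filter bodies (accepting datetimes or ISO-8601 strings, printing "-" for missing values, joining with the Chinese enumeration comma "、") are assumptions, not the module's actual code.

```python
from datetime import datetime
from typing import Any, Iterable, Optional


def date_format(value: Any, fmt: str = "%Y-%m-%d %H:%M") -> str:
    """Hypothetical `date_format` filter: format a datetime or ISO-8601 string.

    Falsy values (None, "", Jinja2's Undefined) render as "-".
    """
    if not value:
        return "-"
    if isinstance(value, str):
        try:
            value = datetime.fromisoformat(value)
        except ValueError:
            return value  # leave unparseable strings untouched
    return value.strftime(fmt)


def join_list(items: Optional[Iterable[Any]], sep: str = "、") -> str:
    """Hypothetical `join_list` filter: join list items for display.

    A bare string is returned unchanged; None and empty items are skipped.
    """
    if not items:
        return ""
    if isinstance(items, str):
        return items
    return sep.join(str(item) for item in items if item not in (None, ""))


def register_filters(env: Any) -> None:
    """Attach the filters to a jinja2.Environment (anything with a `filters` dict works)."""
    env.filters["date_format"] = date_format
    env.filters["join_list"] = join_list
```

In use, this would be called as `register_filters(env)` on an `Environment(loader=FileSystemLoader(...), autoescape=select_autoescape())` before `env.get_template(...)`. Autoescaping matters here because symptom descriptions and other conversation-derived text are interpolated directly into HTML.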
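VII. Implementation Plan
7.1 Development Phases
| Phase | Scope | Duration | Deliverable |
|---|---|---|---|
| Phase 1: Core architecture | Core engine, template engine, format renderers | 2 weeks | Runnable document-generation skeleton |
| Phase 2: Content extraction | LLM extractor, conversation summarization | 1.5 weeks | Structured extraction of conversation content |
| Phase 3: Insurance plugin | Full insurance-industry support | 2 weeks | End-to-end insurance document workflow |
| Phase 4: Medical plugin | Full medical-industry support | 2 weeks | End-to-end medical document workflow |
| Phase 5: Finance plugin | Full wealth-management support | 1.5 weeks | End-to-end wealth-management document workflow |
| Phase 6: Compliance & testing | Compliance checks, full test pass | 1 week | Production-ready release |
Total: about 10 weeks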
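The phases are listed back to back, so the 10-week total is the plain sum of the durations (2 + 1.5 + 2 + 2 + 1.5 + 1). The sketch below derives each phase's start and end week under that strictly sequential assumption. If the three industry plugins (phases 3 to 5) were developed in parallel, the critical path would be shorter; this code does not model that.

```python
from typing import List, Tuple

# Durations in weeks, copied from the 7.1 table; phases are assumed to run one after another.
PHASES: List[Tuple[str, float]] = [
    ("Phase 1: Core architecture", 2.0),
    ("Phase 2: Content extraction", 1.5),
    ("Phase 3: Insurance plugin", 2.0),
    ("Phase 4: Medical plugin", 2.0),
    ("Phase 5: Finance plugin", 1.5),
    ("Phase 6: Compliance & testing", 1.0),
]


def sequential_schedule(phases: List[Tuple[str, float]]) -> List[Tuple[str, float, float]]:
    """Return (name, start_week, end_week) for phases executed back to back."""
    schedule = []
    start = 0.0
    for name, weeks in phases:
        end = start + weeks
        schedule.append((name, start, end))
        start = end  # the next phase begins when this one ends
    return schedule


if __name__ == "__main__":
    for name, start, end in sequential_schedule(PHASES):
        print(f"{name:<32} week {start:>4.1f} -> {end:>4.1f}")
```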
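7.2 Technical Dependencies
# Additions to requirements.txt
# Template engine
Jinja2>=3.1.0
# PDF generation
weasyprint>=60.0
# or: reportlab>=4.0
# Word (.docx) generation
python-docx>=1.0.0
# HTML parsing and HTML-to-Markdown conversion
beautifulsoup4>=4.12.0
html2text>=2020.1.16
# JSON Schema validation
jsonschema>=4.0.0
# Chinese (CJK) font support
fonttools>=4.0.0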
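A startup self-check can catch missing or outdated dependencies before the first render fails. The sketch below uses the standard library's `importlib.metadata.version` to compare installed versions against the minimums listed above. Its numeric comparison is a deliberate simplification that ignores pre-release suffixes; `packaging.version.Version` would be the more robust choice where that package is available.

```python
import re
from importlib import metadata
from typing import Callable, Dict, List, Tuple

# Minimum versions from the requirements.txt additions above (optional reportlab omitted).
MINIMUMS: Dict[str, str] = {
    "Jinja2": "3.1.0",
    "weasyprint": "60.0",
    "python-docx": "1.0.0",
    "beautifulsoup4": "4.12.0",
    "html2text": "2020.1.16",
    "jsonschema": "4.0.0",
    "fonttools": "4.0.0",
}


def parse_version(text: str) -> Tuple[int, ...]:
    """Turn '60.0' or '3.1.4rc1' into a tuple of leading integers, e.g. (3, 1, 4)."""
    parts: List[int] = []
    for piece in text.split("."):
        match = re.match(r"\d+", piece)
        if match is None:
            break
        parts.append(int(match.group()))
        if match.end() != len(piece):  # stop at suffixes such as 'rc1' or 'post2'
            break
    return tuple(parts)


def meets_minimum(installed: str, minimum: str) -> bool:
    a, b = parse_version(installed), parse_version(minimum)
    width = max(len(a), len(b))
    # Pad with zeros so that "60" and "60.0" compare as equal.
    return a + (0,) * (width - len(a)) >= b + (0,) * (width - len(b))


def check_dependencies(
    minimums: Dict[str, str],
    get_version: Callable[[str], str] = metadata.version,
) -> List[str]:
    """Return human-readable problems; an empty list means every minimum is met."""
    problems = []
    for dist, minimum in minimums.items():
        try:
            installed = get_version(dist)
        except metadata.PackageNotFoundError:
            problems.append(f"{dist}: not installed (need >= {minimum})")
            continue
        if not meets_minimum(installed, minimum):
            problems.append(f"{dist}: {installed} installed, need >= {minimum}")
    return problems


if __name__ == "__main__":
    for problem in check_dependencies(MINIMUMS):
        print(problem)
```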
7.3 Deployment Configuration
# config/document_generation.yaml
document_generation:
  # Template directory
  template_path: "src/document/templates"
  # Storage
  storage:
    type: "oss"  # local / oss / s3
    bucket: "mbe-documents"
    expire_days: 30
  # Rendering
  rendering:
    pdf:
      engine: "weasyprint"  # weasyprint / wkhtmltopdf
      page_size: "A4"
      default_margins: "2cm"
    word:
      default_font: "宋体"  # SimSun
      default_size: 12  # font size in pt
  # LLM
  llm:
    extraction_model: "gpt-4"
    max_tokens: 4000
    temperature: 0.3
  # Compliance
  compliance:
    strict_mode: true
    fail_on_warning: false
  # Industry plugins
  plugins:
    enabled:
      - insurance
      - medical
      - finance
      - education
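Several settings accept only a fixed set of values (`storage.type`, `rendering.pdf.engine`), so it helps to validate the parsed configuration once at startup instead of failing at render time. The sketch below assumes the YAML has already been parsed into a dict (for example with PyYAML's `yaml.safe_load`) and checks only a subset of keys. The `DocGenConfig` and `load_config` names are hypothetical, and the [0, 2] temperature range is an assumption based on common LLM API limits.

```python
from dataclasses import dataclass, field
from typing import Any, Dict, List

ALLOWED_STORAGE = {"local", "oss", "s3"}
ALLOWED_PDF_ENGINES = {"weasyprint", "wkhtmltopdf"}


@dataclass
class DocGenConfig:
    """Hypothetical typed view over config/document_generation.yaml (subset of keys)."""
    template_path: str
    storage_type: str
    expire_days: int
    pdf_engine: str
    temperature: float
    enabled_plugins: List[str] = field(default_factory=list)


def load_config(raw: Dict[str, Any]) -> DocGenConfig:
    """Validate the parsed YAML mapping; raise ValueError on unsupported values."""
    section = raw["document_generation"]
    cfg = DocGenConfig(
        template_path=section["template_path"],
        storage_type=section["storage"]["type"],
        expire_days=int(section["storage"]["expire_days"]),
        pdf_engine=section["rendering"]["pdf"]["engine"],
        temperature=float(section["llm"]["temperature"]),
        enabled_plugins=list(section.get("plugins", {}).get("enabled", [])),
    )
    if cfg.storage_type not in ALLOWED_STORAGE:
        raise ValueError(f"storage.type must be one of {sorted(ALLOWED_STORAGE)}, got {cfg.storage_type!r}")
    if cfg.pdf_engine not in ALLOWED_PDF_ENGINES:
        raise ValueError(f"rendering.pdf.engine must be one of {sorted(ALLOWED_PDF_ENGINES)}, got {cfg.pdf_engine!r}")
    if cfg.expire_days <= 0:
        raise ValueError("storage.expire_days must be positive")
    # Assumed valid range; adjust to the provider actually configured.
    if not 0.0 <= cfg.temperature <= 2.0:
        raise ValueError("llm.temperature must be within [0, 2]")
    return cfg
```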
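VIII. Integration and Configuration
8.1 Main Application Integration
The document generation module is integrated into the MBE main application (src/main.py) and is initialized automatically at startup:
# Integration code in main.py
try:
    from src.document.api.router import router as document_router, setup_document_api
    from src.document.llm_adapter import create_document_orchestrator
    DOCUMENT_MODULE_AVAILABLE = True
except ImportError:
    # Assumed guard: the document module is optional, so startup continues without it
    DOCUMENT_MODULE_AVAILABLE = False

# Initialize at application startup (`app` is the application object created earlier in main.py)
if DOCUMENT_MODULE_AVAILABLE:
    orchestrator = create_document_orchestrator()
    setup_document_api(orchestrator)
    # Register the routes (OpenAPI tag "文档生成" = "Document Generation")
    app.include_router(document_router, tags=["文档生成"])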
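8.2 LLM Service Configuration
The document generation module uses MBE's LLM service through the MBELLMAdapter adapter:
# src/document/llm_adapter.py (excerpt; "..." marks parameters elided in this document)
class MBELLMAdapter:
    """Adapts MBE's LLM service for use by the document generation module."""

    async def generate(self, prompt, system_prompt="", ...):
        # Use the LLM client from src.llm.base
        client = get_llm_client()
        return await client.chat(system_prompt, prompt, ...)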
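The adapter above looks up its client inside `generate`, which makes it hard to unit-test without a live LLM. One option, sketched below, is to inject the client through the constructor and describe the expected interface with a `typing.Protocol`. The `chat(system_prompt, user_prompt, **kwargs)` signature mirrors the call shown above, but its keyword arguments are an assumption; `InjectableLLMAdapter` and `EchoClient` are hypothetical names.

```python
import asyncio
from typing import Any, Optional, Protocol


class ChatClient(Protocol):
    """Assumed shape of the client returned by get_llm_client()."""

    async def chat(self, system_prompt: str, user_prompt: str, **kwargs: Any) -> str: ...


class InjectableLLMAdapter:
    """Variant of MBELLMAdapter that receives its client instead of looking it up."""

    def __init__(self, client: ChatClient, default_system_prompt: str = "") -> None:
        self._client = client
        self._default_system_prompt = default_system_prompt

    async def generate(self, prompt: str, system_prompt: Optional[str] = None, **kwargs: Any) -> str:
        # Fall back to the adapter-wide system prompt only when none is passed per call.
        system = system_prompt if system_prompt is not None else self._default_system_prompt
        return await self._client.chat(system, prompt, **kwargs)


class EchoClient:
    """Hypothetical test double: records each call and echoes the prompt back."""

    def __init__(self) -> None:
        self.calls: list = []

    async def chat(self, system_prompt: str, user_prompt: str, **kwargs: Any) -> str:
        self.calls.append((system_prompt, user_prompt, kwargs))
        return f"[{system_prompt}] {user_prompt}"


if __name__ == "__main__":
    adapter = InjectableLLMAdapter(EchoClient(), default_system_prompt="summarize")
    print(asyncio.run(adapter.generate("patient reports headache", temperature=0.3)))
```

In production the adapter would be constructed with the real client (`InjectableLLMAdapter(get_llm_client())`), while tests pass `EchoClient` and inspect `calls`.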
LLM settings are controlled through environment variables (the .env file):
# LLM provider; options: deepseek / qwen / doubao / openrouter
LLM_PROVIDER=deepseek
LLM_API_KEY=your_api_key
LLM_MODEL=deepseek-chat
# Resilient client (recommended)
MBE_USE_RESILIENT_LLM=true
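A minimal sketch of reading these variables with the standard library, assuming the .env file has already been loaded into the process environment (for example by python-dotenv or the container runtime). The provider whitelist is the list shown above; the `LLMSettings` name, the fallback defaults (taken from the example values) and the boolean parsing rules are hypothetical.

```python
import os
from dataclasses import dataclass, field
from typing import Mapping, Optional

SUPPORTED_PROVIDERS = ("deepseek", "qwen", "doubao", "openrouter")
TRUTHY = {"1", "true", "yes", "on"}


@dataclass(frozen=True)
class LLMSettings:
    provider: str
    api_key: str = field(repr=False)  # keep the key out of logs and tracebacks
    model: str
    use_resilient_client: bool


def load_llm_settings(env: Optional[Mapping[str, str]] = None) -> LLMSettings:
    """Build LLMSettings from environment variables, failing fast on invalid values."""
    env = os.environ if env is None else env
    provider = env.get("LLM_PROVIDER", "deepseek").strip().lower()
    if provider not in SUPPORTED_PROVIDERS:
        raise ValueError(f"LLM_PROVIDER must be one of {SUPPORTED_PROVIDERS}, got {provider!r}")
    api_key = env.get("LLM_API_KEY", "").strip()
    if not api_key:
        raise ValueError("LLM_API_KEY is not set")
    return LLMSettings(
        provider=provider,
        api_key=api_key,
        model=env.get("LLM_MODEL", "deepseek-chat").strip(),
        use_resilient_client=env.get("MBE_USE_RESILIENT_LLM", "true").strip().lower() in TRUTHY,
    )
```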
8.3 Registered API Endpoints
| Endpoint | Method | Description |
|---|---|---|
| /api/v1/document/generate | POST | Generate a document |
| /api/v1/document/status/{id} | GET | Query document status |
| /api/v1/document/preview | POST | Preview document content |
| /api/v1/document/download/{id} | GET | Download the document file |
| /api/v1/document/{id} | DELETE | Delete a document |
| /api/v1/document/list | GET | List documents |
| /api/v1/document/industries | GET | List supported industries |
| /api/v1/document/batch-export | POST | Batch export |
| /api/v1/summary/conversation | POST | Generate a conversation summary |
| /api/v1/summary/multi-session | POST | Summarize multiple sessions |
| /api/v1/summary/export | POST | Export a summary |
| /api/v1/templates | GET | List templates |
| /api/v1/templates/{id} | GET | Get template details |
| /api/v1/templates/{id}/schema | GET | Get the template's data schema |
| /api/v1/templates/{id}/preview | POST | Preview the rendered template |
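The separate generate, status and download endpoints suggest that clients poll `/api/v1/document/status/{id}` until a document is ready. The sketch below shows such a polling loop with the HTTP GET injected as a function, so it stays transport-agnostic and testable. The `status` field and its `completed` / `failed` values, as well as the `error` field, are assumptions about the response body, not documented fields.

```python
import time
from typing import Any, Callable, Dict

STATUS_PATH = "/api/v1/document/status/{id}"
TERMINAL_OK = "completed"  # assumed status value
TERMINAL_FAIL = "failed"   # assumed status value


def wait_for_document(
    fetch_json: Callable[[str], Dict[str, Any]],
    document_id: str,
    interval: float = 1.0,
    timeout: float = 60.0,
    sleep: Callable[[float], None] = time.sleep,
    clock: Callable[[], float] = time.monotonic,
) -> Dict[str, Any]:
    """Poll the status endpoint until the document completes, fails, or the timeout expires.

    fetch_json(path) must perform a GET against the API and return the decoded JSON body.
    """
    path = STATUS_PATH.format(id=document_id)
    deadline = clock() + timeout
    while True:
        body = fetch_json(path)
        status = body.get("status")
        if status == TERMINAL_OK:
            return body
        if status == TERMINAL_FAIL:
            raise RuntimeError(f"document {document_id} failed: {body.get('error')}")
        # Give up if the next poll would land past the deadline.
        if clock() + interval > deadline:
            raise TimeoutError(f"document {document_id} not ready after {timeout}s (last status {status!r})")
        sleep(interval)
```

Injecting `sleep` and `clock` keeps tests instant; in production the defaults use real time, and `fetch_json` can wrap whichever HTTP client the caller already uses.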
8.4 Code Usage Examples
# Using document generation from other modules
from src.document.llm_adapter import create_document_orchestrator


async def generate_insurance_proposal(conversation: list, user_info: dict):
    """Generate an insurance proposal for the user."""
    orchestrator = create_document_orchestrator()
    result = await orchestrator.generate(
        session_id="session_123",
        device_id="device_abc",
        industry="insurance",
        document_type="insurance_proposal",
        output_format="pdf",
        conversation=conversation,
        custom_data={
            "user_info": user_info,
            "generation_options": {
                "include_disclaimer": True,
                "style": "professional",
            },
        },
    )
    return result


# Generate a conversation summary
async def summarize_conversation(conversation: list):
    """Generate a conversation summary."""
    from src.document.core.content_extractor import ContentExtractor

    extractor = ContentExtractor()
    content = await extractor.extract(
        session_id="session_123",
        conversation=conversation,
        industry="general",
    )
    return {
        "summary": content.summary,
        "key_points": content.key_points,
        "user_needs": content.user_needs,
    }
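IX. Summary
The document generation module described here has the following characteristics:
- Broadly applicable: a plugin architecture supports extension to new industries
- Compliance first: built-in industry compliance checks reduce legal risk
- LLM-enhanced: large language models automatically extract and refine content
- Flexible output: PDF, Word, HTML, Markdown, and other formats are supported
- Template-driven: content is separated from styling, so templates are easy to maintain and customize
- Traceable: every document is linked to its source conversation, which supports audits
- Seamlessly integrated: already wired into the MBE main application and usable out of the box
With this module, the MBE platform gains a complete "conversation → understanding → document" loop and can offer professional document generation services across industries.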