目标:
回答一个问题:
“How do you validate an AI system?”
结构:
1. Validation Scope
- Model vs System
- LLM / RAG / Agent
2. Testing Methods
(1) Benchmarking
- 标准数据集测试
(2) Scenario Testing
- stress cases
(3) Red Teaming(重点)
- adversarial prompts
- jailbreak
(4) Sensitivity Analysis
- prompt variation
- temperature变化
3. Metrics(AI特有)
- Accuracy(有限)
- Consistency
- Hallucination rate
- Robustness
4. Limitations Assessment
- known limitations
- undocumented risks
