From f3c062ce019f9b1539f54c8c3b0a47ac4c971c30 Mon Sep 17 00:00:00 2001
From: Claude-51124 <claude@51124.local>
Date: Sat, 3 Jan 2026 12:13:02 +0900
Subject: [PATCH] docs: document hybrid intent classification performance comparison test results
MIME-Version: 1.0
Content-Type: text/plain; charset=UTF-8
Content-Transfer-Encoding: 8bit

- Record test results: FastPath 49.6%, zero-shot embedding 23.4%, parallel comparison 49.6%
- Propose improvements: multi-centroid embeddings and few-shot LLM prompts
- Update the plan and research documents
---
 ...251017_intent_analysis_improvement_plan.md |  41 ++++---
 .../research/intent_classification/README.md  |  15 ++-
 ..._의도_분류_성능_비교_테스트.md | 106 ++++++++++++++++++
 3 files changed, 146 insertions(+), 16 deletions(-)
 create mode 100644 journey/troubleshooting/260103_하이브리드_의도_분류_성능_비교_테스트.md

diff --git a/journey/plans/archive/251017_intent_analysis_improvement_plan.md b/journey/plans/archive/251017_intent_analysis_improvement_plan.md
index 5894988..d2d9725 100644
--- a/journey/plans/archive/251017_intent_analysis_improvement_plan.md
+++ b/journey/plans/archive/251017_intent_analysis_improvement_plan.md
@@ -43,32 +43,43 @@
 
 ## Not yet implemented: hybrid system
 
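-### Proposed structure
+**Test results (2026-01-03)**: `troubleshooting/260103_하이브리드_의도_분류_성능_비교_테스트.md`
+- FastPath: 49.6% (72ms)
+- Zero-shot embedding: 23.4% (80ms), needs improvement
+- Parallel comparison: 49.6% (154ms), no accuracy gain over FastPath at about twice the latency
+
+### Improved structure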
 ```
 User message
   ↓
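 Stage 1: regex FastPath (unambiguous patterns)
-  ↓ no match
-Stage 2: narrow candidates with embeddings (Top-3)
+  ↓ no match, or confidence < 0.9
+Stage 2: multi-centroid embedding (generates Top-3 candidates)
   ↓ confidence < 0.7
-Stage 3: zero-shot LLM classification
+Stage 3: few-shot LLM classification (Top-3 candidates + examples)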
 ```
 
 ### Required work
 
-**1. Implement SemanticIntentClassifier**
-- File: `app/services/brain/semantic_classifier.py`
-- Use the intent_prototypes table
-- Select Top-3 candidates by embedding similarity
+**1. Introduce multi-centroid prototypes**
+- File: extend `scripts/seed_intent_runtime.py`
+- Build K-means centroids per intent from 5-10 example sentences
+- Follow the approach used in `seed_calendar_event_samples.py`
 
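-**2. LLM fallback**
-- Pass the Top-3 candidates to the LLM
-- Return CLARIFY when confidence < 0.5
+**2. Initialize the intent prototypes DB**
+- Re-embed with 768-dimensional Ko-SRoBERTa
+- Store in the intent_prototypes table with version=2
+- Resolve the embedding dimension mismatch (384d vs. 768d)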
 
-**3. Performance optimization**
-- Regex: 80% of cases (< 10ms)
-- Embedding: 15% of cases (< 200ms)
-- LLM: 5% of cases (1-2s)
+**3. Improve the few-shot LLM prompt**
+- File: `app/services/llm/intent_parser.py`
+- Add few-shot examples built from the Top-3 candidates
+- Apply the Gemini prompt design principles (`313_Gemini_프롬프트_설계_원칙.md`)
+
+**4. Performance targets**
+- FastPath: 80% of cases (< 10ms), unchanged
+- Multi-centroid embedding: 75%+ accuracy (< 200ms), target
+- Few-shot LLM: 5% of cases (1-2s), to be optimized
 
 ---
 
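The improved three-stage flow in the plan above can be sketched as plain routing logic. This is a minimal illustration under stated assumptions, not code from the repository. `route_intent`, `IntentResult`, and the three injected callables are hypothetical stand-ins for the regex FastPath, the multi-centroid classifier, and the few-shot LLM parser. Only the 0.9 and 0.7 thresholds come from the plan.

```python
from dataclasses import dataclass
from typing import Callable, List, Optional, Tuple

# Hypothetical sketch; only the two thresholds below come from the plan.
FASTPATH_MIN_CONFIDENCE = 0.9   # below this, fall through to stage 2
EMBEDDING_MIN_CONFIDENCE = 0.7  # below this, fall through to stage 3

Scored = Tuple[str, float]  # (intent name, confidence)


@dataclass
class IntentResult:
    intent: str
    confidence: float
    stage: str  # "fastpath", "embedding", or "llm"


def route_intent(
    message: str,
    fastpath: Callable[[str], Optional[Scored]],
    embed_top_k: Callable[[str, int], List[Scored]],
    few_shot_llm: Callable[[str, List[str]], Scored],
) -> IntentResult:
    # Stage 1: regex FastPath; accept only a confident match.
    hit = fastpath(message)
    if hit is not None and hit[1] >= FASTPATH_MIN_CONFIDENCE:
        return IntentResult(hit[0], hit[1], "fastpath")

    # Stage 2: multi-centroid embedding returns up to 3 candidates, best first.
    candidates = embed_top_k(message, 3)
    if candidates and candidates[0][1] >= EMBEDDING_MIN_CONFIDENCE:
        intent, score = candidates[0]
        return IntentResult(intent, score, "embedding")

    # Stage 3: the few-shot LLM chooses among the Top-3 candidate intents.
    intent, score = few_shot_llm(message, [name for name, _ in candidates])
    return IntentResult(intent, score, "llm")
```

Passing the stages in as callables keeps the thresholds testable without a database or an LLM. In the service they would presumably wrap `DecisionEngine.analyze_intent()` and `SemanticIntentClassifier.top_k()`, but that wiring is an assumption.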
diff --git a/journey/research/intent_classification/README.md b/journey/research/intent_classification/README.md
index 7b9e9af..8736a7f 100644
--- a/journey/research/intent_classification/README.md
+++ b/journey/research/intent_classification/README.md
@@ -44,6 +44,7 @@
 ### 3. Few-shot Learning with User Feedback
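 - **Trend**: rapid adaptation from a small amount of user feedback
 - **Application in 로빙**: evaluate few-shot learning techniques for the retraining batch
+- **Prompt improvement**: per the Gemini prompt design principles (`book/300_architecture/313_Gemini_프롬프트_설계_원칙.md`), few-shot examples are more effective than zero-shot prompting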
 
 ## Adoption status in the 로빙 project
 
@@ -62,12 +63,22 @@
 - [ ] Apply few-shot learning techniques
 - [ ] Integrate implicit feedback (connect to conversation_service)
 
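+### Performance comparison test (2026-01-03)
+- [x] FastPath vs. zero-shot embedding vs. parallel comparison test ✅
+- **Results**: FastPath 49.6%, zero-shot embedding 23.4%, parallel comparison 49.6%
+- **Needs improvement**: introduce multi-centroid prototypes, improve the few-shot LLM prompt
+- Details: `troubleshooting/260103_하이브리드_의도_분류_성능_비교_테스트.md`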
+
 ## Implementation documents
 
 - [Retraining pipeline design](./retraining_pipeline_plan.md)
 - [Active learning query strategy](./active_learning_query_strategy.md)
 - [Implicit feedback collection](./implicit_feedback.md)
 
+## Performance comparison tests
+
+- [Hybrid intent classification performance comparison](../../troubleshooting/260103_하이브리드_의도_분류_성능_비교_테스트.md): FastPath vs. zero-shot embedding vs. parallel comparison (2026-01-03)
+
 ## Related research areas
 
 - [Memory/Classification](../memory/classification/): BERT embeddings, hybrid LLM-ML classification
@@ -83,5 +94,7 @@
 
 ---
 
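-**Updated**: 2025-11-17 - HITL feedback pipeline implemented, supporting papers added
+**Updated**:
+- 2025-11-17: HITL feedback pipeline implemented, supporting papers added
+- 2026-01-03: added hybrid intent classification performance comparison results and the few-shot LLM prompt improvement plan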
 
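Both documents name multi-centroid prototypes as the fix for the 23.4% zero-shot result. The sketch below shows the idea under simplifying assumptions:

- The 5-10 example sentences per intent are already embedded as plain float vectors. The plan calls for 768-dimensional Ko-SRoBERTa embeddings stored in `intent_prototypes`, which this code does not touch.
- K-means is a small pure-Python version seeded with the first `k` points.
- `kmeans`, `build_prototypes`, and `top_k` are hypothetical names.

A query's score for an intent is its best cosine similarity to any of that intent's centroids.

```python
import math
from typing import Dict, List, Tuple

# Hypothetical sketch: vectors stand in for precomputed sentence embeddings.
Vector = List[float]


def cosine(a: Vector, b: Vector) -> float:
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


def mean(vectors: List[Vector]) -> Vector:
    return [sum(col) / len(vectors) for col in zip(*vectors)]


def kmeans(points: List[Vector], k: int, iterations: int = 20) -> List[Vector]:
    """Plain Euclidean K-means; the first k points are the initial centroids."""
    k = min(k, len(points))
    centroids = [list(p) for p in points[:k]]
    for _ in range(iterations):
        clusters: List[List[Vector]] = [[] for _ in range(k)]
        for p in points:
            nearest = min(range(k), key=lambda i: math.dist(p, centroids[i]))
            clusters[nearest].append(p)
        # Keep the previous centroid if a cluster ends up empty.
        centroids = [mean(c) if c else centroids[i] for i, c in enumerate(clusters)]
    return centroids


def build_prototypes(examples: Dict[str, List[Vector]], k: int = 2) -> Dict[str, List[Vector]]:
    # Several centroids per intent instead of a single description vector.
    return {intent: kmeans(vectors, k) for intent, vectors in examples.items()}


def top_k(query: Vector, prototypes: Dict[str, List[Vector]], k: int = 3) -> List[Tuple[str, float]]:
    # An intent's score is its best cosine similarity over all of its centroids.
    scores = [
        (intent, max(cosine(query, c) for c in centroids))
        for intent, centroids in prototypes.items()
    ]
    return sorted(scores, key=lambda s: s[1], reverse=True)[:k]
```

Several centroids per intent let one intent cover distinct clusters of phrasing that a single description vector would average together. Here `k` is a tuning knob, not a value from the plan. A production version would more likely use an existing K-means implementation with better seeding.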
diff --git a/journey/troubleshooting/260103_하이브리드_의도_분류_성능_비교_테스트.md b/journey/troubleshooting/260103_하이브리드_의도_분류_성능_비교_테스트.md
new file mode 100644
index 0000000..240eff3
--- /dev/null
+++ b/journey/troubleshooting/260103_하이브리드_의도_분류_성능_비교_테스트.md
@@ -0,0 +1,106 @@
+# Hybrid Intent Classification Performance Comparison Test
+
+**Date**: 2026-01-03
+**Author**: Auto
+**Related files**: `rb8001/scripts/test_intent_classification_comparison.py`, `rb8001/app/services/brain/decision_engine.py`, `rb8001/app/services/brain/semantic_classifier.py`
+
+---
+
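+## Problem
+
+To improve the hybrid intent classification system, we needed to measure and compare three approaches: FastPath (regex), zero-shot embedding, and parallel comparison of the two.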
+
+---
+
+## Test Method
+
+**Test data**: 141 questions
+- `tests/data/intent_eval_samples.json`: 95
+- `tests/data/intent_eval_challenge.json`: 28
+- Previously failed question patterns: 18
+
+**Methods**:
+1. FastPath only: `DecisionEngine.analyze_intent()` (regex-based)
+2. Zero-shot embedding only: `SemanticIntentClassifier.top_k()` plus its confidence score
+3. Parallel comparison: run both FastPath and the embedding classifier, then select a result by comparing their confidences
+
+---
+
+## Results
+
+| Method | Accuracy | Avg. response time | Accuracy rank |
+|------|--------|---------------|------------|
+| FastPath only | 49.6% (70/141) | 72.0ms | 1 (tie) |
+| Zero-shot embedding only | 23.4% (33/141) | 80.4ms | 3 |
+| Parallel comparison | 49.6% (70/141) | 153.6ms | 1 (tie) |
+
+---
+
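+## Analysis
+
+### 1. Poor zero-shot embedding accuracy (23.4%)
+
+**Causes**:
+- The `intent_prototypes` DB is not initialized, or there is a 384d→768d embedding dimension mismatch
+- Each prototype is derived from a single description, so similarity scores are unreliable
+- IntentType enum values do not match the intent names stored in intent_prototypes
+
+**Against the research baseline**: 75% accuracy is reported as achievable with a multi-centroid approach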
+
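+### 2. Parallel comparison brings no gain
+
+**Causes**:
+- Zero-shot embedding accuracy is so low that only FastPath results end up being selected
+- When the two results disagree, the required confidence gap of 0.3 or more is too high
+- FastPath is selected in almost every case, so running both adds latency without changing the outcome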
+
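+### 3. FastPath limitations (49.6%)
+
+**Common error patterns**:
+- "핀테크 업계 오늘 기사 검색" ("search today's fintech industry articles") → web_search (expected: news_fetch)
+- "어제 받은 메일 두 줄로 정리" ("summarize the emails I got yesterday in two lines") → email_read (expected: email_summary)
+- Similar intents are hard to tell apart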
+
+---
+
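+## Improvements
+
+### 1. Introduce multi-centroid prototypes
+- Build K-means centroids per intent from 5-10 real example sentences
+- Extend the `seed_calendar_event_samples.py` approach to every intent
+- Store the centroids in the intent_prototypes DB with version=2
+
+### 2. Improve the few-shot LLM prompt
+- Add few-shot examples based on the Top-3 embedding candidates
+- Apply the Gemini prompt design principles (`313_Gemini_프롬프트_설계_원칙.md`)
+- Provide the example blocks in an XML-structured prompt
+
+### 3. Optimize the three-stage hybrid pipeline
+- FastPath (unambiguous patterns) → multi-centroid embedding (Top-3) → few-shot LLM
+- Revisit parallel comparison once multi-centroid accuracy improves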
+
+---
+
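+## Lessons Learned
+
+### Why production testing matters
+- Always check how actual performance compares with the theoretical numbers in research documents
+- Verify in advance that the intent_prototypes DB is initialized and that its embedding dimensions match the model
+
+### Use few-shot prompting
+- Few-shot examples are more effective than zero-shot LLM prompting (`313_Gemini_프롬프트_설계_원칙.md`)
+- Supplying the Top-3 candidates as context can improve accuracy
+
+### Incremental improvement strategy
+- Keep FastPath (handles unambiguous patterns; 49.6% accuracy)
+- Enable parallel comparison only after multi-centroid embedding reaches 75%+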
+
+---
+
+## References
+
+- Plan document: `journey/plans/archive/251017_intent_analysis_improvement_plan.md`
+- Research document: `journey/research/intent_classification/README.md`
+- Prompt design principles: `book/300_architecture/313_Gemini_프롬프트_설계_원칙.md`
+- Test results: `rb8001/tests/results/intent_classification_comparison.json`
+
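The troubleshooting note's parallel-comparison rule and accuracy figures can be illustrated with a short sketch. The comparison script (`test_intent_classification_comparison.py`) is not part of this patch, so the rule below is an assumption. When the two classifiers disagree, the embedding result replaces the FastPath result only if its confidence is higher by at least 0.3. `choose_parallel`, `accuracy`, and the `margin` parameter are hypothetical names.

```python
from typing import Optional, Sequence, Tuple

# Assumed rule; the actual comparison script is not shown in this patch.
Scored = Tuple[str, float]  # (intent name, confidence)


def choose_parallel(fastpath: Optional[Scored], embedding: Optional[Scored],
                    margin: float = 0.3) -> Optional[str]:
    """Pick between the FastPath and embedding results."""
    if fastpath is None:
        return embedding[0] if embedding else None
    if embedding is None or embedding[0] == fastpath[0]:
        return fastpath[0]
    # Disagreement: switch to the embedding only if it wins by the full margin.
    if embedding[1] - fastpath[1] >= margin:
        return embedding[0]
    return fastpath[0]


def accuracy(predicted: Sequence[Optional[str]], expected: Sequence[str]) -> float:
    # Fraction of questions whose predicted intent equals the labeled intent.
    correct = sum(p == e for p, e in zip(predicted, expected))
    return correct / len(expected)
```

With weak embedding confidences, a 0.3 margin almost never flips the decision. This is consistent with parallel comparison scoring exactly the same 70/141 (49.6%) as FastPath alone. Lowering `margin` only helps once the embedding stage is itself more accurate.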