Make the emotion system roadmap realistic - reflect the 7 already-trained emotion models

2025-08-12 23:10:21 +09:00
commit f46bdfe399
parent cd0f9f5648
2 changed files with 378 additions and 73 deletions
--- a/plans/250808_감정시스템_현실적용_5단계_로드맵.md
+++ b/plans/250808_감정시스템_현실적용_5단계_로드맵.md
@@ -12,31 +12,35 @@

 ---

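-## Phase 1: Minimum Viable Implementation
+## Phase 1: Basic 7-Emotion Implementation (Training Already Complete)

 ### Goal
-"A minimal system in which emotion recognition works with 5 basic emotions"
+"Integrate the already-trained 7-emotion Korean model into skill-embedding"

 ### Implementation Scope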
 ```python
-# Basic emotions only
-BASIC_EMOTIONS = [
-    "joy",      # 기쁨
-    "sadness",  # 슬픔
-    "anger",    # 분노
-    "fear",     # 두려움
-    "disgust"   # 혐오
+# 7 emotions already trained on AI Hub data (Korean label in each comment)
+EMOTIONS = [
+    "fear",      # 공포 (basic emotion)
+    "surprise",  # 놀람 (basic emotion)
+    "anger",     # 분노 (basic emotion)
+    "sadness",   # 슬픔
+    "neutral",   # 중립
+    "happiness", # 행복
+    "disgust"    # 혐오
 ]
+# Model performance: F1 56.3%, Temperature Scaling 1.232

-# Simple entropy calculation
+# Entropy calculation (7 emotions)
 def calculate_entropy(probs: List[float]) -> float:
-    """Compute entropy from 5 probability values"""
+    """Compute entropy from 7 emotion probability values"""
    return -sum(p * log(p) for p in probs if p > 0)
 ```
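A quick check of the scale of this entropy may help when picking thresholds. The sketch below is illustrative only and not part of the commit; `entropy` and its `base` parameter are hypothetical names. It shows that for 7 classes the value is bounded by ln(7) ≈ 1.95 when computed with the natural log, and by log2(7) ≈ 2.81 when computed in bits.

```python
from math import e, log
from typing import List


def entropy(probs: List[float], base: float = 2.0) -> float:
    """Shannon entropy of a probability vector; zero entries contribute nothing."""
    return sum(-p * log(p, base) for p in probs if p > 0)


uniform = [1 / 7] * 7          # maximum uncertainty over 7 emotions
one_hot = [1.0] + [0.0] * 6    # fully confident prediction

max_bits = entropy(uniform, base=2.0)   # upper bound in bits
max_nats = entropy(uniform, base=e)     # upper bound with the natural log
min_value = entropy(one_hot)            # lower bound, identical in any base
```

Any threshold above the relevant upper bound can never fire, so the unit has to be fixed before a threshold is chosen.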

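 ### Tech Stack
-- **Embedding**: use the existing skill-embedding service (port 8502)
-- **Storage**: use the existing ChromaDB
+- **Emotion model**: based on klue/bert-base (already trained)
+- **Embedding**: extend the existing skill-embedding service (port 8515)
+- **Storage**: use the existing ChromaDB (add emotions to metadata)
 - **Decision-making**: ε-greedy (ε=0.1)
 - **Existing code**: reuse memory/storage.py from rb10508_micro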
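The ε-greedy rule listed in the tech stack can be sketched as follows. This is a minimal illustration, not the project's implementation: `choose_action` and the value table are hypothetical. With probability ε the agent explores a random action; otherwise it exploits the action with the highest estimated value.

```python
import random
from typing import Dict, Optional


def choose_action(values: Dict[str, float], epsilon: float = 0.1,
                  rng: Optional[random.Random] = None) -> str:
    """Pick a random action with probability epsilon, else the best-valued one."""
    rng = rng or random.Random()
    if rng.random() < epsilon:
        return rng.choice(list(values))   # explore
    return max(values, key=values.get)    # exploit


# Hypothetical value estimates for three response tones
estimates = {"empathetic": 0.8, "calm": 0.5, "creative": 0.3}
rng = random.Random(0)
picks = [choose_action(estimates, epsilon=0.1, rng=rng) for _ in range(1000)]
greedy_share = picks.count("empathetic") / len(picks)
```

With ε=0.1 and three actions, the best action is expected to be chosen roughly 93% of the time (90% exploitation plus a third of the exploration steps).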

@@ -45,10 +49,11 @@ def calculate_entropy(probs: List[float]) -> float:
 - Accuracy: user rating of 3.5/5.0
 - Memory: within 200MB

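-### Data Preparation
-- 100 samples per emotion (500 total)
-- Initial labels generated with Gemini
-- 20% verified manually
+### Data Preparation (Complete)
+- 38,594 samples from the AI Hub Korean dialogue dataset
+- Balanced distribution across the 7 emotions
+- Train/validation/test split complete
+- Class weights applied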

 ### Verification Method
 ```bash
@@ -63,18 +68,19 @@ curl -w "@curl-format.txt" http://localhost:8503/analyze
 ```

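 ### Deliverables
-- [ ] Add an emotion analysis endpoint to the skill-embedding service
-- [ ] Define 5 emotion prototypes
-- [ ] Basic entropy calculator
-- [ ] Integration code for the existing ChromaDB
-- [ ] Minimal test data (100 samples)
+- [x] 7-emotion model training complete (training_emotion)
+- [ ] Add the /analyze_emotion endpoint to the skill-embedding service
+- [ ] Apply Temperature Scaling (1.232)
+- [ ] Implement the entropy calculator
+- [ ] ChromaDB metadata integration
+- [ ] rb10508_micro integration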

 ---

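-## Phase 2: Performance Optimization
+## Phase 2: Performance Optimization and Integration

 ### Goal
-"Achieve a 200ms response time and build a caching system"
+"ONNX conversion, caching, and full rb10508_micro integration"

 ### Optimization Strategy
 ```python
@@ -115,77 +121,84 @@ stats.sort_stats('cumulative').print_stats(10)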
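 - Cache hit rate: 30%

 ### Deliverables
-- [ ] LRU cache system
+- [ ] ONNX model conversion (442MB → 150MB)
+- [ ] LRU cache system (5-minute TTL)
 - [ ] Batch processing API
-- [ ] ChromaDB index optimization
-- [ ] Performance monitoring dashboard
-- [ ] Profiling report
+- [ ] ChromaDB emotion metadata indexing
+- [ ] Performance monitoring (Grafana)
+- [ ] rb10508_micro emotion-based response tone adjustment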

 ---

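-## Phase 3: Adding Social-Function Emotions
+## Phase 3: Emotion Pattern Analysis and Personalization

 ### Goal
-"Expand to 9 emotions and introduce a two-head architecture"
+"Track long-term emotion patterns and build per-user emotion profiles"

-### Extended Emotions
+### Emotion Pattern Analysis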
 ```python
-# Social-function additions
-SOCIAL_EMOTIONS = [
-    "anxiety",       # 불안
-    "envy",          # 질투
-    "embarrassment", # 당혹
-    "ennui"          # 권태
-]
-
-# Two-head parallel processing
-async def two_head_analysis(text: str, context: dict):
-    basic_task = analyze_basic(text)      # 100ms target
-    social_task = analyze_social(text, context)  # 300ms target
+# Track emotions over time
+class EmotionTracker:
+    def __init__(self, user_id: str):
+        self.user_id = user_id
+        self.history = []  # time-series emotion data
    
-    basic, social = await asyncio.gather(basic_task, social_task)
+    def track(self, emotion_result: dict):
+        """Store an emotion result as a time-series entry"""
+        self.history.append({
+            "timestamp": datetime.now(),
+            "emotions": emotion_result["emotions"],
+            "dominant": emotion_result["dominant"],
+            "entropy": emotion_result["entropy"]
+        })
    
-    # Dynamic weight calculation
-    w = calculate_weight(len(text), context)
-    return w * basic + (1-w) * social
+    def get_pattern(self, period: str = "day"):
+        """Analyze emotion patterns by day, week, or month"""
+        # Dominant emotion per time slot
+        # Emotion change trend
+        # Entropy pattern
+        return analyze_temporal_pattern(self.history, period)
 ```
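`get_pattern` delegates to `analyze_temporal_pattern`, which the roadmap does not define. One possible shape, offered as a sketch under assumptions rather than the planned implementation: bucket the history by day, ISO week, or month, then report the most frequent dominant emotion and the mean entropy per bucket. The bucket key formats and the summary fields are hypothetical.

```python
from collections import Counter, defaultdict
from datetime import datetime
from statistics import mean
from typing import Dict, List


def bucket_key(ts: datetime, period: str) -> str:
    """Map a timestamp to the label of the day, ISO week, or month it falls in."""
    if period == "day":
        return ts.strftime("%Y-%m-%d")
    if period == "week":
        iso = ts.isocalendar()
        return f"{iso[0]}-W{iso[1]:02d}"
    if period == "month":
        return ts.strftime("%Y-%m")
    raise ValueError(f"unknown period: {period}")


def analyze_temporal_pattern(history: List[dict], period: str = "day") -> Dict[str, dict]:
    """Group history entries by period and summarize each bucket."""
    buckets = defaultdict(list)
    for entry in history:
        buckets[bucket_key(entry["timestamp"], period)].append(entry)

    summary = {}
    for key, entries in sorted(buckets.items()):
        counts = Counter(item["dominant"] for item in entries)
        summary[key] = {
            "count": len(entries),
            "top_emotion": counts.most_common(1)[0][0],
            "mean_entropy": mean(item["entropy"] for item in entries),
        }
    return summary


# Made-up history entries in the shape produced by EmotionTracker.track
history = [
    {"timestamp": datetime(2025, 8, 12, 9, 0), "dominant": "sadness", "entropy": 2.0},
    {"timestamp": datetime(2025, 8, 12, 18, 0), "dominant": "sadness", "entropy": 1.0},
    {"timestamp": datetime(2025, 8, 13, 9, 0), "dominant": "happiness", "entropy": 0.5},
]
daily = analyze_temporal_pattern(history, "day")
```

Because the summary is computed from the stored history alone, it fits the once-a-day batch job named in the performance targets.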

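-### Data Expansion
-- Add 200 samples per new emotion
-- 1,300 labeled samples in total
-- Consider crowdsourcing
+### Personalization Strategy
+- Build per-user emotion profiles
+- Learn from the emotion response history
+- Adjust emotion thresholds per user
+- Use entropy singularities (emergent responses)

-### Introducing Thompson Sampling
+### Entropy-Based Decision-Making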
 ```python
-class ThompsonSampler:
+class EntropyBasedDecision:
    def __init__(self):
-        self.alpha = np.ones(9)  # success count
-        self.beta = np.ones(9)   # failure count
+        # Singularity threshold; assumes entropy in bits (7-class max: log2(7) ≈ 2.81)
+        self.entropy_threshold = 2.5
    
-    def sample(self):
-        """Sample from the Beta distribution"""
-        return np.random.beta(self.alpha, self.beta)
+    def should_be_creative(self, entropy: float) -> bool:
+        """Respond creatively when entropy is high"""
+        return entropy > self.entropy_threshold
    
-    def update(self, action, reward):
-        """Update the parameters according to the outcome"""
-        if reward > 0:
-            self.alpha[action] += 1
-        else:
-            self.beta[action] += 1
+    def adjust_response(self, response: str, emotion_result: dict):
+        """Adjust the response tone according to the emotion"""
+        if emotion_result["dominant"] == "sadness":
+            return make_empathetic(response)
+        elif emotion_result["dominant"] == "anger":
+            return make_calm(response)
+        elif self.should_be_creative(emotion_result["entropy"]):
+            return make_creative(response)
+        return response
 ```

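 ### Performance Targets
-- Basic emotions: 100ms
-- Social function: 300ms
-- Combined response: 350ms
-- Accuracy: 4.0/5.0
+- Pattern analysis: batch processing once a day
+- Profile updates: real time
+- Emotion history: retained for 30 days
+- Personalization accuracy: 70% or higher

 ### Deliverables
-- [ ] 9 emotion prototypes
-- [ ] Two-head parallel processing system
-- [ ] Thompson Sampling implementation
-- [ ] 1,300 labeled samples
-- [ ] A/B test results
+- [ ] Emotion pattern analyzer
+- [ ] User emotion profile DB
+- [ ] Entropy-based decision module
+- [ ] Time-series emotion visualization
+- [ ] Personalized response strategy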

 ---

--- a/plans/250812_감정시스템_Phase1_실행계획.md
+++ b/plans/250812_감정시스템_Phase1_실행계획.md
@@ -0,0 +1,292 @@
+# Emotion System Phase 1 Execution Plan
+
+Date: August 12, 2025  
+Author: Claude (server 51123)  
+Status: Ready for execution
+
+## 1. Current State Analysis
+
+### 1.1 Existing Assets
+- **Trained model**: 7-emotion model in training_emotion (klue/bert-base)
+- **Model performance**: F1 56.3%, Temperature Scaling 1.232
+- **Infrastructure**: skill-embedding service in operation (FastAPI, ONNX Runtime)
+- **Resource headroom**: CPU at 0.05%, memory at 873MB/2GB
+
+### 1.2 Tech Stack Check
+```
+✅ transformers 4.45.2 installed
+✅ FastAPI structure established
+✅ ONNX Runtime supported
+✅ ChromaDB metadata storage available
+```
+
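+## 2. Emotion Model Details
+
+### 2.1 7-Emotion Scheme
+```python
+emotions = [
+    'fear',      # 공포 (immediate basic emotion)
+    'surprise',  # 놀람 (immediate basic emotion)
+    'anger',     # 분노 (immediate basic emotion)
+    'sadness',   # 슬픔 (emotional response)
+    'happiness', # 행복 (emotional response)
+    'disgust',   # 혐오 (emotional response)
+    'neutral'    # 중립 (balanced state)
+]
+```
+
+### 2.2 Connection to the 로빙 Philosophy
+- **Basic emotions (100ms)**: fear, surprise, anger → immediate reaction
+- **Social function (500ms)**: sadness, happiness, disgust → deliberate reaction
+- **Entropy singularity**: emergent response when entropy is high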
+
+## 3. Implementation Architecture
+
+### 3.1 Service Extension (Option A - selected)
+```
+skill-embedding (port 8515)
+├── /embed (existing)
+└── /analyze_emotion (new)
+    ├── Input: text, user_id (optional)
+    ├── Processing: 7-emotion analysis
+    └── Output: emotions, dominant, entropy, confidence
+```
+
+### 3.2 API Design
+```json
+// Request
+POST /analyze_emotion
+{
+    "text": "오늘 프로젝트가 실패했어요",
+    "user_id": "optional_for_caching"
+}
+
+// Response
+{
+    "emotions": {
+        "fear": 0.15,
+        "surprise": 0.05,
+        "anger": 0.25,
+        "sadness": 0.35,
+        "neutral": 0.10,
+        "happiness": 0.05,
+        "disgust": 0.05
+    },
+    "dominant": "sadness",
+    "entropy": 2.42,
+    "confidence": 0.35,
+    "processing_time_ms": 87
+}
+```
+
+## 4. Execution Plan
+
+### Phase 1-A: Basic Integration (Day 1-2)
+
+#### Server 51123 Tasks
+```bash
+# 1. Prepare the model
+sudo mkdir -p /opt/models/emotion
+sudo cp -r /path/to/training_emotion/outputs/aihub-7emotions-complete/* \
+         /opt/models/emotion/
+sudo chown -R admin:admin /opt/models/emotion
+```
+
+#### Local Developer Tasks
+```python
+# 2. Write emotion_analyzer.py
+from transformers import AutoModelForSequenceClassification, AutoTokenizer
+import torch
+import numpy as np
+
+class EmotionAnalyzer:
+    def __init__(self, model_path="/opt/models/emotion"):
+        self.model = AutoModelForSequenceClassification.from_pretrained(model_path)
+        self.tokenizer = AutoTokenizer.from_pretrained(model_path)
+        self.temperature = 1.232  # from calibration
+        self.emotions = ['fear', 'surprise', 'anger', 'sadness', 
+                        'neutral', 'happiness', 'disgust']
+    
+    async def analyze(self, text: str) -> dict:
+        inputs = self.tokenizer(text, return_tensors="pt", 
+                               max_length=512, truncation=True)
+        
+        with torch.no_grad():
+            logits = self.model(**inputs).logits
+            probs = torch.softmax(logits / self.temperature, dim=-1)
+        
+        emotion_scores = {
+            emotion: float(probs[0][i]) 
+            for i, emotion in enumerate(self.emotions)
+        }
+        
+        dominant = max(emotion_scores, key=emotion_scores.get)
+        entropy = self._calculate_entropy(list(emotion_scores.values()))
+        
+        return {
+            "emotions": emotion_scores,
+            "dominant": dominant,
+            "entropy": entropy,
+            "confidence": emotion_scores[dominant]
+        }
+    
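+    def _calculate_entropy(self, probs):
+        # Shannon entropy in bits (log base 2). The 2.8 fallback threshold in section 6.2
+        # is only reachable in bits: the 7-class maximum is log2(7) ≈ 2.81, but ln(7) ≈ 1.95.
+        probs = np.array(probs)
+        probs = probs[probs > 0]
+        return float(-np.sum(probs * np.log2(probs)))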
+```
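Temperature scaling, applied in `analyze` as `logits / self.temperature`, divides the logits by a scalar T before the softmax. With T = 1.232 > 1 the distribution is flattened: the predicted class stays the same, but its confidence drops. The dependency-free sketch below illustrates this with made-up logits; `softmax_with_temperature` is a hypothetical helper, not part of emotion_analyzer.py.

```python
import math
from typing import List


def softmax_with_temperature(logits: List[float], temperature: float = 1.0) -> List[float]:
    """Softmax of logits / temperature, computed stably by subtracting the max."""
    scaled = [z / temperature for z in logits]
    peak = max(scaled)
    exps = [math.exp(z - peak) for z in scaled]
    total = sum(exps)
    return [x / total for x in exps]


# Hypothetical logits for the 7 emotions (not real model output)
logits = [0.2, -1.0, 1.1, 2.3, 0.0, -0.8, -0.9]

raw = softmax_with_temperature(logits, temperature=1.0)
calibrated = softmax_with_temperature(logits, temperature=1.232)

top_raw = raw.index(max(raw))
top_calibrated = calibrated.index(max(calibrated))
```

Because `confidence` is the probability of the dominant class, any confidence threshold should be tuned on the calibrated probabilities rather than the raw ones.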
+
+#### Server 51124 Tasks
+```yaml
+# 3. Modify docker-compose.yml
+services:
+  skill-embedding:
+    volumes:
+      - /opt/models:/opt/models:ro
+    environment:
+      - EMOTION_MODEL_PATH=/opt/models/emotion
+```
+
+### Phase 1-B: Optimization (Day 3-4)
+
+#### ONNX Conversion
+```bash
+# Local developer
+python convert_to_onnx.py \
+    --model_path /opt/models/emotion \
+    --output_path /opt/models/emotion-onnx \
+    --optimize
+```
+
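+#### Caching Implementation
+```python
+import hashlib
+import time
+from collections import OrderedDict
+
+class CachedEmotionAnalyzer(EmotionAnalyzer):
+    """LRU cache with a TTL; functools.lru_cache cannot expire entries or wrap async calls."""
+
+    def __init__(self, model_path="/opt/models/emotion", ttl=300, maxsize=1000):
+        super().__init__(model_path)
+        self.ttl, self.maxsize = ttl, maxsize  # ttl=300 s matches the roadmap's 5-minute TTL
+        self._cache = OrderedDict()  # text_hash -> (expiry, result)
+
+    async def analyze(self, text: str, user_id: str = None):
+        # The result depends only on the text, so user_id is not part of the key
+        text_hash = hashlib.md5(text.encode()).hexdigest()
+        entry = self._cache.get(text_hash)
+        if entry and entry[0] > time.monotonic():
+            self._cache.move_to_end(text_hash)  # mark as most recently used
+            return entry[1]
+        result = await super().analyze(text)
+        self._cache[text_hash] = (time.monotonic() + self.ttl, result)
+        self._cache.move_to_end(text_hash)
+        if len(self._cache) > self.maxsize:
+            self._cache.popitem(last=False)  # evict the least recently used entry
+        return result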
+```
+
+### Phase 1-C: Integration (Day 5)
+
+#### rb10508_micro Integration
+```python
+import json
+from datetime import datetime
+
+# When processing a Slack message
+async def process_slack_message(text: str, user_id: str):
+    # 1. Emotion analysis
+    emotion_result = await call_emotion_api(text, user_id)
+    
+    # 2. Build ChromaDB metadata (values must be scalars, so the emotions dict is stored as JSON)
+    metadata = {
+        "user_id": user_id,
+        "timestamp": datetime.now().isoformat(),
+        "emotions": json.dumps(emotion_result["emotions"]),
+        "dominant_emotion": emotion_result["dominant"],
+        "emotional_entropy": emotion_result["entropy"]
+    }
+    
+    # 3. Adjust the response tone
+    response_tone = adjust_tone_by_emotion(emotion_result["dominant"])
+    
+    return generate_response(text, tone=response_tone)
+```
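ChromaDB metadata values are scalars (strings, numbers, booleans), so the per-emotion scores cannot be stored as a nested dict. One way to keep each score individually filterable is to flatten them into separate keys. The sketch below assumes that approach; `flatten_emotion_metadata` and the `emotion_` key prefix are hypothetical and not part of rb10508_micro.

```python
from typing import Dict, Union

Scalar = Union[str, int, float, bool]


def flatten_emotion_metadata(user_id: str, timestamp: str, result: dict) -> Dict[str, Scalar]:
    """Turn a nested emotion result into a flat dict of scalar metadata values."""
    metadata: Dict[str, Scalar] = {
        "user_id": user_id,
        "timestamp": timestamp,
        "dominant_emotion": result["dominant"],
        "emotional_entropy": float(result["entropy"]),
    }
    # One key per emotion, e.g. "emotion_sadness": 0.35 (the prefix is an assumption)
    for name, score in result["emotions"].items():
        metadata[f"emotion_{name}"] = float(score)
    return metadata


# Example result in the shape of the /analyze_emotion response
example = {
    "emotions": {"fear": 0.15, "surprise": 0.05, "anger": 0.25, "sadness": 0.35,
                 "neutral": 0.10, "happiness": 0.05, "disgust": 0.05},
    "dominant": "sadness",
    "entropy": 2.42,
}
flat = flatten_emotion_metadata("U123", "2025-08-13T09:00:00", example)
```

Flat numeric keys also make it possible to filter stored messages by a single emotion score, which a serialized blob would not allow.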
+
+## 5. Performance Targets and Measurement
+
+### 5.1 Target Metrics
+- **Response time**: < 200ms (cache miss), < 50ms (cache hit)
+- **Accuracy**: perceived accuracy > 70%
+- **Memory**: < 1.5GB total usage
+- **Throughput**: > 10,000 requests/day
+
+### 5.2 Measurement Method
+```bash
+# Performance test
+curl -w "@curl-format.txt" \
+     -X POST http://localhost:8515/analyze_emotion \
+     -H "Content-Type: application/json" \
+     -d '{"text":"테스트 메시지"}'
+
+# Load test
+locust -f tests/load_test.py --users 10 --spawn-rate 2
+```
+
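+## 6. Risk Management
+
+### 6.1 Potential Risks
+| Risk | Impact | Mitigation |
+|------|--------|------------|
+| Model size (442MB) | Medium | Reduce to 150MB via ONNX conversion (a reduction this large typically also requires quantization) |
+| Insufficient memory | Medium | Model sharing, cache optimization |
+| Low F1 score (56.3%) | Low | Set a confidence threshold |
+| Cold start | Low | Preload the model at service startup |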
+
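+### 6.2 Fallback Strategy
+- Classify as neutral when confidence < 0.3
+- Flag as "complex emotion" when entropy > 2.8 (reachable only if entropy is measured in bits: the 7-class maximum is log2(7) ≈ 2.81 bits, or ln(7) ≈ 1.95 nats)
+- On error, skip emotion analysis and continue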
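The three fallback rules can be sketched as a thin wrapper around the analyzer. This is an illustration under assumptions: `apply_fallback`, `safe_analyze`, and the `complex` flag are hypothetical names, entropy is assumed to be in bits, and the analyzer is passed in as a plain synchronous callable to keep the example self-contained.

```python
from typing import Callable, Optional

CONFIDENCE_FLOOR = 0.3   # below this, treat the message as neutral
ENTROPY_CEILING = 2.8    # above this (in bits), flag as a complex emotion


def apply_fallback(result: dict) -> dict:
    """Return a copy of the analysis result with the fallback rules applied."""
    adjusted = dict(result)
    adjusted["complex"] = result["entropy"] > ENTROPY_CEILING
    if result["confidence"] < CONFIDENCE_FLOOR:
        adjusted["dominant"] = "neutral"
    return adjusted


def safe_analyze(analyze: Callable[[str], dict], text: str) -> Optional[dict]:
    """Run the analyzer; on any error return None so the caller can skip emotions."""
    try:
        return apply_fallback(analyze(text))
    except Exception:
        return None


# Stub analyzers standing in for the real model
def low_confidence(text: str) -> dict:
    return {"dominant": "anger", "confidence": 0.25, "entropy": 2.1}


def broken(text: str) -> dict:
    raise RuntimeError("model not loaded")


fallback_result = safe_analyze(low_confidence, "hello")
skipped = safe_analyze(broken, "hello")
```

Returning `None` on failure lets the caller proceed with response generation without any emotion-based tone adjustment.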
+
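+## 7. Division of Work
+
+### Server 51123 (System Administration)
+- [ ] Place model files in /opt/models
+- [ ] Set up the deployment pipeline
+- [ ] System monitoring
+- [ ] Documentation
+
+### Local Developer (Code Implementation)
+- [ ] Write emotion_analyzer.py
+- [ ] Add the FastAPI endpoint
+- [ ] ONNX conversion script
+- [ ] rb10508_micro integration
+
+### Server 51124 (Service Operations)
+- [ ] Build the Docker image
+- [ ] Deploy the service
+- [ ] Log monitoring
+- [ ] Performance measurement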
+
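+## 8. Verification Plan
+
+### 8.1 Unit Tests
+```python
+import asyncio
+
+def test_emotion_analysis():
+    analyzer = EmotionAnalyzer()
+    # analyze() is a coroutine, so run it to completion before inspecting the result
+    result = asyncio.run(analyzer.analyze("정말 기쁜 하루였어요!"))
+    assert result["dominant"] == "happiness"
+    assert result["confidence"] > 0.5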
+```
+
+### 8.2 Integration Tests
+- Slack message → emotion analysis → ChromaDB storage
+- Test 100 texts covering a variety of emotions
+- Measure response time
+
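+## 9. Next Steps (Phase 2 Preparation)
+
+- Define an emotion-based conversation strategy
+- Long-term emotion pattern analysis
+- Per-user emotion profiles
+- Multimodal emotion analysis (image, voice)
+
+## 10. Success Criteria
+
+✅ Phase 1 completion conditions:
+1. The 7-emotion analysis API works correctly
+2. Average response time < 200ms
+3. rb10508_micro integration complete
+4. ChromaDB metadata storage verified
+5. 100 test cases pass
+
+---
+
+**Start date**: August 13, 2025 (planned)  
+**Completion date**: August 17, 2025 (target)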