Claude-51124 22557e7132 docs: archive old troubleshooting documents and reorganize structure

- Moved 12 initial-setup documents from July-August to _archive/troubleshooting/2025_07-08_initial_setup/
- Moved book/300_architecture/390_human_in_the_loop_intent_learning.md to journey/research/intent_classification/ (development-journey document)
- Removed empty folders (journey/assets/*)

2025-11-17 14:06:05 +09:00

3.0 KiB


Active Learning Query Strategies

Written: 2025-11-17
Purpose: Make the review queue more efficient by deciding which items should be labeled first

Overview

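The core idea of Active Learning is to label the samples the model is most uncertain about first, so that model performance improves quickly with relatively little labeled data. In the 로빙 project, items that enter the review queue are sorted by priority so that administrators can label them efficiently.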

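Implemented Strategies

1. Uncertainty Sampling (entropy-based)

Principle: The higher the entropy of the predicted distribution, the more uncertain the model is about the case.

Computation: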

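entropy = -Σ(p * log2(p))                  # Shannon entropy (bits) of the predicted distribution
uncertainty_score = entropy / max_entropy  # normalized to 0.0-1.0; max_entropy = log2(number of intents)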

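Characteristics:

Higher normalized entropy (on a 0.0 to 1.0 scale) means higher uncertainty
Cases where probability is spread across several intents get higher priority
Example: calendar_query: 0.4, calendar_event: 0.35, document_analysis: 0.25 → high uncertainty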
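The entropy-based score can be sketched in Python as follows. This is a minimal, non-authoritative sketch: it assumes each prediction is available as a dict mapping intent name to probability (summing to 1) and that `max_entropy` is `log2(K)` for `K` intents. The function names mirror those listed under Code Structure, but these signatures are assumptions, not the project's actual API.

```python
import math


def calculate_entropy(probs: dict[str, float]) -> float:
    """Shannon entropy (bits) of an intent -> probability distribution."""
    # Zero-probability intents contribute nothing (p * log2(p) -> 0 as p -> 0).
    return -sum(p * math.log2(p) for p in probs.values() if p > 0)


def calculate_uncertainty_score(probs: dict[str, float]) -> float:
    """Entropy normalized to [0, 1] by the maximum possible entropy log2(K)."""
    k = len(probs)
    if k < 2:
        # With zero or one candidate intent there is nothing to be uncertain about.
        return 0.0
    max_entropy = math.log2(k)
    return calculate_entropy(probs) / max_entropy


# Assumed example inputs: the spread-out case from the text and a peaked one.
spread = {"calendar_query": 0.4, "calendar_event": 0.35, "document_analysis": 0.25}
peaked = {"calendar_query": 0.9, "calendar_event": 0.05, "document_analysis": 0.05}
print(round(calculate_uncertainty_score(spread), 3))
print(round(calculate_uncertainty_score(peaked), 3))
```

Under these assumptions the spread-out example scores about 0.98, while the peaked 0.9 / 0.05 / 0.05 distribution scores about 0.36, so the former would be reviewed first.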

2. Margin Sampling (margin-based)

Principle: The smaller the gap between the top-1 and top-2 intent scores, the more ambiguous the model's decision.

Computation:

margin = top1_score - top2_score
margin_score = 1.0 - margin  # a small margin yields a high score

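Characteristics:

The smaller the margin (the closer to 0.0), the higher the uncertainty
Cases where the top two intents have nearly equal probability get higher priority
Example: calendar_query: 0.45 vs calendar_event: 0.40 → margin 0.05, margin_score 0.95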
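A minimal sketch of the margin score, under the same assumption that a prediction is an intent-to-probability dict. How the real `calculate_margin_score` handles a single intent or an empty distribution is not stated above; here a lone intent is compared against 0.0 and an empty dict scores 0.0, both labeled as assumptions.

```python
def calculate_margin_score(probs: dict[str, float]) -> float:
    """1 - (top1 - top2): close to 1.0 when the two best intents nearly tie."""
    if not probs:
        return 0.0  # assumption: nothing to rank means no review priority
    scores = sorted(probs.values(), reverse=True)
    top1 = scores[0]
    # Assumption: with a single intent, treat the runner-up score as 0.0.
    top2 = scores[1] if len(scores) > 1 else 0.0
    margin = top1 - top2
    return 1.0 - margin


# Assumed example inputs: a near tie versus a clear winner.
ambiguous = {"calendar_query": 0.45, "calendar_event": 0.40, "document_analysis": 0.15}
clear = {"calendar_query": 0.80, "calendar_event": 0.15, "document_analysis": 0.05}
print(round(calculate_margin_score(ambiguous), 2))
print(round(calculate_margin_score(clear), 2))
```

Unlike the entropy score, the margin score ignores everything below the top two intents, so it targets "A or B?" confusion specifically.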

3. Confidence-based (default)

Principle: The lower the predicted confidence, the more uncertain the case.

Computation:

confidence_score = 1.0 - predicted_confidence

Characteristics:

The simplest strategy
Cases with low absolute confidence get higher priority
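A sketch of the default confidence strategy, assuming `predicted_confidence` is the top-1 probability in [0, 1]. `calculate_confidence_score` is a hypothetical name (it does not appear in the code structure), and the clamping is an added safeguard rather than documented behavior.

```python
def calculate_confidence_score(predicted_confidence: float) -> float:
    """Default strategy: the lower the top-1 confidence, the higher the review priority."""
    # Clamp to [0, 1] (an added safeguard) in case an upstream score
    # drifts slightly outside the probability range.
    clamped = min(max(predicted_confidence, 0.0), 1.0)
    return 1.0 - clamped


for conf in (0.95, 0.60, 0.30):
    print(conf, round(calculate_confidence_score(conf), 2))
```

Because it only looks at the top-1 score, this strategy cannot tell a near tie such as 0.50 / 0.49 / 0.01 apart from 0.50 / 0.25 / 0.25: both score 0.5, whereas margin sampling gives them 0.99 and 0.75 respectively.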

Implementation Location

Code Structure

app/brain/active_learning.py
├── calculate_entropy()              # entropy calculation
├── calculate_uncertainty_score()    # uncertainty score calculation
├── calculate_margin_score()         # margin score calculation
└── prioritize_review_queue()        # priority sorting

app/state/intent_review_repository.py
└── get_review_queue()               # priority_strategy parameter added
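How `prioritize_review_queue` ties the three scores together is not spelled out in the source. One possible sketch follows, assuming a hypothetical `ReviewItem` record that carries the full intent distribution plus the top-1 confidence, strategy names matching the `priority_strategy` values used in the API example, and `"confidence"` as the default. Raising `ValueError` for an unknown strategy is also an assumption; the real function may behave differently.

```python
import math
from dataclasses import dataclass


@dataclass
class ReviewItem:
    """Hypothetical review-queue entry; the real model may store this differently."""
    item_id: str
    probs: dict[str, float]        # intent -> predicted probability
    predicted_confidence: float    # top-1 probability reported by the classifier


def _uncertainty(item: ReviewItem) -> float:
    # Normalized entropy: H(p) / log2(K)
    k = len(item.probs)
    if k < 2:
        return 0.0
    entropy = -sum(p * math.log2(p) for p in item.probs.values() if p > 0)
    return entropy / math.log2(k)


def _margin(item: ReviewItem) -> float:
    # 1 - (top1 - top2); a single intent is compared against 0.0
    top = sorted(item.probs.values(), reverse=True)
    if not top:
        return 0.0
    runner_up = top[1] if len(top) > 1 else 0.0
    return 1.0 - (top[0] - runner_up)


def _confidence(item: ReviewItem) -> float:
    return 1.0 - item.predicted_confidence


STRATEGIES = {
    "uncertainty": _uncertainty,
    "margin": _margin,
    "confidence": _confidence,
}


def prioritize_review_queue(items: list[ReviewItem], strategy: str = "confidence") -> list[ReviewItem]:
    """Return a new list, highest review priority first."""
    if strategy not in STRATEGIES:
        raise ValueError(f"unknown priority strategy: {strategy!r}")
    # sorted() is stable even with reverse=True, so equal scores keep queue order.
    return sorted(items, key=STRATEGIES[strategy], reverse=True)


queue = [
    ReviewItem("a", {"calendar_query": 0.9, "calendar_event": 0.1}, 0.9),
    ReviewItem("b", {"calendar_query": 0.46, "calendar_event": 0.44, "document_analysis": 0.10}, 0.46),
    ReviewItem("c", {"calendar_query": 0.4, "calendar_event": 0.35, "document_analysis": 0.25}, 0.4),
]
for name in STRATEGIES:
    print(name, [item.item_id for item in prioritize_review_queue(queue, name)])
```

On this toy queue the uncertainty and confidence strategies both put `c` first, while margin sampling puts `b` first because its top two intents (0.46 vs 0.44) are nearly tied. Relying on the stability of `sorted` keeps items with equal scores in their original queue order.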

API Usage Example

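from app.state.intent_review_repository import get_review_queue

# `session` is an open database session.

# Sort by uncertainty sampling (entropy)
queue = get_review_queue(
    db=session,
    status="pending",
    priority_strategy="uncertainty"
)

# Sort by margin sampling
queue = get_review_queue(
    db=session,
    status="pending",
    priority_strategy="margin"
)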

Test Coverage

tests/test_active_learning_query_strategy.py:

✅ High/low entropy cases
✅ Small/large margin cases
✅ Priority ordering verification
✅ Empty-data handling


Updated: 2025-11-17 - Uncertainty/Margin Sampling implementation complete
