open: answer quality trouble (verbose output / irrelevant documents mixed in / score exposure)

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
2026-03-22 09:16:29 +09:00 · 2026-03-22 09:16:29 +09:00 · d12e88a77d
commit d12e88a77d
parent 21b0d5b9ce
1 changed files with 33 additions and 0 deletions
--- a/journey/troubleshooting/260322_companyx_rag_답변품질_장황_무관문서혼입_점수노출.md
+++ b/journey/troubleshooting/260322_companyx_rag_답변품질_장황_무관문서혼입_점수노출.md
@@ -0,0 +1,33 @@
+---
+type: troubleshooting
+tags: [companyx, rag, grounding, answer-quality, rb8001]
+status: open
+opened_date: 2026-03-22
+severity: high
+root_cause:
+---
+
+# 260322 Company X RAG answer quality: verbose output, irrelevant documents mixed in, score exposure
+
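+## Symptoms
+
+Three problems confirmed in actual Slack responses:
+
+1. **Raw chunk dump**: `chunk_text` is shown as-is with no summarization, so users cannot read the response.
+2. **Irrelevant documents mixed in**: for a question about "아크로셀 regular general shareholders' meeting documents", a 기술보증기금 agreement was shown as evidence.
+3. **Score exposure**: internal metrics such as `score 1.00, vec 0.77, kw 0.00` are visible to users.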
+
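+## Direct causes (based on the code)
+
+- `_build_evidence_lines()`: inserts `chunk_text[:180]` verbatim and displays the scores alongside it
+- `_select_top_results()`: unconditionally includes the top 5 by `relevance_score`, with no re-check of how well each result fits the question
+- LLM prompt: the evidence document list is not generated by the LLM; the code appends it in hard-coded form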
+
+## Related files
+
+- `rb8001/app/services/companyx_grounding_service.py`: `_build_evidence_lines()`, `_build_grounded_response()`
+- `DOCS/skills/companyx-rag/SKILL.md`: Response Shape contract
+
+## Related documents
+
+- [260321 hybrid search quality improvement plan](../plans/260321_하이브리드검색_품질개선_계획.md)
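
Review note: two of the direct causes in this note (scores rendered next to the evidence, and the unconditional top-5 cut) can be illustrated with a minimal sketch. This is not the code in `companyx_grounding_service.py`. The `SearchResult` type, the `select_top_results()` and `build_evidence_lines()` helpers, and the naive term-overlap check are all hypothetical, chosen only to show one possible shape of a fix. Summarizing the chunk text would need an LLM call and is left out.

```python
from dataclasses import dataclass


@dataclass
class SearchResult:
    """Hypothetical stand-in for one retrieved chunk (not the real type)."""
    title: str
    chunk_text: str
    relevance_score: float
    vector_score: float
    keyword_score: float


def select_top_results(results, query_terms, limit=5):
    """Return up to `limit` results, best score first, that mention a query term.

    Assumption: a plain substring check stands in for a real
    question-fit re-verification step.
    """
    ranked = sorted(results, key=lambda r: r.relevance_score, reverse=True)
    kept = []
    for result in ranked:
        if len(kept) >= limit:
            break
        haystack = f"{result.title} {result.chunk_text}".lower()
        if any(term.lower() in haystack for term in query_terms):
            kept.append(result)
    return kept


def build_evidence_lines(results, max_chars=180):
    """Render user-facing evidence lines with no internal scores."""
    lines = []
    for result in results:
        # Collapse whitespace so a raw chunk does not break the layout.
        snippet = " ".join(result.chunk_text.split())
        if len(snippet) > max_chars:
            snippet = snippet[:max_chars].rstrip() + "..."
        lines.append(f"- {result.title}: {snippet}")
    return lines
```

With this shape, a high-scoring result that never mentions the question's terms is dropped instead of filling one of the five slots, and the score fields stay on the result object for logging without reaching the Slack message.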