troubleshooting: add /api/text full-text path and LangGraph full-text-first flow after OCR reindex

2025-10-22 00:38:57 +09:00 · 2025-10-22 00:38:57 +09:00 · 0c1f302e0b
commit 0c1f302e0b
parent f9342c8279
1 changed files with 4 additions and 0 deletions
--- a/troubleshooting/251021_admin_slack_doc_analysis_pipeline_langgraph.md
+++ b/troubleshooting/251021_admin_slack_doc_analysis_pipeline_langgraph.md
@@ -34,6 +34,10 @@
  - Files: `rb8001/app/router/thread_doc_cache.py`, `rb8001/app/router/slack_handler.py`
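 - PDF extraction quality improvement: a quality heuristic (len, garbage_ratio, unique_chars) flags low-quality text, forces OCR (pytesseract), then re-chunks and re-indexes; the outcome is recorded in metadata (ocr_used, quality)
  - Files: `skill-rag-file/app/api/upload.py`, `skill-rag-file/app/services/text_extractor.py`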
 - Direct text lookup: `/api/text/{document_id}` returns the full body plus metadata → the rb8001 LangGraph pins the doc_id right after upload and analyzes the body directly (snippets are auxiliary)
  - Files: `skill-rag-file/app/api/text.py`, `skill-rag-file/app/main.py`
 - LangGraph reinforcement: right after upload, wait on `/api/reindex` (force_ocr) → `/api/text/{doc}` → search, in that order; analysis prefers the full text
  - Files: `rb8001/app/pipelines/langgraph_document.py`, `rb8001/app/router/slack_handler.py`
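 - (Prior work) Ethics exposure / model pinning / memory hygiene:
  - Ethics explanations are hidden from users, fairness false positives are reduced, and static alternatives are removed.
  - Pinned to the single model `gemini-2.5-flash-lite` and removed duplicate initialization.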
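
The quality gate and the full-text-first ordering described in the notes above could be sketched roughly as follows. This is a minimal illustration, not the code in `text_extractor.py` or `langgraph_document.py`: the thresholds, the definition of "garbage", the function names (`measure_quality`, `needs_ocr`, `pick_analysis_text`), and the `"text"` key in the `/api/text/{document_id}` response are all assumptions. The three service calls are passed in as plain callables so the sketch makes no claim about the real HTTP clients.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List, Optional

# Hypothetical thresholds; the real values are not given in the notes.
MIN_LEN = 200
MAX_GARBAGE_RATIO = 0.3
MIN_UNIQUE_CHARS = 15
_PUNCT = set(".,;:!?'\"()[]{}-_/%&@#*+=<>~|\\")


@dataclass
class Quality:
    length: int
    garbage_ratio: float
    unique_chars: int


def measure_quality(text: str) -> Quality:
    """Compute the three signals named in the notes: len, garbage_ratio, unique_chars."""
    stripped = text.strip()
    if not stripped:
        return Quality(0, 1.0, 0)
    # Assumption: "garbage" means any character that is not alphanumeric
    # (Hangul counts as alphanumeric), whitespace, or common punctuation.
    garbage = sum(
        1 for ch in stripped
        if not (ch.isalnum() or ch.isspace() or ch in _PUNCT)
    )
    return Quality(len(stripped), garbage / len(stripped), len(set(stripped)))


def needs_ocr(q: Quality) -> bool:
    """True when extracted text looks too poor and OCR should be forced."""
    return (
        q.length < MIN_LEN
        or q.garbage_ratio > MAX_GARBAGE_RATIO
        or q.unique_chars < MIN_UNIQUE_CHARS
    )


def pick_analysis_text(
    document_id: str,
    query: str,
    reindex: Callable[[str, bool], None],
    fetch_text: Callable[[str], Optional[Dict]],
    search: Callable[[str, str], List[str]],
) -> Dict:
    """Run reindex (force_ocr) -> full text -> search, preferring the full text."""
    reindex(document_id, True)              # stands in for /api/reindex with force_ocr
    doc = fetch_text(document_id) or {}     # stands in for /api/text/{document_id}
    snippets = search(document_id, query)   # snippets stay available as auxiliary context
    body = str(doc.get("text", "")).strip() # "text" key is an assumed response field
    if body:
        return {"source": "full_text", "text": body, "snippets": snippets}
    return {"source": "snippets", "text": "\n".join(snippets), "snippets": snippets}
```

Injecting the three calls keeps the ordering testable without a running `skill-rag-file` service; a real pipeline would also need the waiting/retry behavior the notes mention, which this sketch leaves out.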