From 087bd1a650ac4aae189925e2ef34a69142e7a0d1 Mon Sep 17 00:00:00 2001 From: Claude-51124 Date: Sat, 22 Nov 2025 14:31:36 +0900 Subject: [PATCH] docs: note 768d embedding update and chroma dimension mismatch --- ...‹จ์ผํ™”_๊ธฐ๋ฐ˜_ํ†ตํ•ฉ_๋ถ„๋ฅ˜_์‹œ์Šคํ…œ.md | 4 +++- ...earch_์ฝœ๋“œ๋ฉ”์ผ_tdd_ํ…Œ์ŠคํŠธ_๊ณ„ํš.md | 2 +- .../2025_ko_sroberta_runtime_eval.md | 3 +-- ...pybell80_skill-embedding์„œ๋น„์Šค๊ตฌ์ถ•.md | 4 +++- ...happybell80_chromadb_dimension_mismatch.md | 24 +++++++++++++++++++ 5 files changed, 32 insertions(+), 5 deletions(-) create mode 100644 journey/troubleshooting/251122_happybell80_chromadb_dimension_mismatch.md diff --git a/journey/ideas/250815_์ž„๋ฒ ๋”ฉ_๋‹จ์ผํ™”_๊ธฐ๋ฐ˜_ํ†ตํ•ฉ_๋ถ„๋ฅ˜_์‹œ์Šคํ…œ.md b/journey/ideas/250815_์ž„๋ฒ ๋”ฉ_๋‹จ์ผํ™”_๊ธฐ๋ฐ˜_ํ†ตํ•ฉ_๋ถ„๋ฅ˜_์‹œ์Šคํ…œ.md index 798a5ef..a0e3a75 100644 --- a/journey/ideas/250815_์ž„๋ฒ ๋”ฉ_๋‹จ์ผํ™”_๊ธฐ๋ฐ˜_ํ†ตํ•ฉ_๋ถ„๋ฅ˜_์‹œ์Šคํ…œ.md +++ b/journey/ideas/250815_์ž„๋ฒ ๋”ฉ_๋‹จ์ผํ™”_๊ธฐ๋ฐ˜_ํ†ตํ•ฉ_๋ถ„๋ฅ˜_์‹œ์Šคํ…œ.md @@ -5,6 +5,8 @@ ์ƒํƒœ: ์•„์ด๋””์–ด โ†’ ์‹คํ—˜ ์˜ˆ์ • ๊ด€๋ จ: ๊ฐ์ • ์‹œ์Šคํ…œ, ์œค๋ฆฌ ์‹œ์Šคํ…œ, ์ž„๋ฒ ๋”ฉ ์„œ๋น„์Šค +> 2025-11-22 ์—…๋ฐ์ดํŠธ: ํ˜„์žฌ ์šด์˜ ์ž„๋ฒ ๋”ฉ ์„œ๋น„์Šค๋Š” ko-sroberta 768์ฐจ์›์œผ๋กœ ์ „ํ™˜๋˜์—ˆ์œผ๋ฉฐ, ๋ณธ ๋ฌธ์„œ์˜ 384์ฐจ์› ๊ฐ€์ •์€ ์—ญ์‚ฌ ๊ธฐ๋ก์œผ๋กœ๋งŒ ์ฐธ๊ณ . ์ฐจ์› ๋ถˆ์ผ์น˜ ๋Œ€์‘์€ [251122_happybell80_chromadb_dimension_mismatch.md] ์ฐธ๊ณ . + ## ๊ฐœ์š” ํ˜„์žฌ ๋ถ„๋ฆฌ๋œ 3๊ฐœ ๋ชจ๋ธ(์ž„๋ฒ ๋”ฉ, ๊ฐ์ •, ์œค๋ฆฌ)์„ ๋‹จ์ผ ์ž„๋ฒ ๋”ฉ ๋ชจ๋ธ๋กœ ํ†ตํ•ฉํ•˜์—ฌ ๋ฉ”๋ชจ๋ฆฌ 67% ์ ˆ๊ฐ, ์†๋„ 3๋ฐฐ ํ–ฅ์ƒ์„ ๋‹ฌ์„ฑํ•˜๋Š” ์•„ํ‚คํ…์ฒ˜์ž…๋‹ˆ๋‹ค. ํ•˜๋‚˜์˜ ๋ฒกํ„ฐ๋กœ ๊ธฐ์–ต ์ €์žฅ, ๊ฐ์ • ๋ถ„๋ฅ˜, ์œค๋ฆฌ ํŒ๋‹จ์„ ๋™์‹œ์— ์ˆ˜ํ–‰ํ•ฉ๋‹ˆ๋‹ค. 
@@ -588,4 +590,4 @@ def standardize_embedding(embedding):
 
 *"Margin matters more than dimension"*
 
-**Next step**: Margin-based escalation pilot test
\ No newline at end of file
+**Next step**: Margin-based escalation pilot test
diff --git a/journey/plans/251110_gemini_file_search_콜드메일_tdd_테스트_계획.md b/journey/plans/251110_gemini_file_search_콜드메일_tdd_테스트_계획.md
index e730bb1..c139912 100644
--- a/journey/plans/251110_gemini_file_search_콜드메일_tdd_테스트_계획.md
+++ b/journey/plans/251110_gemini_file_search_콜드메일_tdd_테스트_계획.md
@@ -45,6 +45,7 @@
 ## 2. Embedding dimension compatibility
 
 ### Current system
+- **2025-11-22 update**: skill-embedding now runs ko-sroberta at 768 dimensions, and mixing it with the existing 384-dimension collections causes a dimension mismatch. The 384-dimension assumption below no longer matches production and is kept for reference only; for the remediation plan, see [251122_happybell80_chromadb_dimension_mismatch.md].
 - skill-embedding: multilingual-MiniLM-L12-v2 (384 dimensions)
 - ChromaDB collection: skill_rag_file_{team_id}_documents
 
@@ -214,4 +215,3 @@
 **Author**: Claude Code, 2025-11-10
 **Status**: Planning stage (before implementation)
 **Reference**: research/rag/251110_gemini_file_search_api_테스트_및_콜드메일_개선방안_평가.md
-
diff --git a/journey/research/memory/embedding_search/2025_ko_sroberta_runtime_eval.md b/journey/research/memory/embedding_search/2025_ko_sroberta_runtime_eval.md
index 7586a6e..750185b 100644
--- a/journey/research/memory/embedding_search/2025_ko_sroberta_runtime_eval.md
+++ b/journey/research/memory/embedding_search/2025_ko_sroberta_runtime_eval.md
@@ -11,7 +11,7 @@ refs:
 # Ko-SRoBERTa embedding switchover: pre-validation
 
 ## 1. Background
-- The current `skill-embedding` on 8515 is based on multilingual MiniLM-L12-v2 ONNX (384d) and follows the central embedding service structure defined in [document 370].
+- **2025-11-22 ์—…๋ฐ์ดํŠธ**: 8515 `skill-embedding`์ด ko-sroberta(multitask) 768d๋กœ ๋ฐฐํฌ๋จ. ๊ธฐ์กด 384d ๊ฐ€์ •์€ ๋” ์ด์ƒ ์œ ํšจํ•˜์ง€ ์•Š์œผ๋ฉฐ, 384d ์ปฌ๋ ‰์…˜๊ณผ ํ˜ผ์šฉ ์‹œ ์ฐจ์› ๋ถˆ์ผ์น˜๊ฐ€ ๋ฐœ์ƒํ•˜๋ฏ€๋กœ [251122_happybell80_chromadb_dimension_mismatch.md] ์ฐธ๊ณ . - Intent/runtime ๊ณ ๋„ํ™” ๊ณ„ํš(251017 ๋ฌธ์„œ)๊ณผ Vector Memory ์•„ํ‚คํ…์ฒ˜(330๋ฒˆ ๋ฌธ์„œ)์—์„œ๋Š” ํ•œ๊ตญ์–ด ํŠนํ™” SentenceTransformer ์ฑ„ํƒ์„ ๊ณ ๋ คํ•˜๊ณ  ์žˆ์–ด, Ko-SRoBERTa(multitask, 768d)๋ฅผ ํ›„๋ณด ๋ชจ๋ธ๋กœ ์„ ์ •ํ–ˆ๋‹ค. - ์‹ค์ œ ์ „ํ™˜ ์ „, ์˜๋„ ๋ถ„๋ฅ˜ยท์ฝœ๋“œ๋ฉ”์ผ IRยทSemanticIntentClassifier ํ๋ฆ„์—์„œ ์ •๋Ÿ‰ ๋น„๊ต๊ฐ€ ํ•„์š”ํ•ด ๋ณธ ๋ฆฌํฌํŠธ๋ฅผ ์ž‘์„ฑํ–ˆ๋‹ค. @@ -67,4 +67,3 @@ refs: - ์‹ ๊ทœ `tests/data/intent_eval_calendar.json` 10๋ฌธ์žฅ ๊ธฐ์ค€ accuracy **100%**, avg 25.0โ€ฏms. - `PYTHONPATH=. pytest tests/test_intent_entity_skill_comprehensive.py -k intent_classification_coverage` โ†’ 22 testcases all pass, ์ผ์ • ๋ฌธ์žฅ 2๊ฑด๋„ `calendar_event` ํŒ์ •. - ๋‚จ์€ TODO: coldmail/vector ๋ฐ์ดํ„ฐ ์žฌ์ž„๋ฒ ๋”ฉ ์ž๋™ํ™”, SemanticIntentClassifier threshold๋ฅผ ์šด์˜ ๋ชจ๋‹ˆํ„ฐ๋ง์— ๋…ธ์ถœ. - diff --git a/journey/troubleshooting/250805_happybell80_skill-embedding์„œ๋น„์Šค๊ตฌ์ถ•.md b/journey/troubleshooting/250805_happybell80_skill-embedding์„œ๋น„์Šค๊ตฌ์ถ•.md index 83cfb52..ce4a03c 100644 --- a/journey/troubleshooting/250805_happybell80_skill-embedding์„œ๋น„์Šค๊ตฌ์ถ•.md +++ b/journey/troubleshooting/250805_happybell80_skill-embedding์„œ๋น„์Šค๊ตฌ์ถ•.md @@ -4,6 +4,8 @@ **์ž‘์—…์ž**: happybell80 & Claude **๊ด€๋ จ ์„œ๋ฒ„**: 51124 (skill-embedding ์„œ๋น„์Šค) +> 2025-11-22 ์—…๋ฐ์ดํŠธ: ํ˜„์žฌ skill-embedding์€ ko-sroberta(multitask) 768์ฐจ์›์œผ๋กœ ์ „ํ™˜๋จ. ์ดํ•˜ 384์ฐจ์› ์„ค์ •์€ ์ดˆ๊ธฐ ๊ตฌ์ถ• ๊ธฐ๋ก์ด๋ฉฐ, ์ฐจ์› ๋ถˆ์ผ์น˜ ๋Œ€์‘์€ [251122_happybell80_chromadb_dimension_mismatch.md] ์ฐธ๊ณ . + ## ์˜ค์ „ 10์‹œ 30๋ถ„ ### ์ž„๋ฒ ๋”ฉ ์„œ๋น„์Šค ๋ถ„๋ฆฌ ๊ฒฐ์ • @@ -518,4 +520,4 @@ class Settings(BaseSettings): 23. 
**์ ์ ˆํ•œ ๋กœ๊ทธ ๋ ˆ๋ฒจ ์„ค์ •** - LOG_LEVEL=INFO๋กœ ํ•„์š”ํ•œ ๋กœ๊ทธ๋งŒ ๊ธฐ๋ก - DEBUG ํ™˜๊ฒฝ๋ณ€์ˆ˜์™€ LOG_LEVEL ๊ตฌ๋ถ„ ํ•„์š” - - ํ”„๋กœ๋•์…˜์—์„œ๋Š” INFO ๋ ˆ๋ฒจ ๊ถŒ์žฅ \ No newline at end of file + - ํ”„๋กœ๋•์…˜์—์„œ๋Š” INFO ๋ ˆ๋ฒจ ๊ถŒ์žฅ diff --git a/journey/troubleshooting/251122_happybell80_chromadb_dimension_mismatch.md b/journey/troubleshooting/251122_happybell80_chromadb_dimension_mismatch.md new file mode 100644 index 0000000..9653960 --- /dev/null +++ b/journey/troubleshooting/251122_happybell80_chromadb_dimension_mismatch.md @@ -0,0 +1,24 @@ +# ChromaDB ์ฐจ์› ๋ถˆ์ผ์น˜๋กœ RAG ๊ฒ€์ƒ‰ ์‹คํŒจ + +- **๋‚ ์งœ**: 2025-11-22 +- **์ž‘์„ฑ์ž**: happybell80 +- **๊ด€๋ จ ์„œ๋น„์Šค**: skill-rag-file (8508), skill-embedding (8515), rb8001 +- **๊ด€๋ จ ๋ฌธ์„œ**: 370_์ž„๋ฒ ๋”ฉ_์„œ๋น„์Šค_๋ถ„๋ฆฌ_์•„ํ‚คํ…์ฒ˜.md, 2025_ko_sroberta_runtime_eval.md + +## ์ƒํ™ฉ +- skill-embedding์ด ko-sroberta(multitask) 768์ฐจ์›์œผ๋กœ ๊ต์ฒด๋œ ์ƒํƒœ์—์„œ ๊ธฐ์กด ChromaDB ์ปฌ๋ ‰์…˜์ด 384์ฐจ์› ์„ค์ •์œผ๋กœ ๋‚จ์•„ ์žˆ์Œ. +- ์‹ค์ œ ํŒŒ์ผ ํ…Œ์ŠคํŠธ(`scripts/test_ir_extraction_real_file.py dc1da3f6-... 7944...`) ๊ฒฐ๊ณผ IR ์ง€ํ‘œ๊ฐ€ ๋ชจ๋‘ `N/A`. +- skill-rag-file ๋กœ๊ทธ์— `Collection expecting embedding with dimension of 384, got 768`๊ฐ€ ๋ฐ˜๋ณต ๊ธฐ๋ก. + +## ์›์ธ +1. ์ด์ „ MiniLM(384d)๋กœ ์ƒ์„ฑ๋œ ์ปฌ๋ ‰์…˜์— ์ƒˆ ์ž„๋ฒ ๋”ฉ(768d)์„ ์กฐํšŒ/์‚ฝ์ž…ํ•˜๋ฉด์„œ ์ฐจ์› ๋ถˆ์ผ์น˜ ๋ฐœ์ƒ. +2. ์—…๋กœ๋“œ ์งํ›„ ํ˜ธ์ถœ ํƒ€์ด๋ฐ ์ด์Šˆ๊ฐ€ ๋ณด์กฐ์ ์œผ๋กœ ์žˆ์„ ์ˆ˜ ์žˆ์œผ๋‚˜, ์ฐจ์› ๋ถˆ์ผ์น˜ ๋•Œ๋ฌธ์— ๊ฒ€์ƒ‰ ๊ฒฐ๊ณผ๊ฐ€ 0๊ฑด์œผ๋กœ ๊ณ ์ •. + +## ์กฐ์น˜/๊ณ„ํš +- **์žฌ์ธ๋ฑ์‹ฑ ํ•„์š”**: `skill_rag_file_79441171-3951-4870-beb8-916d07fe8be5_documents` ๋“ฑ ๊ธฐ์กด ์ปฌ๋ ‰์…˜์„ 768์ฐจ์›์œผ๋กœ ์žฌ์ƒ์„ฑ ํ›„ ํ•ด๋‹น document_id๋ฅผ ์žฌ์ธ๋ฑ์‹ฑ. +- ์žฌ์ธ๋ฑ์‹ฑ ์ „ ์ž„์‹œ ์šฐํšŒ ์—†์Œ. ๊ธฐ์กด 384d ์ปฌ๋ ‰์…˜ ์œ ์ง€ ์‹œ ๊ฒ€์ƒ‰ 0๊ฑด โ†’ IR `N/A`. +- ์žฌ์ธ๋ฑ์‹ฑ ํ›„ `scripts/test_ir_extraction_real_file.py `๋กœ ๊ฒ€์ฆ ์˜ˆ์ •. 
+ +## ๊ตํ›ˆ +- ์ž„๋ฒ ๋”ฉ ๋ชจ๋ธ ๊ต์ฒด ์‹œ ๊ธฐ์กด ๋ฒกํ„ฐ DB ์ฐจ์›๊ณผ ์ผ๊ด€์„ฑ ํ™•์ธ ํ•„์ˆ˜(๋ฐฐํฌ ์ „ ๋งˆ์ด๊ทธ๋ ˆ์ด์…˜/dual-write ๊ณ„ํš). +- RAG ๊ฒฐ๊ณผ 0๊ฑด์ด ์ง€์†๋˜๋ฉด ChromaDB ์ฐจ์›/๋ฉ”ํƒ€๋ฐ์ดํ„ฐ๋ถ€ํ„ฐ ์ ๊ฒ€ํ•˜๊ณ , ๋กœ๊ทธ์— dimension mismatch๊ฐ€ ์—†๋Š”์ง€ ํ™•์ธ.