[🏆 Grand Prize] 2026 IBM Hackathon, Gangneung-Wonju National University x Kangwon National University

IBM2026-Team6/AI

Automatic Presentation Script Generation and Keyword Tracking System

This system automatically generates a presentation script for each slide from the presentation slides and reference materials, and then tracks how well the generated script covers the keywords.

Key Features

1. Automatic script generation (main.py)

  • RAG-based retrieval of reference documents (LangChain + ChromaDB)
  • Choice of IBM Watsonx or Upstage Solar as the LLM
  • Per-slide extraction of key keywords (optional)

2. Keyword tracking (run_tracker.py)

  • Keyword coverage analysis of the generated scripts
  • Three matching modes:
    • hybrid: combines token + sentence (default, best performance)
    • token: morpheme-based matching (fast)
    • sentence: sentence-similarity-based matching (accurate)
  • Keyword normalization: replaces hyphens/underscores with spaces (ON by default)
  • Embedding options:
    • Transformer (local, fast, default)
    • Upstage API (accurate, requires API calls)
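
The keyword normalization listed above can be sketched as follows. This is a minimal illustration, not the repository's code: the helper name `normalize_keyword` and its `enabled` flag are hypothetical, and collapsing repeated whitespace is an added assumption beyond the hyphen/underscore replacement the README describes.

```python
import re


def normalize_keyword(keyword: str, enabled: bool = True) -> str:
    """Replace hyphens and underscores with spaces (hypothetical helper)."""
    if not enabled:
        # Normalization switched off: return the keyword untouched.
        return keyword
    # Turn each run of '-' or '_' into a single space.
    spaced = re.sub(r"[-_]+", " ", keyword)
    # Collapse repeated whitespace (assumption, not stated in the README).
    return re.sub(r"\s+", " ", spaced).strip()
```

With normalization on, a keyword written as `keyword-tracking_system` would be compared as `keyword tracking system`, which makes it easier to match against ordinary prose in a script.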

Installation

python -m venv .venv
source .venv/bin/activate  # Windows: .venv\Scripts\activate
pip install -r requirements.txt

Required packages

  • Python 3.11.9
  • langchain, langchain-openai, langchain-ibm
  • chromadb
  • sentence-transformers
  • konlpy (Korean morphological analysis, optional)

Environment variables (.env)

# IBM Watsonx
API_KEY=your_ibm_api_key
PROJECT_ID=your_ibm_project_id
IBM_CLOUD_URL=https://us-south.ml.cloud.ibm.com

# Upstage
UPSTAGE_API_KEY=your_upstage_api_key
UPSTAGE_API_URL=https://api.upstage.ai/v1/solar
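
As a rough sketch of how these variables might be validated before a run, the snippet below checks that every variable needed by the chosen backend is present. The function `load_credentials` and the `REQUIRED_KEYS` table are hypothetical names introduced for illustration; the variable names come from the .env listing above. It assumes the .env file has already been loaded into the process environment, for example with python-dotenv's `load_dotenv()`; whether the repository does this is not stated.

```python
import os
from typing import Mapping

# Variables each backend needs, taken from the .env listing above.
REQUIRED_KEYS = {
    "ibm": ("API_KEY", "PROJECT_ID", "IBM_CLOUD_URL"),
    "upstage": ("UPSTAGE_API_KEY", "UPSTAGE_API_URL"),
}


def load_credentials(api: str, env: Mapping[str, str] = os.environ) -> dict:
    """Return the settings for the chosen backend, or raise if any are missing."""
    if api not in REQUIRED_KEYS:
        raise ValueError(f"unknown api: {api!r}")
    # Treat unset and empty values the same way.
    missing = [key for key in REQUIRED_KEYS[api] if not env.get(key)]
    if missing:
        raise KeyError(f"missing environment variables: {', '.join(missing)}")
    return {key: env[key] for key in REQUIRED_KEYS[api]}
```

Failing early with the list of missing names tends to be easier to debug than an authentication error from the remote API.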

Usage

Step 1: Generate scripts

# Use IBM Watsonx
python main.py --api ibm --extractor y

# Use Upstage Solar
python main.py --api upstage --extractor y

Output:

  • outputs/paper_scripts.md: per-slide scripts
  • outputs/paper_keywords.txt: per-slide keywords

Step 2: Track keywords

# Default (hybrid + normalization ON, Transformer embeddings)
python run_tracker.py

# token mode (example with normalization OFF)
python run_tracker.py -m token --normalize n

# sentence mode + Upstage API
python run_tracker.py -m sentence --api y

Output:

  • outputs/paper_coverage_analysis.txt: per-slide coverage analysis
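
The tracker flags used in the commands above could be declared with `argparse` roughly as shown here. This is a sketch under assumptions, not the actual contents of run_tracker.py: the long option name `--mode` and the function `build_parser` are hypothetical, and the defaults are inferred from the README's description of the default run (hybrid, normalization on, local Transformer embeddings).

```python
import argparse


def build_parser() -> argparse.ArgumentParser:
    """Build a parser mirroring the flags shown in the usage examples."""
    parser = argparse.ArgumentParser(description="Keyword coverage tracker (sketch)")
    # '-m' appears in the README; the long form '--mode' is an assumption.
    parser.add_argument("-m", "--mode",
                        choices=["hybrid", "token", "sentence"], default="hybrid",
                        help="matching mode")
    # 'y' keeps hyphen/underscore normalization on, 'n' turns it off.
    parser.add_argument("--normalize", choices=["y", "n"], default="y",
                        help="keyword normalization")
    # 'y' selects Upstage API embeddings; 'n' is assumed to mean the local model.
    parser.add_argument("--api", choices=["y", "n"], default="n",
                        help="use Upstage API embeddings")
    return parser


if __name__ == "__main__":
    print(build_parser().parse_args())
```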

Configuration (config.py)

Path settings

docs_root = "./docs"           # document folder
out_dir = "./outputs"           # folder where results are saved

RAG settings

persist_dir = "./chroma_db"     # where the vector DB is persisted
chunk_size = 400                # chunk size
chunk_overlap = 50              # overlap between consecutive chunks
top_k = 3                       # number of retrieved results
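
To show how `chunk_size` and `chunk_overlap` interact, here is a simplified fixed-window splitter that measures size in characters. It is only an illustration: the project uses LangChain, whose text splitters are typically separator-aware, and the unit of `chunk_size` is not stated in the README. The function name `split_into_chunks` is hypothetical.

```python
def split_into_chunks(text: str, chunk_size: int = 400, chunk_overlap: int = 50) -> list[str]:
    """Split text into fixed-size character windows that overlap (simplified sketch)."""
    if chunk_overlap >= chunk_size:
        raise ValueError("chunk_overlap must be smaller than chunk_size")
    # Each new chunk starts this many characters after the previous one.
    step = chunk_size - chunk_overlap
    chunks = []
    for start in range(0, len(text), step):
        chunks.append(text[start:start + chunk_size])
        # Stop once the current window already reaches the end of the text.
        if start + chunk_size >= len(text):
            break
    return chunks
```

With the values above, consecutive chunks start 350 characters apart, so the last 50 characters of one chunk reappear at the start of the next. The overlap reduces the chance that a sentence relevant to a slide is cut in half at a chunk boundary.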

Keyword tracking settings

lexical_threshold = 0.5         # token-matching threshold
semantic_threshold = 0.65       # sentence-similarity threshold
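
One plausible way the two thresholds could be applied is sketched below. The README does not say how hybrid mode combines the two signals, so the "either signal is enough" rule, the use of `>=`, and the names `is_covered` and `cosine_similarity` are all assumptions made for illustration.

```python
import math

LEXICAL_THRESHOLD = 0.5    # value from config.py above
SEMANTIC_THRESHOLD = 0.65  # value from config.py above


def cosine_similarity(a: list[float], b: list[float]) -> float:
    """Cosine similarity between two embedding vectors (0.0 if either is all zeros)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


def is_covered(lexical_score: float, semantic_score: float, mode: str = "hybrid") -> bool:
    """Decide whether a keyword counts as covered under the given mode."""
    lexical_hit = lexical_score >= LEXICAL_THRESHOLD
    semantic_hit = semantic_score >= SEMANTIC_THRESHOLD
    if mode == "token":
        return lexical_hit
    if mode == "sentence":
        return semantic_hit
    # Assumption: hybrid accepts a keyword when either signal passes.
    return lexical_hit or semantic_hit
```

Under this rule a paraphrased keyword can still be counted through the semantic score, while an exact term that sits in an unrelated sentence can be counted through the lexical score.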

Embedding models

# Transformer (local)
transformer_model = "paraphrase-multilingual-MiniLM-L12-v2"

# Upstage (API)
embedding_model = "solar-embedding-1-large"

ν”„λ‘œμ νŠΈ ꡬ쑰

.
β”œβ”€β”€ docs/                       # λ¬Έμ„œ 폴더
β”‚   β”œβ”€β”€ paper.pdf              # λ°œν‘œ μŠ¬λΌμ΄λ“œ
β”‚   └── report.pdf             # μ°Έκ³  λ¬Έμ„œ
β”œβ”€β”€ outputs/                    # κ²°κ³Ό 파일
β”‚   β”œβ”€β”€ paper_scripts.md       # μƒμ„±λœ λŒ€λ³Έ
β”‚   β”œβ”€β”€ paper_keywords.txt     # μΆ”μΆœλœ ν‚€μ›Œλ“œ
β”‚   └── paper_coverage_analysis.txt  # 컀버리지 뢄석
β”œβ”€β”€ tracker/                    # ν‚€μ›Œλ“œ 좔적 λͺ¨λ“ˆ
β”‚   β”œβ”€β”€ keyword_tracker.py     # 토큰 기반 좔적
β”‚   └── sentence_tracker.py    # λ¬Έμž₯ μœ μ‚¬λ„ 좔적
β”œβ”€β”€ main.py                     # λŒ€λ³Έ 생성
β”œβ”€β”€ run_tracker.py             # ν‚€μ›Œλ“œ 좔적
β”œβ”€β”€ config.py                   # μ„€μ •
└── requirements.txt            # μ˜μ‘΄μ„±

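Comparison of matching modes

| Mode     | Accuracy | Speed  | Characteristics                                                        |
|----------|----------|--------|------------------------------------------------------------------------|
| token    | 89%      | Fast   | Morpheme-based, exact word matching                                    |
| sentence | 82%      | Slow   | Semantic, context-aware                                                |
| hybrid   | 95.9%    | Medium | Combines token + sentence, normalization on by default (space replacement) |

Recommended: for presentation tracking, use python run_tracker.py (hybrid + normalization ON).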


Key functions

tracker/keyword_tracker.py

  • normalize_text(): text normalization
  • tokenize_simple_ko_en(): Korean/English tokenization
  • token_overlap_score(): computes the token overlap score
  • parse_keywords_from_file(): parses the keyword file
  • parse_scripts_by_slide(): parses the script file
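
To give a sense of what a token overlap score can look like, here is a simplified stand-in. It is not the repository's `tokenize_simple_ko_en()` or `token_overlap_score()`: the names `simple_tokens` and `overlap_score` are hypothetical, the regex tokenizer is an assumption, and the score is assumed to be the fraction of keyword tokens found in the sentence.

```python
import re

# Runs of ASCII letters/digits, or runs of Hangul syllables (U+AC00 to U+D7A3).
TOKEN_PATTERN = re.compile(r"[A-Za-z0-9]+|[\uac00-\ud7a3]+")


def simple_tokens(text: str) -> list[str]:
    """Split mixed Korean/English text into lowercase tokens (regex-based sketch)."""
    return [token.lower() for token in TOKEN_PATTERN.findall(text)]


def overlap_score(keyword: str, sentence: str) -> float:
    """Fraction of the keyword's tokens that also appear in the sentence."""
    keyword_tokens = set(simple_tokens(keyword))
    if not keyword_tokens:
        return 0.0
    sentence_tokens = set(simple_tokens(sentence))
    return len(keyword_tokens & sentence_tokens) / len(keyword_tokens)
```

A regex tokenizer like this leaves Korean particles attached to their nouns, which is one reason a morphological analyzer such as konlpy can improve token matching for Korean text.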

tracker/sentence_tracker.py

  • SentenceMatcher: sentence-similarity matching class
  • find_matches(): matches keywords against sentences
  • get_coverage_for_slide(): computes coverage for a slide
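
Per-slide coverage can be thought of as the share of a slide's keywords that match at least one sentence of its script. The sketch below illustrates that idea only; it is not the repository's `get_coverage_for_slide()`. The function `slide_coverage`, its pluggable `is_match` predicate, and the choice to report 0.0 for a slide without keywords are all assumptions.

```python
from typing import Callable


def slide_coverage(
    keywords: list[str],
    sentences: list[str],
    is_match: Callable[[str, str], bool],
) -> tuple[float, list[str]]:
    """Return (coverage ratio, missing keywords) for one slide (hypothetical helper)."""
    if not keywords:
        # Assumption: a slide with no keywords reports zero coverage.
        return 0.0, []
    # A keyword is missing when no sentence in the script matches it.
    missing = [kw for kw in keywords
               if not any(is_match(kw, sentence) for sentence in sentences)]
    covered = len(keywords) - len(missing)
    return covered / len(keywords), missing


if __name__ == "__main__":
    # Case-insensitive substring check as a stand-in matcher.
    ratio, missing = slide_coverage(
        ["RAG", "ChromaDB", "latency"],
        ["We use RAG for retrieval.", "Chunks are stored in ChromaDB."],
        lambda kw, sentence: kw.lower() in sentence.lower(),
    )
    print(f"coverage={ratio:.2f}, missing={missing}")
```

Passing the matcher in as a function keeps the coverage arithmetic independent of whether token, sentence, or hybrid matching is in use.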

run_tracker.py

  • main_analysis(): main keyword-tracking logic

License

MIT License
