
seonwoo1218/python-lda-topic-modeling



python-lda-topic-modeling

Python code for topic modeling of Korean text. It uses Gensim for the modeling and KoNLPy for Korean text processing.

1. Main features

  1. Text preprocessing (preprocessing.py)

    • Noun extraction based on KoNLPy's Okt (Open Korean Text); custom dictionary entries can be added
    • Dictionary-based stopword removal (list the stopwords in stopwords/stopwordlist.txt, one word per line)
    • Removal of single-character words
    • Removal of rarely occurring words
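
The filtering steps above can be sketched in plain Python. This is a minimal illustration, not the code in preprocessing.py: it assumes each document has already been reduced to a list of nouns (for example with KoNLPy's Okt tagger), that the stopword file is UTF-8 encoded, and the names `load_stopwords`, `filter_tokens`, and the `min_count` threshold are hypothetical.

```python
from collections import Counter


def load_stopwords(path):
    """Read one stopword per line, skipping blank lines (UTF-8 assumed)."""
    with open(path, encoding="utf-8") as f:
        return {line.strip() for line in f if line.strip()}


def filter_tokens(docs, stopwords, min_count=2):
    """Drop stopwords, single-character tokens, and rare tokens.

    docs: list of token lists (one list of nouns per document).
    stopwords: set of words to discard.
    min_count: hypothetical threshold; tokens that occur fewer times
               across the whole collection are removed.
    """
    # Pass 1: remove stopwords and one-character tokens.
    kept = [
        [tok for tok in doc if tok not in stopwords and len(tok) > 1]
        for doc in docs
    ]
    # Pass 2: count what is left and remove rare tokens.
    counts = Counter(tok for doc in kept for tok in doc)
    return [[tok for tok in doc if counts[tok] >= min_count] for doc in kept]


docs = [
    ["경제", "성장", "것", "정책"],
    ["경제", "정책", "수", "금리"],
    ["성장", "경제", "기자"],
]
filtered = filter_tokens(docs, stopwords={"기자"}, min_count=2)
```

Rare tokens are counted after the other two filters, so a stopword never influences the frequency threshold.
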
  2. Frequency analysis (frequency_analysis.py)

    • Lists tokens (words) and their frequencies in descending order -> saved as CSV
    • Word cloud -> saved as PNG
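
As a rough sketch of the frequency table described above (not the contents of frequency_analysis.py), the counts can be produced with the standard library alone. The function names, the column headers, and the `utf-8-sig` encoding choice are assumptions made for this example.

```python
import csv
from collections import Counter


def token_frequencies(docs):
    """Return (token, count) pairs sorted by count, highest first."""
    counts = Counter(tok for doc in docs for tok in doc)
    return counts.most_common()


def save_frequencies(rows, path):
    """Write the (token, count) pairs to a CSV file with a header row."""
    # utf-8-sig adds a byte-order mark, which helps Excel detect UTF-8
    # (an assumption about how the CSV will be opened).
    with open(path, "w", newline="", encoding="utf-8-sig") as f:
        writer = csv.writer(f)
        writer.writerow(["token", "frequency"])
        writer.writerows(rows)


docs = [["경제", "성장", "정책"], ["경제", "정책"], ["경제"]]
freq = token_frequencies(docs)
```

The same pairs, converted with `dict(freq)`, are the kind of input the wordcloud package accepts through `WordCloud.generate_from_frequencies`; rendering Korean text there also requires a font that contains Hangul glyphs.
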
  3. Choosing the number of topics k (lda_explore_topic_number.py)

    • Specify a range of topic counts and run LDA modeling for each count
    • Compute the perplexity and coherence of each model
    • Save the results as CSV and a graph visualizing them as PNG
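
The search over k might look roughly like the sketch below. It is an illustration under stated assumptions, not the code in lda_explore_topic_number.py: the function names, the `passes` and `seed` defaults, the choice of the `c_v` coherence measure, and the "highest coherence wins" rule are all assumptions. Gensim is imported inside the function so that the small selection helper can be used on its own.

```python
def explore_topic_numbers(texts, k_values, passes=10, seed=0):
    """Fit one LDA model per k and collect perplexity and coherence.

    texts: list of token lists (preprocessed documents).
    k_values: iterable of topic counts to try, e.g. range(2, 11).
    """
    # Imported lazily so pick_best_k below works without gensim installed.
    from gensim.corpora import Dictionary
    from gensim.models import CoherenceModel, LdaModel

    dictionary = Dictionary(texts)
    corpus = [dictionary.doc2bow(doc) for doc in texts]
    rows = []
    for k in k_values:
        model = LdaModel(corpus=corpus, id2word=dictionary, num_topics=k,
                         passes=passes, random_state=seed)
        # log_perplexity returns a per-word likelihood bound;
        # gensim reports perplexity as 2 ** (-bound).
        perplexity = 2 ** (-model.log_perplexity(corpus))
        coherence = CoherenceModel(model=model, texts=texts,
                                   dictionary=dictionary,
                                   coherence="c_v").get_coherence()
        rows.append({"k": k, "perplexity": perplexity, "coherence": coherence})
    return rows


def pick_best_k(rows):
    """Return the k with the highest coherence (one common heuristic)."""
    return max(rows, key=lambda r: r["coherence"])["k"]


# Usage (requires gensim and real data):
# rows = explore_topic_numbers(texts, range(2, 11))
# best_k = pick_best_k(rows)
```

Lower perplexity and higher coherence are generally read as better, but the two measures often disagree, which is why plotting both across k, as the script does, is more informative than a single automatic pick.
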
  4. LDA (lda.py)

    • Create and save the corpus and dictionary of the preprocessed target documents
    • Build an LDA model with the specified number of topics
    • Report the topic distribution of each document -> saved as CSV
    • Visualize the resulting LDA model -> saved as HTML
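
For the per-document topic distribution, Gensim's `LdaModel.get_document_topics` returns a sparse list of `(topic_id, probability)` pairs, and by default omits topics below a minimum probability. A minimal sketch of turning such pairs into fixed-width CSV rows follows; the function names and column labels are hypothetical and this is not the code in lda.py.

```python
import csv


def dense_topic_row(doc_topics, num_topics):
    """Expand sparse (topic_id, probability) pairs into a fixed-length list."""
    row = [0.0] * num_topics
    for topic_id, prob in doc_topics:
        row[topic_id] = float(prob)
    return row


def save_doc_topics(all_doc_topics, num_topics, path):
    """Write one row per document: doc_id followed by one column per topic."""
    with open(path, "w", newline="", encoding="utf-8") as f:
        writer = csv.writer(f)
        writer.writerow(["doc_id"] + [f"topic_{t}" for t in range(num_topics)])
        for doc_id, doc_topics in enumerate(all_doc_topics):
            writer.writerow([doc_id] + dense_topic_row(doc_topics, num_topics))


# Topic 1 is missing from the sparse input, so it becomes 0.0.
row = dense_topic_row([(0, 0.7), (2, 0.3)], num_topics=3)
```

Passing `minimum_probability=0.0` to `get_document_topics` asks Gensim for every topic, in which case the zero-filling step simply has nothing to fill.
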
  5. Regression-based time-series analysis for identifying how discussion of each topic trends over time (lda_hot_and_cold.py)

    • Compute the θ value of every topic for each document
    • Fit the linear regression y(θ) = a·x(time) + b
    • Report the Hot & Cold topics
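
The regression step can be illustrated with a hand-written least-squares fit. This sketch makes several assumptions that the README does not confirm: θ values are averaged per time period before fitting, a topic is labelled by the sign of the slope a alone (no significance test, whereas the repository lists statsmodels among its packages), and all function names are hypothetical.

```python
def fit_line(xs, ys):
    """Ordinary least squares for y = a * x + b; returns (a, b)."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxx = sum((x - mx) ** 2 for x in xs)
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    a = sxy / sxx
    return a, my - a * mx


def topic_trends(thetas_by_period):
    """Fit y(theta) = a * x(time) + b for each topic.

    thetas_by_period: {time_value: [theta vector of each document]}.
    Returns {topic_id: (slope, intercept)}.
    """
    times = sorted(thetas_by_period)
    num_topics = len(thetas_by_period[times[0]][0])
    trends = {}
    for t in range(num_topics):
        # Mean theta of topic t over the documents of each period.
        ys = [
            sum(doc[t] for doc in thetas_by_period[x]) / len(thetas_by_period[x])
            for x in times
        ]
        trends[t] = fit_line(times, ys)
    return trends


def label_topics(trends):
    """Positive slope -> 'hot', negative -> 'cold' (sign only, no p-value)."""
    return {
        t: "hot" if a > 0 else "cold" if a < 0 else "flat"
        for t, (a, _) in trends.items()
    }


# Made-up theta values for two topics over three years, for illustration only.
thetas_by_period = {
    2020: [[0.8, 0.2], [0.6, 0.4]],
    2021: [[0.5, 0.5]],
    2022: [[0.2, 0.8], [0.4, 0.6]],
}
trends = topic_trends(thetas_by_period)
labels = label_topics(trends)  # {0: 'cold', 1: 'hot'}
```

In practice the slope is usually reported together with its p-value, so that only statistically significant rises and declines are called hot or cold.
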
  6. Other features

    • Add custom entries (nouns, typo corrections) to the Okt dictionary (custom_okt/okt_add_custom_dict.py)

2. Runtime environment

Main package versions

  • These are the versions the code was tested with
  • python == 3.10.9
  • gensim == 4.3.0
  • konlpy == 0.6.0
  • pandas == 1.5.2
  • pyldavis == 3.3.1
  • statsmodels == 0.13.5
  • wordcloud == 1.9.3

Java

3. Usage

  1. Download the code and set up an environment that can run it (see "2. Runtime environment" above).
  2. Raw data input → Excel file (.xlsx)
  3. Detailed settings can be adjusted in _setting() near the top of the code.
  4. The folders where outputs will be written must be created in advance.
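
Since the output folders are not created automatically, a small helper run before the scripts can prepare them. The sketch below is only an illustration: the helper name and the folder name in the usage comment are hypothetical, and the actual paths are whatever is configured in _setting().

```python
from pathlib import Path


def ensure_output_dirs(base, names):
    """Create each output folder under base if it does not exist yet."""
    created = []
    for name in names:
        path = Path(base) / name
        # exist_ok=True makes the call safe to repeat on later runs.
        path.mkdir(parents=True, exist_ok=True)
        created.append(path)
    return created


# Usage (folder name is a placeholder):
# ensure_output_dirs(".", ["result"])
```
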
