Are the many Excel files produced by data cleaning the same batch of files used in the next step (LDA + hyperparameter tuning)? #21

Open
takahaxiran opened this issue Mar 18, 2023 · 4 comments

Comments

@takahaxiran

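The path written in the code is: F:\A数据比赛\正式流程及文件\2.数据处理\正文信息\正文分词分月_关键词\
But where does the split-by-month step happen? How do the several hundred crawled Excel comment files end up in this 正文分词分月 (post text, tokenized and split by month) folder?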

@stay-leave
Owner

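It is the same data. First merge everything into a single Excel file, then split that file by time into several Excel files, then tokenize the text and convert it to TXT.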
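For anyone following along, the merge, split-by-month, tokenize, and write-to-TXT steps might look roughly like the sketch below. This is not the repository's actual code. The function names (`merge_rows`, `split_by_month`, `write_monthly_txt`), the column names `time` and `text`, and the timestamp format are all assumptions made for illustration. Loading the rows from Excel is left out (something like `pandas.read_excel` would do it), and the default whitespace tokenizer is only a placeholder: for Chinese text you would pass in a real tokenizer such as `jieba.lcut`.

```python
from collections import defaultdict
from datetime import datetime
from pathlib import Path


def merge_rows(tables):
    """Concatenate the rows of many per-post tables into one list of rows."""
    merged = []
    for rows in tables:
        merged.extend(rows)
    return merged


def split_by_month(rows, time_key="time", time_format="%Y-%m-%d %H:%M"):
    """Group rows by the YYYY-MM of their timestamp (column name is assumed)."""
    by_month = defaultdict(list)
    for row in rows:
        stamp = datetime.strptime(row[time_key], time_format)
        by_month[stamp.strftime("%Y-%m")].append(row)
    return dict(by_month)


def write_monthly_txt(by_month, out_dir, text_key="text", tokenize=str.split):
    """Write one TXT file per month, one tokenized document per line."""
    out_dir = Path(out_dir)
    out_dir.mkdir(parents=True, exist_ok=True)
    written = []
    for month, rows in sorted(by_month.items()):
        path = out_dir / f"{month}.txt"
        with path.open("w", encoding="utf-8") as fh:
            for row in rows:
                # Tokens are joined with spaces so later steps can split them again.
                fh.write(" ".join(tokenize(row[text_key])) + "\n")
        written.append(path)
    return written
```

Each table passed to `merge_rows` is a list of dicts, one dict per row of one crawled file. The resulting monthly TXT files are what the LDA step would then read.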

@takahaxiran
Author

After merging, are the earlier post IDs simply discarded, and is the data then split by month manually?

@stay-leave
Owner

Yes. After that, the post text is what goes into the LDA analysis.

@takahaxiran
Author

Thanks.
