Support for parsing custom corpus #47

Open
oveddan opened this issue Oct 4, 2019 · 4 comments

Comments

@oveddan

oveddan commented Oct 4, 2019

It would be great if you supported parsing a custom corpus supplied by the user: a file that is already in a recognizable format and can be fed straight into createDataset.py.
For example, you could pass a text file that looks like:

message: How's it going today?
response: It's going alright
message: What's for dinner tonight?
response: Chicken baked with cheese
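
A minimal sketch of how a file like the one above could be read into pairs, assuming the `message:` and `response:` prefixes are exactly as shown. `parse_pairs` is a hypothetical helper for illustration; it is not part of createDataset.py.

```python
def parse_pairs(text):
    """Return a list of (message, response) tuples from prefixed lines."""
    pairs = []
    pending = None  # most recent message still waiting for its response
    for raw in text.splitlines():
        line = raw.strip()
        if line.startswith("message:"):
            pending = line[len("message:"):].strip()
        elif line.startswith("response:") and pending is not None:
            pairs.append((pending, line[len("response:"):].strip()))
            pending = None
        # Any other line (blank, unknown prefix) is ignored in this sketch.
    return pairs


sample = """message: How's it going today?
response: It's going alright
message: What's for dinner tonight?
response: Chicken baked with cheese"""

pairs = parse_pairs(sample)
for message, response in pairs:
    print(repr(message), "->", repr(response))
```

A response with no preceding message is dropped rather than raising an error, which seemed the safer default for hand-edited files.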

@adeshpande3
Owner

Yeah, totally agree with that. Things are kinda busy for me on the life end, but PRs are welcome 🙏

@caraneel

caraneel commented Oct 5, 2019

On this note: is there any way to see the format of the document that is fed to createDataset.py? I realize the actual files contain personal info; I just want to know the structure so I can recreate it. The fbchat-archive-parser doesn't work for my message data, so I want to recreate the file that would have resulted from running fbcap ./messages.htm > fbMessages.txt

@adeshpande3
Owner

Actually, now that I think back to this project (it's been a while for me), I think the fbMessages.txt file is pretty similar to the format @oveddan was talking about (correct me if I'm wrong though); see #28. Basically it's just username: message on each line, and then you should enter your username here (https://github.com/adeshpande3/Facebook-Messenger-Bot/blob/master/createDataset.py#L7)
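
For anyone recreating the file by hand, here is a rough sketch of writing messages out as one `username: message` line each, as described above. It assumes that one-line-per-message layout is all the script needs, which has not been confirmed in this thread; `format_log` and the sample usernames are hypothetical.

```python
def format_log(messages):
    """Render (username, text) tuples as 'username: text' lines."""
    lines = []
    for username, text in messages:
        # Collapse newlines and repeated spaces so each message stays on one line.
        lines.append(f"{username}: {' '.join(text.split())}")
    return "\n".join(lines) + "\n"


# Hypothetical conversation data, e.g. pulled from your own export.
messages = [
    ("alice", "hi how's it going"),
    ("bob", "ok,\nyou?"),
]

log_text = format_log(messages)
print(log_text, end="")
```

The resulting text could then be saved as fbMessages.txt, for example with `open("fbMessages.txt", "w", encoding="utf-8").write(log_text)`.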

@oveddan
Author

oveddan commented Oct 5, 2019

OK, so what you're saying is that it expects a file in this format:

somePerson: hi how's it going
someOtherPerson: ok
somePerson: what's for dinner tonight?
someOtherPerson: chicken on rice

Would it work if given a file like this? It would be great if there were a sample file showing the fb messages format somewhere in this repo, considering the fb message gathering repo is obsolete.
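
To make the question concrete, here is a sketch of how a file in that shape could be turned into (message, response) pairs once you know which username is yours. This is only an illustration of the format under discussion, not the logic createDataset.py actually uses; `pairs_from_log` is a hypothetical name.

```python
def pairs_from_log(text, my_name):
    """Pair each message from someone else with my next reply."""
    pairs = []
    last_other = None  # latest message from the other person, not yet answered
    for raw in text.splitlines():
        name, sep, body = raw.partition(":")
        if not sep:
            continue  # skip lines that are not "name: message"
        name, body = name.strip(), body.strip()
        if name == my_name:
            if last_other is not None:
                pairs.append((last_other, body))
                last_other = None
        else:
            last_other = body
    return pairs


log = """somePerson: hi how's it going
someOtherPerson: ok
somePerson: what's for dinner tonight?
someOtherPerson: chicken on rice"""

pairs = pairs_from_log(log, "someOtherPerson")
for message, response in pairs:
    print(repr(message), "->", repr(response))
```

Splitting on the first colon only keeps messages that themselves contain colons intact, though it would misread a username that contains one.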


3 participants