-
Notifications
You must be signed in to change notification settings - Fork 6
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Statistical feature #44
base: master
Are you sure you want to change the base?
Conversation
…. Database not yet populating
…nd finished some of the underlying implementation
…test is likely working properly
adding regular expression re import
hey - that's great - but that's a lot of commits to review in one PR - in the future let's do pull request for smaller and more homogeneous chunck when possible - testing the stuff on my machine now ! |
I can split this into backend and frontend, if that's easier! (Commented with the wrong account previously.) |
Some notes: You can test the API request directly by querying http://localhost:5000/json/collection-significance. Parameters to send are specified at http://localhost:5000/json/collection-significance/help I commented out some print statements in statistics.py. It might help to uncomment those to track exactly what's going on. |
I found that the widget does not always (actually, now never) show up after the significance test is finished - @neelsomani @Sharabesh did you observed this as well ? I created a "10-vision-papers" and a "10-audio-papers" collection and tried to test the difference, we should get "vision" blobs in the vision-audio and "audio" blob in the audio-vision. We probably need to have some kind of stupid dump (the x,y,z that are significant and their p-value / z-score) in some temporary text file to test this. |
Hmm, that seems to imply that the API endpoint is failing. Did you try pulling the latest version? I noticed that something was wrong shortly after I pushed - then I realized that my correction was incorrect, so I pushed again. If you don't have the latest version, that might explain it. I tested the latest version right before I left for work. |
Not going to bother deleting/readding that comment again... |
We probably don't want to dump a file to the server itself, but we can dump to a database. I think the best solution is to check your Javascript console, and/or uncomment the print statements in the Python code. Because the endpoint is asynchronous, it's not going to throw errors in the same way (i.e., on certain failures, rather than returning an invalid response code, it will just hang indefinitely). |
Alright, I just replicated what you did, and I found that there were no significant results returned by the API endpoint. (i.e., the call wasn't hanging/there was no error; it just didn't return any results.) I've added a |
With regards to significance values, if you set the threshold and width to some arbitrarily high value, you can see results, but under defaults it looks like results weren't showing up for the majority of the ones I tested. @jbpoline did you think was an issue with implementation or a case where we should update the default values? |
If you do a test with 30 articles with the query "vision" against the whole database, then results will show up with the current threshold; that's why I moved it back down from the arbitrarily high value. So it's more likely an issue with the implementation rather than the threshold (or neither, and this is just the correct result). |
testing it again - I had pulled though... will uncomment to see what's going on... |
ok - I get "no significant results" - which is a bit strange when comparing the "audio" versus "vision" - will dig into this ... |
Hi @jbpoline, aside from making the endpoint, we might want to look into this issue you brought up in July (where "audio" vs. "vision" gave no significant results), if it hasn't already been resolved. |
No description provided.