Skip to content

Draft: TXT2KG w/ hotpot_qa.py and tech_qa.py examples #9846

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Closed
wants to merge 327 commits into from

Conversation

puririshi98
Copy link
Contributor

@puririshi98 puririshi98 commented Dec 11, 2024

closed, new version: #9992

@puririshi98 puririshi98 changed the title Draft: TXT2KG w/ hotpot_qa.py` example for precision estimation Draft: TXT2KG w/ hotpot_qa.py example for precision estimation Dec 11, 2024
@puririshi98
Copy link
Contributor Author

@Kh4L reviews welcome

puririshi98 added a commit that referenced this pull request Dec 12, 2024
improve #9846

---------

Co-authored-by: riship <riship@nvidia.com>
Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com>
Copy link
Contributor

@Kh4L Kh4L left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

A few nitpicks, but LGTM!

Copy link
Member

@akihironitta akihironitta left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Great work! 🚀 I may have missed some past PRs that were already merged, but I'm sharing my thoughts here anyway :)

I think we might want to make sure that anything goes under torch_geometric/ is general enough, well documented, well tested so that it's reusable by users outside these example scripts. If these utils under torch_geometric/ are not general enough, users might end up copying the code into their own scripts instead of directly relying on torch_geometric/.

If the intention is for these LLMs+GNNs additions to serve as reference implementations (that users can tweak as needed), it'd make more sense to include them as examples rather than integrating them into torch_geometric/.

@puririshi98
Copy link
Contributor Author

puririshi98 commented Jan 7, 2025

"I think we might want to make sure that anything goes under torch_geometric/ is general enough, well documented, well tested so that it's reusable by users outside these example scripts"

I totally agree btw, this is still a draft. i aim to have full docstrings etc and polish this up once its ready, just wanted an initial review. i will circle back when it is fully ready still have further work to get done before im ready

Copy link

codecov bot commented Jan 10, 2025

Codecov Report

Attention: Patch coverage is 21.60804% with 156 lines in your changes missing coverage. Please review.

Project coverage is 85.93%. Comparing base (5fb2a8e) to head (87d7b70).
Report is 9 commits behind head on master.

Files with missing lines Patch % Lines
torch_geometric/nn/nlp/txt2kg.py 14.92% 114 Missing ⚠️
torch_geometric/nn/nlp/llm_judge.py 26.31% 28 Missing ⚠️
torch_geometric/loader/rag_loader.py 7.69% 12 Missing ⚠️
torch_geometric/nn/models/g_retriever.py 0.00% 1 Missing ⚠️
torch_geometric/nn/nlp/llm.py 0.00% 1 Missing ⚠️

❌ Your patch check has failed because the patch coverage (21.60%) is below the target coverage (80.00%). You can increase the patch coverage or adjust the target coverage.

Additional details and impacted files
@@            Coverage Diff             @@
##           master    #9846      +/-   ##
==========================================
+ Coverage   85.86%   85.93%   +0.06%     
==========================================
  Files         490      492       +2     
  Lines       32432    32621     +189     
==========================================
+ Hits        27847    28032     +185     
- Misses       4585     4589       +4     

☔ View full report in Codecov by Sentry.
📢 Have feedback on the report? Share it here.

puririshi98 and others added 25 commits January 27, 2025 15:06
no major diff but no decrease either

---------

Co-authored-by: riship <riship@nvidia.com>
Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com>
@puririshi98
Copy link
Contributor Author

accidentally merged wrong pr into this closing, making new

puririshi98 added a commit that referenced this pull request Apr 2, 2025
Successor to
[9666](#9666), this:
- ~~updates the documentation to show how to utilize GNN RAG and~~(now
handled by separate branch)
- updates WebQSP to help serve as a toy example for LargeGraphIndexer.
- fixes issues with LargeGraphIndexer running out of memory by
introducing a default batch size and multithreading ability

~~currently blocked by a bug that causes the g_retriever.py example to
get 1% less accuracy.~~ Bug is due to a fp32 precision issue related to
batch kernels in Huggingface's transformers. Performance difference is
too inconsequential to require a fix.

may also be the cause of low retrieval precision in
#9846

---------

Co-authored-by: Zack Aristei <zaristei@zaristei-nvidia.client.nvidia.com>
Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com>
Co-authored-by: Zachary Aristei <zaristei@gmail.com>
Co-authored-by: Rishi Puri <puririshi98@berkeley.edu>
Co-authored-by: Rishi Puri <riship@nvidia.com>
Kh4L pushed a commit to Kh4L/pytorch_geometric that referenced this pull request Apr 15, 2025
improve pyg-team#9846

---------

Co-authored-by: riship <riship@nvidia.com>
Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com>
Kh4L pushed a commit to Kh4L/pytorch_geometric that referenced this pull request Apr 15, 2025
Successor to
[9666](pyg-team#9666), this:
- ~~updates the documentation to show how to utilize GNN RAG and~~(now
handled by separate branch)
- updates WebQSP to help serve as a toy example for LargeGraphIndexer.
- fixes issues with LargeGraphIndexer running out of memory by
introducing a default batch size and multithreading ability

~~currently blocked by a bug that causes the g_retriever.py example to
get 1% less accuracy.~~ Bug is due to a fp32 precision issue related to
batch kernels in Huggingface's transformers. Performance difference is
too inconsequential to require a fix.

may also be the cause of low retrieval precision in
pyg-team#9846

---------

Co-authored-by: Zack Aristei <zaristei@zaristei-nvidia.client.nvidia.com>
Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com>
Co-authored-by: Zachary Aristei <zaristei@gmail.com>
Co-authored-by: Rishi Puri <puririshi98@berkeley.edu>
Co-authored-by: Rishi Puri <riship@nvidia.com>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants