Draft: `TXT2KG` w/ `hotpot_qa.py` and `tech_qa.py` examples #9846

puririshi98 · 2024-12-11T22:06:02Z

closed, new version: #9992

puririshi98 · 2024-12-11T22:07:28Z

@Kh4L reviews welcome

improve #9846

Co-authored-by: riship <riship@nvidia.com>
Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com>

Kh4L

A few nitpicks, but LGTM!

torch_geometric/nn/nlp/txt2kg.py

torch_geometric/utils/rag/backend_utils.py

examples/llm/hotpot_qa.py

torch_geometric/nn/nlp/txt2kg.py

akihironitta

Great work! 🚀 I may have missed some past PRs that were already merged, but I'm sharing my thoughts here anyway :)

I think we might want to make sure that anything goes under torch_geometric/ is general enough, well documented, well tested so that it's reusable by users outside these example scripts. If these utils under torch_geometric/ are not general enough, users might end up copying the code into their own scripts instead of directly relying on torch_geometric/.

If the intention is for these LLMs+GNNs additions to serve as reference implementations (that users can tweak as needed), it'd make more sense to include them as examples rather than integrating them into torch_geometric/.

torch_geometric/nn/nlp/llm.py

torch_geometric/nn/nlp/txt2kg.py

puririshi98 · 2025-01-07T01:21:05Z

"I think we might want to make sure that anything goes under torch_geometric/ is general enough, well documented, well tested so that it's reusable by users outside these example scripts"

I totally agree, btw. This is still a draft: I aim to add full docstrings etc. and polish it up once it's ready; I just wanted an initial review. I'll circle back when it's fully ready, as there is still further work to do first.

codecov · 2025-01-10T05:09:45Z

Codecov Report

Attention: Patch coverage is 21.60804% with 156 lines in your changes missing coverage. Please review.

Project coverage is 85.93%. Comparing base (5fb2a8e) to head (87d7b70).
Report is 9 commits behind head on master.

| Files with missing lines | Patch % | Lines |
|---|---|---|
| torch_geometric/nn/nlp/txt2kg.py | 14.92% | 114 Missing ⚠️ |
| torch_geometric/nn/nlp/llm_judge.py | 26.31% | 28 Missing ⚠️ |
| torch_geometric/loader/rag_loader.py | 7.69% | 12 Missing ⚠️ |
| torch_geometric/nn/models/g_retriever.py | 0.00% | 1 Missing ⚠️ |
| torch_geometric/nn/nlp/llm.py | 0.00% | 1 Missing ⚠️ |

❌ Your patch check has failed because the patch coverage (21.60%) is below the target coverage (80.00%). You can increase the patch coverage or adjust the target coverage.

Additional details and impacted files

@@            Coverage Diff             @@
##           master    #9846      +/-   ##
==========================================
+ Coverage   85.86%   85.93%   +0.06%     
==========================================
  Files         490      492       +2     
  Lines       32432    32621     +189     
==========================================
+ Hits        27847    28032     +185     
- Misses       4585     4589       +4
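As a sanity check on the figures in this report, the percentages can be recomputed from the raw hit and line counts. This is only a sketch: the helper name `coverage_pct` is made up for illustration, and the patch size of 199 lines is inferred from the reported 21.60804% and 156 missing lines rather than stated anywhere in the report.

```python
def coverage_pct(hits: int, lines: int) -> float:
    """Coverage as a percentage of tracked lines."""
    return 100.0 * hits / lines


# Project-level numbers taken from the coverage diff above.
base = coverage_pct(27847, 32432)
head = coverage_pct(28032, 32621)

# Per-file missing lines from the "Files with missing lines" table.
missing_per_file = [114, 28, 12, 1, 1]
total_missing = sum(missing_per_file)

# Inferred, not reported: the number of changed lines that yields the
# stated patch coverage given the number of missing lines.
reported_patch_pct = 21.60804
patch_lines = round(total_missing / (1 - reported_patch_pct / 100))
patch_pct = coverage_pct(patch_lines - total_missing, patch_lines)

print(f"base={base:.2f}% head={head:.2f}%")
print(f"missing={total_missing} patch_lines={patch_lines} patch={patch_pct:.5f}%")
```

Under these assumptions the recomputed project coverage matches the 85.86% and 85.93% shown in the diff, and the per-file counts add up to the 156 missing lines quoted at the top of the report.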



no major diff but no decrease either

Co-authored-by: riship <riship@nvidia.com>
Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com>

puririshi98 · 2025-01-29T21:45:28Z

Accidentally merged the wrong PR into this one; closing and making a new one.

Successor to [9666](#9666), this:
- ~~updates the documentation to show how to utilize GNN RAG and~~ (now handled by separate branch)
- updates WebQSP to help serve as a toy example for LargeGraphIndexer.
- fixes issues with LargeGraphIndexer running out of memory by introducing a default batch size and multithreading ability

~~currently blocked by a bug that causes the g_retriever.py example to get 1% less accuracy.~~ The bug is due to an fp32 precision issue related to batch kernels in Hugging Face's transformers. The performance difference is too inconsequential to require a fix. It may also be the cause of low retrieval precision in #9846.

Co-authored-by: Zack Aristei <zaristei@zaristei-nvidia.client.nvidia.com>
Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com>
Co-authored-by: Zachary Aristei <zaristei@gmail.com>
Co-authored-by: Rishi Puri <puririshi98@berkeley.edu>
Co-authored-by: Rishi Puri <riship@nvidia.com>
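The commit message above mentions a default batch size plus multithreading as the fix for memory pressure. The general pattern can be sketched as follows. This is not the `LargeGraphIndexer` implementation: `batched`, `process_in_batches`, `encode_batch`, and the default values are all hypothetical names chosen for illustration.

```python
from concurrent.futures import ThreadPoolExecutor
from typing import Callable, Iterator, List, Sequence, TypeVar

T = TypeVar("T")
R = TypeVar("R")


def batched(items: Sequence[T], batch_size: int) -> Iterator[Sequence[T]]:
    """Yield consecutive slices of at most `batch_size` items."""
    for start in range(0, len(items), batch_size):
        yield items[start:start + batch_size]


def process_in_batches(
    items: Sequence[T],
    encode_batch: Callable[[Sequence[T]], List[R]],
    batch_size: int = 256,  # hypothetical default, not taken from the PR
    num_workers: int = 4,   # hypothetical default, not taken from the PR
) -> List[R]:
    """Apply `encode_batch` to fixed-size batches using a thread pool.

    Each call sees at most `batch_size` items, which bounds the size of
    per-call intermediates. Note that `Executor.map` submits every batch
    up front; it limits work per call, not the number of queued batches.
    """
    results: List[R] = []
    with ThreadPoolExecutor(max_workers=num_workers) as pool:
        # `map` returns results in input order, so output order is stable.
        for out in pool.map(encode_batch, batched(items, batch_size)):
            results.extend(out)
    return results


doubled = process_in_batches(
    list(range(10)),
    lambda batch: [x * 2 for x in batch],
    batch_size=3,
    num_workers=2,
)
```

Threads help mainly when `encode_batch` releases the GIL (I/O, or native kernels); for pure-Python work a process pool would be the more usual choice.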


puririshi98 requested review from rusty1s and akihironitta December 11, 2024 22:06

puririshi98 requested review from wsad1 and EdisonLeeeee as code owners December 11, 2024 22:06

github-actions bot added nn example utils labels Dec 11, 2024

puririshi98 mentioned this pull request Dec 11, 2024

Draft: torch_geometric.nn.nlp.TXT2KG and examples/hotpot_qa.py for recall/precision eval #9728

Closed

puririshi98 changed the title ~~Draft: TXT2KG w/ hotpot_qa.py` example for precision estimation~~ Draft: TXT2KG w/ hotpot_qa.py example for precision estimation Dec 11, 2024

puririshi98 mentioned this pull request Dec 11, 2024

Improve system prompt for TXT2KG #9848

Merged

puririshi98 added a commit that referenced this pull request Dec 12, 2024

Improve system prompt for TXT2KG (#9848)

747f0ea


Kh4L suggested changes Dec 13, 2024

torch_geometric/nn/nlp/txt2kg.py (outdated, resolved)

torch_geometric/nn/nlp/txt2kg.py (resolved)

torch_geometric/utils/rag/backend_utils.py (outdated, resolved)

examples/llm/hotpot_qa.py (outdated, resolved)

Kh4L suggested changes Dec 15, 2024

torch_geometric/nn/nlp/txt2kg.py (outdated, resolved)

akihironitta requested changes Dec 28, 2024

torch_geometric/nn/nlp/llm.py (outdated, resolved)

torch_geometric/nn/nlp/txt2kg.py (outdated, resolved)

torch_geometric/nn/nlp/txt2kg.py (outdated, resolved)

puririshi98 mentioned this pull request Jan 7, 2025

Large Graph Indexer WebQSP Refactor #9806

Merged

puririshi98 requested review from mananshah99 and a team as code owners January 9, 2025 20:15

github-actions bot added loader dataset data labels Jan 9, 2025

puririshi98 added 7 commits January 9, 2025 22:43

- speedup indexing (703c83d)
- speedup indexing (538eeba)
- speedup indexing (f1a3e7a)
- speedup indexing (9be12dc)
- speedup indexing (3a0bee2)
- speedup indexing (1cc5091)
- debug (845f186)

puririshi98 and others added 25 commits January 27, 2025 15:06

- drafting (15fe2bd)
- [pre-commit.ci] auto fixes from pre-commit.com hooks (941b745)
- drafting (473577d)
- Merge branch 'rebase-txt2kg' of https://github.com/pyg-team/pytorch_geometric into rebase-txt2kg (0c664f1)
- [pre-commit.ci] auto fixes from pre-commit.com hooks (055527b)
- drafting (77d5951)
- Merge branch 'rebase-txt2kg' of https://github.com/pyg-team/pytorch_geometric into rebase-txt2kg (db4a24a)
- drafting (d99a23c)
- drafting (c4400f1)
- drafting (10da58b)
- [pre-commit.ci] auto fixes from pre-commit.com hooks (b977e9a)
- cleaning (5240ca0)
- Merge branch 'rebase-txt2kg' of https://github.com/pyg-team/pytorch_geometric into rebase-txt2kg (4a2e86b)
- cleaning (c3654c8)
- [pre-commit.ci] auto fixes from pre-commit.com hooks (8b8f00f)
- cleaning (7bc4bcb)
- [pre-commit.ci] auto fixes from pre-commit.com hooks (f6c81f1)
- cleaning (906e82c)
- Merge branch 'rebase-txt2kg' of https://github.com/pyg-team/pytorch_geometric into rebase-txt2kg (cf6162c)
- cleaning (a2ea3cd)
- cleaning (8883238)
- cleaning (11194ee)
- cleaning (ea15dc9)
- cleaning (87d7b70)
- Nim sent transformer and better comments for senttrans (#9990) (5d9b34e)

puririshi98 closed this Jan 29, 2025
