LinkNeighborLoader training edge sampling/leave in information #8295
This is the right observation. For this, you would need to take care that …
@rusty1s Can you explain more how to use
Here are my results:
Update: It's my fault when I pass all …
Good day all. I have a question about the details behind some of the mini-batch loaders, in particular LinkNeighborLoader. My task is link prediction in the heterogeneous setting.
As a relatively new user, I am not familiar with some of the details behind these mini-batch loaders. LinkNeighborLoader looks like an extended version of NeighborSampler that lets you sample edges from the graph: it seems that we get a batch of edges, then sample the neighbors of their endpoint nodes in the same way as GraphSAGE (Hamilton et al., 2017).
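To make the sampling picture above concrete, here is a minimal pure-Python sketch of link-level neighbor sampling. It is not PyG's implementation: the helper names are hypothetical, edges are treated as undirected, and `num_neighbors` only loosely mirrors the loader argument of the same name (with -1 meaning "take all neighbors"). It illustrates the point that matters for this question: the seed edge is reachable in the very first hop from its own endpoints, so unless it is removed it can end up in the message-passing subgraph.

```python
import random
from collections import defaultdict


def build_adjacency(edges):
    """Map each node to a list of (edge_id, neighbor) pairs; edges are treated as undirected."""
    adj = defaultdict(list)
    for eid, (u, v) in enumerate(edges):
        adj[u].append((eid, v))
        adj[v].append((eid, u))
    return adj


def sample_link_subgraph(edges, seed_eids, num_neighbors, rng):
    """Sample hop by hop around the endpoints of the seed edges.

    num_neighbors[h] is the max number of neighbors drawn per node at hop h
    (-1 means all). Returns (sampled node ids, sampled edge ids).
    """
    adj = build_adjacency(edges)
    # Both endpoints of every seed edge start as seed nodes.
    frontier = {n for eid in seed_eids for n in edges[eid]}
    visited = set(frontier)
    sampled_eids = set()
    for k in num_neighbors:
        next_frontier = set()
        for node in sorted(frontier):  # sorted for reproducible RNG usage
            nbrs = adj[node]
            picked = nbrs if k < 0 or len(nbrs) <= k else rng.sample(nbrs, k)
            for eid, nbr in picked:
                sampled_eids.add(eid)  # this edge is now available for message passing
                if nbr not in visited:
                    visited.add(nbr)
                    next_frontier.add(nbr)
        frontier = next_frontier
    return visited, sampled_eids
```

For example, `sample_link_subgraph([(0, 1), (1, 2), (2, 3), (3, 0)], [0], [10, 10], random.Random(0))` returns edge ids that include 0, i.e. the seed edge itself.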
In my case I am using an undirected heterogeneous graph: every edge type also has a corresponding reverse edge type (rev-edges).
My question has two parts. For the simple case, I ran the following experiment. The goal is to make a prediction on a training edge L; I denote that prediction p(L).
So we set up a loader for training.
When we grab a batch from this using
we notice an overlap in the global edge IDs of these objects: the intersection of inpid and eid is non-empty and large.
This is a problem for my link prediction task, since in a real application I would never have this information a priori.
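A quick way to quantify the overlap described above is to intersect the two ID sets. This is a minimal sketch with hypothetical helper names, assuming both inputs are global edge IDs of the same edge type. One caveat worth checking on your PyG version (treat it as an assumption): the loader's input IDs index into `edge_label_index`, so they only coincide with global edge IDs when `edge_label_index` was built from the same `edge_index`.

```python
def leaked_edge_ids(input_eids, sampled_eids):
    """Seed (supervision) edge ids that also appear among the sampled message-passing edges."""
    return sorted(set(input_eids) & set(sampled_eids))


def leak_ratio(input_eids, sampled_eids):
    """Fraction of seed edges visible to message passing (0.0 means no leakage)."""
    seeds = set(input_eids)
    if not seeds:
        return 0.0
    return len(seeds & set(sampled_eids)) / len(seeds)


# Seeds 1 and 3 were also sampled as message-passing edges.
print(leaked_edge_ids([0, 1, 2, 3], [1, 3, 5, 7]))  # [1, 3]
print(leak_ratio([0, 1, 2, 3], [1, 3, 5, 7]))  # 0.5
```

With tensors taken from a batch, pass `t.tolist()` for each ID tensor `t`.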
Furthermore, we observe the same behavior in the training loop.
In this case it seems incorrect to leave the training edge in the graph while predicting that same edge, since any function f(G) -> p(L) could overfit to the edge's presence in the input graph (the model can simply detect that L is already there). To this end, I would like my batch loader to sample supervision edges from my training data but exclude those edges from message passing in a GCN-style model. I'm sure there is a simple way to do this; I think I'm just too new to the problem.
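One common way to get the behavior described above is to split the training edges into two disjoint sets: message-passing edges that stay in the graph, and supervision edges that are only used as prediction targets. PyG's `RandomLinkSplit` transform exposes this idea through its `disjoint_train_ratio` argument; the sketch below is a hypothetical stand-alone version in plain Python, not that transform's code. You would then build the loader's graph from the message-passing edges only and pass the supervision edges as the edges to predict (`edge_label_index` in `LinkNeighborLoader`).

```python
import random


def disjoint_train_split(train_eids, disjoint_ratio, seed=0):
    """Split training edge ids into (message_passing_eids, supervision_eids).

    Supervision edges are the targets the model is asked to predict; they are
    kept out of the message-passing graph so the model cannot simply see them.
    """
    if not 0.0 < disjoint_ratio < 1.0:
        raise ValueError("disjoint_ratio must be in (0, 1)")
    eids = list(train_eids)
    random.Random(seed).shuffle(eids)  # reproducible random split
    num_sup = int(round(disjoint_ratio * len(eids)))
    supervision = sorted(eids[:num_sup])
    message_passing = sorted(eids[num_sup:])
    return message_passing, supervision


mp_eids, sup_eids = disjoint_train_split(range(10), 0.3)
assert not set(mp_eids) & set(sup_eids)  # no edge is both an input and a target
```

The trade-off is that supervision edges never contribute to message passing, so the graph the model sees during training is somewhat sparser than the full training graph.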
Since I am working in the heterogeneous setting, I also have reverse edges (rev_edges) in the graph, and these would leak the same information. So when we sample around an edge we want to predict, is it possible that we also sample its reverse edge? (I assume this needs a sampled neighborhood larger than the first hop, since finding the reverse edge requires traversing in both directions.)
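For the reverse-edge concern, the supervision edges have to be removed in both directions. Below is a hypothetical helper in plain Python, assuming each edge type is stored as a list of (src, dst) pairs with per-type node IDs and that the reverse type simply swaps source and destination; the type names in the comments are illustrative only. As I understand it, PyG's `RandomLinkSplit` has a `rev_edge_types` argument for the same reason, so that reverse edges are split consistently with their forward edges.

```python
def drop_supervision_edges(edge_index, rev_edge_index, supervision):
    """Remove supervision edges, and their reverses, from the message-passing graph.

    edge_index:     (src, dst) pairs of a forward type, e.g. ('user', 'rates', 'movie').
    rev_edge_index: (dst, src) pairs of its reverse type, e.g. ('movie', 'rev_rates', 'user').
    supervision:    forward (src, dst) pairs the model is asked to predict.
    """
    sup = set(supervision)
    rev_sup = {(dst, src) for src, dst in sup}  # the same edges seen from the other side
    kept = [e for e in edge_index if e not in sup]
    kept_rev = [e for e in rev_edge_index if e not in rev_sup]
    return kept, kept_rev


fwd = [(0, 10), (1, 11), (2, 12)]
rev = [(10, 0), (11, 1), (12, 2)]
print(drop_supervision_edges(fwd, rev, [(1, 11)]))
# ([(0, 10), (2, 12)], [(10, 0), (12, 2)])
```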
I am sure I am missing something fundamental about how LinkNeighborLoader works. Thanks all for your time.