wrong tensor arrangement for conv #1
grouped_sims is never actually used :)
also, the ALiBi impl here has an issue with the backward call complaining about a missing/overwritten gradient; after replacing it with another impl from the x-transformers repo it works fine. causal d-convs should work on
i wasn't able to replicate the ALiBi error, so if you have a short script i can run, i can definitely fix that as well!
It seems the issue with ALiBi is due to a few network forward calls before the backward call; specifically i have
the mask needs to be moved to the correct device here:
i also think it's good to have it optional in case we don't need causal masking?
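As a minimal sketch of what that could look like (the helper name `apply_causal_mask` and the shapes are hypothetical illustrations, not the repo's actual code), with the mask built on the logits' device and gated by a flag:

```python
import torch

def apply_causal_mask(sim, causal = True):
    # sim: (b, h, n, n) attention logits
    # building the mask on sim.device avoids CPU/GPU device mismatches
    if not causal:
        return sim
    n = sim.shape[-1]
    causal_mask = torch.triu(torch.ones((n, n), device = sim.device), diagonal = 1).bool()
    return sim.masked_fill(causal_mask, torch.finfo(sim.dtype).min)
```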
sure! done in the latest! so i'm still not seeing the ALiBi error, below is what i'm running

```python
import torch
from tranception_pytorch import Tranception

model = Tranception(
    dim = 512,
    depth = 6,
    heads = 8,
    dim_head = 64
)

amino_acids = torch.randint(0, 21, (1, 512))

logits = model(amino_acids)

with torch.no_grad():
    _ = model(amino_acids)
    _ = model(amino_acids)
    _ = model(amino_acids)

logits.sum().backward()
```
i noticed that you apply alibi here inside forward:
however the interface of the function assumes it will not apply the bias but will return it instead, for manual addition in the attention class:
and here we follow this idea:
manually adding the returned bias

Update: yup, that seems to be an issue, we need to either return the bias or not accumulate it inside the attention class
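A minimal sketch of the interface being suggested, following the x-transformers convention of returning the bias; the class name `AlibiBias`, the slope formula, and the shapes are illustrative assumptions, not the repo's actual code:

```python
import torch
from torch import nn

class AlibiBias(nn.Module):
    """Computes an (h, n, n) ALiBi bias and returns it; the caller adds it to the attention logits."""
    def __init__(self, heads):
        super().__init__()
        # geometric slopes per head (assumes heads is a power of 2, for brevity)
        start = 2 ** (-8 / heads)
        slopes = torch.tensor([start ** (i + 1) for i in range(heads)])
        self.register_buffer('slopes', slopes.view(heads, 1, 1), persistent = False)

    def forward(self, n, device):
        pos = torch.arange(n, device = device)
        rel_dist = (pos[None, :] - pos[:, None]).abs()   # (n, n) absolute relative distances
        return -rel_dist * self.slopes                   # (h, n, n), more negative further away

# in the attention class the bias is added manually, never accumulated into stored state:
# sim: (b, h, n, n) attention logits
# sim = sim + alibi_bias(sim.shape[-1], sim.device)
```

Because the bias is recomputed and added fresh on every forward instead of being accumulated inside the attention class, several forward passes before a backward call no longer mutate anything the autograd graph depends on.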
@inspirit the repository is still missing the retrieved MSA's contribution to the prediction, but i'll get to that later next week!
Hey!
looks like there is a mess in the tensor arrangements in a few places here:
tranception-pytorch/tranception_pytorch/tranception_pytorch.py, line 140 in 610ebf2
rearranging to 'b h n d' does not make sense here, since the last dim is the channels dim
then next:
tranception-pytorch/tranception_pytorch/tranception_pytorch.py, line 148 in 610ebf2
we split the heads into groups, merge them with the batch, and then try to apply a convolution that expects channels as the second dim, which in our case is now the seq-len dim (see the sketch at the end of this issue).
there is also
tranception-pytorch/tranception_pytorch/tranception_pytorch.py, line 122 in 610ebf2
and the convolution layer is somehow set up to expect the full inner_dim as input channels.
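For reference, here is a minimal sketch of the layout such a grouped depthwise causal conv would expect, per head group. All names and sizes below (heads_per_group, dim_head, kernel, n) are made-up illustration values, not the repo's actual configuration: the point is only that Conv1d wants (batch, channels, seq) with channels equal to the group's heads_per_group * dim_head rather than the full inner_dim, and with the sequence axis last.

```python
import torch
from torch import nn
from einops import rearrange

# hypothetical per-group sizes, just for illustration
heads_per_group, dim_head, kernel, n = 2, 64, 3, 128
channels = heads_per_group * dim_head        # per-group channels, not the full inner_dim

# depthwise causal conv over the sequence axis: Conv1d expects (batch, channels, seq)
dconv = nn.Conv1d(channels, channels, kernel, padding = kernel - 1, groups = channels)

q = torch.randn(1, heads_per_group, n, dim_head)               # (b, h, n, d) for one head group
x = rearrange(q, 'b h n d -> b (h d) n')                       # channels second, seq-len last
x = dconv(x)[..., :n]                                          # trim right padding -> causal
q = rearrange(x, 'b (h d) n -> b h n d', h = heads_per_group)  # back to (b, h, n, d)
```

With the arrangement described above, the conv instead sees seq-len in the channel position and slides along dim_head, which is exactly the mismatch this issue points out.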