Multihead einmix #240

Aceticia · 2023-02-10T18:37:08Z

Aceticia
Feb 10, 2023

Hi, I'm working on an architecture that involves a lot of multi-head operations. For example, I would like to apply linear layers to a tensor of size (batch, partition 1, partition 2, hiddens). Here I want all slices that share the same indices for partition 1 and partition 2 to use the same weight matrix of size (hiddens, hiddens), and slices with different indices to use different ones. What I do now is initialize a bunch of weight and bias tensors and use torch.addmm. It works, but the code is very ugly.

I'd love to use EinMix here, but it looks like this functionality is either not present or not trivial to work out from the docs. Is this correct?
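For reference, the manual approach described above can be sketched in plain PyTorch as follows. This is only an illustration under assumptions: it takes the reading clarified later in the thread (one weight matrix and one bias vector per (partition 1, partition 2) pair), and the names `weight`, `bias`, `out`, and `ref` are made up for the example. It stacks the per-pair parameters into single tensors and checks one `torch.einsum` call against an explicit `torch.addmm` loop.

```python
import torch

torch.manual_seed(0)
b, n1, n2, c = 4, 3, 2, 8  # small illustrative sizes

x = torch.randn(b, n1, n2, c)
# Hypothetical stacked parameters: one (c, c) matrix and one (c,) bias per (i, j) pair.
weight = torch.randn(n1, n2, c, c) / c**0.5
bias = torch.randn(n1, n2, c)

# Single batched contraction over the hidden axis; i and j index the two partitions.
out = torch.einsum("bijc,ijcd->bijd", x, weight) + bias

# Reference: the explicit per-pair loop using torch.addmm (bias + x @ W).
ref = torch.empty_like(out)
for i in range(n1):
    for j in range(n2):
        ref[:, i, j] = torch.addmm(bias[i, j], x[:, i, j], weight[i, j])

assert torch.allclose(out, ref, atol=1e-5)
```

The loop and the einsum compute the same thing; the einsum form just keeps all per-pair parameters in one tensor.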

arogozhnikov · 2023-02-10T21:49:48Z

arogozhnikov
Feb 10, 2023
Maintainer

Hmm, the requirement is unclear. What should be done with slices that have different partition1 and partition2 indices?

So far I understand your requirement as:

take diagonal (coinciding partitions)
einsum(x, 'b p p c -> b p c')
apply EinMix('b p c -> b p d', weight_shape='p c d', ...)

1 reply

Aceticia Feb 11, 2023
Author

Sorry about the confusion. My intention is to contain the following code in a single operation:

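import torch
from torch import nn
from einops.layers.torch import EinMix

# Creating a ModuleDict with one EinMix per (p1, p2) pair
n1, n2, c = 8, 16, 64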
mixes = nn.ModuleDict({})
for p1 in range(n1):
    for p2 in range(n2):
        mixes[f"{p1}-{p2}"] = EinMix("b c -> b c0", weight_shape="c c0", bias_shape="c0", c0=c, c=c)

# In forward, using a mock input
b = 8
mock_input = torch.randn(b, n1, n2, c)
for p1 in range(n1):
    for p2 in range(n2):
        mock_input[:, p1, p2] = mixes[f"{p1}-{p2}"](mock_input[:, p1, p2])

I'm wondering whether there is a way to write the above code in something like the following:

# Creating an einmix
mix = EinMix("b n1 n2 c -> b n1 n2 c0", weight_shape="c c0", bias_shape="c0", c0=c, c=c, repeat_weight_dimension="n1 n2")

# Forward
b = 8
mock_input = torch.randn(b, n1, n2, c)
mock_input = mix(mock_input)

Please let me know if there is still any confusion. Thank you for the fast reply!

Aceticia · 2023-02-11T21:58:01Z

Aceticia
Feb 11, 2023
Author

From your response, I'm guessing I can do the following. Is this correct?

a = torch.randn(8, 6, 4, 32)
m = EinMix("b n1 n2 c -> b n1 n2 c0", weight_shape="n1 n2 c c0", bias_shape="c0", c=32, c0=32, n1=6, n2=4)
assert m(a).shape == torch.Size([8, 6, 4, 32])

0 replies

arogozhnikov · 2023-02-11T22:27:05Z

arogozhnikov
Feb 11, 2023
Maintainer

Yes, that looks close. Maybe you also want bias_shape='n1 n2 c0', so that the bias is individual for every partition.

1 reply

Aceticia Feb 11, 2023
Author

Ah you are right. Thank you so much.
