Replies: 3 comments
-
The following will induce this error:

```python
import monai
import torch

net = monai.networks.nets.UNet(
    dimensions=3,
    in_channels=1,
    out_channels=1,  # the UNet constructor takes out_channels rather than num_classes
    channels=(16, 32, 64),
    strides=(2, 2),
    num_res_units=0,
    up_kernel_size=4,
)
net(torch.rand(5, 1, 96, 96, 96))
```
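To see where the size mismatch comes from, here is a minimal sketch of the transposed-convolution output-size arithmetic (my own code, assuming the up path uses padding = (kernel_size - 1) // 2 and output_padding = stride - 1):

```python
def transpose_out_size(in_size, k, stride):
    # standard formula for ConvTranspose output size with dilation 1
    padding = (k - 1) // 2       # assumed odd-kernel padding rule
    output_padding = stride - 1  # assumed output-padding rule
    return (in_size - 1) * stride - 2 * padding + (k - 1) + output_padding + 1

print(transpose_out_size(48, 3, 2))  # 96: matches the skip connection exactly
print(transpose_out_size(48, 4, 2))  # 97: one voxel too large, so the concatenation fails
```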
-
In my experience, when each resolution in the upsampling half of the U-Net consists of only a concatenation and a stride-2, kernel-size-3 transposed convolution, artifacts often arise in the image because some pixels don't depend on enough pixels in the previous layer. Specifically, 1/9 of the pixels in each layer depend on only a single pixel in the previous layer. Because each of these pixels in turn has another pixel that depends only on it, there is a pixel in the final layer that depends only on a one-pixel-wide 'column' extending all the way up to the 1x1 layer, 4 pixels that each come from a single pixel in the 2x2 layer, and so on. These pixels do also depend on the information coming over from the downsampling half through the skip connections, but it hurts performance that they cannot model interactions in the upsampling half.
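This is easy to observe directly. The following sketch (my own, using a 1-D transposed convolution for readability) counts how many input pixels each output pixel depends on by pushing an all-ones input through all-ones weights:

```python
import torch
import torch.nn as nn

# kernel-3, stride-2 transposed convolution with the usual size-doubling padding
tconv = nn.ConvTranspose1d(1, 1, kernel_size=3, stride=2,
                           padding=1, output_padding=1, bias=False)
nn.init.ones_(tconv.weight)

# with all-ones input and weights, each output value equals the number of
# input pixels that output pixel depends on
with torch.no_grad():
    counts = tconv(torch.ones(1, 1, 8))
print(counts.squeeze())  # alternating 1s and 2s: every second output pixel
                         # depends on exactly one input pixel
```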
-
The issue with even kernel sizes is that it's much harder to produce an output of the expected size because of the need for asymmetric padding of the input volume. Currently, with odd kernels, for transpose convolutions we automatically choose a padding such that the output dimensions are the input dimensions multiplied by the stride; for regular convolutions, the output dimensions are the input dimensions divided by the stride and rounded. This produces regular behaviour in the UNet which lets us define it programmatically. If we use even-sized kernels, to keep this behaviour we need to produce an oversized output and crop it as needed, along these lines:

```python
import numpy as np
import torch.nn as nn

# same_padding, get_normalize_type, get_conv_type and get_dropout_type are
# the existing MONAI layer helpers and are assumed to be in scope here.

def expand_dimensionally(value, dimensions):
    # broadcast a scalar argument to a per-spatial-dimension array
    if not isinstance(value, np.ndarray):
        value = np.asarray([value] * dimensions)
    return value

def same_clipping(dimensions, kernel_size, strides, dilation):
    # returns slices for cropping an oversized output, or None if no cropping is needed
    kernel_size = expand_dimensionally(kernel_size, dimensions)
    if np.min(kernel_size % 2) != 0:  # all kernel sizes odd, nothing to crop
        return None
    clip_dims = [slice(None), slice(None)]  # keep batch and channel dimensions whole
    strides = expand_dimensionally(strides, dimensions)
    dilation = expand_dimensionally(dilation, dimensions)
    # drop the last element of every spatial dimension with an even kernel and odd dilation
    clip_dims += [slice(0, -1 if k % 2 == 0 and d % 2 == 1 else None)
                  for k, s, d in zip(kernel_size, strides, dilation)]
    return tuple(clip_dims)  # tensors must be indexed with a tuple, not a list

class Convolution(nn.Module):
    def __init__(self, dimensions, in_channels, out_channels, strides=1, kernel_size=3,
                 instance_norm=True, dropout=0, dilation=1, bias=True,
                 conv_only=False, is_transposed=False):
        super().__init__()
        self.dimensions = dimensions
        self.in_channels = in_channels
        self.out_channels = out_channels
        self.is_transposed = is_transposed
        self.net = nn.Sequential()

        padding = same_padding(kernel_size, dilation)
        normalize_type = get_normalize_type(dimensions, instance_norm)
        conv_type = get_conv_type(dimensions, is_transposed)
        drop_type = get_dropout_type(dimensions)
        self.clip_dims = same_clipping(dimensions, kernel_size, strides, dilation)

        if is_transposed:
            out_padding = strides - 1
            if np.any(expand_dimensionally(kernel_size, dimensions) % 2 == 0):
                # even kernels get one less padding; the resulting oversized
                # output is cropped back to size in forward()
                padding = [p - 1 for p in expand_dimensionally(padding, dimensions)]
            conv = conv_type(in_channels, out_channels, kernel_size, strides, padding,
                             out_padding, 1, bias, dilation)
        else:
            conv = conv_type(in_channels, out_channels, kernel_size, strides, padding,
                             dilation, bias=bias)

        self.net.add_module("conv", conv)

        if not conv_only:
            self.net.add_module("norm", normalize_type(out_channels))
            if dropout > 0:
                self.net.add_module("dropout", drop_type(dropout))
            self.net.add_module("prelu", nn.PReLU())

    def forward(self, x):
        out = self.net(x)
        if self.clip_dims is not None:  # crop the oversized even-kernel output
            out = out[self.clip_dims]
        return out
```

What we're doing differently here is choosing a clip value per spatial dimension and cropping the oversized output with it in forward(), so that even kernels still produce the output sizes the UNet's construction logic expects.
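As a sanity check of the cropping (my own example with made-up sizes, driving nn.ConvTranspose2d directly so none of the MONAI helpers are needed):

```python
import torch
import torch.nn as nn

k, s, d, dims = 4, 2, 1, 2  # even kernel, stride 2, no dilation, 2-D

# with kernel 4, stride 2, padding 1, output_padding 1 the raw output is
# exactly one element too large in each spatial dimension
tconv = nn.ConvTranspose2d(1, 1, kernel_size=k, stride=s, padding=1, output_padding=s - 1)
out = tconv(torch.rand(1, 1, 48, 48))
print(out.shape)  # torch.Size([1, 1, 97, 97])

clip = same_clipping(dims, k, s, d)  # slices dropping the final row and column
print(out[clip].shape)  # torch.Size([1, 1, 96, 96]) == input size * stride
```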
-
HERE the default upsample kernel size is 3, which is fine when num_res_units is not 0; however, when num_res_units is 0 (which is the default), I believe a better default for up_kernel_size is 4 (at least in the 2-D case).
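As a quick check of that claim (another 1-D sketch of mine): with kernel size 4, stride 2, and padding 1, the output is exactly double the input without any output_padding, and every interior output pixel depends on two input pixels, avoiding the single-pixel dependencies described above:

```python
import torch
import torch.nn as nn

tconv = nn.ConvTranspose1d(1, 1, kernel_size=4, stride=2, padding=1, bias=False)
nn.init.ones_(tconv.weight)

# all-ones input and weights again: each output value counts its dependencies
with torch.no_grad():
    counts = tconv(torch.ones(1, 1, 8))
print(counts.shape)      # torch.Size([1, 1, 16]) -- exactly input * stride
print(counts.squeeze())  # 2s everywhere except the two border pixels
```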
Issue reported by @HastingsGreer @brad-t-moore @floryst