Overview

The torchvision package ships a number of classic deep-learning models such as ResNet and VGG, each available in several depths and configurations. I ran into some questions about its ResNet implementation and discussed them with a few senior colleagues, which led to the notes below.

The ResNet model consists of these parts:

[Table: ResNet architecture specifications for ResNet-18/34/50/101/152]

Skip connections allow the neural network to go deeper without running into problems like vanishing gradients. For example, in each conv2 block of ResNet-50, three convolutional layers (1×1 with 64 filters, 3×3 with 64 filters, then 1×1 with 256 filters) are stacked together. Note that each skip connection spans 2 or 3 convolutional layers (that is, one [ ] block in the table). Therefore, for instance, the ResNet-50 model has 3 + 4 + 6 + 3 = 16 skip connections.
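
As a quick sanity check, here is a minimal sketch, assuming torchvision is installed and that Bottleneck can be imported from torchvision.models.resnet, which counts the bottleneck blocks (one skip connection each):

import torchvision
from torchvision.models.resnet import Bottleneck

# every Bottleneck module carries exactly one skip connection
model = torchvision.models.resnet50()
print(sum(isinstance(m, Bottleneck) for m in model.modules()))  # 16 = 3 + 4 + 6 + 3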

Downsampling refers to what happens on the skip connection. Usually the skip connection is just an "add" operation, but there are other options, e.g. inserting a few simple layers (a 1×1 convolution plus batch normalization) so that the input matches the output size.

Moreover, note that max pooling only appears in the first "preprocessing" (stem) layers and never inside the residual stages. In my opinion, that makes the stem a fairly self-contained piece that could be simplified or swapped out without touching the rest of the network.

To summarize, here is the computational graph of ResNet-34 compared with a plain (non-residual) network. Pay attention to the span of each skip connection and to how the residual blocks make up the rest of the network; note also the dotted shortcuts where the feature-map dimensions change between stages.

For more details, please continue reading. The most important part is the section on the ResNet class.

[Figure: computational graph of ResNet-34 compared with a plain 34-layer network]

Code Review

Basic Components

As shown in the table, 3×3 convolutional layers are the basic component of this network, so the implementation starts with the following helper.

def conv3x3(in_planes, out_planes, stride=1):
    """3x3 convolution with padding"""
    return nn.Conv2d(in_planes, out_planes, kernel_size=3, stride=stride,
                     padding=1, bias=False)



Note that:

  • in_planes and out_planes refer to the number of input and output channels, respectively.
  • bias is disabled for these 3×3 convolutions. Why? Because every convolution here is immediately followed by batch normalization, whose learnable shift makes a separate bias term redundant.

Residual Blocks for ResNet-18 and ResNet-34

The basic blocks are used by ResNet-18 and ResNet-34.

class BasicBlock(nn.Module):
    # Basic facts: the basic block DOES NOT expand the channel dimension
    expansion = 1
 
    def __init__(self, inplanes, planes, stride=1, downsample=None):
        super(BasicBlock, self).__init__()
        # the first 3x3 convolution changes the channel dimension (and may stride)
        self.conv1 = conv3x3(inplanes, planes, stride)
        self.bn1 = nn.BatchNorm2d(planes)
        
        # inplace: modify the input tensor directly instead of allocating a new one
        self.relu = nn.ReLU(inplace=True)  
        
        # the second 3x3 convolution keeps the channel dimension
        self.conv2 = conv3x3(planes, planes)
        self.bn2 = nn.BatchNorm2d(planes)

        # downsample is an optional, customized module applied to the skip connection
        self.downsample = downsample
        self.stride = stride
 
    def forward(self, x):
        residual = x
 
        out = self.conv1(x)
        out = self.bn1(out)
        out = self.relu(out)
 
        # why no ReLU here? see the note below the block
        out = self.conv2(out)
        out = self.bn2(out)
 
        if self.downsample is not None:
            residual = self.downsample(x)
 
        out += residual
        out = self.relu(out)
 
        return out


Remaining question: the basic block consists of a full conv1 series (conv → BN → ReLU) and a conv2 series WITHOUT a ReLU activation. Why? One common explanation: the ReLU is deferred until after the residual addition, so the block still ends with a non-linearity; applying it before the addition would force the residual branch to output only non-negative values.
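
To see the block in action, here is a minimal sketch, assuming the BasicBlock and conv3x3 definitions above are in scope, that pushes a dummy feature map through a block with and without a downsample:

import torch
import torch.nn as nn

x = torch.randn(1, 64, 56, 56)

# identity skip: input and output shapes already match
block = BasicBlock(64, 64)
print(block(x).shape)   # torch.Size([1, 64, 56, 56])

# stride-2 block: the skip connection needs a matching downsample
down = nn.Sequential(nn.Conv2d(64, 128, kernel_size=1, stride=2, bias=False),
                     nn.BatchNorm2d(128))
print(BasicBlock(64, 128, stride=2, downsample=down)(x).shape)   # torch.Size([1, 128, 28, 28])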

Residual Blocks for ResNet-50 and Deeper

The bottleneck block is used by ResNet-50 and the ResNets with more layers.

class Bottleneck(nn.Module):
    expansion = 4
 
    def __init__(self, inplanes, planes, stride=1, downsample=None):
        super(Bottleneck, self).__init__()
        
        self.conv1 = nn.Conv2d(inplanes, planes, kernel_size=1, bias=False)
        self.bn1 = nn.BatchNorm2d(planes)
        
        self.conv2 = nn.Conv2d(planes, planes, kernel_size=3, stride=stride,
                               padding=1, bias=False)
        self.bn2 = nn.BatchNorm2d(planes)
        
        # the last 1x1 conv expands the channel dimension by `expansion` (x4)
        self.conv3 = nn.Conv2d(planes, planes * self.expansion, kernel_size=1, bias=False)
        self.bn3 = nn.BatchNorm2d(planes * self.expansion)
        self.relu = nn.ReLU(inplace=True)
        
        self.downsample = downsample
        self.stride = stride
 
    def forward(self, x):
        residual = x
 
        out = self.conv1(x)
        out = self.bn1(out)
        out = self.relu(out)
 
        out = self.conv2(out)
        out = self.bn2(out)
        out = self.relu(out)
 
        out = self.conv3(out)
        out = self.bn3(out)
 
        if self.downsample is not None:
            residual = self.downsample(x)
 
        out += residual
        out = self.relu(out)
 
        return out

The same remaining question applies here: the last convolution (conv3 followed by bn3) has no ReLU before the residual addition; the activation is applied only after out += residual.
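
A small sketch (again assuming the classes above are defined) makes the expansion = 4 behaviour visible: the block works internally with planes channels but emits planes * 4, so even a stride-1 block needs a downsample when its input has only planes channels:

import torch
import torch.nn as nn

x = torch.randn(1, 64, 56, 56)

# 64 -> 256 channels: the 1x1 downsample matches the expanded output
down = nn.Sequential(nn.Conv2d(64, 256, kernel_size=1, bias=False),
                     nn.BatchNorm2d(256))
print(Bottleneck(64, 64, downsample=down)(x).shape)   # torch.Size([1, 256, 56, 56])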

The ResNet class

The initializer

The initializer fixes several facts about the vanilla ResNet:

  • Number of classes (the size of the softmax output): 1000.
  • Initial channel dimension after the stem: 64.
  • Padding: chosen so that each convolution is equivalent to "same" padding.

The arguments explained:

  • block: the block class to use, either BasicBlock or Bottleneck.
  • layers: a list of four integers giving the number of blocks in each of the four stages, e.g. [3, 4, 6, 3] for ResNet-50.
class ResNet(nn.Module):
 
    def __init__(self, block, layers, num_classes=1000):
        self.inplanes = 64
        super(ResNet, self).__init__()
        self.conv1 = nn.Conv2d(3, 64, kernel_size=7, stride=2, padding=3,
                               bias=False)
        self.bn1 = nn.BatchNorm2d(64)
        self.relu = nn.ReLU(inplace=True)
        self.maxpool = nn.MaxPool2d(kernel_size=3, stride=2, padding=1)
        self.layer1 = self._make_layer(block, 64, layers[0])
        self.layer2 = self._make_layer(block, 128, layers[1], stride=2)
        self.layer3 = self._make_layer(block, 256, layers[2], stride=2)
        self.layer4 = self._make_layer(block, 512, layers[3], stride=2)
        self.avgpool = nn.AvgPool2d(7, stride=1)
        self.fc = nn.Linear(512 * block.expansion, num_classes)
 
        for m in self.modules():
            if isinstance(m, nn.Conv2d):
                nn.init.kaiming_normal_(m.weight, mode='fan_out', nonlinearity='relu')
            elif isinstance(m, nn.BatchNorm2d):
                nn.init.constant_(m.weight, 1)
                nn.init.constant_(m.bias, 0)
 
    

The overridden forward()

The .view() call plays the same role as Flatten() in Keras. Check the structure table again and note what all ResNets have in common: there are always four residual stages (layer1 through layer4).

 
    def forward(self, x):
        x = self.conv1(x)
        x = self.bn1(x)
        x = self.relu(x)
        x = self.maxpool(x)
 
        x = self.layer1(x)
        x = self.layer2(x)
        x = self.layer3(x)
        x = self.layer4(x)
 
        x = self.avgpool(x)
        x = x.view(x.size(0), -1)
        x = self.fc(x)
 
        return x
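
As a small illustration of the .view() call (a hedged sketch with a dummy tensor), the average-pooled feature map of shape (N, 512 * expansion, 1, 1) is reshaped to (N, 512 * expansion) before the fully connected layer, exactly what Flatten() would do in Keras:

import torch

pooled = torch.randn(8, 2048, 1, 1)       # e.g. ResNet-50 after avgpool (512 * 4 channels)
flat = pooled.view(pooled.size(0), -1)    # keep the batch dimension, flatten the rest
print(flat.shape)                         # torch.Size([8, 2048])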

Trace the computational graph again and match each stage to the code above.

[Table: ResNet stage specifications with output sizes for a 224×224 input]

Since for many of the classic models the input size is 224×224, let's deduce the output size of each layer (a quick empirical check follows after the lists below):

  • Since the padding is equivalent to "same" padding, only the stride determines the output size.
  • The initial 7×7 convolution (stride 2) halves the side length: 224 → 112.
  • So does the max pooling: 112 → 56.

Why do we need the max pooling to change the size? Let's continue and answer this question.

  • Since the blocks DO NOT include any pooling, there is no pooling-based downsampling inside the [ ] structures.
  • However, between consecutive stages the spatial size shrinks to 1/4 (the side length is halved) while the channel dimension doubles. Within a stage, input and output have exactly the same shape, which is what lets the identity skip connection be added directly: the two tensors are the same size!
  • This shows that the downsampling actually happens inside the blocks, implemented by the stride-2 convolutions in the first block of each stage. Thanks to the skip connections, the vanishing-gradient problem is greatly relieved, allowing better accuracy and a better exploitation of depth.
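
The deduction above can be checked empirically. Here is a minimal sketch (assuming torchvision is installed) that registers forward hooks on a ResNet-50 and prints the output size of each stage for a 224×224 input:

import torch
import torchvision

model = torchvision.models.resnet50()
model.eval()

def report(name):
    def hook(module, inputs, output):
        print(name, tuple(output.shape))
    return hook

for name in ['maxpool', 'layer1', 'layer2', 'layer3', 'layer4', 'avgpool']:
    getattr(model, name).register_forward_hook(report(name))

with torch.no_grad():
    model(torch.randn(1, 3, 224, 224))

# maxpool (1, 64, 56, 56)
# layer1  (1, 256, 56, 56)
# layer2  (1, 512, 28, 28)
# layer3  (1, 1024, 14, 14)
# layer4  (1, 2048, 7, 7)
# avgpool (1, 2048, 1, 1)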

_make_layer(): automatic creation of the downsample

This is the trickiest part of the official implementation. Let's look at the parameters of this method first:

  • block: the block class to use, either BasicBlock or Bottleneck.
  • planes: the base number of output channels for the blocks in this stage (a bottleneck multiplies it by expansion).
  • blocks: the number of blocks to stack in this stage.
    def _make_layer(self, block, planes, blocks, stride=1):
        downsample = None
        if stride != 1 or self.inplanes != planes * block.expansion:
            downsample = nn.Sequential(
                nn.Conv2d(self.inplanes, planes * block.expansion,
                          kernel_size=1, stride=stride, bias=False),
                nn.BatchNorm2d(planes * block.expansion),
            )
 
        layers = []
        layers.append(block(self.inplanes, planes, stride, downsample))
        self.inplanes = planes * block.expansion
        for i in range(1, blocks):
            layers.append(block(self.inplanes, planes))
 
        return nn.Sequential(*layers)

Why do we need the downsample? When switching from one stage to the next, both the spatial size and the channel dimension stop matching the previous stage (for example, from 64, 64, 256 to 128, 128, 512). A convolution can easily map an input to an output of any dimension, but the residual block requires the skip connection's input to have the same size and dimension as the block's output, and that requirement breaks exactly when switching stages. Therefore a downsample (a strided 1×1 convolution followed by batch normalization) is placed on the skip connection to make the input and the output consistent.
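
We can confirm what _make_layer generates by inspecting the first block of a stage; a minimal sketch, assuming torchvision is installed:

import torchvision

model = torchvision.models.resnet50()

# the first block of layer2 changes both channels (256 -> 512) and spatial size,
# so it carries a strided 1x1 conv + BN on its skip connection
print(model.layer2[0].downsample)

# the remaining blocks of the stage keep the shape, so no downsample is attached
print(model.layer2[1].downsample)   # None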

In Keras, instead of using a conditional branch to add this downsample, the reference implementation defines a separate type of block.

Here's the code for ResNet-50 in Keras. Note that identity_block corresponds to the bottleneck with a plain identity skip, while conv_block additionally applies a downsample (a strided 1×1 convolution plus batch normalization) on the shortcut of the block.

  # code for layer-1
  x = conv_block(x, 3, [64, 64, 256], stage=2, block='a', strides=(1, 1))
  x = identity_block(x, 3, [64, 64, 256], stage=2, block='b')
  x = identity_block(x, 3, [64, 64, 256], stage=2, block='c')


# imports needed to run this excerpt with standalone Keras
from keras import backend as K
from keras import layers
from keras.layers import Activation, BatchNormalization, Conv2D


def conv_block(input_tensor, kernel_size, filters, stage, block, strides=(2, 2)):
  filters1, filters2, filters3 = filters
  if K.image_data_format() == 'channels_last':
    bn_axis = 3
  else:
    bn_axis = 1
  conv_name_base = 'res' + str(stage) + block + '_branch'
  bn_name_base = 'bn' + str(stage) + block + '_branch'

  x = Conv2D(
      filters1, (1, 1), strides=strides, name=conv_name_base + '2a')(
          input_tensor)
  x = BatchNormalization(axis=bn_axis, name=bn_name_base + '2a')(x)
  x = Activation('relu')(x)

  x = Conv2D(
      filters2, kernel_size, padding='same', name=conv_name_base + '2b')(
          x)
  x = BatchNormalization(axis=bn_axis, name=bn_name_base + '2b')(x)
  x = Activation('relu')(x)

  x = Conv2D(filters3, (1, 1), name=conv_name_base + '2c')(x)
  x = BatchNormalization(axis=bn_axis, name=bn_name_base + '2c')(x)

  shortcut = Conv2D(
      filters3, (1, 1), strides=strides, name=conv_name_base + '1')(
          input_tensor)
  shortcut = BatchNormalization(axis=bn_axis, name=bn_name_base + '1')(shortcut)

  x = layers.add([x, shortcut])
  x = Activation('relu')(x)
  return x

Final

How each type of ResNet is constructed

def resnet18(pretrained=False, **kwargs):
    """Constructs a ResNet-18 model.
    Args:
        pretrained (bool): If True, returns a model pre-trained on ImageNet
    """
    model = ResNet(BasicBlock, [2, 2, 2, 2], **kwargs)
    if pretrained:
        model.load_state_dict(model_zoo.load_url(model_urls['resnet18']))
    return model
 
 
def resnet34(pretrained=False, **kwargs):
    """Constructs a ResNet-34 model.
    Args:
        pretrained (bool): If True, returns a model pre-trained on ImageNet
    """
    model = ResNet(BasicBlock, [3, 4, 6, 3], **kwargs)
    if pretrained:
        model.load_state_dict(model_zoo.load_url(model_urls['resnet34']))
    return model
 
 
def resnet50(pretrained=False, **kwargs):
    """Constructs a ResNet-50 model.
    Args:
        pretrained (bool): If True, returns a model pre-trained on ImageNet
    """
    model = ResNet(Bottleneck, [3, 4, 6, 3], **kwargs)
    if pretrained:
        model.load_state_dict(model_zoo.load_url(model_urls['resnet50']))
    return model
 
 
def resnet101(pretrained=False, **kwargs):
    """Constructs a ResNet-101 model.
    Args:
        pretrained (bool): If True, returns a model pre-trained on ImageNet
    """
    model = ResNet(Bottleneck, [3, 4, 23, 3], **kwargs)
    if pretrained:
        model.load_state_dict(model_zoo.load_url(model_urls['resnet101']))
    return model
 
 
def resnet152(pretrained=False, **kwargs):
    """Constructs a ResNet-152 model.
    Args:
        pretrained (bool): If True, returns a model pre-trained on ImageNet
    """
    model = ResNet(Bottleneck, [3, 8, 36, 3], **kwargs)
    if pretrained:
        model.load_state_dict(model_zoo.load_url(model_urls['resnet152']))
    return model
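
Finally, a minimal usage sketch. It follows the older pretrained=True API shown above; newer torchvision releases use a weights= argument instead, so adapt accordingly:

import torch
from torchvision.models import resnet50

model = resnet50(pretrained=True)   # downloads the ImageNet weights on first use
model.eval()

with torch.no_grad():
    logits = model(torch.randn(1, 3, 224, 224))
print(logits.shape)   # torch.Size([1, 1000])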
