## Overview

The `torchvision` package bundles a number of classic deep-learning models such as ResNet and VGG, each offered in several depth variants. I ran into questions about how the ResNet implementation works and discussed them with some of my seniors, which led to this write-up.

The ResNet model consists of these parts:

**Skip connections** allow the network to go deeper without running into problems like vanishing gradients. For example, in the `conv2` block of ResNet-50, three bottleneck blocks built around 64-kernel convolutions are stacked together. Note that **skip connections span every 2 or 3 convolutional layers (that is, one per bracketed [ ] group in the structure table)**. Therefore, for instance, the ResNet-50 model has 3+4+6+3=16 skip connections.
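That count can be sanity-checked with a quick sum over the per-stage block counts (each residual block carries exactly one skip connection; the counts below match the `torchvision` constructors shown at the end):

```python
# Number of residual blocks per stage for each ResNet variant;
# each block contributes exactly one skip connection.
block_counts = {
    "resnet18": [2, 2, 2, 2],
    "resnet34": [3, 4, 6, 3],
    "resnet50": [3, 4, 6, 3],
    "resnet101": [3, 4, 23, 3],
    "resnet152": [3, 8, 36, 3],
}

skip_connections = {name: sum(c) for name, c in block_counts.items()}
print(skip_connections["resnet50"])  # → 16
```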

**Downsampling** is the operation applied on some skip connections. Usually the shortcut is just an "add" operation; however, when the input and output shapes differ, a few simple layers (e.g. a strided 1×1 convolution) are placed on the shortcut to match them.

Moreover, note that `max pooling` only appears in the first "preprocessing" (stem) layers. In my opinion this makes the stem a self-contained preprocessing stage; everywhere else, downsampling is handled by strided convolutions rather than pooling.

To summarize, here is the computational graph for ResNet-34 compared with a plain (non-residual) model. Pay attention to the span of each skip connection, and to how the same residual-block pattern repeats across the rest of the network.

For more details, please continue reading. The most important part is the last section: the ResNet model.

## Code Review

### Basic Components

As shown in the structure table, 3×3 convolutional layers are the basic component of this network. Therefore, the implementation starts with the following helper.

```
def conv3x3(in_planes, out_planes, stride=1):
    """3x3 convolution with padding"""
    return nn.Conv2d(in_planes, out_planes, kernel_size=3, stride=stride,
                     padding=1, bias=False)
```

Be noted that:

- `in_planes` and `out_planes` both mean the number of channels (feature maps).
- `bias` is disabled. Why? Because every convolution here is immediately followed by a `BatchNorm2d`, whose mean subtraction cancels any constant bias, so the extra parameters would be wasted.
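A minimal sketch of that cancellation, using plain Python lists to stand in for one channel's activations (no actual `nn.Conv2d` involved):

```python
# Batch norm subtracts the per-channel mean, so a constant bias added
# by the preceding conv is cancelled before it affects anything downstream.
def normalize(xs):
    """The mean-subtraction step of batch norm, for one channel."""
    mu = sum(xs) / len(xs)
    return [x - mu for x in xs]

activations = [1.0, 2.0, 3.0, 4.0]
with_bias = [x + 0.5 for x in activations]  # same activations plus a bias

print(normalize(activations) == normalize(with_bias))  # → True
```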

### Residual Blocks for ResNet-18 and ResNet-34

The basic blocks are used by ResNet-18 and ResNet-34.

```
class BasicBlock(nn.Module):
    # Basic fact: the basic block DOES NOT expand the dimension
    expansion = 1

    def __init__(self, inplanes, planes, stride=1, downsample=None):
        super(BasicBlock, self).__init__()
        # the 3x3 convolution transforms the dimension
        self.conv1 = conv3x3(inplanes, planes, stride)
        self.bn1 = nn.BatchNorm2d(planes)
        # inplace: modify the input directly instead of allocating a new tensor
        self.relu = nn.ReLU(inplace=True)
        self.conv2 = conv3x3(planes, planes)
        self.bn2 = nn.BatchNorm2d(planes)
        # downsample is a customized module supplied by the caller
        self.downsample = downsample
        self.stride = stride

    def forward(self, x):
        residual = x

        out = self.conv1(x)
        out = self.bn1(out)
        out = self.relu(out)

        # why no ReLU after the second conv ??
        out = self.conv2(out)
        out = self.bn2(out)

        if self.downsample is not None:
            residual = self.downsample(x)

        out += residual
        out = self.relu(out)

        return out
```

Remaining question: the basic block consists of a full conv1 series (conv-BN-ReLU) and a conv2 series **WITHOUT a ReLU activation**. Why? (Notice that the ReLU is applied only after the residual has been added back.)

### Residual Blocks for ResNet-50 and Deeper

The bottleneck block is used by ResNet-50 and the ResNets with more layers.

```
class Bottleneck(nn.Module):
    expansion = 4

    def __init__(self, inplanes, planes, stride=1, downsample=None):
        super(Bottleneck, self).__init__()
        self.conv1 = nn.Conv2d(inplanes, planes, kernel_size=1, bias=False)
        self.bn1 = nn.BatchNorm2d(planes)
        self.conv2 = nn.Conv2d(planes, planes, kernel_size=3, stride=stride,
                               padding=1, bias=False)
        self.bn2 = nn.BatchNorm2d(planes)
        # the last conv of the block does the expansion
        self.conv3 = nn.Conv2d(planes, planes * self.expansion, kernel_size=1,
                               bias=False)
        self.bn3 = nn.BatchNorm2d(planes * self.expansion)
        self.relu = nn.ReLU(inplace=True)
        self.downsample = downsample
        self.stride = stride

    def forward(self, x):
        residual = x

        out = self.conv1(x)
        out = self.bn1(out)
        out = self.relu(out)

        out = self.conv2(out)
        out = self.bn2(out)
        out = self.relu(out)

        out = self.conv3(out)
        out = self.bn3(out)

        if self.downsample is not None:
            residual = self.downsample(x)

        out += residual
        out = self.relu(out)

        return out
```

Remaining question: just like the basic block, the bottleneck ends with a conv3 series **WITHOUT a ReLU activation** before the addition. Why?
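As a side note on why the deeper models switch to the bottleneck design: a back-of-the-envelope parameter count (pure Python, widths taken from the first stage) shows that a bottleneck working on 256-channel features costs about the same as a basic block working on 64-channel features:

```python
def conv_params(c_in, c_out, k):
    """Weight count of a bias-free k x k convolution."""
    return c_in * c_out * k * k

# Basic block at width 64: two 3x3 convs.
basic = 2 * conv_params(64, 64, 3)

# Bottleneck at width 64 (expansion 4): 1x1 reduce, 3x3, 1x1 expand,
# with 256-channel input and output.
bottleneck = (conv_params(256, 64, 1)
              + conv_params(64, 64, 3)
              + conv_params(64, 256, 1))

print(basic, bottleneck)  # → 73728 69632
```

So the 1×1 convolutions let the network carry 4× wider features at roughly the same cost.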

### The ResNet class

#### The initializer

The initializer fixes several facts of the vanilla ResNet:

- Number of softmax output classes: 1000.
- Initial output dimension: 64.
- `padding`: chosen so that each convolution is equivalent to using `same` padding.

The arguments explained:

- `block`: either the `BasicBlock` or the `Bottleneck` class.
- `layers`: a list of four integers giving the number of residual blocks in each of the four stages.

```
class ResNet(nn.Module):

    def __init__(self, block, layers, num_classes=1000):
        self.inplanes = 64
        super(ResNet, self).__init__()
        self.conv1 = nn.Conv2d(3, 64, kernel_size=7, stride=2, padding=3,
                               bias=False)
        self.bn1 = nn.BatchNorm2d(64)
        self.relu = nn.ReLU(inplace=True)
        self.maxpool = nn.MaxPool2d(kernel_size=3, stride=2, padding=1)
        self.layer1 = self._make_layer(block, 64, layers[0])
        self.layer2 = self._make_layer(block, 128, layers[1], stride=2)
        self.layer3 = self._make_layer(block, 256, layers[2], stride=2)
        self.layer4 = self._make_layer(block, 512, layers[3], stride=2)
        self.avgpool = nn.AvgPool2d(7, stride=1)
        self.fc = nn.Linear(512 * block.expansion, num_classes)

        for m in self.modules():
            if isinstance(m, nn.Conv2d):
                nn.init.kaiming_normal_(m.weight, mode='fan_out',
                                        nonlinearity='relu')
            elif isinstance(m, nn.BatchNorm2d):
                nn.init.constant_(m.weight, 1)
                nn.init.constant_(m.bias, 0)
```

#### Overridden `forward(x)`

The `.view()` call plays the same role as `Flatten()` in Keras. Check out the structure table again and note the commonality across the variants: there are always 4 residual stages.

```
def forward(self, x):
    x = self.conv1(x)
    x = self.bn1(x)
    x = self.relu(x)
    x = self.maxpool(x)

    x = self.layer1(x)
    x = self.layer2(x)
    x = self.layer3(x)
    x = self.layer4(x)

    x = self.avgpool(x)
    x = x.view(x.size(0), -1)
    x = self.fc(x)

    return x
```
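The `.view(x.size(0), -1)` step simply collapses everything after the batch dimension; a small shape-only sketch (no tensors involved):

```python
def flattened_shape(shape):
    """Shape produced by x.view(x.size(0), -1): keep the batch dim,
    multiply all remaining dims together."""
    batch, *rest = shape
    n = 1
    for d in rest:
        n *= d
    return (batch, n)

# after avgpool, a ResNet-34 feature map is (N, 512, 1, 1)
print(flattened_shape((8, 512, 1, 1)))  # → (8, 512)
```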

Look at the graph again and trace the shapes through it.

Since for many of the classic models the input size is always `224x224`, let's deduce the size after each of the layers:

- Since the padding is equivalent to `same`, only the stride determines the output size. The stem `conv` halves the input side length, and so does the `max pool`.

Why do we need the `max pool` to change the size? Let's continue and answer this question.

- The blocks DO NOT include any pooling, so inside a [ ] group the only possible downsampling comes from a stride.
- However, between consecutive stages the feature-map area is decreased to 1/4 (each side is halved) while the channel count doubles. Within a stage, input and output shapes match exactly; that is what lets the shortcut learn the identity function.
- This shows that the downsampling actually happens inside the first block of each stage, via its stride. Thanks to the skip connections, the gradient problem is greatly relieved, allowing better accuracy and better exploitation of depth.
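The size bookkeeping can be sketched with the standard convolution output formula (a pure-Python check, not part of the torchvision code):

```python
def conv_out(size, kernel, stride, padding):
    """Output side length of a convolution or pooling layer."""
    return (size + 2 * padding - kernel) // stride + 1

size = 224
size = conv_out(size, kernel=7, stride=2, padding=3)  # stem conv -> 112
size = conv_out(size, kernel=3, stride=2, padding=1)  # max pool  -> 56
for _ in range(3):  # layer2..layer4 each start with a stride-2 conv
    size = conv_out(size, kernel=3, stride=2, padding=1)
print(size)  # → 7, exactly the 7x7 window expected by self.avgpool
```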

#### _make_layer(): auto-completion of downsample

This is the trickiest part of the official implementation. Let's look at the parameters of this method first:

- `block`: either the `BasicBlock` or the `Bottleneck` class.
- `planes`: the base channel count of the stage.
- `blocks`: the number of residual blocks to stack in this stage.

```
def _make_layer(self, block, planes, blocks, stride=1):
    downsample = None
    # shapes mismatch at a stage boundary: build a projection shortcut
    if stride != 1 or self.inplanes != planes * block.expansion:
        downsample = nn.Sequential(
            nn.Conv2d(self.inplanes, planes * block.expansion,
                      kernel_size=1, stride=stride, bias=False),
            nn.BatchNorm2d(planes * block.expansion),
        )

    layers = []
    # only the first block of the stage gets the stride and the downsample
    layers.append(block(self.inplanes, planes, stride, downsample))
    self.inplanes = planes * block.expansion
    for i in range(1, blocks):
        layers.append(block(self.inplanes, planes))

    return nn.Sequential(*layers)
```
```

Why do we need this downsample? When switching from one stage to the next, both the spatial size and the channel count mismatch the previous stage (for example, the bottleneck filter counts go from [64, 64, 256] to [128, 128, 512]). The main branch can easily map an input to an output of any shape, but the residual addition requires the shortcut to match the output's size and dimension, which fails exactly at the stage boundary. Therefore a downsample (a strided 1×1 convolution plus batch norm) is added on the shortcut to make the two branches consistent.
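As a concrete shape check (pure Python; the 56×56×256 and 28×28×512 figures are the ResNet-50 stage-1 and stage-2 feature maps):

```python
def shortcut_shape(h, w, c_out, stride):
    """Shape after a 1x1 projection conv with the given stride (no padding)."""
    side = lambda s: (s - 1) // stride + 1
    return (side(h), side(w), c_out)

# layer1 outputs 56x56x256; layer2's blocks output 28x28x512.
# A stride-2 1x1 conv projects the shortcut to the matching shape.
print(shortcut_shape(56, 56, 512, stride=2))  # → (28, 28, 512)
```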

In Keras, instead of using a conditional branch to add this downsample, the implementation defines two kinds of blocks.

Here's the code for ResNet-50 in Keras. Note that the identity blocks correspond to the plain `Bottleneck` (identity shortcut), while the `conv_block`s apply a downsample (projection) connection on the shortcut.

```
# code for stage 2 (layer1 in the PyTorch version)
x = conv_block(x, 3, [64, 64, 256], stage=2, block='a', strides=(1, 1))
x = identity_block(x, 3, [64, 64, 256], stage=2, block='b')
x = identity_block(x, 3, [64, 64, 256], stage=2, block='c')


def conv_block(input_tensor, kernel_size, filters, stage, block,
               strides=(2, 2)):
    filters1, filters2, filters3 = filters
    if K.image_data_format() == 'channels_last':
        bn_axis = 3
    else:
        bn_axis = 1
    conv_name_base = 'res' + str(stage) + block + '_branch'
    bn_name_base = 'bn' + str(stage) + block + '_branch'

    x = Conv2D(filters1, (1, 1), strides=strides,
               name=conv_name_base + '2a')(input_tensor)
    x = BatchNormalization(axis=bn_axis, name=bn_name_base + '2a')(x)
    x = Activation('relu')(x)

    x = Conv2D(filters2, kernel_size, padding='same',
               name=conv_name_base + '2b')(x)
    x = BatchNormalization(axis=bn_axis, name=bn_name_base + '2b')(x)
    x = Activation('relu')(x)

    x = Conv2D(filters3, (1, 1), name=conv_name_base + '2c')(x)
    x = BatchNormalization(axis=bn_axis, name=bn_name_base + '2c')(x)

    # projection shortcut: strided 1x1 conv + BN applied to the block input
    shortcut = Conv2D(filters3, (1, 1), strides=strides,
                      name=conv_name_base + '1')(input_tensor)
    shortcut = BatchNormalization(axis=bn_axis,
                                  name=bn_name_base + '1')(shortcut)

    x = layers.add([x, shortcut])
    x = Activation('relu')(x)
    return x
```

#### Final

Here is how each type of ResNet is constructed:

```
def resnet18(pretrained=False, **kwargs):
    """Constructs a ResNet-18 model.

    Args:
        pretrained (bool): If True, returns a model pre-trained on ImageNet
    """
    model = ResNet(BasicBlock, [2, 2, 2, 2], **kwargs)
    if pretrained:
        model.load_state_dict(model_zoo.load_url(model_urls['resnet18']))
    return model


def resnet34(pretrained=False, **kwargs):
    """Constructs a ResNet-34 model.

    Args:
        pretrained (bool): If True, returns a model pre-trained on ImageNet
    """
    model = ResNet(BasicBlock, [3, 4, 6, 3], **kwargs)
    if pretrained:
        model.load_state_dict(model_zoo.load_url(model_urls['resnet34']))
    return model


def resnet50(pretrained=False, **kwargs):
    """Constructs a ResNet-50 model.

    Args:
        pretrained (bool): If True, returns a model pre-trained on ImageNet
    """
    model = ResNet(Bottleneck, [3, 4, 6, 3], **kwargs)
    if pretrained:
        model.load_state_dict(model_zoo.load_url(model_urls['resnet50']))
    return model


def resnet101(pretrained=False, **kwargs):
    """Constructs a ResNet-101 model.

    Args:
        pretrained (bool): If True, returns a model pre-trained on ImageNet
    """
    model = ResNet(Bottleneck, [3, 4, 23, 3], **kwargs)
    if pretrained:
        model.load_state_dict(model_zoo.load_url(model_urls['resnet101']))
    return model


def resnet152(pretrained=False, **kwargs):
    """Constructs a ResNet-152 model.

    Args:
        pretrained (bool): If True, returns a model pre-trained on ImageNet
    """
    model = ResNet(Bottleneck, [3, 8, 36, 3], **kwargs)
    if pretrained:
        model.load_state_dict(model_zoo.load_url(model_urls['resnet152']))
    return model
```
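A quick check that the names add up: the depth in each model's name counts the weighted layers, i.e. the stem conv, the convs inside all blocks (2 per `BasicBlock`, 3 per `Bottleneck`), and the final fully connected layer:

```python
def depth(convs_per_block, block_counts):
    """Weighted-layer count: stem conv + convs in all blocks + fc layer."""
    return 1 + convs_per_block * sum(block_counts) + 1

print(depth(2, [2, 2, 2, 2]))    # BasicBlock   → 18
print(depth(3, [3, 4, 6, 3]))    # Bottleneck   → 50
print(depth(3, [3, 4, 23, 3]))   # → 101
print(depth(3, [3, 8, 36, 3]))   # → 152
```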