Torch Notes 4: Notes on the Core Code for Grayscale Image Colorization

Posted by Kriz on 2017-06-08

This part mainly walks through the two core code files, models and train. I adapted both of them in my own project; this post only shows the code from the reference projects. If something is unclear, you can look for answers in the earlier posts of this series first:

Torch Notes 1: Installation and Basic Usage (with the XOR Problem as an Example)

Torch Notes 2: A One-Hour Introduction to Torch

Torch Notes 3: Taking Apart Two Image-Processing Wheels

models

Let's take it apart step by step, as before.

First, require the libraries and create an empty table named M.

require 'nn'
require 'ShaveImage'
local M = {}

Next, define a function build_conv_block, which builds a convolutional block (several layers containing a convolution and a series of related operations).

local function build_conv_block(dim, padding_type)
  -- network
end

Create a container conv_block, apply padding once, then run a convolution followed by a BN layer:

local conv_block = nn.Sequential()
local p = 0
if padding_type == 'reflect' then
  conv_block:add(nn.SpatialReflectionPadding(1, 1, 1, 1))
elseif padding_type == 'replicate' then
  conv_block:add(nn.SpatialReplicationPadding(1, 1, 1, 1))
elseif padding_type == 'zero' then
  p = 1
end
conv_block:add(nn.SpatialConvolution(dim, dim, 3, 3, 1, 1, p, p))
conv_block:add(nn.SpatialBatchNormalization(dim))

Then the activation function:

conv_block:add(nn.ReLU(true))

Then another convolution, again followed by BN:

if padding_type == 'reflect' then
  conv_block:add(nn.SpatialReflectionPadding(1, 1, 1, 1))
elseif padding_type == 'replicate' then
  conv_block:add(nn.SpatialReplicationPadding(1, 1, 1, 1))
end
conv_block:add(nn.SpatialConvolution(dim, dim, 3, 3, 1, 1, p, p))
conv_block:add(nn.SpatialBatchNormalization(dim))
return conv_block

That is the full content of one convolutional block.
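
One property worth noting: with 'reflect' (or 'replicate') padding, each pad + 3x3 convolution pair preserves the spatial size, which is what lets the block's output be added back to its input later. A quick standalone check in a th session (not part of models.lua):

require 'nn'
-- reflect-pad by 1 on every side, then a 3x3 convolution with no built-in padding
local pair = nn.Sequential()
pair:add(nn.SpatialReflectionPadding(1, 1, 1, 1))
pair:add(nn.SpatialConvolution(128, 128, 3, 3, 1, 1, 0, 0))
local x = torch.randn(1, 128, 32, 32)
print(#pair:forward(x))   --> 1x128x32x32, same spatial size as the input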

Next, define the residual block.

The body of this function is a Sequential container made up of two parts, concat followed by nn.CAddTable.

concat itself is a ConcatTable container holding exactly two modules: a conv_block, plus either an nn.Identity that passes the input straight through or an nn.ShaveImage that crops the input to match the conv_block's output size.

local function build_res_block(dim, padding_type)
  local conv_block = build_conv_block(dim, padding_type)
  local res_block = nn.Sequential()
  local concat = nn.ConcatTable()
  concat:add(conv_block)
  if padding_type == 'none' or padding_type == 'reflect-start' then
    concat:add(nn.ShaveImage(2))
  else
    concat:add(nn.Identity())
  end
  res_block:add(concat):add(nn.CAddTable())
  return res_block
end

The classic diagram of a residual connection may help here:

[Figure: residual block, a convolutional path summed with a skip connection]
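
To see the pattern in isolation, here is a minimal sketch where nn.Linear stands in for conv_block: ConcatTable sends the same input down both branches, and CAddTable sums the two results, so the output is branch(x) + x.

require 'nn'
local skip = nn.Sequential()
local branches = nn.ConcatTable()
branches:add(nn.Linear(4, 4))   -- stands in for conv_block
branches:add(nn.Identity())     -- the shortcut branch
skip:add(branches):add(nn.CAddTable())
local x = torch.randn(2, 4)
print(skip:forward(x))          -- the linear branch's output plus x itself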

The last part is about building the network model. It defines how each kind of layer is instantiated; the actual network construction is driven from train.lua. First, the complete code for an overall view:

function M.build_model(opt)
  local arch = opt.arch:split(',')
  local prev_dim = 1
  local model = nn.Sequential()
  for i, v in ipairs(arch) do
    local first_char = string.sub(v, 1, 1)
    local layer, next_dim
    local needs_relu = true
    local needs_bn = true
    if first_char == 'c' then
      -- Convolution
      local f = tonumber(string.sub(v, 2, 2)) -- filter size
      local p = (f - 1) / 2 -- padding
      local s = tonumber(string.sub(v, 4, 4)) -- stride
      next_dim = tonumber(string.sub(v, 6))
      if opt.padding_type == 'reflect' then
        model:add(nn.SpatialReflectionPadding(p, p, p, p))
        p = 0
      elseif opt.padding_type == 'replicate' then
        model:add(nn.SpatialReplicationPadding(p, p, p, p))
        p = 0
      elseif opt.padding_type == 'none' then
        p = 0
      end
      layer = nn.SpatialConvolution(prev_dim, next_dim, f, f, s, s, p, p)
    elseif first_char == 'f' then
      -- Full convolution
      local f = tonumber(string.sub(v, 2, 2)) -- filter size
      local p = (f - 1) / 2 -- padding
      local s = tonumber(string.sub(v, 4, 4)) -- stride
      local a = s - 1 -- adjustment
      next_dim = tonumber(string.sub(v, 6))
      layer = nn.SpatialFullConvolution(prev_dim, next_dim,
                                        f, f, s, s, p, p, a, a)
    elseif first_char == 'd' then
      -- Downsampling (strided convolution)
      next_dim = tonumber(string.sub(v, 2))
      layer = nn.SpatialConvolution(prev_dim, next_dim, 3, 3, 2, 2, 1, 1)
    elseif first_char == 'U' then
      -- Nearest-neighbor upsampling
      next_dim = prev_dim
      local scale = tonumber(string.sub(v, 2))
      layer = nn.SpatialUpSamplingNearest(scale)
    elseif first_char == 'u' then
      -- Learned upsampling (strided full-convolution)
      next_dim = tonumber(string.sub(v, 2))
      layer = nn.SpatialFullConvolution(prev_dim, next_dim, 3, 3, 2, 2, 1, 1, 1, 1)
    elseif first_char == 'C' then
      -- Non-residual conv block
      next_dim = tonumber(string.sub(v, 2))
      layer = build_conv_block(next_dim, opt.padding_type)
      needs_bn = false
      needs_relu = true
    elseif first_char == 'R' then
      -- Residual (non-bottleneck) block
      next_dim = tonumber(string.sub(v, 2))
      layer = build_res_block(next_dim, opt.padding_type)
      needs_bn = false
      needs_relu = true
    end
    model:add(layer)
    if i == #arch then
      needs_relu = false
      needs_bn = false
    end
    if needs_bn then
      model:add(nn.SpatialBatchNormalization(next_dim))
    end
    if needs_relu then
      model:add(nn.ReLU(true))
    end
    prev_dim = next_dim
  end
  model:add(nn.Tanh())
  return model
end

The opt argument passed in is the result of cmd:parse, where cmd is a torch.CmdLine(). opt.arch is the concrete network architecture: its parts are joined with commas, and each part is written as a short code.

Looking at the default network used here gives a clearer feel for the format. What each segment stands for will be covered below.

cmd:option('-arch', 'c9s1-32,d64,d128,R128,R128,R128,R128,R128,u64,u32,c9s1-2')
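
For reference, here is what that default string splits into (the annotations are mine, following the rules explained below). This runs in a th session with torch loaded, since torch provides the string split extension that train.lua relies on:

require 'torch'
local arch = ('c9s1-32,d64,d128,R128,R128,R128,R128,R128,u64,u32,c9s1-2'):split(',')
print(#arch)        --> 11 segments
print(arch[1])      --> c9s1-32  (9x9 convolution, stride 1, 32 output channels)
print(arch[2])      --> d64      (stride-2 downsampling convolution to 64 channels)
print(arch[4])      --> R128     (residual block with 128 channels)
print(arch[#arch])  --> c9s1-2   (final 9x9 convolution producing the 2 output channels)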

Now let's go back through this code piece by piece.

local arch = opt.arch:split(',')
local prev_dim = 1
local model = nn.Sequential()

The first line is clear enough: split the string on commas, then assemble the network from the segments. The second line sets the depth of the input data (1, since the input is a single grayscale channel), and the third creates the top-level network container.

local first_char = string.sub(v, 1, 1)
local layer, next_dim
local needs_relu = true
local needs_bn = true

Inside the loop, v is the value of the i-th segment. The first line takes the segment's first character to decide what kind of layer it is; the following lines declare the layer itself and whether it needs to be followed by an activation and a BN layer.

Then comes the actual layer dispatch. Here we only look at the cases this network uses:

if first_char == 'c' then
  -- Convolution
  local f = tonumber(string.sub(v, 2, 2)) -- filter size
  local p = (f - 1) / 2 -- padding
  local s = tonumber(string.sub(v, 4, 4)) -- stride
  next_dim = tonumber(string.sub(v, 6))
  if opt.padding_type == 'reflect' then
    model:add(nn.SpatialReflectionPadding(p, p, p, p))
    p = 0
  elseif opt.padding_type == 'replicate' then
    model:add(nn.SpatialReplicationPadding(p, p, p, p))
    p = 0
  elseif opt.padding_type == 'none' then
    p = 0
  end
  layer = nn.SpatialConvolution(prev_dim, next_dim, f, f, s, s, p, p)

A lowercase c denotes a standalone convolution layer; the following lines parse its attributes. Take 'c9s1-32' as an example: a convolution with a 9x9 filter (hence padding 4), stride 1, and output depth 32.
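
Concretely, the string.sub indices pick out the following pieces (Lua strings are 1-indexed):

local v = 'c9s1-32'
print(string.sub(v, 1, 1))            --> c   (layer type)
print(tonumber(string.sub(v, 2, 2)))  --> 9   (filter size f, so padding p = (9 - 1) / 2 = 4)
print(tonumber(string.sub(v, 4, 4)))  --> 1   (stride s)
print(tonumber(string.sub(v, 6)))     --> 32  (output depth)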

elseif first_char == 'd' then
  -- Downsampling (strided convolution)
  next_dim = tonumber(string.sub(v, 2))
  layer = nn.SpatialConvolution(prev_dim, next_dim, 3, 3, 2, 2, 1, 1)

A lowercase d is a downsampling step, done here with a convolution whose stride is greater than 1. The number after the d is the output depth.
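
A quick shape check of this 3x3, stride-2, pad-1 convolution (standalone, using the usual output-size formula):

require 'nn'
-- out = floor((in + 2*pad - kernel) / stride) + 1
local down = nn.SpatialConvolution(32, 64, 3, 3, 2, 2, 1, 1)
local x = torch.randn(1, 32, 256, 256)
print(#down:forward(x))   --> 1x64x128x128, since floor((256 + 2 - 3) / 2) + 1 = 128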

elseif first_char == 'R' then
  -- Residual (non-bottleneck) block
  next_dim = tonumber(string.sub(v, 2))
  layer = build_res_block(next_dim, opt.padding_type)
  needs_bn = false
  needs_relu = true

An uppercase R is a residual block; again the letter is followed by the output depth. A ReLU is appended after the block.

elseif first_char == 'u' then
  -- Learned upsampling (strided full-convolution)
  next_dim = tonumber(string.sub(v, 2))
  layer = nn.SpatialFullConvolution(prev_dim, next_dim, 3, 3, 2, 2, 1, 1, 1, 1)

A lowercase u is an upsampling step, done here with a learned transposed convolution (SpatialFullConvolution) rather than the plain nearest-neighbor upsampling of the 'U' case. Kernel size 3x3, stride 2x2, padding 1 on each side plus an output adjustment of 1.
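
The corresponding shape check for the transposed convolution shows that it exactly doubles the spatial size (standalone sketch):

require 'nn'
-- out = (in - 1)*stride - 2*pad + kernel + adj
local up = nn.SpatialFullConvolution(128, 64, 3, 3, 2, 2, 1, 1, 1, 1)
local x = torch.randn(1, 128, 64, 64)
print(#up:forward(x))   --> 1x64x128x128, since (64 - 1)*2 - 2 + 3 + 1 = 128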

return M

At the end the file returns M, so when you do

local models = require 'models'

the local variable models is bound directly to the table M defined above, and its functions can then be called as models.build_model and so on.

train

The opening section defines the network architecture and the various options:

require 'torch'
require 'optim'
require 'image'
require 'DataLoader'
local utils = require 'utils'
local models = require 'models'
local cmd = torch.CmdLine()
-- Generic options
cmd:option('-arch', 'c9s1-32,d64,d128,R128,R128,R128,R128,R128,u64,u32,c9s1-2')
cmd:option('-h5_file', 'coco.h5')
cmd:option('-padding_type', 'reflect-start')
cmd:option('-resume_from_checkpoint', '')
-- Optimization
cmd:option('-num_iterations', 50000)
cmd:option('-max_train', -1)
cmd:option('-batch_size', 30)
cmd:option('-learning_rate', 1e-3)
cmd:option('-lr_decay_every', 3000)
cmd:option('-lr_decay_factor', 0.5)
-- Checkpointing
cmd:option('-checkpoint_name', 'checkpoint')
cmd:option('-checkpoint_every', 1000)
cmd:option('-num_val_batches', 10)
-- Backend options
cmd:option('-gpu', 0)
cmd:option('-use_cudnn', 1)
cmd:option('-backend', 'cuda', 'cuda|opencl')

When this file is run with th, its main function is called. main is a bit long, so let's go through it piece by piece:

local opt = cmd:parse(arg)
-- Figure out the backend
local dtype, use_cudnn = utils.setup_gpu(opt.gpu, opt.backend, opt.use_cudnn == 1)

The opt passed around is the parsed result of cmd; every option defined above can then be read off it, e.g. opt.num_iterations.
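
As a standalone illustration of the torch.CmdLine mechanics (not part of train.lua):

require 'torch'
local cmd = torch.CmdLine()
cmd:option('-num_iterations', 50000)
cmd:option('-learning_rate', 1e-3)
local opt = cmd:parse(arg or {})   -- arg holds the command-line arguments when run with th
print(opt.num_iterations, opt.learning_rate)   -- the defaults, unless overridden on the command line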

utils is a small utility module; its setup_gpu function is defined as:

function M.setup_gpu(gpu, backend, use_cudnn)
  local dtype = 'torch.FloatTensor'
  if gpu >= 0 then
    if backend == 'cuda' then
      require 'cutorch'
      require 'cunn'
      cutorch.setDevice(gpu + 1)
      dtype = 'torch.CudaTensor'
      if use_cudnn then
        require 'cudnn'
        cudnn.benchmark = true
      end
    elseif backend == 'opencl' then
      require 'cltorch'
      require 'clnn'
      cltorch.setDevice(gpu + 1)
      dtype = torch.Tensor():cl():type()
      use_cudnn = false
    end
  else
    use_cudnn = false
  end
  return dtype, use_cudnn
end

In short, use_cudnn stays enabled only when gpu is set to a non-negative index and the CUDA backend is in use. dtype is just the Tensor type; we won't dig into the details here.
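
For example, on a CPU-only run (gpu set to -1) the function falls straight through to the defaults, which is how train.lua behaves when no GPU is requested:

local dtype, use_cudnn = utils.setup_gpu(-1, 'cuda', true)
print(dtype, use_cudnn)   --> torch.FloatTensor   false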

Back to the main thread, continuing with the code:

-- Build the model
local model = nil
if opt.resume_from_checkpoint ~= '' then
  print('Loading checkpoint from ' .. opt.resume_from_checkpoint)
  model = torch.load(opt.resume_from_checkpoint).model:type(dtype)
else
  print('Initializing model from scratch')
  model = models.build_model(opt):type(dtype)
end
if use_cudnn then cudnn.convert(model, cudnn) end
model:training()
print(model)

This section creates the network model, either resuming from a checkpoint or building it from scratch; nothing much to add here.

local loader = DataLoader(opt)
local params, grad_params = model:getParameters()
local criterion = nn.MSECriterion():type(dtype)
local optim_state = {learningRate=opt.learning_rate}
local train_loss_history = {}
local val_loss_history = {}
local val_loss_history_ts = {}

From here on we prepare to feed data, starting by instantiating a DataLoader. Because optim works on the parameters as a single flat 1D Tensor, getParameters is used to flatten the model's weights (and their gradient buffer) into that form.
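
A tiny standalone example of what getParameters returns: one flat Tensor containing every weight and bias, plus a matching flat gradient buffer that optim will read and update in place.

require 'nn'
local net = nn.Linear(3, 2)
local params, grad_params = net:getParameters()
print(params:nElement())        --> 8  (a 2x3 weight matrix plus 2 biases)
print(grad_params:nElement())   --> 8  (the matching flat gradient buffer)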

Before setting up the training loop, we also need to wrap the closure f that the optimizer will call:

local function f(x)
  assert(x == params)
  grad_params:zero()
  -- x is the Y (luminance) channel, y is the UV channels
  local x, y = loader:getBatch('train')
  x, y = x:type(dtype), y:type(dtype)
  -- Run model forward
  local out = model:forward(x)
  local grad_out = nil
  -- This is a bit of a hack: if we are using reflect-start padding and the
  -- output is not the same size as the input, lazily add reflection padding
  -- to the start of the model so the input and output have the same size.
  if opt.padding_type == 'reflect-start' and x:size(3) ~= out:size(3) then
    local ph = (x:size(3) - out:size(3)) / 2
    local pw = (x:size(4) - out:size(4)) / 2
    local pad_mod = nn.SpatialReflectionPadding(pw, pw, ph, ph):type(dtype)
    model:insert(pad_mod, 1)
    out = model:forward(x)
  end
  local loss = criterion:forward(out, y)
  grad_out = criterion:backward(out, y)
  -- Run model backward
  model:backward(x, grad_out)
  return loss, grad_params
end

After that comes the loop itself. Each iteration takes one Adam step and records the loss:

local epoch = t / loader.num_minibatches['train']
local _, loss = optim.adam(f, params, optim_state)
table.insert(train_loss_history, loss[1])

Every fixed number of iterations, a few validation batches are run through the model to compute the validation loss:

if t % opt.checkpoint_every == 0 then
  -- Check loss on the validation set
  loader:reset('val')
  model:evaluate()
  local val_loss = 0
  print 'Running on validation set ... '
  local val_batches = opt.num_val_batches
  for j = 1, val_batches do
    local x, y = loader:getBatch('val')
    x, y = x:type(dtype), y:type(dtype)
    local out = model:forward(x)
    val_loss = val_loss + criterion:forward(out, y)
  end
  val_loss = val_loss / val_batches
  print(string.format('val loss = %f', val_loss))
  table.insert(val_loss_history, val_loss)
  table.insert(val_loss_history_ts, t)
  model:training()
  -- Save a JSON checkpoint
  local checkpoint = {
    opt=opt,
    train_loss_history=train_loss_history,
    val_loss_history=val_loss_history,
    val_loss_history_ts=val_loss_history_ts
  }
  local filename = string.format('%s.json', opt.checkpoint_name)
  paths.mkdir(paths.dirname(filename))
  utils.write_json(filename, checkpoint)
  -- Save a torch checkpoint; convert the model to float first
  model:clearState()
  if use_cudnn then
    cudnn.convert(model, nn)
  end
  model:float()
  checkpoint.model = model
  filename = string.format('%s.t7', opt.checkpoint_name)
  torch.save(filename, checkpoint)
  -- Convert the model back
  model:type(dtype)
  if use_cudnn then
    cudnn.convert(model, cudnn)
  end
  params, grad_params = model:getParameters()
end

In the same branch, a checkpoint with the model weights is written out: first the training history as JSON, then a .t7 file containing the model itself.
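
A sketch of how that .t7 checkpoint can be loaded back later; the file name follows opt.checkpoint_name, so with the defaults it is checkpoint.t7:

require 'torch'
require 'nn'
local checkpoint = torch.load('checkpoint.t7')   -- path assumes the default -checkpoint_name
local model = checkpoint.model                   -- saved in float form, cudnn converted back to nn
model:evaluate()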

Finally, the learning rate is decayed at regular intervals:

if opt.lr_decay_every > 0 and t % opt.lr_decay_every == 0 then
  local new_lr = opt.lr_decay_factor * optim_state.learningRate
  optim_state = {learningRate = new_lr}
end
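
With the default options (learning_rate 1e-3, halved every 3000 iterations), the rate gets cut 16 times over the 50000 iterations, ending up around 1.5e-8. A worked check:

local lr = 1e-3
for t = 1, 50000 do
  if t % 3000 == 0 then lr = 0.5 * lr end
end
print(lr)   --> roughly 1.5e-8 after 16 halvings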