Crowd Counting Baseline

We set up a strong baseline for crowd counting that is easy to follow and implement. The following tricks are adopted to improve performance. This work is ongoing.

Experiments are conducted on the ShanghaiTech Part A dataset.

Baseline Network Architecture

The first 13 layers of VGG-16 (without batch norm) are followed by upsample and conv layers so that the output density maps have the same size as the input images. In the backend, SE blocks are adopted, and Swish activation replaces ReLU. A figure will be added soon.
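
To make the two backend changes concrete, here is a minimal pure-Python sketch of Swish and of a Squeeze-and-Excitation (SE) block operating on nested lists. It is not the repository's implementation: the function names, the weight layout, the ReLU in the hidden layer, and the omission of biases are all assumptions made for illustration.

```python
import math


def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))


def swish(x):
    """Swish activation: x * sigmoid(x)."""
    return x * sigmoid(x)


def se_block(feature_map, w1, w2):
    """Squeeze-and-Excitation on a feature map shaped [C][H][W] (nested lists).

    w1: [C_reduced][C] weights of the first FC layer (hypothetical values).
    w2: [C][C_reduced] weights of the second FC layer (hypothetical values).
    Biases are omitted for brevity.
    """
    # Squeeze: global average pooling gives one scalar per channel.
    squeezed = [sum(map(sum, ch)) / (len(ch) * len(ch[0])) for ch in feature_map]
    # Excitation: FC -> ReLU -> FC -> sigmoid gives one gate per channel.
    hidden = [max(0.0, sum(w * s for w, s in zip(row, squeezed))) for row in w1]
    gates = [sigmoid(sum(w * h for w, h in zip(row, hidden))) for row in w2]
    # Scale: every value in a channel is multiplied by that channel's gate.
    return [[[v * g for v in r] for r in ch] for ch, g in zip(feature_map, gates)]
```

With all-zero weights in `w2`, every gate is `sigmoid(0) = 0.5`, so the block simply halves its input, which is a quick sanity check on the wiring.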

Mult-Adds are calculated using an input of size 1*3*224*224.
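
As a rough illustration of how a Mult-Adds figure is built up, the sketch below counts multiply-accumulate operations for a single convolution layer. The helper name is hypothetical, bias terms are ignored, and the tool actually used to produce the numbers in this README is not stated, so totals may differ slightly.

```python
def conv2d_mult_adds(c_in, c_out, kernel, h_out, w_out):
    """Multiply-accumulate count of one conv layer (bias ignored)."""
    return c_in * kernel * kernel * c_out * h_out * w_out


# First VGG-16 conv layer on a 1*3*224*224 input: 3 -> 64 channels,
# 3x3 kernel, padding 1, so the output keeps the 224x224 spatial size.
first_layer = conv2d_mult_adds(3, 64, 3, 224, 224)
print(first_layer)  # 86704128
```

Summing this quantity over every layer gives the network total.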

Training uses full images and the MSE loss function.

VGG16-13 does not converge.

When pyramids are applied throughout the decoder, the model converges to a bad local minimum, so only the first 2 layers use pyramids.

Open problems:

When training U-VGG, the loss converges to 1.1e-4 and MAE is ~240. Tried (1) using SGD and (2) random initialization instead of VGG16, together with simplifying the model, but neither works.
When training CSRNet, if the parameters are initialized randomly instead of from pretrained weights, training converges to MAE ~330.
If the first 13 layers of VGG16 are used in M_VGG (downsample = 8), the training loss converges to ~0.0030 and does not decrease further. MAE is 86.4.
When 3 upsample layers are added to CSRNet to regress 1 size (full-size) density maps, the loss converges to ~1.218e-5 and MAE > 200. With 2 upsample layers, the loss converges to ~1.8e-4 and MAE > 120. With 1 upsample layer, the loss converges to ~2e-3 and MAE is ~94. Reducing the learning rate works.
When using ResNet-50, the outputs seem to be quite close to each other.

Model	MAE	RMSE	PSNR	SSIM	Params	Mult-Adds
ResNet-50 + decoder	80.6	130.1			11.6G	29.6G
Res2Net-50 + decoder					11.8G	29.7G
InceptionV3 + decoder	119.4	170.5
VGG16-10 + decoder(CSRNet)	72.6	114.8	22.66	0.70	16.3M	30.6G
VGG16-10 + decoder(1,3,5,7 filter)	70.9	110.8			13.0M	28.0G
VGG16-10 + decoder(1,2,3,6 dilation)	74.7	113.1			11.4M	26.8G
VGG16-10 + decoder(depth pyramid)	74.2	112.5			12.0M	27.3G
VGG16-13 + decoder	86.4	125.1
VGG16-10 + decoder, 1 size	>200	>300
VGG16-10 + decoder, 1/2 size	119.4	192.9
VGG16-10 + decoder, 1/4 size	94
VGG16-10 + Dense	72.5	113.8			13.0M	28.8G
VGG16-10 + DenseRes	72.1	116.0			10.6M	29.6G
VGG16-10 + Res	74.4	113.3			16.3M	30.6G
VGG16-13 + decoder + swish	73.9	117.8
VGG16-13 + decoder + se	74.8	119.7
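
For readers new to the metrics, the MAE and RMSE columns above follow the usual crowd counting convention of comparing per-image counts, where a predicted count is the sum of the predicted density map. The sketch below assumes that convention; the function name is hypothetical and the repository's own evaluation code may differ in detail.

```python
import math


def count_metrics(pred_counts, gt_counts):
    """Return (MAE, RMSE) over per-image crowd counts."""
    errors = [p - g for p, g in zip(pred_counts, gt_counts)]
    mae = sum(abs(e) for e in errors) / len(errors)
    rmse = math.sqrt(sum(e * e for e in errors) / len(errors))
    return mae, rmse


# Two hypothetical test images with errors of -3 and +4 people.
mae, rmse = count_metrics([10.0, 20.0], [13.0, 16.0])
print(mae)  # 3.5
```

RMSE penalizes large per-image errors more heavily than MAE, which is why the two columns can rank models differently.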

Selected model: VGG16-10 + decoder (CSRNet)

Augmentation

PSNR and SSIM depend on the resolution at which they are calculated. For original-size density maps, the value of each pixel is quite small, so PSNR and SSIM are higher than at 1/8 size.
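
The resolution effect on PSNR can be illustrated with a small sketch. It assumes a fixed data range of 1.0 and that a 1/8-size map is obtained by summing 8x8 blocks, so its values are roughly 64 times larger; neither assumption is confirmed by the source, and the density values are made up.

```python
import math


def psnr(pred, gt, data_range=1.0):
    """PSNR in dB between two flattened maps, for a fixed data range."""
    mse = sum((p - g) ** 2 for p, g in zip(pred, gt)) / len(pred)
    return 10.0 * math.log10(data_range ** 2 / mse)


# Hypothetical full-resolution density values (tiny per-pixel magnitudes).
gt = [0.001, 0.002, 0.0, 0.003]
pred = [0.0012, 0.0018, 0.0001, 0.0027]

# Assumed effect of 1/8-size maps built by sum pooling: values grow by 8 * 8.
scale = 64
full = psnr(pred, gt)
small = psnr([v * scale for v in pred], [v * scale for v in gt])
print(round(full - small, 2))  # 36.12
```

Under these assumptions the drop is 20 * log10(64), about 36.12 dB, which would be consistent with the gap of roughly 36 dB between the PSNR and PSNR(1/8) columns in the table below.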

$Loss = L_{MSE} + 100 \cdot \mathrm{downsample} \cdot L_{C}$
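
A minimal sketch of how such a combined loss could be computed is shown below. The source does not define $L_{C}$ or the exact meaning of the downsample factor, so this sketch assumes that $L_{C}$ is the mean absolute difference between predicted and ground-truth counts (the sums of the density maps) and that `downsample` is a scalar supplied by the caller. The function name and signature are hypothetical.

```python
def combined_loss(pred_maps, gt_maps, downsample, weight=100.0):
    """L_MSE + weight * downsample * L_C over a batch of flattened density maps.

    pred_maps, gt_maps: one flat list of pixel values per image.
    """
    n_pixels = sum(len(m) for m in pred_maps)
    # Pixel-wise mean squared error over the whole batch.
    l_mse = sum(
        (p - g) ** 2
        for pm, gm in zip(pred_maps, gt_maps)
        for p, g in zip(pm, gm)
    ) / n_pixels
    # ASSUMPTION: L_C is the mean absolute count error, where the count
    # of an image is the sum of its density map.
    l_c = sum(
        abs(sum(pm) - sum(gm)) for pm, gm in zip(pred_maps, gt_maps)
    ) / len(pred_maps)
    return l_mse + weight * downsample * l_c


# One hypothetical 2-pixel image: MSE = 0.5, count error = 1.
loss = combined_loss([[0.0, 1.0]], [[0.0, 2.0]], downsample=1)
print(loss)  # 100.5
```

The large weight reflects how small per-pixel MSE values are compared with count errors, so the count term needs scaling to influence training.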

Strategy	MAE	RMSE	PSNR	SSIM	PSNR(1/8)	SSIM(1/8)	Time/epoch
0.3$\times$	62.9	99.7	58.61	0.9869	22.51	0.62	0.33$\times$
0.4$\times$	64.8	95.8	58.44	0.9864	22.35	0.61	0.39$\times$
0.5$\times$	64.3	100.4	58.27	0.9861	22.18	0.58	0.43$\times$
0.6$\times$	64.4	98.8	58.36	0.9862	22.27	0.60	0.51$\times$
0.7$\times$	64.8	99.7	58.16	0.9858	22.09	0.56	0.61$\times$
0.8$\times$	64.2	99.6	58.23	0.9858	22.15	0.60	0.71$\times$
0.9$\times$	66.5	100.3	58.12	0.9856	22.04	0.58	0.86$\times$
Original	67.7	103.1	58.02	0.9854	21.94	0.56	1.00$\times$
fixed	64.9	101.2	58.34	0.9862	22.26	0.62	1.20$\times$
fixed+random	63.8	101.1	58.39	0.9865	22.30	0.64	2.43$\times$
mixed	68.7	106.5	58.01	0.9854	21.92	0.56	5.01$\times$

Map Size

Map size	MAE	RMSE	PSNR	SSIM	PSNR(1/8)	SSIM(1/8)
1	62.9	99.7	58.61	0.9869	22.51	0.62
$\frac{1}{2}$	62.0	95.4	46.47	0.9416	22.42	0.62
$\frac{1}{4}$	62.0	93.0	34.38	0.8197	22.35	0.61
$\frac{1}{4}, L_{MSE} + 400 \cdot \mathrm{downsample} \cdot L_{C}$	61.4	92.6
$\frac{1}{4}, L_{MSE} + 1000 \cdot \mathrm{downsample} \cdot L_{C}$	60.0	92.6
$\frac{1}{4}, L_{MSE} + 25 \cdot \mathrm{downsample} \cdot L_{C}$	63.5	93.4
$\frac{1}{8}$	63.0	95.6	22.24	0.6122	22.24	0.61
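
The source does not say how the smaller ground-truth maps are produced. One common count-preserving option is to sum non-overlapping blocks of the full-size density map, sketched below with a hypothetical helper; plain interpolation would instead need the values rescaled to keep the total count.

```python
def sum_pool(density, factor):
    """Downsample a 2D density map (list of rows) by summing factor x factor blocks."""
    h, w = len(density), len(density[0])
    if h % factor or w % factor:
        raise ValueError("map size must be divisible by the factor")
    return [
        [
            sum(density[y + dy][x + dx] for dy in range(factor) for dx in range(factor))
            for x in range(0, w, factor)
        ]
        for y in range(0, h, factor)
    ]


# Hypothetical 4x4 map containing 4 people in total.
full_map = [
    [1, 0, 0, 0],
    [0, 1, 0, 0],
    [0, 0, 2, 0],
    [0, 0, 0, 0],
]
half_map = sum_pool(full_map, 2)
print(half_map)  # [[2, 0], [0, 2]]
```

Because each block is summed rather than averaged, the total count of the map is the same at every size.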


SHB
qnrf	95.3	166.9

Loss Function

Size = 1

Loss	MAE	RMSE
$L_{MSE}$
$L_{MSE} + 100 \cdot \mathrm{downsample} \cdot L_{C}$	62.9	99.7
DMS-SSIM	71.7	108.9
MS-SSIM	69.8	104.4
$L_{SA}+L_{SC}$	Not converge
1	71.9	110.0
2	70.6	112.9
3	71.6	110.2
4	69.9	109.2
5	69.6	105.5

Learning Objective

Objective	MAE	RMSE	PSNR	SSIM	Params(M)
Density Map					23.45
Density Map + Soft Attention Map					23.53
Density Map + Hard Attention Map					23.53
