- CNNs are designed to recognize images (whereas RNNs are designed for sequences), but can be used for other data types
- The basic unit of a CNN is a feature (a.k.a. filter): a small patch that gets matched against small chunks of the image
- CNN efficiency depends on the number of features, their size, and the weights of the layers
- Speed depends on input size: the amount of computation scales linearly with the number of pixels
- Layers:
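- convolution (multiply a feature element-wise with one patch of the image, sum the result, and repeat at every position)
- ReLU, rectified linear unit (replace every negative value with zero)
- pooling (downsampling, e.g. keeping only the max of each small window)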
- Backpropagation is how the network learns the weights of its layers
- I'd like to understand depth more. Is it just stacking all of the different filters together?
- Lydia explained it: The input has a depth of 3, one per color channel (RGB). For the convolutional layers, the depth is the number of filters.
- What's the relationship between depth and interconnectedness?
- You're connected to the folks in your neighborhood, e.g., you are connected to your north/south/east/west and northwest/northeast/southwest/southeast neighbors, but not to all of the other pixels.
- Are filters that different from features in traditional ML?
- Lydia explained it: You don't pre-decide what the filters are going to be! The network learns that.
- What is the difference between stride and pooling? Both seem to shrink the image. (Reading farther down, in "Getting rid of pooling", the authors make a similar suggestion.)
- I didn't get any of the fully-connected/conv layer conversion stuff.
- Not a question, but this was heartening:
  > Instead of rolling your own architecture for a problem, you should look at whatever architecture currently works best on ImageNet, download a pretrained model and finetune it on your data
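
To make the layer notes and the depth and stride-vs-pooling questions above a bit more concrete, here is a rough pure-Python sketch of convolution, ReLU, and max pooling on a tiny 6x6 grayscale image. Everything in it is an assumption made up for illustration: the helper names (`conv2d`, `relu`, `max_pool`), the toy image, and the two hand-written edge filters. In a real CNN the filter values would be learned by backpropagation rather than typed in, and square inputs with no padding are assumed to keep the code short.

```python
# Toy conv -> ReLU -> pool pipeline. Illustrative only: names, image and
# filters are hypothetical, and real networks learn their filter values.

def conv2d(image, kernel, stride=1):
    """Slide a square kernel over a square image (no padding).

    At each position, multiply element-wise and sum the result.
    """
    k = len(kernel)
    out_size = (len(image) - k) // stride + 1
    out = []
    for i in range(out_size):
        row = []
        for j in range(out_size):
            total = 0
            for di in range(k):
                for dj in range(k):
                    total += image[i * stride + di][j * stride + dj] * kernel[di][dj]
            row.append(total)
        out.append(row)
    return out


def relu(fmap):
    """Replace every negative value with zero."""
    return [[max(0, v) for v in row] for row in fmap]


def max_pool(fmap, window=2, stride=2):
    """Keep only the largest value in each window."""
    out_size = (len(fmap) - window) // stride + 1
    return [
        [
            max(
                fmap[i * stride + di][j * stride + dj]
                for di in range(window)
                for dj in range(window)
            )
            for j in range(out_size)
        ]
        for i in range(out_size)
    ]


# 6x6 image with a vertical edge: dark left half, bright right half.
image = [[0, 0, 0, 1, 1, 1] for _ in range(6)]

# Two hand-written 3x3 filters (assumed, not learned).
vertical_edge = [[-1, 0, 1], [-1, 0, 1], [-1, 0, 1]]
horizontal_edge = [[-1, -1, -1], [0, 0, 0], [1, 1, 1]]
filters = [vertical_edge, horizontal_edge]

# One feature map per filter, so the output depth equals len(filters).
feature_maps = [relu(conv2d(image, f)) for f in filters]       # 2 maps, each 4x4

# Route A: stride-1 convolution followed by 2x2 max pooling.
pooled = [max_pool(m) for m in feature_maps]                   # 2 maps, each 2x2

# Route B: skip pooling and use a stride-2 convolution instead.
strided = [relu(conv2d(image, f, stride=2)) for f in filters]  # 2 maps, each 2x2

for name, stack in [("conv", feature_maps), ("pooled", pooled), ("strided", strided)]:
    print(name, "depth:", len(stack), "size:", len(stack[0]), "x", len(stack[0][0]))
```

Both routes end up with a stack of depth 2 and spatial size 2x2, which is why stride and pooling look so similar from the outside. The difference is in what survives: pooling first computes every position and then keeps the max of each window, while a larger stride never computes the skipped positions at all.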