This repository provides a brief overview of my bachelor's thesis and the corresponding source code.
The overall goal was to implement a conditioned convolutional neural network that uses lesion images from a patient to classify whether a given lesion is a malignant melanoma or a benign lesion. A second goal was to assess whether this conditioned network achieves a higher evaluation score than a non-conditioned convolutional neural network.
The dataset [1] used for this thesis was provided by the International Skin Imaging Collaboration (ISIC) and the Society for Imaging Informatics in Medicine (SIIM). Originally compiled for the SIIM-ISIC Melanoma Classification Challenge 2020, the dataset comprises
Instead of performing classification or regression on a single image alone (or on any other input format suitable for a ConvNet, such as an audio file), a conditioned artificial neural network also incorporates additional information related to that image. This additional information is often called conditioning information or contextual information and is used as an auxiliary input to the artificial neural network. Conditioning information may be metadata, images, or any other format that seems reasonable for the task.
Using this conditioning information, the input to a hidden layer is transformed in a specific way so that the network can learn useful patterns from it. The transformation is not limited to a single hidden layer: which hidden layers' inputs are transformed (i.e., conditioned), and how many of them, is determined by a hyperparameter.
One method to transform the non-conditioning input is to simply concatenate the conditioning information with the input to the hidden layer. For my bachelor's thesis, however, I chose a Feature-wise Linear Modulation (FiLM) approach, which takes as input the lesion image to be classified and a second lesion image from the same patient that serves as the conditioning information.
FiLM applies feature-wise affine transformations to the feature maps of a convolutional neural network, thereby influencing the output. The feature-wise affine transformation scales and shifts intermediate features, allowing certain features to be emphasized or suppressed. [2]
A convolutional neural network, or more generally, an artificial neural network that incorporates FiLM, consists of the following components:
- FiLM-Generator is an artificial neural network (in the case of my thesis, a convolutional neural network) that takes conditioning information as input and outputs two learned parameters, $$\gamma_{i,c}$$ and $$\beta_{i,c}$$. These parameters modulate the non-conditioning information. $$\gamma_{i,c}$$ is a scaling parameter, whereas $$\beta_{i,c}$$ is a shifting parameter. Here, $$i$$ indicates the sample for which the parameter was calculated, and $$c$$ refers to the feature map (i.e., channel) for which the parameter was calculated. Thus, the FiLM-Generator can be seen as a function $$f$$ that, given an image $$l_{i}$$ (in this thesis), computes $$\gamma_{i,c}$$ and $$\beta_{i,c}$$. $$f$$ can then be written as $$f(l_{i})=(\gamma_{i,c}, \beta_{i,c}) \in \mathbb{R}^{2}$$. [2]
- The Feature-wise Linearly Modulated Network (FiLM-ed network) in this thesis is, like the FiLM-Generator, also a convolutional neural network. It takes the non-conditioning information (a lesion image, denoted as $$x_{i}$$) as input. At certain hidden layers, called FiLM layers, it incorporates the parameters $$(\gamma, \beta)$$ computed by the FiLM-Generator [2]. Within the FiLM layers, the intermediate features of $$x_{i}$$, denoted as $$F_{i}$$ (with $$F_{i,c}$$ being the feature map of channel $$c$$), are modulated by these parameters as follows [2]:

  $$\mathrm{FiLM}(F_{i,c} \mid \gamma_{i,c}, \beta_{i,c}) = \gamma_{i,c} F_{i,c} + \beta_{i,c}$$
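
To make the feature-wise affine modulation concrete, here is a minimal sketch in plain Python. It assumes a single sample whose feature maps are stored as nested lists of shape channels x height x width, with one scaling value and one shifting value per channel. The function name `film` and the toy values are hypothetical and are not taken from the thesis code.

```python
from typing import List

FeatureMap = List[List[float]]  # one channel: height x width


def film(
    features: List[FeatureMap], gamma: List[float], beta: List[float]
) -> List[FeatureMap]:
    """Apply gamma_c * F_c + beta_c to every channel c of one sample."""
    if not (len(features) == len(gamma) == len(beta)):
        raise ValueError("need exactly one (gamma, beta) pair per channel")
    return [
        [[g * value + b for value in row] for row in channel]
        for channel, g, b in zip(features, gamma, beta)
    ]


# Two 2x2 feature maps for a single sample (toy values, assumption).
features = [
    [[1.0, 2.0], [3.0, 4.0]],
    [[1.0, 1.0], [1.0, 1.0]],
]
gamma = [2.0, 0.0]  # scale channel 0 up, suppress channel 1
beta = [0.5, -1.0]  # per-channel shift

modulated = film(features, gamma, beta)
print(modulated[0])  # [[2.5, 4.5], [6.5, 8.5]]
print(modulated[1])  # [[-1.0, -1.0], [-1.0, -1.0]]
```

A scaling value of zero removes a channel's content entirely and leaves only the shift, which illustrates how the conditioning information can suppress a feature map.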
Although the FiLM-Generator and the Feature-wise Linearly Modulated Network are individual artificial neural networks, the entire network was trained end-to-end. For simplicity, let's refer to the network that incorporates both the FiLM-Generator and the FiLM-ed network as the FiLM network. The following diagram depicts the architecture:
Figure 1) The FiLM network architecture. On the left side the FiLM-Generator with the conditioned information
EfficientNet-b0 and EfficientNet-b4 were used for the FiLM-Generator and the FiLM-ed network, respectively. The performance of the FiLM network was compared to that of a standard EfficientNet-b4 network without any FiLM layers. For more information about EfficientNet, refer to the paper by M. Tan and Q. V. Le.
The model's performance was measured using the Area Under the Receiver Operating Characteristic Curve (AUROC). For more information about AUROC, see the corresponding Google Developers article.
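
As a rough illustration of what this metric measures, the sketch below computes AUROC directly from its pairwise interpretation: the probability that a randomly chosen positive sample receives a higher score than a randomly chosen negative one, with ties counted as one half. The function name `auroc` and the toy labels and scores are assumptions for illustration only. The thesis results were not produced with this code, and the quadratic loop is only practical for small inputs.

```python
from typing import Sequence


def auroc(labels: Sequence[int], scores: Sequence[float]) -> float:
    """Fraction of (positive, negative) pairs ranked correctly; ties count 0.5."""
    positives = [s for y, s in zip(labels, scores) if y == 1]
    negatives = [s for y, s in zip(labels, scores) if y == 0]
    if not positives or not negatives:
        raise ValueError("AUROC needs at least one positive and one negative sample")

    wins = 0.0
    for p in positives:
        for n in negatives:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(positives) * len(negatives))


# Toy example (assumption): 1 = malignant, 0 = benign.
labels = [0, 0, 1, 1]
scores = [0.1, 0.4, 0.35, 0.8]

result = auroc(labels, scores)
print(result)  # 0.75
```

A value of 0.5 corresponds to random ranking and 1.0 to a perfect separation of malignant and benign lesions.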
Multiple experiments were conducted, with the best result achieved using a pretrained FiLM network. The FiLM network reached an AUROC score of
If you have any questions regarding my bachelor's thesis, feel free to send me a private message.