The effect of activation function choice on the performance of convolutional neural networks

(1) Kang Chiao International School, Taipei City, Taiwan, (2) Department of Computer Science, Morgan State University, Baltimore, Maryland

With the advance of technology, artificial intelligence (AI) is now applied widely across society; natural language processing, speech recognition, and autonomous driving are all prominent examples of how AI is changing our world. Machine learning (ML) is a subfield of AI in which a machine improves at performing certain tasks through experience. This work focuses on the convolutional neural network (CNN), an ML architecture, applied to an image classification task. Specifically, we analyzed how the performance of a CNN changes with the choice of neural activation function. Choosing the right activation function is crucial for capturing important trends in the data and for training efficiently. Among the widely used activation functions, we hypothesized that the rectified linear unit (ReLU) would be the most efficient in training time and attain the highest accuracy on the image recognition task, because ReLU avoids the vanishing gradient problem and requires only light mathematical computation. High accuracy and efficiency in image classification are beneficial because the technique underlies many real-world applications, such as medical diagnosis, threat identification in security, and autonomous vehicles. Our results indicate that when the number of hidden layers is small, networks using ReLU performed similarly to networks using the hyperbolic tangent, and both outperformed networks using the sigmoid function.
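The vanishing-gradient argument above can be illustrated with a minimal sketch (our own illustration, not the authors' experimental code): for inputs of large magnitude, the derivatives of the sigmoid and hyperbolic tangent functions shrink toward zero, so gradients passed backward through many layers can vanish, while the derivative of ReLU stays at 1 for any positive input and is trivial to compute.

```python
import math

# The three activation functions compared in the study, with their
# derivatives (used during backpropagation).
def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))

def d_sigmoid(x):
    s = sigmoid(x)
    return s * (1.0 - s)  # peaks at 0.25, decays toward 0 for large |x|

def d_tanh(x):
    return 1.0 - math.tanh(x) ** 2  # peaks at 1.0, decays toward 0 for large |x|

def relu(x):
    return max(0.0, x)

def d_relu(x):
    return 1.0 if x > 0 else 0.0  # constant 1 on the positive half-line

# At x = 5, sigmoid and tanh gradients are already tiny, while ReLU's is 1.
x = 5.0
print(d_sigmoid(x))  # ≈ 0.0066
print(d_tanh(x))     # ≈ 0.00018
print(d_relu(x))     # 1.0
```

Because the ReLU derivative is a simple threshold rather than an exponential, it is also cheaper to evaluate, which is the efficiency advantage the hypothesis refers to.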
