Research on a Cigarette-Pack Feature Acquisition Algorithm Based on Image Recognition: Foreign Literature Translation


Recent advances in convolutional neural networks

Jason Kuen, Amir Shahroudy, et al.

Abstract: In the last few years, deep learning has led to very good performance on a variety of problems, such as visual recognition, speech recognition and natural language processing. Among different types of deep neural networks, convolutional neural networks have been the most extensively studied. Leveraging the rapid growth in the amount of annotated data and the great improvements in the strength of graphics processing units (GPUs), research on convolutional neural networks has advanced swiftly and achieved state-of-the-art results on various tasks. In this paper, we provide a broad survey of the recent advances in convolutional neural networks. We detail the improvements of CNNs on different aspects, including layer design, activation function, loss function, regularization, optimization and fast computation. Besides, we also introduce various applications of convolutional neural networks in computer vision, speech and natural language processing.

Keywords: Convolutional neural network, Deep learning

Introduction

The Convolutional Neural Network (CNN) is a well-known deep learning architecture inspired by the natural visual perception mechanism of living creatures. In 1959, Hubel & Wiesel [1] found that cells in the animal visual cortex are responsible for detecting light in receptive fields. Inspired by this discovery, Kunihiko Fukushima proposed the neocognitron in 1980 [2], which can be regarded as the predecessor of the CNN. In 1990, LeCun et al. [3] published the seminal paper establishing the modern framework of the CNN, and later improved it in [4]. They developed a multi-layer artificial neural network called LeNet-5 which could classify handwritten digits. Like other neural networks, LeNet-5 has multiple layers and can be trained with the backpropagation algorithm [5]. It can obtain effective representations of the original image, which makes it possible to recognize visual patterns directly from raw pixels with little to no preprocessing. A parallel study by Zhang et al. [6] used a shift-invariant artificial neural network (SIANN) to recognize characters in an image. However, due to the lack of large training data and computing power at that time, these networks could not perform well on more complex problems, e.g., large-scale image and video classification.
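
To make the LeNet-5 description concrete, the following is a minimal sketch of a LeNet-5-style network in PyTorch. The layer sizes follow the commonly cited two-convolution, three-fully-connected configuration; the exact activations and subsampling details of the original network differ, so this is an illustration rather than a faithful reimplementation.

```python
# A minimal LeNet-5-style network in PyTorch (illustrative sketch,
# not the original 1998 configuration).
import torch
import torch.nn as nn

class LeNet5(nn.Module):
    def __init__(self, num_classes: int = 10):
        super().__init__()
        self.features = nn.Sequential(
            nn.Conv2d(1, 6, kernel_size=5),   # 32x32 -> 28x28
            nn.Tanh(),
            nn.AvgPool2d(2),                  # 28x28 -> 14x14
            nn.Conv2d(6, 16, kernel_size=5),  # 14x14 -> 10x10
            nn.Tanh(),
            nn.AvgPool2d(2),                  # 10x10 -> 5x5
        )
        self.classifier = nn.Sequential(
            nn.Flatten(),
            nn.Linear(16 * 5 * 5, 120),
            nn.Tanh(),
            nn.Linear(120, 84),
            nn.Tanh(),
            nn.Linear(84, num_classes),
        )

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        return self.classifier(self.features(x))

# Raw pixels in, class scores out; trainable end to end with backprop.
logits = LeNet5()(torch.randn(1, 1, 32, 32))
print(logits.shape)  # torch.Size([1, 10])
```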

Since 2006, many methods have been developed to overcome the difficulties encountered in training deep CNNs [7–10]. Most notably, Krizhevsky et al. proposed a classic CNN architecture and showed significant improvements over previous methods on the image classification task. The overall architecture of their method, i.e., AlexNet [8], is similar to LeNet-5 but with a deeper structure. With the success of AlexNet, many works have been proposed to improve its performance. Among them, four representative works are ZFNet [11], VGGNet [9], GoogLeNet [10] and ResNet [12]. From the evolution of the architectures, a typical trend is that the networks are getting deeper, e.g., ResNet, which won the ILSVRC 2015 championship, is about 20 times deeper than AlexNet and 8 times deeper than VGGNet. By increasing depth, the network can better approximate the target function with increased nonlinearity and obtain better feature representations. However, increased depth also increases the complexity of the network, which makes the network more difficult to optimize and more prone to overfitting. Accordingly, various methods have been proposed to deal with these problems from various aspects. In this paper, we try to give a comprehensive review of recent advances and provide some thorough discussions.
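
The optimization difficulty of very deep networks mentioned above is exactly what ResNet's identity shortcuts address: each block learns a residual F(x) and outputs F(x) + x, so gradients can flow through the skip connection even when many layers are stacked. Below is a minimal sketch of such a residual block in PyTorch; the channel sizes are illustrative, not a full ResNet.

```python
# A minimal residual block: the stacked layers learn a residual F(x),
# and the block outputs F(x) + x via an identity shortcut.
import torch
import torch.nn as nn
import torch.nn.functional as F

class ResidualBlock(nn.Module):
    def __init__(self, channels: int):
        super().__init__()
        self.conv1 = nn.Conv2d(channels, channels, 3, padding=1, bias=False)
        self.bn1 = nn.BatchNorm2d(channels)
        self.conv2 = nn.Conv2d(channels, channels, 3, padding=1, bias=False)
        self.bn2 = nn.BatchNorm2d(channels)

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        out = F.relu(self.bn1(self.conv1(x)))
        out = self.bn2(self.conv2(out))
        return F.relu(out + x)  # identity shortcut: output = F(x) + x

x = torch.randn(1, 64, 56, 56)
print(ResidualBlock(64)(x).shape)  # torch.Size([1, 64, 56, 56])
```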

Sparse convolution

Recently, several attempts have been made to sparsify the weights of convolutional layers [165,166]. Liu et al. [165] consider sparse representations of the basis filters, and achieve 90% sparsity by exploiting both inter-channel and intra-channel redundancy of convolutional kernels. Instead of sparsifying the weights of convolution layers, Wen et al. [166] propose a Structured Sparsity Learning (SSL) approach to simultaneously optimize their hyperparameters (filter size, depth, and local connectivity). Bagherinezhad et al. [167] propose a lookup-based convolutional neural network (LCNN) that encodes convolutions by a few lookups into a rich dictionary that is trained to cover the space of weights in CNNs. They decode the weights of the convolutional layer with a dictionary and two tensors. The dictionary is shared among all weight filters in a layer, which allows a CNN to learn from very few training examples. LCNN can achieve higher accuracy in a small number of iterations compared to standard CNNs.
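
As a rough illustration of the LCNN decoding step described above, the sketch below rebuilds one convolutional weight filter from a shared dictionary plus an index tensor and a coefficient tensor, using a few lookups per spatial position. The tensor names (D, I, C), their shapes, and the number of lookups s are illustrative assumptions, not the paper's exact notation.

```python
# Hedged sketch of LCNN-style weight decoding [167]: each spatial
# position of a filter is a sparse linear combination of s rows from
# a dictionary D shared by the whole layer. Shapes are illustrative.
import numpy as np

k, m = 100, 64          # dictionary size, input channels
kh, kw, s = 3, 3, 2     # kernel height/width, lookups per position

rng = np.random.default_rng(0)
D = rng.standard_normal((k, m))            # shared dictionary
I = rng.integers(0, k, size=(s, kh, kw))   # lookup indices (which rows of D)
C = rng.standard_normal((s, kh, kw))       # combination coefficients

# Decode one m x kh x kw weight filter: a few lookups plus a weighted sum.
W = np.zeros((m, kh, kw))
for r in range(kh):
    for c in range(kw):
        W[:, r, c] = C[:, r, c] @ D[I[:, r, c]]   # s lookups into D

print(W.shape)  # (64, 3, 3)
```

Because D is shared across all filters in the layer, most of the learning concentrates on the small tensors I and C, which is consistent with the claim above that LCNN can learn from very few training examples and reach higher accuracy in fewer iterations.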

Applications of CNNs

In this section, we introduce some recent works that apply CNNs to achieve state-of-the-art performance, including image classification, object tracking, pose estimation, text detection, visual saliency detection, action recognition, scene labeling, speech and natural language processing.

Image classification

CNNs have been applied to image classification for a long time [168–171]. Compared with other methods, CNNs can achieve better classification accuracy on large-scale datasets [8,9,172] due to their capability of joint feature and classifier learning. The breakthrough in large-scale image classification came in 2012, when Krizhevsky et al. [8] developed AlexNet and achieved the best performance in ILSVRC 2012. After the success of AlexNet, several works have made significant improvements.
