ImageNet Classification with Deep Convolutional
Neural Networks
Alex Krizhevsky |
Ilya Sutskever |
Geoffrey E. Hinton |
University of Toronto |
University of Toronto |
University of Toronto |
kriz@cs.utoronto.ca |
ilya@cs.utoronto.ca |
hinton@cs.utoronto.ca |
Abstract
We trained a large, deep convolutional neural network to classify the 1.2 million high-resolution images in the ImageNet LSVRC-2010 contest into the 1000 dif-ferent classes. On the test data, we achieved top-1 and top-5 error rates of 37.5% and 17.0% which is considerably better than the previous state-of-the-art. The neural network, which has 60 million parameters and 650,000 neurons, consists of five convolutional layers, some of which are followed by max-pooling layers, and three fully-connected layers with a final 1000-way softmax. To make train-ing faster, we used non-saturating neurons and a very efficient GPU implemen-tation of the convolution operation. To reduce overfitting in the fully-connected layers we employed a recently-developed regularization method called “dropout” that proved to be very effective. We also entered a variant of this model in the ILSVRC-2012 competition and achieved a winning top-5 test error rate of 15.3%, compared to 26.2% achieved by the second-best entry.
1 Introduction
Current approaches to object recognition make essential use of machine learning methods. To im-prove their performance, we can collect larger datasets, learn more powerful models, and use bet-ter techniques for preventing overfitting. Until recently, datasets of labeled images were relatively small — on the order of tens of thousands of images (e.g., NORB [16], Caltech-101/256 [8, 9], and CIFAR-10/100 [12]). Simple recognition tasks can be solved quite well with datasets of this size, especially if they are augmented with label-preserving transformations. For example, the current-best error rate on the MNIST digit-recognition task (lt;0.3%) approaches human performance [4]. But objects in realistic settings exhibit considerable variability, so to learn to recognize them it is necessary to use much larger training sets. And indeed, the shortcomings of small image datasets have been widely recognized (e.g., Pinto et al. [21]), but it has only recently become possible to col-lect labeled datasets with millions of images. The new larger datasets include LabelMe [23], which consists of hundreds of thousands of fully-segmented images, and ImageNet [6], which consists of over 15 million labeled high-resolution images in over 22,000 categories.
To learn about thousands of objects from millions of images, we need a model with a large learning capacity. However, the immense complexity of the object recognition task means that this prob-lem cannot be specified even by a dataset as large as ImageNet, so our model should also have lots of prior knowledge to compensate for all the data we donrsquo;t have. Convolutional neural networks (CNNs) constitute one such class of models [16, 11, 13, 18, 15, 22, 26]. Their capacity can be con-trolled by varying their depth and breadth, and they also make strong and mostly correct assumptions about the nature of images (namely, stationarity of statistics and locality of pixel dependencies). Thus, compared to standard feedforward neural networks with similarly-sized layers, CNNs have much fewer connections and parameters and so they are easier to train, while their theoretically-best performance is likely to be only slightly worse.
1
Despite the attractive qualities of CNNs, and despite the relative efficiency of their local architecture, they have still been prohibitively expensive to apply in large scale to high-resolution images. Luck-ily, current GPUs, paired with a highly-optimized implementation of 2D convolution, are powerful enough to facilitate the training of interestingly-large CNNs, and recent datasets such as ImageNet contain enough labeled examples to train such models without severe overfitting.
The specific contributions of this paper are as follows: we trained one of the largest convolutional neural networks to date on the subsets of ImageNet used in the ILSVRC-2010 and ILSVRC-2012 competitions [2] and achieved by far the best results ever reported on these datasets. We wrote a highly-optimized GPU implementation of 2D convolution and all the other operations inherent in training convolutional neural networks, which we make available publicly1. Our network contains a number of new and unusual features which improve its performance and reduce its training time, which are detailed in Section 3. The size of our network made overfitting a significant problem, even with 1.2 million labeled training examples, so we used several effective techniques for preventing overfitting, which are described in Section 4. Our final network contains five convolutional and three fully-connected layers, and this depth seems to be important: we found that removing any convolutional layer (each of which contains no more than 1% of the modelrsquo;s parameters) resulted in inferior performance.
In the end, the networkrsquo;s size is limited mainly by the amount of memory available on current GPUs and by the amount of training time that we are willing to tolerate. Our network takes between five and six days to train on two GTX 580 3GB GPUs. All of our experiments suggest that our results can be improved simply by waiting for faster GPUs and bigger datasets to become available.
2 The Dataset
ImageNet is a dataset of over 15 million labeled high-resolution images belonging to roughly 22,000 categories. The images were collected from the web and labeled by human labelers using Ama-zonrsquo;s Mechanical Turk crowd-sourcing tool. Starting in 2010, as part
全文共35569字,剩余内容已隐藏,支付完成后下载完整资料
英语原文共 6 页,剩余内容已隐藏,支付完成后下载完整资料
资料编号:[2176]
以上是毕业论文外文翻译,课题毕业论文、任务书、文献综述、开题报告、程序设计、图纸设计等资料可联系客服协助查找。
您可能感兴趣的文章
- 316L不锈钢激光部直接激光制造精密大型金属零件的形状和性能控制外文翻译资料
- 基于等离子弧焊的送丝电弧增材制造中 传热与流体数值分析外文翻译资料
- 焊接增材制造中用作二次热源的移动感应加热的建模外文翻译资料
- 齿轮参数优化综述外文翻译资料
- 铜基复合材料选择性激光熔融过程中的热行为和致密化机理:模拟 和实验外文翻译资料
- 基于ANSYS的激光选择性熔化三维热模拟的比较外文翻译资料
- 在SQL Server数据库中基于索引查询外文翻译资料
- 自适应Kurtogram及其在滚动轴承故障诊断中的应用外文翻译资料
- 集装箱起重机动力学:三维建模、全面实验和识别外文翻译资料
- 使用C空间法的五轴刀具定向优化外文翻译资料