This is an overview page with metadata for this scientific paper. The full article is available from the publisher.
CNN architectures for large-scale audio classification
Citations: 2,436
Authors: 13
Year: 2017
Abstract
Convolutional Neural Networks (CNNs) have proven very effective in image classification and show promise for audio. We use various CNN architectures to classify the soundtracks of a dataset of 70M training videos (5.24 million hours) with 30,871 video-level labels. We examine fully connected Deep Neural Networks (DNNs), AlexNet [1], VGG [2], Inception [3], and ResNet [4]. We investigate varying the size of both training set and label vocabulary, finding that analogs of the CNNs used in image classification do well on our audio classification task, and larger training and label sets help up to a point. A model using embeddings from these classifiers does much better than raw features on the Audio Set [5] Acoustic Event Detection (AED) classification task.
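To put the dataset figures quoted in the abstract in perspective, the short sketch below derives the average soundtrack length per video and a rough videos-per-label ratio. It uses only the three numbers stated above; the variable and function names are illustrative, and the per-label ratio assumes exactly one label per video, which is a simplification because a video can carry several labels.

```python
# Figures taken directly from the abstract.
NUM_VIDEOS = 70_000_000   # training videos
TOTAL_HOURS = 5_240_000   # total soundtrack duration in hours
NUM_LABELS = 30_871       # size of the video-level label vocabulary


def average_minutes_per_video(total_hours: float, num_videos: int) -> float:
    """Return the mean soundtrack length in minutes."""
    return total_hours * 60 / num_videos


avg_minutes = average_minutes_per_video(TOTAL_HOURS, NUM_VIDEOS)

# Simplifying assumption (not from the source): one label per video.
# With multi-label videos the true number of examples per label is higher.
videos_per_label = NUM_VIDEOS / NUM_LABELS

print(f"Average soundtrack length: {avg_minutes:.2f} minutes")
print(f"Videos per label (one-label assumption): {videos_per_label:.0f}")
```

The average works out to roughly four and a half minutes of audio per video, and each label has on the order of a couple of thousand videos under the one-label assumption.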
Related works
Empirical Evaluation of Gated Recurrent Neural Networks on Sequence Modeling
2014 · 10,779 citations
Deep Neural Networks for Acoustic Modeling in Speech Recognition: The Shared Views of Four Research Groups
2012 · 10,272 citations
Speech recognition with deep recurrent neural networks
2013 · 8,807 citations
LSTM: A Search Space Odyssey
2016 · 6,746 citations
Librispeech: An ASR corpus based on public domain audio books
2015 · 5,949 citations