This is an overview page with metadata about this research paper. The full article is available from the publisher.
Audio Set: An ontology and human-labeled dataset for audio events
2017 · 8 authors · 2,948 citations
Abstract
Audio event recognition, the human-like ability to identify and relate sounds from audio, is a nascent problem in machine perception. Comparable problems such as object detection in images have reaped enormous benefits from comprehensive datasets, principally ImageNet. This paper describes the creation of Audio Set, a large-scale dataset of manually annotated audio events that endeavors to bridge the gap in data availability between image and audio research. Using a carefully structured hierarchical ontology of 632 audio classes guided by the literature and manual curation, we collect data from human labelers to probe the presence of specific audio classes in 10-second segments of YouTube videos. Segments are proposed for labeling using searches based on metadata, context (e.g., links), and content analysis. The result is a dataset of unprecedented breadth and size that will, we hope, substantially stimulate the development of high-performance audio event recognizers.
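The following is a minimal sketch of how a hierarchical class ontology like the one described in the abstract might be navigated in code. It is not the authors' implementation. The `AudioClass` type, the `build_parent_map`, `ancestors`, and `expand_labels` helpers, and the five toy classes and their IDs are all hypothetical. The sketch assumes only that each class lists its direct children, and it allows a class to have more than one parent in case the hierarchy is not a strict tree. Expanding a segment's labels to include their ancestor classes is one common way to exploit such a hierarchy. It is shown here for illustration, not as the paper's procedure.

```python
from dataclasses import dataclass, field


@dataclass
class AudioClass:
    """One node of a toy hierarchical audio ontology (hypothetical fields)."""
    class_id: str
    name: str
    child_ids: list[str] = field(default_factory=list)


def build_parent_map(ontology: dict[str, AudioClass]) -> dict[str, list[str]]:
    """Invert child links so each class maps to its direct parents.

    A list is used so that a class may sit under more than one parent.
    """
    parents: dict[str, list[str]] = {cid: [] for cid in ontology}
    for node in ontology.values():
        for child in node.child_ids:
            parents[child].append(node.class_id)
    return parents


def ancestors(class_id: str, parents: dict[str, list[str]]) -> set[str]:
    """Return every class above class_id in the hierarchy (excluding itself)."""
    seen: set[str] = set()
    stack = list(parents[class_id])
    while stack:
        cid = stack.pop()
        if cid not in seen:
            seen.add(cid)
            stack.extend(parents[cid])
    return seen


def expand_labels(labels: set[str], parents: dict[str, list[str]]) -> set[str]:
    """Add every ancestor class to a segment's set of positive labels."""
    expanded = set(labels)
    for label in labels:
        expanded |= ancestors(label, parents)
    return expanded


# Hypothetical toy subset: the IDs and structure are illustrative only and
# are not taken from the released Audio Set ontology.
ONTOLOGY = {
    "animal": AudioClass("animal", "Animal", ["dog"]),
    "dog": AudioClass("dog", "Dog", ["bark"]),
    "bark": AudioClass("bark", "Bark"),
    "music": AudioClass("music", "Music", ["guitar"]),
    "guitar": AudioClass("guitar", "Guitar"),
}

PARENTS = build_parent_map(ONTOLOGY)
print(sorted(ancestors("bark", PARENTS)))  # ['animal', 'dog']
```

Storing only child links keeps a hand-curated ontology easy to write, and inverting them once makes upward traversal cheap. For example, `expand_labels({"bark"}, PARENTS)` yields `{"bark", "dog", "animal"}`, so a segment labeled with a fine-grained class also counts as a positive for its broader classes.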
Similar works
Empirical Evaluation of Gated Recurrent Neural Networks on Sequence Modeling
2014 · 10,779 citations
Deep Neural Networks for Acoustic Modeling in Speech Recognition: The Shared Views of Four Research Groups
2012 · 10,272 citations
Speech recognition with deep recurrent neural networks
2013 · 8,807 citations
LSTM: A Search Space Odyssey
2016 · 6,746 citations
Librispeech: An ASR corpus based on public domain audio books
2015 · 5,949 citations