An auditory classifier employing a wavelet neural network implemented in a digital design

Show full item record

Title: An auditory classifier employing a wavelet neural network implemented in a digital design
Author: Hughes, Jonathan
Abstract: This thesis addresses the problem of classifying audio as either voice or music. The goal was to solve this problem by means of digital logic circuit, capable of performing the classification in real time. Since digital audio is essentially a discrete non-periodic timeseries, it was necessary to extract features from the audio which are suitable for classification. The discrete wavelet transform combined with a feature extraction method was found to produce such features. The task of classifying these features was found to be best performed by an artificial neural network. Collectively known as a wavelet neural network, the digital logic design implementation of this architecture was effective in correctly identifying the test data sets. The wavelet neural network was first implemented as a software model, to develop the network architecture and parameters, and to determine ideal results. The unconstrained software simulation was capable of correctly classifying test data sets with greater than 90% accuracy. This model was not feasible as a digital logic design however, as the size of the implementation would have been prohibitive. The size of the resulting hardware model was constrained by reducing the widths of the data paths and storage registers. The hardware implementation of the wavelet processor consisted of a novel pipelined design with a novel data-flow control structure. The neural network training was performed entirely in software by way of a novel training algorithm, and the resulting weights were made to be available to be uploaded to the hardware model. The digital design of the wavelet neural network was modeled in VHDL and was synthesized with Synplicity Synplify, using Actel ProASICPlus APA600 synthesized library cells with a target clock frequency of 11.025 KHz, to match the sampling rate of the digital audio. The results of the synthesis indicated that the design could operate at 15.6 MHz, and required 96,265 logic cells. The resulting constrained wavelet neural network processor was capable of correctly classifying test data sets with greater than 70% accuracy. Additional modeling showed that with a reasonable increase in hardware size, greater than 86% accuracy is attainable. This thesis focused on classifying audio as either voice or music, and future research could readily extend this work to the problem of speaker recognition and multimedia indexing.
Record URI: http://hdl.handle.net/1850/2629
Date: 2006-10-03

Files in this item

Files Size Format View
JHughesThesis08-2006.pdf 430.2Kb PDF View/Open

The following license files are associated with this item:

This item appears in the following Collection(s)

Show full item record

Search RIT DML


Advanced Search

Browse