Real-time software electric guitar audio transcription

Show full item record

Redirect: RIT Scholars content from RIT Digital Media Library has moved from to RIT Scholar Works, please update your feeds & links!
Title: Real-time software electric guitar audio transcription
Author: Fiss, Xander
Abstract: Guitar audio transcription is the process of generating a human-interpretable musical score from guitar audio. The musical score is presented as guitar tablature, which indicates not only what notes are played, but where they are played on the guitar fretboard. Automatic transcription remains a challenge when dealing with polyphonic sounds. The guitar adds further ambiguity to the transcription problem because the same note can often be played in many ways. In this thesis work, a portable software architecture is presented for processing guitar audio in real time and providing a set of highly probable transcription solutions. Novel algorithms for performing polyphonic pitch detection and generating confidence values for transcription solutions (by which they are ranked) are also presented. Transcription solutions are generated for individual signal windows based on the output of the polyphonic pitch detection algorithm. Confidence values are generated for solutions by analyzing signal properties, fingering difficulty, and proximity to previous highest confidence solutions. The rules used for generating confidence values are based on expert knowledge of the instrument. Performance is measured in terms of algorithm accuracy, latency, and throughput. The correct result is ranked 2.08 (with the top rank being 0) for chords. The general case of various notes over time presents results that require qualitative analysis; the system in general is very susceptible to noise and has a difficult time distinguishing harmonics from actual fundamentals. By allowing the user to seed the system with a ground truth, correct recognition of future states is improved significantly in some cases. The sampling time is 250 ms with an average processing time of 110 ms, giving an average total latency of 360 ms. Throughput is 62.5 sample windows per second. Performance is not processor-bound, enabling high performance on a wide variety of personal computers.
Record URI:
Date: 2011-05

Files in this item

Files Size Format View
XFissThesis5-2011.pdf 3.555Mb PDF View/Open

The following license files are associated with this item:

This item appears in the following Collection(s)

Show full item record

Search RIT DML

Advanced Search