Music and Audio Retrieval Tools
Musical Pitch Contours

A musical pitch contour describes a series of relative pitch transitions; it is an abstraction of a sequence of notes. Each note in a piece of music, after the first, is classified in one of three ways: it is a repetition of the previous note (R), higher than the previous note (U), or lower than the previous note (D). The piece can thus be converted into a string over a three-letter alphabet (U, D, R). For example, the introductory theme to Beethoven's 5th Symphony would be converted into the sequence R R D U R R D. Notice that there is one fewer symbol than there are notes, as only the transitions between notes are recorded.

In a search-by-humming system, the use of contours eliminates input errors caused by the user singing out of key, out of time or out of tune. As long as the pitch direction is correct, the contour should be found. The drawback is that all rhythmic information is lost; if rhythm could be used in conjunction with the pitch contour, the number of incorrect matches would be reduced.

The contours we use to demonstrate the system are:
An nth-order pitch contour compares the current note with the nth preceding note. So, a first-order contour compares the current note with the note last played, and a second-order contour compares the current note with the note before last. Higher-order contours are used to reduce the amount of information lost when converting to a first-order contour, whilst retaining much of the abstraction that the representation provides.

An Animated Example

The animated figure below shows the creation of a first-order contour for the start of "Happy Birthday". A MIDI file of the transcription is available. The pitch of the current note (in blue) is compared with that of the previous note to identify the transition class (Up, Down or Repeated).

Creation of the second-order contour of the same music is shown below. The current note (in blue) is compared with the second preceding note (outlined with a red box).

Note that higher-order contours are not intuitive to most users, so it is unlikely that a user would enter them directly; instead they would be deduced automatically from, for example, a transcription of the user singing.
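As a rough illustration of the nth-order comparison described above, the following Python sketch converts a list of pitches into a contour string. It is not the system's own implementation: the function name `pitch_contour`, the use of MIDI note numbers as the pitch representation, and the particular octave chosen for the Beethoven theme are all assumptions made for the example.

```python
def pitch_contour(pitches, order=1):
    """Return the nth-order contour of a pitch sequence as a string of U, D and R.

    `pitches` is assumed to be a list of comparable pitch values
    (here MIDI note numbers); `order` is the n in "nth-order".
    """
    if order < 1:
        raise ValueError("order must be at least 1")
    symbols = []
    # The first `order` notes have no note to be compared with,
    # so the contour has `order` fewer symbols than there are notes.
    for i in range(order, len(pitches)):
        current, reference = pitches[i], pitches[i - order]
        if current > reference:
            symbols.append("U")
        elif current < reference:
            symbols.append("D")
        else:
            symbols.append("R")
    return "".join(symbols)


# Opening of Beethoven's 5th Symphony (G G G Eb, F F F D),
# written as MIDI note numbers in an assumed octave.
beethoven = [67, 67, 67, 63, 65, 65, 65, 62]

print(pitch_contour(beethoven))     # RRDURRD
print(pitch_contour(beethoven, 2))  # RDDURD
```

The first call reproduces the first-order sequence given earlier for this theme; the second compares each note with the note before last, so it yields six symbols for the same eight notes.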