I'd like to be able to compare MP3s programmatically. The problem is that I don't know what to compare them by. Headers? Histograms? Channels? Does anyone have experience with this subject?
-
I guess you mean "compare", do you? – schnaader Feb 15 '09 at 14:29
-
The programs used for comparing audio referenced in the answers below seem abandoned years ago. Is there any still maintained? I also face this problem and I need a program to compare two mp3s and get a report with differences. – Adrian Ber Nov 08 '15 at 16:06
-
@AdrianBer, Maybe https://acoustid.org/chromaprint? See my answer. – Alexis Wilke May 28 '19 at 22:49
9 Answers
I wrote my master's thesis on audio fingerprinting. The thesis lists a few open source solutions to the problem of comparing what the music sounds like, and provides performance comparisons between them. Might be overkill, but there are some really decent applications out there.
If you only want to compare by tagged data, the standard to look into is ID3. There are basically two versions: ID3v1 is very simple and consists of a 128-byte block at the end of an MP3, while ID3v2 puts a larger, variable-sized block at the beginning of the MP3.
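As a rough illustration of the tag-based route, here is a minimal Python sketch that reads the 128-byte ID3v1 block from the end of a file's bytes and compares two files by it. The function names `parse_id3v1` and `same_tags` are made up for this example, and it deliberately ignores ID3v2 and the ID3v1.1 track-number byte.

```python
from typing import Optional


def parse_id3v1(data: bytes) -> Optional[dict]:
    """Return the ID3v1 fields found in the last 128 bytes, or None if absent."""
    if len(data) < 128:
        return None
    tag = data[-128:]
    if tag[:3] != b"TAG":  # an ID3v1 block always starts with the marker "TAG"
        return None

    def text(raw: bytes) -> str:
        # Fields are fixed-width and padded with NUL bytes or spaces.
        return raw.split(b"\x00", 1)[0].decode("latin-1").strip()

    return {
        "title": text(tag[3:33]),
        "artist": text(tag[33:63]),
        "album": text(tag[63:93]),
        "year": text(tag[93:97]),
        "comment": text(tag[97:127]),
        "genre": tag[127],  # numeric genre index
    }


def same_tags(a: bytes, b: bytes) -> bool:
    """Compare two files by their ID3v1 tags; files without a tag never match."""
    tags_a, tags_b = parse_id3v1(a), parse_id3v1(b)
    return tags_a is not None and tags_a == tags_b
```

Typical use would be something like `same_tags(open(p1, "rb").read(), open(p2, "rb").read())`. Keep in mind that matching tags only tell you the files are labelled the same, not that they sound the same.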

-
The link to Vegard Larsen's thesis is broken now, but I could find it here: http://daim.idi.ntnu.no/masteroppgaver/IME/IDI/2008/4014/masteroppgave.pdf – mivk Nov 30 '11 at 22:04
-
@VegardLarsen the URL linking to your masters thesis is broken, could you update it please? – Hooray Im Helping Oct 17 '14 at 01:22
-
@HoorayImHelping URL in post updated. Someone changed the URL structure: http://daim.idi.ntnu.no/masteroppgaver/004/4014/masteroppgave.pdf – Vegard Larsen Oct 17 '14 at 06:44
I like to be able to compare mp3’s programmatically
I had the same question. I found that iTunes had altered many of my Amazon MP3 downloads, changing the time/date stamps, the file sizes and therefore the MD5 signatures. My backups suddenly had many near-duplicate files.
When I did a Vim diff, I could see that the changes were limited to very small parts of the files. The files looked identical side by side in Audacity, even at a close zoom.
My solution is to decode each MP3 to a WAV file and then compare the MD5 signatures of the WAVs. FFmpeg can do the translation quite easily.
ffmpeg -y -i "$mp3" "$mp3.wav"
md5sum "$mp3.wav"
I created a hash table with the MD5 as key, pointing to the original MP3 file path. Putting the WAV files on an SSD speeds things up.
Brute force, but it works.
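For anyone who wants to script this, here is a minimal Python sketch of the same idea: decode each file, take the MD5 of the decoded audio, and use it as the key of a dictionary pointing back to the original paths. The function names are my own, it assumes `ffmpeg` is on the PATH, and it pipes raw PCM to stdout instead of writing a WAV to disk, so no container metadata ends up in the hash.

```python
import hashlib
import subprocess
from collections import defaultdict
from typing import Callable, Dict, Iterable, List


def decode_with_ffmpeg(path: str) -> bytes:
    """Decode the first audio stream to raw 16-bit PCM (assumes ffmpeg on PATH)."""
    cmd = ["ffmpeg", "-v", "error", "-i", path, "-map", "0:a:0",
           "-f", "s16le", "-acodec", "pcm_s16le", "-"]
    return subprocess.run(cmd, check=True, stdout=subprocess.PIPE).stdout


def group_by_audio_md5(
    paths: Iterable[str],
    decode: Callable[[str], bytes] = decode_with_ffmpeg,
) -> Dict[str, List[str]]:
    """Map the MD5 of the decoded audio to every path that produced it."""
    groups: Dict[str, List[str]] = defaultdict(list)
    for path in paths:
        digest = hashlib.md5(decode(path)).hexdigest()
        groups[digest].append(path)
    return dict(groups)
```

Any digest that maps to more than one path is a set of files with identical decoded audio. The sketch holds a whole decoded track in memory at once, which is fine for songs but would need streaming for very long recordings.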

-
The created wav may still have some metadata. Use this to get rid of it: `avconv -i wav.wav -map 0 -map_metadata 0:s:0 -c copy nometa.wav` – Tamas Apr 22 '14 at 18:45
I guess there are a number of approaches you could take to this:
1. Compare tags
You could compare the data held in the MP3's tags. The tags are held in the ID3 format. There are a number of libraries to help you access the tags; TagLib is a popular choice (TagLib Sharp for .NET apps).
2. Acoustic fingerprint
This is by far the most robust method, allowing you to find matches regardless of the compression or even the format. A fingerprint is created from the actual audio in the file, allowing the song to be identified. Echoprint is an open-source example of this.
3. Creating a hash from the file
This is a quicker method, allowing you to find files whose content matches exactly.

What do you mean by comparing? The metadata (author, title, etc.) or the audio data? For what purpose?
One popular and basic way to compare audio data is to compute some kind of distance on some spectral features, such as MFCC:
http://en.wikipedia.org/wiki/Mel_frequency_cepstral_coefficient
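To make "some kind of distance" a bit more concrete, here is a deliberately crude Python sketch. It assumes you already have per-frame feature vectors (for example MFCCs) for each track from whatever feature extractor you use; the extraction itself is not shown. It averages the frames of each track and takes the Euclidean distance between the two mean vectors. The function names are illustrative only, and real systems usually use something more refined than a mean vector.

```python
import math
from typing import List, Sequence


def mean_vector(frames: Sequence[Sequence[float]]) -> List[float]:
    """Average a list of equal-length feature frames into one vector."""
    if not frames:
        raise ValueError("need at least one frame")
    count = len(frames)
    return [sum(column) / count for column in zip(*frames)]


def feature_distance(
    a: Sequence[Sequence[float]], b: Sequence[Sequence[float]]
) -> float:
    """Euclidean distance between the mean feature vectors of two tracks."""
    mean_a, mean_b = mean_vector(a), mean_vector(b)
    if len(mean_a) != len(mean_b):
        raise ValueError("feature vectors must have the same dimension")
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(mean_a, mean_b)))
```

A smaller distance means the two tracks have a more similar average spectral shape. Where to put the threshold for "same song" is something you would have to tune on your own data.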

To answer your question better I think we need to know exactly what you are looking to do.
If you are looking to compare the actual song, MusicDNS has a library that can create audio fingerprints. The library is called libOFA. This fingerprinting system is used by, for example, MusicBrainz to match digital audio files to their database. In theory you can use it to compare two different digital files.
If you are looking to compare tag data (ID3v1/ID3v2), there are a lot of libraries that can do that for you. TagLib has already been mentioned, and libmpg123 also has its own functions to extract tag data.
The good thing about the libOFA approach is that you can compare different formats to each other since the fingerprinting is done on the audio itself.

It looks like Chromaprint would do what you're looking for. It transforms PCM data into audio fingerprints, which you can then compare.
They have a C API library (it's actually written in C++, though), a Python front end, and also some utilities to convert the results to JSON, which means you could use another language to manipulate the data. I don't think that they provide the compare function itself, though.
Also if you're using a Linux system, it's likely that you will find a package for it.
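Since the compare step is left to you, here is a minimal Python sketch of one common way to do it, assuming you have the raw fingerprints as sequences of 32-bit integers (the fpcalc utility can print them in a raw mode). It counts differing bits between aligned values and optionally tries a few offsets in case one file has extra lead-in. The function names and the idea of picking the best offset are my own assumptions, not part of Chromaprint's API.

```python
from typing import Sequence


def bit_error_rate(a: Sequence[int], b: Sequence[int]) -> float:
    """Fraction of differing bits over the overlapping part of two fingerprints."""
    overlap = min(len(a), len(b))
    if overlap == 0:
        raise ValueError("fingerprints must not be empty")
    # XOR leaves a 1 wherever the two values disagree; mask to 32 bits
    # so that signed inputs are handled as well.
    errors = sum(bin((x ^ y) & 0xFFFFFFFF).count("1") for x, y in zip(a, b))
    return errors / (32 * overlap)


def fingerprint_similarity(
    a: Sequence[int], b: Sequence[int], max_offset: int = 0
) -> float:
    """Best similarity in [0, 1] over shifts of up to max_offset items."""
    best = 0.0
    for offset in range(-max_offset, max_offset + 1):
        x, y = (a[offset:], b) if offset >= 0 else (a, b[-offset:])
        if x and y:
            best = max(best, 1.0 - bit_error_rate(x, y))
    return best
```

A similarity close to 1.0 suggests the same recording. The cut-off you choose is a tuning decision, and large offsets with very short overlaps can give misleadingly high scores.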

If you're just looking to compare mp3s based on the tags, I'd recommend taglib.

I wrote a PHP program that compares just the audio, ignoring all the headers, graphics, and other info.
Basically, for each $src in a file list, it runs:
/usr/bin/ffmpeg -hide_banner -y -i "$src" -f s16le -acodec pcm_s16le output.raw 2> /dev/null
You can then take the MD5 of the output.raw file (you have to record it) and compare it with those of the other raw files.
The converted file is raw audio output and is not used except for creating the hash. The only problems I foresee with my script are keeping a file of lower quality after conversion/hashing, or keeping a file with fewer ID3 tags... although I move files rather than delete them, so I still have the old files.
I frequently use fdupes on Linux to locate duplicate files. fdupes uses md5 checksums.
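If fdupes is not available, the exact-duplicate check is easy to approximate. The Python sketch below groups files by size first and only hashes files that share a size, reading them in chunks. It is only an illustration of the general idea with made-up function names, not a description of how fdupes itself is implemented.

```python
import hashlib
import os
from collections import defaultdict
from typing import Dict, Iterable, List, Tuple


def file_md5(path: str, chunk_size: int = 1 << 20) -> str:
    """MD5 of a file's bytes, read in chunks so large files are fine."""
    digest = hashlib.md5()
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def find_exact_duplicates(paths: Iterable[str]) -> List[List[str]]:
    """Return groups of paths whose contents are byte-for-byte identical."""
    by_size: Dict[int, List[str]] = defaultdict(list)
    for path in paths:
        by_size[os.path.getsize(path)].append(path)

    by_hash: Dict[Tuple[int, str], List[str]] = defaultdict(list)
    for size, same_size in by_size.items():
        if len(same_size) < 2:
            continue  # a unique size cannot have a duplicate
        for path in same_size:
            by_hash[(size, file_md5(path))].append(path)

    return [group for group in by_hash.values() if len(group) > 1]
```

This only catches files that are identical on disk. Two MP3s with the same audio but different tags will not match here, which is why the raw-audio hash above is needed for that case.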
