However, I noticed that while there are two different audio tracks (original and dubbed), the commercials are always in a single language. So both audio tracks play the same audio during commercials!
I made a quick-and-dirty test. First, extract both tracks with ffmpeg:
ffmpeg -i 00001.ts -map 0:1 lang1.wav -map 0:2 lang2.wav
Then I wrote a small C program that simply reads these files and compares them in 1152-sample blocks (the MP2 frame size). The blocks are not exact matches, but they are very close, and they always line up on exact 1152-sample boundaries, so they are quick to match. This finds the commercials really well.
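A minimal sketch of the block comparison described above, in C like the original program. It assumes both tracks have already been decoded to mono signed 16-bit PCM and loaded into memory (WAV header parsing is omitted), and that both tracks start at the same sample. A mean-absolute-difference test is just one possible way to express "very close". The names `compare_tracks`, `blocks_match`, `block_start_seconds` and `SAME_AUDIO_THRESHOLD` are hypothetical, and the threshold value is a guess that would need tuning on real recordings.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <stdlib.h>

/* One MP2 frame carries 1152 samples per channel. */
#define MP2_FRAME_SAMPLES 1152

/* Hypothetical tolerance: mean absolute difference per sample at or below
   which two blocks are treated as the same audio. Tune on real data. */
#define SAME_AUDIO_THRESHOLD 64

/* Returns 1 if two blocks of n 16-bit samples are nearly identical. */
static int blocks_match(const int16_t *a, const int16_t *b, size_t n, long threshold)
{
    long long total = 0;
    for (size_t i = 0; i < n; i++)
        total += llabs((long long)a[i] - (long long)b[i]);
    return total <= (long long)threshold * (long long)n;
}

/* Compares two tracks block by block and writes one flag per block into
   out (1 = both tracks carry the same audio, i.e. a commercial candidate).
   Only whole blocks present in both tracks are compared.
   Returns the number of flags written. */
static size_t compare_tracks(const int16_t *t1, size_t len1,
                             const int16_t *t2, size_t len2,
                             unsigned char *out)
{
    assert(t1 != NULL && t2 != NULL && out != NULL);
    size_t len = len1 < len2 ? len1 : len2;
    size_t nblocks = len / MP2_FRAME_SAMPLES;
    for (size_t k = 0; k < nblocks; k++) {
        size_t off = k * MP2_FRAME_SAMPLES;
        out[k] = (unsigned char)blocks_match(t1 + off, t2 + off,
                                             MP2_FRAME_SAMPLES,
                                             SAME_AUDIO_THRESHOLD);
    }
    return nblocks;
}

/* Offset of block k from the start of the track, in seconds. This is only
   correct if no samples were dropped before block k. */
static double block_start_seconds(size_t k, unsigned sample_rate)
{
    return (double)k * MP2_FRAME_SAMPLES / sample_rate;
}
```

A run of consecutive blocks flagged 1 marks a commercial candidate. At an assumed 48 kHz sample rate each block spans 24 ms, so block index times 0.024 gives its offset in the track.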
But the problem is that if there are any errors in the original file, the generated .wav files can be much shorter than the full video, because the damaged audio is dropped instead of padded. Every later block is then shifted relative to the video, so the cutpoints are wrong. So I can't really use my own program; it isn't reliable because of this. So I'm asking: is it possible to add this to Comskip?
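A small sketch of how the failure mode above could at least be detected: compare the length of each decoded track with the video duration before trusting the block-to-time mapping. It assumes the video duration is known from some other source and that tracks are mono, so one sample is one sample-time. `missing_audio_seconds`, `cutpoints_trustworthy` and `MAX_DRIFT_SECONDS` are hypothetical names, and the 0.5 s tolerance is an arbitrary guess.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical tolerance: if more audio than this is missing, block
   indices can no longer be mapped reliably to video timestamps. */
#define MAX_DRIFT_SECONDS 0.5

/* Seconds of audio missing from a decoded track compared with the
   expected video duration (0 if the track is not shorter). */
static double missing_audio_seconds(size_t decoded_samples,
                                    unsigned sample_rate,
                                    double video_seconds)
{
    assert(sample_rate > 0);
    double audio_seconds = (double)decoded_samples / sample_rate;
    double gap = video_seconds - audio_seconds;
    return gap > 0.0 ? gap : 0.0;
}

/* Returns 1 if the track is long enough for its cutpoints to be trusted. */
static int cutpoints_trustworthy(size_t decoded_samples,
                                 unsigned sample_rate,
                                 double video_seconds)
{
    return missing_audio_seconds(decoded_samples, sample_rate,
                                 video_seconds) <= MAX_DRIFT_SECONDS;
}
```

This only tells you that samples were lost, not where, so a short track can be rejected but not repaired; placing the cutpoints correctly would still need the stream timestamps, which is why doing it inside Comskip is attractive.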
