Closed Captioning and Show Title

pcon2009
Posts: 20
Joined: Wed Sep 28, 2016 5:30 pm

Post by pcon2009 » Fri Sep 30, 2016 7:16 am

I still have a different question "pending" in another post, but I wanted to keep things organized and ask this in a different post... I was wondering if the Closed Captioning detection method is working for some people? I have tried enabling it in the .ini, and in the log of the files, the transcript of the CC shows up, so I know the captions are actually being detected, but it doesn't seem to have any actual effect on the commercial detection. Even if I add a specific line/phrase from a specific commercial to the .dictionary file, it will still be present in the output. I was hoping to use this method as a fine-tuning/fall-back method to help remove some pesky ads that aren't always removed by other methods. It seems like this could have some huge potential, but I am not sure why it isn't working, nor can I find any documentation on how to make sure I am doing this properly.

Also, in the console window while running, I have seen that the "show title" is detected. Is there a way to use this to help remove any bits of another show before and/or after the recording? Most of what I record starts close to on time, but just to be safe I usually add about 2 minutes to the start and end time of a recording. Sometimes this results in the recording starting with the end of the previous show, and sometimes it starts during a commercial. I know there are settings to remove show content before the first detected commercial, but I am under the impression that if a recording DOES NOT have this bit of show before a commercial break, it will remove part of the show I am trying to keep. Is this true, or am I looking at this all wrong? And in either case, what about the show title information? Is that something I can use for tuning, or is it not implemented at all?

Overall, I feel like I am getting very close to a usable .ini for most if not all of the shows I want to record, just needs a few tweaks here and there, so I really appreciate the help!

erik
Site Admin
Posts: 3310
Joined: Sun Aug 21, 2005 3:49 pm

Re: Closed Captioning and Show Title

Post by erik » Fri Sep 30, 2016 8:59 am

I'm not sure that CC isn't broken.
Very little effort has been spent on keeping CC processing operational.
Maybe if I have more time I will look at it again.

pcon2009
Posts: 20
Joined: Wed Sep 28, 2016 5:30 pm

Re: Closed Captioning and Show Title

Post by pcon2009 » Fri Sep 30, 2016 5:09 pm

Thank you for the info on CC. I will do a bit of experimentation to see if I can get it to do anything, or if it is indeed broken. Personally, I think this method could turn out to be a lot more reliable than some of the other methods, especially for shows with too little logo or too much logo, etc. Since the FCC in the USA mandates captioning on most programming, shows should in theory be 99% captioned, with a few exceptions. If a commercial is captioned, it would be subject to the dictionary. If no captions are detected, that block could simply (a) be deemed a commercial, or (b) the caption method could turn itself off (similar to how logo detection turns off when not enough logo is found) and other methods would be used. A lot of commercials these days can be ID'd by just a few words of CC that are common across many of them (for example, "Ask your doctor" for most prescription drugs, or "all-new 2017" for automobile commercials), so if you do have any interest in working on CC again, please feel free to let me know, as I would be happy to do some testing for you and work on a fresh dictionary file.

erik
Site Admin
Posts: 3310
Joined: Sun Aug 21, 2005 3:49 pm

Re: Closed Captioning and Show Title

Post by erik » Sat Oct 01, 2016 8:31 pm

Can you first test if the CC is listed in the log file?

pcon2009
Posts: 20
Joined: Wed Sep 28, 2016 5:30 pm

Re: Closed Captioning and Show Title

Post by pcon2009 » Sun Oct 02, 2016 7:39 am

Yes, I can confirm the CC is listed in the log file. As a quick example, here is the first section of show from an old Seinfeld episode, which includes the first lines of the show, goes through a commercial, and back into the show:

Code:

Closed caption transcript
--------------------
0) S:     1 E:    33 L:  31 HE GOT YOU TO JOIN A BOOK CLUB?
1) S:   230 E:   249 L:  16  I GOT A FEELING
2) S:   266 E:   323 L:  50  I'M GOING TO BE MUCH SMARTER THAN YOUPRETTY SOON.
3) S:   426 E:   459 L:  29  I THINK THAT STATEMENT ALONE
4) S:   532 E:   575 L:  39  REFLECTS YOUR BURGEONING INTELLIGENCE.
35) S:  4304 E:  4349 L:  41  I absolutely love my New York apartment,
36) S:  4350 E:  4459 L:  28  but the rent is outrageous.
37) S:  4466 E:  4573 L:  24  Good thing GEICO offers
38) S:  4580 E:  4635 L:  30  affordable renters insurance.
39) S:  4642 E:  4725 L:  20  With great coverage
40) S:  4732 E:  4773 L:  34 it protects my personal belongings
41) S:  4780 E:  4873 L:  25  should they get damaged,
42) S:  4880 E:  4929 L:  21  stolen or destroyed.
43) S:  4936 E:  5003 L:  26  [doorbell] Uh, excuse me.
44) S:  5010 E:  5079 L:  10  Delivery.
45) S:  5086 E:  5171 L:   5  Hey.
46) S:  5178 E:  5247 L:   9  Lo mein,
47) S:  5254 E:  5323 L:  18  Szechwan chicken,
48) S:  5330 E:  5425 L:  12  chopsticks,
49) S:  5432 E:  5493 L:  10  soy sauce
50) S:  5500 E:  5577 L:  34  and you got some fortune cookies.
51) S:  5584 E:  5681 L:  17  Have a good one.
52) S:  5688 E:  5733 L:  39  Ah, these small New York apartments...
53) S:  5740 E:  5877 L:  26   Protect your belongings.
54) S:  5884 E:  5953 L:  20   Let GEICO help you
55) S:  5960 E:  6017 L:  25   with renters insurance.
56) S:  6166 E:  6201 L:  31  [ELAINE] HEY, WHAT'S GOING ON?
57) S:  6288 E:  6301 L:  11  NEW COUCH.
58) S:  6340 E:  6359 L:  16  NEW COUCH? WHY?
59) S:  6402 E:  6443 L:  37  KNOW THE BEST PART ABOUT THIS COUCH?
60) S:  6522 E:  6575 L:  46 IT DOESN'T FOLD OUT, SO NO ONE CAN SLEEP OVER.
(NOTE: I skipped lines 5-34 in order for this post not to be too long, but other than that, this is copy/pasted from the log.)
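For anyone who wants to check their own log against a candidate dictionary phrase before editing the .dictionary file, the transcript lines above are regular enough to parse with a short script. The following is only a minimal sketch, not part of Comskip: the line layout is assumed from the excerpt above, the function names `parse_transcript` and `find_phrase` are made up for illustration, and the `S:`/`E:` values are treated as opaque start/end numbers because the log does not say what unit they are in.

```python
import re

# Assumed layout, taken from the excerpt above, e.g.:
#   "37) S:  4466 E:  4573 L:  24  Good thing GEICO offers"
CC_LINE = re.compile(
    r"^\s*(\d+)\)\s+S:\s*(\d+)\s+E:\s*(\d+)\s+L:\s*(\d+)\s*(.*)$"
)


def parse_transcript(log_text):
    """Return (index, start, end, text) tuples for each transcript line."""
    entries = []
    for line in log_text.splitlines():
        m = CC_LINE.match(line)
        if m:
            idx, start, end, _length, text = m.groups()
            entries.append((int(idx), int(start), int(end), text.strip()))
    return entries


def find_phrase(entries, phrase):
    """Case-insensitive search; returns indices of caption lines containing phrase."""
    needle = phrase.lower()
    return [idx for idx, _start, _end, text in entries if needle in text.lower()]


sample = """\
36) S:  4350 E:  4459 L:  28  but the rent is outrageous.
37) S:  4466 E:  4573 L:  24  Good thing GEICO offers
54) S:  5884 E:  5953 L:  20   Let GEICO help you
56) S:  6166 E:  6201 L:  31  [ELAINE] HEY, WHAT'S GOING ON?
"""

entries = parse_transcript(sample)
print(find_phrase(entries, "geico"))  # caption line numbers that mention GEICO
```

Running it on the four sample lines lists caption lines 37 and 54, which matches the two GEICO lines in the commercial above.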

erik
Site Admin
Posts: 3310
Joined: Sun Aug 21, 2005 3:49 pm

Re: Closed Captioning and Show Title

Post by erik » Sun Oct 02, 2016 8:10 am

Excellent.
Can you check if words in the dictionary are recognized?

pcon2009
Posts: 20
Joined: Wed Sep 28, 2016 5:30 pm

Re: Closed Captioning and Show Title

Post by pcon2009 » Sun Oct 02, 2016 3:19 pm

A bit further down in the log file from this same video that I processed, we get to the following. I removed the lines where it is searching for words and phrases that were not found, but left those where it IS finding certain "good" and "bad" ones:

Code:

Starting to process dictionary
-------------------------------------
Searching for: national captioning
NATIONAL CAPTIONING found in cc_text_block 568
Block 12 score:	Before - 0.00	After - 0.00
Finished with good phrases.  Now starting bad phrases.
Searching for: allegra
ALLEGRA found in cc_text_block 663
Block 19 score:	Before - 0.01	After - 0.02
ALLEGRA found in cc_text_block 667
Block 19 score:	Before - 0.02	After - 0.02
ALLEGRA found in cc_text_block 669
Block 19 score:	Before - 0.02	After - 0.02
ALLEGRA found in cc_text_block 670
Block 19 score:	Before - 0.02	After - 0.02
Searching for: allergy symptoms
ALLERGY SYMPTOMS found in cc_text_block 666
Block 19 score:	Before - 0.02	After - 0.02
Searching for: ask your doctor
ASK YOUR DOCTOR found in cc_text_block 662
Block 17 score:	Before - 10.13	After - 10.63
Searching for: flonase
FLONASE found in cc_text_block 571
Block 14 score:	Before - 29.57	After - 31.04
FLONASE found in cc_text_block 579
Block 14 score:	Before - 31.04	After - 32.60
FLONASE found in cc_text_block 581
Block 14 score:	Before - 32.60	After - 34.23
FLONASE found in cc_text_block 585
Block 14 score:	Before - 34.23	After - 35.94
Searching for: geico
GEICO found in cc_text_block 37
Block 3 score:	Before - 96.00	After - 100.80
GEICO found in cc_text_block 54
Block 3 score:	Before - 100.80	After - 105.84
GEICO found in cc_text_block 609
Block 15 score:	Before - 19.71	After - 20.70
Dictionary processed successfully
H4 Added cblock 20 because of large black gap with cblock 19
Threshold used - 1.0500	After rounding - 1.0500
(Not sure if the two lines after "dictionary processed successfully" apply to the CC processing, but the next section in the log is under a different header Initial Commercial List)
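The Before/After pairs in this excerpt can be used to estimate how much each dictionary hit changes a block's score. The sketch below is only an illustration built on the log format shown above; `score_ratios` is a hypothetical helper, not a Comskip function. It skips very small scores because the log rounds to two decimals, which makes ratios like 0.02 / 0.01 meaningless.

```python
import re

# Assumed format, taken from the excerpt above:
#   "Block 3 score:<tab>Before - 96.00<tab>After - 100.80"
SCORE_LINE = re.compile(
    r"Block\s+(\d+)\s+score:\s+Before\s+-\s+([\d.]+)\s+After\s+-\s+([\d.]+)"
)


def score_ratios(log_text, min_before=1.0):
    """Return (block, after/before) pairs, ignoring scores too small to compare."""
    ratios = []
    for m in SCORE_LINE.finditer(log_text):
        block = int(m.group(1))
        before = float(m.group(2))
        after = float(m.group(3))
        if before >= min_before:
            ratios.append((block, round(after / before, 2)))
    return ratios


sample = (
    "Block 19 score:\tBefore - 0.01\tAfter - 0.02\n"
    "Block 17 score:\tBefore - 10.13\tAfter - 10.63\n"
    "Block 3 score:\tBefore - 96.00\tAfter - 100.80\n"
    "Block 3 score:\tBefore - 100.80\tAfter - 105.84\n"
    "Block 15 score:\tBefore - 19.71\tAfter - 20.70\n"
)

for block, ratio in score_ratios(sample):
    print(f"block {block}: score multiplied by about {ratio}")
```

On these lines every usable pair comes out at roughly 1.05, which suggests (but does not prove) that each "bad phrase" hit multiplies the block score by a fixed factor of about 1.05.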

erik
Site Admin
Posts: 3310
Joined: Sun Aug 21, 2005 3:49 pm

Re: Closed Captioning and Show Title

Post by erik » Mon Oct 03, 2016 7:11 am

You can see the score increasing, which causes the block to be judged as a commercial.

pcon2009
Posts: 20
Joined: Wed Sep 28, 2016 5:30 pm

Re: Closed Captioning and Show Title

Post by pcon2009 » Mon Oct 03, 2016 4:15 pm

Okay, great, so it is working then, I guess :D Now, in terms of the score increase, as I understand it, this is applied as a multiplier. In the Comskip INI file (using the INI editor), I notice there are 4 options for CC:

ccCheck
cc_commercial_type
cc_wrong_type
cc_correct_type

First, I am not sure what ccCheck is or what it should be set to. Then, there are the "type" options. Do these affect the score when comparing to the dictionary? I would like to use a larger multiplier when a block is found to have "bad phrases" from the dictionary. (The type detection is okay, but the dictionary detection should make a bigger difference than the types.) For example, above you can see that one block was found to have the word Allegra, which I can be almost certain means it is a commercial for that product, yet the score was barely increased even though the block contained the word multiple times. Granted, I don't want one single word to automatically flag a block as commercial regardless of all other detection methods and score modifiers, but if the word appears more than once, the multiplier should be applied that number of times, resulting in a higher score and flagging the block as commercial.
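To put numbers on the Allegra example: in the log posted earlier in this thread, the GEICO hits move block 3 from 96.00 to 100.80 to 105.84, which is a factor of about 1.05 per hit. The sketch below assumes that factor is applied once per hit and that a block counts as commercial once its score is above the "Threshold used" value from the log (1.05). Both are assumptions read off the log, not confirmed Comskip behaviour, and `score_after_hits` and `hits_needed` are illustrative names only.

```python
def score_after_hits(score, factor, hits):
    """Score after applying the same multiplier once per dictionary hit."""
    return score * factor ** hits


def hits_needed(score, factor, threshold, max_hits=10_000):
    """Count how many hits it takes for score to exceed threshold.

    Returns None if the score can never get there (non-positive score,
    a factor that does not grow the score, or more than max_hits needed).
    """
    if score <= 0 or factor <= 1:
        return None
    hits = 0
    while score <= threshold:
        score *= factor
        hits += 1
        if hits > max_hits:
            return None
    return hits


THRESHOLD = 1.05  # "Threshold used" in the log; its exact role is assumed

# Block 19 (Allegra) started at 0.01 and got four hits.
print(score_after_hits(0.01, 1.05, 4))

# Hits needed to cross the threshold with the observed factor
# versus a hypothetical, much larger factor.
print(hits_needed(0.01, 1.05, THRESHOLD))
print(hits_needed(0.01, 2.0, THRESHOLD))
```

Because the adjustment is multiplicative, a block that starts near zero stays near zero no matter how often a phrase repeats at a factor close to 1, while a block that is already above the threshold only moves further above it. Under these assumptions, a larger dictionary multiplier is what would let a few repeated hits flip a low-scoring block.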
