Comskip Support Forum

Comskip is a free commercial detector. Browse the forum for more information.

All times are UTC [ DST ]




PostPosted: Fri Sep 30, 2016 7:16 am 

Joined: Wed Sep 28, 2016 5:30 pm
Posts: 20
I still have a different question pending in another thread, but I wanted to keep things organized and ask this one separately. Is the Closed Captioning (CC) detection method working for anyone? I have enabled it in the .ini, and the CC transcript shows up in the log file, so I know the captions are being detected, but they don't seem to have any effect on the commercial detection. Even if I add a specific line or phrase from a specific commercial to the .dictionary file, that commercial is still present in the output. I was hoping to use this method as a fine-tuning/fallback method to help remove some pesky ads that aren't always removed by other methods. It seems like this could have huge potential, but I am not sure why it isn't working, nor can I find any documentation on how to make sure I am doing this properly.

Also, in the console window while Comskip is running, I have seen that the "show title" is detected. Is there a way to use this to help remove any bits of show before and/or after the recording? Most of what I record starts close to on time, but just to be safe I usually add about 2 minutes to the start and end time of a recording. Sometimes this results in the recording starting with the end of the previous show, and sometimes it starts during a commercial. I know there are settings to remove the show before the first detected commercial, but I am under the impression that if a recording DOES NOT start with a bit of the previous show before a commercial break, these settings will remove part of the show I am trying to keep. Is this true, or am I looking at this all wrong? And in either case, what about the show title information? Is that something I can use for tuning, or is it not implemented at all?

Overall, I feel like I am getting very close to a usable .ini for most if not all of the shows I want to record. It just needs a few tweaks here and there, so I really appreciate the help!


PostPosted: Fri Sep 30, 2016 8:59 am
Site Admin

Joined: Sun Aug 21, 2005 3:49 pm
Posts: 3287
I am not sure whether CC processing still works; it may be broken.
Very little effort has been spent on keeping CC processing operational.
If I have more time, I may look at it again.


PostPosted: Fri Sep 30, 2016 5:09 pm

Joined: Wed Sep 28, 2016 5:30 pm
Posts: 20
Thank you for the info on CC. I will do a bit of experimentation to see if I can get it to do anything, or if it is indeed broken. Personally, I think this method could turn out to be a lot more reliable than some of the other methods, especially for shows with too little or too much logo. In the USA, the FCC mandates captioning on most programming, so shows should in theory be almost always captioned, with a few exceptions. If a commercial is captioned, it would be subject to the dictionary. If no captions are detected, that block could either (a) be deemed a commercial or (b) be left to the other methods, with the caption method turning itself off (much as the logo method turns off when not enough logo is found). A lot of commercials these days can be identified by just a few words of CC that are common across many of them (for example, "Ask your doctor" for most prescription drugs, or "all-new 2017" for automobile commercials). If you do have any interest in working on CC again, please feel free to let me know, as I would be happy to do some testing for you and work on a fresh dictionary file.
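To make the idea above concrete, here is a minimal Python sketch of the proposed rule. It is only an illustration of the suggestion in this post, not how Comskip actually processes captions. The function name `classify_block` and its return convention are hypothetical.

```python
from typing import Iterable, Optional


def classify_block(captions: Iterable[str], bad_phrases: Iterable[str]) -> Optional[bool]:
    """Hypothetical rule for one block of video.

    Returns True if a "bad" dictionary phrase appears in the captions,
    False if captions exist but no bad phrase matches, and None if the
    block has no captions at all (option (b): defer to other methods).
    """
    # Normalize to upper case so matching is case-insensitive.
    lines = [c.strip().upper() for c in captions if c.strip()]
    if not lines:
        return None
    # Join lines so a phrase split across two caption lines can still match.
    text = " ".join(lines)
    return any(p.upper() in text for p in bad_phrases)


phrases = ["ask your doctor", "all-new 2017"]
drug_ad = classify_block(["Ask your doctor", "if this is right for you."], phrases)
show = classify_block(["NEW COUCH? WHY?"], phrases)
uncaptioned = classify_block([], phrases)
```

Returning None for an uncaptioned block keeps the two cases separate, so a caller could choose either option (a) or option (b) without changing the matching logic.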


PostPosted: Sat Oct 01, 2016 8:31 pm
Site Admin

Joined: Sun Aug 21, 2005 3:49 pm
Posts: 3287
Can you first test if the CC is listed in the log file?


PostPosted: Sun Oct 02, 2016 7:39 am

Joined: Wed Sep 28, 2016 5:30 pm
Posts: 20
Yes, I can confirm the CC is listed in the log file. As a quick example, here is the first section of show from an old Seinfeld episode, which includes the first lines of the show, goes through a commercial, and back into the show:

Code:
Closed caption transcript
--------------------
0) S:     1 E:    33 L:  31 HE GOT YOU TO JOIN A BOOK CLUB?
1) S:   230 E:   249 L:  16  I GOT A FEELING
2) S:   266 E:   323 L:  50  I'M GOING TO BE MUCH SMARTER THAN YOUPRETTY SOON.
3) S:   426 E:   459 L:  29  I THINK THAT STATEMENT ALONE
4) S:   532 E:   575 L:  39  REFLECTS YOUR BURGEONING INTELLIGENCE.
35) S:  4304 E:  4349 L:  41  I absolutely love my New York apartment,
36) S:  4350 E:  4459 L:  28  but the rent is outrageous.
37) S:  4466 E:  4573 L:  24  Good thing GEICO offers
38) S:  4580 E:  4635 L:  30  affordable renters insurance.
39) S:  4642 E:  4725 L:  20  With great coverage
40) S:  4732 E:  4773 L:  34 it protects my personal belongings
41) S:  4780 E:  4873 L:  25  should they get damaged,
42) S:  4880 E:  4929 L:  21  stolen or destroyed.
43) S:  4936 E:  5003 L:  26  [doorbell] Uh, excuse me.
44) S:  5010 E:  5079 L:  10  Delivery.
45) S:  5086 E:  5171 L:   5  Hey.
46) S:  5178 E:  5247 L:   9  Lo mein,
47) S:  5254 E:  5323 L:  18  Szechwan chicken,
48) S:  5330 E:  5425 L:  12  chopsticks,
49) S:  5432 E:  5493 L:  10  soy sauce
50) S:  5500 E:  5577 L:  34  and you got some fortune cookies.
51) S:  5584 E:  5681 L:  17  Have a good one.
52) S:  5688 E:  5733 L:  39  Ah, these small New York apartments...
53) S:  5740 E:  5877 L:  26   Protect your belongings.
54) S:  5884 E:  5953 L:  20   Let GEICO help you
55) S:  5960 E:  6017 L:  25   with renters insurance.
56) S:  6166 E:  6201 L:  31  [ELAINE] HEY, WHAT'S GOING ON?
57) S:  6288 E:  6301 L:  11  NEW COUCH.
58) S:  6340 E:  6359 L:  16  NEW COUCH? WHY?
59) S:  6402 E:  6443 L:  37  KNOW THE BEST PART ABOUT THIS COUCH?
60) S:  6522 E:  6575 L:  46 IT DOESN'T FOLD OUT, SO NO ONE CAN SLEEP OVER.


(NOTE: I skipped lines 5-34 in order for this post not to be too long, but other than that, this is copy/pasted from the log.)
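For anyone who wants to inspect a transcript like the one above outside of Comskip, here is a small Python sketch that parses the lines. It assumes the layout shown in the log (`index) S: start E: end L: length text`). Reading S and E as start and end positions and L as the caption length is my interpretation of the output, not something confirmed by the Comskip documentation. The names `CaptionLine` and `parse_caption_line` are made up for this example.

```python
import re
from typing import NamedTuple, Optional


class CaptionLine(NamedTuple):
    index: int
    start: int   # "S:" field (assumed to be the start position)
    end: int     # "E:" field (assumed to be the end position)
    length: int  # "L:" field (assumed to be the caption length)
    text: str


# Matches lines such as "37) S:  4466 E:  4573 L:  24  Good thing GEICO offers"
_LINE_RE = re.compile(r"^\s*(\d+)\)\s+S:\s*(\d+)\s+E:\s*(\d+)\s+L:\s*(\d+)\s?(.*)$")


def parse_caption_line(line: str) -> Optional[CaptionLine]:
    """Parse one transcript line; return None for headers and separators."""
    m = _LINE_RE.match(line)
    if m is None:
        return None
    idx, start, end, length, text = m.groups()
    return CaptionLine(int(idx), int(start), int(end), int(length), text.strip())


sample = [
    "Closed caption transcript",
    "--------------------",
    "37) S:  4466 E:  4573 L:  24  Good thing GEICO offers",
    "56) S:  6166 E:  6201 L:  31  [ELAINE] HEY, WHAT'S GOING ON?",
]
parsed = [c for c in map(parse_caption_line, sample) if c is not None]
```

With the lines parsed this way, it becomes easy to check which S/E ranges contain a given dictionary phrase.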


PostPosted: Sun Oct 02, 2016 8:10 am
Site Admin

Joined: Sun Aug 21, 2005 3:49 pm
Posts: 3287
Excellent.
Can you check whether the words in the dictionary are recognized?


PostPosted: Sun Oct 02, 2016 3:19 pm

Joined: Wed Sep 28, 2016 5:30 pm
Posts: 20
A bit further down in the log file from this same video that I processed, we get to the following. I removed the lines where it is searching for words and phrases that were not found, but left those where it IS finding certain "good" and "bad" ones:

Code:
Starting to process dictionary
-------------------------------------
Searching for: national captioning
NATIONAL CAPTIONING found in cc_text_block 568
Block 12 score:   Before - 0.00   After - 0.00
Finished with good phrases.  Now starting bad phrases.
Searching for: allegra
ALLEGRA found in cc_text_block 663
Block 19 score:   Before - 0.01   After - 0.02
ALLEGRA found in cc_text_block 667
Block 19 score:   Before - 0.02   After - 0.02
ALLEGRA found in cc_text_block 669
Block 19 score:   Before - 0.02   After - 0.02
ALLEGRA found in cc_text_block 670
Block 19 score:   Before - 0.02   After - 0.02
Searching for: allergy symptoms
ALLERGY SYMPTOMS found in cc_text_block 666
Block 19 score:   Before - 0.02   After - 0.02
Searching for: ask your doctor
ASK YOUR DOCTOR found in cc_text_block 662
Block 17 score:   Before - 10.13   After - 10.63
Searching for: flonase
FLONASE found in cc_text_block 571
Block 14 score:   Before - 29.57   After - 31.04
FLONASE found in cc_text_block 579
Block 14 score:   Before - 31.04   After - 32.60
FLONASE found in cc_text_block 581
Block 14 score:   Before - 32.60   After - 34.23
FLONASE found in cc_text_block 585
Block 14 score:   Before - 34.23   After - 35.94
Searching for: geico
GEICO found in cc_text_block 37
Block 3 score:   Before - 96.00   After - 100.80
GEICO found in cc_text_block 54
Block 3 score:   Before - 100.80   After - 105.84
GEICO found in cc_text_block 609
Block 15 score:   Before - 19.71   After - 20.70
Dictionary processed successfully
H4 Added cblock 20 because of large black gap with cblock 19
Threshold used - 1.0500   After rounding - 1.0500


(Not sure if the two lines after "Dictionary processed successfully" apply to the CC processing, but the next section in the log is under a different header, "Initial Commercial List".)
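As a quick way to see how much each dictionary match changes a block's score, the "Before"/"After" pairs in the log above can be pulled out and divided. The Python sketch below does that for a few lines copied from the excerpt. The helper name `score_ratios` is made up for this example, and the conclusion (a factor of roughly 1.05 per bad-phrase match) is only inferred from these numbers, not taken from the Comskip source.

```python
import re

# Matches lines such as "Block 3 score:   Before - 96.00   After - 100.80"
_SCORE_RE = re.compile(
    r"Block\s+(\d+)\s+score:\s+Before\s+-\s+([\d.]+)\s+After\s+-\s+([\d.]+)"
)


def score_ratios(log_lines):
    """Return (block, before, after, ratio) for every score line.

    ratio is After / Before, or None when Before is 0 (no ratio defined).
    """
    rows = []
    for line in log_lines:
        m = _SCORE_RE.search(line)
        if m is None:
            continue  # skip "Searching for" / "found in" lines
        block = int(m.group(1))
        before = float(m.group(2))
        after = float(m.group(3))
        ratio = after / before if before else None
        rows.append((block, before, after, ratio))
    return rows


log = [
    "Searching for: geico",
    "GEICO found in cc_text_block 37",
    "Block 3 score:   Before - 96.00   After - 100.80",
    "GEICO found in cc_text_block 54",
    "Block 3 score:   Before - 100.80   After - 105.84",
    "NATIONAL CAPTIONING found in cc_text_block 568",
    "Block 12 score:   Before - 0.00   After - 0.00",
]
rows = score_ratios(log)
```

For the two GEICO matches the ratio works out to 1.05 each time, while the block with a score of 0.00 cannot change under a purely multiplicative update.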


PostPosted: Mon Oct 03, 2016 7:11 am
Site Admin

Joined: Sun Aug 21, 2005 3:49 pm
Posts: 3287
You can see the score increasing, which causes the block to be judged as a commercial.


PostPosted: Mon Oct 03, 2016 4:15 pm

Joined: Wed Sep 28, 2016 5:30 pm
Posts: 20
Okay, great, so it is working then, I guess :D Now, in terms of the score increase, as I understand it, this is applied as a multiplier. In the Comskip INI file (using the INI editor), I notice there are 4 options for CC:

ccCheck
cc_commercial_type
cc_wrong_type
cc_correct_type

First, I am not sure what ccCheck is or what it should be set to. Then there are the three "type" options. Do these affect the score when comparing against the dictionary? I would like to use a larger multiplier when a block is found to contain "bad phrases" from the dictionary. (The type detection is okay, but the dictionary detection should make a bigger difference than the types.) For example, above you can see that one block was found to contain the word Allegra, which almost certainly means it is a commercial for that product, yet the score barely increased even though the word appeared multiple times. Granted, I don't want one single word to automatically flag a block as a commercial regardless of all other detection methods and score modifiers, but if the word appears more than once, the score should be multiplied that number of times, resulting in a higher score and a commercial flag.
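To illustrate why the Allegra block barely moved, here is a rough Python sketch of a compounding multiplier. It assumes each match multiplies the score by about 1.05, which is what the Before/After values in my log suggest, and that a block is judged a commercial once its score exceeds the "Threshold used - 1.0500" value from the same log. Both are my reading of the log, not confirmed Comskip behavior, and `apply_matches` and `matches_needed` are made-up names.

```python
import math

FACTOR = 1.05      # per-match multiplier inferred from the log (assumption)
THRESHOLD = 1.05   # "Threshold used - 1.0500" from the log


def apply_matches(score: float, matches: int, factor: float = FACTOR) -> float:
    """Multiply the score by the factor once per dictionary match."""
    return score * factor ** matches


def matches_needed(score: float, factor: float = FACTOR,
                   threshold: float = THRESHOLD) -> int:
    """Smallest number of matches that lifts a positive score above the threshold."""
    if score > threshold:
        return 0
    return math.floor(math.log(threshold / score) / math.log(factor)) + 1


geico_block = apply_matches(96.00, 2)    # two GEICO matches in block 3
allegra_block = apply_matches(0.01, 4)   # four ALLEGRA matches on a score near 0.01
needed = matches_needed(0.01)            # matches required at this factor
```

Under these assumptions the GEICO block ends at 105.84, matching the log, while four matches on a score of 0.01 still leave it far below the threshold. A multiplier this small only reinforces blocks that already score high, which is why a larger dictionary multiplier (or an additive bonus) would be needed for the dictionary to flip a low-scoring block.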

