CCExtractor Development

Produce some TV-show-specific dictionaries

We have a generic dictionary of words that need to be capitalized (in English). We use it to correct capitalization in subtitles that arrive in all caps (very annoying). The problem is that a single huge dictionary doesn't really apply to all TV content. For example, suppose we're processing “House” (the TV show). You want “House” to be capitalized for that show, but not in general. So the job is to have one generic dictionary and then some TV-show-specific ones for the most popular shows. You can do the shows you personally like. Each instance of the task covers 3 TV shows: pick any 3 TV shows you like and generate their show-specific dictionaries.

What to submit:

  • A pull request with 3 new dictionaries, or
  • 3 dict_{series name}.txt files on the dashboard

Examples of existing dictionaries can be found here: https://github.com/CCExtractor/ccextractor/tree/master/Dictionary

They must be new shows (not cancelled ones, since all their episodes are already available, so we don't need to produce new ones).
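
To illustrate how a show-specific dictionary could be layered on top of the generic one, here is a minimal Python sketch. It is not CCExtractor's actual implementation: the file layout (one correctly capitalized word per line) and the helper names `load_dictionary` and `fix_caps` are assumptions made for this example only.

```python
import re


def load_dictionary(lines):
    """Build a lookup from lowercase word to its correctly capitalized form.

    Assumes one entry per line (an assumption for this sketch); blank lines
    are skipped.
    """
    entries = {}
    for line in lines:
        word = line.strip()
        if word:
            entries[word.lower()] = word
    return entries


def fix_caps(subtitle, generic, show_specific=None):
    """Turn an all-caps subtitle line into sentence case, then restore the
    capitalization of every word found in the dictionaries.

    Show-specific entries are merged last, so they take precedence over
    generic ones.
    """
    merged = dict(generic)
    merged.update(show_specific or {})

    def restore(match):
        token = match.group(0)
        return merged.get(token, token)

    text = re.sub(r"[a-z]+", restore, subtitle.lower())
    # Capitalize the first character of the line.
    return text[:1].upper() + text[1:]


# Hypothetical dictionary contents, for illustration only.
generic = load_dictionary(["London", "Monday"])
house = load_dictionary(["House", "Cuddy"])

with_show = fix_caps("CUDDY SAW HOUSE IN LONDON.", generic, house)
without_show = fix_caps("WE SOLD THE HOUSE ON MONDAY.", generic)
```

With the show dictionary loaded, "house" is restored to "House"; with only the generic dictionary, it stays lowercase, which is the behaviour the task asks for.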

Task tags

  • character names
  • dictionary
  • capitalization

Students who completed this task

KESHAV GOYAL, Kaushal Dasika

Task type

  • Documentation / Training
  • Quality Assurance

Level

Beginner

2017