CCExtractor Development

Debugging a crash while processing DVB subtitles

CCExtractor supports many different standards for extracting captions and subtitles from video files, in almost any language. However, we still occasionally encounter recordings that it cannot handle, and this is one of them: the samples we received either crash CCExtractor or produce garbage output.

The issue linked in the external URL provides the sample for this task. To solve it, we'd like you to dig into why this sample causes problems.

We have already done a bit of digging ourselves, and we are certain that the problem is related to OCR. We reached this conclusion because CCExtractor crashes when we use Tesseract, whereas without OCR support the DVB stream isn't detected at all (and CCExtractor then falls back to the teletext streams). Exporting to .srt with version 0.77 gives an empty file.

To complete this task you need some proficiency with running a debugger on a program, so that you can trace the issue back to its origin.

We expect either a report explaining why it is impossible to extract captions from this sample or, if extraction should be possible, the root cause of the failure. Bonus points if you can open a PR with a fix.

Task tags

  • debugging
  • c
  • ocr
  • crash
  • dvb

Students who completed this task

Harry Yu

Task type

  • Code

2017