Apertium
Function for calculating feature occurrence probabilities
As part of automatically generating Constraint Grammars, we need to know the probability of each feature (e.g. verb (vblex), noun (n)) occurring. The probability is calculated by counting how many times the feature appears and dividing that count by the total number of feature occurrences in the corpus (a text collection). Write a Python function that takes a morphologically analyzed text corpus (in Apertium stream format) and returns the calculated feature probabilities in a dictionary, for example {"n": 0.11, "vblex": 0.33, ...}, where the values are probabilities (0.11 meaning 11%). Use streamparser to parse the Apertium stream format.
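A minimal sketch of such a function might look like the following. It is not a reference solution. It assumes that every tag in every reading of an ambiguous word counts as one occurrence, which the task does not specify. It also assumes that streamparser exposes `parse()`, yielding lexical units whose `readings` are lists of subreadings with a `tags` list. The names `iter_tags` and `feature_probabilities` are hypothetical. If streamparser is not installed, a simplified regular expression stands in for it so that the sketch still runs.

```python
import re
from collections import Counter

try:
    # Parser named in the task. Assumed API: parse(str) yields lexical units
    # whose .readings is a list of readings, each a list of subreadings
    # that carry a .tags list.
    from streamparser import parse
except ImportError:
    parse = None


def iter_tags(stream):
    """Yield every tag of every reading of every lexical unit in the stream."""
    if parse is not None:
        for unit in parse(stream):
            for reading in unit.readings:
                for subreading in reading:
                    yield from subreading.tags
    else:
        # Simplified stand-in for streamparser: it ignores escaped
        # characters and superblanks, so use it only for quick experiments.
        for unit in re.findall(r"\^(.*?)\$", stream):
            yield from re.findall(r"<([^<>]+)>", unit)


def feature_probabilities(stream):
    """Map each tag to its count divided by the total number of tag occurrences."""
    counts = Counter(iter_tags(stream))
    total = sum(counts.values())
    if total == 0:
        return {}  # empty corpus or no analysed words: nothing to report
    return {tag: count / total for tag, count in counts.items()}


if __name__ == "__main__":
    sample = ("^the/the<det><def><sp>$ ^dog/dog<n><sg>$ "
              "^barks/bark<vblex><pri><p3><sg>$")
    print(feature_probabilities(sample))
```

The sample stream contains nine tag occurrences, so `sg`, which appears twice, gets 2/9 (about 0.22), and every other tag gets 1/9 (about 0.11). Counting only the first reading of each word, or only the main part-of-speech tag, would be an equally defensible reading of the task and would need only a change to `iter_tags`.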
Students who completed this task
nuboro