Apertium

Research on how to represent feature sequences

Research how to represent sequences of features (verb (<vblex>), noun (<n>), etc.) in the Python programming language. One way to represent a feature sequence is a string, for example "<det><n><vblex><adj>" (e.g. the car is fast). Another is a list, for example in Python: ["det", "n", "vblex", "adj"]. The order of the features must be retained in a feature sequence, so an unordered data structure such as 'set' can't be used here.
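As a rough illustration of the options above, the following sketch builds the same sequence as a list, a string and a tuple, and shows why a set is unsuitable. The angle-bracket string encoding and the second example list (`repeated`) are assumptions made for illustration, not a required format.

```python
# Tag names taken from the list example above (roughly "the car is fast").
features = ["det", "n", "vblex", "adj"]

# String form: one possible encoding (an assumption), each tag in angle brackets.
as_string = "".join(f"<{tag}>" for tag in features)

# A tuple keeps the order too, and is immutable and hashable.
as_tuple = tuple(features)

# A set discards order and duplicates, so it cannot hold a sequence.
# 'repeated' is a made-up sequence in which the same feature occurs twice.
repeated = ["n", "vblex", "n"]
as_set = set(repeated)

print(as_string)
print(as_tuple)
print(len(repeated), "features in the list,", len(as_set), "in the set")
```

A list is probably the easiest structure to build up while reading a stream; a tuple or string is handy if the sequence later needs to be used as a dictionary key.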

Make a test program in Python that takes text in the apertium stream format and stores the feature sequence found there in a data structure of your choosing (string, list, hash, your own object, etc.).

You can use this morphologically analyzed Finnish sentence to test your program: "^kala/kala $ ^ja/ja $ ^peruna/peruna $" If you use that sentence as input, the program's output in string representation should be " ". Note that you only need the first feature of each reading in the feature sequence, e.g., if you have the reading kala/kala you only need to consider the feature and can ignore the rest ( , ).
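A minimal sketch of such a program, using only the standard library `re` module as a stand-in for a real stream parser, might look like the following. The function name `feature_sequence`, the tags in the sample sentence, and the choice to look only at the first reading of each lexical unit are all assumptions made for illustration. The sketch also ignores escaped characters, which the stream format allows.

```python
import re

# One lexical unit: everything between a ^ and the next $.
LEXICAL_UNIT = re.compile(r"\^(.*?)\$")
# The first <tag> that appears in a reading.
FIRST_TAG = re.compile(r"<([^<>]+)>")


def feature_sequence(stream):
    """Return the first tag of the first reading of each lexical unit, in order."""
    features = []
    for match in LEXICAL_UNIT.finditer(stream):
        # A unit has the shape surface/reading1/reading2/...
        parts = match.group(1).split("/")
        if len(parts) < 2:
            continue  # no analysis attached to this unit
        tag = FIRST_TAG.search(parts[1])
        if tag:  # unknown words carry no tags and are skipped
            features.append(tag.group(1))
    return features


# The tags in this sample are assumed for illustration only.
sample = "^kala/kala<n><sg><nom>$ ^ja/ja<cnjcoo>$ ^peruna/peruna<n><sg><nom>$"
sequence = feature_sequence(sample)
print(sequence)
print("".join(f"<{tag}>" for tag in sequence))
```

Returning a list keeps the order of the features and makes it easy to convert to a string afterwards, as the last line does.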

You can use streamparser to parse the text that is in apertium stream format.

Please see our getting started guide for more info.

Task tags

  • python
  • streamparser
  • apertium stream format
  • morphology

Students who completed this task

nuboro

Task type

  • Code
  • Outreach / Research

2016