Research on how to represent feature sequences

Apertium

Research how to represent feature (verb (<vblex>), noun (<n>), etc.) sequences in the Python programming language. One way of representing a feature sequence is a string. With a string you can represent a sequence of features like this: "<det><n><vblex><adj>" (e.g. the car is fast). Another way is a list, for example in Python: ["det", "n", "vblex", "adj"]. The order of the features needs to be retained in a feature sequence, so an unordered data structure such as Python's 'set' can't be used here.
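The two representations can be compared with a short sketch. The variable names are illustrative, and the angle-bracket string form is an assumption based on how tags are written in the Apertium stream format.

```python
import re

# List representation: order and repeated features are both preserved.
features = ["det", "n", "vblex", "adj"]

# String representation: each feature wrapped in angle brackets (assumed form).
as_string = "".join(f"<{name}>" for name in features)

# The string can be converted back into a list without losing anything.
round_trip = re.findall(r"<([^<>]+)>", as_string)

# A set is unsuitable: it has no order and collapses repeated features.
repeated = ["n", "cnjcoo", "n"]
distinct_count = len(set(repeated))  # 2, although the sequence has 3 items

print(as_string)
print(round_trip)
print(distinct_count)
```

A list is usually easier to index and compare element by element, while the string form is compact and can be used directly as a dictionary key. A tuple offers both properties if the sequence does not need to change.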

Make a test program in Python that takes text formatted in the Apertium stream format and stores the feature sequence found there in a data structure of your choosing (string, list, hash, your own object, etc.).

You can use this morphologically analyzed Finnish sentence to test your program: "^kala/kala<n><sg><nom>$ ^ja/ja<cnjcoo>$ ^peruna/peruna<n><sg><nom>$". If you use that sentence as input, the program's output in string representation should be "<n><cnjcoo><n>". Note that you only need the first feature of each reading in the feature sequence, e.g., if you have the reading kala/kala<n><sg><nom>, you only need to consider the feature <n> and can ignore the rest (<sg>, <nom>).

You can use streamparser to parse the text that is in apertium stream format.
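As a dependency-free alternative for experimenting, the extraction step can be sketched with the standard `re` module. This is a minimal sketch, not streamparser's API: the function names `feature_sequence` and `to_string` are hypothetical, the tags in the sample input (`<n><sg><nom>`, `<cnjcoo>`) are assumed typical analyses for these words, and only the first reading of each lexical unit is inspected. Escaped characters and other edge cases of the stream format are not handled.

```python
import re

# A lexical unit looks like ^surface/reading1/reading2$ (simplified: no escapes).
LEXICAL_UNIT = re.compile(r"\^(.*?)\$")
# A tag (feature) is written between angle brackets, e.g. <n>.
TAG = re.compile(r"<([^<>]+)>")


def feature_sequence(stream):
    """Return the first feature of the first reading of each unit, in order."""
    features = []
    for unit in LEXICAL_UNIT.finditer(stream):
        parts = unit.group(1).split("/")
        if len(parts) < 2:
            continue  # no analysis present for this unit
        first_tag = TAG.search(parts[1])
        if first_tag:  # unknown words such as *foo carry no tags and are skipped
            features.append(first_tag.group(1))
    return features


def to_string(features):
    """Render a feature list in the angle-bracket string form."""
    return "".join(f"<{name}>" for name in features)


# Sample input; the tags are assumptions made for this sketch.
sample = "^kala/kala<n><sg><nom>$ ^ja/ja<cnjcoo>$ ^peruna/peruna<n><sg><nom>$"

print(feature_sequence(sample))             # ['n', 'cnjcoo', 'n']
print(to_string(feature_sequence(sample)))  # <n><cnjcoo><n>
```

Keeping the list as the internal structure and converting to a string only for output makes it easy to switch representations later. For real input, a proper stream parser is the safer choice because it deals with ambiguity and escaping.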

Please see our getting started guide for more info.

Task tags

python
streamparser
apertium stream format
morphology

Students who completed this task

nuboro

Task type

Code
Outreach / Research