Wikimedia

Add reading of the HTML "lang" property to html-metadata's parseGeneral method

Claim this task at https://phabricator.wikimedia.org/T148303

We use the Node.js html-metadata library to scrape metadata from webpages. This metadata is then used by citoid to generate a citation for the webpage. Editors can use this citation as a reference when they're writing or editing an article on Wikipedia.

For this you will need to install node and npm: https://www.joyent.com/blog/installing-node-and-npm/.

You will also need to fork the html-metadata library on GitHub: https://help.github.com/articles/fork-a-repo/

For this task, you need to make html-metadata's parseGeneral method read the "lang" property of the html tag. For example, the website http://mno.hu/migr_1834/tellerlevel-cafolat-es-cafolat-700280 puts lang="hu" in its html tag, but this is not currently read by the parseGeneral function in the library: https://github.com/wikimedia/html-metadata

  • Add the metadata to the parseGeneral method
  • Update the static test html files to contain the lang property in the html tag
  • Add a live test using the example website above to ensure the lang property is being scraped
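As a starting point, here is a minimal sketch of how the lang value could be read. It assumes the page has already been loaded with cheerio, so that chtml is the selector function returned by cheerio.load(html). The helper name readLang is hypothetical and is not part of html-metadata; how the value is wired into parseGeneral depends on the library's current code.

```javascript
// Hypothetical helper, not part of html-metadata.
// `chtml` is assumed to be a cheerio instance, e.g. cheerio.load(htmlString).
function readLang(chtml) {
    // attr() returns undefined when the html tag has no lang attribute
    const lang = chtml('html').first().attr('lang');

    // Treat a missing or blank value as "no language declared"
    if (typeof lang !== 'string' || lang.trim() === '') {
        return undefined;
    }
    return lang.trim();
}
```

With a page whose html tag carries lang="hu", such as the example website above, this helper should return "hu". A static test file can cover the same case by adding a lang property to its html tag.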

Task tags

  • node.js
  • javascript

Students who completed this task

Filip

Task type

  • Code

2016