Add reading of the html "lang" attribute to html-metadata's parseGeneral method
Claim this task at https://phabricator.wikimedia.org/T148303
We use the Node.js html-metadata library to scrape metadata from webpages. This metadata is then used by citoid to generate a citation for the webpage. Editors can use this citation as a reference when they are writing or editing an article on Wikipedia.
To work on this task, you will need to install Node.js and npm: https://www.joyent.com/blog/installing-node-and-npm/
You will also need to fork the html-metadata library on GitHub: https://help.github.com/articles/fork-a-repo/
For this task, you need to make html-metadata's parseGeneral method read the "lang" attribute of the html tag. For example, the website http://mno.hu/migr_1834/tellerlevel-cafolat-es-cafolat-700280 sets lang="hu" in its html tag, but the parseGeneral function in the library https://github.com/wikimedia/html-metadata does not currently read it. The task has three parts:
- Read the lang attribute in the parseGeneral method and add it to the metadata the method returns
- Update the static test html files to contain the lang attribute in the html tag
- Add a live test using the example website above to ensure the lang attribute is being scraped
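
As a starting point for the first item, here is a minimal sketch of how the lang attribute could be read. It assumes that parseGeneral receives a cheerio-style selector function (called chtml here), and the helper name readHtmlLang is hypothetical and not part of the library. How the value is added to the object that parseGeneral returns depends on the existing code, so check the library source before copying this.

```javascript
'use strict';

/**
 * Hypothetical helper (not part of html-metadata): returns the value of
 * the lang attribute on the <html> element, or undefined when the
 * attribute is missing or empty.
 *
 * Assumption: `chtml` is a cheerio-style selector function, such as the
 * one returned by cheerio.load(htmlString).
 */
function readHtmlLang(chtml) {
    // Select the root <html> element and read its lang attribute.
    const lang = chtml('html').first().attr('lang');

    // Treat a missing or whitespace-only attribute as "no value".
    if (typeof lang !== 'string' || lang.trim() === '') {
        return undefined;
    }
    return lang.trim();
}
```

With cheerio installed, calling readHtmlLang(cheerio.load('<html lang="hu"></html>')) should give 'hu'. Returning undefined for a missing attribute is a design choice of this sketch, so that pages without a lang attribute do not add an empty entry to the metadata.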