Workshop for PhD students and young researchers
Grand Meeting for Scandinavian Dialect Syntax
Leikanger, 25 August 2005
Generating a lexicon of Scandinavian modal verbs from a parallel corpus
Gunnar Hrafn Hrafnbjargarson, University of Oslo
Morphology and phonology can in many cases be used to figure out which words correspond to which across the Scandinavian languages. For instance, it is rather easy to figure out which Norwegian personal pronoun corresponds to which in Danish, and even in Icelandic or Faroese.
However, when it comes to prepositions and modal verbs, we cannot rely on morphology or phonology alone. For example, Norwegian and Danish måtte do not always have the same meaning, and similarly, Icelandic vilja is not used as a future modal the way Norwegian and Danish ville is.
Instead of relying on morphology or phonology, we can use parallel corpora. Unfortunately, there are not many parallel corpora that include all of the Scandinavian languages, and those that exist may not be large enough to give reliable results.
Nevertheless, to get a picture of what it could look like, I used the Danish, Faroese, Icelandic, Norwegian, Swedish and English parts of a small treebank (the Sophie Treebank) to find out which modal verbs correspond to which. I will discuss the results in my talk.
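As a rough illustration of the general idea, and not the procedure actually used on the Sophie Treebank (which this abstract does not describe), the sketch below counts which modal forms co-occur in sentence-aligned pairs. Everything in it is an assumption made for the example: the function name, the short lists of modal forms, and the three Norwegian/Icelandic sentence pairs, which are invented and are not taken from the treebank. It matches surface forms only, with no lemmatization or syntactic information.

```python
from collections import Counter

# Hypothetical, incomplete lists of modal surface forms (assumption for this sketch).
NORWEGIAN_MODALS = {"vil", "må", "kan", "skal"}
ICELANDIC_MODALS = {"vill", "mun", "má", "kann", "skal"}


def modal_correspondences(aligned_pairs, src_modals, tgt_modals):
    """Count (source modal, target modal) pairs in sentence-aligned text.

    Only sentence pairs with exactly one modal form on each side are
    counted, so that the pairing is unambiguous.
    """
    counts = Counter()
    for src_sentence, tgt_sentence in aligned_pairs:
        src_words = [w.strip(".,;:!?") for w in src_sentence.lower().split()]
        tgt_words = [w.strip(".,;:!?") for w in tgt_sentence.lower().split()]
        src_found = [w for w in src_words if w in src_modals]
        tgt_found = [w for w in tgt_words if w in tgt_modals]
        if len(src_found) == 1 and len(tgt_found) == 1:
            counts[(src_found[0], tgt_found[0])] += 1
    return counts


# Invented toy data (not from the Sophie Treebank).
toy_pairs = [
    ("Hun vil komme i morgen.", "Hún mun koma á morgun."),
    ("Hun vil ha kaffe.", "Hún vill fá kaffi."),
    ("Hun sover.", "Hún sefur."),  # no modal on either side, so it is skipped
]

counts = modal_correspondences(toy_pairs, NORWEGIAN_MODALS, ICELANDIC_MODALS)
for (src, tgt), n in counts.most_common():
    print(src, tgt, n)
```

With a corpus of realistic size, the resulting counts would give a first picture of how often one form, such as Norwegian vil, lines up with different forms in another language. With a corpus as small as the one mentioned above, such counts can only be suggestive.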
ADDED 30.08.2005:
Go to THIS PAGE to download the presentation and documents related to the talk.