Project description
1. The Scandinavian "parent project" Scandinavian Dialect Syntax (ScanDiaSyn)
The general objectives of ScanDiaSyn have been described as follows:
- to conduct a systematic and coordinated investigation of syntactic variation across Scandinavian languages and dialects
- to create databases of transcribed and tagged material generally
available and easily accessible for research through a user friendly
interface on the internet
- to initiate theoretically driven research on dialectal syntactic variation in the Scandinavian domain
- to cooperate with other existing dialect syntax projects in
Europe and elsewhere so as to enhance the understanding of linguistic
diversity and microvariation at a general level
There are 9 groups of researchers involved in the Scandinavian project
(Tromsø (NO), Aarhus (DK), Reykjavík (IS), Tórshavn (FO), Trondheim
(NO), Helsinki (FI), Copenhagen (DK), Lund (SW) and Oslo (NO)) and the
central idea is that they will all be planning and conducting research
on syntactic variation in their respective languages and dialects and
that this research will be "systematic and coordinated" in the sense
that the methods of data collection, elicitation and storage (in
databases) will be comparable and compatible to the extent possible and
feasible. While all kinds of "syntactic variation" are in principle of
interest for the project, the following overarching topics have been
outlined at preparatory meetings for the ScanDiaSyn as a whole (they
are described here using common but fairly technical terms that are
largely taken from generative syntax but to some extent going back to
the Danish linguist Diderichsen in the 1940s). It should be noted that
this division or grouping together of topics is simply made for
convenience – it has no particular theoretical status:
- The "pre-field" in the sentence or clause (“forfeltet”, CP,
Left Periphery), for example: subject/verb inversion (V2), question
formation, complementizers, topicalization etc.
- The "mid-field" (“midtfeltet”, IP), for example: V-to-I
movement, Object Shift, placement of adverbs/particles, subject/verb
agreement, finiteness, case marking of subjects, expletive
constructions etc.
- The "end-field" (“sluttfeltet”, VP), for example: the
syntax of verb particles, VP-syntax, complementation,
subcategorization, case marking of objects etc.
- Extraposition, for example: heavy NP/CP shift, Left Dislocation, Hanging Topics, tags and pragmatic particles etc.
- Nominal expressions, for example: article syntax, possessor constructions, pronominal systems, binding etc.
Although the topics listed here are very general (and partially
overlapping), this broad thematic organization is nevertheless quite
likely to be useful when the findings of ScanDiaSyn are applied to a
wider European context, i.e. when the syntax of the Scandinavian
dialects is compared to the syntax of the dialects in other European
languages. There is for instance a partial overlap between the five
major ScanDiaSyn topics and the major topics of the Dutch dialect
syntax project SAND (Syntactische Atlas van de Nederlandse
Dialecten).
Now it is of course impossible for all the groups to investigate all
the topics listed above in detail. In addition, some of the topics will
not be equally interesting in all the languages or dialects – and there
may also be additional topics that are of particular interest in the
relevant language or of particular interest to individual researchers
in the group in question. Hence a complete coordination is neither
possible nor desirable. Furthermore, the "state of the art" will
necessarily differ somewhat from one language area to another,
as will the availability of competent researchers. Broadly speaking,
however, each of the research groups will choose from the list a few
phenomena that they will be particularly occupied with, possibly in
addition to topics that will be of comparative interest for members of
other groups or topics that are of special "internal interest", as it
were. In this way each group will both carry out comparative
research, partly with the special interests of the other groups in mind,
and collect material for its own purposes.
Considerable parts of the material collected or prepared in the project
will be pooled in a joint database. The research group in Oslo will
have the responsibility of building up the database and developing the
technical tools and applications needed. This work will obviously
to some extent be based on previous work done in comparable projects
elsewhere, such as the methods used in the presentation of SweDia
2000 and work on the linguistic tagging and
preparation of databases and corpora in the different countries, both
of spoken and written language, including the work done in Iceland in
that area (see "State of the Art" in 14 below).
Although the topics described above are largely couched in technical
terms from generative syntax (partly to ease comparison with other
projects, partly because the original initiative for the Scandinavian
project came from generative syntacticians), there are linguists of
very different persuasions and with quite varying interests in the 9
groups in the ScanDiaSyn project (see the network groups at the
ScanDiaSyn homepage): some are generative syntacticians, some
are sociolinguists, some are specialists in language technology, some
are dialectologists, some specialize in corpus linguistics, others in
conversational analysis or interactional grammar, etc. The leading idea
is that the different types of linguists represented can benefit from
working together and learn from each other. Thus the syntacticians can
for instance define some of the linguistic variables to be investigated
and formulate some of the linguistic hypotheses to be tested, the
sociolinguists can provide insights into data elicitation and ways of
relating linguistic variation to social variables, the language
technologists know how to prepare and handle databases, etc. This way
we hope that the different types of linguists and technologists can
learn from each other and broaden their horizons. Thus the
syntacticians will realize, for instance, that there is more to
(scientific) life than traditional syntactic research and the typical
introspective procedures of judging sentences (see also the discussion
of "state of the art" in 14 and of methodology in 17 below), the
sociolinguists will learn something about linguistic hypotheses and
ideas about universal grammar, the language technicians will be able to
make better use of linguistic concepts and tools in their work, those
who work within the framework of interactional grammar will realize
that they can benefit from some of the insights of theoretical syntax,
etc. The make-up of the groups is quite different, however, and
influenced by the availability of experts in each field and the
relevant work that has already been done. This will be described in
some detail for the Icelandic group below (and for the Faroese one, to
the extent that it is relevant here).
Most of the groups have access to material that has previously been
collected and to some extent even analyzed or tagged. This material
will be investigated and made use of. But since many syntactic
constructions and phenomena are infrequent in actual conversation and
written texts, one generally needs a much bigger corpus for conducting
syntactic investigations than is the case with the study of
phonological and morphological phenomena. Hence it is in general not
enough for the purposes of syntactic investigations to collect
spontaneous speech data or consider written texts. Although one will of
course get more types and tokens of the different syntactic
constructions as the size of the corpus is increased, it is simply
impossible to compile a corpus that is big enough to do justice to all
conceivable syntactic constructions. Simply put, a large corpus can
tell you a lot about which constructions are possible but it does not
really tell you which ones are impossible. Hence the syntactician has
to rely on other methods too, such as questionnaires,
other written or oral tasks, etc. That way it is often possible to get
a more direct channel towards revealing the rules governing syntactic
phenomena. At the same time, such elicitation runs the risk of becoming
artificial to some extent, but there are various ways of trying to
minimize that problem (see e.g. Schütze 1996, Bard et al. 1996, Cowart
1997, Cornips and Poletto 2004 and references cited there for relevant
discussion – see also Rickford 1987). This problem has been extensively
discussed at preparatory meetings for ScanDiaSyn. One method which aims
at "getting the best of both worlds" has been developed in the Dutch
dialect syntax project SAND (cf. above). When this method is used, the
linguists develop a written questionnaire covering the various
phenomena that they are interested in. Then interviews are conducted,
centering around the questionnaires. When there are considerable
dialectal differences, as was often the case in the Dutch project
and will sometimes be in Mainland Scandinavia, it will be preferable to
have two informants speaking the same dialect discuss the topics
described in the questionnaire. The interview, or discussion, is
recorded and the basic idea is that the material will then contain both
judgments of syntactic examples and spontaneous speech (see also
the discussion of "methodology" in 17 below).
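The point made above about corpus size can be illustrated with some simple arithmetic. The following is a minimal sketch only, not part of the project's methodology: the function names, the frequency figure and the assumption that tokens of a construction occur independently at a constant rate (a Poisson model) are all hypothetical and chosen purely for illustration.

```python
import math


def expected_tokens(corpus_words: int, rate_per_million: float) -> float:
    """Expected number of tokens of a construction in a corpus of the given size."""
    return corpus_words * rate_per_million / 1_000_000


def prob_unattested(corpus_words: int, rate_per_million: float) -> float:
    """Chance that the construction never occurs in the corpus.

    Assumes (hypothetically) that tokens occur independently at a
    constant rate, i.e. a Poisson model.
    """
    return math.exp(-expected_tokens(corpus_words, rate_per_million))


# Hypothetical rate: two tokens of the construction per million words.
for size in (100_000, 1_000_000, 10_000_000):
    print(size, expected_tokens(size, 2.0), round(prob_unattested(size, 2.0), 3))
```

Under these assumptions a rare but fully grammatical construction can easily be absent from a corpus of considerable size, which is why absence from a corpus is weak evidence that a construction is impossible and why elicitation is needed as well.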
In addition to collection of new material, the plan is to include
existing Scandinavian material in the ScanDiaSyn database, such as
existing collections of spontaneous speech. Most of the research groups
have access to such material, including the SweDia material already
mentioned. The SweDia material has already been offered to the
ScanDiaSyn project for use. However, these recordings have only been
transcribed to a very limited extent and it will be the task of the
Lund and Helsinki groups within ScanDiaSyn to transcribe this material.
For Danish dialects there already exists a tagged dialect syntax corpus
of approx. 1 million words (CorDiale), covering 150 measuring points. The
Norwegian groups will also attempt to include existing Norwegian
dialect material in the archives to the extent possible and feasible.
The availability of Icelandic and Faroese material of this kind will be
described below.
2. The Icelandic (and Faroese) project
2.1 Collection of new data in Iceland and the Faroes
The topics that are of particular interest for the Icelandic (and
Faroese) groups are to some extent determined by properties of the
language and the particular linguistic situation. The points that need
to be taken into consideration include the following (Faroese is
included in many instances below because of the close connection
between the Icelandic and Faroese groups and because of the special
comparative interest that Faroese has for Icelandic):
- Icelandic and Faroese have richer inflectional morphology than
the Mainland Scandinavian languages. Hence constructions involving
morphosyntactic phenomena, such as case and agreement for instance,
might be of special comparative interest here.
- Icelandic has preserved various syntactic features that have
disappeared in the Mainland Scandinavian languages. Faroese often
occupies a middle ground in this respect, making comparison between
Icelandic, Faroese and Mainland Scandinavian especially interesting.
This is true of subject case marking and stylistic fronting, for
instance.
- Although there are some known dialectal differences in Icelandic
and Faroese, they are relatively minor and people are in general not
aware of dialectal differences in syntax. This does not mean that they
do not exist, however, and we fully expect to discover some that have
hitherto been unknown.
- To the extent that syntactic variation is known in Icelandic (and
Faroese), or has been studied, it seems to be connected to age groups
and social variables rather than to particular geographical areas.
- The linguistic communities in Iceland and the Faroes are much
smaller than those of the Mainland Scandinavian countries. While there
are some 9 million speakers of Swedish and 4.5 million speakers of
Norwegian, for instance, there are fewer than 300,000 speakers of
Icelandic and fewer than 50,000 speakers of Faroese.
- Although there are certain similarities between the (official)
language policies in Iceland and the Faroes, e.g. with respect to
emphasis on the creation of new words and opposition to (English)
loans, certain syntactic variants have been frowned upon in Iceland but
not in the Faroes. In general it seems that there is a greater
tolerance with respect to linguistic changes that cannot be traced
directly to foreign influence in the Faroes than there is in Iceland.
This makes certain predictions about the relationship of certain
variants to social class in Iceland on the one hand and in the Faroes
on the other (cf. below).
- The difference between "dialect" and "standard language" does not
really exist in Iceland and the Faroes to the extent that it does in
most countries. People do not switch between "speaking dialect" and
"speaking the standard language" to the extent that they do in most
other countries. This is important because it facilitates data
elicitation to some extent: The investigator does in general not have
to worry about not speaking the same syntactic dialect as the informant
– or speaking the standard language as opposed to some dialect.
- It is generally assumed, on the other hand, that there is
considerable difference between "spoken language" and "written
language", or between different types or styles of written language, or
different genres of texts, although systematic investigation of these
differences is just beginning.
Based on the linguistic situation described above, and on (preliminary)
results of the pilot study described, it is likely that the syntactic
constructions that will be of special interest for the Icelandic (and
Faroese) group in this connection will include some of the ones
listed below. Most of these are constructions that we have reason to
believe to show variation within Icelandic (and Faroese) but some are
mainly included since they may be of particular comparative interest
for linguists elsewhere in Scandinavia. First we list some
constructions where previous research has indicated interesting
variation that can profitably be studied with the aid of written
questionnaires, at least to some extent:
1. Subject case (including the change from oblique to nominative subject, and from accusative to dative ...)
2. The "new impersonal" construction (also known as "the new passive")
3. Extended progressive aspect (extension to new
semantic classes of verbs; possibly involving some change in the
semantics of the construction itself)
4. Long distance reflexives and their relation to the subjunctive.
5. Tense and mood in embedded clauses.
6. Agreement with nominative objects.
7. Possessive constructions and the structure of the extended NP.
8. Loss of case in topicalization structures.
9. Object case.
10. Complex pronominal constructions (each other ...)
11. Expletive constructions
12. Stylistic fronting.
13. Impersonal verbs in control constructions.
14. Tough-movement.
Results of a pilot study indicate, however, that constructions like the
following are judged differently (usually more positively) in oral
interviews than in written responses to questionnaires (partly because
here factors like stress and intonation play a role):
15. Position of adverbs in embedded clauses.
16. Complementizer deletion.
In addition, it seems that certain constructions that have been frowned
upon in schools need to be studied in oral interviews, at least when
adult subjects are involved. These include #1 and #2 above.
2.2 The use of available databases and corpora in Iceland
Various projects involving databases, text collections and corpora will
be connected to or integrated into the present one in Iceland. This is
a part of the general plan for ScanDiaSyn and the situation in Iceland
makes this even more feasible than in many other places.
First, in the Icelandic spoken language project, ÍSTAL, a corpus of
spoken Icelandic was established, based on some 15 hours of
spontaneous natural conversations (31 conversations in all). The
material has been transcribed, using conventional orthography and
including various symbols for marking conversational features such as
hesitations, repetitions, overlapping, interruptions, etc. Methods
developed in work on the Swedish Spoken Language Corpus in Gothenburg
and the British National Corpus were employed in the transcription to
the extent feasible. The corpus has already proved its usefulness in
several areas of research and teaching as it contains information on
aspects of spoken Icelandic never recorded before. The corpus, or at
least parts of it, could obviously be profitably included in the
planned database of ScanDiaSyn, as the inclusion of similar databases
in Scandinavia is also planned. Before this can be done, however, it
needs to be tagged. An Icelandic tagger has been developed (a project
supported by the Language Technology Program of the Icelandic Ministry
of Education) and it could in principle be used on this material. Since
the tagger has been trained exclusively on various kinds of "written"
(as opposed to "transcribed spoken") Icelandic, it would have to be
retrained on this kind of material. That would in itself be a valuable
addition to the tagger project and at the same time this training, or
the mistakes that the tagger would make when applied to this corpus,
would give important information about the differences between the
written and spoken variants of Icelandic. - In addition, it is
necessary to remove various kinds of personal and sensitive
information. Finally, it has to be integrated into the system
which is being developed at Tekstlaboratoriet in Oslo in connection
with ScanDiaSyn.
Second, Finnur Friðriksson, lecturer at the University in Akureyri, has
recorded some 30 hours of spontaneous spoken Icelandic (2-4
participants in each conversation, conversations from 9 different
places in Iceland, 12 subjects from each place). Finnur has collected
this material in connection with his dissertation project at the
University of Gothenburg. In his dissertation he has been studying the
distribution and frequency of various (recent or famous) syntactic
phenomena in Icelandic, including the so-called Dative Sickness (change
from accusative to dative case on the subject of certain verbs) and the
New Impersonal Construction (or New Passive), cf. items 1 and 2 on the
list in 2.1 above. He is a member of the Icelandic ScanDiaSyn group and
is willing to have some of his corpus included in the ScanDiaSyn
database if his informants agree. Before that can be done, the corpus
has to be scanned for sensitive or personal information, tagged etc. It
would obviously be an important addition to the database – and the
application of the Icelandic tagger to this kind of material could also
give important information about the characteristics of spoken vs.
written Icelandic.
Third, work is under way on the creation of a large tagged corpus of
Icelandic texts of different kinds, ranging from various kinds of
books, newspapers, journals and reports to texts from the
Internet. The work on this project will be supported by the Language
Technology Program of the Icelandic Ministry of Education and it is
being hosted by Orðabók Háskólans (The Icelandic Dictionary Project).
As there are undoubtedly interesting syntactic differences between
spoken and written Icelandic, as well as between different types of
texts, access to this corpus will be an important asset to the project
at hand. The tagger that has been developed will mark part of speech,
case, number, gender, tense, mood, etc. But if the corpus is to be
really useful for a syntactic project like the present one, it would
have to be parsed syntactically, giving information about subject,
object, verb phrase, etc. And this brings us to the fourth project.
Fourth, a syntactic parser is being developed by the private company
Friðrik Skúlason. That project is also being supported by the Language
Technology Program of the Icelandic Ministry of Education. Until now
this parser has been trained almost exclusively on typical
newspaper material and it would be very important for the project to
get the opportunity to try it out on various kinds of (tagged) texts.
Applying this parser to the different kinds of texts described above
would at the same time yield interesting information about the
differences between various text types, and even between typical
written language and transcribed spoken language, since the parser will
almost certainly yield interesting but wrong results when applied to a
transcribed corpus of spontaneous speech. Hence Friðrik Skúlason
is connected to the present project (cf. below), as this promises to be
a symbiotic relationship.
As should be clear from this, the present project brings together
investigators from various areas in Iceland. It attempts to make better
use of their expertise and the resources that they have developed, or
are developing, by creating a large umbrella project for them to
cooperate in and share their work and ideas. Hopefully, the results
from the syntactic investigations will shed some light on the syntactic
nature of different texts and thus contribute to the improvement of the
corpora. Conversely, syntactic data discovered in the different
databases and corpora will undoubtedly raise new syntactic questions
which can then be investigated further by using questionnaires and
interviews.
2.3 The partial inclusion of Faroese
The numerous references to Faroese above are explained by the fact that
the "Faroese group" of the ScanDiaSyn project includes some Icelandic
researchers that have been working on Faroese in the past, namely
Höskuldur Þráinsson, Jóhannes Gísli Jónsson and Þórhallur Eyþórsson.
Part of the reason is that comparison with Faroese provides an
excellent testing ground because of the similarities between the two
languages. In addition, there are not too many research funds available
to Faroese linguists so cooperation with linguists abroad is always
welcomed by them. Hence some comparative research on selected topics in
Faroese is planned as a part of this Icelandic project (see 17 below),
but it is also hoped that other members of the Faroese group will be
successful in securing some research funds to facilitate the inclusion
of Faroese into the ScanDiaSyn project as a whole. No work on existing
databases in the Faroes is planned as a part of the present project,
for instance, but samples of Faroese texts that were scanned in
connection with a previous project will be fixed up and made accessible
on the Internet as a part of the present project (cf. 17 below).
3. Summary – and some hypotheses
The objectives of this project can then be summed up as follows:
1. To collect new data on
syntactic constructions in Icelandic in a systematic fashion with the
guidelines established by the ScanDiaSyn project in mind. The Icelandic
project has thus an important comparative feature with special emphasis
on Faroese.
2. To develop further and make new
use of various resources that have been created by previous and ongoing
research projects, including databases of spontaneous spoken Icelandic,
a tagged corpus based on a large variety of texts, and a syntactic
parser. The leading idea is that the cooperation between the different
researchers involved and the symbiotic relationship between the
projects in question should lead to an improvement of the resources in
question (the databases/corpora/parser ...) and thus make them even
more useful and usable in the future.
3. To build on previous research
on syntactic variation in Icelandic (and Faroese) and thus add to the
knowledge already established.
4. To contribute to international
cooperation between linguists and researchers in related or connected
fields.
Because of the complexity of the project and the different types of
researchers involved it is not simple to formulate testable research
hypotheses for the project in general. They will vary to some extent
from one researcher to another, depending on their theoretical
persuasions and the type of research they are mostly interested in. The
syntacticians will thus be interested in syntactic characteristics of
the variation, which kinds of variants go together, what kind of
variation (and change) should be possible within the framework of
Universal Grammar, which characteristics are likely to get lost and
why, when language is passed on from one generation to another,
etc. In addition, variation is of interest to theoretical linguists in
and of itself since it has often been claimed that there is no such
thing as "free variation" of syntactic variants (or linguistic variants
in general – see e.g. Höskuldur Þráinsson 2003 and references cited
there). The sociolinguists will be interested in the ways that the
variants can be linked to sociological differences, including variation
between male and female subjects. Still others will be interested in
trying to characterize the differences between (different kinds of)
written language on the one hand and (transcribed) spontaneous spoken
language on the other.
Given the interests, persuasions and frameworks of the researchers
involved, we can only give a couple of examples of the kinds of
hypotheses that can be formulated and tested (and they can, of course,
be either true or false!):
1. There is no geographically
"conditioned" variation in Icelandic syntax nor in Faroese syntax. When
there appears to be geographically conditioned variation, there is
always some other explanation behind it, such as a sociolinguistic or
sociological one (e.g., differences w.r.t. education or class). (The
"New Impersonal" construction in Icelandic might provide an interesting
test case.)
2. There is no "free" variation
between syntactic variants – two variants are never completely
equivalent. (Variation in embedded clause word order in Faroese might
be a case in point here.)
3. There is considerable syntactic
variation between the speech of younger and older speakers of Icelandic
and Faroese and this variation represents "ongoing changes", i.e.
changes that are spreading through the linguistic community. The
direction of some of these changes can be predicted on structural
grounds and thus we expect the development to be parallel in both
languages, although the speed of the spreading may vary. A case
in point (no pun intended) would be changes in the case marking of
subjects and objects (Nominative Substitution, Dative Substitution –
cf. e.g. Jóhannes Gísli Jónsson 2003, Jóhannes Gísli Jónsson and
Þórhallur Eyþórsson 2003a,b). Another predictable development could be
the relationship between long distance reflexivization and mood in
Icelandic: While there is (as far as we know today) a clear
relationship between long distance reflexives and subjunctive in the
speech of most speakers of Icelandic, we might expect this to change in
such a way that some speakers might be able to use long distance
reflexives in subjunctive AND indicative clauses, but we would not
expect any speakers to be able to use long distance reflexives
exclusively in indicative clauses.
4. Linguistic variation typically
stems from changes that occur when language is passed on from one
generation to the next. Hence we do not expect changes to "start out"
among the older generations or innovations to be more common in the
speech of the older generations. (Comparison of the variation in
subject case and the variation found in "the extended progressive" in
Icelandic might yield different results here.)
5. Socially conditioned
variation in syntax will be found in Icelandic and Faroese to the
extent that the variants in question have been stigmatized or are
considered "bad" by (influential elements in) the linguistic community
and hence fought against or corrected in the schools. Thus we expect to
find socially conditioned variation in the use of subject case in
Icelandic (i.e. with respect to Dative Sickness) but not in Faroese to
the same extent since the development has not really caught the
attention of the language preservers ("Dative Sickness" is not
considered an epidemic in the Faroes).
This should suffice to give an idea of some of the kinds of hypotheses
that can be formulated. By and large, the formulation will be left up
to the individual researchers involved.
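The implicational prediction in hypothesis 3 above (speakers may accept long distance reflexives in subjunctive clauses only, or in both subjunctive and indicative clauses, but not in indicative clauses only) can be stated as a simple check over elicited judgments. The sketch below is illustrative only: the data layout, the speaker labels and the function name are hypothetical and do not describe the project's actual database or questionnaire format.

```python
from typing import Dict, List

# Hypothetical layout: speaker id -> mood -> whether the speaker accepts
# long distance reflexives in clauses of that mood.
Judgments = Dict[str, Dict[str, bool]]


def counterexamples(judgments: Judgments) -> List[str]:
    """Return speakers who accept the reflexive in indicative clauses
    but not in subjunctive clauses, i.e. the pattern that hypothesis 3
    predicts should not be found."""
    return sorted(
        speaker
        for speaker, moods in judgments.items()
        if moods.get("indicative", False) and not moods.get("subjunctive", False)
    )


# Invented sample data for illustration only.
sample: Judgments = {
    "s1": {"subjunctive": True, "indicative": False},
    "s2": {"subjunctive": True, "indicative": True},
    "s3": {"subjunctive": False, "indicative": True},
    "s4": {"subjunctive": False, "indicative": False},
}
print(counterexamples(sample))
```

Any speaker returned by such a check would count as evidence against the predicted direction of change; an empty result would be compatible with it.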
References
Bard, E.G., Robertson, D. and Angelica Sorace. 1996. Magnitude estimation of linguistic acceptability. Language 72: 32-68.
Cornips, Leonie. 2000. Spontaneous Speech Data Compared to Elicitation
Data: The Test Effects. A paper presented at a workshop on
Syntactic Microvariation, Meertens Institute, August 30-31, 2000.
(Accessible at: http://www.meertens.knaw.nl/projecten/sand/sandworkshop/cornips.html).
Cowart, Wayne. 1997. Experimental Syntax. Applying Objective
Methods to Sentence Judgments. Sage Publications, Thousand Oaks.
Cornips, Leonie, and Cecilia Poletto. 2004. On
standardising syntactic elicitation techniques (part 1). Lingua.
Höskuldur Þráinsson. 2003. Syntactic Variation, Historical Development,
and Minimalism. In Randall Hendrick (ed.): Minimalist Syntax, pp.
152–191. Blackwell, Oxford.
Jóhannes Gísli Jónsson. 2003. Not so Quirky: On Subject Case in
Icelandic. In Ellen Brandner and Heike Zinsmeister (eds.): New
Perspectives on Case Theory, pp.127-163. CSLI Publications,
Stanford.
Jóhannes Gísli Jónsson and Þórhallur Eyþórsson. 2003a. The Case of
Subject in Faroese. Working Papers in Scandinavian Syntax 72:207-231.
Jóhannes Gísli Jónsson and Þórhallur Eyþórsson. 2003b. Breytingar á
frumlagsfalli í íslensku. [Changes in Subject Case in Icelandic.]
Íslenskt mál og almenn málfræði 25:7-40.
Rickford, John. 1987. The Haves and Have Nots: Sociolinguistic Surveys
and the Assessment of Speaker Competence. Language in Society 16:
149-177.
Schütze, Carson T. 1996. The Empirical Base of Linguistics.
Grammaticality judgments and linguistic methodology. Chicago: The
University of Chicago Press.