I've been reading up on corpus linguistics and have started looking around at the COCA and BNC corpora online (as well as the Lexical Tutor website). I'm still trying to figure out how to use all the tools they have, but in the meantime I wanted to ask if those corpora would be the right place to go to help figure out some of the usage patterns?

Here are some examples from my classes this week:

1) Trousers. One student saw the word 'trousers' in a dictionary and used that word in class to refer to pants. Here in Canada the term trousers is almost never used.

2) When talking about daily routines, many students like to say 'When do you go to sleep?' Although it's not wrong, the majority of people here would say 'go to bed'.

3) Finally, when discussing the difference between 'wake up' and 'get up' (again, here in Canada), I noted that there is no real difference (although 'wake up' might be used in more formal situations).

I know these are only hunches and I have no proof to back up what I believe to be true (and I acknowledge that I might actually be wrong!), which is why I would love to have corpus data to prove qualitatively what I 'feel' to be true.

Do you think that this is possible with a corpus? Are there tools that can shed some light on these usage patterns? If so, does anyone have any good resources on how to use COCA or the Lexical Tutor? (I've tried YouTube, but there isn't a lot out there!)


Thanks kindly,


David

+0
DC Foster: I'm still trying to figure out how to use all the tools they have, but in the meantime I wanted to ask if those corpora would be the right place to go to help figure out some of the usage patterns?

Here are some random thoughts:

I do not know if there are separate corpora for dialects of English other than the two major strains: American and British.

Canadian English tends to follow British conventions in spelling, although much of its everyday vocabulary is closer to American English.

Indian English has a lot of vestiges of Victorian British English. And there is a large variation in dialects in regions of England, more so than in the US.

I used to use COCA (and the British set, too), but lately I have found the Google Ngram Viewer more interesting and informative, especially for showing how the language has evolved. It has a pattern-matching syntax that I found more useful than what the corpora offer.

Another useful corpus is "fraze.it."

Of course, the initial query is only the starting point; you have to get to the individual quotations for any reasonable analysis.
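To illustrate what getting to the individual quotations looks like, here is a minimal keyword-in-context (KWIC) sketch in Python. It is not how COCA or any of the sites above work internally; the function name `kwic`, the `width` parameter, and the sample sentences are all made up for illustration, and you would substitute your own text.

```python
import re


def kwic(text: str, keyword: str, width: int = 30) -> list[str]:
    """Return one keyword-in-context line per whole-word match of keyword."""
    # Match the keyword as a whole word or phrase, ignoring case.
    pattern = re.compile(rf"\b{re.escape(keyword)}\b", re.IGNORECASE)
    lines = []
    for match in pattern.finditer(text):
        # Take up to `width` characters of context on each side of the match.
        left = text[max(0, match.start() - width):match.start()]
        right = text[match.end():match.end() + width]
        # Right-align the left context so the keywords line up in a column.
        lines.append(f"{left:>{width}} [{match.group(0)}] {right}")
    return lines


# Made-up sample sentences, not drawn from any real corpus.
sample = (
    "I usually go to bed around eleven. "
    "My brother says he will go to sleep early, but he never does. "
    "When do you go to bed on weekends?"
)

for line in kwic(sample, "go to bed"):
    print(line)
```

Reading the matches side by side like this is what lets you judge register and context, which a raw frequency count cannot show.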

The OED, of course, is the gold standard for usage documentation (current and historical), but it is rather expensive. Your university library may have a subscription.

The best resource for contemporary dialectal usage is Wiktionary.

https://en.wiktionary.org/wiki/trousers

  • Pants is about four times more common in the US than trousers, based on use in COCA.
  • Trousers is about nine times more common in the UK than pants, based on use in BNC.
  • Slacks is about one tenth as common as pants in the US and as trousers in the UK.
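Ratios like these are simple arithmetic once you have raw counts from a corpus search. A rough sketch in Python follows; the function names are my own and the counts are placeholders for illustration only, not real COCA or BNC figures. Normalizing to a per-million rate matters when you compare across corpora of different sizes.

```python
def per_million(count: int, corpus_size: int) -> float:
    """Normalize a raw count to occurrences per million words."""
    if corpus_size <= 0:
        raise ValueError("corpus_size must be positive")
    return count * 1_000_000 / corpus_size


def frequency_ratio(count_a: int, count_b: int) -> float:
    """How many times more common word A is than word B in the same corpus."""
    if count_b == 0:
        raise ValueError("count_b must be non-zero")
    return count_a / count_b


# Placeholder numbers for illustration only; NOT real corpus figures.
example_pants = 4_000
example_trousers = 1_000
example_corpus_size = 100_000_000

print(frequency_ratio(example_pants, example_trousers))
print(per_million(example_pants, example_corpus_size))
```

Within a single corpus the plain ratio is enough; the per-million figure is the one to use when setting a COCA number beside a BNC number.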

Wordnik can be useful too:

https://www.wordnik.com/words/trousers

WordNet is no longer being updated and is fairly limited in its lexicon, but it is interesting for collocations.

https://wordnet.princeton.edu/