
Continuing from here:

http://www.EnglishForward.com/English/MeOrI/5/cdpng/Post.htm

MrP


Bokeh

All those occur on both sides of the equation. In maths/statistics, when that happens, such effects are considered cancelled.

How do you know they occur equally on both sides, out of interest?

MrP



What do you mean by totals? Using the number of results Google gives on the first page? If so, then that has no pros at all, in my opinion, and I told you why. Just look at the example I gave earlier: if you don't check the real number of pages, you'll see that one string gets 3,380 hits and the other 94,400, and you'll think that the latter is more common on the net. There are a lot of examples like that one; just try some searches.

As for totals: as you said, we need to limit results in order to be able to check them. Oh, OK, I'll give a little example of what I do. The first one who starts laughing is a dead man, is that clear? LOL

... or

Us girls are... Hmmm:

We girls are

"us girls are" site:www.myspace.com ---> 1,820 results indicated on the first page ---> 69 real results

"we girls are" site:www.myspace.com ---> 139 results indicated on the first page ---> 20 real results

So "us girls are" seems 13 times more common according to the data on the first page, but it is actually 3.5 times more common.
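The two ratios above are simple arithmetic on the quoted counts. A minimal Python sketch using only the figures from the post (the `counts` layout and the `ratio` helper are just illustrative names):

```python
# Counts quoted above for the site:www.myspace.com searches.
# "indicated" = the estimate Google shows on the first page;
# "real" = the number of results you can actually page through.
counts = {
    "us girls are": {"indicated": 1820, "real": 69},
    "we girls are": {"indicated": 139, "real": 20},
}


def ratio(kind: str) -> float:
    """How many times more common "us girls are" looks than "we girls are"."""
    return counts["us girls are"][kind] / counts["we girls are"][kind]


indicated_ratio = ratio("indicated")  # 1820 / 139
real_ratio = ratio("real")            # 69 / 20

print(f"first-page estimate: {indicated_ratio:.1f} times more common")
print(f"real results:        {real_ratio:.2f} times more common")
```

This gives roughly 13.1 and 3.45, which are the "13 times" and "3.5 times" figures in the post.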

Now, what can you say about those real results? Well, you should check them. Not all of them, but when there aren't too many, it is easier to take a quick look and see whether most of them are relevant. Then you should check some pages to see the context they are in. For example, in blogs, it's a good idea to check the profile and see whether that kind of English could come from a native speaker. I've found a lot of weird things on myspace, but I then realized they came mainly from non-natives.

Conclusion: I don't know if there is anything to conclude, but it's clear that there are a lot of native speakers who would naturally say

"...hope you boys show up in your big boy pants, cause

Freakin sweeeeet...us girls are gonna bring the house down!"

Kooyeen

Bokeh

For instance, on the first page for "us three are", I find:

On the first page for "we three are", on the other hand, I find 10 cases where "we three are" means exactly that. (Remarkably, the result is the same for page 2 in each case.)

If we assume that this is a fair representation of the distribution, then, allowing for the fact that my searches only show 18,100 hits for "we three are" and 12,100 for "us three are", and excluding the self-referential hit (though including the link), genuine cases of "us three are" amount to only 16% of the total, rather than the 40% that the "it all evens out" method would derive from the same figures.
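The 40% figure is the raw share of hits, 12,100 / (12,100 + 18,100). The 16% figure comes from discounting the "us three are" hits that are not genuine, but the thread does not give the exact fraction of genuine hits. The sketch below therefore works backwards: assuming, as the first pages suggest, that every "we three are" hit is genuine, it solves for the fraction of genuine "us three are" hits that would produce a 16% share. The function names are hypothetical.

```python
US_HITS = 12_100  # hits shown for "us three are"
WE_HITS = 18_100  # hits shown for "we three are"


def naive_share(us_hits: int, we_hits: int) -> float:
    """Share of "us three are" if every hit is taken at face value."""
    return us_hits / (us_hits + we_hits)


def adjusted_share(us_hits: int, we_hits: int,
                   us_genuine: float, we_genuine: float) -> float:
    """Share after scaling each total by the fraction of hits that are genuine."""
    us = us_hits * us_genuine
    we = we_hits * we_genuine
    return us / (us + we)


def implied_genuine_fraction(target_share: float, us_hits: int, we_hits: int,
                             we_genuine: float = 1.0) -> float:
    """Fraction p of "us three are" hits that must be genuine to give target_share.

    Solves target = us*p / (us*p + we*q) for p, with q = we_genuine.
    """
    return target_share * we_hits * we_genuine / (us_hits * (1 - target_share))


print(f"naive share: {naive_share(US_HITS, WE_HITS):.0%}")
p = implied_genuine_fraction(0.16, US_HITS, WE_HITS)
print(f"genuine fraction implied by a 16% share: {p:.3f}")
print(f"check: {adjusted_share(US_HITS, WE_HITS, p, 1.0):.0%}")
```

With these figures, about 28.5% of the "us three are" hits would need to be genuine. That number is an inference from the quoted totals under the stated assumption, not a count reported in the thread.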

Interestingly, "us three are" site:www.EnglishForward.com returns 2 different occurrences, while "we three are" site:www.EnglishForward.com returns 2 versions of the same occurrence. So even in the microcosm of our original thread, the googles are erratic.

As I say, I don't deny that it's a useful tool; but we can't take the totals at face value.

MrP

MrPedantic

Yep, that must be true but, as I told you, if you try some searches, you'll notice that the number of results on the first page is definitely not a linear function of the real results. Plus, you'll notice that it is not even a function of the real results. The number of results shown on the first page must depend on a lot of things (the number of links to certain pages, page rankings, type of website, etc.), not only on the number of real results. It is probably a function of all those variables, $f(x_1, x_2, x_3, \ldots, x_n)$, but it's not a function of the number of real results $n_r$ alone (i.e., not simply $f(n_r)$). And even if it were, it would definitely be non-linear, so it would be difficult to understand and compare the results unless you knew the function. And even if you wanted to find that non-linear function, you could draw a graph of real results vs. estimated results, but then you'd have no way to go on when you reach 999 (Google only shows 999 real results at most). So "big numbers" would make no sense at all in any case.

Kooyeen

Likewise, in some strange but not exactly equivalent way, for lookups of "I is" vs. "I am": whatever 'unfairness' is built into the one search is built into the other search.

A business manager once said to me that he didn't care if the cost figures weren't correct, as long as they were incorrect in the same way this year as they were last year.

We don't know the exact number of stars in the universe, but compared to the planets visible from Earth, "lots" is a good enough estimate. That even holds when we take the controversy surrounding Pluto into account.

And anyway, I've learned tons of idiomatic Spanish and French by Googling to see which of two of my guesses is 'correct' for how to express some thought or another. I've checked my results with native speakers, and 95% of the time there was nothing about using Google that threw me off the correct path.

Obviously, for situations where fine distinctions are needed, like exactly how many angels can dance on this pinhead vs. that pinhead, the noise level is greater than the signal, and all bets are off -- probably.

CJ

CalifJim

I do not know much about probability et cetera, but I do think Google provides an unprecedented corpus. The question is whether it serves the purpose, that is, whether it leads the searcher to the intended destination. When I search for a phrase, I generally double-check the result with NYTimes or bbc.co.uk. Sometimes the result is consistent, sometimes not.

------------------------------------------------------------------------------

"risky roads" = 860 (google) // 0 (nyt) // 14 (bbc)

"hazardous roads" = 11,600 (google) // 6 (nyt) // 7 (bbc)

--------------------------------------------------------------------------

"the edge of the precipice" = 40,300 (google) // 9 (nyt) // 50 (bbc)

"the rim of the precipice" = 783 (google) // 0 (nyt) // 0 (bbc)

"the verge of the precipice" = 4,630 (google) // 0 (nyt) // 0 (bbc)

--------------------------------------------------------------------------

"dull sound" = 44,100 (google) // 6 (nyt) // 9 (bbc)

"drab sound" = 196 (google) // 0 (nyt) // 1 (bbc)

"monotonous sound" = 14,300 (google) // 1 (nyt) // 4 (bbc)

--------------------------------------------------------------------------

I take a tentative approach to Google results. More examples could be given, but the numbers above show that site-restricted results (NYT, BBC) are not always consistent with unrestricted ones.
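One way to make the inconsistency concrete is to compare, for each group of alternatives, the share each phrase gets in each source and which phrase comes out on top. A minimal sketch using the counts above (the data layout and the `top_phrase_per_source` name are illustrative):

```python
# Counts from the post: phrase -> (google, nyt, bbc)
groups = {
    "roads": {
        "risky roads": (860, 0, 14),
        "hazardous roads": (11600, 6, 7),
    },
    "precipice": {
        "the edge of the precipice": (40300, 9, 50),
        "the rim of the precipice": (783, 0, 0),
        "the verge of the precipice": (4630, 0, 0),
    },
    "sound": {
        "dull sound": (44100, 6, 9),
        "drab sound": (196, 0, 1),
        "monotonous sound": (14300, 1, 4),
    },
}
SOURCES = ("google", "nyt", "bbc")


def top_phrase_per_source(group: dict[str, tuple[int, int, int]]) -> dict[str, str]:
    """For each source, the phrase with the most hits in this group."""
    return {
        source: max(group, key=lambda phrase: group[phrase][i])
        for i, source in enumerate(SOURCES)
    }


for name, group in groups.items():
    print(name)
    for i, source in enumerate(SOURCES):
        total = sum(counts[i] for counts in group.values())
        shares = ", ".join(
            f"{phrase}: {counts[i] / total:.0%}" for phrase, counts in group.items()
        )
        print(f"  {source:>6}: {shares}")
    print("  most frequent:", top_phrase_per_source(group))
```

In this sample the three sources agree on the most frequent phrase for the precipice and sound groups, but not for the roads group, where the BBC figures favour "risky roads". With counts as small as the NYT and BBC ones, a handful of articles can flip the result.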

Linguaphile

Bokeh

MrPedantic