Statistics, correlation, causation

You might remember this simple mantra from your statistics class:

"Correlation does not imply causation."

So maybe you think you know what this phrase means.

Say you studied really hard in statistics, got a good grade, and then got into college. It's tempting to conclude that you got into college because you aced Statistics class.

While that grade, along with the skills you learned, probably helped, you can't ignore the other factors at play - and likely can't argue that your Stats grade was the cause of your acceptance into college.

First things first - why do we mistake correlation for causation?

It's easy to think that, just because two things seem related, one must be the cause of the other. But that can be a foolish and sometimes dangerous assumption.

For example, suppose you're trying to figure out what makes people less grumpy. You perform a study which finds that, when people get at least x hours of sleep a night, they're less grumpy.

But have you taken all factors into account here? Perhaps they also started working out more as a consequence of being well-rested, and this is what altered their moods.

Not all examples are quite so benign - and some are downright nonsensical.

To illustrate how misleading it can be to assume that correlation implies causation, have a look at the following graph from Tyler Vigen's Spurious Correlations:

While there happens to be a strong correlation between the two factors in that graph (arcades and CS degrees), I doubt you could effectively argue that one caused the other. Perhaps this will be a challenge for people to try and prove.

Here's another gem from Tyler's collection:

Look at that beautiful correlation. But you'd be hard pressed to argue that, just because someone ate more cheese, they'd be more likely to fatally entangle themselves in their bed sheets.

What is correlation in statistics?

According to the dictionary, a correlation is a mutual relationship or connection between two or more things (or variables) - especially one that is not expected on the basis of chance alone.

Let's use it in a sentence: The huge size of my homegrown tomatoes seems to correlate with the extra rain we had this summer.

Now, here I'm assuming that, because it rained a bit more than usual, my tomato plants went nuts and produced monster tomatoes.

But is that the only factor? What about the nutrient rich compost I used in my raised beds? What about the quality of the plants I bought from the nursery? What about my careful pruning and tending?

As you can see, although there is correlation between my large tomatoes and our rainy summer, this doesn't necessarily imply causation.
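
If you'd like to put a number on "seems to correlate," one common choice is the Pearson correlation coefficient, which runs from -1 to 1. Here's a minimal sketch in Python. The rainfall and tomato figures are made up purely for illustration, and pearson_r is just a helper name chosen for this example:

```python
import math


def pearson_r(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    # Sum of products of deviations (proportional to the covariance).
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    # Sums of squared deviations (proportional to the variances).
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / math.sqrt(var_x * var_y)


# Hypothetical numbers, invented for illustration only.
summer_rain_cm = [20, 25, 31, 38, 44, 52]
tomato_weight_g = [110, 130, 150, 175, 190, 230]

r = pearson_r(summer_rain_cm, tomato_weight_g)
print(f"r = {r:.2f}")  # a value close to 1 means a strong positive correlation
```

A value near 1 only says that the two lists rise together. It says nothing about the compost, the nursery stock, or the pruning.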

What is causation in statistics?

Time for another definition. Causation, according to the dictionary, is the act or agency which produces an effect.

Let's get a bit more specific. Causation means that there is a relationship between two events where one event affects the other. In statistics, when the value of an event - or variable - goes up or down because of another event or variable, we can say there was causation. A caused B to happen.

How about an example for this one? Perhaps you freelance for a magazine that pays by the word. The longer the story (and the more words it contains), the more you get paid.

So there's a direct correlation between how many words you write and how much you get paid. But there's also causation (because you wrote more, you got paid more).

Why is it so easy to get this wrong?

Why is it so easy to think that correlation implies causation? Well, if two things seem related, we tend to associate them and assume they impact each other. When the weather's cold, people spend more time inside. Around the holidays, shopping malls are packed. When you take some ibuprofen, your headache goes away.

While these circumstances certainly are related - and some might even imply causality - they don't necessarily stand up to scientific analysis.

There are a few reasons we might mistakenly infer causation from correlation.

What is a Confounding Variable?

First of all, you might have a confounding variable in the mix. This is a variable that affects both the independent and dependent variables in your relationship - and so confounds your ability to determine the nature of that relationship.

For example, if a new family moves into a neighborhood, and crime goes up, the residents in that area might assume it's because of that new family. But what if, at the same time, a detention center opened nearby? That's the more likely cause of the increased crime.
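
To see how a confounder can manufacture a correlation, here's a small simulation sketch. Everything in it is an assumption made for illustration: z plays the confounder, x and y each depend on z plus random noise, and neither x nor y has any effect on the other. The residuals helper (a name invented here) removes the straight-line effect of z, which is one simple way to "control for" it:

```python
import math
import random


def pearson_r(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / math.sqrt(var_x * var_y)


def residuals(ys, zs):
    """What is left of ys after subtracting a least-squares line fitted on zs."""
    n = len(ys)
    mean_y = sum(ys) / n
    mean_z = sum(zs) / n
    slope = (sum((z - mean_z) * (y - mean_y) for z, y in zip(zs, ys))
             / sum((z - mean_z) ** 2 for z in zs))
    return [(y - mean_y) - slope * (z - mean_z) for y, z in zip(ys, zs)]


rng = random.Random(0)  # fixed seed so the run is repeatable
n = 2000
z = [rng.gauss(0, 1) for _ in range(n)]    # the confounder
x = [zi + rng.gauss(0, 1) for zi in z]     # depends on z only, never on y
y = [zi + rng.gauss(0, 1) for zi in z]     # depends on z only, never on x

raw = pearson_r(x, y)
adjusted = pearson_r(residuals(x, z), residuals(y, z))

print(f"x vs y, ignoring z:        r = {raw:.2f}")
print(f"x vs y, controlling for z: r = {adjusted:.2f}")
```

In this setup the raw correlation should land somewhere around 0.5, while the adjusted one should sit near 0. The apparent link between x and y comes entirely from the variable they share.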

What is Reverse Causation?

Second, you might be dealing with reverse causation. This happens when, instead of correctly assuming that A causes B, you get them mixed up and assume that B causes A.

It might be hard to imagine how this happens, but think of how solar panels work. They produce more power when the sun is in the sky longer.

But the sun isn't in the sky longer because the panels are producing more power. The panels are producing more power because the sun shines for longer periods of time.
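
One reason this mix-up is so easy to make: a correlation coefficient is symmetric, so the number itself can't tell you which way the arrow points. Here's a quick sketch, using invented daylight and panel-output figures (assumptions for illustration, not real measurements):

```python
import math


def pearson_r(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / math.sqrt(var_x * var_y)


# Hypothetical numbers, invented for illustration only.
daylight_hours = [8.0, 9.5, 11.0, 12.5, 14.0, 15.5]
panel_output_kwh = [9.1, 11.8, 14.2, 17.5, 19.9, 23.0]

r_sun_then_panels = pearson_r(daylight_hours, panel_output_kwh)
r_panels_then_sun = pearson_r(panel_output_kwh, daylight_hours)

# Swapping the two variables leaves the coefficient unchanged.
print(math.isclose(r_sun_then_panels, r_panels_then_sun))
```

The direction has to come from what you know about the world (panels don't move the sun), not from the statistic.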

What is a Coincidence?

Third, we must not forget the power of coincidence. When two things happen to occur at the same time, it's tempting to see causation. But just like that silly graph above, with the arcades and CS degrees, many of these apparent links are just coincidences.
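
Coincidences like this get easier to find the more pairs you compare. As a rough sketch, the code below generates 200 short series of pure random noise (the counts and the seed are arbitrary choices made for this example) and then hunts for the most strongly correlated pair:

```python
import itertools
import math
import random


def pearson_r(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / math.sqrt(var_x * var_y)


rng = random.Random(42)        # fixed seed so the run is repeatable
n_series, n_points = 200, 10   # 200 unrelated series, 10 data points each
series = [[rng.random() for _ in range(n_points)] for _ in range(n_series)]

# Check every pair of series and remember the strongest correlation.
best_r, best_pair = 0.0, None
for i, j in itertools.combinations(range(n_series), 2):
    r = pearson_r(series[i], series[j])
    if abs(r) > abs(best_r):
        best_r, best_pair = r, (i, j)

print(f"strongest pair: {best_pair}, r = {best_r:.2f}")
```

With nearly 20,000 pairs to choose from, the winning pair will typically look impressively correlated even though every series is noise. Search enough data and a "beautiful correlation" is almost guaranteed to turn up.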

In the end - why do we care?

Perhaps you're trying to figure out whether a certain new drug makes patients feel better. Or you'd like to know what makes people buy a certain product.

Whatever your motivation, it's often very useful to figure out whether A causes B, along with how and why.

But as we've seen, it's not that easy. You've got to control as many factors as you can, reduce the likelihood of confounding variables and coincidences, and pare down the data to what's relevant.

We won't get into the deeper philosophical question of how we can really establish causation without a doubt. That's for another time.

At least now you know that - even though two events or variables may seem related - it doesn't mean that one has a direct causal effect on the other.

Translated from: https://www.freecodecamp.org/news/why-correlation-does-not-imply-causation-the-meaning-of-this-common-saying-in-statistics/
