Control Variables (Covariates)
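Control variables: these are factors that affect the dependent variable but that the researcher would rather not have, because their presence interferes with the analysis of how the independent variable affects the dependent variable. Control variables are also called "extraneous variables". They must either be held under control by design or have their interference removed by statistical methods.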
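If that sounds too abstract, here is a concrete example. As in yesterday's story, suppose we want to know whether where young people live affects the age at which they marry. People differ in looks, however: attractive people are well liked, while less attractive people naturally fare worse on the marriage market. Treating appearance as a control variable removes the influence of this factor, so that we can focus on the effect of location on marriage age.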
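In scientific research we usually cannot guarantee that all control variables are identical across cases, so we use statistical methods to remove their influence on the dependent variable. In everyday life, however, this kind of "statistical removal" is hard to do. Understanding control variables therefore means that, whenever we reason about a causal relationship (the independent variable affecting the dependent variable), we check whether some factor that should have been controlled was left uncontrolled. If it was, we have good reason to doubt the causal claim.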
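It is worth stressing that not every factor other than the independent variable that can affect the dependent variable is a control variable: mediator and moderator variables can also affect the dependent variable. A control variable is therefore a relative notion that depends on the purpose of the study or the question to be answered. If we study the effect of location on marriage age, appearance must be controlled. If we study the effect of appearance on marriage age, appearance becomes the independent variable.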
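Moderator and mediator variables are two important statistical concepts, both related to regression analysis. Relative to the independent and dependent variables that people focus on, moderators and mediators are both third variables, and they are often confused. Judging from the literature, the main problems are the following:
(1) Terms are mixed or used interchangeably, with no distinction drawn between the two concepts. For example, the same process is described with both moderation and mediation terminology.
(2) Terminology and concept do not match. For example, the study concerns a moderation process but uses mediation terminology.
(3) Terminology and statistical analysis do not match. For example, mediation terminology is used but the corresponding statistical analysis is not carried out.
Any one of these problems makes the interpretation of statistical results ambiguous and often leads to wrong conclusions. In the research literature on clinical child psychology and pediatric psychology alone, Holmbeck pointed out quite a few examples of misuse.
In China, few articles deal with mediator variables, and even fewer with moderator variables. Judging from the situation abroad, once quantitative analyses of this kind become more common, misuse and confusion are likely to become more common too. Helping applied researchers correctly understand and distinguish mediators and moderators, and analyze them with appropriate methods, would therefore help raise the standard of research in psychological science.
Moderator variables
If the relationship between variable Y and variable X is a function of variable M, then M is called a moderator variable. In other words, the relationship between Y and X is influenced by a third variable M. A moderator can be qualitative (e.g., sex, race, type of school) or quantitative (e.g., age, years of education, number of stimulus presentations). It affects the direction (positive or negative) and the strength of the relationship between the dependent and independent variables.
For example, the relationship between students' learning outcomes and a teaching program often depends on student personality: a program may be very effective for one type of student and ineffective for another, so student personality is a moderator. Likewise, the relationship between students' general self-concept and a specific self-concept (such as appearance or physical ability) depends on how much importance the student attaches to that specific self-concept. For someone who cares a great deal about appearance, poor looks greatly lower general self-concept. For someone who does not, poor looks have little effect on it. The importance attached to the specific self-concept is therefore a moderator.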
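When analyzing moderation effects, the independent variable and the moderator are usually mean-centered (each variable minus its mean). The simplest and most commonly used moderation model assumes the following relationship between Y and X: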
Y = aX + bM + cXM + e (1)
Equation (1) can be rewritten as
Y = bM + ( a + cM ) X + e
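For a fixed M, this is a straight-line regression of Y on X. The relationship between Y and X is described by the regression coefficient a + cM, which is a linear function of M. The coefficient c measures the size of the moderating effect.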
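As a rough illustration of model (1), the sketch below mean-centers X and M, fits the model by ordinary least squares with NumPy, and evaluates the simple slope a + cM at a chosen moderator value. The function names `fit_moderation` and `simple_slope`, the simulated data, and the added intercept term are assumptions made for this example; they are not part of the original text.

```python
import numpy as np


def fit_moderation(x, m, y):
    """Fit Y = i + a*X + b*M + c*X*M + e on mean-centered X and M by OLS.

    The intercept i is an assumption added here; equation (1) omits it.
    """
    xc = x - x.mean()  # center the independent variable
    mc = m - m.mean()  # center the moderator
    design = np.column_stack([np.ones_like(xc), xc, mc, xc * mc])
    coef, *_ = np.linalg.lstsq(design, y, rcond=None)
    intercept, a, b, c = coef
    return intercept, a, b, c


def simple_slope(a, c, m_centered):
    """Slope of Y on X at a fixed (centered) moderator value: a + c*M."""
    return a + c * m_centered


# Simulated data with known coefficients a=0.5, b=0.3, c=0.8 (hypothetical).
rng = np.random.default_rng(0)
n = 500
x = rng.normal(10.0, 2.0, n)
m = rng.normal(5.0, 1.0, n)
xc, mc = x - x.mean(), m - m.mean()
y = 1.0 + 0.5 * xc + 0.3 * mc + 0.8 * xc * mc + rng.normal(0.0, 0.1, n)

intercept, a, b, c = fit_moderation(x, m, y)
print("estimated a, b, c:", a, b, c)
print("slope of Y on X one unit above the moderator mean:", simple_slope(a, c, 1.0))
```

Because both predictors are centered, the estimate of a can be read as the slope of Y on X at the mean of M, which is one common reason for centering before forming the product term.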
Moderation effects and interaction effects
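Analyzing the moderation effect in the model mainly means estimating and testing c. If c is significant (that is, the hypothesis H0: c = 0 is rejected), the moderating effect of M is significant. Readers familiar with interaction effects can see from the model that c in fact represents the interaction between X and M, so the moderation effect here is the interaction effect. From the point of view of statistical analysis, moderation and interaction can thus be regarded as the same thing.
The two concepts are not entirely the same, however. In an interaction analysis, the two independent variables may play symmetric roles, with either one interpretable as the moderator. They may also play asymmetric roles: as long as one of them acts as a moderator, an interaction exists. This can be seen in monographs on interaction effects (for example, on interactions among observed variables and on interactions among latent variables). In a moderation analysis, by contrast, which variable is the independent variable and which is the moderator is clearly specified, and within a given model the two cannot be swapped.
For example, suppose we study sex differences in mathematical ability with grade level as the moderator. The question concerns the sex difference and whether that difference changes with grade. If representative samples of students are obtained for every grade from the first year of primary school to the last year of high school, and each grade takes its own test, the resulting data support this analysis. The same data cannot be used for an analysis with grade as the independent variable, mathematical ability as the dependent variable, and sex as the moderator. The test items differ across grades, so the scores are not comparable, and regressing mathematical ability on grade separately for each sex, as the moderation approach would require, is meaningless. To regress mathematical ability on grade, all grades should take the same test.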
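Methods for analyzing moderation effects
Moderation analysis and interaction analysis are largely the same. The discussion here falls into two broad cases. In the first, all the variables involved (dependent variable, independent variable, and moderator) are observable variables that can be measured directly. In the second, at least one of the variables involved is a latent variable.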
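Moderation analysis with observed variables
The choice of method depends on the measurement level of the independent variable and the moderator. Variables fall into two types: categorical variables, which include nominal and ordinal variables, and continuous variables, which include interval and ratio variables. An ordinal variable with many, roughly evenly spaced values can also be treated approximately as continuous. Table 1 lists the methods for analyzing moderation with observed variables by case.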
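When the independent variable and the moderator are both categorical, use analysis of variance. When both are continuous, use a regression model with a product term and run a hierarchical regression:
(1) Regress Y on X and M to obtain the coefficient of determination $R_1^2$.
(2) Regress Y on X, M, and XM to obtain $R_2^2$. If $R_2^2$ is significantly higher than $R_1^2$, the moderation effect is significant. Alternatively, test the partial regression coefficient of XM; if it is significant, the moderation effect is significant.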
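The two-step hierarchical regression above can be sketched as follows, assuming continuous X and M. The sketch computes R² for both steps and the F statistic for the change in R² when the single product term is added. The helper names `r_squared` and `hierarchical_moderation_test` and the simulated data are hypothetical, introduced only for illustration.

```python
import numpy as np


def r_squared(design, y):
    """Coefficient of determination for an OLS fit of y on the design matrix."""
    coef, *_ = np.linalg.lstsq(design, y, rcond=None)
    resid = y - design @ coef
    ss_res = float(resid @ resid)
    ss_tot = float(((y - y.mean()) ** 2).sum())
    return 1.0 - ss_res / ss_tot


def hierarchical_moderation_test(x, m, y):
    """Return (R2 of step 1, R2 of step 2, F statistic for the R2 change)."""
    n = len(y)
    xc, mc = x - x.mean(), m - m.mean()  # mean-center both predictors
    ones = np.ones(n)
    step1 = np.column_stack([ones, xc, mc])           # Y on X and M
    step2 = np.column_stack([ones, xc, mc, xc * mc])  # adds the product term XM
    r2_1 = r_squared(step1, y)
    r2_2 = r_squared(step2, y)
    df_resid = n - step2.shape[1]  # n minus the 4 estimated coefficients
    # One term is added in step 2, so the numerator has 1 degree of freedom.
    f_change = (r2_2 - r2_1) / ((1.0 - r2_2) / df_resid)
    return r2_1, r2_2, f_change


# Simulated data with a real interaction (hypothetical coefficients).
rng = np.random.default_rng(1)
n = 300
x = rng.normal(0.0, 1.0, n)
m = rng.normal(0.0, 1.0, n)
xc, mc = x - x.mean(), m - m.mean()
y = 0.5 * xc + 0.3 * mc + 0.8 * xc * mc + rng.normal(0.0, 1.0, n)

r2_1, r2_2, f_change = hierarchical_moderation_test(x, m, y)
print("R2 step 1:", r2_1, "R2 step 2:", r2_2, "F change:", f_change)
```

The F statistic is compared against an F distribution with 1 and n - 4 degrees of freedom. If SciPy is available, `scipy.stats.f.sf(f_change, 1, n - 4)` gives the corresponding p-value. With one added term, this F test is equivalent to the t test on the partial regression coefficient of XM mentioned above.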
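When the moderator is categorical and the independent variable is continuous, run separate regressions by group. When the independent variable is categorical and the moderator is continuous, group-wise regression is not possible. Instead, recode the independent variable as dummy variables and run a hierarchical regression using a model with product terms.
Definition of a mediator variable
Consider the effect of the independent variable X on the dependent variable Y. If X affects Y by influencing a variable M, then M is called a mediator variable. An example is the study of supervisors' attributions: subordinate's performance -> supervisor's attribution for that performance -> supervisor's response to that performance. Here "the supervisor's attribution for the subordinate's performance" is the mediator.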
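If a variable is only weakly correlated with the independent variable or with the dependent variable, it cannot be a mediator, but it may be a moderator. The ideal moderator is only weakly correlated with both the independent and the dependent variable. Some variables, such as sex and age, are not affected by the independent variable and so cannot be mediators, yet they can often be considered as moderators. For a given independent and dependent variable, some variables are suitable both as moderator and as mediator, and a reasonable theoretical account can be given for either role.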
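Generally speaking, a simple check of whether a variable is a mediator can be done with linear regression, while a more rigorous analysis calls for structural equation modeling. (I am still learning structural equation modeling and intend to study it properly; the expert I consulted only showed me how to use linear regression to test whether a variable is a mediator.) The check usually has three steps. First, test separately whether the main effect of each variable (the independent variable and the third variable) is significant. Second, enter the independent variable into the regression equation and test its effect. Third, enter the third variable into the equation as well and test the effect of the independent variable again. If the effect of the independent variable is much smaller than before, or drops to zero, the third variable does act as a mediator. One point deserves attention: a mediating role must rest on theory and on reality. As noted earlier, the independent variable must be able, in reality or in theory, to influence the third variable. Otherwise the result is invalid even if the data support a mediation effect.
Mediator versus Moderator variables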
The classic reference on this topic is Baron, R. M., & Kenny, D. A. (1986). The moderator-mediator variable distinction in social psychological research: Conceptual, strategic, and statistical considerations. Journal of Personality and Social Psychology, 51, 1173-1182. Most of what is written here comes directly from this classic paper.
· Moderator variables - 'In general terms, a moderator is a qualitative (e.g., sex, race, class) or quantitative (e.g., level of reward) variable that affects the direction and/or strength of the relation between an independent or predictor variable and a dependent or criterion variable. Specifically within a correlational analysis framework, a moderator is a third variable that affects the zero-order correlation between two other variables. ... In the more familiar analysis of variance (ANOVA) terms, a basic moderator effect can be represented as an interaction between a focal independent variable and a factor that specifies the appropriate conditions for its operation.' p. 1174
· Mediator variables - 'In general, a given variable may be said to function as a mediator to the extent that it accounts for the relation between the predictor and the criterion. Mediators explain how external physical events take on internal psychological significance. Whereas moderator variables specify when certain effects will hold, mediators speak to how or why such effects occur.' p. 1176
The general test for mediation is to examine the relation between the predictor and the criterion variables, the relation between the predictor and the mediator variables, and the relation between the mediator and criterion variables. All of these correlations should be significant. The relation between predictor and criterion should be reduced (to zero in the case of total mediation) after controlling the relation between the mediator and criterion variables.
Another way to think about this issue is that a moderator variable is one that influences the strength of a relationship between two other variables, and a mediator variable is one that explains the relationship between the two other variables.
As an example, let's consider the relation between social class (SES) and frequency of breast self-exams (BSE). Age might be a moderator variable, in that the relation between SES and BSE could be stronger for older women and less strong or nonexistent for younger women. Education might be a mediator variable in that it explains why there is a relation between SES and BSE. When you remove the effect of education, the relation between SES and BSE disappears.
Understanding moderation is one of those topics in statistics that is so much harder than it needs to be. Here are three suggestions to make it just a little easier.
1. Realize that moderation just means an interaction
I have spoken with a number of researchers who are surprised to learn that moderation is just another term for interaction. Perhaps it’s because moderation often appears with discussions of mediation.
Or because we tend to think of interaction as being part of ANOVA, but not regression. In any case, both an interaction and moderation mean the same thing: the effect of one predictor on a response variable is different at different values of the second predictor.
When we speak of moderation, we usually call the first predictor the independent variable, and the second the moderator. In other words, we’re really just interested in the effect of the independent variable on the dependent variable, and this effect is different at different values of the moderator. The effect of the moderator itself on the dependent variable is not the focus.
When we speak of interaction, we don’t usually distinguish between the independent variable and the moderator. Either predictor could be considered to “moderate” the effect of the other. Mathematically, there is no distinction. You don’t have to interpret one variable as the independent variable and the other as the moderator. But it can help interpretation to think of them that way.
2. Graph the means and/or predicted values
Moderation effects are difficult to interpret without a graph. It helps to see the effect of the independent variable at different values of the moderator.
If the independent variable is categorical, we measure its effect through mean differences, and those differences are easiest to see with plots of the means. Moderation says that those mean differences are not the same at every value of the moderator. It can be hard to discern a pattern in how they differ without seeing it. For example, the mean difference may get larger as the moderator increases. Or it may flip signs.
If the independent variable is continuous, we measure its effect through the slope of the regression line. So you want to plot the predicted values of those regression lines. Moderation says that the slope of the regression line is different at every value of the moderator.
(Yes, that one regression equation really represents many different lines, one for every possible value of the moderator.) Once again, a positive slope may get larger (or smaller) as the moderator increases. Or it too can flip signs, going from a positive slope at low values of the moderator to a negative slope at high values. But this is difficult to see without a graph.
If the moderator itself is continuous, you could potentially choose an infinite number of values at which to plot the effect of the independent variable. Not only would that take a while, the graph would be such a mess that you couldn’t see any patterns. Luckily, plotting the effect of the independent variable at only a few values of the moderator is usually enough to see patterns.
3. Choose the values of continuous moderators intentionally
There are conventions to help you choose the best values of the moderator for plotting predicted values. But these conventions don’t work in every situation.
For example, one convention suggested by Cohen and Cohen and popularized by Aiken and West is to use three values of the moderator: the mean, the value one standard deviation above the mean, and the value one standard deviation below the mean. Most of the time, this is a great convention and it works very well.
One situation where it doesn’t is when the moderator is positively skewed. The value one standard deviation below the mean can then fall outside the range of the data. In that case, using the minimum or some other small value of the moderator may be a better choice.
Likewise, sometimes very specific values of the moderator are particularly meaningful. For example, in years of education, values of 12 and 16 generally indicate high school and college graduation. If years of education were the moderator, plotting effects of the independent variable when education equaled 12 would make a lot of sense, even if the mean is 12.57. It’s not that using 12.57 is wrong.
But spending a little time thinking about a more appropriate value can make interpretation, and therefore communication to your audience, easier.
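As a small sketch of the third suggestion, the code below picks moderator values by the mean and plus-or-minus one standard deviation convention, but falls back to the observed minimum or maximum when a conventional value would lie outside the data. It then evaluates the slope of the independent variable at each chosen value. The function names `conventional_values` and `slopes_at`, the toy moderator data, and the coefficients a = 0.5 and c = 0.2 are all hypothetical, chosen only to illustrate the idea.

```python
import numpy as np


def conventional_values(m):
    """Low, middle, and high moderator values for plotting.

    Uses mean - 1 SD, mean, mean + 1 SD, but clamps the low and high
    values to the observed range of the data.
    """
    mean = float(m.mean())
    sd = float(m.std(ddof=1))  # sample standard deviation
    low = max(mean - sd, float(m.min()))
    high = min(mean + sd, float(m.max()))
    return low, mean, high


def slopes_at(a, c, m_mean, values):
    """Slope of Y on X at each moderator value, for a model fit on centered M."""
    return [(v, a + c * (v - m_mean)) for v in values]


# A positively skewed toy moderator: mean - 1 SD falls below the minimum.
m = np.array([1.0, 1.0, 1.0, 1.0, 2.0, 2.0, 3.0, 20.0])
low, mid, high = conventional_values(m)
print("moderator values to plot:", low, mid, high)

# Hypothetical coefficients from a centered moderation model.
for value, slope in slopes_at(a=0.5, c=0.2, m_mean=mid, values=(low, mid, high)):
    print("moderator =", value, "slope of X =", slope)
```

In this toy case the slope is positive at the mean of the moderator and negative at its lowest plotted value, the kind of sign flip that is much easier to notice once the lines are drawn. A substantively meaningful value, such as 12 years of education, could be passed to `slopes_at` in place of the mean.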