What makes a good test?
Nick Michalak

X → Y?
Do explicit and implicit attitudes correlate? Is antidepressant medication an effective treatment for depression? Does watching TV increase aggression in children? Does attention improve memory?
Does mortality increase and does fertility decrease after maturity?

Presentation roadmap
1. Some test equations
2. Some pretend tests (experiments)
   1. A more wrong way of analyzing the data
   2. A less wrong way of analyzing the data
3. Discussion
4. A new and fancy way to test interaction hypotheses
5. More discussion

More specific hypotheses are easier to falsify. That's good. (Dienes, 2008; Popper, 1934/1959)

"The difference between 'significant' and 'not significant' is not itself statistically significant." (Gelman & Stern, 2006)

Test one mean
t = (M − μ₀) / (s / √n), df = n − 1

Test more means
t = (M₁ − M₂) / √(s₁²/n₁ + s₂²/n₂) (Welch; df from the Welch–Satterthwaite approximation)

One regression slope
t = b̂ / SE(b̂)

Slopes from the same (or separate) models
Separate models: z = (b̂₁ − b̂₂) / √(SE(b̂₁)² + SE(b̂₂)²)
Same model: subtract 2·Cov(b̂₁, b̂₂) inside the square root

Now for some pretend experiments
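Before turning to the pretend experiments, the test equations above can be sketched in a few lines of Python. This is a minimal illustration with simulated data (all numbers hypothetical; scipy assumed available):

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(1)

# Simulated 7-point motivation ratings, 100 people per condition (hypothetical)
high_power = rng.normal(4.4, 1.2, 100)
low_power = rng.normal(4.0, 1.2, 100)

# Test one mean: is the high-power group above the scale midpoint (4)?
t_one, p_one = stats.ttest_1samp(high_power, popmean=4.0)

# Test more means: Welch's t (variances not assumed equal)
t_two, p_two = stats.ttest_ind(high_power, low_power, equal_var=False)

# One regression slope: t = b / SE(b)
x = rng.normal(0, 1, 200)
y = 0.3 * x + rng.normal(0, 1, 200)
fit = stats.linregress(x, y)
t_slope = fit.slope / fit.stderr

print(f"one-sample: t = {t_one:.2f}, p = {p_one:.3f}")
print(f"two-sample: t = {t_two:.2f}, p = {p_two:.3f}")
print(f"slope:      b = {fit.slope:.2f}, t = {t_slope:.2f}")
```

Each test reduces to an estimate divided by its standard error, which is why the later slope-comparison z-tests follow the same template.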
A hypothesis
Feeling powerful (compared to feeling powerless) increases motivation (but not positive affect) in those with lower self-efficacy (but not lower self-esteem), which leads to better negotiation performance.

Study 1
Manipulation and measure: 200 people randomly assigned to feel more powerful or to feel powerless; self-reported motivation (scale midpoint marked in the figure).

[Figure: motivation by condition; error bars: 95% CIs]
Against the scale midpoint: t(99) = 1.66, p = .099 and t(99) = 2.09, p = .039
Between conditions (Welch): t(197.77) = -0.35, p = .729

Study 1b
Manipulation and measures: 200 people randomly assigned to feel more powerful or to feel powerless; self-reported motivation and self-reported positive affect.

[Figures: error bars: 95% CIs]
t(197.77) = -3.28, p = .0012
t(195.31) = -1.63, p = .104
t(194.29) = -1.09, p = .278
Study 2
Hypothesis: Feeling powerful increases motivation even for people who feel less self-efficacious.
Manipulations and measures: 400 people randomly assigned to feel more powerful or powerless, and to feel more effective or less effective; measured self-reported motivation (error bars: 95% CIs).
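A 2 × 2 design like Study 2's is usually tested as a product term in a regression. A minimal numpy sketch with simulated data (effect sizes hypothetical, not the study's):

```python
import numpy as np

rng = np.random.default_rng(2)
n = 400

# Simulated 2x2 design: power (0 = powerless, 1 = powerful)
# crossed with efficacy (0 = less effective, 1 = more effective)
power = rng.integers(0, 2, n)
efficacy = rng.integers(0, 2, n)
y = 0.3 * power + 0.2 * efficacy - 0.4 * power * efficacy + rng.normal(0, 1, n)

# OLS: y = b0 + b1*power + b2*efficacy + b3*power*efficacy
X = np.column_stack([np.ones(n), power, efficacy, power * efficacy])
b, *_ = np.linalg.lstsq(X, y, rcond=None)

# Standard errors from the usual OLS covariance estimate
resid = y - X @ b
sigma2 = resid @ resid / (n - X.shape[1])
se = np.sqrt(np.diag(sigma2 * np.linalg.inv(X.T @ X)))
t_interaction = b[3] / se[3]
print(f"interaction: b = {b[3]:.2f}, t = {t_interaction:.2f}")
```

The interaction coefficient b3 is the difference between the two simple effects of power, so testing it directly is the formal version of "the effect differs across conditions."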
t(197.97) = -2.06, p = .040
t(197.29) = 0.24, p = .809
t(395.18) = -1.62, p = .105

Study 3
Hypothesis:
Greater self-efficacy (not self-esteem) correlates with greater negotiation performance when people feel powerless but not when people feel powerful.
Manipulations and measures: 200 people randomly assigned to feel more powerful or powerless; measured negotiation performance and self-reported self-efficacy and self-esteem.

Separate: z = -2.063, p = .039
Separate: z = 0.254, p = .799
Difference in two-way interactions: z = -1.621, p = .105
Together: z = 0.159, p = .874
Together: z = -2.318, p = .020
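The "difference in two-way interactions" line above is a z-test comparing two coefficients. Assuming the two estimates come from separate models and are treated as independent, a sketch (the input numbers here are hypothetical):

```python
import math
from scipy.stats import norm

def z_difference(b1, se1, b2, se2):
    """Two-tailed z test for the difference between two coefficients
    estimated in separate models (treated as independent)."""
    z = (b1 - b2) / math.sqrt(se1 ** 2 + se2 ** 2)
    p = 2 * norm.sf(abs(z))
    return z, p

# Hypothetical interaction coefficients and standard errors
z, p = z_difference(b1=0.45, se1=0.18, b2=0.05, se2=0.17)
print(f"z = {z:.3f}, p = {p:.3f}")
```

If both coefficients sit in the same model, their covariance should be subtracted inside the square root rather than assumed to be zero.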
Study 4
Hypothesis: For those lower in self-efficacy, power increases motivation, which improves negotiation performance.
Manipulations and measures: Recruited 200 people pre-screened for low self-efficacy; randomly assigned to feel more powerful or powerless; measured motivation and negotiation performance.
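The indirect-effect test reported next (Preacher & Hayes, 2008) is typically a percentile bootstrap of the a × b product. A numpy sketch with simulated data (all effects hypothetical, not the study's numbers):

```python
import numpy as np

rng = np.random.default_rng(3)
n = 200

# Simulated mediation data: power -> motivation (a path) -> performance (b path)
power = rng.integers(0, 2, n)
motivation = 0.4 * power + rng.normal(0, 1, n)
performance = 0.3 * motivation + 0.1 * power + rng.normal(0, 1, n)

def indirect(idx):
    """a*b indirect effect estimated on a (resampled) index set."""
    x, m, y = power[idx], motivation[idx], performance[idx]
    a = np.polyfit(x, m, 1)[0]          # a path: m ~ x
    X = np.column_stack([np.ones(len(idx)), m, x])
    coef, *_ = np.linalg.lstsq(X, y, rcond=None)
    return a * coef[1]                  # b path: motivation coefficient in y ~ m + x

boot = np.array([indirect(rng.integers(0, n, n)) for _ in range(2000)])
lo, hi = np.percentile(boot, [2.5, 97.5])
print(f"indirect = {indirect(np.arange(n)):.3f}, 95% CI [{lo:.3f}, {hi:.3f}]")
```

The bootstrap CI is preferred over the Sobel test because the sampling distribution of a product of coefficients is not normal in small samples.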
Preacher & Hayes (2008): parallel multiple mediator model (Low Power = -1, High Power = 1)
Motivation: a = 0.273, b = 0.214; indirect effect = 0.058, 95% CI [0.012, 0.105]
Positive affect: a = -0.047, b = -0.166; indirect effect = 0.008, 95% CI [-0.014, 0.030]
Difference between indirect effects = -0.05, 95% CI [-0.102, 0.001]

On full and partial mediation
"As Rucker et al. (2011) nicely illustrated, the problem with this reasoning is that establishing that some variable M completely mediates the effect of X on Y says nothing whatsoever about the existence or absence of other possible mediators of X's effect. Even if you can say you've completely accounted for the effect of X on Y with your favored mediator, this does not preclude another investigator from being able to make the same claim as you, but using an entirely different mediator. If there are multiple mediators that completely mediate X's effect when considered in isolation, then what value is there to claiming that your favored mediator does? It is an empty claim, with no real value or meaning and nothing especially worthy of celebration, much less even hypothesizing in the first place." — Andrew Hayes (2013)

Do researchers make claims about contrasts and interactions in their data without formally testing those claims?
Are we setting the bar too high if we expect researchers to test complex contrasts (e.g., testing the difference between two two-way interactions)?

A different way to distinguish among patterns of interactions
Widaman, K. F., Helm, J. L., Castro-Schilo, L., Pluess, M., Stallings, M. C., & Belsky, J. (2012). Distinguishing ordinal and disordinal interactions. Psychological Methods, 17(4), 615-622.

Study 3a again
High power slope: z = 2.470, p = .01
Low power slope: z = -0.440, p = .660

Study 3b (replication)
High power slope: z = -0.486, p = .627
Low power slope: z = 7.56, p < .0001
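Studies 3b and 3c also report cross-over point estimates in the spirit of Widaman et al. (2012). Under the parameterization Y = b0 + b1·group + b2·x + b3·group·x, the two group lines cross at C = -b1/b3; a minimal bootstrap sketch of that estimate with simulated data (all names and numbers hypothetical; the paper itself estimates C directly via a nonlinear reparameterization):

```python
import numpy as np

rng = np.random.default_rng(4)
n = 200

# Simulated data: group = power condition (0/1), x = self-efficacy
group = rng.integers(0, 2, n)
x = rng.normal(0, 1, n)
y = 0.5 * group + 0.4 * x - 0.8 * group * x + rng.normal(0, 1, n)

def crossover(idx):
    """Fit y = b0 + b1*group + b2*x + b3*group*x and return C = -b1/b3,
    the x value where the two group regression lines intersect."""
    X = np.column_stack([np.ones(len(idx)), group[idx], x[idx], group[idx] * x[idx]])
    b, *_ = np.linalg.lstsq(X, y[idx], rcond=None)
    return -b[1] / b[3]

est = crossover(np.arange(n))
boot = np.array([crossover(rng.integers(0, n, n)) for _ in range(2000)])
lo, hi = np.percentile(boot, [2.5, 97.5])
print(f"cross-over C = {est:.2f}, 95% bootstrap CI [{lo:.2f}, {hi:.2f}]")
```

If C (and its CI) falls outside the observed range of x, the interaction is effectively ordinal within the data; if it falls inside, the interaction is disordinal.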
Widaman, Helm, Castro-Schilo, Pluess, Stallings, & Belsky (2012)
Standard form (X1 = moderator, X2 = group: 0 = low power, 1 = high power):
Y = b0 + b1·X2 + b2·X1 + b3·X1·X2 + e

Cross-over point estimate
The group lines cross where b1 + b3·X1 = 0, so C = -b1/b3.
Reparameterized form, which estimates C and its confidence interval directly:
Y = a0 + b2·(X1 − C) + b3·X2·(X1 − C) + e

Study 3c (replication)
High power slope: z = 0.486, p = .627
Low power slope: z = 7.56, p < .0001
Cross-over: b = 0.321, 95% CI [-0.22, 0.625]

High power slope: z = 0.534, p = .593
Low power slope: z = 5.35, p < .0001
Cross-over: b = 4.15, 95% CI [1.50, 6.81]

High power slope: z = 0.770, p = .442
Low power slope: z = 5.48, p < .0001
Cross-over: b = 8.24, 95% CI [6.46, 10.00]

Can statistics help us test more specific (more falsifiable) hypotheses? Or is falsifiability determined at the design stage?

Thank you
contact: [email protected]
OSF profile: https://osf.io/gb5xj/