Yes, Psychologists Must Change the Way They Analyze Their
Data: Clarifications for Bem, Utts, and Johnson (2011)
Eric-Jan Wagenmakers, Ruud Wetzels, Denny Borsboom, Rogier
Kievit, & Han L. J. van der Maas
University of Amsterdam
Abstract
Does psi exist? In a widely publicized article featuring nine experiments with
over one thousand participants, Bem (in press) claimed that future events
retroactively affect people's responses. In a response, we pointed out that
Bem's analyses were partly exploratory. Moreover, we reanalyzed Bem's
data using a default Bayesian t-test and showed that Bem's evidence for psi
is weak to nonexistent. A robustness analysis confirmed our skeptical
conclusions. Recently, Bem, Utts, and Johnson (2011) question several aspects
of our analysis. In this brief reply we clarify our analysis procedure and
demonstrate that our arguments still hold.
Keywords: Confirmatory Experiments, Bayesian Hypothesis Test, ESP.
The History and the Hype
In a recent article for Journal of Personality and Social Psychology, Bem (in press)
presented nine experiments that test for the presence of psi. Specifically, Bem's
experiments were designed to assess the hypothesis that future events affect people's
thinking and people's behavior in the past (henceforth precognition). Bem argued that
in eight out of the nine experiments, the data supported the presence of precognition,
that is, one-sided p values were smaller than .05.
This version was last updated with minor changes on February 18th, 2011. This research was supported
by Vidi grants from the Dutch Organization for Scientific Research (NWO). Correspondence concerning this
article may be addressed to Eric-Jan Wagenmakers, University of Amsterdam, Department of Psychology,
Roetersstraat 15, 1018 WB Amsterdam, the Netherlands. Email address: ej.wagenmakers@gmail.com.

Bem's findings (and, perhaps more importantly, the fact that they were going to
be published in a major journal) created a storm of media attention. In the New York
Times, several researchers voiced strong opinions: Dr. Ray Hyman, a long-time critic
of ESP research, questioned the quality of the refereeing process as he believed that the
publication of Dr. Bem's article was "(...) pure craziness (...) an embarrassment for the
entire field" [1], and Dr. Douglas Hofstadter argued for "(...) a cutoff for craziness, and
when that threshold is exceeded, then the criteria for publication should get far, far more
stringent." Bem's article was also discussed in Science (Miller, 2011) and many other media
throughout the world. A Google search on "Bem" and "feeling the future" generates over
50,000 hits. [2] Bem himself appeared on the popular US television show The Colbert Report,
where the host described Bem's work as "extrasensory pornception", referring to the fact
that Experiment 1 in Bem (in press) found that precognition was present only for erotic
pictures. In the New York Times, Bem was quoted as saying "What I showed was that
unselected subjects could sense the erotic photos, but my guess is that if you use more
talented people, who are better at this, they could find any of the photos."
Some months before Bem's research started to attract a lot of media attention we
wrote a response that criticized Bem's work on several counts. This response was submitted
to JPSP and published in the same issue (i.e., Wagenmakers, Wetzels, Borsboom, & van
der Maas, in press). In this response, we first noted that the analysis of the experiments
had been partly exploratory, whereas the statistical analysis assumed a fully confirmatory
approach. That is, we argued that Bem had used the data twice: once to discover an
interesting result, and then to test it. In support of our claim, we pointed to several
instances where it was clear that the analysis had been exploratory.
Next we used Bayes' theorem to argue that the bar for publishing should be set higher
for claims that are outlandish or improbable. Third, we used a default Bayesian t test
(Rouder, Speckman, Sun, Morey, & Iverson, 2009) to highlight that the one-sided p values
used by Bem overestimate the evidence against the null; in fact, our default test indicated
little evidence in favor of precognition: only one of Bem's nine experiments yielded data
substantially more likely under H1 (i.e., the hypothesis of precognition) than under H0.
It is important to note that our default Bayesian test does not depend at all on
the prior probability that one may assign H1. Therefore, it is certainly not true that our
Bayesian analysis simply confirms our initial bias against precognition, as some bloggers
mistakenly believed. Instead, the result of our Bayesian test is known as the Bayes factor,
and with respect to prior assumptions it only depends on the effect size δ expected under
H1 (see also Liang, Paulo, Molina, Clyde, & Berger, 2008). In what follows, we will denote
the prior distribution for effect size under H1 as p(δ | H1).
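The separation between the Bayes factor and the prior probability of H1 can be made explicit with the standard odds form of Bayes' theorem. This is a textbook identity, not a formula quoted from the response; D is shorthand introduced here for the observed data.

```latex
\underbrace{\frac{p(H_1 \mid D)}{p(H_0 \mid D)}}_{\text{posterior odds}}
=
\underbrace{\frac{p(D \mid H_1)}{p(D \mid H_0)}}_{\text{Bayes factor } BF_{10}}
\times
\underbrace{\frac{p(H_1)}{p(H_0)}}_{\text{prior odds}},
\qquad
p(D \mid H_1) = \int p(D \mid \delta, H_1)\, p(\delta \mid H_1)\, d\delta .
```

The prior odds enter only as a multiplier. A skeptic and a believer who agree on p(δ | H1) therefore obtain the same Bayes factor and differ only in their starting point.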
The default assumption we made about p(δ | H1) was based on a long tradition
in Bayesian statistics where prior distributions are constructed from general desiderata
(Jeffreys, 1961). The advantage that this brings is that the Bayesian analysis is fully
objective (Berger, 2004) and avoids subjective specification of the expected effect sizes
under H1. We realized that the default choice leads to a conservative test. Indeed, our
abstract stated that "(...) in order to convince a skeptical audience of a controversial claim,
one needs to conduct strictly confirmatory studies and analyze the results with statistical
tests that are conservative rather than liberal."
[1] Dr. Hyman did not question the publication of a parapsychological article as such. Instead, Dr. Hyman
was puzzled that JPSP had accepted an article with so many departures from accepted methodological
practice (Dr. Hyman, personal communication).
[2] Query issued on 15 February 2011.

Despite the advantages of an objective test, we also realized that the choice of p(δ | H1)
could be disputed. We therefore carried out a robustness analysis in which we systematically
varied the scale parameter for p(δ | H1), and reported the results in an online appendix. [3]
These results showed that for a wide range of different, non-default prior distributions on
effect size the evidence for precognition is either non-existent or negligible.
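A robustness analysis of this kind can be sketched in a few lines. The sketch below assumes the one-sample JZS formulation of Rouder et al. (2009), in which a Cauchy prior with scale r on δ is written as a normal prior with variance g and an inverse-gamma prior on g. The function name `jzs_bf10`, the integration grid, and the t value and sample size in the usage loop are illustrative assumptions. This is not the code behind the online appendix, and the inputs are not taken from Bem's data.

```python
import math


def jzs_bf10(t, n, r=1.0, lo=-30.0, hi=30.0, steps=6000):
    """Bayes factor BF10 for a one-sample t test with a Cauchy(0, r) prior on delta.

    Assumed formulation: delta | g ~ Normal(0, g), g ~ InverseGamma(1/2, r^2/2),
    which is equivalent to delta ~ Cauchy(0, r). The integral over g is evaluated
    with the trapezoid rule after the substitution g = exp(u).
    """
    nu = n - 1  # degrees of freedom

    # Marginal likelihood under H0 (delta = 0), up to a constant shared with H1.
    m0 = (1.0 + t * t / nu) ** (-(nu + 1) / 2.0)

    def integrand(u):
        g = math.exp(u)
        # Inverse-gamma(1/2, r^2/2) density of g.
        prior_g = r / math.sqrt(2.0 * math.pi) * g ** -1.5 * math.exp(-r * r / (2.0 * g))
        # Likelihood of t given g, with the same constant dropped as in m0.
        like = (1.0 + n * g) ** -0.5 * (1.0 + t * t / ((1.0 + n * g) * nu)) ** (-(nu + 1) / 2.0)
        return like * prior_g * g  # trailing g is the Jacobian of g = exp(u)

    h = (hi - lo) / steps
    total = 0.5 * (integrand(lo) + integrand(hi))
    for i in range(1, steps):
        total += integrand(lo + i * h)
    m1 = total * h  # marginal likelihood under H1

    return m1 / m0


if __name__ == "__main__":
    # Illustrative inputs only: a moderate t value with 100 participants.
    for scale in (0.25, 0.5, 1.0, 2.0, 3.0):
        print(f"r = {scale:4.2f}  BF10 = {jzs_bf10(2.5, 100, r=scale):.3f}")
```

Varying `scale` shows how strongly the conclusion depends on the prior width: a wider prior places more mass on large effects that the data do not support, which lowers BF10.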
The penultimate section of our response provided guidelines on confirmatory research.
We stressed how important it is that research on precognition is conducted in the context of
an adversarial collaboration, that is, a collaboration with a qualified skeptic (e.g., Diaconis,
1991).
Throughout our response, we argued that our critique was not meant to attack
research on psi. The last paragraph of our response is particularly clear on the broader
consequences of the debate:
"It is easy to blame Bem for presenting results that were obtained in part
by exploration; it is also easy to blame Bem for possibly overestimating the
evidence in favor of H1 because he used p values instead of a test that considers
H0 vis-à-vis H1. However, Bem played by the implicit rules that guide academic
publishing -- in fact, Bem presented many more studies than would usually be
required. It would therefore be mistaken to interpret our assessment of the
Bem experiments as an attack on research of unlikely phenomena; instead, our
assessment suggests that something is deeply wrong with the way experimental
psychologists design their studies and report their statistical results. It is a
disturbing thought that many experimental findings, proudly and confidently
reported in the literature as real, might in fact be based on statistical tests
that are explorative and biased (...). We hope the Bem article will become a
signpost for change, a writing on the wall: psychologists must change the way
they analyze their data."
The broader impact of our response to Bem has been described as "the Bayesian
bomb". [4] Consistent with this assessment, Wetzels et al. (in press) presented default Bayes
factors for all 855 t tests reported in the 2007 volumes of Psychonomic Bulletin & Review
and Journal of Experimental Psychology: Learning, Memory, and Cognition. The results
showed that for 70% of the data sets for which p values range from .01 to .05, the Bayes
factor indicated that the evidence in favor of H1 is "anecdotal" in the sense that the data
are less than three times more likely under H1 than under H0.
The Complaints by Bem, Utts, and Johnson (2011)
A recent rebuttal by Bem et al. (2011) [5] questions several aspects of our response
outlined above. We disagree with several of their points, but we also believe that something
good may come out of this debate, at least for the field of psi.
[3] Available on the first author's website or at http://www.ruudwetzels.com/articles/Wagenmakersetal_robust.pdf.
[4] George van Hal, NWT Magazine.
[5] Downloaded from http://dbem.ws/ResponsetoWagenmakers.pdf on February 15th, 2011.

Below we discuss the Bem et al. (2011) rebuttal in terms of four central complaints.
The first is that Bem (in press) did not explore the data when he analyzed his results.
We argue that this general statement fails to address our detailed points of critique, that
in earlier work Bem himself argued strongly in favor of exploration, and that the Bem
experiments show a strong negative correlation between sample size and effect size (as first
pointed out by Dr. Hyman, personal communication).
The second complaint is that in Bem's experiments a one-sided test is more
appropriate than a two-sided test. Although we generally agree that a Bayesian one-sided test
can be entirely appropriate (e.g., Wagenmakers, Lodewyckx, Kuriyal, & Grasman, 2010;
Wetzels, Raaijmakers, Jakab, & Wagenmakers, 2009) the danger of a one-sided test is that
it can be abused in the absence of strong a priori expectations to create an overly
optimistic impression of the true evidence in favor of the hypothesis under consideration. We
will illustrate this danger with three experiments reported in Bem (in press).
The third complaint is that our default prior distribution on effect size, p(δ | H1),
was too wide and assigned too much weight to implausibly high values of effect size. As
indicated above, we had already addressed this issue in our robustness analysis. However,
we do appreciate the proposal for a specific prior distribution that can now be used to
compute subjective or informed Bayes factors in the field of psi. Perhaps future studies will
use this prior to evaluate the evidence in favor or against precognition and psi. We examine
a two-sided version of the proposed prior distribution in detail in the penultimate section
of this paper.
The fourth complaint is that evidence should be combined across studies. We agree
that, in an ideal world, combining information across multiple studies is useful. However,
this is not a perfect world, and as stated in our response:
(...) we have assessed the evidential impact of Bem's experiments in isolation.
It is certainly possible to combine the information across experiments, for
instance by means of a meta-analysis (Storm, Tressoldi, & Di Risio, 2010; Utts,
1991). We are ambivalent about the merits of meta-analyses in the context of
psi: one may obtain a significant result by combining the data from many
experiments, but this may simply reflect the fact that some proportion of these
experiments suffer from experimenter bias and excess exploration. When
examining different answers to criticism against research on psi, Price (1955, p. 367)
concluded "But the only answer that will impress me is an adequate experiment.
Not 1000 experiments with 10 million trials and by 100 separate investigators
giving total odds against chance of 10^1000 to 1 -- but just one good experiment."
We also note that Bem's article would most likely not have been published if it
had to back away from the claim that the experiments showed independent evidence for
precognition, i.e., when considered in isolation. JPSP does not publish many experiments
with 200 participants that yield inconclusive results.
We now deal with each of the complaints in detail. The reader who is bored can safely
skip to the Conclusion section.
Complaint 1: There Really Was No Exploration
Bem et al. (2011) deny that there was any exploration in the Bem (in press)
experiments. They argue that the hypotheses were all based on prior research, and that even
though multiple analyses were conducted, these analyses served to confirm the same point.
This statement contrasts sharply with reality.
First of all, Bem et al. (2011) do not address the specific points of concern that
we raised in four paragraphs of our response. For example, it is completely unclear why
gender effects were tested in the first place, as Bem (in press) explicitly states that "the
psi literature does not reveal any systematic sex differences in psi ability". In addition,
our experience is that psychologists explore their data at least to some extent. When Bem
et al. (2011) claim not to have explored the data at all, they effectively state that the research
by Bem (in press) is the pinnacle of confirmatory research. This impression is inconsistent
with a painfully detailed analysis of the Bem experiments by James Alcock. [6] Moreover,
this impression is also inconsistent with the quotation from the Bem chapters on writing
that we presented in our response:
"The conventional view of the research process is that we first derive a set of
hypotheses from a theory, design and conduct a study to test these hypotheses,
analyze the data to see if they were confirmed or disconfirmed, and then chronicle
this sequence of events in the journal article. (...) But this is not how our
enterprise actually proceeds. Psychology is more exciting than that (...)" (Bem,
2000, p. 4).
Unfortunately, Bem et al. (2011) chose not to elaborate on the extent to which the
philosophy behind this quotation (and others) discredits the conclusions from all statistical
analysis, Bayesian, frequentist, or otherwise.
As a final indication that the results from Bem (in press) were obtained from
exploration, Ray Hyman (personal communication) noted that in the Bem study the low effect
sizes tended to occur in experiments with many participants. Figure 1 shows this
association (see also Hyman, 1985). How can we explain this if the experiments were purely
confirmatory?
In sum, Bem et al. (2011) fail to address the questions about exploration that we
raised in our response. In addition, the Bem experiments with many participants show
smaller effects than those with fewer participants. This strongly suggests that exploration
(perhaps through optional stopping) did take place.
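Why optional stopping would produce such a pattern can be illustrated with a small simulation. The sketch below is an illustration under assumed settings, not a reconstruction of how any of the Bem experiments were run. The true effect is exactly zero, a one-sided test is checked after every 10 participants from 20 up to 200, and data collection stops at the first "significant" look. All names and parameter values are hypothetical, and the t test is approximated by a z criterion of 1.645.

```python
import math
import random


def run_experiment(rng, n_min=20, n_max=200, step=10, z_crit=1.645):
    """Simulate one null experiment with a one-sided peek every `step` participants.

    Returns the final sample size and the observed standardized effect size d.
    """
    total = 0.0
    total_sq = 0.0
    for n in range(1, n_max + 1):
        x = rng.gauss(0.0, 1.0)  # the true effect is exactly zero
        total += x
        total_sq += x * x
        is_look = n >= n_min and (n - n_min) % step == 0
        if is_look or n == n_max:
            mean = total / n
            sd = math.sqrt((total_sq - n * mean * mean) / (n - 1))
            d = mean / sd
            # Stop as soon as the one-sided test is "significant" (normal
            # approximation to the t criterion), or when n_max is reached.
            if d * math.sqrt(n) > z_crit or n == n_max:
                return n, d


def pearson(xs, ys):
    """Plain Pearson correlation coefficient."""
    mx = sum(xs) / len(xs)
    my = sum(ys) / len(ys)
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    syy = sum((y - my) ** 2 for y in ys)
    return sxy / math.sqrt(sxx * syy)


def simulate(n_experiments=2000, seed=1):
    """Run many null experiments and return their (sample size, effect size) pairs."""
    rng = random.Random(seed)
    return [run_experiment(rng) for _ in range(n_experiments)]


if __name__ == "__main__":
    results = simulate()
    sizes = [n for n, _ in results]
    effects = [d for _, d in results]
    hits = sum(1 for n, d in results if d * math.sqrt(n) > 1.645)
    print("correlation(n, d):", round(pearson(sizes, effects), 3))
    print("proportion 'significant':", round(hits / len(results), 3))
```

Under these assumptions, experiments that stop early must have large observed effects, while experiments that run to the maximum sample size have small ones. This yields a negative association between sample size and effect size, and a rate of "significant" results above the nominal 5%, even though no effect exists.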
Complaint 2: A One-Sided Test is More Appropriate Than a Two-Sided Test
Bem et al. (2011) argue that the tests for precognition in the Bem studies should be
one-sided, not two-sided. As pointed out above, the main problem with one-sided tests is
that they may be used to bias the results. That is, a researcher without strong a priori
expectations may await the data and select the one-sided test that produces the most
convincing result. In fact, this disadvantage is illustrated in the very paper that Bem et al.
(2011) seek to defend.
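The cost of choosing the direction after seeing the data can be quantified under a simplifying assumption made here for illustration: a continuous test statistic that is symmetric around zero under H0, so that the two one-sided p values, written p_+ and p_-, sum to one.

```latex
P\bigl(\min(p_{+}, p_{-}) < \alpha \mid H_0\bigr)
= P(p_{+} < \alpha \mid H_0) + P(p_{-} < \alpha \mid H_0)
= 2\alpha ,
\qquad 0 < \alpha \le \tfrac{1}{2} .
```

The two events are mutually exclusive because p_+ + p_- = 1, so their probabilities add. With α = .05, a researcher who picks the more favorable direction post hoc therefore runs a false-positive rate of .10 rather than .05.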
The problem concerns Experiments 5, 6, and 7 and it is perhaps best illustrated with
a comment from Rouder and Morey (2011) [7], who also advocated the use of a one-sided test
but excluded these experiments from consideration:

[6] Available at http://www.csicop.org/specialarticles/show/back_from_the_future. Bem's response
and Alcock's reply can also be found online.
[7] Downloaded from http://pcl.missouri.edu/sites/default/files/rouder-morey.pdf on February
15th, 2011.