2024-04-01
Most model selection procedures choose features with large \(T\)-statistics for testing whether their coefficients are 0…
Even when nothing is happening, some features will have large \(T\)-statistics!
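A rough calculation shows why. As an idealization (an assumption, not something stated on these slides), treat the \(p\) tests as independent, each at level \(\alpha\), with every true coefficient equal to 0. Then the chance that at least one \(T\)-statistic looks significant is

\[
P\Big(\max_{1 \le j \le p} |T_j| > c_\alpha\Big) = 1 - (1 - \alpha)^p ,
\]

where \(c_\alpha\) is the usual two-sided critical value. With the hypothetical values \(\alpha = 0.05\) and \(p = 20\) this gives \(1 - 0.95^{20} \approx 0.64\), far above the nominal 5%.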
Using \(p\)-values from summary() of a selected model is misleading.
Using confidence intervals from confint() of a selected model is misleading.
Y and X.
[1] 0.001327097
[1] 0.83
More than 80% of the time we’ll falsely declare a relationship between Y and X!
More than 80% of our confidence intervals won’t cover 0 (the truth)…
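The effect can be reproduced with a small simulation. The sketch below is in Python rather than the R used for these slides, and its setup is hypothetical: 100 observations, 20 independent Gaussian features, a pure-noise response, and a normal approximation to the \(T\) critical value. It is not the code that produced the numbers above. It compares a feature fixed in advance with the feature picked for having the largest \(|T|\).

```python
import math
import random
from statistics import NormalDist


def slope_t_stat(x, y):
    """T-statistic for the slope in a simple linear regression of y on x."""
    n = len(x)
    x_bar = sum(x) / n
    y_bar = sum(y) / n
    sxx = sum((xi - x_bar) ** 2 for xi in x)
    sxy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
    beta = sxy / sxx
    alpha = y_bar - beta * x_bar
    rss = sum((yi - alpha - beta * xi) ** 2 for xi, yi in zip(x, y))
    se = math.sqrt(rss / (n - 2) / sxx)
    return beta / se


def rejection_rates(n_sim=200, n=100, p=20, level=0.05, seed=1):
    """Return (rate for a pre-chosen feature, rate for the selected feature).

    All sizes are illustrative assumptions. Every feature is pure noise,
    so each rejection is a false declaration.
    """
    rng = random.Random(seed)
    # Normal approximation to the T critical value (slightly liberal at n=100).
    crit = NormalDist().inv_cdf(1 - level / 2)
    fixed = 0
    selected = 0
    for _ in range(n_sim):
        y = [rng.gauss(0, 1) for _ in range(n)]
        t_stats = [
            slope_t_stat([rng.gauss(0, 1) for _ in range(n)], y)
            for _ in range(p)
        ]
        # Feature 0 was chosen before looking at the data.
        fixed += int(abs(t_stats[0]) > crit)
        # The "selected" feature is whichever has the largest |T|.
        selected += int(max(abs(t) for t in t_stats) > crit)
    return fixed / n_sim, selected / n_sim


fixed_rate, selected_rate = rejection_rates()
print(f"pre-chosen feature rejected: {fixed_rate:.2f}")
print(f"selected feature rejected:   {selected_rate:.2f}")
```

Because \(|T|\) exceeding the critical value is the same event as the naive interval excluding 0, the second rate is also the fraction of naive confidence intervals that miss the truth after selection.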
Let’s look at a selection procedure we have used…
We’ll build up 100 null data sets and store them for a few analyses
In practice, there will likely be some signals – here there are none…
Distribution function here should be diagonal…
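Assuming the distribution function in question is the empirical distribution function of null \(p\)-values, the diagonal is what a uniform distribution looks like. The Python sketch below checks this under a deliberately simple, hypothetical setup: each test statistic is an exact standard normal draw, and "selection" means reporting the smallest of `k` \(p\)-values. The names `null_p_value`, `ecdf`, and `simulate_p_values` are illustrative only.

```python
import random
from statistics import NormalDist

STD_NORMAL = NormalDist()


def null_p_value(rng):
    """Two-sided p-value for a z-statistic that is N(0, 1) under the null."""
    z = rng.gauss(0, 1)
    return 2 * (1 - STD_NORMAL.cdf(abs(z)))


def ecdf(values, t):
    """Empirical distribution function: fraction of values at or below t."""
    return sum(v <= t for v in values) / len(values)


def simulate_p_values(n_sim=1000, k=10, seed=1):
    """Return (honest p-values, p-values reported after selection).

    In each simulated data set, k null tests are run. The honest p-value
    is for a test chosen in advance; the selected one is the smallest.
    """
    rng = random.Random(seed)
    honest, selected = [], []
    for _ in range(n_sim):
        p_values = [null_p_value(rng) for _ in range(k)]
        honest.append(p_values[0])
        selected.append(min(p_values))
    return honest, selected


honest, selected = simulate_p_values()
for t in (0.05, 0.25, 0.50, 0.75):
    print(
        f"t={t:.2f}  honest ECDF={ecdf(honest, t):.3f}  "
        f"selected ECDF={ecdf(selected, t):.3f}"
    )
```

The honest column should track `t` closely (the diagonal), while the selected column should sit well above it, which is the signature of \(p\)-values that are too small.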
50% of our 95% confidence intervals will not cover 0 (truth)
[1] 0.5005834

[1] 0.04