So the PG estimator is clearly problematic. I agree that the yummfajitas (YM) estimator looks to be consistent. In this case though, we're dealing with (small) finite sample sizes, so we need to come up with some sort of test statistic. What would the YM test be here? It seems tricky since you are dealing with a conditional distribution based on left-censored data. I'm also not aware of any difference-of-minimums test, though I am happy to be educated if there is one!
I don't know of a reference for this, but I don't think the statistics are too hard. The test statistics would simply be min(sample1) and min(sample2).
Suppose the cutoff sample is distributed according to f(x)H(x-C), where H is the Heaviside step function. Then the probability of the minimum of a sample of size N exceeding C+e by random chance, under the null hypothesis, is p = (1-\int_C^{C+e}f(x) dx)^N.
So now you have a frequentist hypothesis test. If you make reasonable assumptions on f(x) (non-vanishing near C, quantified somehow), it's even nice and non-parametric.
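As a concrete sanity check of that formula (a sketch, with an assumed shifted-exponential f and illustrative values for C, e, and N that are not from the thread):

```python
import math
import random

# Assumed setup for illustration: f is a unit-rate exponential density,
# so the censored sample is distributed as C + Exp(1), i.e. f(x)H(x - C).
C = 2.0   # cutoff (illustrative)
e = 0.1   # gap above the cutoff (illustrative)
N = 50    # sample size (illustrative)

# Analytic probability that all N draws exceed C + e under the null:
# p = (1 - \int_C^{C+e} f(x) dx)^N, and for Exp(1) the integral is 1 - exp(-e).
q = 1.0 - math.exp(-e)
p_analytic = (1.0 - q) ** N

# Monte Carlo check of the same probability.
random.seed(0)
trials = 50_000
hits = sum(
    1 for _ in range(trials)
    if min(random.expovariate(1.0) + C for _ in range(N)) > C + e
)
p_mc = hits / trials

print(p_analytic, p_mc)  # the two estimates should agree closely
```

Here the minimum of N shifted exponentials is itself C + Exp(N), so the analytic value reduces to exp(-N*e), which the simulation reproduces.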
Does that assume both samples are identically distributed and the only difference is the cutoff? If it does, then couldn't we just continue to do a difference of means test and still be consistent? If it doesn't, how do you handle identifying the cutoff minima and the two different distributions in a frequentist way?
The only assumption I need is that P_{f,g}([C,C+d]) >= h(d) > 0 for some arbitrary monotonic function h(d). This comes directly from the p-value formula.
I.e., for any d, there is a finite probability of finding an A or a B in [C,C+d]. I don't actually care what the shapes of f or g are at all beyond this - as long as this probability exists and is bounded below (in whatever class of functions f and g might be drawn from), it's all fine.
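One way to make that assumption concrete (my own example, not something from the thread): if every density in the class is bounded below by some m > 0 on [C, C+d0], then h(d) = m*min(d, d0) works for the whole class.

```python
import math

# Illustrative class: densities on [C, inf) bounded below by m = 0.5
# on [C, C + 0.5].  Then h(d) = 0.5 * min(d, 0.5) lower-bounds
# P([C, C+d]) for every member of the class.
def h(d, m=0.5, d0=0.5):
    return m * min(d, d0)

# Check against two members of the class: Exp(1) shifted to C (density
# exp(-(x-C)) >= exp(-0.5) > 0.5 on [C, C+0.5]) and Uniform(C, C+1)
# (density 1 >= 0.5 there).
def p_exp(d):   # \int_C^{C+d} exp(-(x-C)) dx
    return 1.0 - math.exp(-d)

def p_unif(d):  # \int_C^{C+d} 1 dx, capped at 1
    return min(d, 1.0)

for d in (0.05, 0.2, 0.5, 1.0):
    assert p_exp(d) >= h(d) and p_unif(d) >= h(d)
print("h(d) lower-bounds P([C, C+d]) for both members")
```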
Sorry, I'm confused here. A p-value makes an implicit assumption that your null hypothesis is a known distribution like N(0,1), and that may be throwing me off a bit. I get that you want to look at the likelihood, which is just one minus the CDF over the given interval. I'm just not clear on how you get around f and g being arbitrarily parameterized functions of a given class. Are you assuming we know the class and something about f?
A null hypothesis is just a specific thing you are trying to disprove. In this case, it's simply that the min of both distributions is identical.
I am assuming we know exactly one thing about the class the measures f and g come from: for every function in that class, \int_C^{C+d} f(x) dx >= h(d) for some monotonic function h(d).
The p-value can then be bounded in terms of h(d): since \int_C^{C+d} f(x) dx >= h(d), we get p <= (1-h(d))^N.
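Putting the pieces together: since p = (1 - \int_C^{C+d} f(x) dx)^N and the integral is at least h(d), the bound (1-h(d))^N can be evaluated without knowing f at all. A minimal sketch, using an illustrative h of my own choosing:

```python
# Distribution-free p-value bound for the difference-of-minimums test:
# under the null (both samples share cutoff C), the chance that all N
# points of one sample miss [C, C + d] is p = (1 - P([C, C+d]))^N,
# and P([C, C+d]) >= h(d) gives p <= (1 - h(d))^N.
def p_value_bound(d, N, h):
    """Upper bound on the p-value of seeing a gap >= d between minima."""
    return (1.0 - h(d)) ** N

# Illustrative h from an assumed class: density bounded below by 0.5
# on [C, C + 0.5] (my example, not from the thread).
h = lambda d: 0.5 * min(d, 0.5)

# E.g. a gap of d = 0.3 between the two sample minima with N = 30:
print(p_value_bound(0.3, 30, h))  # (1 - 0.15)^30, about 0.0076
```

Note that only the observed gap d between the two minima and the sample size N enter the computation; the shapes of f and g never do.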