Date: Fri, 17 Jan 97 15:41 +0200
From:
To: Maya Bar Hillel
Cc: Dror Bar-Natan ,
Ilya Rips ,
aumann
Subject: What would convince me?
Shalom Maya,
> Date: Sat, 11 Jan 1997 23:52:14 +0200 (WET)
> From: Maya Bar-Hillel
> To: aumann israel
> Cc: Dror Bar-Natan ,
> msmaya
> ... please consider this a written request for suggestions as to
> what kind of results which Dror and I might come up with you
> would find more convincing than those I already told you about
> (conditional, of course, on our rechecking that there is no error
> in the numbers).
1. This is mainly to respond to the above request. Let me say
at once that I find your approach interesting, and think it has
potential. There are two major (somewhat related) points that I
will make below:
a) The statistical analysis, though it seems attractive at
first, turns out, on examination, to be unsound. That is to say,
the idea is interesting, but is not carried out in an appropriate
way. Below I will suggest more appropriate ways to carry it out
(these suggestions have NOT been checked out with either Rips or
Witztum or, for that matter, with anybody else).
b) A charge of (intentional or unintentional) "cheating" --
such as is inherent in your story -- should be plausible in view
of the "uvdot bashetakh" (the facts on the ground); i.e., in view
of things like the chronology, the necessary extent of the circle
of conspirators,
and so on. You DON'T necessarily have to have a proof that would
yield a conviction in a court of law; you DON'T have to believe
what the suspects say; but you DO have to relate to these things,
there has to be some kind of plausible story, it can't be wildly
implausible, you can't just wave the facts away. Dror realizes
this.
2. This is being written in a big hurry, as I have a million
things to do before leaving for abroad on Sunday. So I will try
to be accurate and precise and reasonably complete, and to use
appropriate phrasing, but I may slip up here and there. Please
forgive me. I am sending a copy to Rips (who will presumably
pass it on to Witztum), so if I misunderstood something they told
me, they can correct it.
3. It's important to get the chronology straight. The following
chronology is partial in the sense that not all relevant events
are included, but as far as it goes, I believe it's correct. I
will not use dates -- these can be checked out -- but present the
events in their correct time order. (In general, I don't know the
chronological order within each of the items below.)
A. Experience with one-dimensional, and later with two-
dimensional ELS's.
B. It is noticed (I believe by Rips) that the Rambam appears
in close proximity to "Mishne Tora" with a skip of 613. Also that
Herzl (the founder of modern Zionism) appears in close proximity
to a phrase that includes his BIRTHDAY. There is NO experience
with Tora personalities vis-a-vis their dates. (This was told to
me by Rips last Friday, January 10, 1997 -- BEFORE I reported to
him on the Zarka Ma'in meeting).
C. RIPS suggests checking Tora personalities vis-a-vis their
dates in a systematic way. (Confirmed to me by Rips last Friday,
January 10, 1997 -- also before my report on the Zarka Ma'in
meeting.)
D. The first list is generated. Havlin is approached for the
appellations, Urbach for the date forms, the dates are checked
out and (where necessary) corrected, spelling forms are
determined. The statistics P1 and P2 are defined.
E. The test is performed on the first list. Both P1 and P2
-- which at that time were considered significance levels -- turn
out amazingly small.
F. The results are sent to Diaconis. He asks (inter alia)
for the same test to be carried out on a fresh list of
personalities.
G. The second list is generated and tested, using exactly
the same test as for the first list. Again, high "significance."
H. The results are sent to Diaconis. He is unconvinced,
asks for a permutation test, and asks that the first list not be
used in a formal test.
I. P3 and P4 are defined.
J. The details of a formal test are agreed between Diaconis
and Aumann (I'm trying to avoid pronouns, because they often lead
to confusion).
K. The formal test turns out significant at a level of 16
out of a million. (That is, the best result of the four
statistics is 4 out of a million, and then Bonferroni.)
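
As a check on the arithmetic in item K, here is a minimal sketch of the Bonferroni step: the best of the four statistics is multiplied by the number of statistics tried. The function name `bonferroni` is purely illustrative and is not taken from the WRR paper.

```python
from fractions import Fraction

def bonferroni(best_p, n_statistics):
    """Bonferroni bound: smallest p-value times the number of statistics, capped at 1."""
    return min(Fraction(1), best_p * n_statistics)

best = Fraction(4, 1_000_000)   # best of the four statistics (item K)
overall = bonferroni(best, 4)   # four statistics: P1, P2, P3, P4
print(overall * 1_000_000)      # 16, i.e. 16 out of a million
```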
3. Now let's get to Maya's tests. The idea is that the WRR
(Witztum-Rips-Rosenberg) test involves many arbitrary choices.
For each of 13 such choices, Maya looks whether the test
statistic comes out better when the choice is made as it was.
One might expect that it comes out better in about half the
cases, and worse in about half the cases. But Maya finds that in
each of the 13 cases, WRR's choice was to their advantage. This
seems highly improbable, UNLESS the WRR statistic was observed
BEFOREHAND to react favorably to at least some of the choices
involved. And of course, if one does this, it is less surprising
that one can generate significance at a high level.
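
To put a number on "highly improbable": a minimal sketch, assuming (as the informal argument does) that each of the 13 choices independently has a 50% chance of favoring WRR. The independence assumption is an idealization, not something established in this letter.

```python
from fractions import Fraction
from math import comb

def prob_at_least(k, n, p=Fraction(1, 2)):
    """P(X >= k) for X ~ Binomial(n, p): at least k of n choices come out favorable."""
    return sum(comb(n, i) * p**i * (1 - p)**(n - i) for i in range(k, n + 1))

all_13 = prob_at_least(13, 13)
print(all_13)   # 1/8192
```

Under those assumptions, 13 favorable outcomes out of 13 has probability 1 in 8192, which is why the pattern looks striking if every one of the 13 comparisons is accepted at face value.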
4. For this to make sense, clearly the statistic to be
calculated in connection with each choice should be the one with
which WRR were working at the time that the choice in question
was made. HERE IS PROBLEM NO. 1 WITH MAYA'S TESTS: She does NOT
do this.
The statistic Maya uses is the rank order out of ten million
random permutations. But the entire test -- dates, spelling,
appellations, date forms, EVERYTHING, was fixed BEFORE Diaconis
suggested the permutations. Using the permutations here is an
inadmissible anachronism -- it's like asking why the defenders of
Metzada didn't use Uzis.
Indeed, many of the choices that Maya examines were made
before the FIRST list was tested, i.e., even before Diaconis
suggested the second list, and a fortiori before he suggested the
permutations.
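
For readers unfamiliar with the "rank order out of random permutations," here is a generic sketch of a permutation-rank p-value. It is NOT the WRR procedure: the statistic `total_gap`, the toy data, and the convention that a smaller statistic means a stronger effect are all assumptions made for illustration.

```python
import random

def permutation_rank_p(stat, xs, ys, n_perm, seed=0):
    """Share of pairings (observed plus n_perm random ones) scoring at least as well.

    Assumes a SMALLER statistic indicates a stronger effect.
    """
    rng = random.Random(seed)
    observed = stat(xs, ys)
    hits = 1                      # the observed pairing counts as one of the pool
    shuffled = list(ys)
    for _ in range(n_perm):
        rng.shuffle(shuffled)     # random re-pairing of ys against xs
        if stat(xs, shuffled) <= observed:
            hits += 1
    return hits / (n_perm + 1)    # can never fall below 1 / (n_perm + 1)

def total_gap(xs, ys):
    """Toy statistic: total absolute distance between paired items."""
    return sum(abs(x - y) for x, y in zip(xs, ys))

xs = list(range(10))
ys = list(range(10))              # perfectly matched toy data
p = permutation_rank_p(total_gap, xs, ys, n_perm=999)
print(p)
```

The point relevant here is only that this statistic exists after someone decides to permute; it cannot have guided choices made before that decision.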
5. Of course, it is remotely possible that before testing the
first list, WRR foresaw (from Dilugim? :-) ) that Diaconis would
ask them for a fresh list of personalities, and also foresaw that
still later he would ask them to use a permutation test rather
than the test they had been using. Theoretically, one could
even raise the possibility that Diaconis himself was part of the
conspiracy, though I think that everybody concerned would be
willing to rule THAT one out.
What I'm suggesting is that in carrying out this
investigation, one must stay more or less in the realm of the
plausible; one must maintain common sense. And, common sense
calls for using the statistics P1 and P2, and NOT the permutation
test, to test Maya's hypothesis.
6. Let us now look at each of the 13 choices that Maya examined.
1: When Margaliot had an incorrect date, WRR substituted the
correct date -- rather than using Margaliot's incorrect one
(first list).
2: Same (second list).
3: When Margaliot had an incorrect date, WRR substituted the
correct date -- rather than omitting the item (first list).
4: Same (second list).
5: WRR used birthdays as well as deathdays; they could have
used just the death days (first list).
6: Same (second list).
7: WRR used both forms for each of 15 and 16; they could have
used just "tet-vav" and "tet-zayin" (first list).
8: Same (second list).
9: The form "be-alef be-tishri" could have been used, and was
not (first list).
10: Same (second list).
11: WRR used an incorrect first list, because of incorrect
measurement of the length of a column (they claim by mistake, but
maybe it was really on purpose). They could have used a correct
list.
12 & 13. Same for the second list. In this case Maya examined
two rather complicated alternatives, but did NOT (!) examine the
alternative of simply using the correct second list.
7. Maya's 13 tests fall naturally into two classes: 1 through 8
and 9 through 13. The first class (1 through 8) consists of
cases where WRR expected beforehand that the results would be
improved by their choice; eventually, according to your report,
they indeed were. Tests 1 through 4 are good examples of this.
Clearly, if one thinks that dates are important, it's a good idea
to get the right dates! That the right dates score better than
the wrong ones -- or than none at all -- should be no surprise IF
the research hypothesis is correct. Maya, of course, rules out
the possibility that the research hypothesis is correct -- but
you can't assume that in your analysis!
Similarly for Tests 5 and 6. WRR think that birthdays ARE
significant -- remember, the whole idea started with Herzl's
birthday (see 3B above)!
And similarly for 7 and 8. One must remember that the
reason for using "tet-vav" and "tet-zayin" is to avoid using the
Name of the Lord in vain. But IF there IS a code in Bereshit,
then that would not be using the Name in vain! After all,
Bereshit is full of entirely explicit occurrences of the Name;
the hypothesis that the author of Bereshit is willing to use the
Name does make some sense. THAT is the reason that Rips suggested at
the outset to use both forms. And apparently, it works!
Applying Maya's idea to this kind of phenomenon is a little
like saying that in testing whether surgery + radiation works in
treating cancer, one must try the treatment without radiation!
8. Tests 9 through 13 are admittedly different. In 9 and 10
there is no apparent reason for having left out the additional
form (WRR wrote that they received an expert opinion to this
effect, and the expert has meanwhile died; but we'll ignore
that). In Tests 11, 12, and 13, WRR admit to measurement errors,
which Maya claims are to their advantage.
Are they indeed?
NO!
BOTH in Test 9 AND in Test 10, min(P1,P2) actually becomes
SMALLER when the form "be-alef be-tishri" is added. And
remember: that -- NOT the permutation -- is the correct statistic
to use (see Point 5 above). So WRR's choice was to their
DISADVANTAGE in both Tests 9 and 10.
9. In the case of Test 11, Maya is right! min(P1,P2) does
become somewhat larger when the correct list is used.
10. The case of Tests 12 and 13 is somewhat strange. As
mentioned above, in this case Maya examined two rather
complicated alternatives, but for some reason did NOT (!) examine
the simple alternative of just using the correct second list. If
one does use this simple alternative, one finds again that
min(P1,P2) becomes smaller when the correct list is used. So
again, WRR's mistake was to their DISADVANTAGE.
11. Since the second list was the one actually used in the
formal test on which Diaconis and Aumann agreed, and that was
eventually published in STATISTICAL SCIENCE, it is of some
interest to ask how the use of the correct list would affect the
true significance level -- that given by the permutation test.
That is, quite apart from the question of cheating -- which we
have seen is NOT indicated by this mistake -- the question
arises: In view of this mistake, does the result in fact remain
valid?
Answer: YES, very much so. The significance level
improves by a FACTOR of 40 -- from 16 in a million to 4 in
ten million! And this 4 in ten million is itself only because of
Bonferroni -- the true best result is 1 in 10,000,000, and may
be even better (only 10,000,000 permutations were examined).
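
The factor of 40 can be checked directly from the figures quoted in this point. This is a minimal arithmetic sketch; the variable names are illustrative only.

```python
from fractions import Fraction

published = Fraction(16, 1_000_000)        # level with the list as actually used
best_corrected = Fraction(1, 10_000_000)   # best single statistic, corrected list
corrected = 4 * best_corrected             # Bonferroni over the four statistics

print(corrected)               # 1/2500000, i.e. 4 in ten million
print(published / corrected)   # 40
```

With 10,000,000 permutations, 1 in 10,000,000 is the smallest value the test can report, which is why the true best result may be even better.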
12. Summary: Out of Maya's 13 tests, the first 8 are
disqualified on conceptual grounds. Of tests 9, 10, and 11, Maya
is right in one, wrong in two. Tests 12 and 13 seem contrived
and complicated; if one replaces them by a natural,
straightforward test, Maya is again wrong. Final score for Maya:
1 in 13, or 1 in 5, or at the very best, 1 in 3. But no matter
how you score it, there's NO indication of cheating.
13. Here's another choice that WRR made that they didn't have to
make at all, and that was definitely to their DISADVANTAGE: The
addition of the statistics P3 and P4 (see 3I above). This was
rather late in the game -- AFTER the permutation test had been
suggested; so if they were cheating, they should have known by
that time what they were doing. But it cut down significance by a
factor of 1.6 -- from 10 in a million to 16 in a million.
Again, it COULD all be a nefarious plot -- a kind of decoy;
having foreseen Maya's test as well, they wanted to show how
honest they are. But how plausible is that?
14. Let's now consider the matter of "stars," which Maya raised
at Zarka Ma'in. This sounds interesting at first, but on
consideration, it's not clear that there's anything to it. One
must remember two things. First, by all accounts, EVERYTHING had
been fixed by the time the permutations were suggested. The
matter of stars must therefore be evaluated in the light of P1
and P2, or if you wish, in the light of the distribution of the
c(w,w'). The second thing to note is that WRR's contention has
been, all along, that there are an unusually large number of
unusually small c(w,w') -- i.e., stars (look at the bar graphs
on P.437 of their article). If you look at the construction of P1
and P2, that's what it amounts to. So what's it all about?
I'll agree that this bears further looking into. But for
the time being, I see nothing there.
15. Burden of Proof: At Zarka Ma'in, Maya said that the burden
of proof is now on WRR. I don't see it that way. They've gotten
a very high significance level. So far, it's stood up under
examination. Asking questions and raising possibilities, which
on examination turn out groundless, is not enough.
16. So now, what kind of results that Dror and Maya might come
up with would be more convincing than those that Maya already
discussed?
17. Before answering this, let me turn the question around and
ask YOU, Maya: What kind of results that WRR might come up with
would be more convincing to you than those already published?
The answer is public knowledge: There are none. You are on
record as saying (in the discussion after my "Rationality on
Friday" talk) that NOTHING will convince you.
That's OK. Michelson and Morley believed in an ether to the
end of their lives. Though the experiment that destroyed the
ether was their own, and though they kept refining the experiment
and never found any effect at all, they kept believing in the
ether. A scientist does not need to keep an open mind on
everything; like anybody else, he has a right to his faith.
In this respect, I have an advantage over you. I've always
been very, very skeptical about this business. Frankly, I still
am. Though I can't say why, I'm far from convinced that they're
right. In my bones, I feel that I need more evidence -- lots
more.
BUT UNLIKE YOU, I DON'T RULE IT OUT COMPLETELY. Though
utterly fantastic, it's just barely possible. I'm keeping an
open mind, and I'm going to play it by the rules. I really want
to find out whether there is a phenomenon there or not. In
contrast, you're already sure; you only want to find out HOW they
cheated, not IF they cheated.
18. Back to Point 16: First of all, whatever you do, you've got
to say BEFOREHAND "I'm going to do this and that and that."
You've got to do that BEFORE you actually compute anything. And,
you've got to give PRECISE criteria for success and failure. YOU
can make them up as you wish, but you've got to tell the world
BEFOREHAND what they are. And success or failure, you've got to
tell us afterward how your tests came out. So we can keep score.
That's what they did. I didn't believe they would, but they
did. And if you want to convince ME, you're going to have to do
the same.
If at first you don't succeed, you can keep trying. Just
tell us BEFOREHAND what you're doing, and what the criteria are,
and whether or not this test is going to be definitive, and so
on. You can keep it open, or close it, or do what you want.
Just tell us. Beforehand.
19. Now to specifics. As you yourself pointed out, the
procedure for calculating the c(w,w') is very complicated. You
can ring the changes on it in lots of ways. For example, there
are lots of ways of perturbing the ELS's. Or you can use a
different distance function. Or you can raise the 8. Lots of
possibilities there. I don't have time now, but I'll be able to
make some specific suggestions.
I haven't checked this out with Ilya and Doron (in fact, I
didn't check out anything in this letter with them, except where
I cite them explicitly), and they should tell me if they think
I'm wrong, and why. But on the face of it, all these things seem
to me to be neutral to substance (UNLIKE your tests 1 - 8). So if
in all these cases, or in disproportionately many of them, WRR's
choice was to their advantage, then I'll sit up and take notice.
You'll still have to tell a plausible story. And you'll
still have to explain things like Gans. You won't be home yet,
but at least you'll be on first base.
(19a. There's a small problem of credibility here. I have
complete faith in your honesty. But the honesty of WRR has been
impugned, so if one wants to maintain objectivity, one should
address this matter for your tests too. I think this can be
overcome; there are so many parameters floating around here that,
for example, one could choose a test or tests at random from a
large number of possibilities.)
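
One way the random choice suggested in 19a might be carried out is sketched below, assuming the candidate list and the seed are both announced BEFOREHAND so that anyone can reproduce the draw. The labels `variation_000` and so on, the count of 200, and the seed are hypothetical placeholders, not actual proposed tests.

```python
import random

def pick_tests(candidates, k, seed):
    """Draw k distinct tests, reproducibly, from a list fixed in advance."""
    rng = random.Random(seed)                # seed announced before any computation
    return rng.sample(sorted(candidates), k) # sorting makes the draw order-independent

candidates = [f"variation_{i:03d}" for i in range(200)]  # hypothetical labels
chosen = pick_tests(candidates, k=13, seed=1997)
print(chosen)
```

Because the draw depends only on the published list and seed, neither side can be accused of selecting the variations after seeing how they come out.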
20. Appellations and dates: Let's replicate this! I'm sure we
can agree on a way of finding an expert; if you wish, he can be
anonymous (not known to either side). Dror suggested a letter to
"Prof. Zalman Hayadua". That's fine! He ("Prof. Zalman
Hayadua") will get agreed-upon written instructions, and he'll
produce lists, and we'll be able to run the thing from there.
21. War and Peace: First of all, I've got to see the list of
appellations, to be able to judge whether it begins to make any
sense. In fact, I (Yisrael) can't judge, but I can try to find
out from somebody else. Here we're back to Point 20, and we can
use that kind of procedure to judge the list.
IF it does, it IS interesting to have established that by
careful cheating, you CAN do this kind of thing. But you've
still got to establish that that kind of careful cheating was in
fact feasible under the circumstances, you've still got to tell a
plausible story. Was Havlin in the conspiracy? How about Gans
(who says he started by trying to break the whole business)?
If something is just barely possible with careful cheating, that
doesn't mean that everybody who did it is a cheater. Maybe it's
possible without cheating, too. If you show that by careful
work, you can make a good counterfeit $100 bill, that doesn't
mean that everybody with a $100 bill is a counterfeiter; you've
got to find the printing press, or at least to make it plausible
that he had one.
That's about it for now.
I'm thinking of sending this to the people who were at Zarka
Ma'in. Any objections?
Kol Tuv, Shabbat Shalom,
Yisrael