
Date:     Fri,  17 Jan 97 15:41 +0200
From:      <AUMANN@HUJIVMS>
To:        Maya Bar Hillel <MSMAYA@PLUTO.MSCC.HUJI.AC.IL>
Cc:       Dror Bar-Natan <drorbn@math.huji.ac.il>,
          Ilya Rips <rips@math.huji.ac.il>,
          aumann
Subject:  What would convince me?

Shalom Maya,

> Date: Sat, 11 Jan 1997 23:52:14 +0200 (WET)
> From: Maya Bar-Hillel <msmaya@olive.mscc.huji.ac.il>
> To: aumann israel <aumann@vms.huji.ac.il>
> Cc: Dror Bar-Natan <drorbn@math.huji.ac.il>,
        msmaya <msmaya@olive.mscc.huji.ac.il>

> ...  please consider this a written request for suggestions as to
> what  kind  of  results which Dror and I might come up  with  you
> would  find more convincing than those I already told  you  about
> (conditional, of course, on our rechecking that there is no error
> in the numbers).

1.   This is mainly to respond to the above request.   Let me say
at  once that I find your approach interesting,  and think it has
potential.  There are two major (somewhat related) points that  I
will make below:

a)     The  statistical analysis,  though it seems attractive  at
first, turns out, on examination, to be unsound.  That is to say,
the idea is interesting, but is not carried out in an appropriate
way.  Below I will suggest more appropriate ways to carry it  out
(these  suggestions have NOT been checked out with either Rips or
Witztum or, for that matter, with anybody else).

b)    A  charge of (intentional or unintentional) "cheating"   --
such as is inherent in your story -- should be plausible in  view
of  the  "uvdot  bashetakh" (the facts on the ground);  i.e.,  in view of  things  like  the
chronology,  the necessary extent of the circle of  conspirators,
and so on.  You DON'T necessarily have to have a proof that would
yield  a conviction in a court of law;  you DON'T have to believe
what the suspects say; but you DO have to relate to these things,
there has to be some kind of plausible story,  it can't be wildly
implausible,  you can't just wave the facts away.   Dror realizes
this.


2.   This  is being written in a big hurry,  as I have a  million
things to do before leaving for abroad on Sunday.   So I will try
to  be accurate and precise and reasonably complete,  and to  use
appropriate phrasing,  but I may slip up here and there.   Please
forgive  me.   I  am sending a copy to Rips (who will  presumably
pass it on to Witztum), so if I misunderstood something they told
me, they can correct it.


3.   It's important to get the chronology straight. The following
chronology  is partial in the sense that not all relevant  events
are  included,  but as far as it goes,  I believe it's  correct.
I will  not  use dates  -- these can be checked out -- but present
the  events  in their  correct  time  order.    (In  general,  I
don't  know  the chronological order within each of the items
below.)

   A. Experience  with  one-dimensional,   and  later  with  two-
dimensional ELS's.

   B. It is noticed  (I believe by Rips)  that the Rambam appears
in close proximity to "Mishne Tora" with a skip of 613. Also that
Herzl  (the founder of modern Zionism) appears in close proximity
to a phrase that includes his BIRTHDAY.   There is NO  experience
with Tora personalities vis-a-vis their dates.  (This was told to
me by Rips last Friday,  January 10, 1997 -- BEFORE I reported to
him on the Zarka Ma'in meeting).

   C.  RIPS  suggests checking Tora personalities vis-a-vis their
dates in a systematic way.  (Confirmed to me by Rips last Friday,
January  10,  1997  -- also before my report on the  Zarka  Ma'in
meeting.)

   D.  The first list is generated.  Havlin is approached for the
appellations,  Urbach for the date forms,  the dates are  checked
out   and  (where  necessary)  corrected,   spelling  forms   are
determined.  The statistics P1 and P2 are defined.

   E.    The test is performed on the first list.  Both P1 and P2
-- which at that time were considered significance levels -- turn
out amazingly small.

   F.    The  results are sent to Diaconis.  He asks (inter alia)
for  the  same  test  to  be carried  out  on  a  fresh  list  of
personalities.

   G.    The  second list is generated and tested,  using exactly
the same test as for the first list.  Again, high "significance."

   H.     The results are sent to Diaconis.   He is  unconvinced,
asks for a permutation test,  and asks that the first list not be
used in a formal test.

   I.   P3 and P4 are defined.

   J.    The details of a formal test are agreed between Diaconis
and Aumann (I'm trying to avoid pronouns, because they often lead
to confusion).

   K.    The  formal test turns out significant at a level of  16
out  of  a  million.   (That is,  the best  result  of  the  four
statistics is 4 out of a million, and then Bonferroni: 4 x 4 = 16.)


3.   Now  let's get to Maya's tests.   The idea is that  the  WRR
(Witztum-Rips-Rosenberg)  test  involves many arbitrary  choices.
For  each  of  13  such choices,  Maya  looks  whether  the  test
statistic  comes  out better when the choice is made as  it  was.
One  might  expect  that it comes out better in  about  half  the
cases,  and worse in about half the cases. But Maya finds that in
each of the 13 cases,  WRR's choice was to their advantage.  This
seems  highly improbable,  UNLESS the WRR statistic was  observed
BEFOREHAND  to   react favorably to at least some of the  choices
involved.  And of course, if one does this, it is less surprising
that one can generate significance at a high level.
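
To put a number on "highly improbable," here is a minimal sketch. It assumes (and this is an assumption, not something established above) that the 13 choices are independent and that, absent any peeking, each is equally likely to help or to hurt. The function name is made up for illustration.

```python
from fractions import Fraction
from math import comb


def prob_at_least(k, n, p=Fraction(1, 2)):
    """Chance of at least k favorable outcomes among n independent choices.

    Assumption: each choice is favorable with the same probability p,
    independently of the others (a plain binomial model).
    """
    return sum(comb(n, i) * p**i * (1 - p) ** (n - i) for i in range(k, n + 1))


# All 13 of 13 choices favorable under the coin-flip assumption.
print(prob_at_least(13, 13))  # 1/8192
```

That is about 1.2 in 10,000, but only if both assumptions hold. Choices that are correlated with one another, or that one would expect to help if the research hypothesis were true, do not fit this model.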


4.    For  this  to  make sense,  clearly  the  statistic  to  be
calculated in connection with each choice should be the one  with
which  WRR  were working at the time that the choice in  question
was made.  HERE IS PROBLEM NO. 1 WITH MAYA'S TESTS:  She does NOT
do this.
     The statistic Maya uses is the rank order out of ten million
random  permutations.   But the entire test  -- dates,  spelling,
appellations,  date forms,  EVERYTHING, was fixed BEFORE Diaconis
suggested  the permutations.   Using the permutations here is  an
inadmissible anachronism -- it's like asking why the defenders of
Metzada didn't use Uzis.
     Indeed,  many  of  the choices that Maya examines were  made
before  the FIRST list was tested,  i.e.,  even  before  Diaconis
suggested the second list, and a fortiori before he suggested the
permutations.
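
For reference, the kind of statistic at issue can be sketched as follows. This is a generic permutation-rank sketch and not WRR's actual procedure: the names `permutation_rank` and `statistic`, and the convention that a smaller statistic is better, are assumptions made only for illustration.

```python
import random


def permutation_rank(names, dates, statistic, n_permutations, seed=0):
    """Rank of the true name/date pairing among random re-pairings.

    `statistic(names, dates)` is a hypothetical scoring function in which
    smaller means a closer match. Rank 1 means no random re-pairing scored
    as well as the true pairing.
    """
    rng = random.Random(seed)
    observed = statistic(names, dates)
    rank = 1
    for _ in range(n_permutations):
        shuffled = list(dates)
        rng.shuffle(shuffled)  # break the true pairing of names and dates
        if statistic(names, shuffled) <= observed:
            rank += 1
    return rank
```

The point of Problem No. 1 is that a rank of this kind can only be computed once the permutations exist, so it cannot have guided choices made before anyone proposed them.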


5.   Of  course,  it is remotely possible that before testing the
first list,  WRR foresaw (from Dilugim? :-) ) that Diaconis would
ask them for a fresh list of personalities, and also foresaw that
still  later he would ask them to use a permutation  test  rather
than  the test they had been using.    Theoretically,  one  could
even  raise the possibility that Diaconis himself was part of the
conspiracy,  though  I  think that everybody concerned  would  be
willing to rule THAT one out.
     What   I'm   suggesting  is  that  in  carrying   out   this
investigation,  one  must  stay more or less in the realm of  the
plausible;  one must maintain common sense.   And,  common  sense
calls for using the statistics P1 and P2, and NOT the permutation
test, to test Maya's hypothesis.


6.  Let us now look at each of the 13 choices that Maya examined.

  1:  When Margaliot had an incorrect date,  WRR substituted  the
correct  date  -- rather  than using  Margaliot's  incorrect  one
(first list).

  2:  Same (second list).

  3:  When Margaliot had an incorrect date,  WRR substituted  the
correct date -- rather than omitting the item (first list).

  4:  Same (second list).

  5:   WRR used birthdays as well as deathdays;  they could  have
used just the death days (first list).

  6:  Same (second list).

  7:   WRR used both forms for each of 15 and 16; they could have
used just "tet-vav" and "tet-zayin" (first list).

  8:  Same (second list).

  9:   The form "be-alef be-tishri" could have been used, and was
not (first list).

 10:   Same (second list).

 11:    WRR  used an incorrect first list,  because of  incorrect
measurement of the length of a column (they claim by mistake, but
maybe it was really on purpose).   They could have used a correct
list.

 12 & 13.   Same for the second list.  In this case Maya examined
two rather complicated alternatives,  but did NOT (!) examine the
alternative of simply using the correct second list.


7.   Maya's 13 tests fall naturally into two classes: 1 through 8
and  9  through 13.   The first class (1 through 8)  consists  of
cases  where  WRR expected beforehand that the results  would  be
improved by their choice;  eventually,  according to your report,
they  indeed were.  Tests 1 through 4 are good examples of  this.
Clearly, if one thinks that dates are important, it's a good idea
to get the right dates!   That the right dates score better  than
the wrong ones -- or than none at all -- should be no surprise IF
the research hypothesis is correct.   Maya,  of course, rules out
the  possibility that the research hypothesis is  correct  -- but
you can't assume that in your analysis!
     Similarly for Tests 5 and 6.   WRR think that birthdays  ARE
significant  -- remember,  the  whole idea started  with  Herzl's
birthday (see 3B above)!
     And  similarly  for 7 and 8.   One must  remember  that  the
reason  for using "tet-vav" and "tet-zayin" is to avoid using the
Name  of the Lord in vain.   But IF there IS a code in  Bereshit,
then  that  would  not be using the Name  in  vain!   After  all,
Bereshit  is full of entirely explicit occurrences of  the  Name;
the hypothesis that the author of Bereshit is willing to use  the
Name does make some sense.  THAT is the reason that Rips suggested
at the outset to use both forms.  And apparently, it works!
     Applying Maya's idea to this kind of phenomenon is a  little
like  saying that in testing whether surgery + radiation works in
treating cancer,  one must try the treatment without radiation!


8.   Tests  9 through 13 are admittedly different.   In 9 and  10
there  is no apparent reason for having left out  the  additional
form  (WRR  wrote  that they received an expert opinion  to  this
effect,  and  the  expert has meanwhile died;  but  we'll  ignore
that).  In Tests 11, 12, and 13, WRR admit to measurement errors,
which Maya claims are to their advantage.
     Are they indeed?
     NO!
     BOTH in Test 9 AND in Test 10,   min(P1,P2) actually becomes
SMALLER  when  the  form  "be-alef  be-tishri"  is  added.    And
remember: that -- NOT the permutation -- is the correct statistic
to  use  (see  Point 5 above).   So WRR's  choice  was  to  their
DISADVANTAGE in both Tests 9 and 10.


9.    In  the case of Test 11,  Maya is right!   min(P1,P2)  does
become somewhat larger when the correct list is used.


10.    The  case  of  Tests 12 and 13 is  somewhat  strange.   As
mentioned  above,    in  this  case  Maya  examined  two   rather
complicated alternatives, but for some reason did NOT (!) examine
the simple alternative of just using the correct second list.  If
one  does  use  this simple alternative,  one  finds  again  that
min(P1,P2)  becomes  smaller when the correct list is  used.   So
again, WRR's mistake was to their DISADVANTAGE.


11.    Since the second list was the one actually used in the
formal  test  on which Diaconis and Aumann agreed,  and that  was
eventually  published  in  STATISTICAL SCIENCE,  it  is  of  some
interest to ask how the use of the correct list would affect  the
true  significance  level -- that given by the permutation  test.
That  is,  quite apart from the question of cheating  -- which we
have  seen  is  NOT  indicated by this  mistake  -- the  question
arises:  In view of this mistake,  does the result in fact remain
valid?
       Answer:   YES,  very  much  so.   The  significance  level
improves  by  a  FACTOR  of 40 -- from 16 in a million  to  4  in
ten million!  And this 4 in ten million is itself only because of
Bonferroni -- the true best result is  1  in 10,000,000,   and may
be even better (only 10,000,000 permutations were examined).
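
The arithmetic here can be checked with a short sketch. It assumes the Bonferroni step is simply "best of the four statistics, multiplied by four," which is what the figures quoted in this letter imply; the function name is illustrative.

```python
from fractions import Fraction


def bonferroni(best_p, n_statistics):
    """Bonferroni bound: best p-value times the number of statistics, capped at 1."""
    return min(Fraction(1), best_p * n_statistics)


published = bonferroni(Fraction(4, 1_000_000), 4)   # 16 in a million
corrected = bonferroni(Fraction(1, 10_000_000), 4)  # 4 in ten million
print(published / corrected)  # 40
```

With 10,000,000 permutations the best rank a result can attain is 1 in 10,000,000, which is why the true figure may be even better than the one computed.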


12.    Summary:   Out  of  Maya's  13  tests,  the  first  8  are
disqualified on conceptual grounds.  Of tests 9, 10, and 11, Maya
is right in one,  wrong in two.   Tests 12 and 13 seem  contrived
and   complicated;   if   one   replaces  them  by   a   natural,
straightforward test, Maya is again wrong.  Final score for Maya:
1 in 13,  or 1 in 5,  or at the very best, 1 in 3.  But no matter
how you score it, there's NO indication of cheating.


13.  Here's another choice that WRR made that they didn't have to
make at all,  and that was definitely to their DISADVANTAGE:  The
addition  of the statistics P3 and P4 (see 3I  above).  This  was
rather  late  in the game -- AFTER the permutation test had  been
suggested;  so if they were cheating,  they should have known  by
that time what they were doing.   But it cut down significance by a
factor of 1.6 -- from 10 in a million to 16 in a million.
     Again,  it COULD all be a nefarious plot -- a kind of decoy;
having  foreseen  Maya's test as well,  they wanted to  show  how
honest they are.  But how plausible is that?


14.   Let's now consider the matter of "stars," which Maya raised
at  Zarka  Ma'in.  This  sounds  interesting  at  first,  but  on
consideration,  it's  not clear that there's anything to it.  One
must remember two things.  First, by all accounts, EVERYTHING had
been  fixed  by the time the permutations  were  suggested.   The
matter  of  stars must therefore be evaluated in the light of  P1
and P2,  or if you wish,  in the light of the distribution of the
c(w,w').   The  second thing to note is that WRR's contention has
been,  all  along,  that there are an unusually large  number  of
unusually  small  c(w,w') -- i.e.,  stars (look at the bar graphs
on P.437 of their article). If you look at the construction of P1
and P2, that's what it amounts to.  So what's it all about?
       I'll agree that this bears further looking into.   But for
the time being, I see nothing there.


15.   Burden of Proof:  At Zarka Ma'in, Maya said that the burden
of proof is now on WRR.  I don't see it that way.  They've gotten
a  very  high significance level.  So far,  it's stood  up  under
examination.   Asking questions and raising possibilities,  which
on  examination turn out groundless,  is not enough.


16.   So now, what kind of results that Dror and Maya might  come
up  with  would be more convincing than those that  Maya  already
discussed?


17.   Before answering this,  let me turn the question around and
ask YOU,  Maya:  What kind of results that WRR might come up with
would be more convincing to you than those already published?
     The answer is public knowledge:  There are none.  You are on
record  as  saying  (in the discussion after my  "Rationality  on
Friday" talk) that NOTHING will convince you.
     That's OK.  Michelson and Morley believed in an ether to the
end  of their lives.   Though the experiment that  destroyed  the
ether was their own, and though they kept refining the experiment
and  never  found any effect at all,  they kept believing in  the
ether.   A  scientist  does  not need to keep  an  open  mind  on
everything; like anybody else, he has a right to his faith.
      In this respect, I have an advantage over you.  I've always
been very,  very skeptical about this business.  Frankly, I still
am.  Though I can't say why,  I'm far from convinced that they're
right.   In  my bones,  I feel that I need more evidence  -- lots
more.
      BUT  UNLIKE YOU,  I DON'T RULE IT OUT  COMPLETELY.   Though
utterly  fantastic,  it's just barely possible.   I'm keeping  an
open mind,  and I'm going to play it by the rules.  I really want
to  find  out  whether there is a phenomenon  there  or  not.  In
contrast, you're already sure; you only want to find out HOW they
cheated, not IF they cheated.


18.  Back to Point 16:  First of all, whatever you do, you've got
to  say  BEFOREHAND  "I'm going to do this and  that  and  that."
You've got to do that BEFORE you actually compute anything.  And,
you've got to give PRECISE criteria for success and failure.  YOU
can  make them up as you wish,  but you've got to tell the  world
BEFOREHAND what they are.   And success or failure, you've got to
tell us afterward how your tests came out.  So we can keep score.
     That's what they did.  I didn't believe they would, but they
did.   And if you want to convince ME, you're going to have to do
the same.
     If at first you don't succeed,  you can keep  trying.   Just
tell us BEFOREHAND what you're doing,  and what the criteria are,
and  whether or not this test is going to be definitive,  and  so
on.   You  can keep it open,  or close it,  or do what you  want.
Just tell us.  Beforehand.


19.    Now  to  specifics.   As you  yourself  pointed  out,  the
procedure  for calculating the c(w,w') is very complicated.   You
can ring the changes on it in lots of ways.   For example,  there
are  lots  of ways of perturbing the ELS's.   Or you  can  use  a
different  distance function.   Or you can raise the 8.   Lots of
possibilities there.   I don't have time now, but I'll be able to
make some specific suggestions.
       I haven't checked this out with Ilya and Doron (in fact, I
didn't check out anything in this letter with them,  except where
I  cite them explicitly),  and they should tell me if they  think
I'm wrong, and why.  But on the face of it, all these things seem
to me to be neutral to substance (UNLIKE your tests 1 - 8). So if
in all these cases, or in disproportionately many of them,  WRR's
choice was to their advantage, then I'll sit up and take notice.
     You'll  still have to tell a plausible  story.   And  you'll
still  have to explain things like Gans.   You won't be home yet,
but at least you'll be on first base.


(19a.  There's  a  small  problem of  credibility  here.  I  have
complete faith in your honesty.   But the honesty of WRR has been
impugned,  so  if one wants to maintain objectivity,  one  should
address  this  matter for your tests too.   I think this  can  be
overcome; there are so many parameters floating around here that,
for  example,  one could choose a test or tests at random from  a
large number of possibilities.)
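
One way such a random choice might be made reproducible is sketched below. The pool of variants, the seed value, and the function name are all hypothetical; the only idea taken from the text is choosing tests at random from a large, pre-announced set.

```python
import random


def choose_tests(candidates, k, seed):
    """Pick k tests from a pre-announced list using a pre-announced seed.

    Anyone holding the same list and seed (and the same Python version)
    can reproduce the selection.
    """
    rng = random.Random(seed)
    return rng.sample(list(candidates), k)


pool = [f"variant-{i}" for i in range(1, 101)]  # hypothetical test variants
print(choose_tests(pool, 5, seed=1997))
```

For this to address the credibility problem, both the list and the seed would have to be fixed BEFOREHAND, in a way that neither side controls.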


20.   Appellations and dates:  Let's replicate this!  I'm sure we
can agree on a way of finding an expert;  if you wish,  he can be
anonymous (not known to either side).  Dror suggested a letter to
"Prof.   Zalman  Hayadua".   That's  fine!    He  ("Prof.  Zalman
Hayadua")  will get agreed-upon written instructions,  and  he'll
produce lists, and we'll be able to run the thing from there.

21.    War and Peace:   First of all, I've got to see the list of
appellations,  to  be able to judge whether it begins to make any
sense.   In fact,  I (Yisrael) can't judge, but I can try to find
out from somebody else.   Here we're back to Point 20, and we can
use that kind of procedure to judge the list.
       IF it does,  it IS interesting to have established that by
careful  cheating,  you CAN do this kind of  thing.   But  you've
still  got to establish that that kind of careful cheating was in
fact feasible under the circumstances, you've still got to tell a
plausible  story.  Was Havlin in the conspiracy?   How about Gans
(who  says  he started by trying to break  the  whole  business)?
If something is just barely possible with careful cheating,  that
doesn't mean that everybody who did it is a cheater.   Maybe it's
possible  without  cheating,  too.   If you show that by  careful
work,  you  can make a good counterfeit $100 bill,  that  doesn't
mean that everybody with a $100 bill is a  counterfeiter;  you've
got to find the printing press,  or at least to make it plausible
that he had one.

That's about it for now.

I'm  thinking  of sending this to the people who  were  at  Zarka
Ma'in.  Any objections?

Kol Tuv,  Shabbat Shalom,

Yisrael