
  Paragraph-aligned bryd cores and a new bryd core wrapper.

  Also, the max number of threads running DES can be _raised_to_four_.
  The first two threads will run the optimized cores, the second two will
  run the 'spare' cores (i.e. on a P5 the PPRO ones, and vice versa). The
  second two will naturally be slower than the first two, but it's still
  better than nothing.
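
  As a rough illustration of the thread-to-core mapping described above,
  here is a minimal sketch. All names in it (CpuFamily, CoreSet,
  pick_core, kMaxDesThreads) are hypothetical and are not taken from the
  client source. It also assumes, going by the file names only, that
  bryddes5 holds the P5-tuned cores and bryddes6 the PPRO-tuned ones.

```cpp
#include <cassert>

// Hypothetical names throughout; the real client's identifiers may differ.
enum class CpuFamily { Pentium, PentiumPro };
enum class CoreSet { Bryddes5, Bryddes6 };  // assumed: P5-tuned / PPRO-tuned

struct CoreChoice {
    CoreSet set;   // which asm module the core comes from
    int     core;  // 1 or 2, i.e. the CORE1 or CORE2 build of that module
};

constexpr int kMaxDesThreads = 4;

// Threads 0 and 1 get the cores tuned for the detected CPU;
// threads 2 and 3 get the 'spare' cores tuned for the other family.
inline CoreChoice pick_core(CpuFamily cpu, int thread_index) {
    assert(thread_index >= 0 && thread_index < kMaxDesThreads);
    const bool is_p5 = (cpu == CpuFamily::Pentium);
    const CoreSet native = is_p5 ? CoreSet::Bryddes5 : CoreSet::Bryddes6;
    const CoreSet spare  = is_p5 ? CoreSet::Bryddes6 : CoreSet::Bryddes5;
    const bool use_native = thread_index < 2;
    return CoreChoice{ use_native ? native : spare, (thread_index % 2) + 1 };
}
```

  With this mapping, each of the four threads ends up on a distinct
  core, so no two threads share one core's code or data.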

  The wrapper, des-bryd.cpp, has compatibility functions that emulate the
  old function names. That way problem.cpp and cliconfig.cpp don't need 
  to be modified to test the modules. 
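
  The idea of a compatibility function can be sketched as below. This
  is not the contents of des-bryd.cpp: the WorkUnit type, both function
  names and the signature are made up for illustration, and the 'new
  core' is a C++ stand-in for what is really an asm routine.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical work-unit type; the real client's structure may differ.
struct WorkUnit {
    std::uint32_t key_lo;
    std::uint32_t key_hi;
};

// Stand-in for one of the new cores. It only pretends to step through
// the keyspace and reports how many keys it "tested".
inline std::uint32_t new_core_1(WorkUnit* unit, std::uint32_t iterations) {
    assert(unit != nullptr);
    unit->key_lo += iterations;
    return iterations;
}

// Compatibility shim: keeps an old-style name alive and forwards to the
// new core, so existing callers compile and link unchanged.
inline std::uint32_t old_core_entry(WorkUnit* unit, std::uint32_t iterations) {
    return new_core_1(unit, iterations);
}
```

  Callers keep using the old name (old_core_entry here) and never see
  that a different core is doing the work underneath.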

  bryddes5.asm   replaces bdeslow.asm and bbdeslow.asm
  bryddes6.asm   replaces p1bdespro.asm and p2bdespro.asm
  des-bryd.cpp   replaces des-x86.cpp

  The two asm modules generate *four* cores (two per module, selected
  with /dCORE1 or /dCORE2). To build the new cores under the old obj
  names (so that the makefile doesn't need to be modified), use

  tasm32 /ml /m9 /q /t /dCORE1 bryddes5.asm, bdeslow.obj
  tasm32 /ml /m9 /q /t /dCORE2 bryddes5.asm, bbdeslow.obj
  tasm32 /ml /m9 /q /t /dCORE1 bryddes6.asm, p1bdespro.obj
  tasm32 /ml /m9 /q /t /dCORE2 bryddes6.asm, p2bdespro.obj

Have fun,
Cyrus
