HPC Asia97, Seoul Conference PAPERS WORTH NOTING
                   Rajkumar Buyya
        Centre for Development of Advanced Computing
               Bangalore, India

This  section discusses trends noted in a few selected papers presented
at the HPC-Asia'97 conference. The work presented  by the  authors can be exploited
commercially either by improving existing systems or by building new
ones.

This is one of the sections of the HPC Asia'97 Seoul conference report
published by ATIP (Asian Technology Information Program), Japan/USA.

--------------------------------------------------------------------------------

High Performance RAID System by using Dual Head Disk Structure
* Nam-Kyu Lee (nklee@star.sst.co.kr)
  Samsung Electronics Co. Ltd., Korea;
* Tack-Don Han, Shin-Dug Kim, and Sung-Bong Yang
  {hantack, sdkim, yang}@kurene.yonsei.ac.kr
  Yonsei University, Korea;

Trends and Remarks:
--------------------------------------------------------------------------------

High performance computer systems use RAID to improve both I/O
bandwidth and reliability. However, the performance of a RAID system
degrades significantly for applications that frequently perform small
read/write operations. To improve the performance of RAID systems, the
authors propose a new system called DU-RAID (Dual Head RAID). Their
simulation studies show that the dual head disk model reduces not only
the maximum seek time but also the average seek time. This is achieved
by dividing the disk surface into two separate regions, one for each
head.
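
The effect of splitting the surface can be illustrated with a small
simulation. The sketch below is not the authors' simulator: it assumes
uniformly random requests and equal contiguous regions, and it measures
seek distance in cylinders rather than seek time. The function name and
the 1000-cylinder disk are illustrative assumptions.

```python
import random


def simulate_seeks(cylinders, heads, requests, seed=0):
    """Return (mean, max) seek distance in cylinders for random requests.

    Assumption: the surface is split into `heads` equal contiguous
    regions and each head only ever moves inside its own region.
    """
    rng = random.Random(seed)
    region = cylinders // heads
    # Park every head in the middle of its region to start.
    position = [h * region + region // 2 for h in range(heads)]
    total = 0
    worst = 0
    for _ in range(requests):
        target = rng.randrange(region * heads)
        head = target // region  # the head that owns the target cylinder
        distance = abs(target - position[head])
        position[head] = target
        total += distance
        worst = max(worst, distance)
    return total / requests, worst


single = simulate_seeks(cylinders=1000, heads=1, requests=100_000)
dual = simulate_seeks(cylinders=1000, heads=2, requests=100_000)
print("single head: mean %.1f, max %d" % single)
print("dual head:   mean %.1f, max %d" % dual)
```

Under these assumptions each head travels at most half the surface, so
both the worst-case and the mean seek distance fall to roughly half of
the single head values.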

The performance analysis is carried out on IBM 0661, Fujitsu M2652H/S,
DEC DSP3210, and DEC SP3430 disk models with different workloads. The
findings are as follows. For a given dual head disk model, the average
response time, including any queuing delay, is reduced by 16% compared
to the single head disk model at arrival rates below 30 rq/s, and by
more than 23% at arrival rates above 40 rq/s, depending on the disk
scheduling algorithm. This shows that the proposed dual head system
delivers a significant performance improvement even as the workload
increases.

The above performance analysis shows that the proposed dual head model
can be  exploited by commercial RAID system vendors (such as IBM,
Fujitsu,  and DEC).

--------------------------------------------------------------------------------

ISCN: Towards a Distributed Scientific Computing Environment
* Longsong Lin (lin@horn.el.yuntech.edu.tw)
  National Yunlin Institute of Technology, Taiwan;
* Karsten M Decker (decker@cscs.ch)
  Swiss Centre for Scientific Computing, Switzerland;
* Mark J Johnson (mjohnson@cscs.ch)
  Swiss Centre for Scientific Computing, Switzerland;
* Christophe Domain (cdomain@cscs.ch)
  Electricite de France, France;
* Yves Souffez (Yves.Souffez@der.edfgdf.fr)
  Electricite de France, France;

The most important component of next-generation scientific computing
is not the high performance computer itself but rather a distributed
computing infrastructure of national or even global scale. In their
work on ISCN (Interactive Scientific Computing on Networks), the
authors have conducted a feasibility study and built a distributed
object-oriented framework for scientific computing applications on
distributed HPC machines. They have built a client-server framework
that supports simple interactive selection of different remote HPC
servers, configurations, and batch queues. ISCN also provides
interactive access to the running application, even when it is
submitted to a batch queue; interactive supervision and steering of the
application; immediate visualization of results; and an interactive
mechanism for manipulating the visualization of output data.

The ISCN system uses CORBA-based ILL software as its communication
infrastructure. The Java language is used to build a portable client
that executes on a desktop workstation or PC. The remote machines can
be any HPC servers, such as an NEC SX-4 or a SunSPARC 1000.

The operation of the ISCN system is as follows (the application used in
this explanation is a simulation):

1) Job servers are installed and started on the HPC systems.
2) The job servers perform a trading export operation, sending their
   object identifiers to the trader.
3) The user starts up a Java-enabled WWW browser.
4) The user goes to the application interface page on the Web, and the
   browser dynamically downloads the latest version of the application
   Java applet (which provides an interactive GUI) from the web server.
5) The user selects an HPC machine to use, and the client then performs
   a trading import operation to accredit and fetch the object
   identifier of a server running on the desired machine.
6) The user sets up the simulation script, which causes the client to
   request the server to start the application.
7) The server launches the application, either on demand or via an NQS
   queue.
8) When the application is ready, the client sends the command script
   to the server, which passes it directly to the application. The
   application then starts the simulation and begins writing results to
   the output pipes opened by the server.
9) The client requests results from the server for immediate
   visualization while the application continues the simulation.
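
The export/import handshake in the steps above can be sketched in a few
lines. This is only an in-process illustration, not the ISCN code: the
Trader and JobServer classes, their method names, the machine labels,
and the script contents are all hypothetical, and the real system makes
these calls remotely between a Java applet and the HPC servers.

```python
class Trader:
    """Hypothetical stand-in for the trading service."""

    def __init__(self):
        self._offers = {}

    def export_offer(self, machine, object_id):
        # Step 2: a job server advertises its object identifier.
        self._offers[machine] = object_id

    def import_offer(self, machine):
        # Step 5: a client looks up the server for the chosen machine.
        return self._offers[machine]


class JobServer:
    """Hypothetical job server that launches and feeds one application."""

    def __init__(self, machine):
        self.machine = machine
        self.object_id = "jobserver@" + machine
        self.mode = None
        self._pipe = []  # stands in for the application's output pipes

    def start_application(self, via_queue=False):
        # Step 7: launch on demand or through a batch queue.
        self.mode = "batch queue" if via_queue else "on demand"

    def send_script(self, script):
        # Step 8: a real application would run the simulation; here each
        # command line is simply echoed back as a fake result.
        for step, line in enumerate(script.splitlines()):
            self._pipe.append((step, line.strip()))

    def fetch_results(self):
        # Step 9: hand over whatever has been written so far.
        results, self._pipe = self._pipe, []
        return results


trader = Trader()
servers = {}
for machine in ("sx4", "sparc1000"):          # step 1
    job_server = JobServer(machine)
    servers[job_server.object_id] = job_server
    trader.export_offer(machine, job_server.object_id)

object_id = trader.import_offer("sx4")        # step 5
server = servers[object_id]
server.start_application(via_queue=True)      # steps 6 and 7
server.send_script("init\nrun 10")            # step 8
results = server.fetch_results()              # step 9
print(object_id, server.mode, results)
```

The point of the indirection is that the client never hard-codes a
server address: it only names the machine and lets the trader resolve
the identifier.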

The ISCN project shows how remote HPC systems can be used effectively
for scientific computing, and how specialized, heterogeneous remote HPC
systems, desktop clients, and the Internet and its enabling
technologies can be integrated for efficient scientific computing.
Commercial vendors of HPC machines such as IBM, DEC, Sun, and Silicon
Graphics could adopt this technology, or build similar technology into
their product lines, to make their HPC systems more usable by remote
clients.

--------------------------------------------------------------------------------

Parallel  Programming with VPE: a Case Study of an Integrated  Visual
        Programming Environment
* Wentong Cai (aswtcai@ntuix.ntu.ac.sg)
  Nanyang Technological University, Singapore;
* Hung-Khoon Tan (aswtcai@ntuix.ntu.ac.sg)
  Nanyang Technological University, Singapore;
* Stephen J Turner (stev@dcs.exeter.ac.uk)
  University of Exeter, UK;


The Visual Programming Environment (VPE) developed by the above authors
allows users to create parallel applications based on the
message-passing paradigm using graphical notations. The graphical
notations are derived from the BLOX methodology proposed by Glinert in
his work "Towards Second Generation Interactive Graphical Programming
Environments". The authors have extended the BLOX methodology and made
the VPE notations "expandable" so that new components can be added to
the component library.

The above work contributes to the field of parallel programming with
visual techniques in a number of ways. First, it demonstrates the
creation of expandable visual components for parallel programming.
Second, the diagram notations supported by VPE make the task of
parallel program construction easy and manageable. Third, it integrates
the programming tools (a visual constructor, a visual compiler, and a
program visualization tool) under a consistent visual representation.

The visual constructor allows the user to build a diagrammatic
representation of a parallel application incrementally using graphical
aids. The visual compiler translates this representation into code for
an existing parallel programming language or subsystem such as MPI or
PVM, and then into an executable for the target parallel computer. The
parallel application can be fine-tuned using the program visualization
tool.

This work can easily be exploited commercially because its target code
is based on standard message-passing subsystems such as MPI and PVM,
which are (or will be) supported by most high performance computers.

--------------------------------------------------------------------------------

An  Efficient Caching Scheme for Software RAID File System  in  Workstation
      Cluster
* Jong-Hoon Kim, Sam H Noh, and Yoo-Hun Won
  {jkim, noh, won}@cs.hongik.ac.kr
  Hong-Ik University, Seoul, Korea

Software RAID systems are gaining popularity over hardware RAID systems
because of changing HPC hardware trends. A software RAID system
distributes data redundantly across an array of disks attached to the
workstations connected by a high speed network, and provides high
throughput as well as high availability. To increase its efficiency,
the authors of this paper propose a caching scheme (two-level caching)
whose performance is better than that of the previous schemes
(multi-copy caching and one-copy caching).

The multi-copy caching scheme was proposed by Russel Sandberg et al. in
their work on 'design and implementation of SUN network file system',
and the one-copy caching scheme was proposed by A. Leff et al. in their
work on 'An efficient LRU based buffering in a LAN Remote Caching
Architecture'. These schemes were designed for conventional file
systems and adapted for software RAID systems. The main advantage of
the multi-copy caching scheme is that it reduces the access time of
remotely located data. It achieves this by keeping duplicate copies of
cached blocks wherever the block is referenced. A block-invalidate
protocol is employed to resolve the problem of block inconsistency. The
one-copy scheme requires that at most one copy of any block be buffered
across all the system caches at any time. The motivation behind this is
the rapid improvement in network speed. Both the multi-copy and
one-copy caching schemes require four disk accesses to perform a
write-back of a dirty block.
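
The four accesses presumably correspond to the usual read-modify-write
cycle of a parity-protected array: read the old data, read the old
parity, write the new data, and write the new parity. The sketch below
assumes RAID-5 style XOR parity, which this summary does not state
explicitly; the Stripe class and its access counter are hypothetical
and exist only to make the count visible.

```python
def xor_blocks(*blocks):
    """Byte-wise XOR of equally sized blocks."""
    out = bytearray(len(blocks[0]))
    for block in blocks:
        for i, byte in enumerate(block):
            out[i] ^= byte
    return bytes(out)


class Stripe:
    """Toy stripe: several data blocks plus one XOR parity block."""

    def __init__(self, data_blocks):
        self.data = list(data_blocks)
        self.parity = xor_blocks(*self.data)
        self.accesses = 0  # counts simulated disk reads and writes

    def small_write(self, index, new_data):
        old_data = self.data[index]       # access 1: read old data
        self.accesses += 1
        old_parity = self.parity          # access 2: read old parity
        self.accesses += 1
        self.data[index] = new_data       # access 3: write new data
        self.accesses += 1
        # access 4: write new parity = old parity ^ old data ^ new data
        self.parity = xor_blocks(old_parity, old_data, new_data)
        self.accesses += 1


stripe = Stripe([b"\x01\x02", b"\x10\x20", b"\xaa\xbb"])
stripe.small_write(1, b"\xff\x00")
# A lost block can be rebuilt from the parity and the surviving blocks.
recovered = xor_blocks(stripe.parity, stripe.data[0], stripe.data[2])
print(stripe.accesses, recovered == stripe.data[1])
```

Any scheme that already holds the old data and old parity in a cache
can skip the two reads, which is the kind of saving a write-oriented
cache level is after.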

Neither the multi-copy nor the one-copy caching scheme exhibits good
performance for small-write operations, because the old parity and data
blocks do not reside on the mother disk for restoring data in case of a
failure. The newly proposed two-level caching scheme overcomes this
limitation. It divides the cache into two logical levels and allows
special actions upon a write to a block. The same LRU policy may be
used at each level. Specifically, the first level works similarly to
the one-copy scheme. Blocks are placed in the second-level cache, on
the other hand, only as the result of a write request. To write back
from the second-level cache, the scheme simply writes the dirty block
to disk, which reduces the number of disk accesses and improves
performance compared with the multi-copy caching scheme.

The proposed two-level caching scheme outperforms the previously
proposed schemes. The hit ratio of the new scheme is 10-25% higher,
while the average busy time per request is lower by at least 12% and by
up to roughly 40%.

The performance study shows that the proposed scheme can be employed by
cluster-based HPC systems that implement RAID in software. It can also
be exploited by commercial vendors such as Sun Microsystems: the
Solaris-MC cluster operating system could incorporate this scheme in
its globalized file system to improve disk access performance.

--------------------------------------------------------------------------------

Cluster-Oriented Software Development Environment and Its Applications
* Xingfu Wu and Wei Li
  {para, liwei}@cs.sebuaa.ac.cn
  Beijing University of Aeronautics and Astronautics, Beijing, China


Among the four main architectures for parallel processing systems
(vector, SMP, MPP, and cluster), the cluster architecture is emerging
as a viable solution for high performance computing. Apart from
clusters, these systems are expensive and require large development
investments and significantly long development times. To compound the
problem, only a few countries have the capability to develop and
produce them. In this work the authors show how high performance
computers can be built from available off-the-shelf components such as
workstations and PCs interconnected by a LAN or switch network. They
have built an eight-node system (called BBP_SPC) using one SunSPARC-2
workstation and one 90MHz and six 133MHz Pentium PCs. These nodes are
interconnected by a 32-bit bus bridge network system (BBnet) whose peak
transmission rate is 18MB/s.

The BBP_SPC cluster system has a comprehensive program development
environment and supports popular message-passing interfaces such as
PVM. The BBP_PVM programming environment allows users to view this
heterogeneous cluster system as a single parallel virtual machine. The
visualization tool BBP_ParaTool provides an intuitive, dynamic,
graphical animation of parallel programs and a graphical summary of
their performance. This helps in improving performance through
algorithm or program restructuring.

The BBP_SPC system demonstrates how high performance parallel systems
can be built with minimal time and money while achieving performance
comparable to traditional high-cost architectures. Parallel execution
of a remote sensing application on the BBP_SPC system shows better
performance than parallel execution on a network of four SunSPARC-10
workstations connected to a Sun670 file server.

The authors feel that high performance computer systems can be built
easily with minimal investment of time, money, and resources, and that
such machines can be developed commercially. Both serial and parallel
applications can be executed with improved performance. The operating
environment must be familiar and present a single-system image.

--------------------------------------------------------------------------------