Computer Centre 2000 project

 

VIDEOCONFERENCING FOR HEP RESEARCHERS

 

INTRODUCTION

In recent years, packet-mode videoconferencing has found a very important place in the academic world and has been adopted by the major HEP (High Energy Physics) sites and collaborations. Both ATLAS and CMS, in their Computing Model proposals, clearly state that videoconferencing is an essential tool for worldwide collaboration, as confirmed by the intensive use of CERN videoconference facilities by the LHC experiments and by the success of the CERN VRVS project.

This project complements the project presented last year, which was not fully implemented due to a lack of funding.

 

There are presently two major forms of video-conferencing:

 

1.      Packet video-conferencing

For several years, a number of initiatives have been undertaken to overcome the many limitations of CODEC video-conferencing by using workstations, packet networks and software applications that enable cooperative distance working. This is now called Packet Video-Conferencing. From the user's point of view, a key feature of packet video-conferencing is that it supports software application sharing and a shared whiteboard. Another key feature of packet VC is that it may involve no direct cost for users, who can run it over an existing network.

2.      CODEC video-conferencing

This is the best-known and longest-established form of video-conferencing. Closed systems are available from industry and can be installed in conference rooms as dedicated equipment. CODEC systems communicate over ISDN links.

Videoconference sessions among several sites require an additional, expensive piece of equipment called a Multi-point Control Unit (MCU). Telephone companies sell MCU services together with a scheduling system, and this scheduling imposes a major constraint: bookings may have to be made as much as two weeks in advance.

 

 

OBJECTIVES

The objective is to complement and develop the work initiated in 1999 and to further investigate modern videoconferencing tools. The main goals are:

·        Consolidate the installed infrastructure and expand it to other LIP sites.

·        Deploy videoconference systems and further consolidate them as a service.

·        Validate prototypes with our partners CERN, FCCN and other remote research and academic institutions.

·        Improve existing tools and develop new ones where needed.

·        Further improve quality of service.

 


PACKET-MODE VIDEOCONFERENCE

Packet-mode video-conferencing makes use of workstations equipped with audio and video capabilities and connected over regular IP packet networks (Internet/MBONE). It can be cheap, flexible and well integrated into the users' computing environment. The applications run on a wide range of UNIX workstations, PCs and, in some cases, Macintoshes. Public-domain software is available and ensures good interoperability.

 

Packet-mode videoconferencing can reduce the need to travel to meetings and conferences, thereby decreasing costs and increasing productivity. It is also helpful for geographically scattered organizations, since it allows cooperative work between remote researchers.

Another potential use of this technology is remote intervention, for instance in data acquisition systems, allowing engineers and experts to watch equipment and troubleshoot problems without being physically present.

 

Most packet mode videoconference applications can run in two modes:

·        UNICAST MODE - A UNICAST packet is addressed to a single, specific system. Only the recipient will "see" this packet, since its network interface recognizes its own address; the other stations on the subnet ignore the packet because its destination address differs from their own.

Unicasts cross bridges transparently. Since bridges learn the network topology, only the segments that need to carry the traffic receive the packet.

·        MULTICAST MODE - A MULTICAST packet is addressed to a group of nodes. The destination address identifies the group of systems the packet is meant to reach. Network interfaces listen only to the groups the system has joined (as requested by the applications on the node). Bridges forward multicasts, and since they cannot know where the potential destinations are located, they send multicast packets out of all interfaces (flooding the whole network) unless special switches are deployed.
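To make the distinction concrete, the following minimal Python sketch (standard library only) classifies IPv4 destination addresses, derives the Ethernet address a network interface filters on once its node has joined a multicast group (the standard IPv4-to-Ethernet mapping of RFC 1112: the prefix 01:00:5e followed by the low-order 23 bits of the group address), and shows how an application asks its node to join a group. The helper names and group addresses are hypothetical examples, not part of any videoconferencing tool.

```python
import ipaddress
import socket
import struct


def classify(address: str) -> str:
    """Hypothetical helper: label an IPv4 destination as unicast, multicast or broadcast."""
    ip = ipaddress.IPv4Address(address)
    if ip.is_multicast:  # 224.0.0.0/4 (former class D)
        return "multicast"
    if ip == ipaddress.IPv4Address("255.255.255.255"):  # limited broadcast only
        return "broadcast"
    return "unicast"


def multicast_mac(group: str) -> str:
    """Hypothetical helper: Ethernet address used for an IPv4 multicast group (RFC 1112)."""
    ip = ipaddress.IPv4Address(group)
    if not ip.is_multicast:
        raise ValueError(f"{group} is not a multicast address")
    low23 = int(ip) & 0x7FFFFF  # only 23 of the 28 group bits survive the mapping
    octets = [0x01, 0x00, 0x5E, low23 >> 16, (low23 >> 8) & 0xFF, low23 & 0xFF]
    return ":".join(f"{octet:02x}" for octet in octets)


def open_receiver(group: str, port: int) -> socket.socket:
    """Hypothetical helper: join a multicast group so the node accepts its packets."""
    sock = socket.socket(socket.AF_INET, socket.SOCK_DGRAM, socket.IPPROTO_UDP)
    sock.setsockopt(socket.SOL_SOCKET, socket.SO_REUSEADDR, 1)
    sock.bind(("", port))
    # struct ip_mreq: group address followed by the local interface (any).
    mreq = struct.pack("4s4s", socket.inet_aton(group), socket.inet_aton("0.0.0.0"))
    sock.setsockopt(socket.IPPROTO_IP, socket.IP_ADD_MEMBERSHIP, mreq)
    return sock
```

For example, `multicast_mac("224.2.0.1")` and `multicast_mac("239.130.0.1")` both give `01:00:5e:02:00:01`. Because several groups share one Ethernet address, an interface listening to one group also passes frames for the others up to the operating system, which discards whatever the applications did not request. `open_receiver` depends on the host's network setup, so it is shown for illustration only.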

 

 

PARTNERS

Collaboration already exists with the Internet Applications group of the CERN IT division, which is in charge of the CERN pilot packet video and video-conferencing activities. The Internet Applications group is conducting the Video-conferencing project for LHC collaborations, officially approved by the LHC Computing Board (LCB). LIP remains particularly interested in this project, which covers a wide range of technologies.

 

FCCN (Fundação para a Computação Científica Nacional), the Portuguese foundation that coordinates the Portuguese academic and research network (RCCN), is officially researching and supporting packet-mode videoconferencing activities within the Portuguese network. LIP is collaborating with FCCN to develop and promote the RCCN videoconference infrastructure and its use.

 

 

 

 

 

THE VIRTUAL ROOM VIDEOCONFERENCING SYSTEM (VRVS)

The Virtual Room Videoconferencing System (VRVS) is a packet-mode video-conferencing system well adapted to HEP needs. It is based on the LBNL and UCL applications and is being developed by CERN and Caltech.

The VRVS system is based on a "Virtual Videoconference Room". A series of IP servers/reflectors connects users within a virtual room via a set of interconnected IP tunnels, so that they form a private video group. Each participant sees the others in the "virtual room" through a series of windows. A web user interface provides worldwide secure access, on demand, to each virtual room. The "Virtual Rooms" concept makes conference access and scheduling easier and enables effective bandwidth management on critical links.

 

Figure 1

The video Reflectors run on Unix platforms, and interconnect the users joining a virtual room by permanent IP tunnels, forming a set of virtual video sub-networks. Participants at any location can join videoconferences by contacting their "closest" reflector. In order to make efficient use of the bandwidth, packets (video, audio and data streams) are sent through the tunnel between two reflectors only if there are participants on both sides.
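The forwarding rule described above (a tunnel carries a virtual room's streams only when there are participants on both sides of it) can be sketched as follows. This is a simplified model, not VRVS code: it assumes the reflector interconnection forms a tree (no loops), and the function name and reflector names used in the test are placeholders.

```python
from collections import defaultdict


def active_tunnels(tunnels, sites_with_participants):
    """Hypothetical model: which reflector tunnels must carry a room's traffic.

    tunnels: (reflector_a, reflector_b) pairs, assumed to form a tree.
    sites_with_participants: reflectors with at least one local participant
    in the virtual room.
    """
    tunnels = list(tunnels)
    graph = defaultdict(set)
    for a, b in tunnels:
        graph[a].add(b)
        graph[b].add(a)

    def side_has_participant(start, blocked):
        # Depth-first search from `start` that never crosses into `blocked`,
        # i.e. it explores only one side of the tunnel start<->blocked.
        stack, seen = [start], {start, blocked}
        while stack:
            node = stack.pop()
            if node in sites_with_participants:
                return True
            for nxt in graph[node]:
                if nxt not in seen:
                    seen.add(nxt)
                    stack.append(nxt)
        return False

    # A tunnel is used only if both of its sides contain participants.
    return {
        (a, b)
        for a, b in tunnels
        if side_has_participant(a, b) and side_has_participant(b, a)
    }
```

With a hypothetical topology LIP-CERN, CERN-Caltech, Caltech-SLAC and CERN-CNAF, a room with participants only at LIP and CERN activates just the LIP-CERN tunnel, while participants at LIP and SLAC activate the three tunnels on the path between them and leave CERN-CNAF idle.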

In addition, the reflector network topology is chosen taking into account both geography and the bandwidth available on each network link, in order to optimize the network connectivity paths. The virtual video sub-networks have been extended by installing reflectors at several universities and HEP laboratories:

 

·        Switzerland: CERN

·        Italy: CNAF Bologna

·        UK: Rutherford Lab

·        Germany: Heidelberg University

·        France: IN2P3 Lyon, CPPM Marseille

·        Spain: IFCA-University Cantabria

·        Finland: FUNET Helsinki

·        Venezuela: CeCalULA

·        Taiwan: Academia Sinica

·        Portugal: LIP

·        Russia: Moscow State University, Tver University

·        USA: Caltech, LBNL, SLAC, FNAL, ANL, Jefferson Lab, BNL, DoE HQ Germantown

 

The use of Web technology allows any authorized user, at any location, to access a wide range of services for packet-based videoconferencing. The Web-based user interface supports centralized conference scheduling, coordination and access control.

CURRENT RESULTS OF THE VRVS PACKET SYSTEM

Since the trial began, the system has been deployed and expanded to 1131 registered hosts running the VRVS software.

The following graph shows the evolution of the number of machines registered in the VRVS system since 1 January 1997.

 

The following table shows the number of real-work meetings (excluding test sessions) held from 1 January 1998 to 20 May 1999. Each of these meetings involved at least one site in Europe (UK, Italy, France, Spain, etc.) or at least one site in the USA.

 

 

At LIP, several videoconference tests were conducted in 1999, showing that the system can be used with the currently available bandwidth. The situation improved with the bandwidth upgrade (from 8 to 15 Mb/s) of the link between Portugal and the TEN-155 (Trans-European Network) backbone. Many CERN and non-CERN broadcast events have been attended at LIP using the VRVS and MBONE systems.

Events such as the VIII International Conference on Calorimetry in High Energy Physics, organized in Lisbon by LIP, have been broadcast worldwide using packet-mode videoconferencing.

The first VRVS reflector in Portugal was installed in September 1999, at the same time as a videoconference infrastructure in the main meeting room; this infrastructure is now being tested and is expected to become fully operational in October.

The LIP community has shown strong interest in this project: many physicists intend to use the facilities as soon as they become operational, and others are already using them as beta testers.

 

 

TASKS

The identified main tasks to be performed are:

·        Packet video-conferencing:

    o       Extend the existing VC infrastructure to other LIP sites, using PCs equipped with audio and video capture boards and low-cost video cameras.

    o       Develop recording/playback facilities for packet videoconferences.

    o       Implement a videoconference recording system.

    o       Implement a video-on-demand server for videoconference playback.

·        CODEC and packet video-conferencing integration:

    o       In a mixed environment there is a clear need for inter-working between CODEC systems and packet-based systems. Through a gateway, CODEC and packet participants will be able to share the same teleconference.

·        Improve the installation in the LIP main conference room:

    o       Add a second data projector to display local video, such as presentations, simultaneously with the remote video.

    o       Add a pair of loudspeakers and a corresponding amplifier to cover the whole room.

 

 

 

COMMODITY COMPUTING COMPONENTS

 

INTRODUCTION

In past years, the High Energy Physics community has met its computing needs by deploying RISC/UNIX machines. The new generation of experiments at the LHC (Large Hadron Collider) will demand computing capacities at least three orders of magnitude higher, making traditional RISC farms unaffordable.

 

At CERN, the IT division, through the IP section of the PDP (Physics Data Processing) group, has established a pilot project to build and evaluate PC farms. Several PC farms are under study or in production:

·        LXPLUS - before the end of the current year CERN will introduce a new central Linux-based public service named LXPLUS.

·        PCSF - The PCSF farm has thirty-five client systems and one server machine interconnected with Fast Ethernet. The PCSF service is intended as an event simulation facility for the LHC experiments and is based on Intel Pentium II and Pentium Pro processors running Windows NT.

·        PCRD - The PC Performance Research & Development Project is a pilot farm for several research projects, such as NA45, NA48 and COMPASS. It aims to demonstrate that PC farms can be used cost-effectively to process physics jobs. The PCRD farm currently consists of 7 dual Pentium PCs plus 2 low-end RISC machines. The PCRD PCs run both Windows NT 4.0 and Red Hat Linux, in order to assess the power and limitations of both operating systems.

·        NA49PC - The NA49PC farm is dedicated to NA49's PROOF visualization tool. It currently consists of 5 dual Pentium II PCs at 300 MHz running Windows NT 4.0.

·        PC-NOMAD - A PC-based production environment for NOMAD's reconstruction.

 

LIP has been active in this area since 1997. This work led to the decision to use low-cost PCs as replacements for desktop X terminals. These PCs run Linux in a cluster configuration, sharing the operating system and its configuration. Tools have been developed to ease cluster management and the integration of new desktop systems. PCs are also being evaluated for high-performance computing under Linux, Windows NT and FreeBSD.

 

This work is being done in the belief that only by using commodity computing components will the HEP community be sure to align itself with the best price/performance possible and reach the LHC computing requirements at an affordable cost.

 

OBJECTIVES

The objective is to build a small production farm of PCs running Linux.

 

THE FARM

The farm systems should be connected through a Fast Ethernet network. The architecture will consist of one main server with SCSI disks, dedicated mostly to data storage and retrieval, and several (at least two) stripped-down systems dedicated to CPU-intensive tasks. The know-how obtained with the Linux desktop cluster will be applied and extended in this project.

 

The stripped-down systems will be equipped with the bare essentials: motherboard, CPU, memory, network card, graphics card and a small disk for swapping. Their keyboard, mouse and monitor ports will be connected to a single keyboard/video/mouse (KVM) switch, so that all systems can be managed from one console. With this configuration, increasing the computing power of the cluster should be fairly inexpensive, since each additional node needs only these minimal components.
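The proposal does not specify how work will be distributed over the CPU nodes. Purely as an illustration, and assuming the common HEP case of independent jobs (for example separate simulation or reconstruction runs), the sketch below assigns jobs to nodes with the greedy longest-processing-time-first heuristic and reports the makespan, the time until the last node finishes. The function name and the job durations in the example are hypothetical.

```python
import heapq


def schedule(job_hours, n_nodes):
    """Hypothetical dispatcher for independent jobs on identical CPU nodes.

    Jobs are taken longest first and each goes to the node with the least
    work so far (greedy LPT heuristic). Returns (makespan, assignment), where
    assignment maps node id -> list of job indices.
    """
    if n_nodes < 1:
        raise ValueError("need at least one compute node")
    heap = [(0.0, node) for node in range(n_nodes)]  # (accumulated hours, node id)
    assignment = {node: [] for node in range(n_nodes)}
    for job, hours in sorted(enumerate(job_hours), key=lambda jh: -jh[1]):
        load, node = heapq.heappop(heap)  # least-loaded node
        assignment[node].append(job)
        heapq.heappush(heap, (load + hours, node))
    makespan = max(load for load, _ in heap)
    return makespan, assignment
```

With the made-up job list `[5, 4, 3, 3, 2, 1]` (hours), two nodes finish after 9 hours and three nodes after 6, which illustrates how adding one inexpensive stripped-down node shortens a batch of independent jobs.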

 

The choice of Linux is based on previous experience, which indicates that Linux is better suited than Windows NT to running physics jobs. Linux is also easily integrated into the LIP computer center infrastructure, which is now based mostly on UNIX systems. Users are familiar with UNIX, and some are already running jobs on their Linux desktops, so this choice will also make it easier to port applications to the farm. Finally, Linux is widely available, well supported by the academic community, open source and free.

 

 

 

STORAGE SYSTEMS

 

INTRODUCTION

Disk failures are common in computer centers: the mechanical nature of disks makes them extremely prone to malfunction. Hard disks are the only continuously moving devices in a computer, and environmental factors such as dust, temperature and power spikes can considerably reduce their mean time between failures (MTBF). A failure in a disk storing critical information usually means downtime for all users of either the broken disk or the system hosting it. The system to which the disk is attached must be brought offline to diagnose the problem; if it cannot be corrected, the disk must be replaced and the data restored from backups. Generally this means that something will be lost: even when backups are made frequently, any change made after the last backup is lost. In computer centers where jobs run for long periods (days or even weeks), files kept open cannot be completely backed up while the corresponding job is running, which can lead to further data loss.
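A single disk's MTBF can be misleading at computer-center scale. Assuming independent disks with constant failure rates (an idealization), the expected time to the first failure among N disks is the single-disk MTBF divided by N. The sketch below computes this and the probability of at least one failure over a period; the function names, the 500,000-hour MTBF and the 20-disk count are illustrative assumptions, not LIP measurements.

```python
import math

HOURS_PER_YEAR = 8760


def mtbf_of_group(disk_mtbf_hours: float, n_disks: int) -> float:
    """Hypothetical helper: expected hours until the first failure among
    n independent disks with exponential (constant-rate) lifetimes."""
    return disk_mtbf_hours / n_disks


def p_any_failure(disk_mtbf_hours: float, n_disks: int, hours: float) -> float:
    """Hypothetical helper: probability that at least one of n disks fails
    within `hours`, under the same constant-rate assumption."""
    return 1.0 - math.exp(-n_disks * hours / disk_mtbf_hours)
```

Under these assumptions, `mtbf_of_group(500_000, 20)` gives 25,000 hours (under three years), and `p_any_failure(500_000, 20, HOURS_PER_YEAR)` is about 0.30, so some disk failure within a year is quite likely.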

Thanks to their redundancy, RAID systems overcome these problems and keep data available when a disk fails. RAID technology comes in different configurations, called levels. The level defines how data is written to the drives and the minimum number of drives required.

In RAID level 5 systems, data and parity information are striped across all drives. When one of the disks becomes unavailable, the redundant information stored on the other disks is sufficient to continue working, and since the disks are hot-swappable, the failed disk can be replaced and rebuilt without stopping the system. In practice this means that no data is lost from the broken disk and users can continue to work without interruption. RAID 5 tolerates the failure of only one disk at a time: a second failure before the rebuild completes causes data loss.
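The redundancy behind RAID 5 is byte-wise XOR parity: the parity block of a stripe is the XOR of its data blocks, so any single missing block equals the XOR of all the surviving blocks. The minimal sketch below shows this for one stripe; the function names and block contents are hypothetical, and a real controller also rotates the parity block among the drives and works on much larger blocks.

```python
def xor_blocks(blocks):
    """Byte-wise XOR of equally sized blocks."""
    result = bytearray(len(blocks[0]))
    for block in blocks:
        if len(block) != len(result):
            raise ValueError("all blocks in a stripe must have the same size")
        for i, byte in enumerate(block):
            result[i] ^= byte
    return bytes(result)


def make_stripe(data_blocks):
    """Hypothetical helper: one stripe = the data blocks plus their parity block."""
    return list(data_blocks) + [xor_blocks(data_blocks)]


def rebuild(stripe, failed_index):
    """Recover the block of a single failed disk from the surviving blocks."""
    survivors = [block for i, block in enumerate(stripe) if i != failed_index]
    return xor_blocks(survivors)
```

Because the XOR of all blocks in a stripe (data plus parity) is zero, `rebuild` recovers any one block, whether data or parity. With two blocks missing the equation has two unknowns, which is why RAID 5 survives only a single disk failure.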

Since data is spread across all the disks of the RAID system, throughput is also better than with a single disk; performance is further improved by large memory caches inside the RAID controllers.
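The throughput gain comes from the striped layout: consecutive logical blocks land on different disks, so they can be read in parallel. The sketch below maps a logical block number to its stripe, data disk and parity disk for one simple rotating layout, in which the parity block moves one disk to the left on each stripe and data blocks fill the remaining disks in order. This particular layout and the function name are assumptions for illustration; actual controllers may use other arrangements.

```python
def locate(block: int, n_disks: int):
    """Hypothetical helper: map a logical block to (stripe, data_disk, parity_disk)
    in a RAID 5 array with rotating parity."""
    if n_disks < 3:
        raise ValueError("RAID 5 needs at least three disks")
    data_per_stripe = n_disks - 1            # one block per stripe holds parity
    stripe, index = divmod(block, data_per_stripe)
    parity_disk = (n_disks - 1) - (stripe % n_disks)  # parity rotates right-to-left
    # Data blocks fill the disks in order, skipping the parity disk.
    data_disk = index if index < parity_disk else index + 1
    return stripe, data_disk, parity_disk
```

On four disks, blocks 0, 1 and 2 go to disks 0, 1 and 2 (parity on disk 3), and block 3 starts the next stripe on disk 0 with parity on disk 2. Rotating the parity also avoids a single dedicated parity drive that every write would have to update.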

 

OBJECTIVES

Implement a high-performance, fault-tolerant disk storage system for critical information, such as user accounts and software shared by several systems, in order to improve the reliability and uptime of all central services.

 

RAID SYSTEM

The system should be built around a RAID (Redundant Array of Inexpensive Disks) level 5 storage system connected to two UNIX servers through a dual-path SCSI controller (RAID side) and SCSI RAID controllers (server side). Load balancing and failover between the two servers should be supported. Network access to the RAID volumes will be provided through NFS and SAMBA. For better network performance, each server should be equipped with two Fast Ethernet controllers. Due to the large storage capacity of the RAID system, a DLT 4000 tape drive should be dedicated to RAID backup operations.
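The proposal does not say how load balancing and failover will be implemented. As a hedged illustration only, the toy model below captures the intended behavior: RAID volumes are spread evenly over the servers that can reach the array, and when a server fails, its volumes are handed to the least-loaded survivor. The class, server and volume names are hypothetical; in practice, cluster software would also detect the failure (for example with heartbeats) and move the network addresses and NFS/SAMBA exports.

```python
class VolumeManager:
    """Hypothetical model of volume ownership between servers that all see
    the RAID array (e.g. through a dual-path SCSI connection)."""

    def __init__(self, servers, volumes):
        if not servers:
            raise ValueError("at least one server is required")
        self.alive = list(servers)
        self.owner = {}
        # Load balancing: spread volumes round-robin over the servers.
        for i, volume in enumerate(sorted(volumes)):
            self.owner[volume] = self.alive[i % len(self.alive)]

    def load(self):
        """Number of volumes exported by each surviving server."""
        counts = {server: 0 for server in self.alive}
        for server in self.owner.values():
            if server in counts:
                counts[server] += 1
        return counts

    def server_failed(self, server):
        """Failover: move the failed server's volumes to the least-loaded survivors."""
        self.alive.remove(server)
        if not self.alive:
            raise RuntimeError("no surviving server can take over the volumes")
        orphans = sorted(v for v, s in self.owner.items() if s == server)
        for volume in orphans:
            counts = self.load()
            self.owner[volume] = min(self.alive, key=lambda s: counts[s])
```

With two servers and four volumes, each server initially exports two volumes; after one server fails, the other exports all four, which is the situation the dual-path connection to the array makes possible.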

In the same framework, distributed file systems should be studied in order to provide better network access to the RAID systems. Two options will be considered:

·        DFS (Distributed File System), a standard from the Open Software Foundation, will be evaluated as an alternative to AFS. DFS is part of DCE (Distributed Computing Environment), which is already supported by many major vendors, and it offers several advantages, including a lower cost than AFS.

·        CODA is a distributed file system developed at CMU and descended from AFS2, the last non-commercial release of AFS. CODA supports persistent client caching of whole files, so that client activity can continue even when the client is disconnected from the server.
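The idea behind whole-file caching with disconnected operation can be sketched as a toy client: whole files are cached on access, reads are served from the cache while the server is unreachable, and writes made while disconnected are logged and replayed when the connection returns. This is a simplified model under stated assumptions, not CODA's implementation: a plain dictionary stands in for the server, the class name is hypothetical, there is no conflict detection, and a real client would avoid re-fetching files that have not changed.

```python
class WholeFileCache:
    """Hypothetical toy model of a client with persistent whole-file caching."""

    def __init__(self, server):
        self.server = server      # dict standing in for the file server
        self.cache = {}           # path -> complete file contents
        self.connected = True
        self.replay_log = []      # (path, contents) written while disconnected

    def read(self, path):
        if self.connected:
            self.cache[path] = self.server[path]  # fetch the whole file
        elif path not in self.cache:
            raise FileNotFoundError(f"{path} is not cached and the server is unreachable")
        return self.cache[path]

    def write(self, path, contents):
        self.cache[path] = contents
        if self.connected:
            self.server[path] = contents
        else:
            self.replay_log.append((path, contents))

    def disconnect(self):
        self.connected = False

    def reconnect(self):
        """Replay updates made while disconnected (no conflict detection here)."""
        self.connected = True
        for path, contents in self.replay_log:
            self.server[path] = contents
        self.replay_log.clear()
```

In this model a user who read a file before the network went down can keep reading and editing it, and the edits reach the server only on reconnection. A real system must also decide what to do when another client changed the same file in the meantime.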