James E. Cabral, Jr., Christian Deforge, and Yongmin Kim
Image Computing Systems Laboratory
Department of Electrical Engineering, Box 352500
University of Washington, Seattle WA 98195
With the goal of eventually increasing the quality of medical care, especially in remote areas, we have developed a system for telemedicine research based on a combination of ATM networking and a high-speed DSP board based on the Texas Instruments TMS320C80. The purpose of the system is to give health care providers at remote locations the ability to consult with specialists using a combination of video, audio, and externally-acquired images. The system can also be used for educational purposes to support bi-directional video/audio communications for grand round lectures, classes, and case conferences. To maximize the utilization of the available transmission medium (ranging from land-based copper and fiber optic cable to satellite link) while providing the best possible video and audio quality, the compression performed by the system is adaptable to a wide variety of bandwidths. After about two years of experience with telemedicine in a research environment, we have some preliminary findings to report regarding the performance of a telemedicine application combining ATM and programmable multimedia processors in PC environments.
Keywords: telemedicine, ATM, MediaStation 5000, MPEG, multimedia, medical image processing.
In order to gain first-hand experience with telemedicine and to systematically collect and analyze its requirements, we have developed a prototypical telemedicine workstation [1]. Because of its flexibility and processing power, the MediaStation 5000 (MS5000) [2], which was designed and implemented by our team, has been used as the centerpiece of the workstation. The MS5000 is a single-board multimedia system capable of digitizing stereo audio and video, displaying up to 1280x1024 pixels, and performing image processing tasks requiring up to 2 billion operations per second, including real-time MPEG-1 or H.320 compression and decompression, using the Texas Instruments TMS320C80 digital signal processor [3]. Coupled with an asynchronous transfer mode (ATM) network adapter, the system has been equipped to transmit and receive video, audio, and medical images. Because the MS5000 is programmable, it can also perform image display, image processing, and graphics functions such as window and level, unsharp masking, and 3-D reconstruction.
Two telemedicine workstations, one at the University of Washington and one at Madigan Army Medical Center (50 miles away), were connected in early 1995 and demonstrated to a group of physicians from the University of Washington, Madigan Army Medical Center and Seattle Veteran's Administration Hospital. A workstation, a Fore Systems (Warrendale, PA) ASX-200 ATM switch and a DS-3 connection to a US West regional fiber ring supporting ATM over SONET were installed at each site. X-ray, CT and MR images were exchanged, manipulated and discussed interactively between the two sites. Ultrasound video was digitized and compressed at one site, transmitted, and then decompressed and displayed at the other site in real time using MPEG-1 at 30 frames/second. The general response from the physicians was positive, particularly with regard to the image quality, real-time MPEG video and responsiveness of the system.
Since early 1995, several areas for improvement have been identified and remedied. In the original prototype, the host workstations were 486-based PCs with dual VESA local and EISA buses. Testing showed that the dual bus configuration, coupled with slow CPUs, resulted in low bus throughput and low overall performance. Image transfer throughput was limited to 13-15 Mbps out of the 45 Mbps DS-3 services. To eliminate this bottleneck, the host computers have been replaced with Pentium-based PCs each with a single PCI bus. To support the new PCI bus and to add bus mastering and improved video capture, the MediaStation 5000s based on the VESA local bus were upgraded to a derivative board, the Precision MX from PDI (Redmond, WA). Similarly, to support the new PCI bus, the EISA-based Fore ATM network adapters were upgraded to new PCI versions.
Software for both the host and the C80-based board has also continued to evolve. In the original configuration, images were transferred between workstations using a file transfer method analogous to Internet FTP (file transfer protocol). This required both workstations to read the image files from disk and convert the image files into image objects stored in memory on the MS5000. To speed up this process, images are now read from disk and converted to objects only by the workstation sending the images. The images are then transferred across the network as objects directly into the memory of the receiving workstation. Another advantage of this approach is that, by standardizing on a portable image object, conversion functions for other image formats can be localized to the workstations that deal directly with those formats. For example, if a telemedicine network includes an imager that uses a proprietary image format, its images can be shared on the network as long as one telemedicine workstation serves as a gateway to the imager.
Synchronization between users was another problem in the original implementation. If both users attempted to manipulate the same object simultaneously, the result was unpredictable. In many cases, the result differed between the two users, so that their views were no longer synchronized. To remedy this, software locks were created to prevent more than one user from manipulating an object at the same time.
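The locking idea can be illustrated with a minimal sketch. The implementation of the locks in the prototype is not described here, so the class name `ObjectLockTable`, its methods, and the object and user identifiers below are all hypothetical. The sketch assumes a single table that arbitrates ownership; in a two-site system, one workstation would have to host this table or the sites would have to agree on ownership by some other means.

```python
import threading


class ObjectLockTable:
    """Tracks which user, if any, currently holds each shared object.

    Hypothetical illustration only; not the prototype's actual code.
    """

    def __init__(self):
        self._owners = {}               # object id -> id of the user holding it
        self._guard = threading.Lock()  # protects the table itself

    def acquire(self, object_id, user_id):
        """Grant the lock if the object is free or already held by this user."""
        with self._guard:
            owner = self._owners.get(object_id)
            if owner is None or owner == user_id:
                self._owners[object_id] = user_id
                return True
            return False

    def release(self, object_id, user_id):
        """Release the lock only if this user is the one holding it."""
        with self._guard:
            if self._owners.get(object_id) == user_id:
                del self._owners[object_id]
                return True
            return False
```

A manipulation request (for example, a window/level change) would be applied and forwarded to the remote workstation only when `acquire` returns `True`; a request from the other user on the same object is refused until the holder calls `release`.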
The following requirements for a telemedicine system have been compiled based on our experience with telemedicine. Figure 1 illustrates a possible telemedicine workstation of the future [4] which would be used in a remote location. This workstation would include an interface to a PACS (if one is available), an ATM interface, an electronic stethoscope, a digital blood pressure cuff, a monitor with speakers, a microphone and a camera, a PC, and a film digitizer. The corresponding system used by the consultant would be similar, but may not require media acquisition devices such as the film digitizer and stethoscope. Depending on the specific application and the level of expertise of the users, multimedia systems for telemedicine will require different combinations of the hardware and/or software components described under each of the following categories.
Figure 1. A Future Telemedicine Workstation
Images may be obtained from a number of sources. Still digital cameras can be used for acquiring high-resolution images, e.g., for teledermatology or telepathology. X-ray films are digitized with laser scanners while images from digital imaging modalities such as MR, CT and CR are available directly in digital, DICOM-compliant formats.
Digital video may be either available directly (e.g., digital bitstreams from CD ROM, digital video disks or next-generation ultrasound machines) or acquired through the combination of a video digitizer and an analog video source, such as a video camera, an ultrasound imager or an endoscope. Audio may be obtained using an audio digitizer while a digital stethoscope could provide another audio source.
Our telemedicine prototype supports any VHS or SVHS video device, including standard videocassette cameras, while audio may be obtained through the video camera, through a mono/stereo microphone, or both. One lesson we have learned with regard to video for telemedicine is that the video acquisition hardware should support image capture in 4:2:0 format. This format is required by MPEG-1 and by H.261, the video-conferencing compression standard. Because the Precision MX only supports video acquisition in 4:2:2 format, we must convert to 4:2:0 in software, which reduces the computational power available for other image and video processing.
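To show what the 4:2:2 to 4:2:0 conversion involves, the following is a minimal sketch in Python. In 4:2:2 the chroma planes have half the horizontal resolution of the luma plane but full vertical resolution; 4:2:0 also halves the vertical resolution. The sketch assumes a progressive frame and simple averaging of adjacent chroma rows; the filter actually used on the TMS320C80 is not described here, interlaced fields and chroma siting are ignored, and the function name is hypothetical.

```python
def chroma_422_to_420(plane):
    """Halve the vertical resolution of one chroma plane (Cb or Cr).

    `plane` is a list of rows of 8-bit samples with an even number of rows.
    Each output row is the rounded average of two adjacent input rows.
    The luma plane is left untouched by this conversion.
    """
    if len(plane) % 2 != 0:
        raise ValueError("expected an even number of chroma rows")
    out = []
    for r in range(0, len(plane), 2):
        top, bottom = plane[r], plane[r + 1]
        # +1 before the integer division rounds halves upward.
        out.append([(a + b + 1) // 2 for a, b in zip(top, bottom)])
    return out
```

Even in this simple form, every chroma sample of every frame must be read once, which indicates why performing the conversion in software consumes processing capacity that would otherwise be available for compression or image manipulation.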
Video and audio clips and medical images require temporary or permanent local storage. This storage can be provided by magnetic and/or magneto-optical (MO) drives. If the telemedicine system is incorporated within a PACS, new incoming and old comparison images can be stored permanently in the PACS archive and accessed by the telemedicine system as needed. For reasons of simplicity and cost efficiency, the telemedicine prototype uses standard hard disk storage.
Image processing requirements for telemedicine applications can be derived by duplicating the functionality available for the corresponding tasks performed in the clinical environment without telemedicine. For example, radiologists and clinicians often have default orientations for images (e.g., the left portion of the image corresponds to the patient's left) which can vary between hospitals, departments, or imaging modalities. Basic image manipulation functions such as 90 degree rotations and horizontal and vertical flips are essential to correct errors in image acquisition and to ensure that images are presented to clinicians in the way they are accustomed to viewing them. This is particularly important in teleradiology. Zooming and panning are necessitated by the limited spatial resolution of CRTs when compared to X-ray films. Real-time window/level (brightness and contrast adjustment) is required in order to interactively examine medical images with more than 8 bits/pixel by adjusting the range (window) and the center position (level) within the wide input dynamic range [5]. In the case of diagnostic video, manipulation functions such as play, record, pause and rewind are important for simulating the VCR environment often used in ultrasound consultation.
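The window/level operation described above can be sketched as a lookup table that maps a wide input range onto an 8-bit display range. This is an illustration under stated assumptions, not the prototype's implementation: the function names, the 12-bit default input depth, and the linear ramp between the window edges are all assumptions made for the example.

```python
def window_level(pixel, window, level):
    """Map one input pixel to an 8-bit display value (0..255).

    `window` is the width of the displayed input range and `level` is its
    center. Values below the window map to 0 and values above it to 255.
    """
    if window <= 0:
        raise ValueError("window must be positive")
    low = level - window / 2.0
    high = level + window / 2.0
    if pixel <= low:
        return 0
    if pixel >= high:
        return 255
    return int(round((pixel - low) / (high - low) * 255))


def build_lut(window, level, bits=12):
    """Precompute the mapping for every possible input value."""
    return [window_level(p, window, level) for p in range(1 << bits)]


# Example: show input values 800..1200 across the full display range.
lut = build_lut(window=400, level=1000)
```

Because the table has only 4096 entries for 12-bit data, it can be rebuilt each time the user moves the window or level control, and the image is then redisplayed with one table lookup per pixel.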
The telemedicine prototype assigns image and video processing tasks to the MS5000 or Precision MX. The programmability and computational power of the TMS320C80 make it a powerful computing engine for performing typical image processing tasks at interactive rates. For instance, real-time window/level and zoom and pan capabilities help to increase interaction with the user and to simplify synchronization with the remote workstation.
Compression of medical images has historically been reversible, or "lossless," limiting compression ratios to between 2:1 and 4:1. Lossy compression schemes have not been widely used, for both clinical and legal reasons. However, standard and newer compression algorithms such as JPEG and wavelet-based compression can yield "visually lossless" images with compression ratios between 10:1 and 20:1 [6,7]. Such images produce statistically identical diagnostic results compared with the original, uncompressed images. If this kind of compression is properly used and does not require much additional time for compression and decompression, it can significantly reduce the communications bandwidth, storage requirements and overall delay in telemedicine systems.
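The effect of these compression ratios on transfer delay can be estimated with a short calculation. The image size (2048x2048 pixels at 2 bytes/pixel) and the choice of a T1 link at 1.544 Mbps are assumptions chosen for illustration, and the estimate ignores protocol overhead and the time spent compressing and decompressing.

```python
def transfer_seconds(image_bytes, link_bps, compression_ratio=1.0):
    """Time to send one image, ignoring protocol overhead and codec time."""
    if compression_ratio <= 0 or link_bps <= 0:
        raise ValueError("ratio and link rate must be positive")
    return image_bytes * 8 / compression_ratio / link_bps


IMAGE_BYTES = 2048 * 2048 * 2   # hypothetical image: 2048x2048, 2 bytes/pixel
T1_BPS = 1_544_000              # full T1 line rate

raw = transfer_seconds(IMAGE_BYTES, T1_BPS)             # about 43.5 s
lossless = transfer_seconds(IMAGE_BYTES, T1_BPS, 2.0)   # about 21.7 s at 2:1
lossy = transfer_seconds(IMAGE_BYTES, T1_BPS, 10.0)     # about 4.3 s at 10:1
```

Under these assumptions, moving from 2:1 lossless to 10:1 visually lossless compression reduces the wait for a single image from roughly 22 seconds to roughly 4 seconds, provided the codec itself does not add a comparable delay.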
The accepted international standard for video-conferencing is H.320 [8], which includes support for video (H.261) and audio (G.722, G.728) compression/decompression, multiplexing and synchronization, as well as document sharing (T.120). H.320 is designed to work over the range of ISDN connections (from 64 kbps to 1.92 Mbps). There exist other compression standards which support higher-quality video and have correspondingly higher bandwidth requirements. Motion JPEG is a "symmetric" codec (it requires roughly the same amount of computation to encode and decode frames) which exploits only intraframe redundancy. Better compression ratios can be obtained by exploiting both interframe and intraframe redundancies. Algorithms that do so require "asymmetric" codecs (which take significantly more computation to encode than to decode the frames due to motion estimation between successive video frames) such as MPEG-1 [9] or MPEG-2 [10]. MPEG-1 is mainly designed to compress VHS-quality video into a bitstream of about 1.2 Mbps, and it can deliver higher-quality video at higher bitrates. MPEG-2 is more flexible and supports various combinations of levels and profiles from VHS-quality up to HDTV-quality video. At the main profile and main level, MPEG-2 can compress 720x480 video at 30 frames per second into a 5-15 Mbps bitstream [11].
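To put the MPEG-2 figures in perspective, the uncompressed bitrate of the same video can be computed directly. The sketch assumes 8-bit samples in 4:2:0 format, which averages 12 bits per pixel (8 bits of luma plus two chroma planes at one quarter resolution each); the function name is hypothetical.

```python
def raw_video_bps(width, height, fps, bits_per_pixel=12):
    """Uncompressed bitrate; 12 bits/pixel corresponds to 8-bit 4:2:0."""
    return width * height * bits_per_pixel * fps


raw = raw_video_bps(720, 480, 30)   # 124,416,000 bits per second
ratio_at_5 = raw / 5_000_000        # about 24.9:1 at 5 Mbps
ratio_at_15 = raw / 15_000_000      # about 8.3:1 at 15 Mbps
```

Under these assumptions, the 5-15 Mbps range corresponds to compression ratios of roughly 25:1 down to 8:1 relative to the 124 Mbps uncompressed stream.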
Some compression can be accomplished in software, depending on the main CPU of the telemedicine system. For example, the public-domain MPEG decoder developed by MPEG Software Simulation Group achieves about 1 frame per second (fps) on a PC with a 66 MHz Intel i486 processor and 1.4 fps on a SparcStation 2 with an 80 MHz Weitek processor. On a high-end Sun SparcStation 20/71 that uses a 3-issue superscalar processor running at 75 MHz, we have achieved about 5 fps. To achieve the necessary image and video compression for telemedicine at real-time rates, a dedicated compression/decompression board based on either programmable DSPs such as the Texas Instruments TMS320C8x family or special compression chipsets such as those from LSI Logic, C-Cube, or SGS-Thomson is normally required.
Our prototype supports a single MPEG-1 video stream for one-way transfer of good-quality video. However, video-conferencing requires two video streams and synchronized audio. To that end, we have worked on porting H.320 to our telemedicine system. In doing so, we have found some important issues to address. The TMS320C80 is highly programmable and can support multiple tasks simultaneously. In some cases, both image processing and video compression operations need to be executed simultaneously. For example, while video-conferencing to discuss a patient's medical image, radiologists will often need to manipulate the image, e.g., by adjusting the window and level. When video-conferencing is paired with real-time image processing tasks such as window/level, we have found that it is possible to overload the processor. The most demanding task is naturally the video compression (H.261), followed by the audio compression (G.722 and G.728). The architecture of the TMS320C80 provides 4 digital signal processors (DSPs) and a RISC processor. On our prototype, the simultaneous encoding and decoding of the video (H.261) requires about 92% of three of the four DSPs, and the audio requires about 60% of the remaining DSP, with the RISC processor dedicated to the other H.320 tasks. This does not leave enough processing power for the image processing tasks. For this reason, we decided to move the audio compression to a dedicated audio board. This strategy allows one DSP to be dedicated to image manipulations. The process of moving to the new design was facilitated by the flexibility of our software architecture on the TMS320C80.
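The processor budget described above can be written out as a small calculation. The load figures (92% and 60%) are those reported in the text; the function name and the per-DSP list layout are assumptions made for the example.

```python
def idle_percent(loads):
    """Idle share of each DSP, given its load as a percentage."""
    return [100 - load for load in loads]


# Three DSPs run H.261 encode/decode; the fourth runs G.722/G.728 audio.
SHARED = [92, 92, 92, 60]
# Same video load after audio compression moves to a dedicated audio board.
OFFLOADED = [92, 92, 92, 0]

before = idle_percent(SHARED)      # [8, 8, 8, 40]
after = idle_percent(OFFLOADED)    # [8, 8, 8, 100]
```

With audio on the TMS320C80, the largest block of idle time on any one DSP is 40%, and the rest is fragmented into 8% slices. Moving the audio off the chip frees the fourth DSP entirely for image manipulation.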
Our telemedicine software uses the University of Washington Image Computing Library (UWICL) architecture, which allows the scaling of an image processing task from 1 to 4 DSPs depending on the load on the TMS320C80 [12]. Although the execution time of each image processing function is increased by a factor of 4 when the teleconference is enabled, the processing times are low enough to ensure good interactivity.
The user interface should be graphical due to the fact that much of the information being shared in a telemedicine system is inherently graphical. One or more high-resolution displays, a keyboard, a pointing device such as a mouse, and a window manager such as Microsoft Windows make up a basic telemedicine user interface. These are all in addition to the multimedia devices (cameras, microphones, speakers) required by the application.
The telemedicine prototype currently uses a dual-monitor configuration with a keyboard and mouse. The use of two monitors avoids the issue of trying to simultaneously display color images such as video-conferencing or Doppler ultrasound and high definition grayscale X-ray images on the same screen. However, the prototype does support a pass-through mode which, when used with the PC's graphics adapter, will make a single monitor solution possible.
In the future, a telemedicine interface needs to become "simpler." Three changes could significantly reduce the "information overload" of telemedicine interfaces and allow the user to focus on the tasks at hand: a single display; a single primary input device, such as a virtual reality glove or a reliable voice recognition unit for commands and report writing; and automation of the most common tasks, such as communications connection and disconnection and bandwidth allocation.
Required network interfaces for telemedicine can range from low to high bandwidth, depending on the application. Low bandwidth interfaces should support multiple links as low bandwidth connections are often combined together to provide the bandwidth necessary for telemedicine. These interfaces include V.34 POTS connections, 56 kbps dedicated or frame-relay connections, Integrated Services Digital Network (ISDN) connections at 64 kbps to 1.92 Mbps, and fractional T1 and full T1 interfaces which provide up to 1.54 Mbps/connection. Higher bandwidth interfaces include TAXI (100 Mbps) and SONET (155 Mbps and higher). Furthermore, interfaces between the wide area network and local area networks (e.g., Ethernet, FDDI) will be required to allow the clinicians and other healthcare providers to access medical imaging devices, PACS and other medical information systems regardless of where they are located.
In a telemedicine system using special hardware for compression, it is highly desirable for the network interface to be tightly integrated with the compression hardware. This decreases network latency and reduces the processing load on the host. Bus mastering support by both the network interface and the compression hardware in a single-bus configuration could provide good performance, assuming other bus masters (including the host CPU) release control of the bus quickly. A better alternative would be to use a connection separate from the host bus, such as a second bus or a high-speed serial connection such as IEEE P1394 (Firewire) [13].
Our prototype uses a bus-mastering Fore Systems ATM adapter with a 100 Mbps TAXI interface. Besides providing high bandwidth, ATM network interface cards support multiple independent network channels each with separate contracts with the network guaranteeing minimum bandwidth and maximum latency and jitter. For example, a video-conferencing channel between two ATM devices will not be affected by bursts in a separate file transfer channel, even between the same devices.
Support for standard networking protocols is critical to telemedicine systems in order to meet the performance and interoperability requirements. ATM [14] is the preferred internetworking protocol between telemedicine systems based on the bandwidth and quality of service requirements of medical video and imaging applications. Wide area networking connections at rates lower than T1 require ISDN or similar protocols. In locations where ISDN is not available, POTS could be used to support H.324 video-conferencing. In addition, support for TCP/IP is most likely a requirement on both the LAN and WAN interfaces for access to medical records locally and other resources on remote local area networks.
Standards for supporting real-time audio and video services over most networking protocols, including ATM, are still being defined. Unlike H.320 (which was designed specifically for ISDN), most video compression methods have not been designed for a specific networking model. For instance, MPEG-2 depends on a constant channel delay to properly receive timing information [11]. ATM guarantees a maximum variance in delay, but not a constant delay. There is also a significant amount of redundancy in timing, multiplexing and error detection and recovery information between the MPEG-2 transport stream and the ATM adaptation layer. The MPEG-2 quantization scale, which determines the compression ratio, does not automatically adjust to changes in the available network bandwidth. Therefore, although it is possible to carry MPEG-2 over ATM, the two standards are currently not well integrated. The ATM Forum is in the process of establishing a standard for better support of MPEG-2 over ATM [15]. Ways to guarantee a certain video or audio quality in the presence of congestion and other properties of real-world networks need to be investigated further.
The telemedicine prototype uses TCP/IP over ATM Adaptation Layer 5 (AAL 5). AAL 5 is the simplest of the ATM adaptation layers and avoids the additional network overhead of the synchronized timing or constant bit rate features of other adaptation layers such as AAL 1 or AAL 3/4. This provides all the benefits of ATM while preserving compatibility with TCP/IP networks, including the Internet.
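The overhead that remains with AAL 5 can be estimated from the standard ATM framing: each 53-byte cell carries 48 bytes of payload, and AAL 5 appends an 8-byte trailer to each packet and pads the result to a whole number of cells. The sketch below assumes that the packet handed to AAL 5 is the IP datagram itself, with no additional encapsulation header; the function names are hypothetical.

```python
CELL_BYTES = 53      # ATM cell: 5-byte header + 48-byte payload
CELL_PAYLOAD = 48
AAL5_TRAILER = 8     # trailer appended to each packet by AAL 5


def aal5_cells(packet_bytes):
    """Number of ATM cells needed for one packet (payload + trailer, padded)."""
    total = packet_bytes + AAL5_TRAILER
    return (total + CELL_PAYLOAD - 1) // CELL_PAYLOAD   # integer ceiling


def aal5_wire_bytes(packet_bytes):
    """Bytes actually sent on the link for one packet."""
    return aal5_cells(packet_bytes) * CELL_BYTES


def aal5_efficiency(packet_bytes):
    """Fraction of link bytes that carry packet data."""
    return packet_bytes / aal5_wire_bytes(packet_bytes)
```

For example, a 1500-byte packet occupies 32 cells (1696 bytes on the link), an efficiency of about 88%. A 40-byte packet fits exactly in one cell, while a 41-byte packet needs two, so small packets can carry a much larger relative overhead.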
A low-cost workstation for real-time, interactive telemedicine is currently possible for many applications with existing hardware and software. Video-conferencing and image sharing are just initial examples of telemedicine applications, and a programmable DSP approach ensures that new applications and algorithms can be supported as they emerge. However, pervasive use of telemedicine may have to wait for additional hardware and software advances. These include tighter and lower-cost integration of DSP and network hardware than is available today, and real-time algorithms for diagnostic-quality video compression and decompression.