Changes to UWICL for PRE-RELEASE 2.0


Image Computing Library Team
Document # - ICSL-UWICL-03/96 rev. 1.0
Image Computing Systems Laboratory
Department of Electrical Engineering
University of Washington
Seattle, WA 98195

1.0 Introduction

This document outlines all the changes that have occurred in the UWICL for pre-release 2.0 since release 1.1 in October 1995. These changes include new functions, modifications to the library as suggested by the consortium members (such as the ability to create a true library of object code), and bug fixes.

Note that pre-release 2.0 is a subset of release 2.0, which is scheduled for April 2, 1996. Only the independent functions (those that do not depend on any other UWICL function) are included, because this release adds the ability to generate a true object library. Since this is a major addition to the UWICL, we want to test it thoroughly, and as of the pre-release date of March 7, 1996, we have completed testing with the independent functions. This pre-release gives consortium members an early look at the changes made since the last release (modifications and new functions) and allows them to prepare for the upcoming release.

If you plan to try pre-release 2.0, we would appreciate your comments, feedback, and bug reports by March 22, 1996, so that we can incorporate changes in time for the formal release 2.0 on April 2, 1996.

2.0 New Functions

There are 11 new functions out of the 81 functions included in pre-release 2.0 of the UWICL:

matmul64 - multiply a matrix with a vector in fixed-point notation and output a 64-bit result

matmulfp - multiply a matrix with a vector in fixed-point notation

gfft - two-dimensional general NxM 16-bit Forward Fast Fourier Transform

gifft - two-dimensional general NxM 16-bit Inverse Fast Fourier Transform

rfft - two-dimensional 512x512 16-bit Real Forward Fast Fourier Transform

rifft - two-dimensional 512x512 16-bit Real Inverse Fast Fourier Transform

p_warp16 - perspective warp of a 16-bit image

affine8 - affine warp of an 8-bit image

cp_warp - perspective warp of an RGB color image

yuv2rgb - color space conversion

radsrch - maxima search in radial direction

3.0 Modifications

3.1 Object Library

The most significant change to the UWICL in this release is support for creating an object library. As suggested by the consortium members, a more useful and efficient UWICL library structure is a software object library that contains the object code of all the image computing functions in a single file. With such a library, using a set of UWICL functions in an application reduces to compiling the application source code and linking it against the UWICL object library to generate the executable program. The modifications made to the UWICL to support the creation and use of the object library are described in the following sections.

3.2 File Name Change

To create the object library, we decided to use the file archiver, mvpar, that is supplied with the MVP tools. This archiver combines object code (and source code, although we archive only object code) into a single file, which the tools call an archive or library. However, the archiver requires that the names of all archived files follow the DOS 8.3 file naming convention; names that do not follow this convention are truncated. This causes problems when archiving two files whose names are not unique within the first eight characters. For example, the files mp_binary_add8.o and mp_binary_add16.o are both truncated to mp_binar.o, so the archiver considers the two object files identical and overwrites one with the other. To overcome this limitation of the archiver, we have renamed all of the source files, and consequently the object files, to conform to the DOS 8.3 naming convention. Please note that only the file names have changed; the function names have not.
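The collision can be reproduced on a host machine with a small C sketch. It models only the rule stated above: at most eight characters of base name and three of extension. The real mvpar behavior and the full DOS naming rules, such as case folding and invalid characters, are not modeled. The helper names to_dos83 and collides_in_archive are hypothetical and are not part of the UWICL or the MVP tools.

```c
#include <stddef.h>
#include <string.h>

/* Truncate "base.ext" to DOS 8.3 form: at most 8 characters of base
 * name and 3 of extension. out needs 13 bytes (8 + '.' + 3 + NUL).
 * Sketch only: case folding and invalid-character rules are ignored. */
void to_dos83(const char *name, char out[13])
{
    const char *dot = strrchr(name, '.');
    size_t base_len = dot ? (size_t)(dot - name) : strlen(name);
    size_t n = base_len < 8 ? base_len : 8;
    size_t pos = n;

    memcpy(out, name, n);
    if (dot != NULL) {
        size_t ext_len = strlen(dot + 1);
        size_t e = ext_len < 3 ? ext_len : 3;
        out[pos++] = '.';
        memcpy(out + pos, dot + 1, e);
        pos += e;
    }
    out[pos] = '\0';
}

/* Returns 1 if two file names map to the same 8.3 name, i.e. one
 * archive member would silently replace the other. */
int collides_in_archive(const char *a, const char *b)
{
    char ta[13], tb[13];
    to_dos83(a, ta);
    to_dos83(b, tb);
    return strcmp(ta, tb) == 0;
}
```

With the old names mp_binary_add8.o and mp_binary_add16.o, collides_in_archive reports a collision. Short names in the new format, such as add8.mco and add16.mco, remain distinct.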
The new file naming format for all of the files is as follows:

3.2.1 MP C source file format: <function_name>.mpc (e.g., add8.mpc)
3.2.2 MP assembly file format: <function_name>.mps (e.g., sub8.mps)
3.2.3 MP C object file format: <function_name>.mco (e.g., clip8.mco)
3.2.4 MP assembly object file format: <function_name>.mso (e.g., floor.mso)
3.2.5 MP include file format: <function_name>.h (e.g., invert.h)
3.2.6 PP C source file format: <function_name>.ppc (e.g., median.ppc)
3.2.7 PP assembly file format: <function_name>.pps (e.g., pwarp.pps)
3.2.8 PP C object file format: <function_name>.pco (e.g., abs8.pco)
3.2.9 PP assembly object file format: <function_name>.pso (e.g., open.pso)
3.2.10 PP include file format: <function_name>.i (e.g., divide.i)

3.3 Updated Makefiles

The makefiles for all of the UWICL functions have been greatly simplified. Many of the redundant sections of these makefiles, such as targets and environment variable declarations, have been moved into a global include makefile, and new targets have been added to support the creation of the object library.

3.3.1 Function makefile

The individual function makefiles are significantly smaller and contain the following items:

· The hardware and simulator executable file names are defined in the HARDWARE_OUT and SIMULATOR_OUT environment variables.
· All of the object files are listed in the OBJS environment variable, as before.
· The C and assembly include files are now listed in the C_INC_FILES and ASM_INC_FILES environment variables. Modifying a listed include file now causes the dependent source files to be recompiled; previously, modifications to the include files did not trigger recompilation.
· The image and kernel files used for testing on the simulator are listed in the IMAGE_FILES environment variable.
· The look-up table and data files used for testing on the simulator are listed in the LUT_FILES environment variable.
· The compiler, linker, and archiver options are defined in the variables prefixed with EXTRA.
· The tasks to be performed to make or remake the hardware or simulator version of the function are defined by the FORCE_HW_TASKS and FORCE_SIM_TASKS environment variables. By default, both the simulator and hardware task lists remove the server.mco file, because server.mco must be recompiled for each of the two environments.
· The global include makefile, whose location is given by the UWICL_MAKE environment variable, is included.

3.3.2 Global include makefile

A global include makefile, named UWICL_Make, has been created and stored in $(ICLIB_ROOT)/env (recall that ICLIB_ROOT is an environment variable indicating the root directory in which the UWICL has been installed). An environment variable, UWICL_MAKE, must be set to reference this global include makefile.

The global include makefile contains the following items:

· The environment variables previously declared in every function makefile are declared here. These include MP_INCLUDE, CC, MP_COMPILE_OPTIONS, LIBRARY_PATH, LIBS, and RM_FILE_TYPES.
· The all target is set by default to $(HARDWARE_OUT), which is defined in each individual function makefile.
· The $(HARDWARE_OUT) target executes the hardware tasks defined by the FORCE_HW_TASKS environment variable and links the object files to create the hardware executable.
· The $(SIMULATOR_OUT) target executes the simulator tasks defined by the FORCE_SIM_TASKS environment variable and links the object files to create the simulator executable.
· The force_hw_tasks and force_sim_tasks targets execute the tasks defined by the FORCE_HW_TASKS and FORCE_SIM_TASKS environment variables, respectively.
· The objs target compiles the source files without linking them into an executable. This target is useful when creating the library of object files.
· The install target executes the includes and library targets.
· The includes target copies all the .h files into the $(ICLIB_ROOT)/include/UWICL directory.
· The library target finds all the object files (excluding server.mco) and creates symbolic links to them in the $(ICLIB_ROOT)/libs directory. It then runs the archiver to add those object files to the main object library and, finally, removes the symbolic links.
· The help target provides the user with information on how to use the makefile.
· The clean target removes the files defined by RM_FILE_TYPES.
· The cpimg target creates symbolic links to the image and kernel files defined by the IMAGE_FILES environment variable; likewise, the delimg target removes these symbolic links.
· The cplut target creates symbolic links to the look-up table and data files defined by the LUT_FILES environment variable; likewise, the dellut target removes these symbolic links.
· The suffix rules for the source and object files are defined.
· The object files listed in OBJS are made dependent on the header files listed in the C_INC_FILES and ASM_INC_FILES environment variables.

3.4 Image and LUT Formats

The test images are provided in COFF format only; the supplied coff2bin utility can be used to convert an image from COFF to RAW format. The look-up table files are provided in binary (RAW) format only; the supplied bin2coff utility can be used to convert a file from RAW to COFF format. The data files, however, are supplied in COFF format for the pre-release.

3.5 Move Timing

The source code used to report the execution time of the functions has been moved from the MP level of the source code to the server.mpc source file. Specifically, the timing code now surrounds the call to the MP level function in server.mpc:

    case CMND_DCT:
        start_time = TCOUNT;
        return_code = mp_dct8x8(msg);
        *(MVP_ARG) = start_time - TCOUNT;   /* elapsed timer count */
        if (return_code != OK)
            *(MVP_ARG) = -return_code;      /* report the error instead */
        break;

3.6 Remove Hardware Dependencies

We have removed all the GSP5 hardware-dependent code from the MP and PP levels of the source code. This will ease porting the UWICL to other platforms.

One modification to remove the hardware dependencies was to replace the global include file mp_icl.h with two include files, mp_icl_err.h and mp_icl_hw.h. The mp_icl_err.h include file contains all of the error codes, and the mp_icl_hw.h include file contains the hardware-dependent code. All of the MP level source code now includes only mp_icl_err.h; the hardware-dependent code in mp_icl_hw.h is required only by server.mpc.

Another modification to remove the hardware dependencies was to remove the mp_pp_icl_gsp5.h include file from the MP level source code of all the functions. The MP source code needed only the mp_pp_icl_typedef.h include file, which was itself included by mp_pp_icl_gsp5.h; mp_pp_icl_typedef.h is now included explicitly in the MP level source code.

A third modification to remove the hardware dependencies was to create a global include file, mp_icl_start_pps.h, containing global variables that describe such things as the number of PPs to use and the maximum number of message buffers. In addition, the arrays holding the semaphores and the command buffers were removed from the function include files and are now declared in server.mpc, where they are required. By default, communication is started with four PPs, and the default memory allocation for passing parameters to the PPs is 35 parameters.

3.7 Thread-Safe Malloc

The standard malloc function provided with the MP compiler is not thread-safe. We have therefore supplied a multitasking-enhanced version of the ANSI Standard C memory allocation function, called mp_mt_malloc. The MP level source code now uses it to allocate memory safely for the UWICL functions.

3.8 Updated README

The README files for each of the functions have been updated to include a detailed description of any data or look-up table files used.

3.9 Function Name Changes

Several functions have changed their names. These functions are:

matmul - was matmul16

fft512sq - was fft_512x512

ift512sq - was ifft_512x512

fft256sq - was fft_256x256

ift256sq - was ifft_256x256

p_warp8 - was p_warp

4.0 Bug Fixes

All functions

Problem: At the MP-C level, if the MP function waits for the PPs to be free BEFORE the error checking, the function hangs the next time it is invoked. An error return after the wait leaves the PPs marked as in use, so the next wait never completes.

Solution: The correct form is:

    mp_function ()
    {
        /*
         * CHECK parameters (return on error BEFORE waiting for the PPs)
         */
        ....

        /* Wait for the required number of PPs to be free */
        for (i=0; i<ADD8_NUM_PPS; i++) {
            TaskWaitSema(sema_pp_in_use[i]);
            TaskResetSema(sema_pp_executing[i]);
        }
        ....
    }

Perspective Warp

Problem: The p_warp function sometimes does not complete or return, even though an output appears on the display.

Solution: To fix the reported bug, the tight loop was modified to clip the inverse-mapped coordinate at 0. The function had expected only non-negative values; if a negative value was generated, the offset address became very large and an invalid memory location could be accessed. Along with this change, several additional changes were made:

· User-specified coordinate points are now referenced from the source and destination addresses.
· The perspective warp function now handles translation.
· The MP no longer calculates row parameters for each output image line. Instead, the MP passes the PPs a set of parameters that describe the output image.
· Bilinear interpolation now uses Q15 format rather than 8 bits, which increases the interpolation accuracy. The tight loop has also been optimized further by using rounded multiplications.
· Because the PP-C "for loop" that calls the tight loop, and the assembly-level tight loop itself, are quite long, instruction cache misses occurred. The "align" directive is now used to remove any instruction cache misses in the tight loop; without this change, the 10-ms penalty is significant.

Perspective Warp

Problem: A new error code, NOT_A_QUADRILATERAL, was used but not defined in the mp_icl.h include file (now renamed to mp_icl_err.h).

Solution: The error code is now defined in the include file.
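The first perspective warp fix combines two pieces of arithmetic. First, a signed inverse-mapped coordinate is clipped at 0 before it becomes an address offset. Second, the bilinear interpolation uses Q15 weights with rounding. The portable C sketch below illustrates both; it is not the UWICL tight loop, which runs on the PPs in assembly. The Q16.16 coordinate format, the upper clamp, and the function names are assumptions of this sketch, and the image is assumed to be at least 2x2 pixels.

```c
#include <stdint.h>

#define Q15_ONE  32768       /* 1.0 in Q15 */
#define Q15_HALF (1 << 14)   /* 0.5 LSB, added before >> 15 to round */

/* Clamp a signed Q16.16 coordinate so that both neighbouring samples
 * lie inside the image. Clipping negative values to 0 corresponds to
 * the fix described above; the upper clamp is an extra safety measure
 * of this sketch. Requires max_index >= 1. */
static int32_t clamp_coord_q16(int32_t q16, int max_index)
{
    int32_t hi = ((int32_t)max_index << 16) - 1;
    if (q16 < 0)
        return 0;
    if (q16 > hi)
        return hi;
    return q16;
}

/* Bilinear interpolation of an 8-bit image at (x_q16, y_q16), given in
 * signed Q16.16 (hypothetical format). The fractional parts become Q15
 * weights and each weighted sum is rounded rather than truncated. */
uint8_t bilinear_q15(const uint8_t *img, int width, int height, int pitch,
                     int32_t x_q16, int32_t y_q16)
{
    int32_t x = clamp_coord_q16(x_q16, width - 1);
    int32_t y = clamp_coord_q16(y_q16, height - 1);

    int xi = (int)(x >> 16);             /* integer parts (non-negative) */
    int yi = (int)(y >> 16);
    int32_t fx = (x & 0xFFFF) >> 1;      /* Q16 fraction -> Q15 weight */
    int32_t fy = (y & 0xFFFF) >> 1;

    const uint8_t *r0 = img + yi * pitch + xi;   /* upper-left sample */
    const uint8_t *r1 = r0 + pitch;              /* lower-left sample */

    /* Horizontal interpolation on both rows, rounded back to pixels. */
    int32_t top = (r0[0] * (Q15_ONE - fx) + r0[1] * fx + Q15_HALF) >> 15;
    int32_t bot = (r1[0] * (Q15_ONE - fx) + r1[1] * fx + Q15_HALF) >> 15;

    /* Vertical interpolation between the two rows, also rounded. */
    return (uint8_t)((top * (Q15_ONE - fy) + bot * fy + Q15_HALF) >> 15);
}
```

Clipping before the shift matters. If a small negative coordinate were reinterpreted as unsigned, x >> 16 would yield a very large index and therefore a very large offset, which is the failure mode described in the bug report.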
Histogram Equalization

Problem: The function had limited error checking and did not work for image sizes that were not a power of two.

Solution: The function has been modified and now works for many image sizes.

Man Pages for add8, add16, sub8, sub16, absdiff8, absdiff16

Problem: The man pages for add8, add16, sub8, sub16, absdiff8, and absdiff16 referred to an incorrect parameter structure name.

Solution: The man pages have been updated.

Shrinkxy

Problem: The variable new_width was tested for validity in the shrinkxy function before it had been set.

Solution: The function now sets the value of new_width before it is tested.

Shrinkxy, Magnify, Magnifyxy

Problem: Memory allocated and assigned to the variable next was never freed.

Solution: The functions now free this memory.

Include file: mp_pp_icl_gsp5.h

Problem: The include file contained one too few #endif statements.

Solution: The missing #endif statement has been added.

Perspective Warp

Problem: The perspective warp function incorrectly used an unsigned shift and associated instructions to update the row parameters that describe which pixels to process in the output image.

Solution: The function now uses a signed shift and a clip to 0 to update the row parameters.

Template

Problem: This function referenced image files that were not included in the distribution.

Solution: The missing image files are now included in the distribution.

Wavelet

Problem: The wavelet functions assumed that the transform output, which consists of 16-bit coefficients, would be placed in DRAM, where the pitch is twice the image width. The reasoning was that displaying 16-bit data, and hence sending the output to VRAM, did not make much sense. Based on this assumption, the offset of a section of data from the start address was calculated by multiplying the height by twice the width. If the output was in VRAM, where the pitch is, for example, 0x800, the code did not work correctly.

Solution: The offset calculation has been modified so that the output can be directed to either DRAM or VRAM. For example,

    width_of_level * height_of_level

has been replaced by

    msg->dst_pitch * height_of_level

Binary Morphology

Problem: The binary morphology functions used the output address as a temporary address in which to place the packed result. When unpacking into DRAM to form the final result, the unpacked data overwrote the packed temporary data before unpacking was complete. This was not a problem when the output address was in VRAM, because the pitch there was large enough to avoid overwriting.

Solution: A new parameter, tmp_address, is now passed to the MP function and specifies where the intermediate packed images are stored. The user must make sure that tmp_address points to at least (image_size/8) of free storage. In addition, a missing "return(result)" statement was added at the end of the MP function.
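The binary morphology fix can be illustrated with a pack/unpack pair in C. The sketch assumes one byte per pixel in the unpacked image, most-significant-bit-first packing, and a pixel count that is a multiple of 8. Under these assumptions the packed buffer needs n_pixels / 8 bytes, in line with the (image_size/8) requirement when image_size counts pixels. The function names are hypothetical, and the actual UWICL packed format may differ.

```c
#include <stddef.h>
#include <stdint.h>

/* Pack a binary image (one byte per pixel, zero or non-zero) into one
 * bit per pixel, most significant bit first. tmp must hold
 * n_pixels / 8 bytes; n_pixels is assumed to be a multiple of 8. */
void pack_binary(const uint8_t *src, size_t n_pixels, uint8_t *tmp)
{
    for (size_t i = 0; i < n_pixels / 8; i++) {
        uint8_t byte = 0;
        for (int b = 0; b < 8; b++)
            byte = (uint8_t)((byte << 1) | (src[8 * i + b] != 0));
        tmp[i] = byte;
    }
}

/* Unpack tmp into dst, writing 0 or 255 per pixel. tmp and dst must
 * not overlap: each packed byte expands into eight output bytes. */
void unpack_binary(const uint8_t *tmp, size_t n_pixels, uint8_t *dst)
{
    for (size_t i = 0; i < n_pixels / 8; i++)
        for (int b = 0; b < 8; b++)
            dst[8 * i + b] = (tmp[i] & (0x80 >> b)) ? 255 : 0;
}
```

Suppose dst and tmp were the same buffer, as when the output address doubled as the temporary area. The first eight unpacked pixels would then overwrite the first eight packed bytes before they had all been read, which mirrors the DRAM failure. Keeping the packed data at a separate tmp_address avoids the overlap.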