Table 1 lists all functions included in release 4.0 of the MAP UWICL (November 30, 2001), together with their performance on the MAP chip. The performance number of each function is based on the following conditions and assumptions:
1. The MAP runs at 270 MHz with 135 MHz SDRAM and operates on 512 x 512 8-bit images (unless otherwise specified).
2. The performance number covers the whole operation, including input and output: the source image data are (1) read from the external memory, (2) processed by the MAP, and (3) stored back to the external memory.
3. All the MAP UWICL functions have been implemented to work on any image size. Supporting arbitrary (large) image sizes and removing the granularity limitation introduce overhead in the data flow and the tight-loop code. Removing this overhead would improve performance by 5-10% over the numbers listed in this table.
4. To ensure data coherency when a library function is called, we use the coherent data fetch mode of the Data Streamer, which results in a slower data transfer rate than the non-coherent mode.
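The relationship between the cycle counts and the millisecond figures in Table 1 follows directly from the 270 MHz clock stated in condition 1. The sketch below illustrates that conversion. The helper names `cycles_to_ms` and `cycles_per_pixel` are hypothetical and are not part of the MAP UWICL. The per-pixel figure assumes the default 512 x 512 image size.

```python
# Clock rate taken from condition 1 (MAP running at 270 MHz).
MAP_CLOCK_HZ = 270e6


def cycles_to_ms(cycles: float, clock_hz: float = MAP_CLOCK_HZ) -> float:
    """Convert a cycle count to milliseconds at the given clock rate."""
    return cycles / clock_hz * 1e3


def cycles_per_pixel(cycles: float, width: int = 512, height: int = 512) -> float:
    """Average cycles spent per pixel (hypothetical helper, assumes one full image)."""
    return cycles / (width * height)


# Cycle count of the "Invert" row in Table 1 (234k cycles).
invert_cycles = 234e3

print(f"{cycles_to_ms(invert_cycles):.2f} ms")                # prints "0.87 ms"
print(f"{cycles_per_pixel(invert_cycles):.2f} cycles/pixel")  # prints "0.89 cycles/pixel"
```

The same conversion reproduces the other rows, for example 45.4M cycles for the matrix multiply gives about 168.15 ms.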
Table 1. List of functions in the release 4.0 MAP UWICL and their performance
| | Function | Language used | Performance (cycles) | Performance (ms) | Condition |
|---|---|---|---|---|---|
| Arithmetic | Invert | C | 234k | 0.87 | |
| | Clip | C | 236k | 0.87 | |
| | Floor | C | 240k | 0.89 | |
| | Offset | C | 237k | 0.88 | |
| | Scale | C | 296k | 1.10 | |
| | Absolute value | C | 273k | 1.01 | |
| | Add | C | 386k | 1.43 | |
| | Subtract | C | 346k | 1.28 | |
| | Add (16-bit images) | C | 681k | 2.52 | |
| | Subtract (16-bit images) | C | 681k | 2.52 | |
| | Multiply | C | 459k | 1.70 | |
| | Divide | C | 901k | 3.34 | |
| | Absolute difference | C | 423k | 1.57 | |
| | Normalize | C | 433k | 1.60 | |
| | Interpolation | C | 466k | 1.73 | |
| | Matrix multiply | C | 45.4M | 168.15 | |
| | Cordic | C | 3.99M | 14.78 | |
| | Composite | C | 587k | 2.17 | |
| | Maximum selection | C | 402k | 1.49 | |
| | Magnitude and phase | C | 9.67M | 35.81 | |
| | Signed 32-bit fixed-point division | C | 277 | 0.0010 | (2^31-1)/1 |
| | Unsigned 32-bit fixed-point division | C | 276 | 0.0010 | (2^32-1)/1 |
| Logical | Bitwise AND (&) | C | 421k | 1.56 | |
| | Bitwise OR (\|) | C | 423k | 1.57 | |
| | Bitwise XOR (^) | C | 423k | 1.57 | |
| | Scalar AND (&) | C | 234k | 0.87 | |
| | Scalar OR (\|) | C | 234k | 0.87 | |
| | Scalar XOR (^) | C | 240k | 0.89 | |
| Signal processing | FIR filter (16-bit in, 16-bit out) | C | 2.84M | 10.52 | 24-tap with padding |
| | | Assembly | 2.37M | 8.78 | |
| | FIR filter (16-bit in, 8-bit out) | C | 3.91M | 14.48 | |
| | | Assembly | 2.07M | 7.67 | |
| | FIR filter (8-bit in, 16-bit out) | C | 2.06M | 7.63 | |
| | | Assembly | 1.13M | 4.19 | |
| | FIR filter (8-bit in, 8-bit out) | C | 2.08M | 7.70 | |
| | | Assembly | 1.06M | 3.93 | |
|
Spatial filter |
8-bit 2D convolution |
C |
4.82M |
17.85 |
7
x 7 kernel |
|
|
|
Assembly |
1.74M |
6.44 |
with |
|
|
16-bit 2D convolution |
C |
|||