UW ICL

MAP UWICL Function List and Performance

---------------------------

Table 1 lists every function included in release 4.0 of the MAP UWICL (November 30, 2001), together with its performance on the MAP chip. Each performance figure is based on the following conditions and assumptions:

1. The MAP runs at 270 MHz with 135 MHz SDRAM and operates on 512 x 512 8-bit images, unless otherwise specified.

2. Each performance figure covers the full operation, including input and output: the source image data are (1) read from external memory, (2) processed by the MAP, and (3) stored back to external memory.

3. All MAP UWICL functions have been implemented to work on any image size. Supporting arbitrary (large) image sizes and removing the granularity limitation introduce overhead in the data flow and the tight-loop code. Removing this overhead would improve performance by 5-10% over the figures listed in this table.

4. To ensure data coherency when a library function is called, we use the coherent data fetch mode of the Data Streamer, which results in a slower data transfer rate than the non-coherent mode.
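The cycle counts and millisecond figures in Table 1 appear to be related through the 270 MHz clock stated in condition 1. The C sketch below illustrates that conversion and also derives an average cycles-per-pixel figure for a given image size. The names `MAP_CLOCK_HZ`, `cycles_to_ms`, and `cycles_per_pixel` are hypothetical helpers introduced here for illustration; they are not part of the MAP UWICL.

```c
#include <assert.h>

/* Assumption taken from condition 1: the MAP core clock is 270 MHz. */
#define MAP_CLOCK_HZ 270e6

/* Convert a cycle count into milliseconds for a given clock rate in Hz. */
double cycles_to_ms(double cycles, double clock_hz)
{
    assert(clock_hz > 0.0);
    return cycles / clock_hz * 1000.0;
}

/* Average number of cycles spent per pixel of a width x height image. */
double cycles_per_pixel(double cycles, int width, int height)
{
    assert(width > 0 && height > 0);
    return cycles / ((double)width * (double)height);
}
```

For example, `cycles_to_ms(234e3, MAP_CLOCK_HZ)` evaluates to about 0.87 ms, which agrees with the Invert row, and `cycles_per_pixel(234e3, 512, 512)` gives roughly 0.89 cycles per pixel for a 512 x 512 image.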

Table 1. List of functions in the release 4.0 MAP UWICL and their performance 

 

 

Category           Function                              Language  Cycles  ms      Condition
-----------------  ------------------------------------  --------  ------  ------  -------------------
Arithmetic         Invert                                C         234k    0.87
                   Clip                                  C         236k    0.87
                   Floor                                 C         240k    0.89
                   Offset                                C         237k    0.88
                   Scale                                 C         296k    1.10
                   Absolute value                        C         273k    1.01
                   Add                                   C         386k    1.43
                   Subtract                              C         346k    1.28
                   Add (16-bit images)                   C         681k    2.52
                   Subtract (16-bit images)              C         681k    2.52
                   Multiply                              C         459k    1.70
                   Divide                                C         901k    3.34
                   Absolute difference                   C         423k    1.57
                   Normalize                             C         433k    1.60
                   Interpolation                         C         466k    1.73
                   Matrix multiply                       C         45.4M   168.15
                   Cordic                                C         3.99M   14.78
                   Composite                             C         587k    2.17
                   Maximum selection                     C         402k    1.49
                   Magnitude and phase                   C         9.67M   35.81
                   Signed 32-bit fixed-point division    C         277     0.0010  (2^31-1)/1
                   Unsigned 32-bit fixed-point division  C         276     0.0010  (2^32-1)/1
Logical            Bitwise AND(&)                        C         421k    1.56
                   Bitwise OR(|)                         C         423k    1.57
                   Bitwise XOR(^)                        C         423k    1.57
                   Scalar AND(&)                         C         234k    0.87
                   Scalar OR(|)                          C         234k    0.87
                   Scalar XOR(^)                         C         240k    0.89
Signal processing  FIR filter (16-bit in, 16-bit out)    C         2.84M   10.52   24-tap with padding
                                                         Assembly  2.37M   8.78
                   FIR filter (16-bit in, 8-bit out)     C         3.91M   14.48
                                                         Assembly  2.07M   7.67
                   FIR filter (8-bit in, 16-bit out)     C         2.06M   7.63
                                                         Assembly  1.13M   4.19
                   FIR filter (8-bit in, 8-bit out)      C         2.08M   7.70
                                                         Assembly  1.06M   3.93
Spatial filter     8-bit 2D convolution                  C         4.82M   17.85   7 x 7 kernel
                                                         Assembly  1.74M   6.44    with

 

16-bit 2D convolution

C