Madhura Purnaprajna

## Run-time Reconfigurable Multiprocessors

## Contents

| No.   | Title                                                         |
|-------|---------------------------------------------------------------|
| 1     | Introduction                                                  |
| 1.1   | Contributions                                                 |
| 1.2   | Organisation                                                  |
| 2     | Architecture                                                  |
| 2.1   | Architectural Models                                          |
| 2.2   | Architectural Flexibility                                     |
| 2.2.1 | Classifying Customisations                                    |
| 2.2.2 | Cost of Flexibility                                           |
| 2.3   | Architectural Design Space Exploration                        |
| 2.3.1 | Classifying Architectural Explorations                        |
| 2.3.2 | Ranking Architectural Merits                                  |
| 2.4   | Multi-core Architectures                                      |
| 2.4.1 | Commercial Multi-core Processors                              |
| 2.4.2 | Limitations of Existing Multi-core architectures              |
| 2.5   | Initiatives for Customisable Multi-core Processors            |
| 2.6   | The Concept: Run-time Reconfigurable Multiprocessors          |
| 2.6.1 | Reconfiguration Mechanism                                     |
| 2.6.2 | Advantages of the New Reconfiguration Mechanism               |
| 2.7   | Summary                                                       |
| 3     | Application                                                   |
| 3.1   | Programmability                                               |
| 3.2   | Methods of Application Description                            |
| 3.2.1 | Application Description for Parallel Processors               |
| 3.2.2 | Managing Communication and Synchronisation                    |
| 3.2.3 | Drawbacks of Existing Methods                                 |
| 3.3   | Architecture-Independent Application Characteristics          |
| 3.3.1 | Model for Computation                                         |
| 3.3.2 | Model for Synchronisation                                     |
| 3.3.3 | Model for Communication                                       |
| 3.4   | Comparing Application-specific Attributes                     |
| 3.4.1 | DSP Applications                                              |
| 3.4.2 | Multiplier used in Elliptic Curve Cryptography                |
| 3.4.3 | Self-organising Maps                                          |
| 3.4.4 | Priorities: Computation, Communication, or Synchronisation    |
| 3.5   | Restating Amdahl's Law                                        |
| 3.5.1 | Speedup: Comparison to Amdahl's Law                           |
| 3.5.2 | Power: Comparison to Amdahl's Law                             |
| 3.5.3 | Impact on Energy                                              |
| 3.6   | Summary                                                       |
| 4     | Application to Architectural Mapping                          |
| 4.1   | Applications and Architectures: Fixed vs. Alterable           |
| 4.1.1 | Fixed Applications, Fixed Architecture                        |
| 4.1.2 | Alterable Applications, Fixed Architecture                    |
| 4.1.3 | Fixed Application, Alterable Architectures                    |
| 4.1.4 | Alterable Applications, Alterable Architecture                |
| 4.2   | Application Mapping: Objectives and Methods                   |
| 4.2.1 | Compilation Flow                                              |
| 4.2.2 | FPGA Flow                                                     |
| 4.2.3 | Comparing the two Design Flows                                |
| 4.2.4 | Merging Compilation and Synthesis Design Flows                |
| 4.2.5 | Considerations for Merging Spatial and Temporal Design Flows  |
| 4.2.6 | Optimisation Objectives                                       |
| 4.2.7 | Cost Function                                                 |
| 4.3   | Adaptive Mapping in Reconfigurable Multiprocessors            |
| 4.3.1 | Reconfiguration for Application Mapping                       |
| 4.3.2 | Advantages of the Multi-dimensional Mapping Approach          |
| 4.4   | Summary                                                       |
| 5     | QuadroCore: Architecture                                      |
| 5.1   | Reconfiguration Design Space                                  |
| 5.1.1 | Instruction to Control Reconfiguration                        |
| 5.1.2 | Synchronisation                                               |
| 5.1.3 | Communication                                                 |
| 5.1.4 | MIMD and SIMD operation                                       |
| 5.1.5 | Word-length Configurability                                   |
| 5.1.6 | Additional Instructions for Co-operative Multiprocessing      |
| 5.1.7 | Compilation Flow                                              |
| 5.2   | Time and Power Characteristics                                |
| 5.2.1 | Timing Characteristics                                        |
| 5.2.2 | QuadroCore Power Distribution                                 |
| 5.2.3 | Time and Power variations in the Reconfiguration Design Space |
| 5.3   | Instruction-level Power Model                                 |
| 5.3.1 | Instruction Life Cycle                                        |
| 5.3.2 | Memory Accesses                                               |
| 5.3.3 | Register Accesses                                             |
| 5.3.4 | ALU Accesses                                                  |
| 5.3.5 | Multiprocessor Synchronisation                                |
| 5.3.6 | Instruction Set Characterisation                              |
| 5.4   | Impact of Compilation Techniques                              |
| 5.5   | Implementation and Performance Measurements                   |
| 5.5.1 | Standard Cell Implementation                                  |
| 5.5.2 | Post-layout Implementation Reports                            |
| 5.5.3 | FPGA Reports                                                  |
| 5.6   | Summary                                                       |
| 6     | QuadroCore: Applications                                      |
| 6.1   | Design Flow for Resource Efficiency                           |
| 6.2   | Applications Mapped to QuadroCore                             |
| 6.2.1 | Timing Advantage of Reconfiguration                           |
| 6.2.2 | DSP Algorithms                                                |
| 6.2.3 | Multiplier used in Elliptic Curve Cryptography                |
| 6.2.4 | Self-organising Maps                                          |
| 6.2.5 | Comparison: Parallelism, Speedup, Energy                      |
| 6.2.6 | Comparable Architectures                                      |
| 6.3   | Extending the QuadroCore Multiprocessor                       |
| 6.3.1 | Platform for Validating Parallel Programs                     |
| 6.3.2 | Environment for Run-time Processor Customisation              |
| 6.4   | Summary                                                       |
| 7     | Conclusions and Future Work                                   |
| 7.1   | Summary                                                       |
| 7.2   | Future Work                                                   |
|       | Glossary                                                      |
|       | List of Figures                                               |
|       | List of Tables                                                |
|       | References                                                    |
|       | Author's Publications                                         |