Indirect code generation for VLIW architectures and a hardware optimization algorithm
Many modern embedded applications, such as mobile communication and multimedia, place high demands on both the computing performance and the power consumption of the underlying system architecture. Contemporary processor architectures based on the Very Long Instruction Word (VLIW) principle can meet these demands by intensively exploiting instruction-level parallelism, thereby increasing computation speed and reducing power consumption at the same time. Nonetheless, many applications still perform unsatisfactorily on such processors, because the code generation of traditional compilers is not suited to exploiting the specific characteristics and irregularities of VLIW architectures efficiently. To improve code performance, this thesis proposes an indirect code generation approach for VLIW architectures that translates efficiently compiled assembly code for RISC architectures into the target VLIW assembly. The indirect approach consists of three steps: 1) efficient compilation of the source code for an arbitrary RISC architecture supported by a traditional optimizing compiler; 2) a "pre-processing" step that uses a static binary translator to further optimize the RISC assembly and translate it into a RISC-like assembly already closely related to the target VLIW architecture; 3) a "post-processing" step that acts as a target-specific compiler back-end and applies VLIW-specific optimizations. This indirect approach exploits the architectural benefits of VLIW processors while retaining the highly efficient optimizations of traditional RISC compilers, without one compromising the other. Furthermore, joint software and hardware design is well known to be important for achieving high system efficiency. Therefore, this thesis also proposes an algorithm intended to support the hardware exploration of the target VLIW architecture.
A fuzzy control system (based on fuzzy set theory) is used to find an optimal application-specific processor configuration with respect to the necessary degree of parallelism. The optimization algorithm considers both code-specific and hardware-specific performance parameters in this step. To demonstrate the quality of the proposed indirect code generation and hardware exploration, a low-power scalar VLIW processor based on the Synchronous Transfer Architecture (STA) is implemented in this thesis. This architecture exhibits special irregular characteristics and is thus well suited as a candidate for this proof of concept.
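As a rough illustration of how a fuzzy control system could map code-specific and hardware-specific parameters to a degree of parallelism, the following minimal sketch uses zero-order Sugeno-style inference with weighted-average defuzzification. It is not the algorithm from the thesis: the two normalized inputs (`ilp`, `cost`), the membership functions, the rule base, and the issue-slot values are all assumptions chosen for illustration, since the abstract does not specify them.

```python
# Hypothetical sketch: fuzzy recommendation of the number of VLIW issue slots.
# All names, membership functions, and rules are illustrative assumptions.

def clamp01(x):
    """Restrict a value to the normalized range [0, 1]."""
    return max(0.0, min(1.0, x))


def memberships_ilp(ilp):
    """Fuzzify the available instruction-level parallelism (code-specific)."""
    ilp = clamp01(ilp)
    return {
        "low": max(0.0, 1.0 - 2.0 * ilp),
        "medium": max(0.0, 1.0 - abs(2.0 * ilp - 1.0)),
        "high": max(0.0, 2.0 * ilp - 1.0),
    }


def memberships_cost(cost):
    """Fuzzify the relative cost of an extra issue slot (hardware-specific)."""
    cost = clamp01(cost)
    return {"cheap": 1.0 - cost, "expensive": cost}


# Each rule: (ILP term, cost term or None for "any cost", crisp slot count).
RULES = [
    ("low", None, 1.0),
    ("medium", "cheap", 4.0),
    ("medium", "expensive", 2.0),
    ("high", "cheap", 8.0),
    ("high", "expensive", 4.0),
]


def recommend_issue_slots(ilp, cost):
    """Return a (possibly fractional) recommended number of issue slots."""
    mu_ilp = memberships_ilp(ilp)
    mu_cost = memberships_cost(cost)
    weighted_sum = 0.0
    total_strength = 0.0
    for ilp_term, cost_term, slots in RULES:
        # Fuzzy AND is modeled as the minimum of the membership degrees.
        strength = mu_ilp[ilp_term]
        if cost_term is not None:
            strength = min(strength, mu_cost[cost_term])
        weighted_sum += strength * slots
        total_strength += strength
    # Weighted average of the rule outputs (defuzzification).
    return weighted_sum / total_strength if total_strength > 0 else 1.0


if __name__ == "__main__":
    for ilp, cost in [(0.0, 0.5), (0.5, 0.5), (1.0, 0.0), (1.0, 1.0)]:
        print(ilp, cost, recommend_issue_slots(ilp, cost))
```

In a real exploration flow, the fractional result would still have to be rounded to a configuration the target architecture actually supports, and the inputs would come from measured code and hardware parameters rather than hand-picked values.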