Autonomic Operation of a Large High Performance On-Line Compute Cluster
Not peer reviewed
A High-Level Trigger (HLT) system is composed of both hardware and software. Designing the physical layout of the cluster mainly concerns the hardware: node distribution, network layout, estimation of power requirements, definition of the nodes' hardware properties, and so on. These are to a great extent derived from the processing topology used in the software application, which in turn is chosen based on the nature of the data to be processed.
In a project the size of HLT, it is challenging to predict the specifications of the hardware to be bought in the future, while accounting for requirements that may change as the project matures over time. The first part of the thesis is a review and an evaluation of the stages from early design to a fully operational HLT, presented from an instrumentation and software engineering point of view.
Differences between computational science and software engineering became apparent early on. The existing literature on the topic helps explain these observations, and from this understanding, suggestions are made for improvements that could benefit similar projects in the future. Suitable concepts, technology and practices have been identified by researching current trends and looking to other relevant fields of study.
The solutions that could be implemented and evaluated during the course of the thesis work are verified in prototypes. Autonomic computing has been an important inspiration, as have the management specifications from the Distributed Management Task Force (DMTF). A general observation is that much could potentially be learned from software engineering, a field that has been working on large-scale software systems for a long time, although one must be cautious and critical about what is adopted, since not everything will apply. Scientific computing, of course, has its own contributions to the generic computing field. The prototypes are described in the last part of the thesis, together with the acceptance criteria and other results.