To enhance performance, common processors feature relaxed memory models
that reorder instructions. However, the correctness of concurrent
programs is often dependent on the preservation of the program order of
certain instructions. Thus, the instruction set architectures offer
memory fences. Using fences is a subtle task with performance and
correctness implications: using too few can compromise correctness and
using too many can hinder performance. Thus, fence insertion algorithms
that given the required program orders can automatically find the
optimum fencing can enhance the ease of programming, reliability, and
performance of concurrent programs. In this paper, we consider the
class of programs with structured branch and loop statements and
present a greedy and polynomial-time optimum fence insertion algorithm.
The algorithm incrementally reduces fence insertion for a control-flow
graph to fence insertion for a set of paths. In addition, we show that
the minimum fence insertion problem with multiple types of fence
instructions is NP-hard even for straight-line programs.