Parallel computing attempts to solve many complex problems by using multiple computing resources simultaneously. This review paper is intended to address some of the major operating systems’ design issues for shared memory parallel computers like SMPs. Parallel computers can be classified according to the level at which the architecture supports parallelism, with multi-core and multi-processor computers The paper proceeds by specifying key design issues of operating system: like processes synchronization, memory management, communication, concurrency control, and scheduling in case of shared memory SMPs. It also elaborates some concerns of Linux scheduler, for shared memory SMPs parallel computing. The basic objective of the paper is to provide a quick overview of problems that may arise in designing parallel computing operating system.
SMP, Scheduling, Parallel, Processes, Operating System
Parallel computing is an evolution of serial computing, and it attempts to solve many complex and larger problems, for high performance computing. Multiple computing resources are utilized simultaneously to solve a computational problem. Traditional way to increase performance is to use multiple computing resources, which can execute in parallel to share a load.
A classification first introduced by Flynn  is most common way of categorizing systems with parallel computing capabilities. Flynn’s proposed the below categorization of computer systems:
Single instruction, single data (SISD) stream
Single instruction, multiple data (SIMD) stream
Multiple instruction, single data (MISD) stream
Multiple instructions, multiple data (MIMD) streams
The single-instruction-single-data (SISD) classification is entirely sequential execution. The single-instruction-multiple-data (SIMD) classification does the same process repeatedly over a large data set. Multiple-instruction-single-data (MISD) is a rarely used classification. Multiple-instruction-multiple-data (MIMD) forms the most common type of parallel programs.
Parallel computers can be classified according to the level at which the architecture supports parallelism, with multi-core and multi-processor computers having multiple processing elements within a single machine, while Clusters, MPPs, and Grids use multiple computers to work on the same task.
A multicore processor is a processor that includes multiple execution units. A symmetric multiprocessor (SMP) is a computer system with multiple identical processors that share memory.
A distributed computer is a distributed memory computer system in which the processing elements are connected by a network. A cluster is a group of loosely coupled computers that work together closely, to give illusion of a single computer. A massively parallel processor (MPP) is a single computer with many networked processors distributed computing is the most distributed form of parallel computing. It makes use of computers communicating over the internet to work on a given problem.
Most modern computers, particularly those with graphics processor units (GPUs) employ SIMD instructions and execution units. Currently, the most common type of parallel computers is supercomputers, networked parallel computer clusters and "grids", multi-processor SMP computers, multi-core PCs.
In order to accomplish parallel computing, the system is controlled by operating system, which provides the interaction between processors and processes. Parallel operating system is mainly concerned with managing the resources of parallel machines. It comes across with many challenges; hence the operating system must be compatible.
Parallel computing can be classified in number of ways; one of the most common ways is to classify w.r.t. memory. The memory can be shared or distributed among the processing elements. SMPs fall under shared memory parallel computer architecture, while Clusters and MPPs fall under shared memory parallel computer architecture.
The paper proceeds by describing operating system’s key design issues for scheduling, synchronization, process and memory management for the shared memory symmetric multiprocessor system.
2. DESIGN REQUIREMENTS --- SMP OPERATING SYSTEM
Parallel Operating System\s design is based on parallel computers’ architecture. For example, operating system design requirements for shared memory parallel computers like SMPs may be different from distributed memory parallel computers like Clusters; as message passing is most highlighted requirement is Clusters but not frequently required in SMPs.
In order to design operating system for parallel computing, there are many components which need to be parallelized. There are different aspects to the categorization of parallel computing operating system such as degree of coordination, memory and process management, concurrency and synchronization.
For diversion from serial to parallel computing architecture, operating system also needs some changes as requirements for its design. But there are many issues and problems associated with accomplishment of these requirements. These operating system design issues vary, depending on selected parallel computation architecture.
By going through major operating system requirements, this section addresses operating system design issues which have been specified in the referenced review papers
Synchronization of parallel tasks in real time usually involves waiting by at least one task, and can therefore cause a parallel application's wall clock execution time to increase . There are two categories of processor synchronization: mutual exclusion and producer-consumer . For mutual exclusion, only one of a number of current activities at a time should be allowed to update some shared mutable state. For producer-consumer synchronization, a consumer must wait until the producer has generated required value. Barriers, which synchronize many consumers with many producers, are also typically built using locks on conventional SMPs. Locking schemes cannot form the basis of a general parallel programming model.
Critical sections  are very common in parallel computing. Consider the producer-consumer example (as shown in Figure 1) to illustrate the critical section problem. A process is producing elements which are stored in a shared buffer calling a function store_ element (). The full buffer condition is controlled using a shared variable counter. On other hand, a consumer process obtains elements from this buffer by using a function obtain_element (), that also requires a critical section, therefore decreasing the counter in a unit. Apart from the access to the buffer, the producer-consumer example has another critical section on the access to the counter variable, which is also a shared resource. The code in Figure 1 has race conditions in the access to the buffer when it is either full or empty.
illustration not visible in this excerpt
Figure 1: Producer-Consumer example with race conditions
For parallel computing SMPs, the ready queue of operating systems is a data structure in the kernel that contains all processes that are ready to be executed and await an available processor. Typically processes are queued in a ready queue and are scheduled to the earliest available processor. The scheduler executes an algorithm, which places the running tasks in the ready queue, selects a ready task from the ready queue and starts its execution. Whenever a process is to be removed from a processor, this processor runs the scheduling algorithm to select a new running process. Since the scheduler can be executed by any processor, it must be run as a critical section, as there is main problem with the simultaneous choice of a process  by two separate executions of the scheduler.
1.1 Cost of Communication
Parallel tasks typically need to exchange data. There are several ways this can be done, such as through a shared memory bus, however the actual event of data exchange is commonly referred to as communications. Inter-task communication always implies overhead. Machine cycles and resources that could be used for computation are instead used to package and transmit data. Communications frequently require some type of synchronization between tasks, which can result in tasks spending time "waiting" instead of doing work. Competing communication traffic can saturate the available network bandwidth, further enhancing performance problems.
In parallel computing, granularity  is a qualitative measure of the ratio of computation to communication. Periods of computation are typically separated from periods of communication by synchronization events. In Fine-Grain parallelism: relatively small amounts of computational work are done between communication events. Hence, there is low computation to communication ratio. It implies high communication overhead and less opportunity for performance enhancement. If granularity is too fine it is possible that the overhead required for communications and synchronization between tasks takes longer than the computation.