6.11 The Sleeping-Barber Problem. A barbershop consists of a waiting room
with n chairs and a barber room with one barber chair. If there are no
customers to be served, the barber goes to sleep. If a customer enters
the barbershop and all chairs are occupied, then the customer leaves the
shop. If the barber is busy but chairs are available, then the customer sits
in one of the free chairs. If the barber is asleep, the customer wakes up
the barber. Write a program to coordinate the barber and the customers.
Answer: Please refer to the supporting Web site for the source code solution.
Chapter 6 Process Synchronization
6.12 Demonstrate that monitors and semaphores are equivalent insofar as
they can be used to implement the same types of synchronization problems.
Answer: A semaphore can be implemented using the following monitor
code:

monitor semaphore {
    int value = 0;
    condition c;

    void increment() {
        value++;
        c.signal();
    }

    void decrement() {
        while (value == 0)
            c.wait();
        value--;
    }
}
A monitor could be implemented using semaphores in the following
manner. Each condition variable is represented by a queue of threads
waiting for the condition. Each thread has a semaphore associated with
its queue entry. When a thread performs a wait operation, it creates
a new semaphore (initialized to zero), appends the semaphore to the
queue associated with the condition variable, and performs a blocking
semaphore decrement operation on the newly created semaphore. When
a thread performs a signal on a condition variable, the first thread in the
queue is awakened by performing an increment on the corresponding
semaphore.
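The monitor-based semaphore above can be sketched in executable form. The sketch below is a minimal Python rendering that uses threading.Condition to stand in for the monitor's mutual exclusion and condition variable c; the class and method names are my own, not from the text.

```python
import threading

class MonitorSemaphore:
    """Counting semaphore built from a monitor-style lock plus condition variable."""

    def __init__(self, value=0):
        self._cond = threading.Condition()   # plays the role of the monitor lock and c
        self._value = value

    def increment(self):
        with self._cond:
            self._value += 1
            self._cond.notify()              # c.signal()

    def decrement(self):
        with self._cond:
            while self._value == 0:
                self._cond.wait()            # c.wait()
            self._value -= 1
```

A decrement() on a zero-valued semaphore blocks until some other thread calls increment(), matching the pseudocode's while loop around c.wait().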
6.13 Write a bounded-buffer monitor in which the buffers (portions) are embedded
within the monitor itself.
Answer:
monitor bounded_buffer {
    int items[MAX_ITEMS];
    int numItems = 0;
    condition full, empty;

    void produce(int v) {
        while (numItems == MAX_ITEMS)
            full.wait();
        items[numItems++] = v;
        empty.signal();
    }

    int consume() {
        int retVal;
        while (numItems == 0)
            empty.wait();
        retVal = items[--numItems];
        full.signal();
        return retVal;
    }
}
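The bounded-buffer monitor above can be sketched in Python, again using a single threading.Condition in place of the monitor lock and the full/empty condition variables (the capacity constant is an assumption for the sketch):

```python
import threading

MAX_ITEMS = 4   # assumed capacity for this sketch

class BoundedBuffer:
    """Monitor-style bounded buffer; the item storage lives inside the monitor."""

    def __init__(self):
        self._cond = threading.Condition()   # one lock doubles for 'full' and 'empty'
        self._items = []

    def produce(self, v):
        with self._cond:
            while len(self._items) == MAX_ITEMS:
                self._cond.wait()            # full.wait()
            self._items.append(v)
            self._cond.notify_all()          # empty.signal()

    def consume(self):
        with self._cond:
            while not self._items:
                self._cond.wait()            # empty.wait()
            v = self._items.pop()            # LIFO, matching items[--numItems]
            self._cond.notify_all()          # full.signal()
            return v
```

As in the pseudocode, a producer blocks when the buffer is full and a consumer blocks when it is empty.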
6.14 The strict mutual exclusion within a monitor makes the bounded-buffer
monitor of Exercise 6.13 mainly suitable for small portions.
a. Explain why this is true.
b. Design a new scheme that is suitable for larger portions.
Answer: The solution to the bounded-buffer problem given above
copies the produced value into the monitor's local buffer and copies
it back from the monitor's local buffer to the consumer. These copy
operations could be expensive if one were using large extents of memory
for each buffer region. The increased cost of the copy operations means that
the monitor is held for a longer period of time while a process is in the
produce or consume operation, which decreases the overall throughput
of the system. This problem could be alleviated by storing pointers to
buffer regions within the monitor instead of storing the buffer regions
themselves. Consequently, one could modify the code given above to
simply copy a pointer to the buffer region into and out of the monitor's
state. This operation should be relatively inexpensive, and therefore the
period of time that the monitor is held will be much shorter,
thereby increasing the throughput of the monitor.
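The pointer-based scheme can be illustrated in Python, where object references play the role of pointers: enqueueing a large buffer region copies only a reference inside the monitor, never the region's contents. The names below are my own.

```python
import threading

class PointerBuffer:
    """Bounded buffer that stores references to buffer regions, not their contents."""

    def __init__(self, capacity=4):
        self._cond = threading.Condition()
        self._slots = []
        self._capacity = capacity

    def produce(self, region):
        with self._cond:
            while len(self._slots) == self._capacity:
                self._cond.wait()
            self._slots.append(region)   # copies only a reference: O(1), not O(size)
            self._cond.notify_all()

    def consume(self):
        with self._cond:
            while not self._slots:
                self._cond.wait()
            region = self._slots.pop(0)
            self._cond.notify_all()
            return region
```

Because only the reference crosses the monitor, the consumer receives the very object the producer created, and the time the monitor is held is independent of the region's size.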
6.15 Discuss the tradeoff between fairness and throughput of operations in
the readers-writers problem. Propose a method for solving the readers-writers
problem without causing starvation.
Answer: Throughput in the readers-writers problem is increased by
favoring multiple readers as opposed to allowing a single writer to
exclusively access the shared values. On the other hand, favoring readers
could result in starvation for writers. Starvation in the readers-writers
problem could be avoided by keeping timestamps associated
with waiting processes. When a writer finishes its task, it wakes
up the process that has been waiting for the longest duration.
When a reader arrives and notices that another reader is accessing the
database, it enters the critical section only if there are no
waiting writers. These restrictions guarantee fairness.
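The timestamp idea above can be sketched as a starvation-free readers-writers lock. In this minimal Python sketch, monotonically increasing arrival tickets stand in for the answer's timestamps; waiters are served in arrival order, so neither readers nor writers starve. All names are mine.

```python
import threading
from collections import deque

class FairRWLock:
    """Readers-writers lock that services waiters in arrival order."""

    def __init__(self):
        self._cond = threading.Condition()
        self._readers = 0          # active readers
        self._writing = False      # a writer is active
        self._queue = deque()      # (kind, ticket) of still-waiting processes
        self._ticket = 0

    def acquire_read(self):
        with self._cond:
            self._ticket += 1
            me = ('r', self._ticket)
            self._queue.append(me)
            # a reader waits while a writer is active or an older writer waits
            while self._writing or any(
                    k == 'w' and t < me[1] for (k, t) in self._queue):
                self._cond.wait()
            self._queue.remove(me)
            self._readers += 1
            self._cond.notify_all()

    def release_read(self):
        with self._cond:
            self._readers -= 1
            self._cond.notify_all()

    def acquire_write(self):
        with self._cond:
            self._ticket += 1
            me = ('w', self._ticket)
            self._queue.append(me)
            # a writer waits until it is the oldest waiter and the lock is free
            while self._writing or self._readers > 0 or self._queue[0] != me:
                self._cond.wait()
            self._queue.popleft()
            self._writing = True

    def release_write(self):
        with self._cond:
            self._writing = False
            self._cond.notify_all()
```

Consecutive readers that arrive before any writer still overlap (throughput), but a reader arriving after a waiting writer lets the writer go first (fairness).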
6.16 How does the signal() operation associated with monitors differ from
the corresponding operation defined for semaphores?
Answer: The signal() operation associated with monitors is not
persistent in the following sense: if a signal is performed and there are
no waiting threads, then the signal is simply ignored and the system does
not remember that the signal took place. If a subsequent wait
operation is performed, then the corresponding thread simply blocks. With
semaphores, on the other hand, every signal results in a corresponding
increment of the semaphore value even if there are no waiting threads. A
future wait operation would immediately succeed because of the earlier
increment.
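This difference can be demonstrated directly with Python's threading primitives, where Semaphore.release() is persistent but Condition.notify() with no waiter is lost:

```python
import threading

# A semaphore "signal" (release) with no waiter is remembered:
sem = threading.Semaphore(0)
sem.release()                         # increments the count even though nobody waits
sem_ok = sem.acquire(blocking=False)  # succeeds because of the earlier release

# A monitor signal (notify) with no waiter is simply lost:
cond = threading.Condition()
with cond:
    cond.notify()                     # no thread is waiting; nothing is remembered
with cond:
    woke = cond.wait(timeout=0.2)     # times out: the earlier notify is gone
```

Here sem_ok is True (the earlier release was remembered) while woke is False (wait timed out because the earlier notify left no trace).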
6.17 Suppose the signal() statement can appear only as the last statement
in a monitor procedure. Suggest how the implementation described in
Section 6.7 can be simplified.
Answer: If the signal operation were the last statement, then the lock
could be transferred from the signalling process to the process that is the
recipient of the signal. Otherwise, the signalling process would have to
explicitly release the lock and the recipient of the signal would have to
compete with all other processes to obtain the lock to make progress.
6.18 Consider a system consisting of processes P1, P2, ..., Pn, each of which has
a unique priority number. Write a monitor that allocates three identical
line printers to these processes, using the priority numbers for deciding
the order of allocation.
Answer: Here is the pseudocode:

monitor printers {
    int num_avail = 3;
    int waiting_processes[MAX_PROCS];
    int num_waiting;
    condition c;

    void request_printer(int proc_number) {
        if (num_avail > 0) {
            num_avail--;
            return;
        }
        waiting_processes[num_waiting] = proc_number;
        num_waiting++;
        sort(waiting_processes);
        while (num_avail == 0 ||
               waiting_processes[0] != proc_number)
            c.wait();
        waiting_processes[0] = waiting_processes[num_waiting-1];
        num_waiting--;
        sort(waiting_processes);
        num_avail--;
    }

    void release_printer() {
        num_avail++;
        c.broadcast();
    }
}
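A Python sketch of the same allocator follows, using a heap in place of the sorted array; it assumes, as the pseudocode appears to, that a lower priority number means higher priority. Names are my own.

```python
import heapq
import threading

class PrinterMonitor:
    """Allocates identical printers to the highest-priority (lowest-numbered) waiter."""

    def __init__(self, num_printers=3):
        self._cond = threading.Condition()
        self._avail = num_printers
        self._waiting = []          # min-heap of waiting priority numbers

    def request(self, priority):
        with self._cond:
            if self._avail > 0 and not self._waiting:
                self._avail -= 1
                return
            heapq.heappush(self._waiting, priority)
            # proceed only when a printer is free and we are the best waiter
            while self._avail == 0 or self._waiting[0] != priority:
                self._cond.wait()
            heapq.heappop(self._waiting)
            self._avail -= 1

    def release(self):
        with self._cond:
            self._avail += 1
            self._cond.notify_all()
```

The broadcast on release wakes every waiter, but only the one at the top of the heap passes its loop condition, mirroring the pseudocode.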
6.19 A file is to be shared among different processes, each of which has
a unique number. The file can be accessed simultaneously by several
processes, subject to the following constraint: The sum of all unique
numbers associated with all the processes currently accessing the file
must be less than n. Write a monitor to coordinate access to the file.
Answer: The pseudocode is as follows:

monitor file_access {
    int curr_sum = 0;
    int n;
    condition c;

    void access_file(int my_num) {
        while (curr_sum + my_num >= n)
            c.wait();
        curr_sum += my_num;
    }

    void finish_access(int my_num) {
        curr_sum -= my_num;
        c.broadcast();
    }
}
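The same monitor rendered as a Python sketch (n is supplied at construction; names are mine):

```python
import threading

class FileAccessMonitor:
    """Grants access while the sum of accessors' unique numbers stays below n."""

    def __init__(self, n):
        self._cond = threading.Condition()
        self._n = n
        self._curr_sum = 0

    def access_file(self, my_num):
        with self._cond:
            while self._curr_sum + my_num >= self._n:
                self._cond.wait()      # c.wait()
            self._curr_sum += my_num

    def finish_access(self, my_num):
        with self._cond:
            self._curr_sum -= my_num
            self._cond.notify_all()    # c.broadcast()
```

The broadcast is needed because, after one process finishes, any of the waiters (not just one) might now fit under the bound.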
6.20 When a signal is performed on a condition inside a monitor, the signaling
process can either continue its execution or transfer control to the process
that is signaled. How would the solution to the preceding exercise differ
with the two different ways in which signaling can be performed?
Answer: The solution to the previous exercise is correct under both
situations. However, it could suffer from the problem that a process
might be awakened only to find that it is still not possible for it to make
forward progress, either because there was not sufficient slack to begin
with when it was awakened or because an intervening process got
control, obtained the monitor, and started accessing the file. Also, note that
the broadcast operation wakes up all of the waiting processes. If the
signal also transfers control and the monitor from the current thread to
the target, then one could check whether the target would indeed be
able to make forward progress and perform the signal only if it were
possible. Then the while loop for the waiting thread could be replaced
by an if condition, since it is guaranteed that the condition will be satisfied
when the process is woken up.
6.21 Suppose we replace the wait() and signal() operations of monitors
with a single construct await(B), where B is a general Boolean expression
that causes the process executing it to wait until B becomes true.
a. Write a monitor using this scheme to implement the readers-writers
problem.
b. Explain why, in general, this construct cannot be implemented
efficiently.
c. What restrictions need to be put on the await statement so that
it can be implemented efficiently? (Hint: Restrict the generality of
B; see Kessels [1977].)
Answer:
a. The readers-writers problem could be solved with the following
more general await statements: A reader performs
await(active_writers == 0 && waiting_writers == 0) to check that
there are no active writers and no waiting writers before it
enters the critical section. A writer performs
await(active_writers == 0 && active_readers == 0) to ensure
mutually exclusive access.
b. The system would have to determine which of the waiting threads
should be awakened by checking which of their waiting
conditions are satisfied after a signal. This requires considerable
complexity and might also require interaction with the
compiler to evaluate the conditions at different points in time.
c. One could restrict the Boolean condition to be a disjunction of
conjunctions, with each component being a simple check (equality
or inequality with respect to a static value) on a program variable.
In that case, the Boolean condition could be communicated to the
runtime system, which could perform the check every time it needs
to determine which thread should be awakened.
6.22 Write a monitor that implements an alarm clock that enables a calling program
to delay itself for a specified number of time units (ticks). You may
assume the existence of a real hardware clock that invokes a procedure
tick in your monitor at regular intervals.
Answer: Here is pseudocode for implementing this:

monitor alarm {
    condition c;

    void delay(int ticks) {
        int begin_time = read_clock();
        while (read_clock() < begin_time + ticks)
            c.wait();
    }

    void tick() {
        c.broadcast();
    }
}
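A Python sketch of the alarm monitor, keeping the tick count inside the monitor instead of calling an external clock (names are mine):

```python
import threading

class AlarmMonitor:
    """Alarm clock monitor: delay(ticks) blocks until tick() has been called
    that many more times."""

    def __init__(self):
        self._cond = threading.Condition()
        self._now = 0                      # monitor-internal clock

    def delay(self, ticks):
        with self._cond:
            wake_at = self._now + ticks    # begin_time + ticks
            while self._now < wake_at:
                self._cond.wait()          # c.wait()

    def tick(self):                        # invoked by the hardware clock
        with self._cond:
            self._now += 1
            self._cond.notify_all()        # c.broadcast()
```

Each tick broadcasts so that every sleeper can recheck whether its deadline has passed, exactly as in the pseudocode.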
6.23 Why do Solaris, Linux, and Windows 2000 use spinlocks as a synchronization
mechanism only on multiprocessor systems and not on single-processor
systems?
Answer: Solaris, Linux, and Windows 2000 use spinlocks as a synchronization
mechanism only on multiprocessor systems. Spinlocks are
not appropriate for single-processor systems because the condition that
would break a process out of the spinlock can be obtained only by
executing a different process. If the process is not relinquishing the processor,
other processes do not get the opportunity to set the program
condition required for the first process to make progress. In a multiprocessor
system, other processes execute on other processors and can thereby
modify the program state to release the first process from the
spinlock.
6.24 In log-based systems that provide support for transactions, updates to
data items cannot be performed before the corresponding entries are
logged. Why is this restriction necessary?
Answer: If the transaction needs to be aborted, then the updated
data items must be rolled back to their old values. This
requires the old values of the data items to be logged before the updates
are performed.
6.25 Show that the two-phase locking protocol ensures conflict serializability.
Answer: A schedule refers to the execution sequence of the operations
of one or more transactions. A serial schedule is one in which
each transaction of the schedule is performed atomically. If a schedule
consists of two different transactions where consecutive operations from
the different transactions access the same data and at least one of the
operations is a write, then we have what is known as a conflict. If a
schedule can be transformed into a serial schedule by a series of swaps
of nonconflicting operations, we say that such a schedule is conflict
serializable.
The two-phase locking protocol ensures conflict serializability because
exclusive locks (which are used for write operations) must be acquired
serially, without releasing any locks during the acquire (growing) phase.
Other transactions that wish to acquire the same locks must wait for
the first transaction to begin releasing locks. By requiring that all locks
be acquired before any locks are released, we ensure that
potential conflicts are avoided.
6.26 What are the implications of assigning a new timestamp to a transaction
that is rolled back? How does the system process transactions that were
issued after the rolled-back transaction but that have timestamps smaller
than the new timestamp of the rolled-back transaction?
Answer: If the transactions that were issued after the rolled-back transaction
had accessed variables that were updated by the rolled-back transaction,
then these transactions would have to be rolled back as well. If they
have not performed such operations (that is, there is no overlap with the
rolled-back transaction in terms of the variables accessed), then these
transactions are free to commit when appropriate.
6.27 Assume that a finite number of resources of a single resource type must
be managed. Processes may ask for a number of these resources and,
once finished, will return them. As an example, many commercial
software packages provide a given number of licenses, indicating the
number of applications that may run concurrently. When an application
is started, the license count is decremented. When the application is
terminated, the license count is incremented. If all licenses are in use,
requests to start the application are denied. Such requests will only be
granted when an existing license holder terminates the application and
a license is returned.
The following program segment is used to manage a finite number of
instances of an available resource. The maximum number of resources
and the number of available resources are declared as follows:

#define MAX_RESOURCES 5
int available_resources = MAX_RESOURCES;

When a process wishes to obtain a number of resources, it invokes the
decrease_count() function:

/* decrease available_resources by count resources */
/* return 0 if sufficient resources available, */
/* otherwise return -1 */
int decrease_count(int count) {
    if (available_resources < count)
        return -1;
    else {
        available_resources -= count;
        return 0;
    }
}

When a process wants to return a number of resources, it calls the
increase_count() function:

/* increase available_resources by count */
int increase_count(int count) {
    available_resources += count;
    return 0;
}
The preceding program segment produces a race condition. Do the following:
a. Identify the data involved in the race condition.
b. Identify the location (or locations) in the code where the race
condition occurs.
c. Using a semaphore, fix the race condition.
Answer:
a. Identify the data involved in the race condition: the variable
available_resources.
b. Identify the location (or locations) in the code where the race
condition occurs: the code that decrements available_resources and
the code that increments available_resources are the statements that
could be involved in race conditions.
c. Using a semaphore, fix the race condition: use a semaphore to
guard available_resources, performing a semaphore wait before each
read-and-update of the variable and a semaphore signal afterward, so
that the check and the update execute atomically.
6.28 The decrease_count() function in the previous exercise currently returns
0 if sufficient resources are available and -1 otherwise. This leads
to awkward programming for a process that wishes to obtain a number of
resources:

while (decrease_count(count) == -1)
    ;

Rewrite the resource-manager code segment using a monitor and condition
variables so that the decrease_count() function suspends the
process until sufficient resources are available. This will allow a process
to invoke decrease_count() by simply calling

decrease_count(count);

The process will only return from this function call when sufficient
resources are available.
Answer:

monitor resources
{
    int available_resources = MAX_RESOURCES;
    condition resources_avail;

    void decrease_count(int count)
    {
        while (available_resources < count)
            resources_avail.wait();
        available_resources -= count;
    }

    void increase_count(int count)
    {
        available_resources += count;
        resources_avail.signal();
    }
}
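The monitor can be sketched in Python with threading.Condition standing in for the monitor lock and resources_avail (names are mine; the pool size is an assumption):

```python
import threading

MAX_RESOURCES = 5   # assumed initial pool size

class ResourceMonitor:
    """Monitor version: decrease_count() blocks until enough resources exist."""

    def __init__(self, available=MAX_RESOURCES):
        self._cond = threading.Condition()
        self._available = available

    def decrease_count(self, count):
        with self._cond:
            while self._available < count:
                self._cond.wait()          # resources_avail.wait()
            self._available -= count

    def increase_count(self, count):
        with self._cond:
            self._available += count
            self._cond.notify_all()        # resources_avail.signal()
```

A caller of decrease_count() simply blocks instead of spinning on a -1 return value.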
Chapter 7
Deadlocks
Deadlock is a problem that can arise only in a system with multiple active
asynchronous processes. It is important that students learn the three basic
approaches to deadlock: prevention, avoidance, and detection (although the
terms prevention and avoidance are easy to confuse).
It can be useful to pose a deadlock problem in human terms and ask why
human systems never deadlock. Can the students transfer this understanding
of human systems to computer systems?
Projects can involve simulation: create a list of jobs consisting of requests
and releases of resources (single type or multiple types). Ask the students to allocate
the resources to prevent deadlock. This basically involves programming
the Banker's Algorithm.
The survey paper by Coffman, Elphick, and Shoshani [1971] is good supplemental
reading, but you might also consider having the students go back to
the papers by Havender [1968], Habermann [1969], and Holt [1971a]. The last
two were published in CACM and so should be readily available.
Exercises
7.1 Consider the traffic deadlock depicted in Figure 7.1.
a. Show that the four necessary conditions for deadlock indeed hold
in this example.
b. State a simple rule for avoiding deadlocks in this system.
Figure 7.1 Traffic deadlock for Exercise 7.1.
Answer:
a. The four necessary conditions for a deadlock are (1) mutual exclusion;
(2) hold-and-wait; (3) no preemption; and (4) circular wait.
The mutual exclusion condition holds because only one car can occupy
a space in the roadway. Hold-and-wait occurs because a car holds
its place in the roadway while it waits to advance.
A car cannot be removed (i.e., preempted) from its
position in the roadway. Lastly, there is indeed a circular wait, as
each car is waiting for a subsequent car to advance. The circular-wait
condition is also easily observed from the graphic.
b. A simple rule that would avoid this traffic deadlock is that a car
may not advance into an intersection if it is clear that it will not be
able to immediately clear the intersection.
7.2 Consider the deadlock situation that could occur in the dining-philosophers
problem when the philosophers obtain the chopsticks one at a time. Discuss
how the four necessary conditions for deadlock indeed hold in this
setting. Discuss how deadlocks could be avoided by eliminating any one
of the four conditions.
Answer: Deadlock is possible because the four necessary conditions
hold in the following manner: 1) mutual exclusion is required for chopsticks,
2) the philosophers tend to hold onto the chopstick in hand while
they wait for the other chopstick, 3) there is no preemption of chopsticks
in the sense that a chopstick allocated to a philosopher cannot be forcibly
taken away, and 4) there is a possibility of circular wait. Deadlocks could
be avoided by overcoming the conditions in the following manner: 1)
allow simultaneous sharing of chopsticks, 2) have the philosophers relinquish
the first chopstick if they are unable to obtain the other chopstick,
3) allow for chopsticks to be forcibly taken away if a philosopher has
had a chopstick for a long period of time, and 4) enforce a numbering of
the chopsticks and always obtain the lower-numbered chopstick before
obtaining the higher-numbered one.
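Option 4, resource ordering, is the easiest to demonstrate in code. The sketch below runs five philosophers who always lock the lower-numbered chopstick first, which breaks the circular-wait condition; with naive left-then-right acquisition, the same loop could deadlock. Names and iteration counts are mine.

```python
import threading

NUM_PHILOSOPHERS = 5
MEALS = 20                      # iterations per philosopher in this sketch
chopsticks = [threading.Lock() for _ in range(NUM_PHILOSOPHERS)]

def philosopher(i, finished):
    left, right = i, (i + 1) % NUM_PHILOSOPHERS
    # break condition 4: always take the lower-numbered chopstick first
    first, second = min(left, right), max(left, right)
    for _ in range(MEALS):
        with chopsticks[first]:
            with chopsticks[second]:
                pass            # eat
    finished.append(i)
```

Philosopher 4 is the asymmetric one: its pair is (4, 0), so it reaches for chopstick 0 first, and no cycle of hold-and-wait can form.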
7.3 A possible solution for preventing deadlocks is to have a single, higher-order
resource that must be requested before any other resource. For
example, if multiple threads attempt to access the synchronization objects
A ... E, deadlock is possible. (Such synchronization objects may
include mutexes, semaphores, condition variables, etc.) We can prevent
the deadlock by adding a sixth object F. Whenever a thread wants to acquire
the synchronization lock for any object A ... E, it must first acquire
the lock for object F. This solution is known as containment: the locks
for objects A ... E are contained within the lock for object F. Compare
this scheme with the circular-wait scheme of Section 7.4.4.
Answer: This is probably not a good solution because it yields too large
a scope. It is better to define a locking policy with as narrow a scope as
possible.
7.4 Compare the circular-wait scheme with the deadlock-avoidance schemes
(like the banker's algorithm) with respect to the following issues:
a. Runtime overheads
b. System throughput
Answer: A deadlock-avoidance scheme tends to increase the runtime
overhead due to the cost of keeping track of the current resource allocation.
However, a deadlock-avoidance scheme allows for more concurrent use
of resources than schemes that statically prevent the formation of deadlock.
In that sense, a deadlock-avoidance scheme could increase system
throughput.
7.5 In a real computer system, neither the resources available nor the demands
of processes for resources are consistent over long periods (months).
Resources break or are replaced, new processes come and go, and new resources
are bought and added to the system. If deadlock is controlled
by the banker's algorithm, which of the following changes can be made
safely (without introducing the possibility of deadlock), and under what
circumstances?
a. Increase Available (new resources added).
b. Decrease Available (resource permanently removed from system).
c. Increase Max for one process (the process needs more resources
than allowed; it may want more).
d. Decrease Max for one process (the process decides it does not need
that many resources).
e. Increase the number of processes.
f. Decrease the number of processes.
Answer:
a. Increase Available (new resources added): this can safely be
changed without any problems.
b. Decrease Available (resource permanently removed from system):
this could have an effect on the system and introduce the possibility
of deadlock, as the safety of the system assumed there were
a certain number of available resources.
c. Increase Max for one process (the process needs more resources
than allowed; it may want more): this could have an effect on
the system and introduce the possibility of deadlock.
d. Decrease Max for one process (the process decides it does not need
that many resources): this can safely be changed without any
problems.
e. Increase the number of processes: this could be allowed assuming
that resources are allocated to the new process(es) such that
the system does not enter an unsafe state.
f. Decrease the number of processes: this can safely be changed
without any problems.
7.6 Consider a system consisting of four resources of the same type that are
shared by three processes, each of which needs at most two resources.
Show that the system is deadlock-free.
Answer: Suppose the system is deadlocked. This implies that each
process is holding at least one resource and is waiting for one more. Since
there are three processes and four resources, at least one process must be
holding two resources. That process requires no more resources and
therefore will return its resources when done, contradicting the
assumption of deadlock.
7.7 Consider a system consisting of m resources of the same type being
shared by n processes. Resources can be requested and released by processes
only one at a time. Show that the system is deadlock free if the
following two conditions hold:
a. The maximum need of each process is between 1 and m resources.
b. The sum of all maximum needs is less than m + n.
Answer: Using the terminology of Section 7.6.2, we have:
a. Sum_{i=1}^{n} Max_i < m + n
b. Max_i >= 1 for all i
Proof: Need_i = Max_i - Allocation_i
If there exists a deadlock state, then:
c. Sum_{i=1}^{n} Allocation_i = m
Use a. to get: Sum_i Need_i + Sum_i Allocation_i = Sum_i Max_i < m + n
Use c. to get: Sum_i Need_i + m < m + n
Rewrite to get: Sum_{i=1}^{n} Need_i < n
This implies that there exists a process P_i such that Need_i = 0. Since
Max_i >= 1, P_i holds at least one resource that it can release.
Hence the system cannot be in a deadlock state.
7.8 Consider the dining-philosophers problem where the chopsticks are
placed at the center of the table and any two of them could be used
by a philosopher. Assume that requests for chopsticks are made one
at a time. Describe a simple rule for determining whether a particular
request could be satisfied without causing deadlock given the current
allocation of chopsticks to philosophers.
Answer: The following rule prevents deadlock: when a philosopher
makes a request for a first chopstick, deny the request only
if no other philosopher currently holds two chopsticks and only
one chopstick remains. (Granting the last chopstick in that situation
could leave every philosopher holding a single chopstick, with no one
able to proceed.)
7.9 Consider the same setting as the previous problem. Assume now that
each philosopher requires three chopsticks to eat and that resource requests
are still issued separately. Describe some simple rules for determining
whether a particular request could be satisfied without causing
deadlock given the current allocation of chopsticks to philosophers.
Answer: When a philosopher makes a request for a chopstick, grant
the request if: 1) the philosopher already has two chopsticks and there is at
least one chopstick remaining, 2) the philosopher has one chopstick and there
are at least two chopsticks remaining, 3) there is at least one chopstick
remaining and at least one philosopher already has three chopsticks, or 4)
the philosopher has no chopsticks, there are at least two chopsticks remaining,
and there is at least one other philosopher with two chopsticks assigned.
7.10 We can obtain the banker's algorithm for a single resource type from
the general banker's algorithm simply by reducing the dimensionality
of the various arrays by 1. Show through an example that the multiple-resource-type
banker's scheme cannot be implemented by individual
application of the single-resource-type scheme to each resource type.
Answer: Consider a system with resources A, B, and C and processes
P0, P1, P2, P3, and P4 with the following values of Allocation:
Allocation
A B C
P0 0 1 0
P1 3 0 2
P2 3 0 2
P3 2 1 1
P4 0 0 2
And the following value of Need:
Need
A B C
P0 7 4 3
P1 0 2 0
P2 6 0 0
P3 0 1 1
P4 4 3 1
If the value of Available is (2 3 0), we can see that a request from process
P0 for (0 2 0) cannot be satisfied, as this lowers Available to (2 1 0) and no
process could safely finish.
However, if we were to treat the three resources as three single-resource
types and apply the banker's algorithm to each, we get the following:
For resource A (which has 2 available),
Allocated Need
P0 0 7
P1 3 0
P2 3 6
P3 2 0
P4 0 4
Processes could safely finish in the order of P1, P3, P4, P2, P0.
For resource B (for which we now have 1 available, as 2 were assumed
assigned to process P0),
Allocated Need
P0 3 2
P1 0 2
P2 0 0
P3 1 1
P4 0 3
Processes could safely finish in the order of P2, P3, P1, P0, P4.
And finally, for resource C (which has 0 available),
Allocated Need
P0 0 3
P1 2 0
P2 2 0
P3 1 1
P4 2 1
Processes could safely finish in the order of P1, P2, P0, P3, P4.
As we can see, if we use the banker's algorithm for multiple resource
types, the request for resources (0 2 0) from process P0 is denied, as it
leaves the system in an unsafe state. However, if we consider the banker's
algorithm for the three separate resources, where we use a single resource
type, the request is granted. Therefore, if we have multiple resource
types, we must use the banker's algorithm for multiple resource types.
7.11 Consider the following snapshot of a system:
Allocation Max Available
A B C D A B C D A B C D
P0 0 0 1 2 0 0 1 2 1 5 2 0
P1 1 0 0 0 1 7 5 0
P2 1 3 5 4 2 3 5 6
P3 0 6 3 2 0 6 5 2
P4 0 0 1 4 0 6 5 6
7.12 Answer the following questions using the banker's algorithm:
a. What is the content of the matrix Need?
b. Is the system in a safe state?
c. If a request from process P1 arrives for (0,4,2,0), can the request
be granted immediately?
Answer:
a. What is the content of the matrix Need? The values of Need for
processes P0 through P4, respectively, are (0, 0, 0, 0), (0, 7, 5, 0),
(1, 0, 0, 2), (0, 0, 2, 0), and (0, 6, 4, 2).
b. Is the system in a safe state? Yes. With Available equal to (1,
5, 2, 0), either process P0 or P3 could run. Once process P3 runs, it
releases its resources, which allows all other existing processes to
run.
c. If a request from process P1 arrives for (0,4,2,0), can the request
be granted immediately? Yes, it can. This results in the value of
Available being (1, 1, 0, 0). One ordering of processes that can finish
is P0, P2, P3, P1, and P4.
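Parts b and c can be checked mechanically with the banker's safety algorithm. Below is a straightforward Python sketch of the safety check applied to the snapshot above (function name is mine):

```python
def is_safe(available, allocation, need):
    """Banker's safety check: True if some ordering lets every process finish."""
    work = list(available)
    finish = [False] * len(allocation)
    progressed = True
    while progressed:
        progressed = False
        for i, (alloc, nd) in enumerate(zip(allocation, need)):
            # Pi can finish if its remaining need fits in the available work vector
            if not finish[i] and all(n <= w for n, w in zip(nd, work)):
                work = [w + a for w, a in zip(work, alloc)]   # Pi releases everything
                finish[i] = True
                progressed = True
    return all(finish)
```

Running it on the exercise's Allocation and Need matrices confirms that the initial state is safe and remains safe after P1's request (0,4,2,0) is granted.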
7.13 What is the optimistic assumption made in the deadlock-detection algorithm?
How could this assumption be violated?
Answer: The optimistic assumption is that there will not be any form
of circular wait in terms of resources allocated and processes making
requests for them. This assumption could be violated if a circular wait
does indeed occur in practice.
7.14 Write a multithreaded Java program that implements the banker's algorithm
discussed in Section 7.5.3. Create n threads that request and
release resources from the banker. A banker will only grant the request
if it leaves the system in a safe state. Ensure that access to shared data is
thread-safe by employing Java thread synchronization as discussed in
Section 7.8.
Answer: (Please refer to the supporting Web site for the source code solution.)
7.15 A single-lane bridge connects the two Vermont villages of North Tunbridge
and South Tunbridge. Farmers in the two villages use this bridge
to deliver their produce to the neighboring town. The bridge can become
deadlocked if both a northbound and a southbound farmer get on
the bridge at the same time (Vermont farmers are stubborn and are unable
to back up). Using semaphores, design an algorithm that prevents
deadlock. Initially, do not be concerned about starvation (the situation
in which northbound farmers prevent southbound farmers from using
the bridge, or vice versa).
Answer:

semaphore ok_to_cross = 1;

void enter_bridge() {
    ok_to_cross.wait();
}

void exit_bridge() {
    ok_to_cross.signal();
}
7.16 Modify your solution to Exercise 7.15 so that it is starvation-free.
Answer:

monitor bridge {
    int num_waiting_north = 0;
    int num_waiting_south = 0;
    int on_bridge = 0;
    condition ok_to_cross;
    int prev = 0;

    void enter_bridge_north() {
        num_waiting_north++;
        while (on_bridge ||
               (prev == 0 && num_waiting_south > 0))
            ok_to_cross.wait();
        num_waiting_north--;
        on_bridge = 1;   /* mark the bridge as occupied */
        prev = 0;
    }

    void exit_bridge_north() {
        on_bridge = 0;
        ok_to_cross.broadcast();
    }

    void enter_bridge_south() {
        num_waiting_south++;
        while (on_bridge ||
               (prev == 1 && num_waiting_north > 0))
            ok_to_cross.wait();
        num_waiting_south--;
        on_bridge = 1;   /* mark the bridge as occupied */
        prev = 1;
    }

    void exit_bridge_south() {
        on_bridge = 0;
        ok_to_cross.broadcast();
    }
}
Chapter 8
Memory Management
Although many systems are demand paged (discussed in Chapter 10), there
are still many that are not, and in many cases the simpler memory-management
strategies may be better, especially for small dedicated systems. We want the
student to learn about all of them: resident monitor, swapping, partitions,
paging, and segmentation.
Exercises
8.1 Explain the difference between internal and external fragmentation.
Answer: Internal fragmentation is the area within a region or a page that
is not used by the job occupying that region or page. This space is
unavailable for use by the system until the job is finished and the page
or region is released. External fragmentation, by contrast, is free memory
scattered between allocated regions: the total free memory may be
sufficient to satisfy a request, but it is split into holes that are individually
too small to be used.
8.2 Consider the following process for generating binaries. A compiler is
used to generate the object code for individual modules, and a linkage
editor is used to combine multiple object modules into a single program
binary. How does the linkage editor change the binding of instructions
and data to memory addresses? What information needs to be passed
from the compiler to the linkage editor to facilitate the memory-binding
tasks of the linkage editor?
Answer: The linkage editor has to replace unresolved symbolic addresses
with the actual addresses associated with the variables in the
final program binary. In order to perform this, the modules should keep
track of instructions that refer to unresolved symbols. During linking,
each module is assigned a sequence of addresses in the overall program
binary, and when this has been performed, unresolved references to symbols
exported by this module can be patched in other modules, since
every module contains the list of instructions that need to
be patched.
8.3 Given five memory partitions of 100 KB, 500 KB, 200 KB, 300 KB, and
600 KB (in order), how would each of the first-fit, best-fit, and worst-fit
algorithms place processes of 212 KB, 417 KB, 112 KB, and 426 KB (in
order)? Which algorithm makes the most efficient use of memory?
Answer:
a. First-fit:
212K is put in the 500K partition.
417K is put in the 600K partition.
112K is put in the 288K partition (the new 288K = 500K − 212K hole).
426K must wait.
b. Best-fit:
212K is put in the 300K partition.
417K is put in the 500K partition.
112K is put in the 200K partition.
426K is put in the 600K partition.
c. Worst-fit:
212K is put in the 600K partition.
417K is put in the 500K partition.
112K is put in the 388K partition (600K − 212K).
426K must wait.
In this example, best-fit turns out to be the best: it is the only algorithm
that places all four processes.
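The three placements above can be reproduced with a small simulation. The following Python sketch is illustrative (the `allocate` helper and its hole-shrinking convention are our own); it records, for each request, the original size of the hole chosen, or None if the request must wait:

```python
def allocate(partitions, requests, choose):
    """Simulate contiguous allocation: `partitions` lists free hole sizes
    in KB, and each satisfied request shrinks its hole in place."""
    placements = []
    for req in requests:
        candidates = [i for i, size in enumerate(partitions) if size >= req]
        if not candidates:
            placements.append(None)           # the request must wait
            continue
        i = choose(candidates, partitions)
        placements.append(partitions[i])      # original size of the chosen hole
        partitions[i] -= req                  # leftover becomes a smaller hole
    return placements

first_fit = lambda c, p: c[0]                        # first hole big enough
best_fit = lambda c, p: min(c, key=lambda i: p[i])   # tightest hole
worst_fit = lambda c, p: max(c, key=lambda i: p[i])  # largest hole

reqs = [212, 417, 112, 426]
assert allocate([100, 500, 200, 300, 600], reqs, first_fit) == [500, 600, 288, None]
assert allocate([100, 500, 200, 300, 600], reqs, best_fit) == [300, 500, 200, 600]
assert allocate([100, 500, 200, 300, 600], reqs, worst_fit) == [600, 500, 388, None]
```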
8.4 Most systems allow programs to allocate more memory to their address
space during execution. Data allocated in the heap segments of programs
is an example of such allocated memory. What is required to support
dynamic memory allocation in the following schemes?
a. contiguous-memory allocation
b. pure segmentation
c. pure paging
Answer:
a. Contiguous-memory allocation: might require relocation of the
entire program, since there may not be enough adjacent free space
for the program to grow its allocated memory in place.
b. Pure segmentation: might also require relocation of the segment
that needs to be extended, since there may not be enough adjacent
free space for the segment to grow in place.
c. Pure paging: incremental allocation of new pages is possible in this
scheme without requiring relocation of the program's address space.
8.5 Compare the main-memory organization schemes of contiguous-memory
allocation, pure segmentation, and pure paging with respect to the following
issues:
a. external fragmentation
b. internal fragmentation
c. ability to share code across processes
Answer: The contiguous-memory allocation scheme suffers from external
fragmentation, as address spaces are allocated contiguously and holes
develop as old processes die and new processes are initiated. It also
does not allow processes to share code, since a process's virtual memory
segment is not broken into noncontiguous fine-grained segments. Pure
segmentation also suffers from external fragmentation, as a segment of a
process is laid out contiguously in physical memory and fragmentation
occurs as segments of dead processes are replaced by segments of
new processes. Segmentation, however, enables processes to share code;
for instance, two different processes could share a code segment but have
distinct data segments. Pure paging does not suffer from external fragmentation,
but instead suffers from internal fragmentation: processes are
allocated at page granularity, and if a page is not completely utilized, the
remainder of the page is wasted. Paging also enables processes to share
code at the granularity of pages.
8.6 On a system with paging, a process cannot access memory that it does
not own. Why? How could the operating system allow access to other
memory? Why should it or should it not?
Answer: An address on a paging system is a logical page number and
an offset. The physical page is found by searching a table based on the
logical page number to produce a physical page number. Because the
operating system controls the contents of this table, it can limit a process
to accessing only those physical pages allocated to the process. There is
no way for a process to refer to a page it does not own, because the page
will not be in the page table. To allow such access, an operating system
simply needs to allow entries for non-process memory to be added to
the process's page table. This is useful when two or more processes
need to exchange data: they just read and write to the same physical
addresses (which may be at varying logical addresses). This makes for
very efficient interprocess communication.
8.7 Compare paging with segmentation with respect to the amount of memory
required by the address translation structures in order to convert
virtual addresses to physical addresses.
Answer: Paging requires more memory overhead to maintain the translation
structures. Segmentation requires just two registers per segment:
one to maintain the base of the segment and the other to maintain the
extent of the segment. Paging, on the other hand, requires one entry per
page, and this entry provides the physical address at which the page is
located.
8.8 Program binaries in many systems are typically structured as follows.
Code is stored starting with a small fixed virtual address, such as 0. The
code segment is followed by the data segment, which is used for storing
the program variables. When the program starts executing, the stack is
allocated at the other end of the virtual address space and is allowed to
grow toward lower virtual addresses. What is the significance of this
structure for the following schemes?
a. contiguous-memory allocation
b. pure segmentation
c. pure paging
Answer:
a. Contiguous-memory allocation requires the operating system
to allocate the entire extent of the virtual address space to the program
when it starts executing. This could be much higher than the actual
memory requirements of the process.
b. Pure segmentation gives the operating system the flexibility to
assign a small extent to each segment at program startup time and to
extend the segment if required.
c. Pure paging does not require the operating system to allocate the
maximum extent of the virtual address space to a process at startup
time, but it still requires the operating system to allocate a large page
table spanning all of the program's virtual address space. When a
program needs to extend the stack or the heap, it needs to allocate a
new page, but the corresponding page-table entry is preallocated.
8.9 Consider a paging system with the page table stored in memory.
a. If a memory reference takes 200 nanoseconds, how long does a
paged memory reference take?
b. If we add associative registers, and 75 percent of all page-table
references are found in the associative registers, what is the effective
memory reference time? (Assume that finding a page-table
entry in the associative registers takes zero time, if the entry is
there.)
Answer:
a. 400 nanoseconds: 200 nanoseconds to access the page table and
200 nanoseconds to access the word in memory.
b. Effective access time = 0.75 × (200 nanoseconds) + 0.25 × (400
nanoseconds) = 250 nanoseconds.
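Both results follow from a one-line calculation each; a sketch (variable names are ours):

```python
mem = 200                     # nanoseconds per physical memory access

# (a) Without associative registers, every reference costs one page-table
# access plus the access to the word itself.
paged_ref = mem + mem
assert paged_ref == 400

# (b) 75% of lookups hit the associative registers (assumed to take zero
# time); the rest pay the full page-table access first.
eat = 0.75 * mem + 0.25 * (mem + mem)
assert eat == 250
```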
8.10 Why are segmentation and paging sometimes combined into one scheme?
Answer: Segmentation and paging are often combined in order to improve
upon each other. Segmented paging is helpful when the page table
becomes very large: a large contiguous section of the page table that is
unused can be collapsed into a single segment-table entry with a page-table
address of zero. Paged segmentation handles the case of
very long segments that would otherwise require a lot of time to allocate.
By paging the segments, we reduce wasted memory due to external fragmentation
as well as simplify the allocation.
8.11 Explain why it is easier to share a reentrant module using segmentation
than it is to do so when pure paging is used.
Answer: Since segmentation is based on a logical division of memory
rather than a physical one, segments of any size can be shared with only
one entry in the segment tables of each user. With paging there must be
a common entry in the page tables for each page that is shared.
8.12 Consider the following segment table:
Segment Base Length
0 219 600
1 2300 14
2 90 100
3 1327 580
4 1952 96
What are the physical addresses for the following logical addresses?
a. 0,430
b. 1,10
c. 2,500
d. 3,400
e. 4,112
Answer:
a. 219 + 430 = 649
b. 2300 + 10 = 2310
c. illegal reference, trap to operating system
d. 1327 + 400 = 1727
e. illegal reference, trap to operating system
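The translations can be checked with a tiny lookup function; the `translate` helper and its trap-as-None convention are illustrative, not from the text:

```python
segments = {0: (219, 600), 1: (2300, 14), 2: (90, 100),
            3: (1327, 580), 4: (1952, 96)}   # segment -> (base, length)

def translate(seg, offset):
    base, length = segments[seg]
    if offset >= length:
        return None            # illegal reference: trap to the operating system
    return base + offset

assert translate(0, 430) == 649
assert translate(1, 10) == 2310
assert translate(2, 500) is None   # offset 500 exceeds length 100
assert translate(3, 400) == 1727
assert translate(4, 112) is None   # offset 112 exceeds length 96
```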
8.13 What is the purpose of paging the page tables?
Answer: In certain situations the page tables could become large enough
that paging the page tables simplifies the memory-allocation
problem (by ensuring that everything is allocated as fixed-size pages
as opposed to variable-sized chunks) and also enables swapping out
portions of the page table that are not currently in use.
8.14 Consider the hierarchical paging scheme used by the VAX architecture.
How many memory operations are performed when a user program
executes a memory load operation?
Answer: When a memory load operation is performed, up to three
memory operations might be needed. The first locates the page-table
entry for the page that holds the user's page-table entry (since the
page tables are themselves paged). The second access reads the user's
page-table entry itself, and the third access is the actual memory load
operation.
8.15 Compare the segmented paging scheme with the hashed page table
scheme for handling large address spaces. Under what circumstances is
one scheme preferable over the other?
Answer: When a program occupies only a small portion of its large
virtual address space, a hashed page table might be preferred due to
its smaller size. The disadvantage of hashed page tables, however, is
the problem of conflicts, where multiple pages map onto
the same hashed page-table entry. If many pages map to the same entry,
then traversing the list corresponding to that hash-table entry can incur
a significant overhead; such overheads are minimal in the segmented
paging scheme, where each page-table entry maintains information
regarding only one page.
8.16 Consider the Intel address translation scheme shown in Figure 8.22.
a. Describe all the steps that the Intel 80386 takes in translating a
logical address into a physical address.
b. What are the advantages to the operating system of hardware that
provides such complicated memory translation hardware?
c. Are there any disadvantages to this address-translation system? If
so, what are they? If not, why is it not used by every manufacturer?
Answer:
a. The selector is an index into the segment descriptor table. The
base address from the segment descriptor plus the original offset
produces a linear address consisting of a dir, page, and offset. The dir
is an index into a page directory; the entry from the page directory
selects a page table, and the page field is an index into that page table.
The entry from the page table, plus the offset, is the physical address.
b. Such a page-translation mechanism offers the flexibility to allow
most operating systems to implement their memory scheme in
hardware, instead of having to implement some parts in hardware
and some in software. Because it can be done in hardware, it is
more efficient (and the kernel is simpler).
c. Address translation can take longer due to the multiple table
lookups it can invoke. Caches help, but there will still be cache
misses.
CHAPTER 9
Virtual Memory
Virtual memory can be a very interesting subject since it has so many different
aspects: page faults, managing the backing store, page replacement, frame
allocation, thrashing, page size. The objectives of this chapter are to explain
these concepts and show how paging works.
A simulation is probably the easiest way to allow the students to program
several of the page-replacement algorithms and see how they really work.
If an interactive graphics display can be used to display the simulation as it
works, the students may be better able to understand how paging works. We
also present an exercise that asks the student to develop a Java program that
implements the FIFO and LRU page replacement algorithms.
Exercises
9.1 Give an example that illustrates the problem with restarting the block
move instruction (MVC) on the IBM 360/370 when the source and destination
regions are overlapping.
Answer: Assume that a page boundary falls at address 1024 and that the
move instruction is copying the source region 800:1200 to the target
region 700:1100. Assume that a page fault occurs while accessing
location 1024. By this time, source locations 800:1023 have been copied
to 700:923; in particular, locations 800:923, which lie in both regions,
have been overwritten with the new values. Restarting the block move
instruction would therefore copy these new values from 800:923 to
locations 700:823, which is incorrect.
9.2 Discuss the hardware support required to support demand paging.
Answer: For every memory access operation, the page table needs to
be consulted to check whether the corresponding page is resident or
not and whether the program has read or write privileges for accessing
the page. These checks would have to be performed in hardware. A
TLB could serve as a cache and improve the performance of the lookup
operation.
9.3 What is the copy-on-write feature and under what circumstances is it
beneficial to use this feature? What is the hardware support required to
implement this feature?
Answer: When two processes are accessing the same set of program
values (for instance, the code segment of the source binary), it is
useful to map the corresponding pages into the virtual address spaces
of both programs in a write-protected manner. When a write does
take place, a copy must be made so that the two programs can
access their own copies without interfering with each
other. The hardware support required to implement this is simply the
following: on each memory access, the page table is consulted to
check whether the page is write-protected. If it is, a trap occurs
and the operating system can make the copy.
9.4 A certain computer provides its users with a virtual-memory space of
2^32 bytes. The computer has 2^18 bytes of physical memory. The virtual
memory is implemented by paging, and the page size is 4096 bytes.
A user process generates the virtual address 11123456 (hexadecimal).
Explain how the system establishes the corresponding physical location.
Distinguish between software and hardware operations.
Answer: The virtual address in binary form is
0001 0001 0001 0010 0011 0100 0101 0110
Since the page size is 2^12 bytes, the page table contains 2^20 entries.
Therefore the low-order 12 bits, 0100 0101 0110, are used as the
displacement within the page, while the remaining 20 bits, 0001 0001
0001 0010 0011, are used as the index into the page table.
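The bit manipulation can be verified directly; a short sketch (reading the address as hexadecimal, as the binary expansion above implies):

```python
PAGE_SIZE = 4096                    # 2^12 bytes

# The virtual address from the exercise, read as hexadecimal.
addr = 0x11123456

page = addr >> 12                   # high-order 20 bits: page-table index
offset = addr & (PAGE_SIZE - 1)     # low-order 12 bits: displacement in the page

assert page == 0x11123              # binary 0001 0001 0001 0010 0011
assert offset == 0x456              # binary 0100 0101 0110
```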
9.5 Assume we have a demand-paged memory. The page table is held in
registers. It takes 8 milliseconds to service a page fault if an empty page
is available or the replaced page is not modified, and 20 milliseconds if
the replaced page is modified. Memory access time is 100 nanoseconds.
Assume that the page to be replaced is modified 70 percent of the time.
What is the maximum acceptable page-fault rate for an effective access
time of no more than 200 nanoseconds?
Answer:
0.2 μsec = (1 − P) × 0.1 μsec + (0.3P) × 8 millisec + (0.7P) × 20 millisec
0.1 = −0.1P + 2400P + 14000P
0.1 ≈ 16,400P
P ≈ 0.000006
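The derivation above can be checked numerically (all times converted to microseconds; variable names are ours):

```python
access = 0.1                        # one memory access, in microseconds
# Expected page-fault service time: 30% of faults take 8 ms (no dirty page),
# 70% take 20 ms (page modified), expressed in microseconds.
fault = 0.3 * 8000 + 0.7 * 20000

target = 0.2                        # desired effective access time, microseconds

# EAT = (1 - P) * access + P * fault = target, solved for P:
P = (target - access) / (fault - access)

assert round(P, 6) == 0.000006      # about six faults per million accesses
```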
9.6 Assume that you are monitoring the rate at which the pointer in the
clock algorithm (which indicates the candidate page for replacement)
moves. What can you say about the system if you notice the following
behavior:
a. pointer is moving fast
b. pointer is moving slow
Answer: If the pointer is moving fast, then the program is accessing a
large number of pages simultaneously. It is most likely that during the
period between the point at which the bit corresponding to a page is
cleared and it is checked again, the page is accessed again and therefore
cannot be replaced. This results in more scanning of the pages before
a victim page is found. If the pointer is moving slow, then the virtual
memory system is finding candidate pages for replacement extremely
efficiently, indicating that many of the resident pages are not being accessed.
9.7 Discuss situations under which the least-frequently-used (LFU) page-replacement
algorithm generates fewer page faults than the least-recently-used (LRU)
page-replacement algorithm. Also discuss under what circumstances the
opposite holds.
Answer: Consider the following sequence of memory accesses in a
system that can hold four pages in memory: 1, 1, 2, 3, 4, 5, 1.
When page 5 is accessed, the least-frequently-used page-replacement
algorithm would replace a page other than 1 (page 1 has been used
twice), and therefore would not incur a page fault when page 1 is
accessed again. On the other hand, for the sequence 1, 2, 3, 4, 5, 2,
the least-recently-used algorithm performs better.
9.8 Discuss situations under which the most-frequently-used (MFU) page-replacement
algorithm generates fewer page faults than the least-recently-used (LRU)
page-replacement algorithm. Also discuss under what circumstances the
opposite holds.
Answer: Consider the following sequence in a system that holds four
pages in memory: 1, 2, 3, 4, 4, 4, 5, 1. The most-frequently-used
page-replacement algorithm evicts page 4 while fetching page 5, while
the LRU algorithm evicts page 1; MFU therefore avoids a fault on the
final access to page 1. This is unlikely to happen much in practice. For
a sequence such as 1, 2, 3, 4, 4, 4, 5, 4, the LRU algorithm makes the
right decision.
9.9 The VAX/VMS system uses a FIFO replacement algorithm for resident
pages and a free-frame pool of recently used pages. Assume that the
free-frame pool is managed using the least recently used replacement
policy. Answer the following questions:
a. If a page fault occurs and if the page does not exist in the free-frame
pool, how is free space generated for the newly requested
page?
b. If a page fault occurs and if the page exists in the free-frame pool,
how is the resident page set and the free-frame pool managed to
make space for the requested page?
c. What does the system degenerate to if the number of resident
pages is set to one?
d. What does the system degenerate to if the number of pages in the
free-frame pool is zero?
Answer:
a. When a page fault occurs and the page does not exist in the
free-frame pool, one of the pages in the free-frame pool is
evicted to disk, creating space for one of the resident pages to be
moved to the free-frame pool. The accessed page is then moved to
the resident set.
b. When a page fault occurs and the page exists in the free-frame
pool, it is moved into the set of resident pages, while one of the
resident pages is moved to the free-frame pool.
c. When the number of resident pages is set to one, the system
degenerates into the replacement algorithm used in the
free-frame pool, which is typically managed in an LRU fashion.
d. When the number of pages in the free-frame pool is zero, the
system degenerates into a FIFO page-replacement algorithm.
9.10 Consider a demand-paging system with the following time-measured
utilizations:
CPU utilization 20%
Paging disk 97.7%
Other I/O devices 5%
Which (if any) of the following will (probably) improve CPU utilization?
Explain your answer.
a. Install a faster CPU.
b. Install a bigger paging disk.
c. Increase the degree of multiprogramming.
d. Decrease the degree of multiprogramming.
e. Install more main memory.
f. Install a faster hard disk or multiple controllers with multiple
hard disks.
g. Add prepaging to the page fetch algorithms.
h. Increase the page size.
Answer: The system obviously is spending most of its time paging,
indicating over-allocation of memory. If the level of multiprogramming
is reduced, resident processes would page-fault less frequently and
CPU utilization would improve. Another way to improve performance
would be to get more physical memory or a faster paging disk.
a. Install a faster CPU. No.
b. Install a bigger paging disk. No.
c. Increase the degree of multiprogramming. No.
d. Decrease the degree of multiprogramming. Yes.
e. Install more main memory. Likely to improve CPU utilization, as
more pages can remain resident and not require paging to or from
the disks.
f. Install a faster hard disk, or multiple controllers with multiple
hard disks. Also an improvement: as the disk bottleneck is
removed by faster response and more throughput to the disks,
the CPU will get more data more quickly.
g. Add prepaging to the page-fetch algorithms. Again, the CPU will
get more data faster, so it will be more in use. This is only the case
if the paging action is amenable to prefetching (i.e., some of the
access is sequential).
h. Increase the page size. Increasing the page size will result in
fewer page faults if data is being accessed sequentially. If data
access is more or less random, more paging action could ensue
because fewer pages can be kept in memory and more data is
transferred per page fault. So this change is as likely to decrease
utilization as it is to increase it.
9.11 Suppose that a machine provides instructions that can access memory
locations using the one-level indirect addressing scheme. What is the
sequence of page faults incurred when all of the pages of a program
are currently non-resident and the first instruction of the program is
an indirect memory load operation? What happens when the operating
system is using a per-process frame allocation technique and only two
pages are allocated to this process?
Answer: The following page faults take place: page fault to access the
instruction, a page fault to access the memory location that contains a
pointer to the target memory location, and a page fault when the target
memory location is accessed. The operating system will generate three
page faults with the third page replacing the page containing the instruction.
If the instruction needs to be fetched again to repeat the trapped
instruction, then the sequence of page faults will continue indefinitely.
If the instruction is cached in a register, then it will be able to execute
completely after the third page fault.
9.12 Suppose that your replacement policy (in a paged system) is to examine
each page regularly and to discard that page if it has not been used
since the last examination. What would you gain and what would you
lose by using this policy rather than LRU or second-chance replacement?
Answer: Such an algorithm could be implemented with the use of a
reference bit. After every examination, the bit is set to zero; set back
to one if the page is referenced. The algorithm would then select an
arbitrary page for replacement from the set of unused pages since the
last examination.
The advantage of this algorithm is its simplicity - nothing other than a
reference bit need be maintained. The disadvantage of this algorithm is
that it ignores locality by only using a short time frame for determining
whether to evict a page or not. For example, a page may be part of
the working set of a process, but may be evicted because it was not
referenced since the last examination (i.e., not all pages in the working
set may be referenced between examinations).
9.13 A page-replacement algorithm should minimize the number of page
faults. We can do this minimization by distributing heavily used pages
evenly over all of memory, rather than having them compete for a small
number of page frames. We can associate with each page frame a counter
of the number of pages that are associated with that frame. Then, to
replace a page, we search for the page frame with the smallest counter.
a. Define a page-replacement algorithm using this basic idea. Specifically
address the problems of (1) what the initial value of the
counters is, (2) when counters are increased, (3) when counters
are decreased, and (4) how the page to be replaced is selected.
b. How many page faults occur for your algorithm for the following
reference string, for four page frames?
1, 2, 3, 4, 5, 3, 4, 1, 6, 7, 8, 7, 8, 9, 7, 8, 9, 5, 4, 5, 4, 2.
c. What is the minimum number of page faults for an optimal page-replacement
strategy for the reference string in part b with four
page frames?
Answer:
a. Define a page-replacement algorithm addressing the problems of:
1. Initial value of the counters: 0.
2. Counters are increased: whenever a new page is associated
with that frame.
3. Counters are decreased: whenever one of the pages associated
with that frame is no longer required.
4. How the page to be replaced is selected: find a frame with
the smallest counter. Use FIFO for breaking ties.
b. 14 page faults.
c. 11 page faults.
9.14 Consider a demand-paging system with a paging disk that has an average
access and transfer time of 20 milliseconds. Addresses are translated
through a page table in main memory, with an access time of 1 microsecond
per memory access. Thus, each memory reference through the page
table takes two accesses. To improve this time, we have added an associative
memory that reduces access time to one memory reference if the
page-table entry is in the associative memory.
Assume that 80 percent of the accesses are in the associative memory
and that, of the remaining, 10 percent (or 2 percent of the total) cause
page faults. What is the effective memory access time?
Answer:
effective access time = (0.80) × (1 μsec) + (0.18) × (2 μsec)
+ (0.02) × (20,002 μsec)
= 401.2 μsec
≈ 0.4 millisec
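The arithmetic can be checked mechanically. The 20,002 μs figure is an assumption (the 20 ms disk service time from the problem statement plus the two memory accesses that follow the fault):

```python
# Fractions from the problem statement: 80% of references hit the
# associative memory (one 1-us access), 18% go through the page table
# (two accesses), and 2% cause page faults.
eat = 0.80 * 1 + 0.18 * 2 + 0.02 * (20000 + 2)   # microseconds

assert round(eat, 1) == 401.2                    # i.e. roughly 0.4 millisec
```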
9.15 What is the cause of thrashing? How does the system detect thrashing?
Once it detects thrashing, what can the system do to eliminate this
problem?
Answer: Thrashing is caused by underallocation of the minimum number
of pages required by a process, forcing it to continuously page fault.
The system can detect thrashing by evaluating the level of CPU utilization
as compared to the level of multiprogramming. It can be eliminated
by reducing the level of multiprogramming.
9.16 Is it possible for a process to have two working sets? One representing
data and another representing code? Explain.
Answer: Yes, in fact many processors provide two TLBs for this very
reason. As an example, the code being accessed by a process may retain
the same working set for a long period of time. However, the data the
code accesses may change, thus reflecting a change in the working set
for data accesses.
9.17 Consider the parameter Δ used to define the working-set window in
the working-set model. What is the effect of setting Δ to a small value
on the page-fault frequency and the number of active (non-suspended)
processes currently executing in the system? What is the effect when Δ
is set to a very high value?
Answer: When Δ is set to a small value, the set of resident pages for
a process might be underestimated, allowing a process to be scheduled
even though all of its required pages are not resident. This could result
in a large number of page faults. When Δ is set to a large value, a
process's resident set is overestimated, and this might prevent many
processes from being scheduled even though their required pages are
resident. However, once a process is scheduled, it is unlikely to generate
page faults, since its resident set has been overestimated.
9.18 Assume there is an initial 1024 KB segment where memory is allocated
using the buddy system. Using Figure 9.27 as a guide, draw the tree
illustrating how the following memory requests are allocated:
- request 240 bytes
- request 120 bytes
- request 60 bytes
- request 130 bytes
Next, modify the tree for the following releases of memory. Perform
coalescing whenever possible:
- release 240 bytes
- release 60 bytes
- release 120 bytes
Answer: The following allocation is made by the buddy system: The
240-byte request is assigned a 256-byte segment. The 120-byte request is
assigned a 128-byte segment, the 60-byte request is assigned a 64-byte
segment, and the 130-byte request is assigned a 256-byte segment. After
the allocation, the following segment sizes are available: 64 bytes, 256
bytes, 1K, 2K, 4K, 8K, 16K, 32K, 64K, 128K, 256K, and 512K.
After the releases of memory, the only segment in use would be a 256-byte
segment containing 130 bytes of data. The following segments will
be free: 256 bytes, 512 bytes, 1K, 2K, 4K, 8K, 16K, 32K, 64K, 128K, 256K,
and 512K.
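The block sizes chosen above follow from the buddy system's power-of-two rounding, which is easy to verify; the `buddy_block` helper is our own sketch:

```python
def buddy_block(request):
    """Smallest power-of-two block that can hold `request` bytes.
    (Real allocators impose a minimum block size; 1 byte is assumed
    here purely to keep the sketch short.)"""
    size = 1
    while size < request:
        size *= 2
    return size

assert buddy_block(240) == 256
assert buddy_block(120) == 128
assert buddy_block(60) == 64
assert buddy_block(130) == 256
```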
9.19 The slab-allocation algorithm uses a separate cache for each different
object type. Assuming there is one cache per object type, explain why
this doesn't scale well with multiple CPUs. What could be done to address
this scalability issue?
Answer: This had long been a problem with the slab allocator: poor
scalability with multiple CPUs. The issue comes from having to lock the
global cache when it is being accessed. This has the effect of serializing
cache accesses on multiprocessor systems. Solaris has addressed this by
introducing a per-CPU cache, rather than a single global cache.
9.20 Consider a system that allocates pages of different sizes to its processes.
What are the advantages of such a paging scheme? What modifications
to the virtual memory system are required to provide this functionality?
Answer: The program could have a large code segment or use large-sized
arrays as data. These portions of the program could be allocated to
larger pages, thereby decreasing the memory overheads associated with
a page table. The virtual memory system would then have to maintain
multiple free lists of pages of the different sizes, and would also need
more complex code for address translation to take the different page
sizes into account.
9.21 Write a program that implements the FIFO and LRU page-replacement
algorithms presented in this chapter. First, generate a random page-reference
string where page numbers range from 0..9. Apply the random
page-reference string to each algorithm and record the number
of page faults incurred by each algorithm. Implement the replacement
algorithms such that the number of page frames can vary from 1..7.
Assume that demand paging is used.
Answer: Please refer to the supporting Web site for source code solution.
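Although the official solution lives on the supporting Web site, a minimal sketch of the two algorithms in Python (the function names and the random reference string are our own) might look like:

```python
from collections import deque
import random

def fifo_faults(refs, frames):
    mem, queue, faults = set(), deque(), 0
    for page in refs:
        if page not in mem:
            faults += 1
            if len(mem) == frames:
                mem.discard(queue.popleft())   # evict the oldest resident page
            mem.add(page)
            queue.append(page)
    return faults

def lru_faults(refs, frames):
    mem, faults = [], 0                        # ordered least- to most-recent
    for page in refs:
        if page in mem:
            mem.remove(page)                   # refresh recency on a hit
        else:
            faults += 1
            if len(mem) == frames:
                mem.pop(0)                     # evict the least recently used
        mem.append(page)
    return faults

refs = [random.randint(0, 9) for _ in range(100)]
for frames in range(1, 8):
    print(frames, fifo_faults(refs, frames), lru_faults(refs, frames))
```

On the classic string 1, 2, 3, 4, 1, 2, 5, 1, 2, 3, 4, 5 these give 9 FIFO faults with three frames but 10 with four (Belady's anomaly), while LRU gives 10 and 8.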
9.22 The Catalan numbers are an integer sequence Cn that appear in tree
enumeration problems. The first Catalan numbers for n = 1, 2, 3, ... are
1, 2, 5, 14, 42, 132, .... A formula generating Cn is
Cn = (1/(n+1)) (2n choose n) = (2n)! / ((n+1)! n!)
Design two programs that communicate with shared memory using
the Win32 API as outlined in Section 9.7.2. The producer process will
generate the Catalan sequence and write it to a shared memory object.
The consumer process will then read and output the sequence from
shared memory.
In this instance, the producer process will be passed an integer parameter
on the command line specifying the number of Catalan numbers
to produce, i.e. providing 5 on the command line means the producer
process will generate the first 5 Catalan numbers.
Answer: Please refer to the supporting Web site for source code solution.
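The Win32 shared-memory plumbing is on the supporting Web site; the producer's computation itself is just the closed-form formula above. A sketch of that piece (the `catalan` helper name is ours):

```python
from math import factorial

def catalan(n):
    # C_n = (2n)! / ((n+1)! * n!)
    return factorial(2 * n) // (factorial(n + 1) * factorial(n))

# First Catalan numbers for n = 1, 2, 3, ...
assert [catalan(n) for n in range(1, 7)] == [1, 2, 5, 14, 42, 132]
```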
CHAPTER 10
File-System Interface
Files are the most obvious object that operating systems manipulate. Everything
is typically stored in files: programs, data, output, etc. The student should
learn what a file is to the operating system and what the problems are (providing
naming conventions to allow files to be found by user programs, protection).
Two problems can crop up with this chapter. First, terminology may be
different between your system and the book. This can be used to drive home
the point that concepts are important and terms must be clearly defined when
you get to a new system. Second, it may be difficult to motivate students
to learn about directory structures that are not the ones on the system they
are using. This can best be overcome if the students have two very different
systems to consider, such as a single-user system for a microcomputer and a
large, university time-shared system.
Projects might include a report about the details of the file system for the
local system. It is also possible to write programs to implement a simple file
system either in memory (allocate a large block of memory that is used to
simulate a disk) or on top of an existing file system. In many cases, the design
of a file system is an interesting project of its own.
Exercises
10.1 Consider a file system where a file can be deleted and its disk space
reclaimed while links to that file still exist. What problems may occur if
a new file is created in the same storage area or with the same absolute
path name? How can these problems be avoided?
Answer: Let F1 be the old file and F2 be the new file. A user wishing
to access F1 through an existing link will actually access F2. Note that
the access protection for file F1 is used rather than the one associated
with F2.
This problem can be avoided by ensuring that all links to a deleted file
are deleted also. This can be accomplished in several ways:
a. maintain a list of all links to a file, removing each of them when
the file is deleted
b. retain the links, removing them when an attempt is made to
access a deleted file
c. maintain a file reference list (or counter), deleting the file only
after all links or references to that file have been deleted.
10.2 The open-file table is used to maintain information about files that are
currently open. Should the operating system maintain a separate table
for each user or just maintain one table that contains references to files
that are being accessed by all users at the current time? If the same file
is being accessed by two different programs or users, should there be
separate entries in the open file table?
Answer: By keeping a central open-file table, the operating system
can perform the following operation that would be infeasible otherwise.
Consider a file that is currently being accessed by one or more
processes. If the file is deleted, then it should not be removed from
the disk until all processes accessing the file have closed it. This check
could be performed only if there is centralized accounting of number
of processes accessing the file. On the other hand, if two processes are
accessing the file, then separate state needs to be maintained to keep
track of the current location of which parts of the file are being accessed
by the two processes. This requires the operating system to maintain
separate entries for the two processes.
10.3 What are the advantages and disadvantages of a system providing
mandatory locks instead of providing advisory locks whose usage is
left to the user's discretion?
Answer: In many cases, separate programs might be willing to tolerate
concurrent access to a file without requiring the need to obtain locks and
thereby guaranteeing mutual exclusion to the files. Mutual exclusion
could be guaranteed by other program structures such as memory locks
or other forms of synchronization. In such situations, the mandatory
locks would limit the flexibility in how files could be accessed and
might also increase the overheads associated with accessing files.
10.4 What are the advantages and disadvantages of recording the name
of the creating program with the file's attributes (as is done in the
Macintosh Operating System)?
Answer: By recording the name of the creating program, the operating
system is able to implement features (such as automatic program
invocation when the file is accessed) based on this information. It does
add overhead in the operating system and require space in the file
descriptor, however.
10.5 Some systems automatically open a file when it is referenced for the first
time, and close the file when the job terminates. Discuss the advantages
and disadvantages of this scheme as compared to the more traditional
one, where the user has to open and close the file explicitly.
Answer: Automatic opening and closing of files relieves the user from
the invocation of these functions, and thus makes it more convenient
to the user; however, it requires more overhead than the case where
explicit opening and closing is required.
10.6 If the operating system were to know that a certain application is going
to access the file data in a sequential manner, how could it exploit this
information to improve performance?
Answer: When a block is accessed, the file system could prefetch the
subsequent blocks in anticipation of future requests to these blocks. This
prefetching optimization would reduce the waiting time experienced
by the process for future requests.
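The prefetching idea can be sketched with a toy block cache; `read_block`, `disk`, and `prefetch_depth` are hypothetical names for illustration, not any real file-system API:

```python
def read_block(disk, cache, block_no, prefetch_depth=2):
    """Return the requested block; opportunistically prefetch the
    next blocks in anticipation of a sequential access pattern."""
    for b in range(block_no, block_no + 1 + prefetch_depth):
        if b not in cache and b < len(disk):
            cache[b] = disk[b]          # one "disk read" per missing block
    return cache[block_no]

disk = [f"data-{i}" for i in range(10)]
cache = {}
read_block(disk, cache, 3)
# Blocks 3, 4, and 5 are now cached; a subsequent sequential read
# of block 4 is served from the cache with no disk access.
```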
10.7 Give an example of an application that could benefit from operating
system support for random access to indexed files.
Answer: An application that maintains a database of entries could
benefit from such support. For instance, if a program is maintaining
a student database, then accesses to the database cannot be modeled
by any predetermined access pattern. Accesses to the records are random,
and locating the records would be more efficient if the operating system
were to provide some form of tree-based index.
10.8 Discuss the merits and demerits of supporting links to files that cross
mount points (that is, the file link refers to a file that is stored in a
different volume).
Answer: The advantage is that there is greater transparency in the
sense that the user does not need to be aware of mount points and create
links in all scenarios. The disadvantage however is that the filesystem
containing the link might be mounted while the filesystem containing
the target file might not be, and therefore one cannot provide transparent
access to the file in such a scenario; the error condition would
expose to the user that a link is a dead link and that the link does indeed
cross filesystem boundaries.
10.9 Some systems provide file sharing by maintaining a single copy of a
file; other systems maintain several copies, one for each of the users
sharing the file. Discuss the relative merits of each approach.
Answer: With a single copy, several concurrent updates to a file may
result in a user obtaining incorrect information and the file being left in
an incorrect state. With multiple copies, there is storage waste and the
various copies may not be consistent with respect to each other.
10.10 Discuss the advantages and disadvantages of associating with remote
file systems (stored on file servers) a different set of failure semantics
from that associated with local file systems.
Answer: The advantage is that the application can deal with the failure
condition in a more intelligent manner if it realizes that it incurred an
errorwhile accessing a file stored in a remote filesystem. For instance, a
file open operation could simply fail instead of hanging when accessing
a remote file on a failed server and the application could deal with the
failure in the best possible manner; if the operation were to simply
hang, then the entire application hangs which is not desirable. The
disadvantage however is the lack of uniformity in failure semantics
and the resulting complexity in application code.
10.11 What are the implications of supporting UNIX consistency semantics for
shared access for those files that are stored on remote file systems?
Answer: UNIX consistency semantics requires updates to a file to be
immediately available to other processes. Supporting such semantics
for shared files on remote file systems could result in the following
inefficiencies: all updates by a client have to be immediately reported
to the fileserver instead of being batched (or even ignored if the updates
are to a temporary file), and updates have to be communicated by the
fileserver to clients caching the data immediately again resulting in
more communication.
CHAPTER 11
File-System Implementation
In this chapter we discuss various methods for storing information on secondary
storage. The basic issues are device directory, free space management,
and space allocation on a disk.
Exercises
11.1 Consider a file system that uses a modified contiguous-allocation scheme
with support for extents. A file is a collection of extents, with each extent
corresponding to a contiguous set of blocks. A key issue in such
systems is the degree of variability in the size of the extents. What are
the advantages and disadvantages of the following schemes:
a. All extents are of the same size, and the size is predetermined.
b. Extents can be of any size and are allocated dynamically.
c. Extents can be of a few fixed sizes, and these sizes are predetermined.
Answer: If all extents are of the same size, and the size is predetermined,
then it simplifies the block allocation scheme. A simple bit map
or free list for extents would suffice. If the extents can be of any size and
are allocated dynamically, then more complex allocation schemes are
required. It might be difficult to find an extent of the appropriate size
and there might be external fragmentation. One could use the Buddy
system allocator discussed in the previous chapters to design an appropriate
allocator. When the extents can be of a few fixed sizes, and
these sizes are predetermined, one would have to maintain a separate
bitmap or free list for each possible size. This scheme is of intermediate
complexity and of intermediate flexibility in comparison to the earlier
schemes.
11.2 What are the advantages of the variation of linked allocation that uses
a FAT to chain together the blocks of a file?
Answer: The advantage is that while accessing a block that is stored
at the middle of a file, its location can be determined by chasing the
pointers stored in the FAT as opposed to accessing all of the individual
blocks of the file in a sequential manner to find the pointer to the target
block. Typically, most of the FAT can be cached in memory and therefore
the pointers can be determined with just memory accesses instead of
having to access the disk blocks.
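As a rough illustration (the FAT contents and block numbers below are made up), chasing pointers in a cached FAT locates a block in the middle of a file without reading any data blocks:

```python
# Hypothetical in-memory FAT: fat[b] gives the next block of the file,
# with FAT_EOF marking the last block.
FAT_EOF = -1
fat = {7: 12, 12: 3, 3: 9, 9: FAT_EOF}   # file occupies blocks 7 -> 12 -> 3 -> 9

def nth_block(fat, start, n):
    """Find the physical block holding the nth logical block (0-based)
    by chasing pointers in the cached FAT -- no data blocks are read."""
    block = start
    for _ in range(n):
        block = fat[block]
        if block == FAT_EOF:
            raise IndexError("file is shorter than n blocks")
    return block

print(nth_block(fat, start=7, n=2))  # 3
```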
11.3 Consider a system where free space is kept in a free-space list.
a. Suppose that the pointer to the free-space list is lost. Can the
system reconstruct the free-space list? Explain your answer.
b. Consider a file system similar to the one used by UNIX with
indexed allocation. How many disk I/O operations might be
required to read the contents of a small local file at /a/b/c? Assume
that none of the disk blocks is currently being cached.
c. Suggest a scheme to ensure that the pointer is never lost as a
result of memory failure.
Answer:
a. In order to reconstruct the free list, it would be necessary to
perform garbage collection. This would entail searching the
entire directory structure to determine which pages are already
allocated to jobs. Those remaining unallocated pages could be
relinked as the free-space list.
b. The free-space list pointer could be stored on the disk, perhaps
in several places.
c.
11.4 Some file systems allow disk storage to be allocated at different levels
of granularity. For instance, a file system could allocate 4 KB of disk
space as a single 4-KB block or as eight 512-byte blocks. How could
we take advantage of this flexibility to improve performance? What
modifications would have to be made to the free-space management
scheme in order to support this feature?
Answer: Such a scheme would decrease internal fragmentation. If a
file is 5 KB, then it could be allocated a 4-KB block and two contiguous
512-byte blocks. In addition to maintaining a bitmap of free blocks,
one would also have to maintain extra state regarding which of the
subblocks are currently being used inside a block. The allocator would
then have to examine this extra state to allocate subblocks and coalesce
the subblocks to obtain the larger block when all of the subblocks
become free.
11.5 Discuss how performance optimizations for file systems might result
in difficulties in maintaining the consistency of the systems in the event
of computer crashes.
Answer: The primary difficulty that might arise is due to delayed
updates of data and metadata. Updates could be delayed in the hope
that the same data might be updated in the future or that the updated
data might be temporary and might be deleted in the near future.
However, if the system were to crash without having committed the
delayed updates, then the consistency of the file system is destroyed.
11.6 Consider a file system on a disk that has both logical and physical block
sizes of 512 bytes. Assume that the information about each file is already
in memory. For each of the three allocation strategies (contiguous,
linked, and indexed), answer these questions:
a. How is the logical-to-physical address mapping accomplished
in this system? (For the indexed allocation, assume that a file is
always less than 512 blocks long.)
b. If we are currently at logical block 10 (the last block accessed was
block 10) and want to access logical block 4, how many physical
blocks must be read from the disk?
Answer: Let Z be the starting file address (block number).
a. Contiguous. Divide the logical address by 512 with X and Y the
resulting quotient and remainder respectively.
1. Add X to Z to obtain the physical block number. Y is the
displacement into that block.
2. 1
b. Linked. Divide the logical address by 511 with X and Y
the resulting quotient and remainder respectively.
1. Chase down the linked list (getting X + 1 blocks). Y + 1
is the displacement into the last physical block.
2. 4
c. Indexed. Divide the logical address by 512 with X and Y the
resulting quotient and remainder respectively.
1. Get the index block into memory. Physical block address
is contained in the index block at location X. Y is the displacement
into the desired physical block.
2. 2
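The three mappings above reduce to a few lines of integer arithmetic; this is a sketch of the calculation only, with hypothetical function names:

```python
BLOCK = 512  # logical and physical block size in bytes

def contiguous(addr, start):
    # start is Z, the file's first physical block
    x, y = divmod(addr, BLOCK)
    return start + x, y               # physical block number, displacement

def linked(addr):
    # each 512-byte block holds 511 data bytes plus a pointer
    x, y = divmod(addr, BLOCK - 1)
    return x, y + 1                   # chase x+1 blocks; y+1 skips the pointer

def indexed(addr):
    x, y = divmod(addr, BLOCK)
    return x, y                       # slot x of the index block, displacement y

print(contiguous(1034, 100))          # (102, 10)
print(linked(1034))                   # (2, 13)
print(indexed(1034))                  # (2, 10)
```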
11.7 Fragmentation on a storage device could be eliminated by recompaction
of the information. Typical disk devices do not have relocation
or base registers (such as are used when memory is to be compacted),
so how can we relocate files? Give three reasons why recompacting and
relocation of files often are avoided.
Answer: Relocation of files on secondary storage involves considerable
overhead: data blocks would have to be read into main memory and
written back out to their new locations. Furthermore, relocation registers
apply only to sequential files, and many disk files are not sequential.
For this same reason, many new files will not require contiguous disk
space; even sequential files can be allocated noncontiguous blocks if
links between logically sequential blocks are maintained by the disk
system.
11.8 In what situations would using memory as a RAM disk be more useful
than using it as a disk cache?
Answer: In cases where the user (or system) knows exactly what data
is going to be needed. Caches are algorithm-based, while a RAM disk is
user-directed.
11.9 Consider the following augmentation of a remote-file-access protocol.
Each client maintains a name cache that caches translations from file
names to corresponding file handles. What issues should we take into
account in implementing the name cache?
Answer: One issue is maintaining consistency of the name cache. If
the cache entry becomes inconsistent, then it should be either updated
or its inconsistency should be detected when it is used next. If the inconsistency
is detected later, then there should be a fallback mechanism
for determining the new translation for the name. Also, another related
issue is whether a name lookup is performed one element at a time
for each subdirectory in the pathname or whether it is performed in a
single shot at the server. If it is performed one element at a time, then
the client might obtain more information regarding the translations for
all of the intermediate directories. On the other hand, it increases the
network traffic as a single name lookup causes a sequence of partial
name lookups.
11.10 Explain why logging metadata updates ensures recovery of a file system
after a file system crash.
Answer: For a file system to be recoverable after a crash, it must
be consistent or must be able to be made consistent. Therefore, we
have to prove that logging metadata updates keeps the file system in
a consistent or able-to-be-consistent state. For a file system to become
inconsistent, the metadata must be written incompletely or in the wrong
order to the file system data structures. With metadata logging, the
writes are made to a sequential log. The complete transaction is written
there before it is moved to the file system structures. If the system
crashes during file system data updates, the updates can be completed
based on the information in the log. Thus, logging ensures that file
system changes are made completely (either before or after a crash).
The order of the changes is guaranteed to be correct because of the
sequential writes to the log. If a change was made incompletely to the
log, it is discarded, with no changes made to the file system structures.
Therefore, the structures are either consistent or can be trivially made
consistent via metadata logging replay.
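A minimal sketch of the replay logic just described, with made-up record formats; only fully committed transactions reach the file-system structures, and an incomplete tail is discarded:

```python
def replay(log, fs):
    """Apply committed transactions from a sequential log to the
    file-system metadata; discard any uncommitted trailing records."""
    txn = []
    for record in log:
        if record == "COMMIT":
            for key, value in txn:      # transaction is complete: apply it
                fs[key] = value
            txn = []
        else:
            txn.append(record)          # buffer until we see the commit
    return fs                           # incomplete tail in txn is dropped

log = [("inode-5", "size=1024"), ("bitmap", "block 17 used"), "COMMIT",
       ("inode-9", "size=0")]           # crash occurred before this committed
fs = replay(log, {})
# fs reflects only the committed transaction
```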
11.11 Consider the following backup scheme:
- Day 1. Copy to a backup medium all files from the disk.
- Day 2. Copy to another medium all files changed since day 1.
- Day 3. Copy to another medium all files changed since day 1.
This contrasts with the schedule given in Section 11.7.2 by having all
subsequent backups copy all files modified since the first full backup.
What are the benefits of this system over the one in Section 11.7.2?
What are the drawbacks? Are restore operations made easier or more
difficult? Explain your answer.
Answer: Restores are easier because you can go to the last backup
tape, rather than the full tape. No intermediate tapes need be read.
More tape is used as more files change.
CHAPTER 12
Mass-Storage Structure
In this chapter we describe the internal data structures and algorithms used by
the operating system to implement this interface. We also discuss the lowest
level of the file system: the secondary storage structure. We first describe
disk-head-scheduling algorithms. Next we discuss disk formatting and management
of boot blocks, damaged blocks, and swap space. We end with coverage
of disk reliability and stable storage.
The basic implementation of disk scheduling should be fairly clear: requests,
queues, servicing, so the main new consideration is the actual algorithms:
FCFS, SSTF, SCAN, C-SCAN, LOOK, C-LOOK. Simulation may be the best
way to involve the student with the algorithms; Exercise 12.6 provides a question
amenable to a small but open-ended simulation study.
The paper by Worthington et al. [1994] gives a good presentation of the
disk-scheduling algorithms and their evaluation. Be suspicious of the results of
the disk-scheduling papers from the 1970s, such as Teorey and Pinkerton [1972],
because they generally assume that the seek time function is linear, rather than
a square root. The paper by Lynch [1972b] shows the importance of keeping
the overall system context in mind when choosing scheduling algorithms.
Unfortunately, it is fairly difficult to find.
Chapter 2 introduced the concept of primary, secondary, and tertiary storage.
In this chapter, we discuss tertiary storage in more detail. First we describe
the types of storage devices used for tertiary storage. Next, we discuss the issues
that arise when an operating system uses tertiary storage. Finally, we
consider some performance aspects of tertiary storage systems.
Exercises
12.1 None of the disk-scheduling disciplines, except FCFS, is truly fair (starvation
may occur).
a. Explain why this assertion is true.
b. Describe a way to modify algorithms such as SCAN to ensure
fairness.
c. Explain why fairness is an important goal in a time-sharing system.
d. Give three or more examples of circumstances in which it is
important that the operating system be unfair in serving I/O
requests.
Answer:
a. New requests for the track over which the head currently resides
can theoretically arrive as quickly as these requests are being
serviced.
b. All requests older than some predetermined age could be forced
to the top of the queue, and an associated bit for each could be
set to indicate that no new request could be moved ahead of
these requests. For SSTF, the rest of the queue would have to be
reorganized with respect to the last of these old requests.
c. To prevent unusually long response times.
d. Paging and swapping should take priority over user requests.
It may be desirable for other kernel-initiated I/O, such as the
writing of file system metadata, to take precedence over user
I/O. If the kernel supports real-time process priorities, the I/O
requests of those processes should be favored.
12.2 Suppose that a disk drive has 5000 cylinders, numbered 0 to 4999. The
drive is currently serving a request at cylinder 143, and the previous
request was at cylinder 125. The queue of pending requests, in FIFO
order, is
86, 1470, 913, 1774, 948, 1509, 1022, 1750, 130
Starting from the current head position, what is the total distance (in
cylinders) that the disk arm moves to satisfy all the pending requests,
for each of the following disk-scheduling algorithms?
a. FCFS
b. SSTF
c. SCAN
d. LOOK
e. C-SCAN
Answer:
a. The FCFS schedule is 143, 86, 1470, 913, 1774, 948, 1509, 1022,
1750, 130. The total seek distance is 7081.
b. The SSTF schedule is 143, 130, 86, 913, 948, 1022, 1470, 1509, 1750,
1774. The total seek distance is 1745.
c. The SCAN schedule is 143, 913, 948, 1022, 1470, 1509, 1750, 1774,
4999, 130, 86. The total seek distance is 9769.
d. The LOOK schedule is 143, 913, 948, 1022, 1470, 1509, 1750, 1774,
130, 86. The total seek distance is 3319.
e. The C-SCAN schedule is 143, 913, 948, 1022, 1470, 1509, 1750, 1774,
4999, 86, 130. The total seek distance is 9813.
f. (Bonus.) The C-LOOK schedule is 143, 913, 948, 1022, 1470, 1509,
1750, 1774, 86, 130. The total seek distance is 3363.
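The FCFS and SSTF totals above can be checked with a short script (a sketch only; the SCAN/C-SCAN variants follow the same distance-summing pattern):

```python
def total_seek(schedule):
    """Total arm movement for an explicit cylinder visit order."""
    return sum(abs(b - a) for a, b in zip(schedule, schedule[1:]))

pending = [86, 1470, 913, 1774, 948, 1509, 1022, 1750, 130]
head = 143

fcfs = [head] + pending
print(total_seek(fcfs))               # 7081

# SSTF: always serve the closest pending request next
queue, pos, sstf = pending[:], head, [head]
while queue:
    nxt = min(queue, key=lambda c: abs(c - pos))
    queue.remove(nxt)
    sstf.append(nxt)
    pos = nxt
print(total_seek(sstf))               # 1745
```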
12.3 From elementary physics, we know that when an object is subjected to
a constant acceleration a, the relationship between distance d and time
t is given by d = (1/2)a*t^2. Suppose that, during a seek, the disk in Exercise
12.2 accelerates the disk arm at a constant rate for the first half of the
seek, then decelerates the disk arm at the same rate for the second half
of the seek. Assume that the disk can perform a seek to an adjacent
cylinder in 1 millisecond and a full-stroke seek over all 5000 cylinders
in 18 milliseconds.
a. The distance of a seek is the number of cylinders that the head
moves. Explain why the seek time is proportional to the square
root of the seek distance.
b. Write an equation for the seek time as a function of the seek
distance. This equation should be of the form t = x + y*sqrt(L),
where t is the time in milliseconds and L is the seek distance in
cylinders.
c. Calculate the total seek time for each of the schedules in Exercise
12.2. Determine which schedule is the fastest (has the smallest
total seek time).
d. The percentage speedup is the time saved divided by the original
time.What is the percentage speedup of the fastest schedule over
FCFS?
Answer:
a. Solving d = (1/2)a*t^2 for t gives t = sqrt(2d/a).
b. Solve the simultaneous equations t = x + y*sqrt(L) that result from
(t = 1, L = 1) and (t = 18, L = 4999) to obtain t = 0.7561 +
0.2439*sqrt(L).
c. The total seek times are: FCFS 65.20; SSTF 31.52; SCAN 62.02; LOOK
40.29; C-SCAN 62.10; (and C-LOOK 40.42). Thus, SSTF is fastest
here.
d. (65.20 - 31.52)/65.20 = 0.52. The percentage speedup of SSTF
over FCFS is 52%, with respect to the seek time. If we include the
overhead of rotational latency and data transfer, the percentage
speedup will be less.
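The totals in part (c) can be verified directly from the equation derived in part (b):

```python
from math import sqrt

def seek_time(L):
    """t = 0.7561 + 0.2439 * sqrt(L), from the boundary conditions
    t(1) = 1 ms and t(4999) = 18 ms."""
    return 0.7561 + 0.2439 * sqrt(L)

def schedule_time(schedule):
    """Total seek time (ms) for an explicit cylinder visit order."""
    return sum(seek_time(abs(b - a)) for a, b in zip(schedule, schedule[1:]))

fcfs = [143, 86, 1470, 913, 1774, 948, 1509, 1022, 1750, 130]
sstf = [143, 130, 86, 913, 948, 1022, 1470, 1509, 1750, 1774]
print(round(schedule_time(fcfs), 2))  # 65.2
print(round(schedule_time(sstf), 2))  # 31.52
```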
12.4 Suppose that the disk in Exercise 12.3 rotates at 7200 RPM.
a. What is the average rotational latency of this disk drive?
b. What seek distance can be covered in the time that you found
for part a?
Answer:
a. 7200 rpm gives 120 rotations per second. Thus, a full rotation
takes 8.33 ms, and the average rotational latency (a half rotation)
takes 4.167 ms.
b. Solving t = 0.7561 + 0.2439*sqrt(L) for t = 4.167 gives L = 195.58,
so we can seek over 195 tracks (about 4% of the disk) during an
average rotational latency.
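Both parts can be checked directly; using the exact half-rotation time rather than the rounded 4.167 ms gives L a shade under 195.58:

```python
# Part (a): average rotational latency at 7200 RPM.
rpm = 7200
half_rotation_ms = 0.5 * 60_000 / rpm   # about 4.167 ms

# Part (b): invert t = 0.7561 + 0.2439 * sqrt(L) from Exercise 12.3.
L = ((half_rotation_ms - 0.7561) / 0.2439) ** 2
print(round(L, 1))                      # about 195 cylinders
```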
12.5 Write a Java program for disk scheduling using the SCAN and C-SCAN
disk-scheduling algorithms.
Answer: Please refer to the supporting Web site for source code solution.
12.6 Compare the performance of C-SCAN and SCAN scheduling, assuming
a uniform distribution of requests. Consider the average response
time (the time between the arrival of a request and the completion of
that requests service), the variation in response time, and the effective
bandwidth.Howdoes performance depend on the relative sizes of seek
time and rotational latency?
Answer:
There is no simple analytical argument to answer the first part of this
question. It would make a good small simulation experiment for the
students. The answer can be found in Figure 2 of Worthington et al.
[1994]. (Worthington et al. studied the LOOK algorithm, but similar
results obtain for SCAN.) Figure 2 in Worthington et al. shows that C-LOOK
has an average response time just a few percent higher than
LOOK but that C-LOOK has a significantly lower variance in response
time for medium and heavy workloads. The intuitive reason for the
difference in variance is that LOOK (and SCAN) tend to favor requests
near the middle cylinders, whereas the C-versions do not have this
imbalance. The intuitive reason for the slower response time of C-LOOK
is the circular seek from one end of the disk to the farthest request
at the other end. This seek satisfies no requests. It only causes a small
performance degradation because the square-root dependency of seek
time on distance implies that a long seek isn't terribly expensive by
comparison with moderate-length seeks.
For the second part of the question, we observe that these algorithms
do not schedule to improve rotational latency; therefore, as seek times
decrease relative to rotational latency, the performance differences between
the algorithms will decrease.
12.7 Requests are not usually uniformly distributed. For example, a cylinder
containing the file system FAT or inodes can be expected to be accessed
more frequently than a cylinder that only contains files. Suppose you
know that 50 percent of the requests are for a small, fixed number of
cylinders.
a. Would any of the scheduling algorithms discussed in this chapter
be particularly good for this case? Explain your answer.
b. Propose a disk-scheduling algorithm that gives even better performance
by taking advantage of this hot spot on the disk.
c. File systems typically find data blocks via an indirection table,
such as a FAT in DOS or inodes in UNIX. Describe one or more
ways to take advantage of this indirection to improve the disk
performance.
Answer:
a. SSTF would take greatest advantage of the situation. FCFS could
cause unnecessary head movement if references to the high-demand
cylinders were interspersed with references to cylinders
far away.
b. Here are some ideas. Place the hot data near the middle of the
disk. Modify SSTF to prevent starvation. Add the policy that if
the disk becomes idle for more than, say, 50 ms, the operating
system generates an anticipatory seek to the hot region, since the
next request is more likely to be there.
c. Cache the metadata in primary memory, and locate a file's data
and metadata in close physical proximity on the disk. (UNIX
accomplishes the latter goal by allocating data and metadata in
regions called cylinder groups.)
12.8 Could a RAID Level 1 organization achieve better performance for read
requests than a RAID Level 0 organization (with nonredundant striping
of data)? If so, how?
Answer: Yes, a RAID Level 1 organization could achieve better performance
for read requests. When a read operation is performed, a RAID
Level 1 system can decide which of the two copies of the block should
be accessed to satisfy the request. This choice could be based on the
current location of the disk head and could therefore result in performance
optimizations by choosing a disk head which is closer to the
target data.
12.9 Consider a RAID Level 5 organization comprising five disks, with the
parity for sets of four blocks on four disks stored on the fifth disk. How
many blocks are accessed in order to perform the following?
a. A write of one block of data
b. A write of seven continuous blocks of data
Answer: 1) A write of one block of data requires the following: read
of the parity block, read of the old data stored in the target block,
computation of the new parity based on the differences between the
new and old contents of the target block, and the write of the parity
block and the target block. 2) Assume that the seven contiguous blocks
begin at a four-block boundary. A write of seven contiguous blocks
of data could be performed by writing the seven contiguous blocks,
writing the parity block of the first four blocks, reading the eighth block,
computing the parity for the next set of four blocks and writing the
corresponding parity block onto disk.
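The read-modify-write parity computation described in part 1 is plain XOR arithmetic; a minimal sketch with made-up block contents:

```python
def update_parity(old_parity: bytes, old_data: bytes, new_data: bytes) -> bytes:
    """RAID 5 small-write parity update:
    new_parity = old_parity XOR old_data XOR new_data."""
    return bytes(p ^ o ^ n for p, o, n in zip(old_parity, old_data, new_data))

# Parity of a 4-block stripe is the XOR of the four data blocks.
blocks = [bytes([i] * 4) for i in (1, 2, 3, 4)]
parity = bytes(a ^ b ^ c ^ d for a, b, c, d in zip(*blocks))

new_block1 = bytes([9] * 4)
parity = update_parity(parity, blocks[1], new_block1)
blocks[1] = new_block1

# The updated parity still equals the XOR of the (updated) data blocks.
assert parity == bytes(a ^ b ^ c ^ d for a, b, c, d in zip(*blocks))
```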
12.10 Compare the throughput achieved by a RAID Level 5 organization with
that achieved by a RAID Level 1 organization for the following:
a. Read operations on single blocks
b. Read operations on multiple contiguous blocks
Answer: 1) The amount of throughput depends on the number of
disks in the RAID system. A RAID Level 5 system comprising a parity block
for every set of four blocks spread over five disks can support four
to five operations simultaneously. A RAID Level 1 system comprising two
disks can support two simultaneous operations. Of course, there is
greater flexibility in RAID Level 1 as to which copy of a block could be
accessed and that could provide performance benefits by taking into
account position of disk head. 2) RAID Level 5 organization achieves
greater bandwidth for accesses to multiple contiguous blocks since the
adjacent blocks could be simultaneously accessed. Such bandwidth
improvements are not possible in RAID Level 1.
12.11 Compare the performance ofwrite operations achieved by a RAID Level
5 organization with that achieved by a RAID Level 1 organization.
Answer: A RAID Level 1 organization can perform writes by simply
issuing the writes to mirrored data concurrently. RAID Level 5, on the
other hand, would require the old contents of the parity block to be
read before it is updated based on the new contents of the target block.
This results in more overhead for the write operations on a RAID Level
5 system.
12.12 Assume that you have a mixed configuration comprising disks organized
as RAID Level 1 and as RAID Level 5 disks. Assume that the system
has flexibility in deciding which disk organization to use for storing a
particular file. Which files should be stored in the RAID Level 1 disks
and which in the RAID Level 5 disks in order to optimize performance?
Answer: Frequently updated data should be stored on the RAID Level
1 disks, while data that is read more often than it is written should
be stored on the RAID Level 5 disks.
12.13 Is there any way to implement truly stable storage? Explain your answer.
Answer: Truly stable storage would never lose data. The fundamental
technique for stable storage is to maintain multiple copies of the data,
so that if one copy is destroyed, some other copy is still available for
use. But for any scheme, we can imagine a large enough disaster that
all copies are destroyed.
12.14 The reliability of a hard-disk drive is typically described in terms of a
quantity called mean time between failures (MTBF). Although this quantity
is called a time, the MTBF actually is measured in drive-hours per
failure.
a. If a disk farm contains 1000 drives, each of which has a 750,000
hour MTBF, which of the following best describes how often a
drive failure will occur in that disk farm: once per thousand
years, once per century, once per decade, once per year, once per
month, once per week, once per day, once per hour, once per
minute, or once per second?
b. Mortality statistics indicate that, on the average, a U.S. resident
has about 1 chance in 1000 of dying between ages 20 and 21.
Deduce the MTBF hours for 20-year-olds. Convert this figure from
hours to years. What does this MTBF tell you about the expected
lifetime of a 20-year-old?
c. The manufacturer claims a 1-million hour MTBF for a certain
model of disk drive. What can you say about the number of
years that one of those drives can be expected to last?
Answer:
a. 750,000 drive-hours per failure divided by 1000 drives gives 750
hours per failure, or about 31 days, i.e., roughly once per month.
b. The human-hours per failure is 8760 (hours in a year) divided
by 0.001 failure, giving an MTBF of 8,760,000 hours, which
equals 1000 years. This tells us nothing about the expected
lifetime of a person of age 20.
c. The MTBF tells nothing about the expected lifetime. Hard disk
drives are generally designed to have a lifetime of 5 years. If such
a drive truly has a million-hour MTBF, it is very unlikely that the
drive will fail during its expected lifetime.
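The arithmetic above can be checked with a short script; the helper names are illustrative, not part of the exercise.

```python
def farm_failure_interval_hours(mtbf_hours, num_drives):
    """Expected time between failures somewhere in the whole farm."""
    return mtbf_hours / num_drives

def mtbf_from_failure_rate(failures_per_year, hours_per_year=8760):
    """Drive-hours-per-failure analogue applied to people."""
    return hours_per_year / failures_per_year

# (a) 750,000 / 1000 = 750 hours, i.e., about 31 days, once a month
interval_days = farm_failure_interval_hours(750_000, 1000) / 24

# (b) 8760 / 0.001 = 8,760,000 hours, i.e., 1000 years
mtbf_years = mtbf_from_failure_rate(0.001) / 8760
```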
12.15 Discuss the relative advantages and disadvantages of sector sparing
and sector slipping.
Answer:
Sector sparing can cause an extra track switch and extra rotational
latency, costing an unlucky request an extra 8 ms or so of time. Sector
slipping has less impact on future reads, but at sector-remapping
time it can require reading and writing an entire track's worth of
data to slip the sectors past the bad spot.
12.16 Discuss the reasons why the operating system might require accurate
information on how blocks are stored on a disk. How could the operating
system improve file system performance with this knowledge?
Answer: When allocating blocks for a file, the operating system could
allocate blocks that are geometrically close by on the disk if it had more
information regarding the physical location of the blocks on the disk.
In particular, it could allocate a block of data and then allocate the
second block of data in the same cylinder but on a different surface at a
rotationally optimal place so that the access to the next block could be
made with minimal cost.
12.17 The operating system generally treats removable disks as shared file
systems but assigns a tape drive to only one application at a time.
Give three reasons that could explain this difference in treatment of
disks and tapes. Describe additional features that would be required
of the operating system to support shared file-system access to a tape
jukebox. Would the applications sharing the tape jukebox need any
special properties, or could they use the files as though the files were
disk-resident? Explain your answer.
Answer:
a. Disks have fast random-access times, so they give good performance
for interleaved access streams. By contrast, tapes have
high positioning times. Consequently, if two users attempt to
share a tape drive for reading, the drive will spend most of its
time switching tapes and positioning to the desired data, and relatively
little time performing data transfers. This performance
problem is similar to the thrashing of a virtual memory system
that has insufficient physical memory.
b. Tape cartridges are removable. The owner of the data may wish
to store the cartridge off-site (far away from the computer) to
keep a copy of the data safe from a fire at the location of the
computer.
c. Tape cartridges are often used to send large volumes of data
from a producer of data to the consumer. Such a tape cartridge
is reserved for that particular data transfer and cannot be used
for general-purpose shared storage space.
To support shared file-system access to a tape jukebox, the operating
system would need to perform the usual file-system duties, including
. Manage a file-system name space over the collection of tapes
. Perform space allocation
. Schedule the I/O operations
The applications that access a tape-resident file system would need to
be tolerant of lengthy delays. For improved performance, it would be
desirable for the applications to be able to disclose a large number of
I/O operations so that the tape-scheduling algorithms could generate
efficient schedules.
12.18 What would be the effect on cost and performance if tape storage were
to achieve the same areal density as disk storage? (Areal density is the
number of gigabits per square inch.)
Answer: To achieve the same areal density as a magnetic disk, the areal
density of a tape would need to improve by two orders of magnitude.
This would cause tape storage to be much cheaper than disk storage.
The storage capacity of a tape would increase to more than 1 terabyte,
which could enable a single tape to replace a jukebox of tapes in today's
technology, further reducing the cost. The areal density has no direct
bearing on the data transfer rate, but the higher capacity per tape might
reduce the overhead of tape switching.
Exercises 89
12.19 You can use simple estimates to compare the cost and performance
of a terabyte storage system made entirely from disks with one that
incorporates tertiary storage. Suppose that magnetic disks each hold
10 gigabytes, cost $1000, transfer 5 megabytes per second, and have an
average access latency of 15 milliseconds. Suppose that a tape library
costs $10 per gigabyte, transfers 10 megabytes per second, and has
an average access latency of 20 seconds. Compute the total cost, the
maximum total data rate, and the average waiting time for a pure disk
system. If you make any assumptions about the workload, describe
and justify them. Now, suppose that 5 percent of the data are frequently
used, so they must reside on disk, but the other 95 percent are archived
in the tape library. Further suppose that 95 percent of the requests are
handled by the disk system and the other 5 percent are handled by the
library. What are the total cost, the maximum total data rate, and the
average waiting time for this hierarchical storage system?
Answer: First lets consider the pure disk system. One terabyte is
1024 GB. To be correct, we need 103 disks at 10 GB each. But since
this question is about approximations, we will simplify the arithmetic
by rounding off the numbers. The pure disk system will have 100
drives. The cost of the disk drives would be $100,000, plus about 20%
for cables, power supplies, and enclosures, i.e., around $120,000. The
aggregate data rate would be 100  5 MB/s, or 500 MB/s. The average
waiting time depends on the workload. Suppose that the requests are
for transfers of size 8 KB, and suppose that the requests are randomly
distributed over the disk drives. If the system is lightly loaded, a typical
request will arrive at an idle disk, so the response time will be 15 ms
access time plus about 2 ms transfer time. If the system is heavily
loaded, the delay will increase, roughly in proportion to the queue
length.
Now lets consider the hierarchical storage system. The total disk space
required is 5% of 1 TB, which is 50 GB. Consequently, we need 5 disks,
so the cost of the disk storage is $5,000 (plus 20%, i.e., $6,000). The cost
of the 950 GB tape library is $9500. Thus the total storage cost is $15,500.
The maximum total data rate depends on the number of drives in the
tape library.We suppose there is only 1 drive. Then the aggregate data
rate is 6  10 MB/s, i.e., 60 MB/s. For a lightly loaded system, 95% of
the requests will be satisfied by the disks with a delay of about 17 ms.
The other 5% of the requests will be satisfied by the tape library, with a
delay of slightly more than 20 seconds. Thus the average delay will be
(950.017+520)/100, or about 1 second. Even with an empty request
queue at the tape library, the latency of the tape drive is responsible
for almost all of the systems response latency, because 1/20th of the
workload is sent to a device that has a 20 second latency. If the system
is more heavily loaded, the average delay will increase in proportion
to the length of the queue of requests waiting for service from the tape
drive.
The hierarchical system is much cheaper. For the 95% of the requests
that are served by the disks, the performance is as good as a pure-disk
system. But the maximum data rate of the hierarchical system is much
worse than for the pure-disk system, as is the average response time.
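The estimates for both configurations can be reproduced with a short script. The constants come from the exercise statement; the 20% enclosure overhead, the aggregate rate of 5 disks at 5 MB/s plus one tape drive at 10 MB/s, and the function names follow the reasoning above.

```python
DISK_CAP_GB, DISK_COST, DISK_RATE_MBPS = 10, 1000, 5
TAPE_COST_PER_GB, TAPE_RATE_MBPS, TAPE_LATENCY_S = 10, 10, 20.0
OVERHEAD = 1.2   # cables, power supplies, enclosures

def pure_disk(total_gb=1000):
    drives = total_gb // DISK_CAP_GB          # ~100 drives
    cost = drives * DISK_COST * OVERHEAD      # ~$120,000
    rate = drives * DISK_RATE_MBPS            # 500 MB/s aggregate
    return drives, cost, rate

def hierarchical(total_gb=1000, hot=0.05, tape_drives=1):
    disk_gb = total_gb * hot                  # 50 GB stays on disk
    drives = int(disk_gb // DISK_CAP_GB)      # 5 drives
    cost = (drives * DISK_COST * OVERHEAD
            + (total_gb - disk_gb) * TAPE_COST_PER_GB)   # ~$15,500
    rate = drives * DISK_RATE_MBPS + tape_drives * TAPE_RATE_MBPS
    # 95% of requests see ~17 ms; 5% see the 20 s tape latency
    avg_delay_s = 0.95 * 0.017 + 0.05 * TAPE_LATENCY_S
    return drives, cost, rate, avg_delay_s
```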
12.20 Imagine that a holographic storage drive has been invented. Suppose
that a holographic drive costs $10,000 and has an average access time
of 40 milliseconds. Suppose that it uses a $100 cartridge the size of
a CD. This cartridge holds 40,000 images, and each image is a square
black-and-white picture with resolution 6000 × 6000 pixels (each pixel
stores 1 bit). Suppose that the drive can read or write 1 picture in 1
millisecond. Answer the following questions.
a. What would be some good uses for this device?
b. How would this device affect the I/O performance of a computing
system?
c. Which other kinds of storage devices, if any, would become obsolete
as a result of this device being invented?
Answer: First, calculate the performance of the device: 6000 × 6000 bits
per millisecond = 4394 KB per millisecond ≈ 4291 MB/sec(!). Clearly this
is orders of magnitude greater than current hard-disk technology, as
the best production hard drives do less than 40 MB/sec. The following
answers assume that the device cannot store data in chunks smaller
than 4 MB.
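The throughput figures can be verified directly; the variable names are illustrative, and the cartridge capacity is an extra derived figure, not part of the exercise.

```python
BITS_PER_IMAGE = 6000 * 6000        # 1 bit per pixel
IMAGE_BYTES = BITS_PER_IMAGE // 8   # 4,500,000 bytes, about 4.3 MB

kb_per_ms = IMAGE_BYTES / 1024                    # ~4394 KB each millisecond
mb_per_s = IMAGE_BYTES * 1000 / 1024 ** 2         # ~4291 MB/s sustained
cartridge_gb = 40_000 * IMAGE_BYTES / 1024 ** 3   # ~168 GB per cartridge
```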
a. This device would find great demand in the storage of images,
audio files, and other digital media.
b. Assuming that the interconnection speed to this device equaled
its throughput ability (that is, the other components of the system
could keep it fed), large-scale digital load and store performance
would be greatly enhanced. Manipulation time of the digital
object would stay the same, of course. The result would be greatly
enhanced overall performance.
c. Currently, objects of that size are stored on optical media, tape
media, and disk media. Presumably, demand for those would
decrease as the holographic storage became available. There are
likely to be uses for all of those media even in the presence of
holographic storage, so it is unlikely that any would become
obsolete. Hard disks would still be used for random access to
smaller items (such as user files). Tapes would still be used for
off-site storage, archiving, and disaster recovery, and optical disks
(CD-RW, for instance) for easy interchange with other computers
and low-cost bulk storage.
Depending on the size of the holographic device and its power
requirements, it would also find use in replacing solid-state memory
for digital cameras, MP3 players, and hand-held computers.
12.21 Suppose that a one-sided 5.25-inch optical-disk cartridge has an areal
density of 1 gigabit per square inch. Suppose that a magnetic tape has
an areal density of 20 megabits per square inch, and is 1/2 inch wide and
1800 feet long. Calculate an estimate of the storage capacities of these
two kinds of storage cartridges. Suppose that an optical tape exists that
has the same physical size as the tape, but the same storage density
as the optical disk. What volume of data could the optical tape hold?
What would be a marketable price for the optical tape if the magnetic
tape cost $25?
Answer: The recorded area of a 5.25-inch disk is about 19.625 square
inches (treating the usable surface as a circle 5 inches in diameter).
If we suppose that the diameter of the spindle hub is 1.5 inches, the
hub occupies an area of about 1.77 square inches, leaving 17.86 square
inches for data storage. Therefore, we estimate the storage capacity of
the optical disk to be 2.2 gigabytes.
The surface area of the tape is 10,800 square inches, so its storage
capacity is about 26 gigabytes.
If the 10,800 square inches of tape had a storage density of 1 gigabit per
square inch, the capacity of the tape would be about 1,350 gigabytes, or
1.3 terabytes. If we charge the same price per gigabyte for the optical
tape as for magnetic tape, the optical tape cartridge would cost about
50 times more than the magnetic tape, i.e., $1,250.
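These capacity estimates can be reproduced as follows. The 5-inch recording diameter and the function names are assumptions that match the figures above, and the script uses decimal unit conversions (8 Gbit = 1 GB), which yields 27 GB for the magnetic tape rather than the binary-GB figure of 26 quoted in the text.

```python
import math

def optical_disk_gb(recording_diam_in=5.0, hub_diam_in=1.5,
                    gbit_per_sq_in=1.0):
    """Usable annulus area times areal density, converted to GB."""
    area = math.pi * ((recording_diam_in / 2) ** 2 - (hub_diam_in / 2) ** 2)
    return area * gbit_per_sq_in / 8

def tape_gb(width_in=0.5, length_ft=1800, mbit_per_sq_in=20):
    area = width_in * length_ft * 12          # surface in square inches
    return area * mbit_per_sq_in / 1000 / 8   # Mbit -> GB

magnetic = tape_gb()                          # ~27 GB
optical_tape = tape_gb(mbit_per_sq_in=1000)   # ~1350 GB
price = optical_tape / magnetic * 25          # 50x the $25 tape: $1250
```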
12.22 Discuss how an operating system could maintain a free-space list for a
tape-resident file system. Assume that the tape technology is append-only,
and that it uses the EOT mark and locate, space, and read position
commands as described in Section 12.9.2.1.
Answer: Since this tape technology is append-only, all the free space
is at the end of the tape. The location of this free space does not need
to be stored at all, because the space command can be used to position
to the EOT mark. The amount of available free space after the EOT mark
can be represented by a single number. It may be desirable to maintain
a second number to represent the amount of space occupied by files
that have been logically deleted (but their space has not been reclaimed
since the tape is append-only) so that we can decide when it would pay
to copy the nondeleted files to a new tape in order to reclaim the old
tape for reuse. We can store the free and deleted space numbers on disk
for easy access. Another copy of these numbers can be stored at the end
of the tape as the last data block. We can overwrite this last data block
when we allocate new storage on the tape.
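The two bookkeeping numbers can be sketched in a few lines. The class name, block-based accounting, and the compaction threshold are all illustrative choices, not from the text.

```python
class AppendOnlyTapeFS:
    """Track free space past the EOT mark plus logically deleted space."""

    def __init__(self, capacity_blocks):
        self.capacity = capacity_blocks
        self.free = capacity_blocks   # blocks available past the EOT mark
        self.deleted = 0              # blocks owned by logically deleted files

    def append(self, nblocks):
        # all allocation happens at the end of the tape
        if nblocks > self.free:
            raise IOError("tape full")
        self.free -= nblocks

    def delete(self, nblocks):
        # space is not reclaimed in place; it becomes a compaction candidate
        self.deleted += nblocks

    def worth_compacting(self, threshold=0.25):
        """Copy live files to a fresh tape once deleted space dominates."""
        return self.deleted > threshold * self.capacity
```

Both counters would be persisted on disk and mirrored in the tape's last data block, as the answer describes.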
Chapter 13
I/O Systems
The role of the operating system in computer I/O is to manage and control I/O
operations and I/O devices. Although related topics appear in other chapters,
here we bring together the pieces to paint a complete picture. In this chapter we
describe I/O structure, devices, device drivers, caching, and terminal I/O.
Exercises
13.1 When multiple interrupts from different devices appear at about the
same time, a priority scheme could be used to determine the order in
which the interrupts would be serviced. Discuss what issues need to
be considered in assigning priorities to different interrupts.
Answer: A number of issues need to be considered in order to determine
the priority scheme to be used to determine the order in which the
interrupts need to be serviced. First, interrupts raised by devices should
be given higher priority than traps generated by the user program; a
device interrupt can therefore interrupt code used for handling system
calls. Second, interrupts that control devices might be given higher priority
than interrupts that simply perform tasks such as copying data
served up by a device to user/kernel buffers, since such tasks can always
be delayed. Third, devices that have real-time constraints on when their
data are handled should be given higher priority than other devices.
Also, devices that do not have any form of buffering for their data would
have to be assigned higher priority, since the data could be available
only for a short period of time.
13.2 What are the advantages and disadvantages of supporting memory-mapped
I/O to device control registers?
Answer: The advantage of supporting memory-mapped I/O to device
control registers is that it eliminates the need for special I/O instructions
from the instruction set and therefore also does not require the
enforcement of protection rules that prevent user programs from executing
these I/O instructions. The disadvantage is that the resulting
flexibility needs to be handled with care; the memory translation units
need to ensure that the memory addresses associated with the device
control registers are not accessible by user programs, in order to ensure
protection.
13.3 Consider the following I/O scenarios on a single-user PC.
a. A mouse used with a graphical user interface
b. A tape drive on a multitasking operating system (assume no
device preallocation is available)
c. A disk drive containing user files
d. A graphics card with direct bus connection, accessible through
memory-mapped I/O
For each of these I/O scenarios, would you design the operating system
to use buffering, spooling, caching, or a combination? Would you use
polled I/O, or interrupt-driven I/O? Give reasons for your choices.
Answer:
a. A mouse used with a graphical user interface
Buffering may be needed to record mouse movement during
times when higher-priority operations are taking place. Spooling
and caching are inappropriate. Interrupt-driven I/O is most
appropriate.
b. A tape drive on a multitasking operating system (assume no
device preallocation is available)
Buffering may be needed to manage throughput differences between
the tape drive and the source or destination of the I/O.
Caching can be used to hold copies of data that reside on the
tape, for faster access. Spooling could be used to stage data to
the device when multiple users desire to read from or write to it.
Interrupt-driven I/O is likely to allow the best performance.
c. A disk drive containing user files
Buffering can be used to hold data while in transit from user
space to the disk, and vice versa. Caching can be used to hold
disk-resident data for improved performance. Spooling is not
necessary because disks are shared-access devices. Interrupt-driven
I/O is best for devices such as disks that transfer data
at slow rates.
d. A graphics card with direct bus connection, accessible through
memory-mapped I/O
Buffering may be needed to control multiple access and for performance
(double-buffering can be used to hold the next screen
image while displaying the current one). Caching and spooling
are not necessary due to the fast and shared-access natures of
the device. Polling and interrupts are useful only for input and
for I/O-completion detection, neither of which is needed for a
memory-mapped device.
13.4 In most multiprogrammed systems, user programs access memory
through virtual addresses, while the operating system uses raw physical
addresses to access memory. What are the implications of this design
for the initiation of I/O operations by the user program and their execution
by the operating system?
Answer: The user program typically specifies a buffer for data to be
transmitted to or from a device. This buffer exists in user space and
is specified by a virtual address. The kernel needs to issue the I/O
operation and needs to copy data between the user buffer and its own
kernel buffer before or after the I/O operation. In order to access the user
buffer, the kernel needs to translate the virtual address provided by the
user program to the corresponding physical address within the context
of the user program's virtual address space. This translation is typically
performed in software and therefore incurs overhead. Also, if the user
buffer is not currently present in physical memory, the corresponding
page(s) need to be obtained from the swap space. This operation might
require careful handling and might delay the data copy operation.
13.5 What are the various kinds of performance overheads associated with
servicing an interrupt?
Answer: When an interrupt occurs, the currently executing process
is interrupted and its state is stored in the appropriate process control
block. The interrupt service routine is then dispatched in order to deal
with the interrupt. On completion of the handling of the interrupt, the
state of the process is restored and the process is resumed. Therefore, the
performance overheads include the cost of saving and restoring process
state and the cost of flushing the instruction pipeline and restoring the
instructions into the pipeline when the process is restarted.
13.6 Describe three circumstances underwhich blocking I/O should be used.
Describe three circumstances under which nonblocking I/O should be
used. Why not just implement nonblocking I/O and have processes
busy-wait until their device is ready?
Answer: Generally, blocking I/O is appropriate when the process will
only be waiting for one specific event. Examples include a disk, tape, or
keyboard read by an application program. Non-blocking I/O is useful
when I/O may come from more than one source and the order of the
I/O arrival is not predetermined. Examples include network daemons
listening to more than one network socket, window managers that accept
mouse movement as well as keyboard input, and I/O-management
programs, such as a copy command that copies data between I/O devices.
In the last case, the program could optimize its performance by
buffering the input and output and using non-blocking I/O to keep
both devices fully occupied.
Non-blocking I/O is more complicated for programmers, because of
the asynchronous rendezvous that is needed when an I/O occurs. Also,
busy waiting is less efficient than interrupt-driven I/O, so overall
system performance would decrease.
13.7 Typically, at the completion of a device I/O, a single interrupt is raised
and appropriately handled by the host processor. In certain settings,
however, the code that is to be executed at the completion of the I/O
can be broken into two separate pieces, one of which executes immediately
after the I/O completes and schedules a second interrupt for
the remaining piece of code to be executed at a later time. What is the
purpose of using this strategy in the design of interrupt handlers?
Answer: The purpose of this strategy is to ensure that the most critical
aspect of the interrupt-handling code is performed first and the less
critical portions of the code are delayed until later. For instance,
when a device finishes an I/O operation, the device-control operations
corresponding to declaring the device as no longer busy are
more important, in order to allow future operations to be issued. However,
the task of copying the data provided by the device to the appropriate user
or kernel memory regions can be delayed to a future point when the CPU
is idle. In such a scenario, a later, lower-priority interrupt handler is
used to perform the copy operation.
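The split can be sketched with a deferred-work queue. This is a simplification of the two-part handler described above, and all names (top_half, bottom_half, the device dictionary) are illustrative.

```python
from collections import deque

deferred = deque()   # work left for the lower-priority handler

def top_half(device, data):
    """First-level handler: runs at interrupt time and does only the
    urgent part, marking the device free so new I/O can be issued."""
    device["busy"] = False
    deferred.append(data)            # postpone the bulky copy

def bottom_half(user_buffer):
    """Second-level handler: runs later (e.g., when the CPU is idle)
    and copies the queued data into the user or kernel buffer."""
    while deferred:
        user_buffer.extend(deferred.popleft())
```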
13.8 Some DMA controllers support direct virtualmemory access,where the
targets of I/O operations are specified as virtual addresses and a translation
from virtual to physical address is performed during the DMA.
How does this design complicate the design of the DMA controller?
What are the advantages of providing such a functionality?
Answer: Direct virtual memory access allows a device to perform a
transfer between two memory-mapped devices without the intervention
of the CPU or the use of main memory as a staging ground; the device
simply issues memory operations to the memory-mapped addresses of
a target device, and the ensuing virtual address translation guarantees
that the data is transferred to the appropriate device. This functionality,
however, comes at the cost of having to support virtual address translation
on addresses accessed by a DMA controller and requires the addition
of an address translation unit to the DMA controller. The address
translation results in both hardware and software costs and might also
result in coherence problems between the data structures maintained
by the CPU for address translation and the corresponding structures used
by the DMA controller. These coherence issues would also need to be
dealt with and would result in a further increase in system complexity.
13.9 UNIX coordinates the activities of the kernel I/O components by manipulating
shared in-kernel data structures, whereas Windows NT uses
object-oriented message passing between kernel I/O components. Discuss
three pros and three cons of each approach.
Answer: Three pros of the UNIX method:
. Very efficient, with low overhead and little data movement
. Fast implementation: no coordination is needed with other kernel components
. Simple, so there is less chance of data loss
Three cons:
. No data protection, and more possible side effects from changes, so it is more difficult to debug
. Difficult to implement new I/O methods: new data structures are needed rather than just new objects
. Complicated kernel I/O subsystem, full of data structures, access routines, and locking mechanisms
13.10 Write (in pseudocode) an implementation of virtual clocks, including
the queuing and management of timer requests for the kernel and
applications. Assume that the hardware provides three timer channels.
Answer: Each channel would run the following algorithm:
/** data definitions **/
// a list of timer requests sorted in earliest-time-first order
List interruptList
// the list that associates a request with an entry in interruptList
List requestList
// an interrupt-driven hardware timer
Timer timer
while (true) {
/** program the channel with the earliest expiration time **/
timer.setTime(interruptList.next());
/** an interrupt will occur at time timer.setTime;
block until the timer expires **/
waitForInterrupt(timer);
/** wake the request associated with this expiration **/
notify(requestList.next());
}
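A runnable version of the same idea multiplexes many timer requests onto one hardware channel with a min-heap; the class and method names are illustrative, and the hardware timer is reduced to the next_deadline/on_interrupt pair.

```python
import heapq

class VirtualClock:
    """Many timer requests share one hardware channel; the channel is
    always programmed with the earliest outstanding expiration time."""

    def __init__(self):
        self._heap = []   # (expire_time, request_id), earliest first

    def add_request(self, expire_time, request_id):
        heapq.heappush(self._heap, (expire_time, request_id))

    def next_deadline(self):
        """The value to load into the hardware timer channel."""
        return self._heap[0][0] if self._heap else None

    def on_interrupt(self, now):
        """Called from the timer ISR: notify every expired request."""
        fired = []
        while self._heap and self._heap[0][0] <= now:
            fired.append(heapq.heappop(self._heap)[1])
        return fired
```

With three hardware channels, three such structures could run independently, or one structure could feed whichever channel is free.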
13.11 Discuss the advantages and disadvantages of guaranteeing reliable
transfer of data between modules in the STREAMS abstraction.
Answer: Reliable transfer of data requires modules to check whether
space is available on the target module and to block the sending module
if space is not available. This check requires extra communication
between the modules, but the overhead enables the system to provide
a stronger abstraction than one which does not guarantee reliable
transfer. The stronger abstraction typically reduces the complexity of
the code in the modules. In the STREAMS abstraction, however, there
is unreliability introduced by the driver end, which is allowed to drop
messages if the corresponding device cannot handle the incoming data.
Consequently, even if there is reliable transfer of data between the modules,
messages could be dropped at the device if the corresponding
buffer fills up. This requires retransmission of data and special code for
handling such retransmissions, thereby somewhat limiting the advantages
that are associated with reliable transfer between the modules.