CMPSC 473 - Project #1 - Interprocess Communication

Due Date: February 3, 2009. 100 points

Single person project. Do your own work!

In this project, you will learn about various forms of interprocess communication (IPC) available to application processes. You will build IPC channels using UNIX pipes, Linux IPC, and shared memory, and enable two processes (a sender and a receiver) to communicate a file via these IPC channels.

The system consists of a single program that creates two processes, a parent and a child where the child is created via the fork system call. The program causes a parent process to read a file's contents and send them all to the child process via an IPC channel. The child process writes the received contents to another file.

In this project, you will have to use a variety of Linux system calls to open/read/write files, to use pipes, to use Linux IPC, and to use Linux shared memory. You will also have to build and use a library that provides an abstraction of pipes over shared memory (you will have a fair bit of guidance for this).

All versions of the program work as follows: type program-name input-file output-file at the prompt. The input files will be provided. The output file created must match the input file (diff must report no difference).

Conceptually, the tasks in the project are straightforward given the directions below, but there are a number of places where you can make mistakes. In particular, you will need to understand how to use the system concepts correctly, or various errors can occur. You should use gdb to debug these problems. There are ways to "attach" the debugger to either the parent or the child.

The project will consist of the following tasks:

Download the following tarball Project 1 Code to your CSE account file space. You should have one file p1-ipc.tgz. There is a Makefile. You should familiarize yourself with its format and using the make command. You can build all the programs using the command make at the prompt or build individual programs, via make cse473-p1, make cse473-ipc, and make cse473-pipe.
Implement a pipe-based IPC mechanism between the parent and the child (part 1). First, create a pipe to share between the parent and child. Read up on the meaning of a pipe (man pipe and via the Internet). A pipe on Linux is a unidirectional channel with a read end and a write end; the parent will write to the pipe, and the child will read from the pipe.
This version is to be deployed in cse473-p1.c. You will need to implement the following functions. See the function main (provided) for the overall flow. We will use the same main for all three parts.
- Create the pipe via the pipe system call in create_ipc. Creating a pipe creates 2 file descriptors, the first to read from the pipe and the second to write to the pipe. You will return these two file descriptors to the main function via arguments read and write.
- After the fork, the parent will call setup_ipc_parent. This function needs to redirect stdout (file descriptor == 1, in a #define STDOUT) to refer to the write-side of the pipe (write). Return the value of stdout via the variable new. You must also close the read-side of the pipe (read), in the parent.
- Also, the child will call setup_ipc_child. Here, the child will redirect stdin (file descriptor == 0, in a #define STDIN) to refer to the read-side of the pipe (read). Return the value of stdin via the variable new. You must also close the write-side of the pipe (write).
- You will then construct send_file (running in the parent) to read the input-file (first argument on command line) and send it via stdout. You can use the stat system call to get the size of the file. You will need to use the file system calls to read the file. You may send the entire file in one go.
- You will then construct rcv_file (running in the child) to read the data from stdin and write to the output-file (second argument). You must account for potential race conditions by reading multiple times (in case the child reads before the parent writes). The child must terminate after writing the complete file (the last byte is the special character EOF). Again, you will need to use file system calls to write the data to the file.
- Test files are here
- HINT: A big hint for how to set up these pipes is given in pipes overview. Your task will be to adapt this implementation to the specified interface (which you cannot change), to use the file system calls to manipulate the input and output files, and to detect that the file transfer is complete (check for EOF).
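The pipe steps above can be sketched roughly as follows. This is only an illustration under stated assumptions, not the required interface: the helper name send_through_pipe and its arguments are hypothetical, it sends a string rather than a file, and the child here detects the end of the data when read returns 0 (all write ends closed) rather than by looking for an EOF byte.

```c
#include <assert.h>
#include <fcntl.h>
#include <stdio.h>
#include <string.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/* Hypothetical helper: the parent sends msg through a pipe via its stdout;
   the child copies whatever arrives on its stdin into out_path.
   Returns 0 on success, -1 on error. */
int send_through_pipe(const char *msg, const char *out_path)
{
    assert(msg != NULL && out_path != NULL);

    int fds[2];                          /* fds[0] = read end, fds[1] = write end */
    if (pipe(fds) == -1) return -1;

    fflush(stdout);                      /* do not let buffered output leak into the pipe */
    pid_t pid = fork();
    if (pid == -1) { close(fds[0]); close(fds[1]); return -1; }

    if (pid == 0) {                      /* child: reader */
        close(fds[1]);                   /* close the unused write end */
        if (fds[0] != STDIN_FILENO) {
            if (dup2(fds[0], STDIN_FILENO) == -1) _exit(1);
            close(fds[0]);
        }
        int out = open(out_path, O_WRONLY | O_CREAT | O_TRUNC, 0644);
        if (out == -1) _exit(1);

        char buf[256];
        ssize_t n;
        /* read() may return only part of the data, so loop until it returns 0 */
        while ((n = read(STDIN_FILENO, buf, sizeof buf)) > 0)
            if (write(out, buf, (size_t)n) != n) _exit(1);
        close(out);
        _exit(n == 0 ? 0 : 1);
    }

    /* parent: writer */
    close(fds[0]);                       /* close the unused read end */
    int saved = dup(STDOUT_FILENO);      /* remember the real stdout */
    if (saved == -1 || dup2(fds[1], STDOUT_FILENO) == -1) {
        if (saved != -1) close(saved);
        close(fds[1]);                   /* child sees end of data and exits */
        waitpid(pid, NULL, 0);
        return -1;
    }
    close(fds[1]);

    size_t len = strlen(msg);
    ssize_t w = write(STDOUT_FILENO, msg, len);

    dup2(saved, STDOUT_FILENO);          /* restore stdout; this closes the last write end */
    close(saved);

    int status;
    if (waitpid(pid, &status, 0) == -1) return -1;
    return (w == (ssize_t)len && WIFEXITED(status) && WEXITSTATUS(status) == 0) ? 0 : -1;
}
```

Note that if either process forgets to close the pipe end it does not use, the reader in this sketch never sees the end of the data and hangs.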
In the second task (part 2), implement a similar mechanism (parent sends file data to child) using Linux IPC. This version will be implemented in file cse473-p1-ipc.c. You will need to learn the Linux IPC system. See Linux IPC for information. The implementation for Linux IPC consists of the following steps:
- Create the Linux IPC object in create_ipc. You will need to study Linux IPC to determine the system calls to use. A single index to an IPC object is created that you will use for reading and writing (i.e., assign the same index to the arguments read and write).
- After the fork, the parent and child will call setup_ipc_parent/child as above. As we already have an IPC index, just assign the value to the variable new in each.
- You will then construct send_file (running in the parent) to read the input-file (first argument on command line) and send it via the IPC channel. In this case, the IPC buffer is limited (see the structure msgbuf_t used to send IPCs), so you will send the file data one line at a time (I promise not to make the lines too long).
  
  Unlike above, you will need to tell the child when the last message has been sent, so it can terminate. The IPC message types are used for this. Use MTYPE for file data messages and MEND for the termination message.
- You will then construct rcv_file to read the data from the IPC channel and write to the output-file (second argument). Unlike part 1, there will need to be multiple reads. Further, you will need to distinguish between two types of IPC messages: (1) MTYPE, which tells the child that this is file data and (2) MEND, which tells the child that there is no more data, so the child can terminate. Again, you will need to use file system calls to write the data to the file (basically the same as part 1).
- Test files are the same as for part one -- here
- HINT: To read each line, I wrote a little readline that uses fgetc to get a character at a time and checks for the end of the line (character '\n') and EOF to take appropriate action. Check the header file cse473-p1.h for how to use it.
- HINT2: I found that leftover information in the file read buffer and the message buffer msg.text could cause some bugs, so I zero these on each line using memset.
- HINT3: If the reader (child in rcv_file) is ahead of the writer (parent in send_file), then the reader may spin on reading. So, if there is no message ready (either MTYPE or MEND), then I do a sched_yield which stops the child and starts the parent. An example of this is given in the part 3 code shell.
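The message protocol in the steps above can be sketched as follows, assuming that "Linux IPC" here means System V message queues (msgget, msgsnd, msgrcv). The names queue_demo, struct demo_msg, DEMO_DATA, and DEMO_END are hypothetical stand-ins for illustration; the real msgbuf_t, MTYPE, and MEND definitions come from the project header. For brevity this sketch uses a blocking receive instead of the polling with sched_yield described in HINT3.

```c
#include <assert.h>
#include <string.h>
#include <sys/ipc.h>
#include <sys/msg.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

#define DEMO_DATA 1L      /* hypothetical stand-in for MTYPE */
#define DEMO_END  2L      /* hypothetical stand-in for MEND  */
#define DEMO_TEXT 128

struct demo_msg {         /* hypothetical stand-in for msgbuf_t */
    long mtype;           /* required first field; must be > 0 */
    char text[DEMO_TEXT];
};

/* Parent sends each line as a DEMO_DATA message, then one DEMO_END message.
   Returns the number of data messages the child received (at most 254 in
   this sketch, since the count travels in the exit status), or -1 on error. */
int queue_demo(const char *lines[], int nlines)
{
    int qid = msgget(IPC_PRIVATE, IPC_CREAT | 0600);
    if (qid == -1) return -1;

    pid_t pid = fork();
    if (pid == -1) { msgctl(qid, IPC_RMID, NULL); return -1; }

    if (pid == 0) {                       /* child: receiver */
        struct demo_msg m;
        int count = 0;
        for (;;) {
            memset(&m, 0, sizeof m);      /* clear leftovers from the last message */
            /* msgtyp 0: take the first queued message of any type; flags 0: block */
            if (msgrcv(qid, &m, sizeof m.text, 0, 0) == -1) _exit(255);
            if (m.mtype == DEMO_END) break;
            count++;                      /* a real receiver would write m.text to the file */
        }
        _exit(count);
    }

    struct demo_msg m;                    /* parent: sender */
    int ok = 1;
    for (int i = 0; i < nlines && ok; i++) {
        assert(strlen(lines[i]) < DEMO_TEXT);
        memset(&m, 0, sizeof m);
        m.mtype = DEMO_DATA;
        strcpy(m.text, lines[i]);
        if (msgsnd(qid, &m, sizeof m.text, 0) == -1) ok = 0;
    }
    memset(&m, 0, sizeof m);
    m.mtype = DEMO_END;
    if (ok && msgsnd(qid, &m, sizeof m.text, 0) == -1) ok = 0;

    if (!ok) msgctl(qid, IPC_RMID, NULL); /* wakes a blocked receiver with an error */
    int status;
    pid_t r = waitpid(pid, &status, 0);
    if (ok) msgctl(qid, IPC_RMID, NULL);  /* queues persist until explicitly removed */

    if (!ok || r == -1 || !WIFEXITED(status) || WEXITSTATUS(status) == 255) return -1;
    return WEXITSTATUS(status);
}
```

System V queues outlive the processes that created them, so removing the queue when done (or cleaning up with ipcrm) avoids leaving stale objects behind.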
In the third task (part 3), we will implement yet another IPC mechanism, this time based on Linux shared memory. This version will be implemented in file cse473-p1-shm.c. Unlike the first two tasks, where we use the system primitives directly, we will build a new pipe interface that uses Linux shared memory. Therefore, you will implement an IPC library called mypipe in cse473-p1-mypipe.c that you use in cse473-p1-shm.c. The semantics of mypipe are really those of IPC, as there is one index to read and write (like part 2). You will need to learn the Linux shared memory interface. See Linux IPC for information.
The implementation of the mypipe library consists of the following steps:
- Write the function mypipe_init. Here, you need to instantiate the shared memory region. This will generate an index to identify that region. Again, there is only one identifier for the shared memory object, so please return that in both read and write.
- Write the function mypipe_attach. This function will be called by both the child and the parent from their respective setup_ipc functions. You will need to identify the Linux shared memory function that assigns a shared memory region to an address space. Note that the region may be placed in a different location in the child and in the parent, but it will refer to the same memory. Return index 0, which will refer to our one and only pipe, stored in the variable only_pipe. The variable must be assigned separately in the parent and the child (i.e., even though it is a global variable, the parent and child will maintain distinct values after the fork).
  
  The structure for accessing and managing shared memory is defined by the struct mypipe data structure. The fields for managing the data come first, followed by a small buffer. Since the buffer size is limited, when the writer (parent) gets to the end, it continues at the beginning. I provide you with code to prevent the child from reading beyond where the parent has written and to prevent the parent from overwriting data that the child has not yet read (at the beginning of rcv_file and send_file, respectively). Each yields the processor when this occurs, so you can write your code assuming that all reads and writes will be legal (although you'll have to update read and write values in only_pipe to account for wrap-around).
  
  Now about the fields in struct mypipe. There is a read field, which tells both the parent and child how far the child has read. NOTE: Only the child should update the read field. There is a write field, which tells both the parent and child how far the parent has written. NOTE: Only the parent should update the write field. The size field says how big the buffer is, and the shm is the buffer where the data will be read and written.
- Write the function mypipe_read. A pipe in shared memory is represented by a single buffer with two indices read and write that maintain the state of a buffer of size SHM_SZ. read shows the place in the buffer to which the child has read, and write shows the place in the buffer to which the parent has written. Since the buffer is of finite size, the read and write pointers may "wrap-around" the buffer to get back to the beginning. Your code needs to account for this.
- Write the function mypipe_write. This is the write operation for the pipe, which copies data and updates the write pointer. It must address the possibility of the buffer wrapping as well.
- Test files are the same as for part one -- here
- HINT: Like the IPC case, I also strongly recommend using memset to clear your buffers (do not clear the only_pipe buffer).
- HINT 2: Compute your read and write pointers following wrap-around using the following code (for read): only_pipe->read = ( read + bytes_read ) % only_pipe->size;, where read is the read index prior to this read and bytes_read is how many bytes have been read in this pass. For only_pipe->write, just replace the read value with the write index.
- HINT 3: Please be sure that you only update only_pipe->read in the child and only update only_pipe->write in the parent.
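The wrap-around arithmetic from HINT 2 can be illustrated with a small single-process sketch. The struct demo_ring below is hypothetical and only loosely mirrors the read, write, size, and buffer fields described above; it is not the provided struct mypipe. It also assumes, as the provided guard code is said to ensure, that every call is legal (enough free space to write, enough data to read).

```c
#include <assert.h>
#include <string.h>

#define DEMO_SZ 8                 /* tiny on purpose so wrap-around happens quickly */

struct demo_ring {                /* hypothetical; not the provided struct mypipe */
    int read;                     /* how far the reader has read    */
    int write;                    /* how far the writer has written */
    int size;                     /* capacity of buf                */
    char buf[DEMO_SZ];
};

/* Copy n bytes into the ring, splitting the copy at the end of the buffer. */
void ring_write(struct demo_ring *r, const char *src, int n)
{
    assert(n >= 0 && n <= r->size);
    int first = r->size - r->write;           /* room before the end of the buffer */
    if (first > n) first = n;
    memcpy(r->buf + r->write, src, (size_t)first);
    memcpy(r->buf, src + first, (size_t)(n - first));   /* wrapped remainder, may be 0 */
    r->write = (r->write + n) % r->size;      /* only the writer updates write */
}

/* Copy n bytes out of the ring, splitting the copy at the end of the buffer. */
void ring_read(struct demo_ring *r, char *dst, int n)
{
    assert(n >= 0 && n <= r->size);
    int first = r->size - r->read;            /* bytes available before the end */
    if (first > n) first = n;
    memcpy(dst, r->buf + r->read, (size_t)first);
    memcpy(dst + first, r->buf, (size_t)(n - first));   /* wrapped remainder, may be 0 */
    r->read = (r->read + n) % r->size;        /* only the reader updates read */
}
```

For example, with an 8-byte buffer, writing 6 bytes, reading them back, and then writing 5 more places 2 bytes at the end of the buffer and 3 at the beginning, leaving the write index at (6 + 5) % 8 = 3.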
The implementation for Linux shared memory IPC using mypipe consists of the following steps. You must use the mypipe interface defined in cse473-p1-mypipe.c.
- Create the mypipe object in create_ipc by calling mypipe_init. You will return an index to the IPC object via argument index.
- After the fork, the parent and child will call setup_ipc_parent/child as before. You will need to call mypipe_attach to attach the pipe in each address space. Assign new in each.
- You will then construct send_file (running in the parent) to read the input-file (first argument on command line) and send it via the mypipe channel. As in the second task, the IPC buffer is limited, so you will send the file data one line at a time (I promise not to make the lines too long).
  
  You will need to tell the child when the last message has been sent by sending an EOF character at the end of the last write. The child must look for that character, not write it to the file, and terminate.
- You will then construct rcv_file to read the data from mypipe and write to the output-file (second argument). Like the second task, there will need to be multiple reads. Further, you will need to detect the last byte written by looking for EOF in the last byte of each read. The child must write the entire file, with no extra characters, and terminate.
- Later, we will provide the sample files for you to test.
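The create-then-attach flow in the steps above can be sketched as follows, assuming the System V shared memory calls shmget and shmat are the ones intended. Everything named here (shm_demo, struct demo_region, DEMO_TEXT) is hypothetical and much simpler than mypipe: the parent publishes one string and sets a flag, and the child yields the processor until the flag is set.

```c
#include <assert.h>
#include <sched.h>
#include <signal.h>
#include <stdatomic.h>
#include <string.h>
#include <sys/ipc.h>
#include <sys/shm.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

#define DEMO_TEXT 64

struct demo_region {              /* hypothetical layout, not struct mypipe */
    atomic_int ready;             /* set by the parent once text is valid */
    char text[DEMO_TEXT];
};

/* Returns the length of msg as seen by the child, or -1 on error. */
int shm_demo(const char *msg)
{
    assert(strlen(msg) < DEMO_TEXT);

    /* One identifier names the region; a new segment starts zero-filled. */
    int id = shmget(IPC_PRIVATE, sizeof(struct demo_region), IPC_CREAT | 0600);
    if (id == -1) return -1;

    pid_t pid = fork();
    if (pid == -1) { shmctl(id, IPC_RMID, NULL); return -1; }

    /* Each process attaches separately; the two addresses may differ,
       but they refer to the same memory. */
    struct demo_region *r = shmat(id, NULL, 0);
    if (r == (void *)-1) {
        if (pid == 0) _exit(255);
        kill(pid, SIGKILL);                   /* do not leave the child spinning */
        waitpid(pid, NULL, 0);
        shmctl(id, IPC_RMID, NULL);
        return -1;
    }

    if (pid == 0) {                           /* child: reader */
        while (atomic_load(&r->ready) == 0)
            sched_yield();                    /* nothing to read yet: let the parent run */
        int len = (int)strlen(r->text);
        shmdt(r);
        _exit(len);
    }

    strcpy(r->text, msg);                     /* parent: writer */
    atomic_store(&r->ready, 1);               /* publish only after the data is in place */

    int status;
    pid_t w = waitpid(pid, &status, 0);
    shmdt(r);
    shmctl(id, IPC_RMID, NULL);               /* segments persist until removed */
    if (w == -1 || !WIFEXITED(status) || WEXITSTATUS(status) == 255) return -1;
    return WEXITSTATUS(status);
}
```

The flag is an atomic so that the child cannot observe ready before the text is written; the provided guard code in the project plays a similar role for the read and write indices.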
Please answer the following questions regarding the project:
1. Given the storage hierarchy on slide 31 of the 1/13 lecture, what would be the level in the hierarchy that would provide the best IPC performance? Explain as precisely as you can why we don't use that level to implement IPC.
2. In the project Makefile, list the specifications that are used to build the mypipe library and link it into the executable file cse473-mypipe.
3. In the Linux IPC implementation (the second task), what happens to your receiver (child) when it asks to receive a message, but the sender (parent) has not yet sent a message? Why is this the best approach from a performance standpoint?
4. The function time measures the overall time, user-space time, and system computing time for a process. Run time for each version of the IPC program using a file called (to be provided), and include the results. Based on the measurements, use the user and system times to explain the differences in performance among the three implementations.
Your submission will consist of three things: (1) A tarball of your code, made from make tar; (2) the performance results from running htest in question 4 above; and (3) answers to the project questions.
Grading:
- Correct solution for task one: 20 points
- Correct solution for task two: 25 points
- Correct solution for task three: 30 points
- Correct submission: 5 points
- Correct answers to questions: 20 points

Trent Jaeger