153: lab5

New: Your cs153 directory is /class/cs153/cs153_07win/<login>

Building the Kernel: starting with kernel 2.6.9, an up-to-date UML (User-Mode Linux) is included in the kernel source itself, so we can build UML without any patch.
- Do it at school:
  1. Make sure you are in your cs153 directory:
    cd /class/cs153/cs153_07win/<login>
  2. Untar the kernel source code into your directory:
    tar zxvf /class/cs153/cs153_07win/lgao/linux-2.6.18-uml.tar.gz
  3. Go to the kernel source directory:
    cd linux-2.6.18
  4. Compile the kernel:
    ./compileuml
  5. Run the kernel:
    ./linux
  6. You should now see the login prompt; log in as root.
- Do it at home:
  1. Download the 2.6.18 kernel and uncompress it:
    tar xjvf linux-2.6.18.tar.bz2
  2. Download the config file and save it to the top kernel source directory as .config:
    cp kernel32-2.6.18.config linux-2.6.18/.config
  3. Compile the kernel:
    make oldconfig ARCH=um; make linux ARCH=um
  4. Download the filesystem image, uncompress it, and move it into the top kernel source directory under the name root_fs (the image is a bzip2-compressed file, not a tar archive):
    bunzip2 DSL-2.2-root_fs.bz2; mv DSL-2.2-root_fs linux-2.6.18/root_fs
  5. Run the kernel from inside linux-2.6.18:
    ./linux
  6. You should now see the login prompt; log in as root.
Exiting the UML environment:
- halt or
- shutdown -h now
Accessing the host filesystem
- Create the mount point if necessary:
  mkdir /mnt/host
- Mount the appropriate file system:
  mount -t hostfs /home /mnt/host
- You should now be able to access your host file system from UML at /mnt/host
Test your own code:
- In the host OS:
  - modify sched.h and sched.c in your cs153 directory, then compile and run the UML
    ./compileuml; ./linux
  - write your own test program (create processes, specify the gid, and read /proc), compile it using -static:
    gcc -static test.c -o test
- In the guest OS: Make sure you can access the executable in your host filesystem, then run that executable.
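
A test program along these lines needs two building blocks: spawning CPU-bound children in a chosen group, and reading a /proc file. The sketch below is one possible shape, not the required solution; spawn_worker() and read_proc_line() are hypothetical helper names, and setgid() to a foreign group only succeeds when run as root (which is the case inside UML).

```c
#include <stdio.h>
#include <string.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <time.h>
#include <unistd.h>

/* Fork a CPU-bound child that switches to group `gid` and spins for
 * roughly `seconds` seconds. Returns the child's pid in the parent,
 * or -1 if fork() failed. (Hypothetical helper, not from the handout.) */
pid_t spawn_worker(gid_t gid, unsigned int seconds)
{
    pid_t pid = fork();
    if (pid != 0)
        return pid;                 /* parent, or fork() failure (-1) */

    if (setgid(gid) != 0)           /* needs root unless gid is already ours */
        _exit(1);
    time_t end = time(NULL) + (time_t)seconds;
    while (time(NULL) < end)
        ;                           /* burn CPU so the scheduler has work */
    _exit(0);
}

/* Copy the first line of a /proc file into buf.
 * Returns the number of characters read, or -1 on error. */
int read_proc_line(const char *path, char *buf, size_t len)
{
    FILE *f = fopen(path, "r");
    if (f == NULL)
        return -1;
    if (fgets(buf, (int)len, f) == NULL) {
        fclose(f);
        return -1;
    }
    fclose(f);
    return (int)strlen(buf);
}
```

A main() could call spawn_worker() a few times with different gids, sleep for a while, and then read /proc/<pid>/stat for each child; fields 14 and 15 of that line are the user and system CPU time in clock ticks, which lets you compare how much CPU each group received.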

Data Structures Used by the Scheduler

struct task_struct

Type	Name	Description
long	state	TASK_RUNNING, TASK_(UN)INTERRUPTIBLE, ...
int	prio	dynamic priority based on static_prio and sleep_avg
int	static_prio	static priority
unsigned long	rt_priority	real-time priority
unsigned long	policy	SCHED_NORMAL, SCHED_FIFO, SCHED_RR, SCHED_BATCH
unsigned int	time_slice	ticks left in the time quantum of the process
unsigned int	first_time_slice	1 if the process has never exhausted its quantum, otherwise 0
unsigned long	sleep_avg	average sleep time
unsigned long long	timestamp	time of the last context switch that replaced the process, or time of its last insertion into the runqueue
unsigned long long	last_ran	time of the last context switch that replaced the process
struct prio_array *	array	pointer to the runqueue's priority array that includes the process
struct list_head	run_list	pointers to the next and previous elements in the runqueue list to which the process belongs
gid_t	gid, egid, sgid	group ID of the process

struct rq

Type	Name	Description
spinlock_t	lock	Only one task can modify the runqueue at any time
unsigned long	nr_running	Number of runnable tasks in the runqueue
unsigned long	expired_timestamp	Time at which the oldest task in the expired array ran out of its time quantum
unsigned long long	timestamp_last_tick	time of last scheduler tick
int	best_expired_prio	The highest priority of any expired task
struct task_struct *	curr	pointer to the currently running process
struct task_struct *	idle	pointer to the idle process
struct prio_array *	active	Pointer to the lists of active processes
struct prio_array *	expired	Pointer to the lists of expired processes
struct prio_array [2]	arrays	The two sets of active and expired processes

struct prio_array

Type	Name	Description
unsigned int	nr_active	number of tasks in the array
unsigned long [5]	bitmap	priority bitmap
struct list_head [MAX_PRIO]	queue	an array of 140 priority queues (if MAX_PRIO = 140)

Question: What does p->array->queue + p->prio refer to, and which tasks does p->run_list point to?
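
One way to explore this question is a small user-space model of the priority array. The sketch below is a simplification, not the kernel code: struct task stands in for struct task_struct, the bitmap is omitted (the kernel uses it with sched_find_first_bit() instead of the linear scan shown here), and task_of() and first_task() are hypothetical helper names.

```c
#include <stddef.h>

#define MAX_PRIO 140

struct list_head { struct list_head *next, *prev; };

struct prio_array {
    unsigned int nr_active;
    struct list_head queue[MAX_PRIO];   /* one circular list per priority */
};

struct task {                           /* stand-in for struct task_struct */
    int prio;
    struct prio_array *array;
    struct list_head run_list;          /* links to neighbours in the same list */
};

void prio_array_init(struct prio_array *a)
{
    a->nr_active = 0;
    for (int i = 0; i < MAX_PRIO; i++)  /* an empty list points at itself */
        a->queue[i].next = a->queue[i].prev = &a->queue[i];
}

/* Same idea as the kernel's enqueue_task(): append p to the tail of the
 * list whose head is array->queue + p->prio, i.e. &array->queue[p->prio]. */
void enqueue_task(struct task *p, struct prio_array *a)
{
    struct list_head *head = a->queue + p->prio;
    p->run_list.prev = head->prev;
    p->run_list.next = head;
    head->prev->next = &p->run_list;
    head->prev = &p->run_list;
    a->nr_active++;
    p->array = a;
}

/* Recover the task from its embedded run_list (what list_entry() does). */
struct task *task_of(struct list_head *node)
{
    return (struct task *)((char *)node - offsetof(struct task, run_list));
}

/* First task on the highest-priority (lowest-index) non-empty list, or NULL. */
struct task *first_task(struct prio_array *a)
{
    for (int i = 0; i < MAX_PRIO; i++)
        if (a->queue[i].next != &a->queue[i])
            return task_of(a->queue[i].next);
    return NULL;
}
```

In this model, p->array->queue + p->prio is the head of the list for p's priority level, and p->run_list links p to the other tasks of the same priority in the same array (or to the list head itself when p is first or last).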

Functions Used by the Scheduler
- schedule()
- scheduler_tick()
- effective_prio()
How is time_slice changed?
- In sched_fork(), the parent's time_slice is split between parent and child.
- In scheduler_tick(), time_slice is decremented. When it reaches 0, a new time_slice is calculated according to the task's scheduling policy, and the task may be moved to a different place in the priority queues.
- In sched_exit(), when a process exits, its remaining time_slice is given back to its parent.
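
The fork and exit cases can be modeled in a few lines. This is a sketch based on a reading of the 2.6.18 code, so check it against kernel/sched.c: struct task is a stand-in for struct task_struct, fork_split() and exit_return() are hypothetical names, and max_slice stands for the cap the kernel applies (it also checks a same-CPU condition that is left out here).

```c
struct task {                       /* stand-in for struct task_struct */
    unsigned int time_slice;
    unsigned int first_time_slice;
};

/* Modeled on sched_fork(): the child gets the larger half of the
 * parent's remaining ticks, the parent keeps the smaller half. */
void fork_split(struct task *parent, struct task *child)
{
    child->time_slice = (parent->time_slice + 1) >> 1;
    child->first_time_slice = 1;
    parent->time_slice >>= 1;
}

/* Modeled on sched_exit(): a child that exits while still in its first
 * time slice hands the unused ticks back, capped at max_slice. */
void exit_return(struct task *parent, const struct task *child,
                 unsigned int max_slice)
{
    if (!child->first_time_slice)
        return;
    parent->time_slice += child->time_slice;
    if (parent->time_slice > max_slice)
        parent->time_slice = max_slice;
}
```

The point of this design is that forking does not create CPU time out of nothing: the total number of ticks held by parent and child right after fork equals what the parent had before.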
How is static_prio used?
- The scheduler itself never changes it; it changes only when the nice value is set (set_user_nice()).
- It is used to calculate the nice value (TASK_NICE(p), TASK_USER_PRIO(p), set_user_nice()), the time slices (task_timeslice()), the interactivity (TASK_INTERACTIVE(p)), dynamic priority (__normal_prio()).
- task_timeslice() calculates the time slice based on static_prio:
  - if static_prio < 120, it returns (140-static_prio) * 20 milliseconds
  - if static_prio >= 120, it returns (140-static_prio) * 5 milliseconds
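
The two cases above can be checked with a small user-space function. This is only a sketch of the arithmetic in milliseconds, not the kernel's task_timeslice() (which works in ticks); timeslice_ms() is a hypothetical name.

```c
/* Time slice in milliseconds for a given static priority (100..139),
 * following the two cases described above. */
unsigned int timeslice_ms(int static_prio)
{
    unsigned int remaining = (unsigned int)(140 - static_prio);

    if (static_prio < 120)
        return remaining * 20;      /* nice < 0: larger slices  */
    return remaining * 5;           /* nice >= 0: smaller slices */
}
```

For example, the default static priority 120 (nice 0) gives 20 * 5 = 100 ms, the highest user priority 100 gives 40 * 20 = 800 ms, and the lowest priority 139 gives 1 * 5 = 5 ms.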
How is prio (dynamic priority) used?
- It determines which queue of the priority array a task is added to or removed from:
  Related functions: dequeue_task(), enqueue_task(), requeue_task(), enqueue_task_head()
- It is calculated based on the static_prio but is modified by bonuses/penalties according to sleep_avg:
  prio = max(100, min(static_prio - bonus + 5, 139))
  Related functions: __normal_prio(), normal_prio(), effective_prio(), recalc_task_prio()
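
The formula can be tried out in user space. The sketch below assumes the bonus ranges from 0 to 10, so that the adjustment static_prio - bonus + 5 moves the priority by at most 5 levels in either direction; dynamic_prio() is a hypothetical name, not a kernel function.

```c
/* prio = max(100, min(static_prio - bonus + 5, 139)),
 * with bonus assumed to lie in 0..10. */
int dynamic_prio(int static_prio, int bonus)
{
    int prio = static_prio - bonus + 5;

    if (prio > 139)
        prio = 139;                 /* lowest normal priority  */
    if (prio < 100)
        prio = 100;                 /* highest normal priority */
    return prio;
}
```

With static_prio = 120, a task that sleeps a lot (bonus 10) ends up at 115, a task that never sleeps (bonus 0) ends up at 125, and a bonus of 5 leaves the priority unchanged.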

likely/unlikely macros: defined in <include/linux/compiler.h>, used as branch-prediction hints to the compiler.

if (likely(x)) // equivalent to "if (x)"
{ A; } // A is more probable
else
{ B; }

if (unlikely(x)) // equivalent to "if (x)"
{ A; }
else
{ B; } // B is more probable
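
Outside the kernel, the same macros can be written with GCC's __builtin_expect. The sketch below shows a typical use, marking an error path as the improbable branch; safe_div() is a hypothetical example function.

```c
/* Hints only: the result of the condition is unchanged, the compiler
 * merely lays out the code so that the expected branch is the fast path. */
#define likely(x)   __builtin_expect(!!(x), 1)
#define unlikely(x) __builtin_expect(!!(x), 0)

/* Divide a by b. Returns 0 and stores the quotient in *out,
 * or returns -1 if b is zero (the rare case). */
int safe_div(int a, int b, int *out)
{
    if (unlikely(b == 0))
        return -1;
    *out = a / b;
    return 0;
}
```

When reading sched.c, you can therefore treat likely(x) and unlikely(x) as plain x; they only tell you which branch the authors expect to be taken most of the time.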

HZ/jiffies: used to measure time in Linux.
- System timers interrupt the processor at a certain frequency.
- HZ is the number of timer ticks per second, i.e. the frequency of timer interrupts. It is defined in <include/asm/param.h>. On x86 systems, it is set to 1000 in the 2.6 kernel, so there are 1000 timer interrupts per second, i.e. one timer interrupt every millisecond. In general, n*HZ/1000 is the number of timer ticks in n milliseconds.
- jiffies is the number of timer interrupts since the system booted. If HZ is 1000, jiffies is incremented every millisecond, i.e. one jiffy lasts 1 millisecond.
- In kernel/sched.c, MIN_TIMESLICE is defined as max(5 * HZ / 1000, 1), i.e. 5 milliseconds expressed in ticks (but at least one tick), and DEF_TIMESLICE is defined as (100 * HZ / 1000), i.e. 100 milliseconds.
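
Because these constants are expressed in ticks, their values depend on HZ. The sketch below redoes the arithmetic with HZ passed as a parameter, which is an assumption made here so that different settings can be compared; in the kernel HZ is a compile-time constant, and the function names are hypothetical.

```c
/* Number of timer ticks in `ms` milliseconds: ms * HZ / 1000. */
unsigned long ms_to_ticks(unsigned long ms, unsigned long hz)
{
    return ms * hz / 1000;
}

/* MIN_TIMESLICE = max(5 * HZ / 1000, 1), in ticks. */
unsigned long min_timeslice_ticks(unsigned long hz)
{
    unsigned long ticks = 5 * hz / 1000;
    return ticks > 1 ? ticks : 1;
}

/* DEF_TIMESLICE = 100 * HZ / 1000, in ticks. */
unsigned long def_timeslice_ticks(unsigned long hz)
{
    return 100 * hz / 1000;
}
```

With HZ = 1000 the minimum and default time slices are 5 and 100 ticks. With HZ = 100 the integer division 5 * 100 / 1000 yields 0, which is why the definition takes the maximum with 1; the default slice is then 10 ticks. It is worth checking which HZ value your UML build actually uses.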

Focus on the scheduler code that is crucial for the assignment and ignore the rest. The files of interest are kernel/sched.c and include/linux/sched.h.
- Reuse the existing data structures and functions as much as possible, for example dequeue_task(), enqueue_task(), requeue_task(), and enqueue_task_head().
- Ignore everything between #ifdef CONFIG_SMP and #endif, and between #ifdef CONFIG_SMT and #endif.
Your scheduler should work together with the existing Linux scheduler, so you should add a new scheduling policy: SCHED_GFS (Group-based Fair Sharing).
- You can set the scheduling policy and the real-time priority of a task via the system call sched_setscheduler() in your test program.
- You can set the static priority via the system call nice().
- Other tasks should be scheduled using their default policies.
- Tips: search for SCHED_BATCH, a policy added in 2.6.16. Processes in this class are scheduled normally, except that they get no "interactivity" bonus when they sleep. Seeing how it was added shows you how to add your own policy.
- Tips: you can let the SCHED_GFS processes have the same static priority, and ignore their dynamic priority, so that they are always put in the same priority queue and your scheduler can make the decision purely based on the time slice.
- Tips: in the schedule() function, next is the process chosen to run next, and prev is the one that is currently running and is about to be replaced.
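
From a test program, selecting the policy could look like the sketch below. The value 4 for SCHED_GFS is an assumption (the next number after SCHED_BATCH = 3 in 2.6.18); it must match whatever you define in include/linux/sched.h. set_policy() is a hypothetical wrapper around the real sched_setscheduler() call.

```c
#include <sched.h>
#include <sys/types.h>
#include <unistd.h>

/* Assumed policy number; must match your kernel's definition. */
#define SCHED_GFS 4

/* Set the scheduling policy and real-time priority of process `pid`
 * (0 means the calling process). Returns 0 on success, -1 on error
 * with errno set by sched_setscheduler(). */
int set_policy(pid_t pid, int policy, int rt_priority)
{
    struct sched_param param;

    param.sched_priority = rt_priority;
    return sched_setscheduler(pid, policy, &param);
}
```

A test program would call set_policy(0, SCHED_GFS, 0) early in each worker process. On a kernel that does not know the policy number, the call fails with errno set to EINVAL, which is a quick way to confirm that the modified kernel is the one actually running.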
For simplicity, you can assume a fixed number of groups and let each group get an equal amount of CPU time (100 ms).
- You should calculate the time slice correctly for each task within a group. time_slice is decremented on each timer interrupt, see the scheduler_tick() function.
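
One simple way to derive a task's time slice is to divide the group's 100 ms equally among its runnable tasks. Equal division is an assumption of this sketch, not a requirement of the assignment, and GFS_GROUP_SLICE_MS and gfs_task_slice_ms() are hypothetical names.

```c
#define GFS_GROUP_SLICE_MS 100      /* CPU time per group, from the text */

/* Time slice in milliseconds for each of `nr_tasks` runnable tasks in
 * one group, assuming the group's share is divided equally.
 * Returns 0 for an empty group and never less than 1 ms otherwise. */
unsigned int gfs_task_slice_ms(unsigned int nr_tasks)
{
    unsigned int slice;

    if (nr_tasks == 0)
        return 0;
    slice = GFS_GROUP_SLICE_MS / nr_tasks;
    return slice > 0 ? slice : 1;
}
```

A group with 4 tasks gives each task 25 ms, while a group with a single task gives that task the full 100 ms, so both groups receive the same total. Inside the kernel the value would have to be converted to ticks before being stored in time_slice.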
Use a round-robin scheme to decide which group to choose. Then decide which task to choose within that group. You can also use round-robin or FIFO to choose the task.
- Search for the real-time scheduling policies SCHED_FIFO and SCHED_RR to get an idea of how FIFO and RR are implemented.
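
The group-level round-robin can be sketched in user space as follows. This is only an illustration of the selection logic under the fixed-number-of-groups assumption; struct gfs_state, GFS_NR_GROUPS, and gfs_next_group() are hypothetical names, and a real implementation would keep this state in or next to the runqueue.

```c
#define GFS_NR_GROUPS 4             /* assumed fixed number of groups */

struct gfs_state {
    unsigned int nr_tasks[GFS_NR_GROUPS];  /* runnable tasks per group */
    int last_group;                        /* group served last, -1 at start */
};

/* Pick the next group in round-robin order, skipping groups that have
 * no runnable task. Returns the group index, or -1 if all are empty. */
int gfs_next_group(struct gfs_state *s)
{
    for (int step = 1; step <= GFS_NR_GROUPS; step++) {
        int g = (s->last_group + step) % GFS_NR_GROUPS;

        if (s->nr_tasks[g] > 0) {
            s->last_group = g;
            return g;
        }
    }
    return -1;
}
```

Starting the search one position after last_group means that a group which has just been served is considered again only after every other non-empty group has had its turn. Once the group is known, the task within it can be picked FIFO or round-robin.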