Creating New CPU Schedulers with Virtual Time
English essay: Huawei Breaks Through the Technology Blockade with Self-Developed Chips

With the advancement of technology and the increasing competition in the global market, the issue of technological blockades has become more prominent. In recent years, Huawei, a leading global provider of information and communications technology (ICT) infrastructure and smart devices, has continuously faced technological challenges and restrictions. However, Huawei has successfully broken through the technology blockade by developing its own chips.

In the face of the technology blockade, Huawei has invested heavily in research and development to develop its own chips. The company has established a strong research and development team comprised of experts in various fields, including chip design, semiconductor technology, and artificial intelligence. Through their collaborative efforts, Huawei has successfully developed a series of cutting-edge chips that have not only improved the performance of its products but also reduced its dependence on foreign suppliers.

One of the key achievements of Huawei in breaking through the technology blockade is the development of its Kirin series of chips. The Kirin chips are designed to provide superior performance, energy efficiency, and security for Huawei's smartphones, tablets, and other devices. By using its own chips, Huawei has been able to optimize the performance of its devices and deliver a better user experience to its customers.

In addition to the Kirin chips, Huawei has also developed its own Kunpeng and Ascend series of chips for its server and artificial intelligence products. The Kunpeng chips are designed to provide high-performance computing capabilities for Huawei's server products, while the Ascend chips are designed to provide advanced AI processing capabilities for Huawei's AI products. By developing its own chips, Huawei has been able to expand its product offerings and compete more effectively in the global market.

Overall, Huawei's success in breaking through the technology blockade with its self-developed chips is a testament to the company's innovation and determination. By investing in research and development and fostering a culture of continuous improvement, Huawei has been able to overcome the technological challenges it faces and emerge as a global leader in the ICT industry. As Huawei continues to develop new technologies and products, it is poised to further strengthen its position in the global market and drive innovation in the ICT industry.
Virtualization Performance Tuning: CPU

CPU optimization analysis proceeds in two stages: the virtualization layer and the host layer.

We initially suspected the virtualization layer. The main suspects were:

1. Hyper-threading. Disabling hyper-threading slightly improved single-core performance but made multi-core performance worse, so hyper-threading was ruled out.
2. NUMA topology and core migration. In theory, not pinning cores according to the NUMA topology should cause a significant performance loss due to cache effects and migration. However, pinning vCPUs to physical cores produced no notable improvement in testing, so this factor was ruled out as well.
3. The CPU model, including instruction sets and cache. Comparing against VMware, we found that our guest's instruction set and CPU cache did not match those of the real physical machine. However, passing the host CPU's features through to the guest via cpu-passthrough and swapping the QEMU version still failed to improve CPU performance.

Having ruled out the virtualization layer, further testing showed that the host itself was the key to CPU performance: a Red Hat comparison environment revealed a large gap between our host's benchmark score and that of an untuned Red Hat system.
We compared the kernel configuration parameters (sysctl) and compile-time options of the two systems and found nothing suspicious.

The only remaining user-controllable factor that determines kernel behavior is the boot parameters. Comparing them showed that our system had Intel's C-state feature disabled.

We wrote a simple busy-loop test to compare the two systems: under load, the Red Hat kernel's CPU frequency rose to 3.1 GHz, while our host could only reach 2.6 GHz, even with the cpufreq governor set to performance.
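The busy-loop load generator can be as simple as the following sketch (a hypothetical reconstruction; the original test program is not shown here). While it spins, the achieved frequency can be read from /sys/devices/system/cpu/cpu0/cpufreq/scaling_cur_freq.

```
/* busyloop.c: keep one core fully loaded so its frequency can be observed. */
int main(void)
{
    volatile unsigned long acc = 0;   /* volatile stops the compiler from
                                         deleting the loop as dead code */
    for (;;)
        acc += 1;
}
```

Pin it to a known core (for example with taskset, described later in this document) and watch scaling_cur_freq to see what frequency the loaded core actually reaches.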
So we could be fairly confident that this boot parameter was the cause.

After re-enabling the C-state feature, a SPEC CPU run achieved a score comparable to Red Hat's.

Fix: the C-state feature and frequency scaling turn out to be coupled, so C-states must be enabled to solve the CPU performance problem; remove the boot parameters intel_idle.max_cstate=0 idle=poll.

A brief overview of the Intel CPU frequency-scaling and power-saving mechanisms involved:

- cpufreq: provides frequency scaling, letting the CPU run at different frequencies under different loads so that performance and power consumption can be traded off dynamically. Servers are usually configured with the performance governor; personal PCs can use the ondemand or powersave governor.
- cstate: the CPU's deep-sleep power-saving mechanism. Based on which CPU components are put to sleep, it defines multiple sleep states offering different degrees of power saving; the deeper the sleep state, the higher the wakeup cost.
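The cpufreq governor mentioned above can be switched through the sysfs interface; a minimal C sketch (our illustration; tools like cpupower, or a plain shell echo into the same file, do the equivalent):

```
#include <stdio.h>
#include <stdlib.h>

/* Switch CPU0's cpufreq governor to "performance" via sysfs.
 * Requires root; repeat for cpu1, cpu2, ... to cover every core. */
int main(void)
{
    const char *path = "/sys/devices/system/cpu/cpu0/cpufreq/scaling_governor";
    FILE *f = fopen(path, "w");
    if (!f) {
        perror("fopen");
        return EXIT_FAILURE;
    }
    fputs("performance\n", f);
    fclose(f);
    return 0;
}
```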
MELSOFT Software – GX IEC Developer

Powerful integrated programming tools

GX IEC Developer is more than a powerful IEC 1131.3 programming and documentation package. It supports your entire MELSEC PLC implementation from the initial project planning to everyday operation, with a wealth of advanced functions that will help you to cut costs and increase your productivity. The sophisticated program architecture comes with a range of new, user-friendly functions, including structured programming and support for function libraries.

Top-down application architecture
During the planning phase GX IEC Developer's structuring tools help you to organise your project efficiently: use the intuitive graphical tools to identify and display tasks, functional units, dependencies, procedures and application structures. In addition to making your work easier, this also significantly reduces error frequency in later project stages.

Flexible implementation
In the engineering phase you then choose the programming language that best matches the structure of your project. Program frequently-used functions in function blocks and organise them in libraries. This gives you the confidence that comes with knowing you are using tested, reliable code. Password support helps you to protect your valuable expertise.

Simple configuration of control components
Configuration of controller components is performed quickly and efficiently in tables with interactive dialogs and graphical support. This powerful support is available for standard and special function modules as well as for the controller CPUs. You no longer have to create application programs to configure your system.

Setting up the hardware and network configuration
Powerful testing and debugging tools provide information on the current status of the controllers and the network you are connected to. Network functions like status and error displays, remote SET/RST functions for controllers and peripherals, Live List, Cycle Time, Connection State and more enable you to locate and correct errors quickly and get your hardware and networks up and running in record time.

Setting up the application program
GX IEC Developer comes with everything you need to get your applications installed, set up and running as quickly as possible, including comprehensive online programming functions, fast and informative monitoring displays, the ability to manipulate device values with the graphical editors, manual and automatic step mode execution in IL, the display of manipulated device values in the EDM (Entry Data Monitor) and much more.

Normal operation
During normal daily operation you can also use GX IEC Developer to display important system status information, either in stand-alone mode or called by another program in the control room.

Installation and maintenance
Top-down architecture, structured programming, comprehensive printed documentation and support for user-defined help for your function blocks all help to reduce the learning curve. You can make the information needed for installing and maintaining the system available to the operators quickly and efficiently, with minimum training overheads.

Key features include:
• Powerful "top-down" development environment
• Total overview of PLC project and resources
• Suited to large and complex projects
• One programming software for modular and compact PLCs (Q/A and FX Series)
• Flexible program development
• Superior program documentation for easy understanding
• State-of-the-art PC software technology acc. to IEC 1131.3
• Programming languages FBD, AWL (IL), KOP (LD), AS (SFC) and ST
• Powerful offline simulation
• Online program modification
• Function blocks (FB, FC)
• Libraries
• Minimum downtimes
Introduction to Dynamic Parallelism
Stephen Jones, NVIDIA Corporation

Improving programmability
Dynamic parallelism improves occupancy, simplifies the CPU/GPU divide, and enables library calls from kernels, batching to help fill the GPU, dynamic load balancing, data-dependent execution, and recursive parallel algorithms.

What is dynamic parallelism?
The ability to launch new grids from the GPU: dynamically, simultaneously, and independently. On Fermi, only the CPU can generate GPU work; on Kepler, the GPU can generate work for itself.

What does it mean?
The GPU moves from being a co-processor to supporting autonomous, dynamic parallelism. With data-dependent parallelism, computational power is allocated to regions of interest: CUDA today statically assigns a conservative worst-case (fixed) grid, while CUDA on Kepler can dynamically assign performance where accuracy is required.

CPU-controlled work batching
CPU programs are limited by a single point of control: they can run at most tens of threads, and the CPU is fully consumed with controlling launches. In a pre-Kepler batched LU decomposition, CPU control threads issue each dgetf2, dswap, dtrsm and dgemm call for every matrix in the batch.

Batching via dynamic parallelism
Move the top-level loops to the GPU, run thousands of independent tasks, and release the CPU for other work. In a Kepler batched LU decomposition, GPU control threads issue the dgetf2, dswap, dtrsm and dgemm stages themselves.

Programming model basics: code example
The CUDA Runtime syntax and semantics carry over to device code, with these points to note:

- Launch is per-thread.
- Sync includes all launches by any thread in the block.
- cudaDeviceSynchronize() does not imply __syncthreads().
- Launches are asynchronous only (note: there is a bug in this program, here!).

```
__device__ float buf[1024];

__global__ void dynamic(float *data)
{
    int tid = threadIdx.x;
    if (tid % 2)
        buf[tid / 2] = data[tid] + data[tid + 1];
    __syncthreads();

    if (tid == 0) {
        // Per-thread launch: only thread 0 launches the child grid here.
        // "launch" is a placeholder child kernel from the slides.
        launch<<< 128, 256 >>>(buf);
        // Waits for launches by any thread in this block;
        // does not imply __syncthreads()
        cudaDeviceSynchronize();
    }
    __syncthreads();

    // Copy-kind argument added so the call compiles; the slide
    // abbreviated it as cudaMemcpyAsync(data, buf, 1024)
    cudaMemcpyAsync(data, buf, 1024, cudaMemcpyDeviceToDevice);
    cudaDeviceSynchronize();
}
```

Library calls from kernels
The CPU launches a kernel; each block generates its data; a single thread calls a third-party library, which executes its own launches; then all threads use the result in parallel:

```
__global__ void libraryCall(float *a, float *b, float *c)
{
    // All threads generate data
    createData(a, b);
    __syncthreads();

    // Only one thread calls the library
    if (threadIdx.x == 0) {
        cublasDgemm(a, b, c);   // simplified signature from the slides
        cudaDeviceSynchronize();
    }

    // All threads wait for the dgemm to complete
    __syncthreads();

    // Now continue
    consumeData(c);
}
```

Simple example: quicksort
Quicksort is a typical divide-and-conquer algorithm: recursively partition-and-sort the data. Its execution is entirely data-dependent, which made it notoriously hard to do efficiently on Fermi. [The slide illustrates the recursive partitioning of an example array around successive pivots; figure omitted.] The algorithm: select a pivot value; for each element, retrieve its value and store it left if it is less than the pivot, right if it is greater or equal; then recurse the sort into the left-hand and right-hand subsets until all subsets are done.

```
__global__ void qsort(int *data, int l, int r)
{
    int pivot = data[0];
    int *lptr = data + l, *rptr = data + r;

    // Partition data around the pivot value
    partition(data, l, r, lptr, rptr, pivot);

    // Launch the next stages recursively
    if (l < (rptr - data))
        qsort<<< ... >>>(data, l, rptr - data);
    if (r > (lptr - data))
        qsort<<< ... >>>(data, lptr - data, r);
}
```
Binding a Process to Specific Cores

Terminology: CPU affinity refers to the ability, on a CMP (chip multiprocessor) architecture, to bind one or more processes to one or more processors.

1. Changing a process's CPU affinity on Linux

On Linux, affinity can be changed with the taskset command. Run the following to install the taskset tool.

Install schedutils on CentOS/Fedora:
# yum install schedutils

Install schedutils on Debian/Ubuntu:
# apt-get install schedutils

If you are running a recent version of CentOS/Fedora/Debian/Ubuntu, the schedutils/util-linux package is probably already installed.

Computing a CPU affinity mask works much like computing an SMP IRQ affinity mask:

0x00000001 (CPU0)
0x00000002 (CPU1)
0x00000003 (CPU0+CPU1)
0x00000004 (CPU2)
...

To pin the process with PID 12212 to CPU0:
# taskset -p 0x00000001 12212

Or stop the service (MySQL, say) and restart it under taskset:
# taskset -c 1,2,3 /etc/init.d/mysql start

Other processes can be handled the same way (except nginx; see below). Afterwards, use top to check CPU usage.
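Under the hood, taskset uses the sched_setaffinity(2) system call. A minimal C sketch (our illustration) that pins the calling process to CPU0, equivalent to the 0x00000001 mask above:

```
#define _GNU_SOURCE
#include <sched.h>
#include <stdio.h>
#include <stdlib.h>

int main(void)
{
    cpu_set_t set;
    CPU_ZERO(&set);
    CPU_SET(0, &set);   /* bit 0 => CPU0, i.e. mask 0x00000001 */

    /* pid 0 means "the calling process" */
    if (sched_setaffinity(0, sizeof(set), &set) == -1) {
        perror("sched_setaffinity");
        exit(EXIT_FAILURE);
    }
    printf("now pinned to CPU0\n");
    return 0;
}
```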
2. Binding nginx workers to CPUs

nginx was excluded above because it offers more precise control of its own. In conf/nginx.conf there is a line:

worker_processes 1;

which configures how many worker processes nginx starts (the default is 1). nginx also supports a directive named worker_cpu_affinity; in other words, nginx can bind each worker process to a CPU. I used the following configuration:

worker_processes 3;
worker_cpu_affinity 0010 0100 1000;

Here 0010, 0100 and 1000 are bitmasks designating the second, third and fourth CPU cores respectively.
How systemd CPUQuota works

Systemd is one of the most widely used init systems on Linux today; it is designed to manage the boot process and background services. Among its many features is management of CPU quotas. A CPU quota is a technique for limiting how much CPU time a process may consume; it helps administrators manage CPU resources in a multi-tasking environment and prevents a single process from hogging the CPU and driving system load too high.

Systemd manages system resources, including CPU, through cgroups (control groups). A cgroup is a Linux kernel mechanism that groups a set of processes together and applies resource limits to the group. Systemd uses cgroups to implement CPU quotas for processes.

In systemd, a process's CPU quota is defined by setting the CPUQuota field. Its value is a percentage: the share of CPU time the process may use within each accounting period. For example, setting CPUQuota to 50% means the process may use at most 50% of one CPU within a period.

Systemd enforces the CPUQuota setting as follows. When a process is started, systemd creates a cgroup for it and places the process in that cgroup. The kernel's CPU controller then schedules the processes in the cgroup according to the configured quota. If a process exceeds its CPUQuota within a period, the kernel throttles the cgroup (its processes are descheduled for the remainder of the period) so that other processes can continue to run normally. This prevents one process from monopolizing the CPU for long stretches and starving everything else.

In short, systemd's CPUQuota works by managing resources through cgroups: setting the CPUQuota field places a CPU quota on the process, effectively managing the system's CPU resources. With a sensible CPUQuota configuration, processes compete fairly for CPU, improving the system's performance and stability.
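Concretely, on a cgroup-v2 system CPUQuota=50% corresponds to systemd writing the quota/period pair "50000 100000" (microseconds) into the unit's cpu.max file. A minimal C sketch of that underlying interface (the cgroup path below is hypothetical; substitute your unit's actual cgroup directory):

```
#include <stdio.h>
#include <stdlib.h>

/* Hypothetical path: substitute the real unit's cgroup directory. */
#define CPU_MAX_PATH "/sys/fs/cgroup/system.slice/myservice.service/cpu.max"

int main(void)
{
    FILE *f = fopen(CPU_MAX_PATH, "w");
    if (!f) { perror("fopen"); return EXIT_FAILURE; }

    /* "quota period" in microseconds: 50000/100000 = 50% of one CPU,
       which is what systemd writes for CPUQuota=50%. */
    if (fprintf(f, "50000 100000\n") < 0)
        perror("fprintf");
    fclose(f);
    return 0;
}
```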
nerdctl restart policies

nerdctl is a Docker-compatible command-line client for the containerd container runtime. It serves as an alternative to the Docker CLI, managing containers with near-identical commands. nerdctl supports a range of container-management features, including restart policies, giving users a flexible and efficient container workflow.

A restart policy specifies how the system automatically restarts a container after it fails. nerdctl supports several restart policies, so you can pick the one that fits your situation to keep containers highly available and stable.

nerdctl supports the following restart policies:

1. no: do not restart the container automatically when it stops. This is the default policy.
2. always: restart the container automatically whenever it stops. Suitable for workloads that must stay up continuously, such as a web server.
3. on-failure: restart the container automatically only when it exits abnormally (non-zero exit status). A retry limit can be given via the --restart flag; for example, --restart=on-failure:5 stops retrying after the container has failed 5 times.
4. unless-stopped: restart the container automatically when it stops, unless it was stopped manually. Suitable when the container should normally run continuously but users must be able to stop it by hand.

The restart policy is specified with the --restart flag when running a container. Some examples:

1. The no policy:
```
nerdctl run --restart=no my-container
```
2. The always policy:
```
nerdctl run --restart=always my-container
```
3. The on-failure policy:
```
nerdctl run --restart=on-failure:5 my-container
```
4. The unless-stopped policy:
```
nerdctl run --restart=unless-stopped my-container
```

Choosing an appropriate restart policy lets a service recover quickly when a container fails and keeps containers highly available and stable.
mpirun: each processor is running a duplicate job

This error message indicates that when running an MPI program with `mpirun`, every processor is executing the same task rather than being assigned different work. This can happen for several reasons:

1. You may not have specified the number of tasks correctly, or assigned tasks to different processors correctly, in your `mpirun` command. Make sure you use the right arguments to set the task count and distribute tasks across processors.
2. Your MPI program may not implement task division correctly. Check your code and make sure the initialization phase assigns different work to different ranks (see the sketch below).
3. Your MPI implementation may have a problem. Try a different implementation (such as Open MPI or MPICH) and see whether the issue persists.
4. Your hardware environment may have a problem. Check that your machine supports MPI parallel computing and that all processors are working properly.

To resolve the problem, try the following steps:

1. Check your `mpirun` arguments to make sure the task count and processor assignment are correct.
2. Check your program code to make sure the initialization phase assigns work to ranks correctly.
3. Try a different MPI implementation.
4. Check your hardware environment.

If the problem remains, provide more context and the full error output so the cause can be pinned down and a more specific fix suggested.
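For reference, a correctly parallelized MPI program splits its work by rank rather than repeating it on every process. A minimal C sketch (hypothetical example) that divides a loop across ranks and combines the results:

```
#include <mpi.h>
#include <stdio.h>

int main(int argc, char **argv)
{
    MPI_Init(&argc, &argv);

    int rank, size;
    MPI_Comm_rank(MPI_COMM_WORLD, &rank);
    MPI_Comm_size(MPI_COMM_WORLD, &size);

    /* Divide 1000 work items across ranks instead of duplicating them. */
    const int total = 1000;
    int chunk = (total + size - 1) / size;
    int begin = rank * chunk;
    int end   = (begin + chunk > total) ? total : begin + chunk;

    long local_sum = 0;
    for (int i = begin; i < end; i++)
        local_sum += i;

    long global_sum = 0;
    MPI_Reduce(&local_sum, &global_sum, 1, MPI_LONG, MPI_SUM, 0,
               MPI_COMM_WORLD);

    if (rank == 0)
        printf("sum = %ld (computed by %d ranks)\n", global_sum, size);

    MPI_Finalize();
    return 0;
}
```

Launched with `mpirun -np 4 ./a.out`, each of the four ranks sums a different quarter of the range; if every rank computed and printed the full sum itself, that would be exactly the "duplicate job" symptom described above.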
Creating New CPU Schedulers with Virtual Time

Andy Bavier
Department of Computer Science, Princeton University
Princeton, NJ 08544
acb@

Abstract

We propose a design methodology for producing CPU schedulers with provable real-time behaviors. Our approach is grounded in the well-known technique of using virtual time to track a fluid model representation of the system. However, our work aims to go beyond traditional fair sharing schedulers like Weighted Fair Queueing by laying bare the mathematical foundations of virtual time. We show how arbitrary changes, described mathematically, to the fair sharing fluid model can be manifested in real time in the running system. The BERT scheduler is presented as an example of how to use our framework. We hope that this work will enable the creation of new and interesting real-time scheduling algorithms.

1 Introduction

We propose a methodology for creating complex and dynamic schedulers with provable real-time properties. The key to our framework is using virtual time to track a fluid model representation of the system. Though this technique has clearly been used to design schedulers (e.g., WFQ in 1989), we believe that a general theory of virtual time has not been elaborated before. In fact, the BERT algorithm [2], which is the prime example of a scheduler designed in accordance with our theory, actually preceded it. First we created and implemented BERT and convinced ourselves that it worked; the insight about why it worked came later, and led to our methodology. We realized that the steps we took to create BERT could be used to produce any number of real-time scheduling algorithms. Since modifying virtual-time schedulers to change their behaviors has been the subject of recent work [4, 6], we hope that others will find our approach useful.

To create a new scheduling algorithm using our method, the designer follows four steps:

1. Mathematically describe changes to how processes execute in the fair queueing fluid model
2. Track the fluid model changes using virtual time
3. Modify the virtual timestamps of affected tasks
4. Execute the task set in order of increasing stamps

The rest of this paper summarizes what is involved at each step. We use the BERT algorithm as an example of how to use our method, and we intersperse our description with the explanation of why it works.

2 BERT

The BERT (Best Effort and Real-Time) scheduler is designed to schedule a mix of real-time and best effort (i.e., conventional) processes in Scout, a communication-oriented OS. BERT's focus is on producing good system behavior despite the problems of overload and changing application requirements that are widespread in multimedia systems. BERT reasons that a best effort process wants a CPU share; on the other hand, a multimedia process would like its deadlines met, regardless of the share needed to do so. When these requirements conflict, as they are bound to do in an overloaded system, the system should use the importance of processes to the user to resolve conflicts in an intuitive way.

BERT comprises a virtual-time-based scheduling algorithm, a simple policy framework, and a minimal user interface. The scheduling algorithm combines the WF²Q+ fair sharing algorithm [3] and a mechanism called stealing. The policy framework divides all processes into two priority levels, important and unimportant, and defines how processes in each class interact with those in other classes; its main feature is that an important real-time process can steal cycles from unimportant processes to meet its deadlines. The user interface includes a button on the frame of
each application window that the user clicks to indicate that she considers the application important. In this discussion we will focus on BERT's scheduling algorithm, and particularly on the stealing mechanism.

BERT exploits the relationship between virtual and real time implied by the bounded lag of the WF²Q+ algorithm. BERT notes that if a task's deadline falls after the lag bound of the algorithm, then the deadline will be met because the task will have completed running by then. Furthermore, BERT uses stealing to give an important real-time task extra cycles to meet its deadline when its share is too small. Stealing manipulates the fluid model and virtual time to explicitly redistribute the reserved service of unimportant tasks to an important real-time task. Stealing introduces a dynamic dimension into static fair sharing of the CPU.

The stealing mechanism is spread across several levels of fair sharing theory and implementation. First, it mathematically describes the virtual multiplexing of tasks within the context of the fluid model. Second, the stealing mechanism calculates how virtual time flows for the affected processes in the modified fluid model: one process gets delayed a little (in virtual time) while the other speeds up. Third, the timestamps of tasks belonging to the processes are modified to reflect the changes in virtual time. Stealing uses virtual time to track changes in the fluid model, resulting in tasks receiving new timestamps.

It is crucial that stealing preserves the relationship between virtual and real time on which BERT depends. In the rest of this paper, we demonstrate why stealing works in real time, and in the process outline a framework for deriving new scheduling mechanisms based on virtual time.

3 Methodology

The Fair Queueing Fluid Model (FQFM for short) forms the foundation of fair sharing algorithms like Weighted Fair Queueing. The model describes the real-time behavior of an ideal, fluid system in which each process receives at least its reserved rate whenever it is active. The FQFM can be given a concise mathematical definition as follows. Let the processes in the system be indexed from $1$ to $N$. Each process generates a sequence of tasks that represent chunks of work of known duration. Let $p_i$ be the $i$th process and $t_i^j$ be the $j$th task it generates. $p_i$ reserves a cycle rate $r_i$ that can be expressed in any units, for example, cycles per second. Let $S_i(t)$ be the total cycles that process $p_i$ has received so far. Also, let $R$ be the actual processor rate, and let $A(t)$ be the set containing the indices of all currently active processes. At all times, the fluid model defines the instantaneous execution rate of the current task belonging to $p_i$:

$$\frac{dS_i(t)}{dt} = \frac{r_i}{\sum_{j \in A(t)} r_j} \, R \tag{1}$$

The above simply states that the instantaneous execution rate of a process is the proportion of the CPU equal to its reserved rate over the sum of the rates of all active processes. Since admission control ensures that the sum of all rates never exceeds the CPU rate, each running process will always receive at least its reserved rate in the model. Note that the units of the reservation (e.g., cycles per second) do not matter since the model describes an instantaneous execution rate.

BERT provides an example of how to dynamically modify the fluid model description of the system. The FQFM provides the base of the BERT algorithm, but BERT departs from the FQFM when one process steals from another.
BERT describes stealing at the lowest level in terms of modifying the flow of the fluid model: conceptually, stealing pauses one process in the fluid model and gives its allocation to another for a predefined interval. Formally, this is expressed as follows. When process $p_a$ steals from process $p_b$, the cycles that $p_b$ would receive during the steal are diverted to $p_a$. If $p_b$ was idle at the start of stealing, it is considered active (i.e., $b \in A(t)$) while the stealing is going on. During the stealing interval:

$$\frac{dS_a(t)}{dt} = \frac{r_a + r_b}{\sum_{j \in A(t)} r_j} \, R, \qquad \frac{dS_b(t)}{dt} = 0 \tag{2}$$

[…]

Virtual time is defined to cause the rate of each process (i.e., the rate expressed in terms of virtual time) to be constant and equal to the rate the process has reserved. That is, virtual time lets us provide a simplified description of the system in which each process runs on its own CPU of speed $r_i$. So, if $V(t)$ is the current virtual time, then virtual time flows at the rate:

$$\frac{dV(t)}{dt} = \frac{R}{\sum_{j \in A(t)} r_j} \tag{4}$$

We can combine Eqs. 1 and 4 to express the rate of process $p_i$ in virtual time:

$$\frac{dS_i}{dV} = \frac{dS_i/dt}{dV/dt} = r_i \tag{6}$$

If an algorithm dynamically alters the fluid model description, as BERT does, then this can change the virtual finish time of a task that had previously been assigned a timestamp. In this case, it is necessary to change the timestamp of the affected task so that the ready queue continues to reflect the fluid model.

When one process steals from another, the virtual finish times of tasks are affected as described at the end of Section 3.1. BERT modifies the timestamps of tasks in the system accordingly; however, care must be taken when doing so. The reason is that some tasks which are still "executing" in the fluid model may in reality have already run, and so are no longer in the system. It is not possible to modify the virtual timestamp of such a task and so it must not be stolen from.

Rather than checking whether or not a task is in the system before stealing from it, BERT's approach is to rely on the known workahead bound of a process. The workahead indicates the amount of a process's reservation that can be received in the real system in advance of the fluid model; in [1] we show that this quantity is bounded for BERT. Prior to stealing, BERT calculates the amount of cycles that can be stolen from a process before a particular deadline. Since the workahead bound represents cycles that a process may have already received, BERT subtracts them from the total. Though conservative, this allows BERT to safely steal from processes without having to track whether particular tasks have already run.
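To make the timestamp bookkeeping concrete, here is a minimal sketch of WFQ-style virtual-time timestamp assignment following Eqs. 4 and 6 (our illustration, not the paper's implementation; all names are ours):

```
#include <stddef.h>

/* A process with a reserved rate r_i (cycles per second) and the
 * virtual finish time of its previous task. */
typedef struct {
    double rate;         /* reserved rate r_i */
    double last_finish;  /* virtual finish time of the previous task */
} process_t;

/* Virtual time advances at rate R / sum of active rates (Eq. 4);
 * the scheduler integrates this between scheduling events. */
double virtual_time_rate(double R, const process_t *active, size_t n)
{
    double sum = 0.0;
    for (size_t i = 0; i < n; i++)
        sum += active[i].rate;
    /* Idle system: let virtual time track real time. */
    return (sum > 0.0) ? R / sum : 1.0;
}

/* Assign a virtual finish timestamp to a new task of `len` cycles:
 * the task starts at max(V, previous finish) and, since a process
 * executes at constant rate r_i in virtual time (Eq. 6), finishes
 * len / r_i virtual seconds later. The scheduler then runs tasks
 * in order of increasing timestamps. */
double assign_timestamp(process_t *p, double V, double len)
{
    double start = (V > p->last_finish) ? V : p->last_finish;
    p->last_finish = start + len / p->rate;
    return p->last_finish;
}
```

In this sketch, a stealing-style mechanism would amount to editing `last_finish` values after a change to the fluid model, which is exactly the timestamp modification step described above.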
3.3 Execution Order

Virtual time algorithms execute the task set in order of increasing timestamps. We have outlined the progress of a virtual time algorithm through the fluid model definition, tracking the model using virtual time, and assigning timestamps. At this point we tie it all together and show how running tasks by increasing timestamps leads to a real-time algorithm that provably conforms to its fluid model description.

Figueira and Pasquale establish two very powerful results in [5]. First, if the eligible task sequence is schedulable under any policy, then it is schedulable under preemptive deadline-ordered scheduling; for our purposes, deadline-ordered scheduling is the same as Earliest Deadline First, or EDF. Second, this same task sequence is $\Delta$-schedulable under nonpreemptive deadline-ordered scheduling. Simply stated, these results mean that if it is possible to meet all deadlines using some scheduling discipline, then preemptive EDF will meet them, and nonpreemptive EDF will miss them by no more than a quantity $\Delta$, which is the runtime of the longest task in the system.

With these results in hand, the significance of the steps in our method becomes clear. Executing tasks by increasing timestamps runs them in the same order as their fluid model deadlines, and so is equivalent to EDF. By definition, the fluid model itself shows that there exists a method, albeit impractical, of scheduling the tasks to meet these deadlines. Therefore, preemptive scheduling by virtual timestamps meets all fluid model deadlines, and nonpreemptive scheduling misses them by no more than the $\Delta$ described above. That is, the preemptive algorithm never lags its fluid model description, and the nonpreemptive algorithm has its lag bounded by $\Delta$. In either case, the actual running system conforms to its ideal fluid model description in real time in a quantifiable way.

The progress of a process in the fluid model never lags the virtual CPU of the process. The reason is that Eq. 4 shows that $dV/dt \ge 1$ when the sum of all reserved rates is less than the rate of the CPU. As long as this is true, then virtual time (showing progress on the virtual CPU) flows faster than real time; this means that, for any interval of time, the cycles received by the process in the fluid model are always at least what it would receive on its dedicated CPU. Therefore, since we have established lag bounds relative to the fluid model, the same lag bounds apply to the virtual CPU description of a process's progress. This result is at least as powerful as those which bound an algorithm's lag relative to virtual time.

BERT depends entirely on this conformity for its effectiveness. As originally described in [2], BERT is a nonpreemptive scheduling algorithm. When BERT needs to meet the time constraint of a task, it first assumes that the task's process will receive no more than its reserved rate in the fluid model and calculates a conservative fluid model finish time for the task. It then steals enough capacity from less important tasks to ensure that the latest fluid model finish time for the task is at least $\Delta$ before the timing constraint. With this accomplished, BERT can guarantee that the constraint will be met.

4 Future Directions

Our methodology provides scheduler designers with two additional degrees of freedom over traditional fair sharing.
First, the dynamic behavior of the system can be modified in specific and controlled ways on the fly. The designer describes changes to a fluid model of execution, and uses virtual time to manifest the changes in real time in the running system. BERT is a prime example of this. Second, the mathematical basis of the FQFM and virtual time appears to leave room for a process to request any cycle function as its reservation; for example, it could reserve a sine function or a square wave. We must simply do the math to calculate virtual timestamps for tasks according to the reservation function, and theory takes care of the rest. A non-constant rate function may allow a process to describe its resource needs more precisely than a simple "slice of the CPU". We intend to investigate both of these directions more fully in future work.

References

[1] A. Bavier and L. Peterson. The power of virtual time for multimedia scheduling. In Proceedings of the Tenth International Workshop on Network and Operating System Support for Digital Audio and Video, pages 65–74, June 2000.
[2] A. Bavier, L. Peterson, and D. Mosberger. BERT: A scheduler for best effort and realtime tasks. Technical Report TR-602-99, Department of Computer Science, Princeton University, Mar. 1999.
[3] J. C. R. Bennett and H. Zhang. Hierarchical packet fair queueing algorithms. In Proceedings of the SIGCOMM '96 Symposium, pages 143–156, Palo Alto, CA, Aug. 1996. ACM.
[4] K. J. Duda and D. R. Cheriton. Borrowed-virtual-time (BVT) scheduling: supporting latency-sensitive threads in a general-purpose scheduler. In Proceedings of the 17th ACM Symposium on Operating System Principles, Dec. 1999.
[5] N. R. Figueira and J. Pasquale. A schedulability condition for deadline-ordered service disciplines. IEEE/ACM Transactions on Networking, 5(2):232–244, Apr. 1997.
[6] J. Nieh and M. S. Lam. The design, implementation and evaluation of SMART: A scheduler for multimedia applications. In Proceedings of the Sixteenth Symposium on Operating System Principles, pages 184–197, Oct. 1997.