Consolidate multiple tasks or operations into a single computational unit. This pattern can increase compute resource utilization, and reduce the costs and management overhead associated with performing compute processing in cloud-hosted applications.
Context and Problem
A cloud application frequently implements a variety of operations. In some solutions it may make sense initially to follow the design principle of separation of concerns, and divide these operations into discrete computational units that are hosted and deployed individually (for example, as separate roles in a Microsoft Azure Cloud Service, separate Azure Web Sites, or separate Virtual Machines). However, although this strategy can help to simplify the logical design of the solution, deploying a large number of computational units as part of the same application can increase runtime hosting costs and make management of the system more complex.
As an example, Figure 1 shows the simplified structure of a cloud-hosted solution that is implemented using more than one computational unit. Each computational unit runs in its own virtual environment. Each function has been implemented as a separate task (labeled Task A through Task E) running in its own computational unit.
Figure 1 - Running tasks in a cloud environment by using a set of dedicated computational units
Each computational unit consumes chargeable resources, even when it is idle or lightly used. Therefore, this approach may not always be the most cost-effective solution.
Solution
To help reduce costs, increase utilization, improve communication speed, and ease the management effort, it may be possible to consolidate multiple tasks or operations into a single computational unit.
Tasks can be grouped according to a variety of criteria based on the features provided by the environment, and the costs associated with these features. A common approach is to look for tasks that have a similar profile concerning their scalability, lifetime, and processing requirements. Grouping these items together allows them to scale as a unit. The elasticity provided by many cloud environments enables additional instances of a computational unit to be started and stopped according to the workload. For example, Azure provides autoscaling that you can apply to roles in a Cloud Service, Web Sites, and Virtual Machines. For more information, see Autoscaling Guidance.
As a counterexample that shows how scalability can be used to determine which operations should probably not be grouped together, consider the following two tasks:
- Task 1 polls for infrequent, time-insensitive messages sent to a queue.
- Task 2 handles high-volume bursts of network traffic.
The second task requires elasticity that may involve starting and stopping a large number of instances of the computational unit. Applying the same scaling to the first task would simply result in more tasks listening for infrequent messages on the same queue, and is a waste of resources.
In many cloud environments it is possible to specify the resources available to a computational unit in terms of the number of CPU cores, memory, disk space, and so on. Generally, the more resources specified, the greater the cost. For financial efficiency, it is important to maximize the amount of work an expensive computational unit performs, and not let it become inactive for an extended period.
If there are tasks that require a great deal of CPU power in short bursts, consider consolidating these into a single computational unit that provides the necessary power. However, it is important to balance this need to keep expensive resources busy against the contention that could occur if they are over-stressed. Long-running, compute-intensive tasks should probably not share the same computational unit, for example.
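The grouping criteria described above can be sketched in code. The following is a minimal illustration only, not part of the downloadable sample: the TaskProfile type, the ScalingProfile enum, and the ConsolidationPlanner class are hypothetical names introduced here, and a real decision would also weigh the CPU and memory contention discussed in this section.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

// Hypothetical classification of how a task needs to scale.
public enum ScalingProfile { Steady, Bursty }

// Hypothetical description of a task; the fields are illustrative only.
public sealed class TaskProfile
{
    public TaskProfile(string name, ScalingProfile scaling, bool longRunning)
    {
        Name = name;
        Scaling = scaling;
        LongRunning = longRunning;
    }

    public string Name { get; }
    public ScalingProfile Scaling { get; }
    public bool LongRunning { get; }
}

public static class ConsolidationPlanner
{
    // Groups the names of tasks that share the same scaling profile and
    // lifetime, so that each group can be hosted and scaled as one unit.
    public static List<List<string>> Plan(IEnumerable<TaskProfile> tasks)
    {
        return tasks
            .GroupBy(t => new { t.Scaling, t.LongRunning })
            .Select(g => g.Select(t => t.Name).ToList())
            .ToList();
    }
}
```

With this sketch, the queue-polling task (Task 1) and the burst-handling task (Task 2) from the counterexample would carry different scaling profiles and therefore land in separate groups.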
Issues and Considerations
Consider the following points when implementing this pattern:
- Scalability and Elasticity. Many cloud solutions implement scalability and elasticity at the level of the computational unit by starting and stopping instances of units. Avoid grouping tasks that have conflicting scalability requirements in the same computational unit.
- Lifetime. The cloud infrastructure may periodically recycle the virtual environment that hosts a computational unit. When executing many long-running tasks inside a computational unit, it may be necessary to configure the unit to prevent it from being recycled until these tasks have finished. Alternatively, design the tasks by using a check-pointing approach that enables them to stop cleanly, and continue at the point at which they were interrupted when the computational unit is restarted.
- Release Cadence. If the implementation or configuration of a task changes frequently, it may be necessary to stop the computational unit hosting the updated code, reconfigure and redeploy the unit, and then restart it. This process will also require that all other tasks within the same computational unit are stopped, redeployed, and restarted.
- Security. Tasks in the same computational unit may share the same security context and be able to access the same resources. There must be a high degree of trust between the tasks, and confidence that one task is not going to corrupt or adversely affect another. Additionally, increasing the number of tasks running in a computational unit may increase the attack surface of the computational unit; each task is only as secure as the one with the most vulnerabilities.
- Fault Tolerance. If one task in a computational unit fails or behaves abnormally, it can affect the other tasks running within the same unit. For example, if one task fails to start correctly it may cause the entire startup logic for the computational unit to fail, and prevent other tasks in the same unit from running.
- Contention. Avoid introducing contention between tasks that compete for resources in the same computational unit. Ideally, tasks that share the same computational unit should exhibit different resource utilization characteristics. For example, two compute-intensive tasks should probably not reside in the same computational unit, and neither should two tasks that consume large amounts of memory. However, mixing a compute-intensive task with a task that requires a large amount of memory may be a viable combination.
Note:
You should consider consolidating compute resources only for a system that has been in production for a period of time so that operators and developers can monitor the system and create a heat map that identifies how each task utilizes differing resources. This map can be used to determine which tasks are good candidates for sharing compute resources.
- Complexity. Combining multiple tasks into a single computational unit adds complexity to the code in the unit, possibly making it more difficult to test, debug, and maintain.
- Stable Logical Architecture. Design and implement the code in each task so that it should not need to change, even if the physical environment in which the task runs does change.
- Other Strategies. Consolidating compute resources is only one way to help reduce costs associated with running multiple tasks concurrently. It requires careful planning and monitoring to ensure that it remains an effective approach. Other strategies may be more appropriate, depending on the nature of the work being performed and the location of the users on whose behalf these tasks are running. For example, functional decomposition of the workload (as described by the Compute Partitioning Guidance) may be a better option.
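The check-pointing approach mentioned under Lifetime can be sketched as follows. This is a minimal illustration under stated assumptions rather than code from the sample solution: FileCheckpoint and CheckpointedWorker are hypothetical names, the work is assumed to be a numbered sequence of items, and a local file stands in for the durable storage that a recycled computational unit would actually need.

```csharp
using System;
using System.IO;
using System.Threading;
using System.Threading.Tasks;

// Hypothetical helper that persists the index of the next work item so that
// a restarted computational unit can resume where it was interrupted.
// Assumption: a local file is used here; a real role would use durable storage.
public sealed class FileCheckpoint
{
    private readonly string path;

    public FileCheckpoint(string path)
    {
        this.path = path;
    }

    // Returns the next item to process, or 0 when no checkpoint exists yet.
    public int Load()
    {
        return File.Exists(path) ? int.Parse(File.ReadAllText(path)) : 0;
    }

    public void Save(int nextItem)
    {
        File.WriteAllText(path, nextItem.ToString());
    }
}

public static class CheckpointedWorker
{
    // Processes items from the saved checkpoint up to totalItems, recording
    // progress after each item and stopping cleanly between items when
    // cancellation is requested. Returns the index of the next unprocessed item.
    public static async Task<int> RunAsync(
        FileCheckpoint checkpoint,
        int totalItems,
        Func<int, Task> processItem,
        CancellationToken ct)
    {
        int next = checkpoint.Load();
        while (next < totalItems && !ct.IsCancellationRequested)
        {
            await processItem(next);
            next++;
            checkpoint.Save(next);
        }
        return next;
    }
}
```

Because progress is saved only after an item completes, an item that is interrupted part-way through is processed again after a restart, so the per-item work in this sketch should be safe to repeat.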
When to Use this Pattern
Use this pattern for tasks that are not cost effective if they run in their own computational units. If a task spends much of its time idle, running this task in a dedicated unit can be expensive.
This pattern might not be suitable for tasks that perform critical fault-tolerant operations, or tasks that process highly sensitive or private data and require their own security context. These tasks should run in their own isolated environment, in a separate computational unit.
Example
When building a cloud service on Azure, it’s possible to consolidate the processing performed by multiple tasks into a single role. Typically this is a worker role that performs background or asynchronous processing tasks.
Note:
In some cases it may be possible to include background or asynchronous processing tasks in the web role. This technique can help to reduce costs and simplify deployment, although it can impact the scalability and responsiveness of the public-facing interface provided by the web role. The article Combining Multiple Azure Worker Roles into an Azure Web Role contains a detailed description of implementing background or asynchronous processing tasks in a web role.
The role is responsible for starting and stopping the tasks. When the Azure fabric controller loads a role, it raises the Start event for the role. You can override the OnStart method of the WebRole or WorkerRole class to handle this event, perhaps to initialize the data and other resources on which the tasks in this method depend.
When the OnStart method completes, the role can start responding to requests. You can find more information and guidance about using the OnStart and Run methods in a role in the Application Startup Processes section in the patterns & practices guide Moving Applications to the Cloud.
Note:
Keep the code in the OnStart method as concise as possible. Azure does not impose any limit on the time taken for this method to complete, but the role will not be able to start responding to network requests sent to it until this method completes.
When the OnStart method has finished, the role executes the Run method. At this point, the fabric controller can start sending requests to the role.
Place the code that actually creates the tasks in the Run method. Note that the Run method effectively defines the lifetime of the role instance. When this method completes, the fabric controller will arrange for the role to be shut down.
When a role shuts down or is recycled, the fabric controller prevents any more incoming requests being received from the load balancer and raises the Stop event. You can capture this event by overriding the OnStop method of the role and perform any tidying up required before the role terminates.
Note:
Any actions performed in the OnStop method must be completed within five minutes (or 30 seconds if you are using the Azure emulator on a local computer); otherwise the Azure fabric controller assumes that the role has stalled and will force it to stop.
Figure 2 illustrates the lifecycle of a role, and the tasks and resources that it hosts. The tasks are started by the Run method, which then waits for the tasks to complete. The tasks themselves, which implement the business logic of the cloud service, can respond to messages posted to the role through the Azure load balancer.
Figure 2 - The lifecycle of tasks and resources in a role in an Azure cloud service
Note:
The ComputeResourceConsolidation.Worker project is part of the ComputeResourceConsolidation solution that is available for download with this guidance.
In the worker role, code that runs when the role is initialized creates the required cancellation token and a list of tasks to run.
    public class WorkerRole: RoleEntryPoint
    {
      // The cancellation token source used to cooperatively cancel running tasks.
      private readonly CancellationTokenSource cts = new CancellationTokenSource();

      // List of tasks running on the role instance.
      private readonly List<Task> tasks = new List<Task>();

      // List of worker tasks to run on this role.
      private readonly List<Func<CancellationToken, Task>> workerTasks
        = new List<Func<CancellationToken, Task>>
        {
          MyWorkerTask1,
          MyWorkerTask2
        };

      ...
    }
The MyWorkerTask1 and the MyWorkerTask2 methods are provided to illustrate how to perform different tasks within the same worker role. The following code shows MyWorkerTask1. This is a simple task that sleeps for 30 seconds and then outputs a trace message. It repeats this process indefinitely until the task is cancelled. The code in MyWorkerTask2 is very similar.
    // A sample worker role task.
    private static async Task MyWorkerTask1(CancellationToken ct)
    {
      // Fixed interval to wake up and check for work and/or do work.
      var interval = TimeSpan.FromSeconds(30);

      try
      {
        while (!ct.IsCancellationRequested)
        {
          // Wake up and do some background processing if not canceled.
          // TASK PROCESSING CODE HERE
          Trace.TraceInformation("Doing Worker Task 1 Work");

          // Go back to sleep for a period of time unless asked to cancel.
          // Task.Delay will throw an OperationCanceledException when canceled.
          await Task.Delay(interval, ct);
        }
      }
      catch (OperationCanceledException)
      {
        // Expect this exception to be thrown in normal circumstances or check
        // the cancellation token. If the role instances are shutting down, a
        // cancellation request will be signaled.
        Trace.TraceInformation("Stopping service, cancellation requested");

        // Re-throw the exception.
        throw;
      }
    }
Note:
The approach shown by the sample code is a common implementation of a background process. In a real world application you can follow this same structure, except that you should place your own processing logic in the body of the loop that waits for the cancellation request.
After the worker role has initialized the resources it uses, the Run method starts the two tasks concurrently, as shown here.
    ...
    // RoleEntry Run() is called after OnStart().
    // Returning from Run() will cause a role instance to recycle.
    public override void Run()
    {
      // Start worker tasks and add them to the task list.
      foreach (var worker in workerTasks)
        tasks.Add(worker(cts.Token));

      Trace.TraceInformation("Worker host tasks started");

      // The assumption is that all tasks should remain running and not return,
      // similar to role entry Run() behavior.
      try
      {
        Task.WaitAny(tasks.ToArray());
      }
      catch (AggregateException ex)
      {
        Trace.TraceError(ex.Message);

        // If any of the inner exceptions in the aggregate exception
        // are not cancellation exceptions then re-throw the exception.
        ex.Handle(innerEx => (innerEx is OperationCanceledException));
      }

      // If there was not a cancellation request, stop all tasks and return from Run()
      // An alternative to cancelling and returning when a task exits would be to
      // restart the task.
      if (!cts.IsCancellationRequested)
      {
        Trace.TraceInformation("Task returned without cancellation request");
        Stop(TimeSpan.FromMinutes(5));
      }
    }
    ...
In this example, the Run method waits until any one of the tasks finishes. If a task ends without a cancellation request having been signaled (for example, because it failed or returned unexpectedly), the Run method calls the Stop method, which cancels the remaining tasks and waits for a maximum of five minutes for them to finish before Run returns. If cancellation has already been requested, the role is being shut down and the Run method simply returns.
Note:
You could implement more comprehensive monitoring and exception handling strategies in the Run method, such as restarting tasks that have failed, or including code that enables the role to stop and start individual tasks.
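As one possible shape for the restart strategy mentioned in the note above, the following sketch supervises each worker and restarts it when it returns or fails. It is an illustration under assumptions, not part of the ComputeResourceConsolidation solution: the TaskSupervisor class and the maxRestarts limit are hypothetical, and the sketch uses only the standard .NET task and tracing types.

```csharp
using System;
using System.Collections.Generic;
using System.Diagnostics;
using System.Linq;
using System.Threading;
using System.Threading.Tasks;

public static class TaskSupervisor
{
    // Runs every worker until cancellation is requested, restarting any worker
    // that returns or faults. maxRestarts is a hypothetical per-worker limit
    // that stops a persistently failing worker from restarting forever.
    public static async Task RunAllAsync(
        IReadOnlyList<Func<CancellationToken, Task>> workers,
        CancellationToken ct,
        int maxRestarts = 3)
    {
        await Task.WhenAll(workers.Select(w => SuperviseAsync(w, ct, maxRestarts)));
    }

    private static async Task SuperviseAsync(
        Func<CancellationToken, Task> worker, CancellationToken ct, int maxRestarts)
    {
        int restarts = 0;
        while (!ct.IsCancellationRequested)
        {
            try
            {
                await worker(ct);
            }
            catch (OperationCanceledException) when (ct.IsCancellationRequested)
            {
                // Normal shutdown: the role is stopping.
                return;
            }
            catch (Exception ex)
            {
                Trace.TraceError("Worker failed: {0}", ex.Message);
            }

            if (ct.IsCancellationRequested)
            {
                return;
            }

            if (++restarts > maxRestarts)
            {
                throw new InvalidOperationException("Worker exceeded the restart limit.");
            }

            Trace.TraceInformation("Restarting worker (attempt {0})", restarts);
        }
    }
}
```

A Run method could block on RunAllAsync(workerTasks, cts.Token) instead of calling Task.WaitAny. Whether restarting is preferable to stopping the whole role depends on how independent the consolidated tasks are, as discussed under Fault Tolerance.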
The Stop method shown in the following code is called when the fabric controller shuts down the role instance (it is invoked from the OnStop method). The code stops each task gracefully by cancelling it. If any task takes more than five minutes to complete, the cancellation processing in the Stop method ceases waiting and the role is terminated.
    // Stop running tasks and wait for tasks to complete before returning
    // unless the timeout expires.
    private void Stop(TimeSpan timeout)
    {
      Trace.TraceInformation("Stop called. Canceling tasks.");

      // Cancel running tasks.
      cts.Cancel();

      Trace.TraceInformation("Waiting for canceled tasks to finish and return");

      // Wait for all the tasks to complete before returning. Note that the
      // emulator currently allows 30 seconds and Azure allows five
      // minutes for processing to complete.
      try
      {
        Task.WaitAll(tasks.ToArray(), timeout);
      }
      catch (AggregateException ex)
      {
        Trace.TraceError(ex.Message);

        // If any of the inner exceptions in the aggregate exception
        // are not cancellation exceptions then re-throw the exception.
        ex.Handle(innerEx => (innerEx is OperationCanceledException));
      }
    }
Related Patterns and Guidance
The following patterns and guidance may also be relevant when implementing this pattern:
- Autoscaling Guidance. Autoscaling can be used to start and stop instances of service hosting computational resources, depending on the anticipated demand for processing.
- Compute Partitioning Guidance. This guidance describes how to allocate the services and components in a cloud service in a way that helps to minimize running costs while maintaining the scalability, performance, availability, and security of the service.