CRI-RM Helps Inspur AIStation Improve Cloud-Native Workload Performance

Processor and Non-Uniform Memory Access (NUMA) architecture affinity helps adapt AI task accelerators to the Kubernetes cluster environment.

The platform targets deep learning development and training scenarios and comprehensively integrates AI computing resources, training data resources, and AI development tools.

With an AI platform, companies can develop AI technology, train models, and apply them to business processes. However, building an AI platform is not accomplished overnight. From developing an AI model to finally entering production deployment, companies face challenges in resource management, model testing, and other areas. They also need to make full use of hardware such as CPUs to improve AI training performance.

Breaking the performance bottleneck and greatly enhancing AI computing power

The platform manages containerized applications across multiple hosts and implements unified deployment, planning, updating, and maintenance of AI resources. This effectively improves how efficiently users manage AI resources and keeps those resources manageable, scalable, and usable. The platform can provide users with high-performance AI computing resources to achieve efficient computing, accurate resource management and scheduling, agile data integration and acceleration, and process-based integration of AI scenarios with the business. An important choice for an AI platform is how to provide computing power. Although the GPU is generally used for AI training, this does not mean that the GPU is the only choice.

In fact, in a large number of industry scenarios, users want to make full use of existing CPU computing resources and flexibly meet the requirements of various workloads, including AI, while reducing capital expenditure. However, users who train on CPUs in a K8S cluster will encounter certain bottlenecks. This is because the native K8S CPU management mechanism does not consider CPU binding and NUMA affinity by default, and in newer versions of K8S these features take effect only for pods in the Guaranteed QoS class. As a result, the CPU may not reach its full potential in AI training.
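The Guaranteed QoS restriction described above can be illustrated with a small sketch. It assumes the standard Kubernetes rules: a pod is Guaranteed only when every container sets CPU and memory limits with requests equal to them, and the static CPU manager policy grants exclusive CPUs only to such containers when they request a whole number of CPUs. The function names and the dictionary layout are hypothetical and chosen for illustration; they are not part of any Kubernetes client library.

```python
from decimal import Decimal


def cpu_millicores(quantity: str) -> int:
    """Convert a Kubernetes CPU quantity such as "2", "1.5" or "500m" to millicores."""
    if quantity.endswith("m"):
        return int(quantity[:-1])
    return int(Decimal(quantity) * 1000)


def effective_requests(container: dict) -> dict:
    """Requests default to the limits when a request is not set explicitly."""
    requests = dict(container.get("limits", {}))
    requests.update(container.get("requests", {}))
    return requests


def qos_class(containers: list) -> str:
    """Classify a pod (given as a list of container resource specs) into a QoS class."""
    if not any(c.get("requests") or c.get("limits") for c in containers):
        return "BestEffort"
    for c in containers:
        limits = c.get("limits", {})
        requests = effective_requests(c)
        if "cpu" not in limits or "memory" not in limits:
            return "Burstable"
        if cpu_millicores(requests["cpu"]) != cpu_millicores(limits["cpu"]):
            return "Burstable"
        # Simplification: memory quantities are compared as strings,
        # so "1Gi" and "1024Mi" are treated as different here.
        if requests["memory"] != limits["memory"]:
            return "Burstable"
    return "Guaranteed"


def containers_eligible_for_pinning(containers: list) -> list:
    """Names of containers that would get exclusive CPUs under the static policy."""
    if qos_class(containers) != "Guaranteed":
        return []
    return [
        c["name"]
        for c in containers
        if cpu_millicores(effective_requests(c)["cpu"]) % 1000 == 0
    ]


training_pod = [
    {"name": "trainer",
     "requests": {"cpu": "4", "memory": "8Gi"},
     "limits": {"cpu": "4", "memory": "8Gi"}},
]
print(qos_class(training_pod), containers_eligible_for_pinning(training_pod))
```

A training pod that requests 4 CPUs but allows bursting to 8 is classified as Burstable, so it runs in the shared CPU pool without binding. This is the situation the paragraph above describes.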

After discovering the AI computing power bottleneck on K8S clusters, Inspur began in-depth cooperation with Intel and optimized K8S using CRI-RM (a resource manager based on the Container Runtime Interface). The component can be inserted between the kubelet and the container runtime (CR). It intercepts requests sent by the kubelet over the CRI protocol and acts as a non-transparent proxy for the CR. It tracks the state of all containers on the cluster nodes and better assigns resources such as processors, memory, I/O peripherals, and memory controllers to application workloads, thereby effectively improving performance.
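To make the idea of topology-aware assignment concrete, the following is a minimal sketch of choosing CPUs so that a workload stays on a single NUMA node whenever possible. It is not CRI-RM's actual policy: the function name, the best-fit heuristic, and the in-memory topology dictionary are all assumptions made for illustration.

```python
from typing import Dict, List, Optional, Set


def pick_cpus(topology: Dict[int, List[int]],
              allocated: Set[int],
              count: int) -> Optional[List[int]]:
    """Pick `count` free CPUs, preferring a single NUMA node (hypothetical heuristic).

    topology maps a NUMA node id to the CPU ids that belong to it.
    allocated is the set of CPU ids already handed out to other containers.
    Returns None when there are not enough free CPUs in total.
    """
    free = {node: sorted(c for c in cpus if c not in allocated)
            for node, cpus in topology.items()}

    # Best fit: the node with the fewest free CPUs that can still hold the request,
    # which leaves larger free blocks for later, bigger workloads.
    candidates = [node for node, cpus in free.items() if len(cpus) >= count]
    if candidates:
        node = min(candidates, key=lambda n: (len(free[n]), n))
        return free[node][:count]

    # Fallback: span NUMA nodes, starting with the node that has the most free CPUs.
    picked: List[int] = []
    for node in sorted(free, key=lambda n: (-len(free[n]), n)):
        for cpu in free[node]:
            if len(picked) == count:
                return picked
            picked.append(cpu)
    return picked if len(picked) == count else None


# Assumed two-socket layout with four CPUs per NUMA node.
topology = {0: [0, 1, 2, 3], 1: [4, 5, 6, 7]}
in_use = {0, 1, 4}
print(pick_cpus(topology, in_use, 2))
print(pick_cpus(topology, in_use, 3))
```

Keeping a training process and its memory on one NUMA node avoids cross-node memory access, which is the kind of affinity the native K8S CPU manager does not take into account by default.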

Performance can be improved [1]. This means that with CRI-RM, application performance increases significantly without updating the hardware configuration, so users can improve infrastructure utilization efficiency without additional hardware investment and reduce the total cost of ownership (TCO). With the performance gains verified, the companies plan to carry out more extensive cooperation around CPU-based AI, covering hardware selection, software optimization, system integration, and more, to accelerate AI performance on infrastructure from the cloud to the edge.