Aim

This activity aims to develop a consolidation and scheduling scheme for multiple VMs that identifies the applications running inside the VMs and accurately predicts their performance. This will be achieved by:

  1. Exploring novel machine learning (ML)-based performance prediction approaches based on the identification of the running application inside the VM.
  2. Selecting highly correlated runtime metrics as the ML approaches’ input to accurately predict the performance level of the executed application.
  3. Predicting the performance of applications with highly variable workloads to maximize the use of heterogeneous manycore servers, by exploring different performance deterioration indexes and developing hardware accelerators on reconfigurable hardware.
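
As an illustration of the second point, the selection of highly correlated runtime metrics could be sketched as a simple Pearson-correlation filter. This is a minimal sketch and not the project's actual method: the metric names (`cpu_util`, `llc_misses`, `mem_bw`), the sample values, and the 0.8 threshold are all hypothetical.

```python
from math import sqrt


def pearson(xs, ys):
    """Pearson correlation coefficient of two equally long sequences."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    if var_x == 0 or var_y == 0:
        return 0.0  # a constant series carries no correlation signal
    return cov / sqrt(var_x * var_y)


def select_metrics(samples, target, threshold=0.8):
    """Keep metrics whose |r| against the target reaches the threshold,
    strongest first. The threshold is an assumed value."""
    scored = ((name, pearson(values, target)) for name, values in samples.items())
    kept = [(name, r) for name, r in scored if abs(r) >= threshold]
    return sorted(kept, key=lambda item: abs(item[1]), reverse=True)


# Hypothetical per-interval runtime metrics of one VM
metrics = {
    "cpu_util": [10, 20, 30, 40, 50],
    "llc_misses": [5, 3, 6, 2, 4],
    "mem_bw": [50, 40, 30, 20, 10],
}
# Hypothetical observed performance level (e.g. request latency)
latency = [1.0, 2.1, 2.9, 4.2, 5.0]

selected = select_metrics(metrics, latency)
for name, r in selected:
    print(f"{name}: r = {r:+.3f}")
```

The metrics that survive the filter would then serve as inputs to whichever ML predictor is chosen. A linear correlation filter is only one possible criterion and would miss non-linear dependencies.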

Technological roadmap

For the implementation of this activity, the following steps will be undertaken:

Phase 1: Deployment of the baseline experimental testbed at the new EPFL DC and an automatic classification approach of new EPFL applications

  1. Creation of a full testbed at the EPFL DC, in collaboration with EcoCloud, where the different approaches will be developed and assessed. It will include 8 racks of commercial servers in different configurations, provided by EcoCloud in collaboration with several industrial cloud partners, and water-cooled doors (as in the latest production DC at EPFL) as a baseline to assess the DC's latest performance and cooling capabilities.
  2. Extraction of the traces on the new platform and their analysis along with those provided by Huawei and other industrial cloud vendors affiliated with EcoCloud, as extracted from real public cloud systems.
  3. Incorporation of any kind of application into the prediction model, and provision of a scheduler that minimizes the carbon footprint and maximizes the use of renewables.

Phase 2: Deployment of the baseline experimental testbed at the new EPFL DC and an automatic classification approach of new EPFL applications

  1. Incorporation and demonstration of the trade-offs (performance vs. energy consumption vs. temperature) of dynamically reconfigurable accelerators in workloads of the EPFL campus. The effective mapping of the different applications and VMs to the existing infrastructure, including accelerators, will be explored for the target operating conditions and available energy.
  2. Derivation of prediction models of the resource utilization and execution performance of applications, and development of a new carbon-aware scheduler that exploits a new set of application-kernel accelerators, for broad applicability in datacenter platforms beyond a specific server configuration.
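
To make the idea of carbon-aware scheduling concrete, the sketch below places each deferrable job in the contiguous time window with the lowest summed grid carbon intensity before its deadline. It is a simplified illustration under stated assumptions, not the scheduler this activity will deliver: the forecast values, job names, durations, and deadlines are hypothetical, and server capacity, accelerators, and temperature are ignored.

```python
def best_start(intensity, duration, deadline):
    """Return (start_hour, summed_intensity) of the lowest-carbon contiguous
    window of `duration` hours that finishes by hour `deadline`."""
    last_start = min(deadline, len(intensity)) - duration
    if last_start < 0:
        raise ValueError("job does not fit before its deadline")
    # Score every feasible start hour; ties go to the earliest start.
    windows = (
        (sum(intensity[start:start + duration]), start)
        for start in range(last_start + 1)
    )
    total, start = min(windows)
    return start, total


# Hypothetical hourly carbon-intensity forecast (gCO2/kWh)
forecast = [400, 380, 200, 150, 160, 300, 420, 450]

# Hypothetical deferrable jobs: name -> (duration in hours, deadline hour)
jobs = {"training": (3, 8), "backup": (1, 3)}

plan = {
    name: best_start(forecast, duration, deadline)
    for name, (duration, deadline) in jobs.items()
}
print(plan)
```

With this forecast, the three-hour job is placed at hours 2 to 4, where the grid is cleanest. A real scheduler would additionally have to respect the predicted resource utilization of each application and the capacity of the target servers.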

Expected outcomes

  1. Maximize the energy efficiency of racks and minimize the DCs' carbon footprint based on EPFL’s energy provisioning context (Target: 50% energy savings w.r.t. 2019).
  2. Recycle EPFL servers to maximize DC sustainability at EPFL (Target: 7-year vs. 3-year server lifetime, for higher sustainability of servers in EPFL DCs).

The versatility of the proposed workflow will allow it to serve multiple generations of servers and VM configurations running different types of benchmarks (e.g., deep learning, the SKA project), and to efficiently adapt their power consumption and carbon footprint to the available electricity.