-
As we discussed previously, we should not accept pre-trained models, as they are difficult to validate. Instead, we should accept new data and use it to train a new model or extend an existing one. That way, we can ensure the new data covers all the scenarios necessary for our purposes. For example, we can verify that the dataset has:
The dataset should also include metadata about the server:
This approach enables us to accurately characterize the model for a particular server configuration and to validate the data, since each set of server characteristics will produce a different power model. To protect privacy, users will have the option to include or exclude their own workloads in the dataset, in addition to our chosen set of applications. To prevent sensitive information from being exposed, we should recommend running the benchmarks in isolation so that no private data is leaked. Relatedly, how can we enhance workload isolation? Disabling scheduling on certain CPU cores might be an area worth investigating.
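As a sketch of the server-metadata idea above, the following shows how contributed data could carry basic machine characteristics parsed from `/proc/cpuinfo` and `/proc/meminfo`. The field names (`cpu_model`, `core_count`, `memory_kb`) are illustrative assumptions, not an agreed schema.

```python
"""Minimal sketch of collecting server metadata to attach to a contributed
dataset. Field names are illustrative assumptions, not a fixed schema."""
import json
import re


def read_proc_cpuinfo(text):
    """Parse the first 'model name' entry and count processors from
    /proc/cpuinfo-style text."""
    model = None
    cores = 0
    for line in text.splitlines():
        if line.startswith("processor"):
            cores += 1
        elif model is None and line.startswith("model name"):
            model = line.split(":", 1)[1].strip()
    return {"cpu_model": model, "core_count": cores}


def read_proc_meminfo(text):
    """Extract MemTotal (in kB) from /proc/meminfo-style text."""
    m = re.search(r"MemTotal:\s+(\d+)\s+kB", text)
    return {"memory_kb": int(m.group(1)) if m else None}


if __name__ == "__main__":
    # Inline sample text stands in for reading the real /proc files.
    cpuinfo = "processor : 0\nmodel name : Example CPU\nprocessor : 1\n"
    meminfo = "MemTotal:       16384000 kB\n"
    metadata = {**read_proc_cpuinfo(cpuinfo), **read_proc_meminfo(meminfo)}
    print(json.dumps(metadata))
```

On a real server the two parsers would be fed the contents of `/proc/cpuinfo` and `/proc/meminfo`, and the resulting JSON would travel alongside the Kepler metrics in the contribution.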
-
As discussed previously, we can set up the profiling/clustering as part of an unsupervised learning pipeline that runs separately from the model server proper. This way, we might be able to identify clusters we did not know about in advance. Contributors can contribute directly to the profiling and clustering models (via k-means clustering). We can also store the data for use in our training pipelines.
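To make the clustering step concrete, here is a minimal pure-Python k-means over 2-D metric vectors. This is a sketch, not Kepler's actual pipeline; a real implementation would likely use a library such as scikit-learn, and the features and choice of k here are assumptions.

```python
"""Minimal k-means sketch for grouping profiled workloads by their metric
vectors. Pure Python for clarity; features and k are illustrative."""
import random


def kmeans(points, k, iters=20, seed=0):
    """Cluster 2-D points into k groups; returns (centroids, labels)."""
    rng = random.Random(seed)
    centroids = rng.sample(points, k)
    labels = [0] * len(points)
    for _ in range(iters):
        # Assignment step: each point joins its nearest centroid
        # (squared Euclidean distance).
        for i, p in enumerate(points):
            labels[i] = min(
                range(k),
                key=lambda c: (p[0] - centroids[c][0]) ** 2
                + (p[1] - centroids[c][1]) ** 2,
            )
        # Update step: move each centroid to the mean of its members.
        for c in range(k):
            members = [p for i, p in enumerate(points) if labels[i] == c]
            if members:
                centroids[c] = (
                    sum(p[0] for p in members) / len(members),
                    sum(p[1] for p in members) / len(members),
                )
    return centroids, labels


if __name__ == "__main__":
    # Two obvious groups: idle-like low values vs. compute-heavy high values.
    data = [(1.0, 1.2), (0.9, 1.0), (1.1, 0.8),
            (8.0, 9.1), (8.5, 9.0), (9.0, 8.7)]
    centroids, labels = kmeans(data, k=2)
    print(labels)
```

Running this pipeline periodically over contributed profiles would let new workload clusters surface on their own, after which each cluster could get its own power model.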
-
We need a contributing process and guidelines for obtaining data/models from contributors.
(i) What and how?
(i-1) Kepler metrics: anonymise container names? Is a license needed?
(i-2) Server metadata: cpuinfo, memoryinfo (related to How can we serve a model for the server that has no power measurement? #91 (ii-1))
(i-3) Performance values (not low-level counters) reported by the benchmark workload (related to How can we serve a model for the server that has no power measurement? #91 (ii-2))
(ii) How do we validate and merge data (PRs) from contributors? How do we verify a contributor? Should merging be limited to maintainers? GitHub verification?
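A sketch of what an automated validation step for contributed data PRs might check, covering the points above. The required fields and the container-name anonymisation convention (`c-` plus an 8-character hex hash) are assumptions for illustration, not an agreed schema.

```python
"""Sketch of validating a contributed dataset entry before merging.
Required fields and the anonymisation rule are illustrative assumptions."""
import re

REQUIRED_METADATA = {"cpu_model", "core_count", "memory_kb"}
# Assumed convention: container names are anonymised to "c-<8 hex chars>".
ANON_NAME = re.compile(r"^c-[0-9a-f]{8}$")


def validate_contribution(entry):
    """Return a list of problems; an empty list means the entry passes."""
    problems = []
    missing = REQUIRED_METADATA - set(entry.get("server_metadata", {}))
    if missing:
        problems.append(f"missing server metadata: {sorted(missing)}")
    for name in entry.get("container_names", []):
        if not ANON_NAME.match(name):
            problems.append(f"container name not anonymised: {name}")
    if "benchmark_results" not in entry:
        problems.append("no benchmark performance values included")
    return problems


if __name__ == "__main__":
    entry = {
        "server_metadata": {"cpu_model": "Example CPU", "core_count": 8},
        "container_names": ["c-1a2b3c4d", "my-app"],
        "benchmark_results": {"throughput": 123.4},
    }
    for problem in validate_contribution(entry):
        print(problem)
```

A check like this could run in CI on each contribution PR, so maintainers only review entries that already satisfy the schema and anonymisation rules.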