Implemented site resource deduction after scheduling #282
Conversation
Please add the label "ready-for-review" when the PR is ready for review.
	sched.siteCacheInfoSnapshot.FlavorMap = config.FlavorMap
	return nil
}

//This function updates sites' flavor
func (sched *Scheduler) UpdateRegionFlavor(region string, flavorId string) (err error) {
	regionFlavorId := region + constants.SiteDelimiter + flavorId
	regionFlavorId := region + "||" + flavorId
"||" is probably better stored using a variable.
"||" is declared as FlavorDelimiter at constants.go
In that case, let's use FlavorDelimiter here instead of the string literal "||".
It was already changed to "FlavorDelimiter".
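For reference, a minimal sketch of the agreed-upon form, assuming constants.FlavorDelimiter is the "||" constant declared in constants.go:
// Sketch only: build the region-flavor key with the named constant
// instead of the "||" string literal.
func regionFlavorKey(region, flavorId string) string {
	return region + constants.FlavorDelimiter + flavorId
}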
@@ -403,6 +411,8 @@ func (n *SiteCacheInfo) UpdateSiteWithVolumePool(volumePool *typed.RegionVolumeP

// UpdateSiteWithResInfo update res info
func (n *SiteCacheInfo) UpdateSiteWithResInfo(resInfo types.AllResInfo) error {
	klog.Infof("SiteCacheInfo - UpdateSiteWithResInfo - Resource : %v", resInfo)
	klog.Infof("SiteCacheInfo-RequestedResources : %v", n.RequestedResources)
"SiteCacheInfo-RequestedResources" is missing spaces before and after "-".
These logs could probably also use a log level.
These two logs were modified to:
klog.Infof("resource %v of region %v is requested:", resourceReq, region)
For log levels, we currently have info, warning, error, fatal, and panic. Could you let me know if you have any suggestions?
For less frequent logs, you could use something like klog.V(4).Infof(...); 4 is just an example here. The higher the number, the less important the logs.
Yes, I will use it.
klog.Infof() will become klog.V(4).Infof().
You might have to initialize klog in the scheduler. Otherwise, changing klog.Infof to klog.V(4).Infof won't work.
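A minimal, hedged sketch (not the PR's actual code) of initializing klog so that verbosity-gated calls such as klog.V(4).Infof are emitted when the process is started with -v=4:
package main

import (
	"flag"

	"k8s.io/klog"
)

func main() {
	// Register klog's flags (-v, -logtostderr, ...) on the default flag set.
	klog.InitFlags(nil)
	flag.Parse() // run the binary with -v=4 to enable V(4) logs

	klog.Infof("printed at the default verbosity")
	klog.V(4).Infof("printed only when -v is 4 or higher")
	klog.Flush()
}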
Resolved review threads (outdated) on globalscheduler/pkg/scheduler/framework/plugins/defaultbinder/default_binder.go
if totalRes == nil {
	n.deductFlavor()
	return
} else if requestRes == nil {
“} else if” is not needed here because the if branch above already returns.
"else" is removed.
		reqRes.Memory += res.Memory
		n.RequestedResources[resType] = reqRes
	}
}
Here I'd add a continue after line 566 and get rid of the "} else {":
if len(n.RequestedResources) == 0 {
	reqRes := types.CPUAndMemory{VCPU: res.VCPU, Memory: res.Memory}
	n.RequestedResources[resType] = &reqRes
	continue
}
for reqType, reqRes := range n.RequestedResources {
	resTypes := strings.Split(reqType, constants.FlavorDelimiter)
	if !utils.IsContain(resTypes, resType) {
		klog.Infof("!utils.IsContain: %v", !utils.IsContain(resTypes, resType))
		continue
	}
	reqRes.VCPU += res.VCPU
	reqRes.Memory += res.Memory
	n.RequestedResources[resType] = reqRes
}
"continue" is added and "else" is removed
func (n *SiteCacheInfo) DeductSiteResInfo(resInfo types.AllResInfo, regionFlavorMap map[string]*typed.RegionFlavor) error {
	var resourceTypes []string
	for resType, res := range resInfo.CpuAndMem {
		//binding a pod for the first
This comment seems to be missing some words.
Changed the comment as follows:
//resource type is null, assign default resource type (e.g. when binding a pod for the first time)
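As an illustration only, the fallback that comment describes might look like the sketch below; DefaultResourceType is a hypothetical name, not necessarily the identifier used in the PR:
// Hypothetical sketch: when no resource type is set (e.g. when binding a pod
// for the first time), fall back to a default type before deducting.
if resType == "" {
	resType = constants.DefaultResourceType // hypothetical constant name
}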
LGTM
A few minor comments.
@kimeunju108 please ping me when this is ready for merge.
I have a concern: if the pod fails to create an OpenStack VM and has to release resources, do we have any logic to handle that?
No, this PR is only for deduction. As far as I know, the ResourceCollector will push resource changes to the scheduler immediately. If that requirement has changed, we can discuss it during today's meeting.
I understand. The resource collector takes 60 seconds to get resource information from OpenStack, while the global resource scheduler takes <100 ms to process a pod. If a pod fails, it might take 60 seconds to get the deducted resources back, and I am not sure we are fine with that. Consider a case where the first 100 pods are created with wrong images and then a second 100 pods with right images: under the old logic the first 100 pods failed but the second 100 pods were created. After we implement the deduction here, the first 100 pods claim all the resources and the second 100 pods fail to create because the site does not have any resources available.
I mean, if possible, can we add a syncResource from OpenStack once we find out that all the resources are claimed?
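If we decide to go that route, here is a rough, hypothetical sketch of the idea; none of these function names exist in the repo today:
// Hypothetical sketch: when binding fails because the cached site resources
// are fully claimed, request an immediate resource sync from OpenStack and
// retry once instead of waiting for the next ~60-second collection cycle.
func bindWithResync(bind func() error, syncResourcesFromOpenStack func() error) error {
	if err := bind(); err == nil {
		return nil
	}
	// The cache may be stale after deduction; refresh it and retry once.
	if err := syncResourcesFromOpenStack(); err != nil {
		return err
	}
	return bind()
}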
What type of PR is this?
What this PR does / why we need it:
This PR proposes deducting resources from the scheduler cache after a pod is scheduled.
Which issue(s) this PR fixes:
Fixes #252
Special notes for your reviewer:
Does this PR introduce a user-facing change?:
NONE
How to test?:
(1) Run the following commands
~/go/src/k8s.io/arktos$ ./hack/globalscheduler/globalscheduler-up.sh
~/go/src/k8s.io/arktos$ cd ~/go/src/k8s.io/arktos/globalscheduler/test/yaml
$ kubectl apply -f sample_6_clusters.yaml
$ kubectl apply -f sample_2_schedulers.yaml
$ kubectl apply -f sample_2_distributors.yaml
$ kubectl apply -f sample_2_dispatchers.yaml
$ kubectl apply -f sample_6_pods.yaml
$ kubectl get pods
(2) Check OpenStack and the log files at /tmp