296 lines
16 KiB
YAML
---
apiVersion: apiextensions.k8s.io/v1
kind: CustomResourceDefinition
metadata:
  annotations:
    controller-gen.kubebuilder.io/version: v0.17.3
  labels:
    app.kubernetes.io/instance: kueue
    app.kubernetes.io/name: kueue
    app.kubernetes.io/version: v0.12.3
    control-plane: controller-manager
  name: cohorts.kueue.x-k8s.io
spec:
  conversion:
    strategy: Webhook
    webhook:
      clientConfig:
        service:
          name: kueue-webhook-service
          namespace: kueue-system
          path: /convert
      conversionReviewVersions:
        - v1
  group: kueue.x-k8s.io
  names:
    kind: Cohort
    listKind: CohortList
    plural: cohorts
    singular: cohort
  scope: Cluster
  versions:
    - name: v1alpha1
      schema:
        openAPIV3Schema:
          description: |-
            Cohort defines the Cohorts API.

            Hierarchical Cohorts (any Cohort which has a parent) are compatible
            with Fair Sharing as of v0.11. Using these features together in
            v0.9 and v0.10 is unsupported, and results in undefined behavior.
          properties:
            apiVersion:
              description: |-
                APIVersion defines the versioned schema of this representation of an object.
                Servers should convert recognized schemas to the latest internal value, and
                may reject unrecognized values.
                More info: https://git.k8s.io/community/contributors/devel/sig-architecture/api-conventions.md#resources
              type: string
            kind:
              description: |-
                Kind is a string value representing the REST resource this object represents.
                Servers may infer this from the endpoint the client submits requests to.
                Cannot be updated.
                In CamelCase.
                More info: https://git.k8s.io/community/contributors/devel/sig-architecture/api-conventions.md#types-kinds
              type: string
            metadata:
              type: object
            spec:
              description: CohortSpec defines the desired state of Cohort
              properties:
                fairSharing:
                  description: |-
                    fairSharing defines the properties of the Cohort when
                    participating in FairSharing. The values are only relevant
                    if FairSharing is enabled in the Kueue configuration.
                  properties:
                    weight:
                      anyOf:
                        - type: integer
                        - type: string
                      default: 1
                      description: |-
                        weight gives a comparative advantage to this ClusterQueue
                        or Cohort when competing for unused resources in the
                        Cohort. The share is based on the dominant resource usage
                        above nominal quotas for each resource, divided by the
                        weight. Admission prioritizes scheduling workloads from
                        ClusterQueues and Cohorts with the lowest share and
                        preempting workloads from the ClusterQueues and Cohorts
                        with the highest share. A zero weight implies infinite
                        share value, meaning that this Node will always be at a
                        disadvantage against other ClusterQueues and Cohorts.
                      pattern: ^(\+|-)?(([0-9]+(\.[0-9]*)?)|(\.[0-9]+))(([KMGTPE]i)|[numkMGTPE]|([eE](\+|-)?(([0-9]+(\.[0-9]*)?)|(\.[0-9]+))))?$
                      x-kubernetes-int-or-string: true
                  type: object
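                # Illustrative sketch, not part of the generated schema: a Cohort that
                # sets spec.fairSharing.weight. The name "team-a" and the value are
                # hypothetical. Since weight is int-or-string and must match the
                # quantity pattern above, forms such as 2, "2" or "500m" should all
                # validate; when the field is omitted, the default of 1 applies.
                #
                #   apiVersion: kueue.x-k8s.io/v1alpha1
                #   kind: Cohort
                #   metadata:
                #     name: team-a          # hypothetical name
                #   spec:
                #     fairSharing:
                #       weight: "500m"      # 0.5; the share is divided by the weight, so
                #                           # this doubles the share relative to weight 1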
                parent:
                  description: |-
                    Parent references the name of the Cohort's parent, if
                    any. It satisfies one of three cases:
                    1) Unset. This Cohort is the root of its Cohort tree.
                    2) References a non-existent Cohort. We use a default Cohort (no borrowing/lending limits).
                    3) References an existent Cohort.

                    If a cycle is created, we disable all members of the
                    Cohort, including ClusterQueues, until the cycle is
                    removed. We prevent further admission while the cycle
                    exists.
                  maxLength: 253
                  pattern: ^[a-z0-9]([-a-z0-9]*[a-z0-9])?(\.[a-z0-9]([-a-z0-9]*[a-z0-9])?)*$
                  type: string
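                # Illustrative sketch of the three parent cases, using hypothetical
                # names. "org" leaves parent unset, so it is the root of its tree.
                # "team-a" references the existing Cohort "org". "team-b" references
                # "missing", which is assumed not to exist, so per the description a
                # default Cohort (no borrowing/lending limits) is used in its place.
                #
                #   apiVersion: kueue.x-k8s.io/v1alpha1
                #   kind: Cohort
                #   metadata:
                #     name: org
                #   ---
                #   apiVersion: kueue.x-k8s.io/v1alpha1
                #   kind: Cohort
                #   metadata:
                #     name: team-a
                #   spec:
                #     parent: org
                #   ---
                #   apiVersion: kueue.x-k8s.io/v1alpha1
                #   kind: Cohort
                #   metadata:
                #     name: team-b
                #   spec:
                #     parent: missing
                #
                # Setting spec.parent of "org" to "team-a" would create a cycle; the
                # members of the affected Cohorts would then be disabled until one of
                # the two parent references is removed.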
                resourceGroups:
                  description: |-
                    ResourceGroups describes groupings of Resources and
                    Flavors. Each ResourceGroup defines a list of Resources
                    and a list of Flavors which provide quotas for these
                    Resources. Each Resource and each Flavor may only form part
                    of one ResourceGroup. There may be up to 16 ResourceGroups
                    within a Cohort.

                    BorrowingLimit limits how much members of this Cohort
                    subtree can borrow from the parent subtree.

                    LendingLimit limits how much members of this Cohort subtree
                    can lend to the parent subtree.

                    Borrowing and Lending limits must only be set when the
                    Cohort has a parent. Otherwise, the Cohort create/update
                    will be rejected by the webhook.
                  items:
                    properties:
                      coveredResources:
                        description: |-
                          coveredResources is the list of resources covered by the flavors in this
                          group.
                          Examples: cpu, memory, vendor.com/gpu.
                          The list cannot be empty and it can contain up to 16 resources.
                        items:
                          description: ResourceName is the name identifying various resources in a ResourceList.
                          type: string
                        maxItems: 16
                        minItems: 1
                        type: array
                      flavors:
                        description: |-
                          flavors is the list of flavors that provide the resources of this group.
                          Typically, different flavors represent different hardware models
                          (e.g., gpu models, cpu architectures) or pricing models (on-demand vs spot
                          cpus).
                          Each flavor MUST list all the resources listed for this group in the same
                          order as the .coveredResources field.
                          The list cannot be empty and it can contain up to 16 flavors.
                        items:
                          properties:
                            name:
                              description: |-
                                name of this flavor. The name should match the .metadata.name of a
                                ResourceFlavor. If a matching ResourceFlavor does not exist, the
                                ClusterQueue will have an Active condition set to False.
                              maxLength: 253
                              pattern: ^[a-z0-9]([-a-z0-9]*[a-z0-9])?(\.[a-z0-9]([-a-z0-9]*[a-z0-9])?)*$
                              type: string
                            resources:
                              description: |-
                                resources is the list of quotas for this flavor per resource.
                                There could be up to 16 resources.
                              items:
                                properties:
                                  borrowingLimit:
                                    anyOf:
                                      - type: integer
                                      - type: string
                                    description: |-
                                      borrowingLimit is the maximum amount of quota for the [flavor, resource]
                                      combination that this ClusterQueue is allowed to borrow from the unused
                                      quota of other ClusterQueues in the same cohort.
                                      In total, at a given time, Workloads in a ClusterQueue can consume a
                                      quantity of quota equal to nominalQuota+borrowingLimit, assuming the other
                                      ClusterQueues in the cohort have enough unused quota.
                                      If null, it means that there is no borrowing limit.
                                      If not null, it must be non-negative.
                                      borrowingLimit must be null if spec.cohort is empty.
                                    pattern: ^(\+|-)?(([0-9]+(\.[0-9]*)?)|(\.[0-9]+))(([KMGTPE]i)|[numkMGTPE]|([eE](\+|-)?(([0-9]+(\.[0-9]*)?)|(\.[0-9]+))))?$
                                    x-kubernetes-int-or-string: true
                                  lendingLimit:
                                    anyOf:
                                      - type: integer
                                      - type: string
                                    description: |-
                                      lendingLimit is the maximum amount of unused quota for the [flavor, resource]
                                      combination that this ClusterQueue can lend to other ClusterQueues in the same cohort.
                                      In total, at a given time, ClusterQueue reserves for its exclusive use
                                      a quantity of quota equal to nominalQuota - lendingLimit.
                                      If null, it means that there is no lending limit, meaning that
                                      all the nominalQuota can be borrowed by other ClusterQueues in the cohort.
                                      If not null, it must be non-negative.
                                      lendingLimit must be null if spec.cohort is empty.
                                      This field is in beta stage and is enabled by default.
                                    pattern: ^(\+|-)?(([0-9]+(\.[0-9]*)?)|(\.[0-9]+))(([KMGTPE]i)|[numkMGTPE]|([eE](\+|-)?(([0-9]+(\.[0-9]*)?)|(\.[0-9]+))))?$
                                    x-kubernetes-int-or-string: true
                                  name:
                                    description: name of this resource.
                                    type: string
                                  nominalQuota:
                                    anyOf:
                                      - type: integer
                                      - type: string
                                    description: |-
                                      nominalQuota is the quantity of this resource that is available for
                                      Workloads admitted by this ClusterQueue at a point in time.
                                      The nominalQuota must be non-negative.
                                      nominalQuota should represent the resources in the cluster available for
                                      running jobs (after discounting resources consumed by system components
                                      and pods not managed by kueue). In an autoscaled cluster, nominalQuota
                                      should account for resources that can be provided by a component such as
                                      Kubernetes cluster-autoscaler.

                                      If the ClusterQueue belongs to a cohort, the sum of the quotas for each
                                      (flavor, resource) combination defines the maximum quantity that can be
                                      allocated by a ClusterQueue in the cohort.
                                    pattern: ^(\+|-)?(([0-9]+(\.[0-9]*)?)|(\.[0-9]+))(([KMGTPE]i)|[numkMGTPE]|([eE](\+|-)?(([0-9]+(\.[0-9]*)?)|(\.[0-9]+))))?$
                                    x-kubernetes-int-or-string: true
                                required:
                                  - name
                                  - nominalQuota
                                type: object
                              maxItems: 16
                              minItems: 1
                              type: array
                              x-kubernetes-list-map-keys:
                                - name
                              x-kubernetes-list-type: map
                          required:
                            - name
                            - resources
                          type: object
                        maxItems: 16
                        minItems: 1
                        type: array
                        x-kubernetes-list-map-keys:
                          - name
                        x-kubernetes-list-type: map
                    required:
                      - coveredResources
                      - flavors
                    type: object
                    x-kubernetes-validations:
                      - message: flavors must have the same number of resources as the coveredResources
                        rule: self.flavors.all(x, size(x.resources) == size(self.coveredResources))
                  maxItems: 16
                  type: array
                  x-kubernetes-list-type: atomic
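                # Illustrative sketch of one resourceGroup on a child Cohort; every
                # name and quantity here is hypothetical. The single flavor lists one
                # quota entry per covered resource, in the same order, so the CEL rule
                # size(x.resources) == size(self.coveredResources) holds (2 == 2).
                #
                #   spec:
                #     parent: org                     # limits below require a parent
                #     resourceGroups:
                #       - coveredResources: ["cpu", "memory"]
                #         flavors:
                #           - name: on-demand         # should match a ResourceFlavor name
                #             resources:
                #               - name: cpu
                #                 nominalQuota: 10
                #                 borrowingLimit: 4   # up to 10 + 4 = 14 usable
                #                 lendingLimit: 6     # 10 - 6 = 4 kept for exclusive use
                #               - name: memory
                #                 nominalQuota: 64Gi  # limits left null: no limit
                #
                # Dropping the memory entry from resources would leave one quota entry
                # for two covered resources, and the validation rule would reject it.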
              type: object
            status:
              description: CohortStatus defines the observed state of Cohort.
              properties:
                fairSharing:
                  description: |-
                    fairSharing contains the current state for this Cohort
                    when participating in Fair Sharing.
                    This is recorded only when Fair Sharing is enabled in the Kueue configuration.
                  properties:
                    admissionFairSharingStatus:
                      description: admissionFairSharingStatus represents information relevant to the Admission Fair Sharing
                      properties:
                        consumedResources:
                          additionalProperties:
                            anyOf:
                              - type: integer
                              - type: string
                            pattern: ^(\+|-)?(([0-9]+(\.[0-9]*)?)|(\.[0-9]+))(([KMGTPE]i)|[numkMGTPE]|([eE](\+|-)?(([0-9]+(\.[0-9]*)?)|(\.[0-9]+))))?$
                            x-kubernetes-int-or-string: true
                          description: |-
                            ConsumedResources represents the aggregated usage of resources over time,
                            with decaying function applied.
                            The value is populated if usage consumption functionality is enabled in Kueue config.
                          type: object
                        lastUpdate:
                          description: LastUpdate is the time when share and consumed resources were updated.
                          format: date-time
                          type: string
                      required:
                        - consumedResources
                        - lastUpdate
                      type: object
                    weightedShare:
                      description: |-
                        WeightedShare represents the maximum of the ratios of usage
                        above nominal quota to the lendable resources in the
                        Cohort, among all the resources provided by the Node, and
                        divided by the weight. If zero, it means that the usage of
                        the Node is below the nominal quota. If the Node has a
                        weight of zero and is borrowing, this will return
                        9223372036854775807, the maximum possible share value.
                      format: int64
                      type: integer
                  required:
                    - weightedShare
                  type: object
              type: object
          type: object
      served: true
      storage: true
      subresources:
        status: {}