摘要 |
Embodiments perform automated provisioning of a cluster for a distributed computing platform. Target host computing devices are selected from a plurality of host computing devices based on configuration information, such as a desired cluster size, a data set, code for processing the data set and, optionally, a placement strategy. One or more virtual machines (VMs) are instantiated on each target host computing device. Each VM is configured to access a virtual disk that is preconfigured with code for executing functionality of the distributed computing platform and serves as a node of the cluster. The data set is stored in a distributed file system accessible by at least a subset of the VMs. The code for processing the data set is provided to at least a subset of the VMs, and execution of the code is initiated to obtain processing results. |