The Rise of Kubernetes
Kubernetes (K8s for short) is now the de facto backend infrastructure engine. The Cloud Native Computing Foundation (CNCF) Annual Survey for 2021 is titled "The year Kubernetes crossed the chasm". To quote a line from the survey:
"According to CNCF’s respondents, 96% of organizations are either using or evaluating Kubernetes – a record high since our surveys began in 2016."
Some key benefits of using K8s include:
- Operations automation
- Infrastructure and workload scaling
- Cost effectiveness
- Developer productivity
- Strong OSS community
Many Deep Learning operations, such as data engineering, data pipelines, training and inference, have also benefited from running on the K8s platform. There are many excellent, detailed tutorials on running AI/ML on K8s. Let's take AI inference as an example. TorchServe is a performant, flexible and easy-to-use tool for serving PyTorch models. The TorchServe GitHub README provides detailed instructions on how to deploy TorchServe on:
- Azure AKS Cluster
- AWS EKS Cluster
- Google GKE Cluster
The README goes into the details of the steps required to get TorchServe running in each of those managed clusters. For instance, the steps for AKS cover everything from creating the AKS cluster to deploying and testing TorchServe. Each of those tasks involves multiple steps (we won't repeat them here), all described in great detail. As another example, there is a blog post covering how to deploy a HuggingFace Vision Transformer model on K8s with TensorFlow Serving.
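To give a feel for the "testing" step at the end of such a tutorial, here is a minimal sketch of a client that sends one request to a deployed TorchServe instance. It assumes TorchServe's inference API is reachable on its default port 8080 and exposes the POST /predictions/{model_name} endpoint. The host name, model name and input file used below are hypothetical placeholders, not values taken from the README.

```python
import urllib.request


def prediction_url(host: str, model_name: str, port: int = 8080) -> str:
    """Build the inference URL, assuming TorchServe's default inference port."""
    return f"http://{host}:{port}/predictions/{model_name}"


def build_request(host: str, model_name: str, payload: bytes,
                  port: int = 8080) -> urllib.request.Request:
    """Prepare a POST request that carries the raw input bytes (e.g. an image)."""
    return urllib.request.Request(
        prediction_url(host, model_name, port),
        data=payload,
        method="POST",
    )


def predict(host: str, model_name: str, payload: bytes,
            port: int = 8080, timeout: float = 10.0) -> bytes:
    """Send the request and return the raw response body from the model server."""
    request = build_request(host, model_name, payload, port)
    with urllib.request.urlopen(request, timeout=timeout) as response:
        return response.read()


if __name__ == "__main__":
    # Hypothetical values: replace with your service address, model and input.
    with open("kitten.jpg", "rb") as image_file:
        print(predict("localhost", "squeezenet1_1", image_file.read()).decode())
```

In a managed cluster the host would typically be the external address of the Kubernetes Service in front of the TorchServe pods, which is one of the things the tutorials walk you through setting up.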
Running AI/ML in production
K8s tutorials and HowTos are a great starting point. Most developers and DevOps/MLOps engineers get their hands dirty by following such tutorials and learning the tricks of the trade. Experienced DevOps/MLOps folks, however, know that running and supporting AI/ML in production is a completely different ball game. Look at the laundry list of items that are typically required to run a production workload:
- Choice of Kubernetes cluster (managed, e.g. EKS, GKE, AKS, vs. self-managed)
- Worker nodes for the K8s cluster: instance types (CPU, GPU, sharing), resources, and node auto-scaling setup and configuration
- App packaging: Helm charts vs raw Kubernetes resources vs others
- Flexible configuration management, pod resource allocation, pod auto scaling
- Secrets management
- Access control: service accounts, roles and policies, granting minimal permissions
- Network policies, isolation, security
- DNS, routing and certificate management
- Web Application Firewalls (WAF), DDoS prevention, etc.
- In-cluster routing (e.g., Istio)
- Monitoring, metrics, usage, alerts
- Data isolation and security
- CI/CD setup, roll out, roll back, etc.
- MLOps framework over K8s or roll your own
- Model serving container scaling, model storage, loading, caching, unloading
- Efficient serving of multiple models from different frameworks
- Cost optimizations
- Usage reporting
As you can see, in order to run AI/ML in production you will need engineers with experience and expertise in technologies like K8s, GPUs, Helm, secrets management, CI/CD pipelines, logging, firewalls, monitoring, alerts, networking, etc. In addition, you need expertise in AI/ML frameworks, model serving and model management. It is not surprising, then, that organizations successfully reaping the benefits of AI/ML have entire teams dedicated to each of these functions.
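To make just two items from the list above concrete (pod resource allocation and pod auto scaling), here is a minimal sketch that generates the corresponding Kubernetes manifests as JSON, which kubectl accepts alongside YAML. It assumes the apps/v1 Deployment and autoscaling/v2 HorizontalPodAutoscaler APIs and the nvidia.com/gpu resource name exposed by the NVIDIA device plugin. The function names, the workload name, the image tag and all sizing numbers are illustrative assumptions, not recommendations.

```python
import json


def deployment_manifest(name, image, cpu="1", memory="2Gi", gpus=0, replicas=1):
    """Deployment with explicit CPU/memory (and optional GPU) allocation per pod."""
    requests = {"cpu": cpu, "memory": memory}
    limits = dict(requests)
    if gpus > 0:
        # Assumes the NVIDIA device plugin is installed on the GPU nodes.
        limits["nvidia.com/gpu"] = gpus
    labels = {"app": name}
    return {
        "apiVersion": "apps/v1",
        "kind": "Deployment",
        "metadata": {"name": name},
        "spec": {
            "replicas": replicas,
            "selector": {"matchLabels": labels},
            "template": {
                "metadata": {"labels": labels},
                "spec": {"containers": [{
                    "name": name,
                    "image": image,
                    "resources": {"requests": requests, "limits": limits},
                }]},
            },
        },
    }


def hpa_manifest(name, min_replicas=1, max_replicas=5, cpu_utilization=70):
    """HorizontalPodAutoscaler that scales the Deployment on average CPU usage."""
    return {
        "apiVersion": "autoscaling/v2",
        "kind": "HorizontalPodAutoscaler",
        "metadata": {"name": name},
        "spec": {
            "scaleTargetRef": {"apiVersion": "apps/v1", "kind": "Deployment", "name": name},
            "minReplicas": min_replicas,
            "maxReplicas": max_replicas,
            "metrics": [{
                "type": "Resource",
                "resource": {
                    "name": "cpu",
                    "target": {"type": "Utilization", "averageUtilization": cpu_utilization},
                },
            }],
        },
    }


if __name__ == "__main__":
    # Hypothetical workload name and image tag.
    for manifest in (deployment_manifest("model-server", "pytorch/torchserve:latest", gpus=1),
                     hpa_manifest("model-server")):
        print(json.dumps(manifest, indent=2))
```

Even this small sketch leaves most of the list untouched: secrets, network policies, certificates, monitoring and model storage each need their own configuration and their own expertise.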
Evolving from running K8s to consuming services hosted on K8s
The number one trend in Datadog’s 2021 container report is that "Nearly 90 percent of Kubernetes users leverage cloud-managed services". You read that right! Nearly 90 percent of users are choosing to offload the burden of running and managing the K8s cluster to a cloud service. As the CNCF survey notes, K8s is starting to go “under the hood” just like Linux. Linux runs in phones, TVs, automobiles and more, yet users simply reap the benefits of the technologies built on it. Similarly, a large number of K8s users report using technologies built on K8s rather than looking under the hood.
The AI/ML world is starting to see a similar evolution. A large number of users are evaluating AI/ML by following various tutorials. As businesses and users gain a better understanding of the problems that must be solved to serve AI/ML in production, they are starting to look for higher levels of abstraction that let them evaluate and use AI/ML without having to run it all themselves. The Tiyaro AI as a Service offering is an example of this evolution, where users can simply try AI as a service. K8s is not going away. It is still powering all your workloads. It is just going under the hood.