How do you handle the fact that Kubernetes requires the eth0 IP in no_proxy? Do you set that automatically?
How do you handle DNS in a corporate network getting weird, for instance the fact that on Ubuntu 16.04 the NetworkManager dnsmasq setting needs to be deactivated?
How do you report nodes dying because the kernel version and the docker version don't match?
Do you report why pods are pending?
Does kops wait for a successful health check before it reports a successful deployment (unlike helm, which reports success before the docker image has even finished pulling)?
Do you run any metrics on the cluster to see if everything is working fine?
Edit: Sorry to disturb the kops marketing effort, but some people still hope for a real, enterprise-ready solution for k8s instead of just more fluff added on a shaky foundation.
kops is an open source project that is part of the Kubernetes project; we're all working to solve these things as best we can. Some of these issues are not best solved in kops; for example, we don't try to force a particular monitoring system on you. That said, I'm also a Kubernetes contributor, so I'll try to answer quickly:
* no_proxy - kops is getting support for servers that use http_proxy, but I think your issue is a client-side issue with kubectl proxy, and it looks like it is being investigated in #45956. I've retagged (what I think are) the right folks.
* DNS, docker version/kernel version: if you let it, kops will configure the AMI/kernel, docker, DNS, sysctls, everything. In that scenario everything should just work, because kops controls everything. Obviously things can still go wrong, but I'm much better able to support or diagnose problems in a kops configuration where most things are set correctly than in the general case.
* why pods are pending: `kubectl describe pod` shows you why. Your "preferred alerting system" could be more proactive though.
* metrics are probably best handled by a monitoring system, and you should install your preferred system after kops installs the cluster. We try to only install things in kops that are required to get to the kubectl "boot prompt". Lots of options here: prometheus, sysdig, datadog, weave scope, newrelic etc.
* does kops wait for readiness: actually not by default - and this does cause problems. For example, if you hit your AWS instance quota, your kops cluster will silently never come up. The same happens if your chosen instance type isn't available in your AZ. We have a fix for the latter and are working on the former. We have `kops validate`, which will wait, but it's still too hard to tell what happened when something goes wrong - definitely room for improvement here.
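On the no_proxy question, a minimal sketch of merging a node's IP into an existing NO_PROXY value without duplicating entries. The helper `merge_no_proxy` and the IP are hypothetical; in practice the node IP would come from something like `ip -4 addr show eth0`, and clients differ in how they interpret no_proxy entries (domain suffixes, CIDR ranges), so check the client you actually use.

```python
def merge_no_proxy(existing: str, extra_hosts: list[str]) -> str:
    """Hypothetical helper: append hosts/IPs to a comma-separated NO_PROXY value, skipping duplicates."""
    # Normalise the existing value: split on commas, drop blanks and stray whitespace.
    entries = [e.strip() for e in existing.split(",") if e.strip()]
    for host in extra_hosts:
        host = host.strip()
        if host and host not in entries:
            entries.append(host)
    return ",".join(entries)


if __name__ == "__main__":
    import os

    node_ip = "10.0.0.12"  # hypothetical eth0 address of this node
    current = os.environ.get("NO_PROXY", "localhost,127.0.0.1")
    print(merge_no_proxy(current, [node_ip]))
```

Keeping the existing order and only appending makes the result easy to diff against what the corporate proxy setup already provides.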
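For the pending-pods point, a minimal sketch of pulling the same reasons `kubectl describe pod` shows out of `kubectl get pod NAME -o json`, so a script or alerting hook can act on them. The function name `pending_reasons` and the sample message are made up for illustration; only the field names (`status.phase`, the `PodScheduled` condition, `containerStatuses[].state.waiting`) come from the Kubernetes Pod API.

```python
import json


def pending_reasons(pod_json: str) -> list[str]:
    """Summarise why a pod is Pending, given `kubectl get pod NAME -o json` output."""
    pod = json.loads(pod_json)
    status = pod.get("status", {})
    if status.get("phase") != "Pending":
        return []  # only Pending pods are of interest here
    reasons = []
    # Not scheduled yet: the scheduler sets PodScheduled=False with a reason and message.
    for cond in status.get("conditions", []):
        if cond.get("type") == "PodScheduled" and cond.get("status") == "False":
            reasons.append(f"{cond.get('reason', 'Unknown')}: {cond.get('message', '')}")
    # Scheduled but not running yet: containers report a "waiting" state
    # (e.g. ContainerCreating while the image is still being pulled, or ErrImagePull).
    for cs in status.get("containerStatuses", []):
        waiting = cs.get("state", {}).get("waiting")
        if waiting:
            reasons.append(f"{cs.get('name', '?')}: {waiting.get('reason', 'Waiting')}")
    return reasons


if __name__ == "__main__":
    # Made-up sample shaped like real Pod JSON.
    sample = json.dumps({
        "status": {
            "phase": "Pending",
            "conditions": [{
                "type": "PodScheduled",
                "status": "False",
                "reason": "Unschedulable",
                "message": "0/3 nodes are available: 3 Insufficient cpu.",
            }],
        }
    })
    print(pending_reasons(sample))
    # → ['Unschedulable: 0/3 nodes are available: 3 Insufficient cpu.']
```

Run periodically against pending pods, this is one way to get the more proactive alerting; `kubectl describe pod` remains the quickest manual check.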
In general though - where there are things you think we could do better, do open an issue on kops (or kubernetes if it's more of a kubernetes issue)!
Nice, thanks. My feeling is that this is about 75% of what we want and may therefore really be the best solution there is right now. I'll bring your responses into my next team meeting.
Thanks for the feedback. I agree that a huge wall of text is not desired. I think a single-sentence answer is fine.
For instance: "Yes, we can. We considered most of that and also have some enterprise customers with similar setups. Check out "googleterm A", "googleterm B", "googleterm C". If you don't find all of that join our slack chat to get more details."
And a more likely answer, also a single line: "WTF are these questions? We thought docker+k8s already solves that." (I would have expected solutions from there too, but I don't hope for that anymore.)
PS (actually an edit to the previous post, but it's already too old to edit): For instance, OpenShift, as I just found, handles the docker-version/kernel-version problem via "xxx-excluder" meta packages: https://docs.openshift.com/container-platform/3.4/install_co...