EKS security patches cause Apache Spark jobs to fail with permissions error

Over the past couple of days at work, we started noticing Spark 2.3 and 2.4 jobs failing with a permissions error across multiple EKS clusters. Here is an example stack trace:

java.net.ProtocolException: Expected HTTP 101 response but was '403 Forbidden'
    at okhttp3.internal.ws.RealWebSocket.checkResponse(RealWebSocket.java:216)
    at okhttp3.internal.ws.RealWebSocket$2.onResponse(RealWebSocket.java:183)
    at okhttp3.RealCall$AsyncCall.execute(RealCall.java:141)
    at okhttp3.internal.NamedRunnable.run(NamedRunnable.java:32)
    at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
    at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
    at java.lang.Thread.run(Thread.java:748)

We eventually realized that this was due to Amazon rolling out security patches to their EKS clusters to address CVE-2019-9512 and CVE-2019-9514 (HTTP/2 denial-of-service vulnerabilities), which caused a regression in the Kubernetes client library that Spark uses. Two issues have been filed against Spark for this: SPARK-28921 and SPARK-28925.
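
If you want to confirm that a cluster is affected before patching anything, a pod watch exercises the same websocket upgrade that fails in the stack trace above. Here is a minimal standalone probe, a sketch assuming the fabric8 kubernetes-client 4.x API and a kubeconfig (or in-cluster service account) that the client can pick up; the class name is hypothetical:

    import io.fabric8.kubernetes.api.model.Pod;
    import io.fabric8.kubernetes.client.DefaultKubernetesClient;
    import io.fabric8.kubernetes.client.KubernetesClient;
    import io.fabric8.kubernetes.client.KubernetesClientException;
    import io.fabric8.kubernetes.client.Watch;
    import io.fabric8.kubernetes.client.Watcher;

    public class WatchProbe {
        public static void main(String[] args) throws Exception {
            // DefaultKubernetesClient reads the usual kubeconfig / in-cluster configuration.
            try (KubernetesClient client = new DefaultKubernetesClient()) {
                // Watches are served over a websocket upgrade; with a pre-4.4.0 client
                // against a patched EKS cluster, this is the call that fails with
                // "Expected HTTP 101 response but was '403 Forbidden'".
                Watch watch = client.pods().inNamespace("default").watch(new Watcher<Pod>() {
                    @Override
                    public void eventReceived(Action action, Pod pod) {
                        System.out.println(action + " " + pod.getMetadata().getName());
                    }

                    @Override
                    public void onClose(KubernetesClientException cause) {
                        if (cause != null) {
                            cause.printStackTrace();
                        }
                    }
                });
                Thread.sleep(10_000); // leave the watch open briefly
                watch.close();
            }
        }
    }

Running this once with the client version your Spark distribution ships and once with 4.4.2 should show the failure on a patched cluster only in the former case.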

The fix is to upgrade Spark to kubernetes-client 4.4.0 or later, which contains the patch for this issue.

I have created trivial pull requests against Spark to upgrade to the 4.4.2 release (the latest version at the time of writing).

If you need to fix this urgently, there are a few options:

Option 1: Replace the kubernetes client jars in your Spark distributions with the three 4.4.2 jars available from Maven Central and hope for the best (a quick way to verify the swap is sketched after the list below):

kubernetes-client-4.4.2.jar
kubernetes-model-4.4.2.jar
kubernetes-model-common-4.4.2.jar
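
After swapping the jars, it is worth double-checking which client actually ends up on the driver's classpath. The following sketch uses only plain JDK reflection on the loaded class; the class name is hypothetical:

    import io.fabric8.kubernetes.client.DefaultKubernetesClient;

    public class ClientVersionCheck {
        public static void main(String[] args) {
            Class<?> clazz = DefaultKubernetesClient.class;
            // The jar the class was loaded from -- this should point at
            // kubernetes-client-4.4.2.jar (assumes the class came from a jar).
            System.out.println(clazz.getProtectionDomain().getCodeSource().getLocation());
            // Implementation-Version from the jar manifest (may be null if the manifest omits it).
            System.out.println(clazz.getPackage().getImplementationVersion());
        }
    }

If an old jar is still lurking on the classpath (for example, left behind in the jars/ directory), the location printed here will give it away.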

Option 2: Apply one of the pull requests mentioned above and build your own Spark distribution with the upgraded client.

Option 3: Ask your Amazon Technical Account Manager to have the patches rolled back, assuming you are comfortable accepting the risk of a denial-of-service attack against your EKS API server if it is publicly exposed.