Secrets management with CCE and HashiCorp Vault

Most modern IT setups are composed of several subsystems like databases, object stores, master controllers, node access, and more. To access one component from another, some form of credentials is required. Configuring and storing these secrets directly in the components is considered an antipattern, since a vulnerability in one component may cascade and affect the security of the whole setup.

With centralized secret management it becomes unnecessary to keep the secrets used by various applications spread across DevOps environments. This closes some attack vectors (such as secret sprawl and security islands), but usually introduces the so-called Secret Zero problem: the key to the key storage.

Vault is open-source software, provided and maintained by HashiCorp, that addresses this very problem and is considered one of the reference solutions for it. This article demonstrates how to utilize infrastructure authorization with HashiCorp Vault in a CCE-powered setup. As an example workload, we deploy a Zookeeper cluster with TLS protection enabled. The certificates for Zookeeper are stored in Vault, where required practices like rotation and auditing can be enforced. Zookeeper can easily be replaced by any other component that requires access to internal credentials.

Overview

[Architecture diagram: HashiCorp Vault delivers TLS secrets to the three Zookeeper pods running in CCE; the pods are grouped behind a Zookeeper service that clients connect to.]

TLS secrets are kept in Vault. They are read by the Vault Agent component, which runs as a sidecar in the Zookeeper service pod and writes the certificates onto the file system. The Zookeeper service reads the certificates populated by the Agent. Vault Agent is configured to use password-less access to Vault; how exactly this is implemented is explained further in the document.

Establishing trust between CCE and Vault

Before any application managed by CCE can log in to Vault using infrastructure-based authentication, a few steps are required on the Vault side. The Kubernetes auth plugin is enabled and configured to only accept requests from a specific Kubernetes cluster by providing its Certificate Authority. To allow multiple different CCE clusters to use Vault, a dedicated auth path is used per cluster.

$ vault auth enable -path kubernetes_cce1 kubernetes
$ vault write auth/kubernetes_cce1/config \
    kubernetes_host="$K8S_HOST" \
    kubernetes_ca_cert="$SA_CA_CRT"
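
The variables used above are not defined in this article. One possible way to derive them from the local kubeconfig of the target CCE cluster is sketched below (illustrative only; adjust to your environment):

$ export K8S_HOST=$(kubectl config view --raw --minify \
    -o jsonpath='{.clusters[0].cluster.server}')
$ export SA_CA_CRT=$(kubectl config view --raw --minify \
    -o jsonpath='{.clusters[0].cluster.certificate-authority-data}' | base64 -d)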

Since in our example a dedicated service account token is periodically rotated, the client JWT itself can be used as the reviewer JWT.

Access rules for Vault

With the auth plugin enabled as described above, CCE workloads are able to authenticate to Vault, but they are not yet allowed to do anything. It is now necessary to establish the next level of authorization and let particular CCE service accounts access secrets in Vault.

For the scope of this use case, we grant the Zookeeper service account from its namespace access to the TLS secrets stored in Vault's key-value store. For that, a policy providing read-only access to the tls/zk_* and tls/ca paths is created.

$ vault policy write tls-zk-ro - <<EOF
path "secret/data/tls/zk_*" {capabilities = ["read"] }
path "secret/data/tls/ca" {capabilities = ["read"] }
path "secret/metadata/tls/zk_*" {capabilities = ["read"] }
path "secret/metadata/tls/ca" {capabilities = ["read"] }
EOF
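
The stored policy can be read back as a quick sanity check:

$ vault policy read tls-zk-ro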

Next, the policy must be granted to the particular requestor (the zookeeper service account in the zookeeper namespace).

$ vault write auth/kubernetes_cce1/role/zookeeper \
    bound_service_account_names=zookeeper \
    bound_service_account_namespaces=zookeeper \
    policies=tls-zk-ro \
    ttl=2h

With this done, the token of the service account zookeeper in the zookeeper namespace can access Vault and read secrets located under the secret/tls path. Since it is highly recommended to follow the principle of least privilege, only read-only access to the TLS data is granted. A time to live of two hours is used here, meaning that once the application authenticates to Vault, the token it receives can be used for the next two hours. After that, the Vault token becomes invalid and the Vault Agent obtains a new one valid for another two hours. This needs to be carefully aligned with the time to live of the service account token to minimize their overlap; it is advised to keep it relatively short.
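
As a sanity check, the role can be read back, and a login can be attempted manually with any valid service account token, for example from inside a pod that has the projected token mounted (illustrative only; the JWT path below is just an example):

$ vault read auth/kubernetes_cce1/role/zookeeper
$ vault write auth/kubernetes_cce1/login \
    role=zookeeper \
    jwt=@/run/secrets/tokens/vault-token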

This is one of the most sensitive steps in the whole configuration, since applications deployed in Kubernetes may escape their scope or get compromised by attackers. Reducing the number of secrets the accessor can read mitigates this risk.

Populating secrets in Vault

Within Vault there are two possibilities to handle TLS certificates: Vault enables users not only to store TLS certificate data in the key-value store, but also to create and revoke certificates itself. To keep this tutorial simple, we are not going to do the latter and just upload pre-generated certificates into the KV store. For production setups this example can easily be extended with extra actions.
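
The certificate files referenced below are assumed to already exist. For a quick test environment they could be generated roughly as follows with openssl (illustrative only; real deployments need certificates with proper subject alternative names, issued by your organization's PKI):

$ openssl req -x509 -newkey rsa:4096 -nodes -days 365 \
    -subj "/CN=zk-ca" -keyout ca.key -out ca.crt
$ openssl req -newkey rsa:4096 -nodes \
    -subj "/CN=zk-server" -keyout zk_server.key -out zk_server.csr
$ openssl x509 -req -in zk_server.csr -CA ca.crt -CAkey ca.key \
    -CAcreateserial -days 365 -out zk_server.crt
$ openssl req -newkey rsa:4096 -nodes \
    -subj "/CN=zk-client" -keyout zk_client.key -out zk_client.csr
$ openssl x509 -req -in zk_client.csr -CA ca.crt -CAkey ca.key \
    -CAcreateserial -days 365 -out zk_client.crt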

$ vault kv put secret/tls/ca certificate=@ca.crt
$ vault kv put secret/tls/zk_server certificate=@zk_server.crt private_key=@zk_server.key
$ vault kv put secret/tls/zk_client certificate=@zk_client.crt private_key=@zk_client.key

Certificate paths and property names used here are referenced by the Zookeeper installation.
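
To confirm the data landed where the policy and the Vault Agent templates expect it, the entries can be read back (this assumes the KV secrets engine is mounted at secret/ in version 2 mode, which matches the secret/data/... paths used in the policy):

$ vault kv get -field=certificate secret/tls/ca
$ vault kv get secret/tls/zk_server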

Deploying Zookeeper

Now that the secrets are stored safely in Vault and only allowed applications can fetch them, it is time to look at how exactly the application accesses the secrets. Generally, utilizing Vault requires modification of the application. Vault Agent is a tool that was created to simplify secrets delivery for applications when it is hard or undesirable to change the application itself. The Agent takes care of reading secrets from Vault and can deliver them to the file system.

There are many ways to properly implement a Zookeeper service on Kubernetes. The scope of this blueprint is not Zookeeper itself, but demonstrating how an application can be supplied with the required certificates. The reference architecture described here is based on best practices gathered from various sources and extended with HashiCorp Vault. It overrides the default Zookeeper start scripts in order to allow better control of the runtime settings and to properly fill all configuration options required for TLS to work. Other methods of deploying Zookeeper can easily be used here instead.

  1. Create a Kubernetes namespace named zookeeper.

$ kubectl create namespace zookeeper
  2. Create a Kubernetes service account named zookeeper in that namespace.

$ kubectl create serviceaccount zookeeper --namespace zookeeper
  3. In Kubernetes, a service account provides an identity for the processes running in a pod so that they can access the Kubernetes API. The same identity can be used to access Vault, but this requires one special permission: access to the tokenreview API of Kubernetes. When a dedicated reviewer JWT is used instead, this step is not necessary, but that also means long-lived sensitive data is used and frequently transferred over the network. More details on the various ways to use Kubernetes tokens to authorize to Vault can be found in the HashiCorp Vault documentation.

$ kubectl create clusterrolebinding vault-client-auth-delegator \
    --clusterrole=system:auth-delegator \
    --serviceaccount=zookeeper:zookeeper
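
One way to confirm that the binding took effect is to check whether the service account may now create token reviews (this uses kubectl's built-in authorization check; the exact output depends on the cluster):

$ kubectl auth can-i create tokenreviews.authentication.k8s.io \
    --as=system:serviceaccount:zookeeper:zookeeper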
  4. Create a Kubernetes ConfigMap with all required configuration. One possible approach is to define dedicated health and readiness check scripts and to override the automatically created Zookeeper start script. This is especially useful when TLS protection is enabled but the default container scripts do not support it.

zookeeper-cm.yaml
---
apiVersion: v1
kind: ConfigMap
metadata:
  name: zookeeper-config
  namespace: "zookeeper"
data:
  ok: |
    #!/bin/sh
    # This script is used by the liveness check of the Kubernetes pod
    if [ -f /tls/ca.pem ]; then
      echo "srvr" | openssl s_client -CAfile /tls/ca.pem -cert /tls/client/tls.crt \
        -key /tls/client/tls.key -connect 127.0.0.1:${1:-2281} -quiet -ign_eof 2>/dev/null | grep Mode

    else
      zkServer.sh status
    fi

  ready: |
    #!/bin/sh
    # This script is used by the readiness check of the Kubernetes pod
    if [ -f /tls/ca.pem ]; then
      echo "ruok" | openssl s_client -CAfile /tls/ca.pem -cert /tls/client/tls.crt \
        -key /tls/client/tls.key -connect 127.0.0.1:${1:-2281} -quiet -ign_eof 2>/dev/null
    else
      echo ruok | nc 127.0.0.1 ${1:-2181}
    fi

  run: |
    #!/bin/bash
    # This is the main starting script
    set -a
    ROOT=$(echo /apache-zookeeper-*)
    ZK_USER=${ZK_USER:-"zookeeper"}
    ZK_LOG_LEVEL=${ZK_LOG_LEVEL:-"INFO"}
    ZK_DATA_DIR=${ZK_DATA_DIR:-"/data"}
    ZK_DATA_LOG_DIR=${ZK_DATA_LOG_DIR:-"/data/log"}
    ZK_CONF_DIR=${ZK_CONF_DIR:-"/conf"}
    ZK_CLIENT_PORT=${ZK_CLIENT_PORT:-2181}
    ZK_SSL_CLIENT_PORT=${ZK_SSL_CLIENT_PORT:-2281}
    ZK_SERVER_PORT=${ZK_SERVER_PORT:-2888}
    ZK_ELECTION_PORT=${ZK_ELECTION_PORT:-3888}
    ID_FILE="$ZK_DATA_DIR/myid"
    ZK_CONFIG_FILE="$ZK_CONF_DIR/zoo.cfg"
    LOG4J_PROPERTIES="$ZK_CONF_DIR/log4j.properties"
    HOST=$(hostname)
    DOMAIN=`hostname -d`
    APPJAR=$(echo $ROOT/*jar)
    CLASSPATH="${ROOT}/lib/*:${APPJAR}:${ZK_CONF_DIR}:"
    if [[ $HOST =~ (.*)-([0-9]+)$ ]]; then
        NAME=${BASH_REMATCH[1]}
        ORD=${BASH_REMATCH[2]}
        MY_ID=$((ORD+1))
    else
        echo "Failed to extract ordinal from hostname $HOST"
        exit 1
    fi
    mkdir -p $ZK_DATA_DIR
    mkdir -p $ZK_DATA_LOG_DIR
    echo $MY_ID >> $ID_FILE

    echo "dataDir=$ZK_DATA_DIR" >> $ZK_CONFIG_FILE
    echo "dataLogDir=$ZK_DATA_LOG_DIR" >> $ZK_CONFIG_FILE
    echo "4lw.commands.whitelist=*" >> $ZK_CONFIG_FILE
    # Client TLS configuration
    if [[ -f /tls/ca.pem ]]; then
      echo "secureClientPort=$ZK_SSL_CLIENT_PORT" >> $ZK_CONFIG_FILE
      echo "ssl.keyStore.location=/tls/client/client.pem" >> $ZK_CONFIG_FILE
      echo "ssl.trustStore.location=/tls/ca.pem" >> $ZK_CONFIG_FILE
    else
      echo "clientPort=$ZK_CLIENT_PORT" >> $ZK_CONFIG_FILE
    fi
    # Server TLS configuration
    if [[ -f /tls/ca.pem ]]; then
      echo "serverCnxnFactory=org.apache.zookeeper.server.NettyServerCnxnFactory" >> $ZK_CONFIG_FILE
      echo "sslQuorum=true" >> $ZK_CONFIG_FILE
      echo "ssl.quorum.keyStore.location=/tls/server/server.pem" >> $ZK_CONFIG_FILE
      echo "ssl.quorum.trustStore.location=/tls/ca.pem" >> $ZK_CONFIG_FILE
    fi
    for (( i=1; i<=$ZK_REPLICAS; i++ ))
    do
        echo "server.$i=$NAME-$((i-1)).$DOMAIN:$ZK_SERVER_PORT:$ZK_ELECTION_PORT" >> $ZK_CONFIG_FILE
    done
    rm -f $LOG4J_PROPERTIES
    echo "zookeeper.root.logger=$ZK_LOG_LEVEL, CONSOLE" >> $LOG4J_PROPERTIES
    echo "zookeeper.console.threshold=$ZK_LOG_LEVEL" >> $LOG4J_PROPERTIES
    echo "zookeeper.log.threshold=$ZK_LOG_LEVEL" >> $LOG4J_PROPERTIES
    echo "zookeeper.log.dir=$ZK_DATA_LOG_DIR" >> $LOG4J_PROPERTIES
    echo "zookeeper.log.file=zookeeper.log" >> $LOG4J_PROPERTIES
    echo "zookeeper.log.maxfilesize=256MB" >> $LOG4J_PROPERTIES
    echo "zookeeper.log.maxbackupindex=10" >> $LOG4J_PROPERTIES
    echo "zookeeper.tracelog.dir=$ZK_DATA_LOG_DIR" >> $LOG4J_PROPERTIES
    echo "zookeeper.tracelog.file=zookeeper_trace.log" >> $LOG4J_PROPERTIES
    echo "log4j.rootLogger=\${zookeeper.root.logger}" >> $LOG4J_PROPERTIES
    echo "log4j.appender.CONSOLE=org.apache.log4j.ConsoleAppender" >> $LOG4J_PROPERTIES
    echo "log4j.appender.CONSOLE.Threshold=\${zookeeper.console.threshold}" >> $LOG4J_PROPERTIES
    echo "log4j.appender.CONSOLE.layout=org.apache.log4j.PatternLayout" >> $LOG4J_PROPERTIES
    echo "log4j.appender.CONSOLE.layout.ConversionPattern=\
      %d{ISO8601} [myid:%X{myid}] - %-5p [%t:%C{1}@%L] - %m%n" >> $LOG4J_PROPERTIES
    if [ -n "$JMXDISABLE" ]
    then
        MAIN=org.apache.zookeeper.server.quorum.QuorumPeerMain
    else
        MAIN="-Dcom.sun.management.jmxremote -Dcom.sun.management.jmxremote.port=$JMXPORT \
          -Dcom.sun.management.jmxremote.authenticate=$JMXAUTH \
          -Dcom.sun.management.jmxremote.ssl=$JMXSSL \
          -Dzookeeper.jmx.log4j.disable=$JMXLOG4J \
          org.apache.zookeeper.server.quorum.QuorumPeerMain"
    fi
    set -x
    exec java -cp "$CLASSPATH" $JVMFLAGS $MAIN $ZK_CONFIG_FILE

  vault-agent-config.hcl: |
    exit_after_auth = true
    pid_file = "/home/vault/pidfile"
    auto_auth {
        method "kubernetes" {
            mount_path = "auth/kubernetes_cce1"
            config = {
                role = "zookeeper"
                token_path = "/run/secrets/tokens/vault-token"
            }
        }
        sink "file" {
            config = {
                path = "/home/vault/.vault-token"
            }
        }
    }

    cache {
        use_auto_auth_token = true
    }

    # ZK is picky about certificate file extensions
    template {
      destination = "/tls/ca.pem"
      contents = <<EOT
    {{- with secret "secret/data/tls/ca" }}{{ .Data.data.certificate }}{{ end }}
    EOT
    }

    template {
      destination = "/tls/server/server.pem"
      contents = <<EOT
    {{- with secret "secret/data/tls/zk_server" }}{{ .Data.data.certificate }}
    {{ .Data.data.private_key }}{{ end }}
    EOT
    }
    template {
      destination = "/tls/server/tls.crt"
      contents = <<EOT
    {{- with secret "secret/data/tls/zk_server" }}{{ .Data.data.certificate }}{{ end }}
    EOT
    }
    template {
      destination = "/tls/server/tls.key"
      contents = <<EOT
    {{- with secret "secret/data/tls/zk_server" }}{{ .Data.data.private_key }}{{ end }}
    EOT
    }

    template {
      destination = "/tls/client/client.pem"
      contents = <<EOT
    {{- with secret "secret/data/tls/zk_client" }}{{ .Data.data.certificate }}
    {{ .Data.data.private_key }}{{ end }}
    EOT
    }
    template {
      destination = "/tls/client/tls.crt"
      contents = <<EOT
    {{- with secret "secret/data/tls/zk_client" }}{{ .Data.data.certificate }}{{ end }}
    EOT
    }
    template {
      destination = "/tls/client/tls.key"
      contents = <<EOT
    {{- with secret "secret/data/tls/zk_client" }}{{ .Data.data.private_key }}{{ end }}
    EOT
    }
$ kubectl apply -f zookeeper-cm.yaml
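
If desired, the rendered ConfigMap can be inspected before it is wired into the StatefulSet, for example by printing the beginning of the Vault Agent configuration it carries:

$ kubectl -n zookeeper get configmap zookeeper-config \
    -o jsonpath='{.data.vault-agent-config\.hcl}' | head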
  5. Create the Zookeeper headless service. It is used by the pods to build the quorum and implement cluster-internal communication.

zookeeper-svc.yaml
---
apiVersion: v1
kind: Service
metadata:
  # Must match the serviceName referenced by the StatefulSet below
  name: zookeeper-headless
  namespace: zookeeper
spec:
  # Headless service: no cluster IP is allocated
  clusterIP: None
  # Important so that peers can be resolved during start up
  publishNotReadyAddresses: true
  selector:
    app: zookeeper
  ports:
  - port: 2281
    name: client
    targetPort: client
    protocol: TCP
  - port: 2888
    name: server
    targetPort: server
    protocol: TCP
  - port: 3888
    name: election
    targetPort: election
    protocol: TCP
$ kubectl apply -f zookeeper-svc.yaml
  6. Create the frontend service. It is used by the clients and therefore only exposes the Zookeeper client port.

zookeeper-svc-public.yaml
apiVersion: v1
kind: Service
metadata:
  # The service name is a free choice; clients use it as the DNS name
  name: zookeeper
  namespace: zookeeper
spec:
  clusterIP: None
  ports:
  - name: client
    port: 2281
    protocol: TCP
    targetPort: client
  selector:
    app: zookeeper
  sessionAffinity: None
  type: ClusterIP
$ kubectl apply -f zookeeper-svc-public.yaml
  7. Create the StatefulSet, replacing <VAULT_PUBLIC_ADDR> with the address of the Vault server. The pod template includes a Vault Agent init container that fetches the initial secrets, a Vault Agent sidecar container that keeps running for the lifetime of the pod, and the main Zookeeper container.

zookeeper-ss.yaml
apiVersion: apps/v1
kind: StatefulSet
metadata:
  # Pod names are derived from this name (zookeeper-0, zookeeper-1, ...)
  name: zookeeper
  namespace: zookeeper
spec:
  podManagementPolicy: Parallel
  replicas: 3
  selector:
    matchLabels:
      app: zookeeper
      component: server
  serviceName: zookeeper-headless
  template:
    metadata:
      labels:
        app: zookeeper
        component: server
    spec:
      containers:

      - args:
        - agent
        - -config=/etc/vault/vault-agent-config.hcl
        - -log-level=debug
        - -exit-after-auth=false
        env:
        - name: VAULT_ADDR
          value: <VAULT_PUBLIC_ADDR>
        image: vault:1.9.0
        name: vault-agent-sidecar
        volumeMounts:
        - mountPath: /etc/vault
          name: vault-agent-config
        - mountPath: /tls
          name: cert-data
        - mountPath: /var/run/secrets/tokens
          name: k8-tokens

      - command:
        - /bin/bash
        - -xec
        - /config-scripts/run
        env:
        - name: ZK_REPLICAS
          value: "3"
        - name: ZOO_PORT
          value: "2181"
        - name: ZOO_STANDALONE_ENABLED
          value: "false"
        - name: ZOO_TICK_TIME
          value: "2000"
        image: zookeeper:3.7.0
        livenessProbe:
          exec:
            command:
            - sh
            - /config-scripts/ok
          failureThreshold: 2
          initialDelaySeconds: 20
          periodSeconds: 30
          successThreshold: 1
          timeoutSeconds: 5
        name: zookeeper
        ports:
        - containerPort: 2281
          name: client
          protocol: TCP
        - containerPort: 2888
          name: server
          protocol: TCP
        - containerPort: 3888
          name: election
          protocol: TCP
        readinessProbe:
          exec:
            command:
            - sh
            - /config-scripts/ready
          failureThreshold: 2
          initialDelaySeconds: 20
          periodSeconds: 30
          successThreshold: 1
          timeoutSeconds: 5
        securityContext:
          runAsUser: 1000
        volumeMounts:
        - mountPath: /data
          name: datadir
        - mountPath: /tls
          name: cert-data
        - mountPath: /config-scripts
          name: zookeeper-config
      dnsPolicy: ClusterFirst

      initContainers:
      - args:
        - agent
        - -config=/etc/vault/vault-agent-config.hcl
        - -log-level=debug
        - -exit-after-auth=true
        env:
        - name: VAULT_ADDR
          value: <VAULT_PUBLIC_ADDR>
        image: vault:1.9.0
        name: vault-agent
        volumeMounts:
        - mountPath: /etc/vault
          name: vault-agent-config
        - mountPath: /tls
          name: cert-data
        - mountPath: /var/run/secrets/tokens
          name: k8-tokens
      restartPolicy: Always
      serviceAccount: zookeeper
      serviceAccountName: zookeeper
      terminationGracePeriodSeconds: 1800
      volumes:
      - configMap:
          defaultMode: 420
          items:
          - key: vault-agent-config.hcl
            path: vault-agent-config.hcl
          name: zookeeper-config
        name: vault-agent-config
      - configMap:
          defaultMode: 365
          name: zookeeper-config
        name: zookeeper-config
      - emptyDir: {}
        name: cert-data
      - name: k8-tokens
        projected:
          defaultMode: 420
          sources:
          - serviceAccountToken:
              expirationSeconds: 7200
              path: vault-token

  updateStrategy:
    type: RollingUpdate
  volumeClaimTemplates:
  - apiVersion: v1
    kind: PersistentVolumeClaim
    metadata:
      name: datadir
    spec:
      accessModes:
      - ReadWriteOnce
      resources:
        requests:
          storage: 5Gi
      storageClassName: csi-disk
      volumeMode: Filesystem
$ kubectl apply -f zookeeper-ss.yaml
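
Once the pods are up, the result can be verified from the command line; assuming the StatefulSet is named zookeeper as above, the first pod is zookeeper-0. The readiness script should print imok when the ensemble is healthy:

$ kubectl -n zookeeper get pods -l app=zookeeper
$ kubectl -n zookeeper exec zookeeper-0 -c zookeeper -- ls /tls /tls/server /tls/client
$ kubectl -n zookeeper exec zookeeper-0 -c zookeeper -- sh /config-scripts/ready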

With this, a production-ready Zookeeper service with TLS enabled has been successfully deployed to CCE. The Vault Agent takes care of authenticating to HashiCorp Vault using a Kubernetes service account token with a short time to live and fetches the required secrets to the file system. In the entire Kubernetes deployment there are no secrets stored for the application, neither the key to Vault nor the TLS certificates themselves; not even Kubernetes secrets are necessary.

References