Getting started

Invite bot to broadcast channel

For security reasons, the Slack API prohibits bots from inviting themselves to channels. The bot therefore needs to be invited manually to the chosen broadcast channel before it can communicate there. Invite the bot by running this command in the broadcast channel:

/invite devopsbot

Build

These instructions assume you are on a Mac.

Make sure you have the latest version of Go installed: brew update && brew upgrade

Install golangci-lint to be able to run make lint: brew install golangci/tap/golangci-lint

To build the binary locally:

$ make build

To build the Docker image locally:

$ make image.iid

Deployment

The bot can be deployed in any preferred way.

Kubernetes

A certificate needs to be issued to expose the application over HTTPS, for example via ZeroSSL or Let's Encrypt.

The application needs to be made publicly available.

The Kubernetes resources could, for example, look like this:

---
# Source: helmchart/templates/configmap.yaml
apiVersion: v1
kind: ConfigMap
metadata:
  name: devopsbot-settings
data:
  addr: :3333
  incident.environments: |-
    [
      "Staging",
      "Production"
    ]
  incident.regions: |-
    [
      "eu-west-1",
      "us-east-1"
    ]
  incident.severityLevels: |-
    [
      "high",
      "medium",
      "low"
    ]
  incident.impactLevels: |-
    [
      "high",
      "medium",
      "low"
    ]
  server.prometheusNamespace: devopsbot
  tls.addr: :3443
  tls.cert: /var/devopsbot/tls.crt
  tls.key: /var/devopsbot/tls.key
  trace: "false"
  verbose: "false"
---
# Source: helmchart/templates/service.yaml
apiVersion: v1
kind: Service
metadata:
  name: devopsbot
  labels:
    app: devopsbot
spec:
  ports:
      - port: 3333
        targetPort: 3333
        protocol: TCP
        name: "http"
  selector:
    app: devopsbot
---
# Source: helmchart/templates/deployment.yaml
apiVersion: apps/v1
kind: Deployment
metadata:
  name: devopsbot
  labels:
    app: devopsbot

spec:
  strategy:
    type: RollingUpdate
    rollingUpdate:
      maxSurge: 1
      maxUnavailable: 0
  replicas: 1
  selector:
    matchLabels:
      app: devopsbot
  template:
    metadata:
      labels:
        app: devopsbot
      annotations:
        prometheus.io/scrape: "true"
        prometheus.io/path: "/metrics"
        prometheus.io/port: "3333"
    spec:
      affinity:
        podAntiAffinity:
          preferredDuringSchedulingIgnoredDuringExecution:
          - weight: 25
            podAffinityTerm:
              labelSelector:
                matchExpressions:
                - key: app
                  operator: In
                  values:
                  - devopsbot
              topologyKey: "kubernetes.io/hostname"
      containers:
        - name: devopsbot
          image: <path_to_image>
          imagePullPolicy: Always
          ports:
            - name: http
              containerPort: 3333
          resources:
            limits:
              cpu: 900m
              memory: 256Mi
            requests:
              cpu: 600m
              memory: 20Mi
          readinessProbe:
            httpGet:
              path: /ready
              port: 3333
            initialDelaySeconds: 1
            timeoutSeconds: 1
            periodSeconds: 2
            failureThreshold: 300
          livenessProbe:
            httpGet:
              path: /live
              port: 3333
            initialDelaySeconds: 300
            timeoutSeconds: 2
            periodSeconds: 3
            failureThreshold: 2
          env:
            - name: slack.botAccessToken
              valueFrom:
                secretKeyRef:
                  key: slack.botAccessToken
                  name: app-secrets
            - name: slack.userAccessToken
              valueFrom:
                secretKeyRef:
                  key: slack.userAccessToken
                  name: app-secrets
            - name: slack.adminGroupID
              valueFrom:
                secretKeyRef:
                  key: slack.adminGroupID
                  name: app-secrets
            - name: slack.broadcastChannelID
              valueFrom:
                secretKeyRef:
                  key: slack.broadcastChannelID
                  name: app-secrets
            - name: slack.signingSecret
              valueFrom:
                secretKeyRef:
                  key: slack.signingSecret
                  name: app-secrets
            - name: incidentDocTemplateURL
              valueFrom:
                secretKeyRef:
                  key: incidentDocTemplateURL
                  name: app-secrets
            - name: server.prometheusNamespace
              valueFrom:
                configMapKeyRef:
                  key: server.prometheusNamespace
                  name: devopsbot-settings
            - name: incident.environments
              valueFrom:
                configMapKeyRef:
                  key: incident.environments
                  name: devopsbot-settings
            - name: incident.regions
              valueFrom:
                configMapKeyRef:
                  key: incident.regions
                  name: devopsbot-settings
            - name: incident.severityLevels
              valueFrom:
                configMapKeyRef:
                  key: incident.severityLevels
                  name: devopsbot-settings
            - name: incident.impactLevels
              valueFrom:
                configMapKeyRef:
                  key: incident.impactLevels
                  name: devopsbot-settings
            - name: addr
              valueFrom:
                configMapKeyRef:
                  key: addr
                  name: devopsbot-settings
            - name: tls.addr
              valueFrom:
                configMapKeyRef:
                  key: tls.addr
                  name: devopsbot-settings
            - name: tls.cert
              valueFrom:
                configMapKeyRef:
                  key: tls.cert
                  name: devopsbot-settings
            - name: tls.key
              valueFrom:
                configMapKeyRef:
                  key: tls.key
                  name: devopsbot-settings
            - name: verbose
              valueFrom:
                configMapKeyRef:
                  key: verbose
                  name: devopsbot-settings
            - name: trace
              valueFrom:
                configMapKeyRef:
                  key: trace
                  name: devopsbot-settings
          volumeMounts:
            - name: tls-cert
              mountPath: /var/devopsbot
              readOnly: true
      volumes:
        - name: tls-cert
          secret:
            secretName: <secret_with_tls_cert>
---

Local development

To run the bot locally, a valid certificate is needed, which can be created with for example mkcert, and devopsbot needs to resolve to 127.0.0.1:

$ mkcert devopsbot
$ echo "127.0.0.1 devopsbot" | sudo tee -a /etc/hosts

Then, run a local copy after having provided values for parameters that are empty by default:

$ bin/devopsbot \
  --slack.botAccessToken=xoxb-.... \
  --slack.signingSecret=...

And access it at https://devopsbot:3443 or http://devopsbot:3333.

See the --help output for more flags.

To test devopsbot functionality, the bot must be reachable by Slack. One option is to use inlets to expose the locally running devopsbot to the Internet. The inlets server can run on a free-tier EC2 instance: make sure the instance is accessible from the whole Internet and that its open port range is wide enough, and note its public IPv4 address. Run the inlets server on the EC2 instance and the inlets client on the laptop where devopsbot is running.

Verify routes

One way of verifying routes is via manual POST requests:

Start the bot

Generate a Slack signature for the request body being investigated, in this case command=/devopsbot, for example with Go:

package main

import (
  "crypto/hmac"
  "crypto/sha256"
  "encoding/hex"
  "fmt"
)

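func main() {
  // Key the HMAC with the Slack signing secret.
  h := hmac.New(sha256.New, []byte("<signing secret, empty if not specified>"))
  // Slack signs the string "v0:<timestamp>:<raw request body>".
  _, _ = h.Write([]byte("v0:<UNIX time stamp>:command=%2Fdevopsbot"))
  computed := h.Sum(nil)
  // The hex digest goes into the X-Slack-Signature header after the "v0=" prefix.
  fmt.Println(hex.EncodeToString(computed))
}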

Post the request:

curl -X POST -H 'Content-type: application/x-www-form-urlencoded' -H 'X-Slack-Request-Timestamp: <UNIX time stamp>' -H 'X-Slack-Signature: v0=<signature from previous step>' --data 'command=%2Fdevopsbot' localhost:3333/bot/command
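For context, the receiving side recomputes the same HMAC and compares it with the X-Slack-Signature header. The sketch below shows both directions using Slack's v0 signing scheme. The function names sign and verify, the secret and the timestamp are placeholders chosen for illustration and are not taken from the devopsbot source. A production check would normally also reject requests whose timestamp is too old, which this sketch leaves out.

```go
package main

import (
	"crypto/hmac"
	"crypto/sha256"
	"encoding/hex"
	"fmt"
)

// sign computes the X-Slack-Signature header value for a raw request body.
func sign(secret, timestamp, body string) string {
	h := hmac.New(sha256.New, []byte(secret))
	// Slack's base string is "v0:<timestamp>:<raw request body>".
	// Write on a hash.Hash never returns an error.
	_, _ = h.Write([]byte("v0:" + timestamp + ":" + body))
	return "v0=" + hex.EncodeToString(h.Sum(nil))
}

// verify reports whether signature is valid for the given secret, timestamp
// and body. hmac.Equal compares in constant time.
func verify(secret, timestamp, body, signature string) bool {
	expected := sign(secret, timestamp, body)
	return hmac.Equal([]byte(expected), []byte(signature))
}

func main() {
	const (
		secret    = "example-signing-secret" // placeholder, not a real secret
		timestamp = "1700000000"             // placeholder UNIX time stamp
		body      = "command=%2Fdevopsbot"
	)
	sig := sign(secret, timestamp, body)
	fmt.Println(verify(secret, timestamp, body, sig))           // prints true
	fmt.Println(verify(secret, timestamp, body+"&text=x", sig)) // prints false
}
```

The signature covers the raw, still URL-encoded body, which is why the curl request must send exactly the same bytes that were signed.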

Troubleshooting

The bot assumes that basic infrastructure is working. A mitigation plan should be made in case any of these systems fails:

The app runs via Slack, which can become unavailable: check Slack System Status
The bot needs to be deployed successfully to be available in Slack: check the deployment
The app Docker image is hosted on GitHub container registry, which can become unavailable: check GitHub Status
Internet access may be unavailable: check with your internet provider
Electricity may be unavailable: check with your electricity provider
Input devices may not work as expected: check the keyboard and mouse

Incident management

Practice incident management skills until they become second nature, so that following the process is not a struggle during a real incident.

Implementing an incident management process

The steps for implementing an incident management process can be summarized as follows:

Define exit criteria for types of incidents
Avoid groupthink and formalize assessment of operations with a risk assessment matrix. Risk assessment could be combined with Analytic Hierarchy Process (AHP).
Appoint delegates for critical functions to avoid single points of failure
Agree organization-wide on the effort required for different levels of severity and priority
Define and automate response plans, and make sure the communication section includes backup communication methods
Work with developers to create playbooks for all services, which need a verification and approval process

Incident declaration automation

devopsbot automates incident declaration:

Incident responders and commanders should focus on the essential tasks of resolving the incident
Automate the repetitive steps in the incident management process
Keep affected stakeholders informed about the status of the incident
Standardize and document the procedure to lay the foundation for something to be improved upon

An incident cannot be planned for; by its nature it just happens. Incidents happen all the time and to everyone, as an unfortunate but natural part of life. That does not mean they can be taken for granted. Teams should have a process for managing incidents and for learning from them.

An incident is defined as something that:

Negatively affects customers
Was not planned for
Cannot be resolved within 1 hour

During incidents

The workflow during an incident is as follows:

[Figure: incident declaration flow using the bot]

After incidents

When an incident has been declared as resolved, there is a need to communicate the resolution and learn from the experience.

The suggested workflow after an incident is as follows:

[Figure: workflow after an incident]

Slack app setup

This page contains the Slack app manifest which can be used to configure the application. Make sure to set the correct scopes. Make sure to specify the address bot/command for the slash command.

_metadata:
  major_version: 1
  minor_version: 1
display_information:
  name: devopsbot
  description: DevOpsBot
  background_color: "#004492"
features:
  bot_user:
    display_name: devopsbot
    always_online: false
  slash_commands:
    - command: /devopsbot
      url: https://<domain>/bot/command
      description: DevOpsBot
      usage_hint: "[help, incident, resolve]"
      should_escape: false
oauth_config:
  scopes:
    user:
      - reminders:write
    bot:
      - channels:manage
      - channels:read
      - chat:write
      - chat:write.customize
      - commands
      - groups:read
      - groups:write
      - im:read
      - im:write
      - incoming-webhook
      - mpim:read
      - mpim:write
      - users:read
settings:
  interactivity:
    is_enabled: true
    request_url: https://<domain>/bot/interactive
  org_deploy_enabled: false
  socket_mode_enabled: false
  token_rotation_enabled: false

Localizing the bot

The locale of the bot is set depending on the user's language preference in Slack.
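To illustrate the idea of falling back from a user's locale to a supported language, here is a minimal sketch that assumes Slack reports the preference as an IETF language tag such as fi-FI. The function pickLocale and the list of supported languages are hypothetical. go-i18n performs its own, more complete language matching, so this only shows the principle and is not how the bot is known to do it.

```go
package main

import (
	"fmt"
	"strings"
)

// pickLocale returns the entry of supported that best matches userLocale:
// an exact match first, then the base language, then fallback.
// Hypothetical helper, not part of devopsbot or go-i18n.
func pickLocale(userLocale string, supported []string, fallback string) string {
	want := strings.ToLower(userLocale)
	// The base language is the part before the first "-", e.g. "fi" for "fi-FI".
	base, _, _ := strings.Cut(want, "-")
	baseMatch := ""
	for _, s := range supported {
		got := strings.ToLower(s)
		if got == want {
			return s // exact match wins
		}
		if baseMatch == "" && got == base {
			baseMatch = s
		}
	}
	if baseMatch != "" {
		return baseMatch
	}
	return fallback
}

func main() {
	supported := []string{"en", "fi"}
	fmt.Println(pickLocale("fi-FI", supported, "en")) // prints fi
	fmt.Println(pickLocale("de-DE", supported, "en")) // prints en
}
```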

go-i18n manages the translations.

The go-i18n web page describes the procedures for translating a new language and new messages, but they are described here too. First get the tool with go install github.com/nicksnyder/go-i18n/v2/goi18n@latest, which installs the binary into $GOPATH/bin.

Translate a new language

If there is a new language to be added:

Create an empty message file for the new language, for example Finnish: touch translate.fi.json
Run goi18n merge active.en.json translate.fi.json to populate translate.fi.json with the messages to be translated
Translate translate.fi.json and rename it to active.fi.json
Load active.fi.json into the bundle

Translate new messages

If there are new strings to be translated:

Run goi18n extract -format json to update active.en.json with new messages
Run goi18n merge active.*.json to generate updated translate.*.json files
Translate all the messages in the translate.*.json files
Run goi18n merge active.*.json translate.*.json to merge the translated messages into the active message files