
Commit 5d660c5

feat(examples/sks): add SKS example
1 parent 061512c commit 5d660c5

12 files changed

Lines changed: 746 additions & 282 deletions


Lines changed: 255 additions & 2 deletions
@@ -1,3 +1,256 @@
1-
= Deployment On Exoscale SKS
1+
= Deployment on Exoscale SKS
22

3-
_Work In Progress_
3+
An example of a deployment on Exoscale SKS is provided https://github.com/camptocamp/devops-stack/tree/main/examples/sks[here]. Clone this repository and modify the files at your convenience.
4+
In the folder, as in a standard https://developer.hashicorp.com/terraform/tutorials/modules/module#what-is-a-terraform-module[Terraform module], you will find the following files:
5+
6+
* *`terraform.tf`* - declaration of the Terraform providers used in this project as well as their configuration;
7+
* *`locals.tf`* - local variables used by the DevOps Stack modules;
8+
* *`variables.tf`* - definition of the variables that pass the credentials required for the S3 provider;
9+
* *`main.tf`* - definition of all the deployed modules;
10+
* *`dns.tf`* - definition of some of the DNS resources required for the base domain;
11+
* *`s3_buckets.tf`* - creation of the required S3 buckets needed by Longhorn, Loki and Thanos;
12+
* *`outputs.tf`* - the output variables of the DevOps Stack, e.g. credentials and the `.kubeconfig` file to use with `kubectl`;
13+
14+
== Requirements
15+
16+
On your local machine, you need to have the following tools installed:
17+
18+
* https://www.terraform.io/[Terraform] to provision the whole stack;
19+
* https://kubernetes.io/docs/reference/kubectl/[`kubectl`] or https://github.com/derailed/k9s[`k9s`] to interact with your cluster;
20+
* https://community.exoscale.com/documentation/tools/exoscale-command-line-interface/[Exoscale CLI] to interact with your Exoscale account;
21+
* https://dev.to/camptocamp-ops/simple-secret-sharing-with-gopass-and-summon-40jk[`gopass` and `summon`] to easily pass the IAM secrets as environment variables when running `terraform` commands;
22+
23+
Other than that, you will require the following:
24+
25+
* an Exoscale account;
26+
* an Exoscale IAM key with at least the tags `Compute`, `DBAAS`, `DNS`, `IAM` and `SOS`, which you can create in the Exoscale portal (you can use your personal administrator IAM key, but it is best to create a dedicated IAM key for this deployment);
27+
* a domain name and a DNS subscription on the Exoscale account;
28+
* an AWS account and associated IAM key in order to have an S3 bucket and a DynamoDB table to store the Terraform state (optional if you choose to store the Terraform state locally, *which is not recommended in production*);
29+
30+
== Specificities and explanations
31+
32+
=== `secrets.yml`
33+
34+
TIP: Check https://dev.to/camptocamp-ops/simple-secret-sharing-with-gopass-and-summon-40jk[this blog post] for more information on how to configure `gopass` and `summon` to work together.
35+
36+
For simplicity and ease of use, as well as security, the example uses `gopass` and `summon` to pass the IAM credentials to the Terraform commands. The `secrets.yml` file contains the paths to the secret values in the `gopass` password store. On execution, the `summon` command reads the `secrets.yml` file and passes the credentials as environment variables to the Terraform commands.
37+
38+
The commands presented in this tutorial all use the `summon` command.
39+
40+
=== Remote Terraform state
41+
42+
If you do not want to configure the remote Terraform state backend, you can simply remove the `backend` block from the `terraform.tf` file.
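
For reference, a remote backend of the kind listed in the requirements (an S3 bucket plus a DynamoDB table for state locking) is declared inside the `terraform` block. The following is only a minimal sketch: the bucket, key, region and table names are placeholders and are not taken from the example.

[source,terraform]
----
terraform {
  backend "s3" {
    bucket         = "YOUR_STATE_BUCKET"     # placeholder: S3 bucket on AWS holding the state
    key            = "sks/terraform.tfstate" # placeholder: path of the state file in the bucket
    region         = "eu-west-1"             # placeholder: AWS region of the bucket
    encrypt        = true                    # encrypt the state at rest
    dynamodb_table = "YOUR_LOCK_TABLE"       # placeholder: DynamoDB table used for state locking
  }
}
----

Without a `backend` block, Terraform falls back to the local backend and writes the state to a `terraform.tfstate` file in the working directory.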
43+
44+
NOTE: Exoscale has an https://github.com/exoscale/terraform-provider-exoscale/tree/master/examples/sos-backend[example] for configuring Terraform to use SOS buckets as a backend for the Terraform state. However, at the time of writing, SOS buckets did not support encryption and there was no equivalent to DynamoDB to have the state lock feature, so in the end we preferred to use S3 buckets on AWS as a backend.
45+
46+
NOTE: More information about the remote backends is available on the https://developer.hashicorp.com/terraform/language/settings/backends/configuration[official documentation].
47+
48+
=== S3 buckets
49+
50+
The _Simple Object Storage_ (SOS) service provided by Exoscale follows the S3 specification. The Exoscale provider does not provide a way to create S3 buckets on their service. As recommended by their documentation, you have to use the AWS provider to create the S3 buckets.
51+
52+
Since we are already using the `AWS_ACCESS_KEY_ID` and `AWS_SECRET_ACCESS_KEY` environment variables to configure the Terraform backend, we cannot use them to configure the `aws` provider to create the S3 buckets. Because of that, we need two Terraform variables, `exoscale_iam_key` and `exoscale_iam_secret`, to pass the Exoscale IAM credentials to the `aws` provider. The values of these two variables are then set using the `TF_VAR_exoscale_iam_key` and `TF_VAR_exoscale_iam_secret` environment variables.
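
As a rough sketch, the two variables could be declared in `variables.tf` as follows. The descriptions and the `sensitive` flag are assumptions made for illustration, not a copy of the example's file.

[source,terraform]
----
# Assumed declarations; only the variable names come from the text above.
variable "exoscale_iam_key" {
  description = "Exoscale IAM key used by the aws provider to manage SOS buckets."
  type        = string
  sensitive   = true # keeps the value out of plan and apply output
}

variable "exoscale_iam_secret" {
  description = "Exoscale IAM secret used by the aws provider to manage SOS buckets."
  type        = string
  sensitive   = true
}
----

Terraform reads any environment variable named `TF_VAR_<name>` as the value of the variable `<name>`, which is why exporting `TF_VAR_exoscale_iam_key` and `TF_VAR_exoscale_iam_secret` is enough to populate them.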
53+
54+
Your `aws` provider configuration should then look something like this:
55+
56+
[source,terraform]
57+
----
58+
provider "aws" {
59+
endpoints {
60+
s3 = "https://sos-${local.zone}.exo.io"
61+
}
62+
63+
region = local.zone
64+
65+
access_key = var.exoscale_iam_key
66+
secret_key = var.exoscale_iam_secret
67+
68+
# Skip validations specific to AWS in order to use this provider for Exoscale services
69+
skip_credentials_validation = true
70+
skip_requesting_account_id = true
71+
skip_metadata_api_check = true
72+
skip_region_validation = true
73+
}
74+
----
75+
76+
TIP: If you are not using the remote Terraform state, you can use the `AWS_ACCESS_KEY_ID` and `AWS_SECRET_ACCESS_KEY` environment variables with the Exoscale IAM key to then configure the `aws` provider. Do not forget to remove the `access_key` and `secret_key` values from said provider block.
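
With the provider pointed at SOS, buckets are declared with the usual `aws_s3_bucket` resource. The sketch below is illustrative only: the resource label, the `for_each` set and the naming scheme are assumptions, and the example's `s3_buckets.tf` may be organized differently.

[source,terraform]
----
# Hypothetical layout: one bucket per service that needs object storage.
resource "aws_s3_bucket" "this" {
  for_each = toset(["longhorn", "loki", "thanos"])

  # Assumed naming scheme; prefixing with the cluster name reduces the chance of name collisions.
  bucket = format("%s-%s", local.cluster_name, each.key)
}
----

Each bucket can then be referenced as `aws_s3_bucket.this["loki"]`, `aws_s3_bucket.this["thanos"]` and so on.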
77+
78+
=== DNS and the `base_domain` variable
79+
80+
As-is, the code from the example requires a DNS subscription on the Exoscale account and a unique domain in order to create a DNS zone on the Exoscale DNS service.
81+
82+
You can bypass this requirement by deleting the `dns.tf` file and by not passing a value to the `base_domain` variable of the cluster module. This will make the cluster module return a `nip.io` domain prefixed with the IP of the NLB.
83+
84+
NOTE: It is for this reason that every other DevOps Stack module receives the `base_domain` variable from the output `module.sks.base_domain` instead of using the `local.base_domain`.
85+
86+
TIP: Check the xref:sks:ROOT:README.adoc[cluster module documentation] for more information on the `base_domain` variable.
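
To illustrate the wiring described in the note above, a module that consumes the cluster outputs might look like the following sketch. The module name `my_app`, its local `source` path and its input names are hypothetical; only `module.sks.cluster_name` and `module.sks.base_domain` appear in the example.

[source,terraform]
----
# Hypothetical consumer module; the point is where base_domain comes from.
module "my_app" {
  source = "./modules/my-app" # hypothetical local module

  cluster_name = module.sks.cluster_name
  # Taken from the cluster module output, so it also works when the
  # cluster module falls back to a nip.io domain.
  base_domain = module.sks.base_domain
}
----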
87+
88+
=== OIDC authentication
89+
90+
IMPORTANT: The DevOps Stack modules are developed with OIDC in mind. In production, you should have an identity provider that supports OIDC and use it to authenticate to the DevOps Stack applications.
91+
92+
TIP: You can have a local containing the OIDC configuration properly structured for the DevOps Stack applications and simply use an external OIDC provider instead of using Keycloak. Check https://github.com/camptocamp/devops-stack-module-keycloak/blob/main/oidc_bootstrap/locals.tf[this `locals.tf` on the Keycloak module] for an example.
93+
94+
To quickly deploy a testing environment on SKS you can use the Keycloak module, as shown in the example.
95+
96+
After deploying Keycloak, you can use the OIDC bootstrap module to create the Keycloak realm, groups, users, etc.
97+
98+
The `user_map` variable of that module allows you to create OIDC users used to authenticate to the DevOps Stack applications. The module will generate a password for each user, which you can check later after the deployment.
99+
100+
TIP: If you do not provide a value for the `user_map` variable, the module will create a user named `devopsadmin` with a random password.
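
As an illustration, a set of users could be kept in a local and passed to the module with `user_map = local.oidc_users`. The local name `oidc_users`, the user `jdoe` and the attribute names below are assumptions; check the OIDC bootstrap module documentation for the exact structure expected by `user_map`.

[source,terraform]
----
locals {
  # Hypothetical local; the attribute names are assumed, not confirmed by this page.
  oidc_users = {
    jdoe = {
      username   = "jdoe"
      email      = "jdoe@example.com"
      first_name = "Jane"
      last_name  = "Doe"
    }
  }
}
----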
101+
102+
=== Let's Encrypt SSL certificates
103+
104+
By default, to avoid hitting the Let's Encrypt rate limits for your domain, the example uses the `letsencrypt-staging` configuration of the cert-manager module, which generates certificates through the Let's Encrypt staging environment. These certificates are signed by a CA that browsers do not trust.
105+
106+
If you feel ready to test with production certificates, you can simply edit the `locals.tf` file and change the `cluster_issuer` variable to `letsencrypt-prod`.
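
The change amounts to a single line in the existing `locals` block. Only the affected local is shown here; the other values in `locals.tf` stay as they are.

[source,terraform]
----
locals {
  # Was "letsencrypt-staging"; switch only once the deployment works,
  # since the production environment enforces stricter rate limits.
  cluster_issuer = "letsencrypt-prod"
}
----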
107+
108+
== Deployment
109+
110+
1. Clone the repository and `cd` into the `examples/sks` folder;
111+
112+
2. Adapt the `secrets.yml` file to point to the correct path on your `gopass` password store;
113+
114+
3. Check out the modules you want to deploy in the `main.tf` file, and comment out the others;
115+
+
116+
TIP: You can also add your own Terraform modules in this file or any other file on the root folder. A good place to start to write your own module is to clone the https://github.com/camptocamp/devops-stack-module-template[devops-stack-module-template] repository and adapt it to your needs.
117+
118+
4. On the `oidc` module, adapt the `user_map` variable as you wish (please check the <<oidc-authentication,OIDC section>> for more information).
119+
120+
5. From the source of the example deployment, initialize the Terraform modules and providers:
121+
+
122+
[source,bash]
123+
----
124+
summon terraform init
125+
----
126+
127+
6. Configure the variables in `locals.tf` to your preference:
128+
+
129+
IMPORTANT: The `cluster_name` must be unique for each DevOps Stack deployment in a single Exoscale account.
130+
+
131+
TIP: The xref:sks:ROOT:README.adoc[cluster module documentation] can help you know what to put in the `kubernetes_version`, `zone` and `service_level` variables.
132+
+
133+
[source,terraform]
134+
----
135+
include::example$deploy_examples/sks/locals.tf[]
136+
----
137+
138+
7. Finally, run `terraform apply` and accept the proposed changes to create the Kubernetes nodes on Exoscale SKS and populate them with our services;
139+
+
140+
[source,bash]
141+
----
142+
summon terraform apply
143+
----
144+
145+
8. After the first deployment (please note the troubleshooting step related to kube-prometheus-stack and Argo CD), you can go to the `locals.tf` and enable the _ServiceMonitor_ boolean to activate the Prometheus exporters that will send metrics to Prometheus;
146+
+
147+
IMPORTANT: This flag needs to be set as `false` for the first bootstrap of the cluster, otherwise the applications will fail to deploy while the Custom Resource Definitions of the kube-prometheus-stack are not yet created.
148+
+
149+
NOTE: You can either set the flag as `true` in the `locals.tf` file or you can simply delete the line on the modules' declarations, since this variable is set as `true` by default on each module.
150+
+
151+
TIP: Take note of the local called `app_autosync`. If you set the condition of the ternary operator to `false` you will disable the auto-sync for all the DevOps Stack modules. This allows you to choose when to manually sync the module on the Argo CD interface and is useful for troubleshooting purposes.
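
After the first bootstrap, the two locals discussed in this step might look like the sketch below. Only these two locals are shown; they are meant to replace the corresponding lines in the existing `locals` block, not to be added as a second block.

[source,terraform]
----
locals {
  # Safe to enable once the kube-prometheus-stack CRDs exist in the cluster.
  enable_service_monitor = true

  # Leave the condition as `true` to keep auto-sync enabled. Changing it to
  # `false` makes the expression evaluate to `{}`, which disables auto-sync
  # so that each module can be synced manually from the Argo CD interface.
  app_autosync = true ? { allow_empty = false, prune = true, self_heal = true } : {}
}
----

Run `summon terraform apply` again after editing the file so that the modules pick up the new values.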
152+
153+
== Access the cluster and the DevOps Stack applications
154+
155+
You can use the content of the `kubernetes_kubeconfig` output to manually generate a Kubeconfig file or you can use the Exoscale CLI to recover a new one.
156+
157+
IMPORTANT: Note that if you use the `kubernetes_kubeconfig` output, you will be using exactly the same credentials that the Terraform code uses to interact with the cluster, so it's best to avoid it.
158+
159+
To use the Exoscale CLI, you can run the following command:
160+
161+
[source,bash]
162+
----
163+
summon exo compute sks kubeconfig YOUR_CLUSTER_NAME kube-admin --zone YOUR_CLUSTER_ZONE --group system:masters > ~/.kube/NAME_TO_GIVE_YOUR_CONFIG.config
164+
----
165+
166+
Then you can use the `kubectl` or `k9s` command to interact with the cluster:
167+
168+
[source,bash]
169+
----
170+
k9s --kubeconfig ~/.kube/NAME_TO_GIVE_YOUR_CONFIG.config
171+
----
172+
173+
As for the DevOps Stack applications, you can access them through the ingress domain that you can find in the `ingress_domain` output. If you used the code from the example without modifying the outputs, you will see something like this on your terminal after the `terraform apply` has done its job:
174+
175+
[source,shell]
176+
----
177+
Outputs:
178+
179+
ingress_domain = "your.domain.here"
180+
keycloak_admin_credentials = <sensitive>
181+
keycloak_users = <sensitive>
182+
kubernetes_kubeconfig = <sensitive>
183+
----
184+
185+
Or you can use `kubectl` to get all the ingresses and their respective URLs:
186+
187+
[source,bash]
188+
----
189+
kubectl get ingress --all-namespaces --kubeconfig ~/.kube/NAME_TO_GIVE_YOUR_CONFIG.config
190+
----
191+
192+
The password for the Keycloak admin user is available in the `keycloak_admin_credentials` output and the users are available in the `keycloak_users` output:
193+
194+
[source,bash]
195+
----
196+
summon terraform output keycloak_users
197+
----
198+
199+
== Stop the cluster
200+
201+
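To definitively stop (that is, destroy) the cluster in a single command, you can use the following. Some resources are first removed from the state file so that `terraform destroy` does not try to delete them one by one inside a cluster that is about to disappear: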
202+
203+
[source,bash]
204+
----
205+
summon terraform state rm $(summon terraform state list | grep "argocd_application\|argocd_project\|kubernetes_\|helm_\|keycloak_") && summon terraform destroy
206+
----
207+
208+
== Conclusion
209+
210+
That's it, you now have a fully functional Kubernetes cluster in Exoscale SKS with the DevOps Stack applications deployed on it. For more information, keep on reading the https://devops-stack.io/docs/latest/[documentation]. **You can explore the possibilities of each module and get the link to the source code on their respective documentation pages.**
211+
212+
== Troubleshooting
213+
214+
=== `connection_error` during the first deployment
215+
216+
In some cases, you could encounter errors like these during the first deployment:
217+
218+
[source,shell]
219+
----
220+
221+
│ Error: error while waiting for application kube-prometheus-stack to be created
222+
223+
│ with module.kube-prometheus-stack.module.kube-prometheus-stack.argocd_application.this,
224+
│ on ../../devops-stack-module-kube-prometheus-stack/main.tf line 91, in resource "argocd_application" "this":
225+
│ 91: resource "argocd_application" "this" {
226+
227+
│ error while waiting for application kube-prometheus-stack to be synced and healthy: rpc
228+
│ error: code = Unavailable desc = connection error: desc = "transport: error while dialing:
229+
│ dial tcp 127.0.0.1:46649: connect: connection refused"
230+
231+
----
232+
233+
[source,shell]
234+
----
235+
236+
│ Error: error while waiting for application argocd to be created
237+
238+
│ with module.argocd.argocd_application.this,
239+
│ on .terraform/modules/argocd/main.tf line 55, in resource "argocd_application" "this":
240+
│ 55: resource "argocd_application" "this" {
241+
242+
│ error while waiting for application argocd to be synced and healthy: rpc error: code = Unavailable desc = error reading from server: EOF
243+
244+
----
245+
246+
In the case of the Argo CD module, the error is due to the way we provision Argo CD on the final steps of the deployment. We use the bootstrap Argo CD to deploy the final Argo CD module, which causes a redeployment of Argo CD and consequently a momentary loss of connection between the Argo CD Terraform provider and the Argo CD server.
247+
248+
As for the kube-prometheus-stack module, this error only appeared on the SKS platform. We are still investigating the root cause of this issue.
249+
250+
*You can simply re-run the command `summon terraform apply` to finalize the bootstrap of the cluster every time you encounter this error.*
251+
252+
=== Argo CD interface reload loop when clicking on login
253+
254+
If you encounter a loop when clicking on the login button on the Argo CD interface, you can try to delete the Argo CD server pod and let it be recreated.
255+
256+
TIP: For more information about the Argo CD module, please refer to the xref:argocd:ROOT:README.adoc[respective documentation page].

examples/sks/.terraform-version

Lines changed: 1 addition & 0 deletions
@@ -0,0 +1 @@
1+
1.5.2

examples/sks/apps.tf

Lines changed: 75 additions & 0 deletions
@@ -0,0 +1,75 @@
1+
module "helloworld_apps" {
2+
source = "git::https://github.com/camptocamp/devops-stack-module-applicationset.git?ref=v2.0.1"
3+
4+
dependency_ids = {
5+
argocd = module.argocd.id
6+
}
7+
8+
name = "helloworld-apps"
9+
argocd_namespace = module.argocd_bootstrap.argocd_namespace
10+
project_dest_namespace = "*"
11+
project_source_repo = "https://github.com/camptocamp/devops-stack-helloworld-templates.git"
12+
13+
generators = [
14+
{
15+
git = {
16+
repoURL = "https://github.com/camptocamp/devops-stack-helloworld-templates.git"
17+
revision = "main"
18+
19+
directories = [
20+
{
21+
path = "apps/*"
22+
}
23+
]
24+
}
25+
}
26+
]
27+
template = {
28+
metadata = {
29+
name = "{{path.basename}}"
30+
}
31+
32+
spec = {
33+
project = "helloworld-apps"
34+
35+
source = {
36+
repoURL = "https://github.com/camptocamp/devops-stack-helloworld-templates.git"
37+
targetRevision = "main"
38+
path = "{{path}}"
39+
40+
helm = {
41+
valueFiles = []
42+
# The following value defines the global variables that will be available to all apps in apps/*
43+
# These are needed to generate the ingresses containing the name and base domain of the cluster.
44+
values = <<-EOT
45+
cluster:
46+
name: "${module.sks.cluster_name}"
47+
domain: "${module.sks.base_domain}"
48+
issuer: "${local.cluster_issuer}"
49+
apps:
50+
longhorn: true
51+
grafana: true
52+
prometheus: true
53+
thanos: true
54+
EOT
55+
}
56+
}
57+
58+
destination = {
59+
name = "in-cluster"
60+
namespace = "{{path.basename}}"
61+
}
62+
63+
syncPolicy = {
64+
automated = {
65+
allowEmpty = false
66+
selfHeal = true
67+
prune = true
68+
}
69+
syncOptions = [
70+
"CreateNamespace=true"
71+
]
72+
}
73+
}
74+
}
75+
}

examples/sks/dns.tf

Lines changed: 12 additions & 6 deletions
Original file line numberDiff line numberDiff line change
@@ -1,12 +1,18 @@
1-
data "exoscale_domain" "this" {
2-
name = module.sks.base_domain
1+
# Requires a subscription to the Exoscale DNS service, which should be manually activated on the web console.
2+
# If using nip.io, which is deployed automatically, both these resources are not needed and should be commented
3+
# or deleted.
4+
5+
resource "exoscale_domain" "domain" {
6+
name = local.base_domain
37
}
48

9+
# This resource should be deactivated if there are multiple development clusters on the same account.
510
resource "exoscale_domain_record" "wildcard" {
6-
domain = data.exoscale_domain.this.id
11+
count = local.activate_wildcard_record ? 1 : 0
12+
13+
domain = resource.exoscale_domain.domain.id
714
name = "*.apps"
8-
record_type = "CNAME"
15+
record_type = "A"
916
ttl = "300"
10-
prio = 1 # because bug in exoscale provider 0.39.0
11-
content = format("default.apps.%s.%s", module.sks.cluster_name, module.sks.base_domain)
17+
content = module.sks.nlb_ip_address
1218
}

examples/sks/locals.tf

Lines changed: 11 additions & 0 deletions
@@ -0,0 +1,11 @@
1+
locals {
2+
kubernetes_version = "1.27.4"
3+
cluster_name = "YOUR_CLUSTER_NAME"
4+
zone = "YOUR_CLUSTER_ZONE"
5+
service_level = "starter"
6+
base_domain = "your.domain.here"
7+
activate_wildcard_record = true
8+
cluster_issuer = "letsencrypt-staging"
9+
enable_service_monitor = false # Can be enabled after the first bootstrap.
10+
app_autosync = true ? { allow_empty = false, prune = true, self_heal = true } : {}
11+
}
