| 1 | +--- |
| 2 | +id: 236 |
| 3 | +state: approved |
| 4 | +created: 2023-03-30 |
| 5 | +updated: 2023-03-30 |
| 6 | +placement: |
| 7 | + category: resource-design |
| 8 | + order: 240 |
| 9 | +--- |
| 10 | + |
| 11 | +# Policy preview |
| 12 | + |
| 13 | +A policy is a resource that provides rules that admit or deny access to other |
| 14 | +resources. Generally, evaluating a policy yields one of a specific set of |
| 15 | +outcomes. |
| 16 | + |
| 17 | +Changes to policies without proper validation may have unintended consequences |
| 18 | +that can severely impact a customer’s overall infrastructure setup. To safely |
| 19 | +update resources, it is beneficial to test these changes via policy rollout |
| 20 | +APIs. |
| 21 | + |
| 22 | +Preview is a rollout safety mechanism for policy resources, which gives the |
| 23 | +customer the ability to validate the effect of their proposed changes against |
| 24 | +production traffic prior to the changes going live. The result of the policy |
| 25 | +evaluation against traffic is logged in order to give the customer the data |
| 26 | +required to test the correctness of the change. |
| 27 | + |
| 28 | +Firewall policies exemplify a case that is suitable for previewing. A new |
| 29 | +configuration can be evaluated against traffic to observe which IPs would be |
| 30 | +allowed or denied. This gives the customer the data to guide a decision on |
| 31 | +whether to promote the proposed changes to live. |
| 32 | + |
| 33 | +The expected flow for previewing a policy is as follows: |
| 34 | + |
| 35 | +1. The user creates an experiment containing a new policy configuration |
| 36 | + intended to replace the live policy. |
| 37 | +2. The user uses the "preview" method to start generating logs which compare |
| 38 | + the live and experiment policy evaluations against live traffic. |
| 39 | +3. The user inspects the logs to determine whether the experiment has the |
| 40 | + intended result. |
| 41 | +4. The user uses the "commit" method to promote the experiment to live. |
| 42 | + |
| 43 | +## Guidance |
| 44 | + |
| 45 | +### Non-goals |
| 46 | + |
| 47 | +This proposal is for a safety mechanism for policy rollouts only. Safe rollouts |
| 48 | +for non-policy resources are not in scope. |
| 49 | + |
| 50 | +### Experiments |
| 51 | + |
| 52 | +A new configuration of a policy to be previewed is stored as a nested collection |
| 53 | +under the policy. These nested collections are known as experiments. |
| 54 | + |
| 55 | +A hypothetical policy resource called `Policy` is used throughout. It has the |
| 56 | +following resource name pattern: |
| 57 | + |
| 58 | +`projects/{project}/locations/{location}/policies/{policy}` |
| 59 | + |
| 60 | +The experimental versions of the resource used for previewing or other safe |
| 61 | +rollout practices are represented as a nested collection under `Policy` using a |
| 62 | +new resource type. The resource type **must** follow the naming convention |
| 63 | +*RegularResourceType*`Experiment`. |
| 64 | + |
| 65 | +The following pattern is used for the experiment collection: |
| 66 | + |
| 67 | +`projects/{project}/locations/{location}/policies/{policy}/experiments/{experiment}` |
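The pattern above could be declared on the experiment message with a `google.api.resource` annotation. The following is a minimal sketch, not part of the guidance: the service name `example.googleapis.com` and the package `example.policy.v1` are placeholders assumed for illustration, and the remaining `PolicyExperiment` fields are omitted.

```proto
syntax = "proto3";

// Hypothetical package name, assumed for illustration.
package example.policy.v1;

import "google/api/resource.proto";

message PolicyExperiment {
  // Declares PolicyExperiment as a resource nested under Policy.
  // "example.googleapis.com" is a placeholder service name.
  option (google.api.resource) = {
    type: "example.googleapis.com/PolicyExperiment"
    pattern: "projects/{project}/locations/{location}/policies/{policy}/experiments/{experiment}"
  };

  // The resource name of the PolicyExperiment.
  string name = 1;
}
```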
| 68 | + |
| 69 | +A proto used to represent an experiment **must** contain the following: |
| 70 | + |
| 71 | + 1. The required top-level fields for a resource, like `name` and `etag` |
| 72 | + 2. The policy message that is being tested itself |
| 73 | + 3. The field, `preview_metadata`, which contains metadata specific to |
| 74 | + previewing the experiment of a specific resource type. |
| 75 | + |
| 76 | +```proto |
| 77 | +message PolicyExperiment { |
| 78 | + |
| 79 | + // The resource name of the PolicyExperiment. |
| 80 | + string name = 1; |
| 81 | + |
| 82 | + // The policy experiment. This Policy will be used to preview the effects of |
| 83 | + // the change but will not affect live traffic. |
| 84 | + Policy policy = 2; |
| 85 | + |
| 86 | + // The metadata associated with this policy experiment. |
| 87 | + PolicyPreviewMetadata preview_metadata = 3 |
| 88 | + [(google.api.field_behavior) = OUTPUT_ONLY]; |
| 89 | +} |
| 90 | +``` |
| 91 | + |
| 92 | +- The experiment proto **must** have a top-level field with the same type as the |
| 93 | + live policy. |
| 94 | + - It **must** be named after the live resource type. For example, if the |
| 95 | + experiment is for `FirewallPolicy`, then this field **must** be named |
| 96 | + `firewall_policy`. |
| 97 | + - The name inside the embedded `policy` message **must** be the name of the |
| 98 | + live policy. |
| 99 | +- When the user is ready to promote an experiment, they **must** copy the |
| 100 | + `policy` message into the live policy and delete the experiment. This can be |
| 101 | + done manually or via a "commit" custom method. |
| 102 | +- A product **may** support multiple experiments concurrently being previewed |
| 103 | + for a single live policy. |
| 104 | + - Each experiment **must** generate logs in which each entry is preceded by |
| 105 | + `log_prefix`, so that the user can compare the results of the experiment |
| 106 | + with the behavior of the live policy. |
| 107 | + - The number of experimental configurations for a given live policy **may** be |
| 108 | + capped at a certain number and the cap **must** be documented. |
| 109 | +- Cascading deletes **must** occur: if the live policy is deleted, all |
| 110 | + experiments **must** also be deleted. |
| 111 | + |
| 112 | +### Metadata |
| 113 | + |
| 114 | +`preview_metadata` tracks all metadata about previewing the experiment. The |
| 115 | +messages **must** follow the convention: *RegularResourceType*`PreviewMetadata`. |
| 116 | +This is so the proto can be defined uniquely for each resource type in the |
| 117 | +same service with experiments. |
| 118 | + |
| 119 | +```proto |
| 120 | +message PolicyPreviewMetadata { |
| 121 | + // The state of previewing the experiment. The possible values are ACTIVE |
| 122 | + // and INACTIVE. ACTIVE indicates that results are being logged upstream. |
| 123 | + string state = 1; |
| 124 | + |
| 125 | + // An identifying string common to all logs generated when previewing the |
| 126 | + // experiment. Searching all logs for this string will isolate the results. |
| 127 | + string log_prefix = 2; |
| 128 | + |
| 129 | + // The most recent time at which this experiment started previewing. |
| 130 | + google.protobuf.Timestamp start_time = 3; |
| 131 | + |
| 132 | + // The most recent time at which this experiment stopped previewing. |
| 133 | + google.protobuf.Timestamp stop_time = 4; |
| 134 | +} |
| 135 | +``` |
| 136 | + |
| 137 | +- `PolicyPreviewMetadata` **must** have the fields defined in the proto above. |
| 138 | + - `state` **must** be `ACTIVE` or `INACTIVE`. |
| 139 | + - It **may** have additional fields if the service or resource requires it. |
| 140 | +- When an experiment is first created, `preview_metadata` **must** be |
| 141 | + absent. |
| 142 | + - It is present on the experiment once the "preview" method is used. |
| 143 | +- All `preview_metadata` fields **must** be output only. |
| 144 | +- `state` changes between `ACTIVE` and `INACTIVE` when the experiment is |
| 145 | + started or stopped, which can only be done by the "preview" and "stop" custom |
| 146 | + methods. |
| 147 | +- The first time the "preview" custom method is used, the system **must** create |
| 148 | + `preview_metadata` and do the following: |
| 149 | + - It **must** set the `state` to `ACTIVE`. |
| 150 | + - It **must** populate `start_time` with the current time. |
| 151 | + - `start_time` **must** be updated every time the `state` is changed to |
| 152 | + `ACTIVE`. |
| 153 | + - It **must** set a system-generated `log_prefix` string, which is a |
| 154 | + predefined constant hard coded by the system developers. |
| 155 | + - The same value is used for previewing experiments for the given resource |
| 156 | + type. For example, "FirewallPolicyPreviewLog" for FirewallPolicy. |
| 157 | +- When the "stop" custom method is used, the system **must** do the following: |
| 158 | + - It **must** set the `state` to `INACTIVE`. |
| 159 | + - It **must** populate the `stop_time` with the current time. |
| 160 | + |
| 161 | +### Methods |
| 162 | + |
| 163 | +#### create |
| 164 | + |
| 165 | +- The resource **must** be created using long-running |
| 166 | + [Create][aip-133-long-running] and |
| 167 | + `google.longrunning.operation_info.response_type` **must** be |
| 168 | + `PolicyExperiment`. |
| 169 | +- Creating a new experiment to preview **must** support the following use |
| 170 | +cases: |
| 171 | + - Preview a new policy. |
| 172 | + - Preview an update to an already live policy. |
| 173 | + - Preview a deletion of a current policy. |
| 174 | +- For the update and delete use cases, the `policy` field in the experiment |
| 175 | + **must** have the full payload of the live policy copied into it, including |
| 176 | + the name. |
| 177 | + - The user **must** set the rules to the new intended state to preview an |
| 178 | + update. |
| 179 | + - The user **must** set the rules to represent a no-op to preview a |
| 180 | + delete. |
| 181 | +- To preview a new policy, the following applies: |
| 182 | + - If the system does not support a nested collection without a live policy, |
| 183 | + the user **must** create a live policy and set the rules to represent a |
| 184 | + no-op. For example, the rules of a no-op policy **may** be empty. |
| 185 | + - An experiment is created as a child of the no-op policy. |
| 186 | +- If the system supports previewing multiple experiments for a live policy, |
| 187 | + calling "create" more than once **must** create multiple experiments. |
| 188 | + |
| 189 | +#### update |
| 190 | + |
| 191 | +- The resource **must** be updated using long-running |
| 192 | + [Update][aip-134-long-running] and |
| 193 | + `google.longrunning.operation_info.response_type` **must** be |
| 194 | + `PolicyExperiment`. |
| 195 | +- The name inside `policy` **must not** change, because this `policy` is |
| 196 | + intended to replace the live policy and the name of the live policy |
| 197 | + **must not** change. The other fields can change in order to modify the |
| 198 | + experiment being previewed. |
| 199 | +- The system **must** set the `state` to `INACTIVE` if the `state` was `ACTIVE` at |
| 200 | + the time of an update. |
| 201 | + - This is so the user can easily distinguish between different versions of |
| 202 | + the experiment being previewed. |
| 203 | + |
| 204 | +#### get |
| 205 | +- The standard method, [Get][aip-131], **must** be included for |
| 206 | + `PolicyExperiment` resource types. |
| 207 | + |
| 208 | +#### list |
| 209 | + |
| 210 | +- The standard method, [List][aip-132], **must** be included for |
| 211 | + `PolicyExperiment` resource types. |
| 212 | +- Filtering on `PolicyPreviewMetadata` fields shows which experiments are |
| 213 | + actively being previewed. |
| 214 | + - For example, the following filter string returns a List response with |
| 215 | + experiments being previewed: `preview_metadata.state = ACTIVE`. |
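To show where such a filter string would be supplied, here is a hedged sketch of a List request message in the shape described by the standard List guidance. The message name `ListPolicyExperimentsRequest`, the package, the service name `example.googleapis.com`, and the field numbers are assumptions made for illustration only.

```proto
syntax = "proto3";

// Hypothetical package name, assumed for illustration.
package example.policy.v1;

import "google/api/field_behavior.proto";
import "google/api/resource.proto";

// Request message for listing the experiments under a live policy.
message ListPolicyExperimentsRequest {
  // The live policy that owns the experiments, in the form
  // projects/{project}/locations/{location}/policies/{policy}.
  string parent = 1 [
    (google.api.field_behavior) = REQUIRED,
    (google.api.resource_reference) = {
      // "example.googleapis.com" is a placeholder service name.
      child_type: "example.googleapis.com/PolicyExperiment"
    }
  ];

  // The maximum number of experiments to return.
  int32 page_size = 2;

  // A page token received from a previous list call.
  string page_token = 3;

  // A filter expression, for example:
  //   preview_metadata.state = ACTIVE
  string filter = 4;
}
```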
| 216 | + |
| 217 | +#### delete |
| 218 | + |
| 219 | +- The resource **must** be deleted using long-running |
| 220 | + [Delete][aip-135-long-running] and |
| 221 | + `google.longrunning.operation_info.response_type` **must** be |
| 222 | + `PolicyExperiment`. |
| 223 | + |
| 224 | +#### preview |
| 225 | + |
| 226 | +```proto |
| 227 | +// Starts previewing a PolicyExperiment. This triggers the system to start |
| 228 | +// generating logs to evaluate the PolicyExperiment. |
| 229 | +rpc PreviewPolicyExperiment(PreviewPolicyExperimentRequest) |
| 230 | + returns (google.longrunning.Operation) { |
| 231 | + option (google.api.http) = { |
| 232 | + post: "/v1/{name=projects/*/locations/*/policies/*/experiments/*}:preview" |
| 233 | + body: "*" |
| 234 | + }; |
| 235 | + option (google.longrunning.operation_info) = { |
| 236 | + response_type: "PolicyExperiment" |
| 237 | + metadata_type: "PreviewPolicyExperimentMetadata" |
| 238 | + }; |
| 239 | +} |
| 240 | + |
| 241 | +// The request message for the preview custom method. |
| 242 | +message PreviewPolicyExperimentRequest { |
| 243 | + // The name of the PolicyExperiment. |
| 244 | + string name = 1; |
| 245 | +} |
| 246 | +``` |
| 247 | + |
| 248 | +- This custom method is required. |
| 249 | +- `google.longrunning.operation_info.metadata_type` **must** follow guidance on |
| 250 | + [Long-running operations][aip-151]. |
| 251 | +- This method **must** trigger the system to start generating logs to preview |
| 252 | + the experiment. |
| 253 | +- Whenever the method is called successfully, the system **must** set the |
| 254 | + following values in the `PolicyPreviewMetadata`: |
| 255 | + - `log_prefix` to the predefined constant. |
| 256 | + - `start_time` to the current time. |
| 257 | + - `state` to `ACTIVE`. |
| 258 | +- If the method is called on an experiment with the rules representing a no-op, |
| 259 | + then the system **must** preview the deletion of the live policy. |
| 260 | + |
| 261 | +#### stop |
| 262 | + |
| 263 | +```proto |
| 264 | +// Stops previewing a PolicyExperiment. This triggers the system to stop |
| 265 | +// generating logs to evaluate the PolicyExperiment. |
| 266 | +rpc StopPolicyExperiment(StopPolicyExperimentRequest) |
| 267 | + returns (google.longrunning.Operation) { |
| 268 | + option (google.api.http) = { |
| 269 | + post: "/v1/{name=projects/*/locations/*/policies/*/experiments/*}:stop" |
| 270 | + body: "*" |
| 271 | + }; |
| 272 | + option (google.longrunning.operation_info) = { |
| 273 | + response_type: "PolicyExperiment" |
| 274 | + metadata_type: "StopPolicyExperimentMetadata" |
| 275 | + }; |
| 276 | +} |
| 277 | + |
| 278 | +// The request message for the stop custom method. |
| 279 | +message StopPolicyExperimentRequest { |
| 280 | + // The name of the PolicyExperiment. |
| 281 | + string name = 1; |
| 282 | +} |
| 283 | +``` |
| 284 | + |
| 285 | +- This custom method is required. |
| 286 | +- `google.longrunning.operation_info.metadata_type` **must** follow guidance on |
| 287 | + [Long-running operations][aip-151]. |
| 288 | +- This method **must** trigger the system to stop generating logs to preview the |
| 289 | + experiment. |
| 290 | +- Whenever the method is called successfully, the system **must** set the |
| 291 | + following values in the `PolicyPreviewMetadata`: |
| 292 | + - `stop_time` to the current time. |
| 293 | + - `state` to `INACTIVE`. |
| 294 | + |
| 295 | +#### commit |
| 296 | + |
| 297 | +The resource **may** expose a new custom method called "commit" to promote an |
| 298 | +experiment. The system copies `policy` from the experiment into the live policy |
| 299 | +and then deletes the experiment. |
| 300 | + |
| 301 | +Declarative clients **may** manually copy fields from an experiment into the |
| 302 | +live policy and then delete the experiment rather than calling "commit" if |
| 303 | +preferable. |
| 304 | + |
| 305 | +```proto |
| 306 | +// Commits a PolicyExperiment. This copies the PolicyExperiment's policy message |
| 307 | +// to the live policy then deletes the PolicyExperiment. |
| 308 | +rpc CommitPolicyExperiment(CommitPolicyExperimentRequest) |
| 309 | + returns (google.longrunning.Operation) { |
| 310 | + option (google.api.http) = { |
| 311 | + post: "/v1/{name=projects/*/locations/*/policies/*/experiments/*}:commit" |
| 312 | + body: "*" |
| 313 | + }; |
| 314 | + option (google.longrunning.operation_info) = { |
| 315 | + response_type: "google.protobuf.Empty" |
| 316 | + metadata_type: "CommitPolicyExperimentMetadata" |
| 317 | + }; |
| 318 | +} |
| 319 | + |
| 320 | +// The request message for the commit custom method. |
| 321 | +message CommitPolicyExperimentRequest { |
| 322 | + string name = 1; |
| 323 | + string etag = 2; |
| 324 | + string parent_etag = 3; |
| 325 | +} |
| 326 | +``` |
| 327 | + |
| 328 | +- `google.longrunning.operation_info.metadata_type` **must** follow guidance on |
| 329 | + [Long-running operations][aip-151]. |
| 330 | +- The method **must** atomically copy `policy` from the experiment into the live |
| 331 | + policy, and then delete the experiment. |
| 332 | +- If any experiment fails "commit", previewing it **must not** stop, and the |
| 333 | + live policy **must not** be updated. |
| 334 | +- The method can be called on an experiment in any state. |
| 335 | +- The `etag` **must** match that of the experiment in order for commit to be |
| 336 | + successful. This is so the user does not commit an unintended version of the |
| 337 | + experiment. |
| 338 | + - If no `etag` is provided, the API **must not** succeed, to prevent the user |
| 339 | + from unintentionally committing a different version of the experiment than |
| 340 | + intended. |
| 341 | + - A `parent_etag` **may** be provided to guarantee that the experiment |
| 342 | + overwrites a specific version of the live policy. |
| 343 | +- The method is not idempotent and calling it twice on the same experiment |
| 344 | + **must** return a 404 NOT_FOUND as the experiment is deleted as part of the |
| 345 | + first call. |
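The `etag` rules above can be made visible in the request message through field annotations. This is a sketch under stated assumptions rather than required guidance: the package and the service name `example.googleapis.com` are placeholders, and marking `etag` as `REQUIRED` is one way to express that a commit without an `etag` must not succeed.

```proto
syntax = "proto3";

// Hypothetical package name, assumed for illustration.
package example.policy.v1;

import "google/api/field_behavior.proto";
import "google/api/resource.proto";

// The request message for the commit custom method.
message CommitPolicyExperimentRequest {
  // The name of the PolicyExperiment to commit.
  string name = 1 [
    (google.api.field_behavior) = REQUIRED,
    (google.api.resource_reference) = {
      // "example.googleapis.com" is a placeholder service name.
      type: "example.googleapis.com/PolicyExperiment"
    }
  ];

  // The etag of the experiment. The commit only succeeds if this matches
  // the current etag of the experiment.
  string etag = 2 [(google.api.field_behavior) = REQUIRED];

  // The etag of the live policy. If set, the commit only overwrites that
  // specific version of the live policy.
  string parent_etag = 3 [(google.api.field_behavior) = OPTIONAL];
}
```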
| 346 | + |
| 347 | +### Changes to live policy API methods |
| 348 | + |
| 349 | +#### delete |
| 350 | + |
| 351 | +- A delete of the live policy **must** delete all experiments. |
| 352 | +- To maintain the experiments while negating the effect of the live policy, the |
| 353 | + live policy **must** be changed to a no-op policy instead of using this |
| 354 | + method. |
| 355 | + |
| 356 | +### Logging |
| 357 | + |
| 358 | +Logging is crucial for the user to evaluate whether an experiment should be |
| 359 | +promoted to live. |
| 360 | + |
| 361 | +Logs **must** contain the results of the evaluated experiment and the `etag` |
| 362 | +of that experiment alongside that of the live policy, and each entry **must** be |
| 363 | +preceded by the value of `log_prefix`. |
| 364 | + - The `etag` fields help the user identify which configurations of the live |
| 365 | + policy and the experiment are evaluated in the log. |
| 366 | + - `log_prefix` helps the user separate logs specifically generated for |
| 367 | + previewing the experiment from other use cases. |
| 368 | + |
| 369 | +Overall, these logs help the user make a decision about whether to promote the |
| 370 | +experiment to live. |
| 371 | + |
| 372 | +## Changelog |
| 373 | + |
| 374 | +- **2023-03-30:** Initial AIP written. |
| 375 | + |
| 376 | +[aip-131]: https://aip.dev/131 |
| 377 | +[aip-132]: https://aip.dev/132 |
| 378 | +[aip-133-long-running]: https://aip.dev/133#long-running-create |
| 379 | +[aip-134-long-running]: https://aip.dev/134#long-running-update |
| 380 | +[aip-135-long-running]: https://aip.dev/135#long-running-delete |
| 381 | +[aip-151]: https://google.aip.dev/151 |