Commit 0bfd687

feat(AIP-236): Creating new AIP-236
Add a new AIP around policy preview. See #1030
1 parent 234a2ad commit 0bfd687

1 file changed

Lines changed: 381 additions & 0 deletions

aip/general/0236.md

---
id: 236
state: approved
created: 2023-03-30
updated: 2023-03-30
placement:
  category: resource-design
  order: 240
---

# Policy preview

A policy is a resource that provides rules that admit or deny access to other
resources. Generally, evaluating a policy yields one of a specific set of
outcomes.

Changes to policies without proper validation may have unintended consequences
that can severely impact a customer’s overall infrastructure setup. To update
policies safely, it is beneficial to test these changes through policy rollout
APIs.

Preview is a rollout safety mechanism for policy resources that lets the
customer validate the effect of proposed changes against production traffic
before the changes go live. The result of evaluating the policy against traffic
is logged, giving the customer the data needed to verify the correctness of the
change.

Firewall policies are a good example of a policy suited to previewing. A new
configuration can be evaluated against traffic to observe which IPs would be
allowed or denied. This gives the customer the data to guide a decision on
whether to promote the proposed changes to live.

The expected flow for previewing a policy is as follows:

1. The user creates an experiment containing a new policy configuration
   intended to replace the live policy.
2. The user calls the "preview" method to start generating logs that compare
   the live and experiment policy evaluations against live traffic.
3. The user inspects the logs to determine whether the experiment has the
   intended result.
4. The user calls the "commit" method to promote the experiment to live.

## Guidance

### Non-goals

This AIP covers a safety mechanism for policy rollouts only. Safe rollouts for
non-policy resources are out of scope.

### Experiments

A new configuration of a policy that is to be previewed is stored in a nested
collection under the policy. Resources in this nested collection are known as
experiments.

A hypothetical policy resource, `Policy`, is used throughout this AIP. It has
the following resource name pattern:

`projects/{project}/locations/{location}/policies/{policy}`

Experimental versions of the resource, used for previewing or other safe
rollout practices, are represented as a nested collection under `Policy` using
a new resource type. The resource type **must** follow the naming convention
*RegularResourceType*`Experiment`.

The experiment collection uses the following pattern:

`projects/{project}/locations/{location}/policies/{policy}/experiments/{experiment}`
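
One way to declare these two resource name patterns is with the standard
`google.api.resource` annotation, sketched below. The service name
`example.googleapis.com` is hypothetical, only the `name` field of each message
is shown, and the import of `google/api/resource.proto` is omitted, as in the
other examples in this AIP.

```proto
// The live policy. Other fields omitted.
message Policy {
  option (google.api.resource) = {
    type: "example.googleapis.com/Policy"
    pattern: "projects/{project}/locations/{location}/policies/{policy}"
  };

  // The resource name of the Policy.
  string name = 1;
}

// The experiment type: RegularResourceType + "Experiment", in a collection
// nested under the live Policy. Other fields omitted.
message PolicyExperiment {
  option (google.api.resource) = {
    type: "example.googleapis.com/PolicyExperiment"
    pattern: "projects/{project}/locations/{location}/policies/{policy}/experiments/{experiment}"
  };

  // The resource name of the PolicyExperiment.
  string name = 1;
}
```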

A proto used to represent an experiment **must** contain the following:

1. The required top-level fields for a resource, such as `name` and `etag`.
2. The policy message being tested.
3. The field `preview_metadata`, which contains metadata specific to previewing
   the experiment of a specific resource type.

```proto
message PolicyExperiment {
  // The resource name of the PolicyExperiment.
  string name = 1;

  // The policy experiment. This Policy will be used to preview the effects of
  // the change but will not affect live traffic.
  Policy policy = 2;

  // The metadata associated with this policy experiment.
  PolicyPreviewMetadata preview_metadata = 3
      [(google.api.field_behavior) = OUTPUT_ONLY];

  // The checksum of the PolicyExperiment, used for optimistic concurrency
  // control (for example, by the "commit" method).
  string etag = 4;
}
```

- The experiment proto **must** have a top-level field with the same type as the
  live policy.
  - It **must** be named after the live resource type, in snake_case. For
    example, if the experiment is for `FirewallPolicy`, then this field
    **must** be named `firewall_policy`.
- The name inside the embedded `policy` message **must** be the name of the
  live policy.
- When the user is ready to promote an experiment, they **must** copy the
  `policy` message into the live policy and delete the experiment. This can be
  done manually or via a "commit" custom method.
- A product **may** support previewing multiple experiments concurrently for a
  single live policy.
- Each experiment **must** generate logs in which each entry is preceded by
  `log_prefix`, so that the user can compare the results of the experiment with
  the behavior of the live policy.
- The number of experimental configurations for a given live policy **may** be
  capped, and the cap **must** be documented.
- Cascading deletes **must** occur: if the live policy is deleted, all
  experiments **must** also be deleted.
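
Applied to the firewall example from the introduction, the rules above might
yield the following messages. `FirewallPolicy` and its fields are hypothetical
(rule fields are omitted), `FirewallPolicyPreviewMetadata` is left empty here
although it would carry the fields required in the Metadata section, and
imports are omitted.

```proto
// Hypothetical live policy. Rule fields omitted.
message FirewallPolicy {
  string name = 1;
  string etag = 2;
}

// Named RegularResourceType + "PreviewMetadata". Fields elided here.
message FirewallPolicyPreviewMetadata {}

// Named RegularResourceType + "Experiment".
message FirewallPolicyExperiment {
  // The resource name of the experiment.
  string name = 1;

  // Checksum of the experiment, e.g. checked by "commit".
  string etag = 2;

  // Same type as the live policy and named after it in snake_case.
  // firewall_policy.name must equal the live FirewallPolicy's name.
  FirewallPolicy firewall_policy = 3;

  // Output only: absent until the "preview" method is first called.
  FirewallPolicyPreviewMetadata preview_metadata = 4
      [(google.api.field_behavior) = OUTPUT_ONLY];
}
```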

### Metadata

`preview_metadata` tracks all metadata about previewing the experiment. Its
message type **must** follow the naming convention
*RegularResourceType*`PreviewMetadata`, so that a distinct message can be
defined for each resource type with experiments in the same service.

```proto
message PolicyPreviewMetadata {
  // The state of previewing the experiment. The possible values are ACTIVE
  // and INACTIVE. ACTIVE indicates that results are being logged upstream.
  string state = 1 [(google.api.field_behavior) = OUTPUT_ONLY];

  // An identifying string common to all logs generated when previewing the
  // experiment. Searching all logs for this string will isolate the results.
  string log_prefix = 2 [(google.api.field_behavior) = OUTPUT_ONLY];

  // The most recent time at which this experiment started previewing.
  google.protobuf.Timestamp start_time = 3
      [(google.api.field_behavior) = OUTPUT_ONLY];

  // The most recent time at which this experiment stopped previewing.
  google.protobuf.Timestamp stop_time = 4
      [(google.api.field_behavior) = OUTPUT_ONLY];
}
```

- `PolicyPreviewMetadata` **must** have the fields defined in the proto above.
  - `state` **must** be `ACTIVE` or `INACTIVE`.
  - It **may** have additional fields if the service or resource requires
    them.
- When an experiment is first created, `preview_metadata` **must** be absent.
  - It is present on the experiment once the "preview" method is used.
- All `preview_metadata` fields **must** be output only.
- `state` changes between `ACTIVE` and `INACTIVE` when the experiment is
  started or stopped, which can only be done by the "preview" and "stop"
  custom methods.
- The first time the "preview" custom method is used, the system **must**
  create `preview_metadata` and do the following:
  - It **must** set `state` to `ACTIVE`.
  - It **must** populate `start_time` with the current time.
    - `start_time` **must** be updated every time `state` changes to
      `ACTIVE`.
  - It **must** set a system-generated `log_prefix` string, which is a
    predefined constant hard-coded by the system developers.
    - The same value is used for previewing all experiments of a given
      resource type. For example, "FirewallPolicyPreviewLog" for
      FirewallPolicy.
- When the "stop" custom method is used, the system **must** do the following:
  - It **must** set `state` to `INACTIVE`.
  - It **must** populate `stop_time` with the current time.

### Methods

#### create

- The resource **must** be created using long-running
  [Create][aip-133-long-running], and
  `google.longrunning.operation_info.response_type` **must** be
  `PolicyExperiment`.
- Creating a new experiment to preview **must** support the following use
  cases:
  - Preview a new policy.
  - Preview an update to an already live policy.
  - Preview a deletion of a current policy.
- For the update and delete use cases, the `policy` field in the experiment
  **must** have the full payload of the live policy copied into it, including
  the name.
  - To preview an update, the user **must** set the rules to the new intended
    state.
  - To preview a delete, the user **must** set the rules to represent a no-op.
- To preview a new policy, the following applies:
  - If the system does not support a nested collection without a live policy,
    the user **must** create a live policy and set its rules to represent a
    no-op. For example, the rules of a no-op policy **may** be empty.
  - An experiment is then created as a child of the no-op policy.
- If the system supports previewing multiple experiments for a live policy,
  calling "create" more than once **must** create multiple experiments.
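
A minimal sketch of the create RPC, assuming the long-running Create
conventions from AIP-133 and a URL built from the experiment resource name
pattern defined above. The request field names (`parent`, `policy_experiment`,
`policy_experiment_id`) follow the usual standard-method naming but are not
prescribed by this AIP, `CreatePolicyExperimentMetadata` and the service name
`example.googleapis.com` are hypothetical, and imports are omitted.

```proto
// Creates a PolicyExperiment under a live Policy.
rpc CreatePolicyExperiment(CreatePolicyExperimentRequest)
    returns (google.longrunning.Operation) {
  option (google.api.http) = {
    post: "/v1/{parent=projects/*/locations/*/policies/*}/experiments"
    body: "policy_experiment"
  };
  option (google.api.method_signature) =
      "parent,policy_experiment,policy_experiment_id";
  option (google.longrunning.operation_info) = {
    response_type: "PolicyExperiment"
    metadata_type: "CreatePolicyExperimentMetadata"
  };
}

message CreatePolicyExperimentRequest {
  // The live Policy under which the experiment is created.
  string parent = 1 [
    (google.api.field_behavior) = REQUIRED,
    (google.api.resource_reference).type = "example.googleapis.com/Policy"
  ];

  // The experiment to create. Its policy.name must be the live policy's name.
  PolicyExperiment policy_experiment = 2
      [(google.api.field_behavior) = REQUIRED];

  // The ID to use as the final component of the experiment's resource name.
  string policy_experiment_id = 3;
}
```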

#### update

- The resource **must** be updated using long-running
  [Update][aip-134-long-running], and
  `google.longrunning.operation_info.response_type` **must** be
  `PolicyExperiment`.
- The name inside `policy` **must not** change, but the other fields can change
  in order to alter the experiment being previewed. The name is fixed because
  this `policy` is intended to replace the live policy, and the name of the
  live policy **must not** change.
- If `state` is `ACTIVE` at the time of an update, the system **must** set it
  to `INACTIVE`.
  - This lets the user easily distinguish between different versions of the
    experiment being previewed.
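
A sketch of the update RPC under the long-running Update conventions from
AIP-134. The request shape with an `update_mask` is the usual standard-method
form rather than something this AIP prescribes, `UpdatePolicyExperimentMetadata`
is hypothetical, and imports are omitted.

```proto
// Updates a PolicyExperiment. If the experiment is being previewed, the
// system sets preview_metadata.state to INACTIVE.
rpc UpdatePolicyExperiment(UpdatePolicyExperimentRequest)
    returns (google.longrunning.Operation) {
  option (google.api.http) = {
    patch: "/v1/{policy_experiment.name=projects/*/locations/*/policies/*/experiments/*}"
    body: "policy_experiment"
  };
  option (google.api.method_signature) = "policy_experiment,update_mask";
  option (google.longrunning.operation_info) = {
    response_type: "PolicyExperiment"
    metadata_type: "UpdatePolicyExperimentMetadata"
  };
}

message UpdatePolicyExperimentRequest {
  // The experiment to update. Its policy.name must remain the live
  // policy's name.
  PolicyExperiment policy_experiment = 1
      [(google.api.field_behavior) = REQUIRED];

  // The fields to update.
  google.protobuf.FieldMask update_mask = 2;
}
```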

#### get

- The standard method, [Get][aip-131], **must** be included for
  `PolicyExperiment` resource types.
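
A sketch of the standard Get method for experiments, assuming the AIP-131
shape and the hypothetical `example.googleapis.com` service name; imports are
omitted.

```proto
// Gets a PolicyExperiment, including its preview_metadata.
rpc GetPolicyExperiment(GetPolicyExperimentRequest) returns (PolicyExperiment) {
  option (google.api.http) = {
    get: "/v1/{name=projects/*/locations/*/policies/*/experiments/*}"
  };
  option (google.api.method_signature) = "name";
}

message GetPolicyExperimentRequest {
  // The name of the PolicyExperiment.
  string name = 1 [
    (google.api.field_behavior) = REQUIRED,
    (google.api.resource_reference).type =
        "example.googleapis.com/PolicyExperiment"
  ];
}
```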

#### list

- The standard method, [List][aip-132], **must** be included for
  `PolicyExperiment` resource types.
- Filtering on `PolicyPreviewMetadata` indicates which experiments are actively
  being previewed.
  - For example, the following filter string returns a List response
    containing only experiments that are being previewed:
    `preview_metadata.state = ACTIVE`.
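
A sketch of the standard List method, assuming the AIP-132 request and
response shape. The `filter` field is where a string such as
`preview_metadata.state = ACTIVE` would be passed; field numbers and the
response field name are illustrative, and imports are omitted.

```proto
// Lists the PolicyExperiments under a live Policy.
rpc ListPolicyExperiments(ListPolicyExperimentsRequest)
    returns (ListPolicyExperimentsResponse) {
  option (google.api.http) = {
    get: "/v1/{parent=projects/*/locations/*/policies/*}/experiments"
  };
  option (google.api.method_signature) = "parent";
}

message ListPolicyExperimentsRequest {
  // The live Policy whose experiments are listed.
  string parent = 1 [(google.api.field_behavior) = REQUIRED];

  int32 page_size = 2;
  string page_token = 3;

  // For example: preview_metadata.state = ACTIVE
  string filter = 4;
}

message ListPolicyExperimentsResponse {
  repeated PolicyExperiment policy_experiments = 1;
  string next_page_token = 2;
}
```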

#### delete

- The resource **must** be deleted using long-running
  [Delete][aip-135-long-running], and
  `google.longrunning.operation_info.response_type` **must** be
  `PolicyExperiment`.
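
A sketch of the delete RPC with `response_type` set to `PolicyExperiment`, as
the guidance above requires. `DeletePolicyExperimentMetadata` is a hypothetical
metadata type, and imports are omitted.

```proto
// Deletes a PolicyExperiment. Previewing of the experiment stops with it.
rpc DeletePolicyExperiment(DeletePolicyExperimentRequest)
    returns (google.longrunning.Operation) {
  option (google.api.http) = {
    delete: "/v1/{name=projects/*/locations/*/policies/*/experiments/*}"
  };
  option (google.api.method_signature) = "name";
  option (google.longrunning.operation_info) = {
    response_type: "PolicyExperiment"
    metadata_type: "DeletePolicyExperimentMetadata"
  };
}

message DeletePolicyExperimentRequest {
  // The name of the PolicyExperiment.
  string name = 1 [(google.api.field_behavior) = REQUIRED];
}
```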

#### preview

```proto
// Starts previewing a PolicyExperiment. This triggers the system to start
// generating logs to evaluate the PolicyExperiment.
rpc PreviewPolicyExperiment(PreviewPolicyExperimentRequest)
    returns (google.longrunning.Operation) {
  option (google.api.http) = {
    post: "/v1/{name=projects/*/locations/*/policies/*/experiments/*}:preview"
    body: "*"
  };
  option (google.longrunning.operation_info) = {
    response_type: "PolicyExperiment"
    metadata_type: "PreviewPolicyExperimentMetadata"
  };
}

// The request message for the preview custom method.
message PreviewPolicyExperimentRequest {
  // The name of the PolicyExperiment.
  string name = 1;
}
```

- This custom method is required.
- `google.longrunning.operation_info.metadata_type` **must** follow the
  guidance on [Long-running operations][aip-151].
- This method **must** trigger the system to start generating logs to preview
  the experiment.
- Whenever the method is called successfully, the system **must** set the
  following values in `PolicyPreviewMetadata`:
  - `log_prefix` to the predefined constant.
  - `start_time` to the current time.
  - `state` to `ACTIVE`.
- If the method is called on an experiment whose rules represent a no-op, the
  system **must** preview the deletion of the live policy.

#### stop

```proto
// Stops previewing a PolicyExperiment. This triggers the system to stop
// generating logs to evaluate the PolicyExperiment.
rpc StopPolicyExperiment(StopPolicyExperimentRequest)
    returns (google.longrunning.Operation) {
  option (google.api.http) = {
    post: "/v1/{name=projects/*/locations/*/policies/*/experiments/*}:stop"
    body: "*"
  };
  option (google.longrunning.operation_info) = {
    response_type: "PolicyExperiment"
    metadata_type: "StopPolicyExperimentMetadata"
  };
}

// The request message for the stop custom method.
message StopPolicyExperimentRequest {
  // The name of the PolicyExperiment.
  string name = 1;
}
```

- This custom method is required.
- `google.longrunning.operation_info.metadata_type` **must** follow the
  guidance on [Long-running operations][aip-151].
- This method **must** trigger the system to stop generating logs to preview
  the experiment.
- Whenever the method is called successfully, the system **must** set the
  following values in `PolicyPreviewMetadata`:
  - `stop_time` to the current time.
  - `state` to `INACTIVE`.

#### commit

The resource **may** expose a custom method called "commit" to promote an
experiment. The system copies `policy` from the experiment into the live policy
and then deletes the experiment.

Declarative clients **may** manually copy fields from an experiment into the
live policy and then delete the experiment, rather than calling "commit", if
preferable.

```proto
// Commits a PolicyExperiment. This copies the PolicyExperiment's policy message
// to the live policy, then deletes the PolicyExperiment.
rpc CommitPolicyExperiment(CommitPolicyExperimentRequest)
    returns (google.longrunning.Operation) {
  option (google.api.http) = {
    post: "/v1/{name=projects/*/locations/*/policies/*/experiments/*}:commit"
    body: "*"
  };
  option (google.longrunning.operation_info) = {
    response_type: "google.protobuf.Empty"
    metadata_type: "CommitPolicyExperimentMetadata"
  };
}

// The request message for the commit custom method.
message CommitPolicyExperimentRequest {
  // The name of the PolicyExperiment to commit.
  string name = 1;

  // The etag of the PolicyExperiment. Must match the experiment's current
  // etag for the commit to succeed.
  string etag = 2;

  // Optional. The etag of the live policy. If set, the commit succeeds only
  // if the live policy is at this version.
  string parent_etag = 3;
}
```

- `google.longrunning.operation_info.metadata_type` **must** follow the
  guidance on [Long-running operations][aip-151].
- The method **must** atomically copy `policy` from the experiment into the live
  policy and then delete the experiment.
- If "commit" fails for an experiment, previewing it **must not** stop, and the
  live policy **must not** be updated.
- The method can be called on an experiment in any state.
- The `etag` **must** match that of the experiment for the commit to succeed.
  This prevents the user from committing an unintended version of the
  experiment.
  - If no `etag` is provided, the request **must** fail, to prevent the user
    from unintentionally committing a different version of the experiment than
    intended.
- A `parent_etag` **may** be provided to guarantee that the experiment
  overwrites a specific version of the live policy.
- The method is not idempotent: calling it twice on the same experiment
  **must** return a `404 NOT_FOUND` on the second call, because the experiment
  is deleted as part of the first call.

### Changes to live policy API methods

#### delete

- Deleting the live policy **must** delete all of its experiments.
- To keep the experiments while negating the effect of the live policy, the
  live policy **must** be changed to a no-op policy instead of being deleted
  with this method.

### Logging

Logging is crucial for the user to evaluate whether an experiment should be
promoted to live.

Logs **must** contain the results of the evaluated experiment and the `etag`
of that experiment alongside the `etag` of the live policy, and **must** be
preceded by the value of `log_prefix`.

- The `etag` fields help the user identify which configurations of the live
  policy and the experiment were evaluated in the log.
- `log_prefix` helps the user separate logs generated specifically for
  previewing the experiment from logs generated for other purposes.

Overall, these logs help the user decide whether to promote the experiment to
live.
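
As a hypothetical illustration only, a service that writes structured logs
could carry the required information in a payload like the one below. The
message and field names (`PolicyPreviewLogEntry`, `experiment_etag`,
`live_policy_etag`, `live_outcome`, `experiment_outcome`) are not defined by
this AIP; each service chooses its own log format.

```proto
// Hypothetical structured payload for a single preview log entry.
message PolicyPreviewLogEntry {
  // The value of preview_metadata.log_prefix, e.g. "FirewallPolicyPreviewLog".
  // A service writing plain-text logs would emit this at the start of each
  // entry instead.
  string log_prefix = 1;

  // The resource name of the experiment that was evaluated.
  string experiment = 2;

  // The etag of the experiment configuration that was evaluated.
  string experiment_etag = 3;

  // The etag of the live policy configuration it was compared against.
  string live_policy_etag = 4;

  // The outcome under the live policy, e.g. "ALLOW" or "DENY".
  string live_outcome = 5;

  // The outcome the experiment would have produced for the same traffic.
  string experiment_outcome = 6;
}
```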

## Changelog

- **2023-03-30:** Initial AIP written.

[aip-131]: https://aip.dev/131
[aip-132]: https://aip.dev/132
[aip-133-long-running]: https://aip.dev/133#long-running-create
[aip-134-long-running]: https://aip.dev/134#long-running-update
[aip-135-long-running]: https://aip.dev/135#long-running-delete
[aip-151]: https://google.aip.dev/151
