Skip to content
Merged
Show file tree
Hide file tree
Changes from all commits
Commits
File filter

Filter by extension

Filter by extension

Conversations
Failed to load comments.
Loading
Jump to
Jump to file
Failed to load files.
Loading
Diff view
Diff view
132 changes: 122 additions & 10 deletions aip/general/0233.md
Original file line number Diff line number Diff line change
Expand Up @@ -15,7 +15,9 @@ transaction. A batch create method provides this functionality.

## Guidance

APIs **may** support Batch Create using the following pattern:
APIs **may** support Batch Create using the following two patterns:

Returning the response synchronously

```proto
rpc BatchCreateBooks(BatchCreateBooksRequest) returns (BatchCreateBooksResponse) {
Expand All @@ -26,25 +28,51 @@ rpc BatchCreateBooks(BatchCreateBooksRequest) returns (BatchCreateBooksResponse)
}
```

Returning an Operation which resolves to the response asynchronously

```proto
rpc BatchCreateBooks(BatchCreateBooksRequest) returns (google.longrunning.Operation) {
option (google.api.http) = {
post: "/v1/{parent=publishers/*}/books:batchCreate"
body: "*"
};
option (google.longrunning.operation_info) = {
response_type: "BatchCreateBooksResponse"
metadata_type: "BatchCreateBooksOperationMetadata"
};
}
```

- The RPC's name **must** begin with `BatchCreate`. The remainder of the RPC
name **should** be the plural form of the resource being created.
- The request and response messages **must** match the RPC name, with
`Request` and `Response` suffixes.
- However, in the event that the request may take a significant amount of
time, the response message **must** be a `google.longrunning.Operation`
which ultimately resolves to the `Response` type.
- If the batch method returns an `google.longrunning.Operation`, both the
`response_type` and `metadata_type` fields **must** be specified.
- The HTTP verb **must** be `POST`.
- The HTTP URI **must** end with `:batchCreate`.
- The URI path **should** represent the collection for the resource, matching
the collection used for simple CRUD operations. If the operation spans
parents, a dash (`-`) **may** be accepted as a wildcard.
- The body clause in the `google.api.http` annotation **should** be `"*"`.
- The operation **should** be atomic: it **should** fail for all resources or
succeed for all resources (no partial success).
- If the operation covers multiple locations and at least one location is
down, the operation **must** fail.
- In cases where supporting partial responses cannot be avoided, the design
should follow the guidelines of [AIP-193](https://aip.dev/193).

### Atomic vs. Partial Success

- The batch create method **may** support atomic (all resources created or none
are) or partial success behavior. To make a choice, consider the following
factors:
- **Complexity of Ensuring Atomicity:** Operations that are simple
passthrough database transactions **should** use an atomic operation,
while operations that manage complex resources **should** use partial
success operations.
- **End-User Experience:** Consider the perspective of the API consumer.
Would atomic behavior be preferable for the given use case, even if it
means that a large batch could fail due to issues with a single or a few
entries?
- Synchronous batch create **must** be atomic.
- Asynchronous batch create **may** support atomic or partial success.
- If supporting partial success, see
[Operation metadata message](#operation-metadata-message) requirements.

### Request message

Expand Down Expand Up @@ -111,11 +139,95 @@ message BatchCreateBooksResponse {
- The response message **must** include one repeated field corresponding to the
resources that were created.

### Operation metadata message

- The `metadata_type` message **must** either match the RPC name with
`OperationMetadata` suffix, or be named with `Batch` prefix and
`OperationMetadata` suffix if the type is shared by multiple Batch methods.
- If batch create method supports partial success, the metadata message **must**
include a `map<int32, google.rpc.Status> failed_requests` field to communicate
the partial failures.
- The key in this map is the index of the request in the `requests` field in
the batch request.
- The value in each map entry **must** mirror the error(s) that would normally
be returned by the singular Standard Create method.
- If a failed request can eventually succeed due to server side retries, such
transient errors **must not** be communicated using `failed_requests`.
- When all requests in the batch fail, `Operation.error` **must** be set with
`code = google.rpc.Code.Aborted` and `message = "None of the requests
succeeded, refer to the BatchCreateBooksOperationMetadata.failed_requests
for individual error details"`
- The metadata message **may** include other fields to communicate the
operation progress.

### Adopting Partial Success

In order for an existing Batch API to adopt the partial success pattern, the API
must do the following:

- The default behavior must be retained to avoid incompatible behavioral
changes.
- If the API returns an Operation:
- The request message **must** have a `bool return_partial_success` field.
- The Operation `metadata_type` **must** include a
`map<int32, google.rpc.Status> failed_requests` field.
- When the `bool return_partial_success` field is set to true in a request,
the API should allow partial success behavior, otherwise it should continue
with atomic behavior as default.
- If the API returns a direct response synchronously:
- Since the existing clients will treat a success response as an atomic
operation, the existing version of the API **must not** adopt the partial
success pattern.
- A new version **must** be created instead that returns an Operation and
follows the partial success pattern described in this AIP.

## Rationale

### Restricting synchronous batch methods to be atomic

The restriction that synchronous batch methods must be atomic is a result of
the following considerations.

The previous iteration of this AIP recommended batch methods must be atomic.
There is no clear way to convey partial failure in a sync response status code
because an OK implies it all worked. Therefore, adding a new field to the
response to indicate partial failure would be a breaking change because the
existing clients would interpret an OK response as all resources created.

On the other hand, as described in [AIP-193](https://aip.dev/193), Operations
are more capable of presenting partial states. The response status code for an
Operation does not convey anything about the outcome of the underlying operation
and a client has to check the response body to determine if the operation was
successful.

### Communicating partial failures

The AIP recommends using a `map<int32, google.rpc.Status> failed_requests` field
to communicate partial failures, where the key is the index of the failed
request in the original batch request. The other options considered were:

- A `repeated google.rpc.Status` field. This was rejected because it is not
clear which entry corresponds to which request.
- A `map<string, google.rpc.Status>` field, where the key is the request id of
the failed request. This was rejected because:
- Client will need to maintain a map of request_id -> request in order to use
the partial success response.
- Populating a request id for the purpose of communicating errors could
conflict with [AIP-155](https://aip.dev/155) if the service can not
guarantee idempotency for an individual request across multiple batch
requests.
- A `repeated FailedRequest` field, where FailedRequest contains the individual
create request and the `google.rpc.Status`. This was rejected because echoing
the request payload back in response is discouraged due to additional
challenges around user data sensitivity.

[aip-122-parent]: ./0122.md#fields-representing-a-resources-parent
[request-message]: ./0133.md#request-message

## Changelog

- **2025-03-06**: Added detailed guidance for partial success behavior, and
decision framework for choosing between atomic and partial success
- **2023-04-18**: Changed the recommendation to allow returning partial
successes.
- **2022-06-02**: Changed suffix descriptions to eliminate superfluous "-".
Expand Down
130 changes: 122 additions & 8 deletions aip/general/0234.md
Original file line number Diff line number Diff line change
Expand Up @@ -15,7 +15,9 @@ transaction. A batch update method provides this functionality.

## Guidance

APIs **may** support Batch Update using the following pattern:
APIs **may** support Batch Update using the following two patterns:

Returning the response synchronously

```proto
rpc BatchUpdateBooks(BatchUpdateBooksRequest) returns (BatchUpdateBooksResponse) {
Expand All @@ -26,23 +28,51 @@ rpc BatchUpdateBooks(BatchUpdateBooksRequest) returns (BatchUpdateBooksResponse)
}
```

Returning an Operation which resolves to the response asynchronously

```proto
rpc BatchUpdateBooks(BatchUpdateBooksRequest) returns (google.longrunning.Operation) {
option (google.api.http) = {
post: "/v1/{parent=publishers/*}/books:batchUpdate"
body: "*"
};
option (google.longrunning.operation_info) = {
response_type: "BatchUpdateBooksResponse"
metadata_type: "BatchUpdateBooksOperationMetadata"
};
}
```

- The RPC's name **must** begin with `BatchUpdate`. The remainder of the RPC
name **should** be the plural form of the resource being updated.
- The request and response messages **must** match the RPC name, with
`Request` and `Response` suffixes.
- However, in the event that the request may take a significant amount of
time, the response message **must** be a `google.longrunning.Operation`
which ultimately resolves to the `Response` type.
- If the batch method returns an `google.longrunning.Operation`, both the
`response_type` and `metadata_type` fields **must** be specified.
- The HTTP verb **must** be `POST`.
- The HTTP URI **must** end with `:batchUpdate`.
- The URI path **should** represent the collection for the resource, matching
the collection used for simple CRUD operations. If the operation spans
parents, a dash (`-`) **may** be accepted as a wildcard.
- The body clause in the `google.api.http` annotation **should** be `"*"`.
- The operation **must** be atomic: it **must** fail for all resources or
succeed for all resources (no partial success).
- If the operation covers multiple locations and at least one location is
down, the operation **must** fail.

### Atomic vs. Partial Success

- The batch update method **may** support atomic (all resources updated or none
are) or partial success behavior. To make a choice, consider the following
factors:
- **Complexity of Ensuring Atomicity:** Operations that are simple
passthrough database transactions **should** use an atomic operation,
while operations that manage complex resources **should** use partial
success operations.
- **End-User Experience:** Consider the perspective of the API consumer.
Would atomic behavior be preferable for the given use case, even if it
means that a large batch could fail due to issues with a single or a few
entries?
- Synchronous batch update **must** be atomic.
- Asynchronous batch update **may** support atomic or partial success.
- If supporting partial success, see
[Operation metadata message](#operation-metadata-message) requirements.

### Request message

Expand Down Expand Up @@ -107,11 +137,95 @@ message BatchUpdateBooksResponse {
- The response message **must** include one repeated field corresponding to the
resources that were updated.

### Operation metadata message

- The `metadata_type` message **must** either match the RPC name with
`OperationMetadata` suffix, or be named with `Batch` prefix and
`OperationMetadata` suffix if the type is shared by multiple Batch methods.
- If batch update method supports partial success, the metadata message **must**
include a `map<int32, google.rpc.Status> failed_requests` field to communicate
the partial failures.
- The key in this map is the index of the request in the `requests` field
in the batch request.
- The value in each map entry **must** mirror the error(s) that would normally
be returned by the singular Standard Update method.
- If a failed request can eventually succeed due to server side retries, such
transient errors **must not** be communicated using `failed_requests`.
- When all requests in the batch fail, `Operation.error` **must** be set with
`code = google.rpc.Code.Aborted` and `message = "None of the requests
succeeded, refer to the BatchUpdateBooksOperationMetadata.failed_requests
for individual error details"`
- The metadata message **may** include other fields to communicate the
operation progress.

### Adopting Partial Success

In order for an existing Batch API to adopt the partial success pattern, the API
must do the following:

- The default behavior must be retained to avoid incompatible behavioral
changes.
- If the API returns an Operation:
- The request message **must** have a `bool return_partial_success` field.
- The Operation `metadata_type` **must** include a
`map<int32, google.rpc.Status> failed_requests` field.
- When the `bool return_partial_success` field is set to true in a request,
the API should allow partial success behavior, otherwise it should continue
with atomic behavior as default.
- If the API returns a direct response synchronously:
- Since the existing clients will treat a success response as an atomic
operation, the existing version of the API **must not** adopt the partial
success pattern.
- A new version **must** be created instead that returns an Operation and
follows the partial success pattern described in this AIP.

## Rationale

### Restricting synchronous batch methods to be atomic

The restriction that synchronous batch methods must be atomic is a result of
the following considerations.

The previous iteration of this AIP recommended batch methods must be atomic.
There is no clear way to convey partial failure in a sync response status code
because an OK implies it all worked. Therefore, adding a new field to the
response to indicate partial failure would be a breaking change because the
existing clients would interpret an OK response as all resources updated.

On the other hand, as described in [AIP-193](https://aip.dev/193), Operations
are more capable of presenting partial states. The response status code for an
Operation does not convey anything about the outcome of the underlying operation
and a client has to check the response body to determine if the operation was
successful.

### Communicating partial failures

The AIP recommends using a `map<int32, google.rpc.Status> failed_requests` field
to communicate partial failures, where the key is the index of the failed
request in the original batch request. The other options considered were:

- A `repeated google.rpc.Status` field. This was rejected because it is not
clear which entry corresponds to which request.
- A `map<string, google.rpc.Status>` field, where the key is the request id of
the failed request. This was rejected because:
- Client will need to maintain a map of request_id -> request in order to use
the partial success response.
- Populating a request id for the purpose of communicating errors could
conflict with [AIP-155](https://aip.dev/155) if the service can not
guarantee idempotency for an individual request across multiple batch
requests.
- A `repeated FailedRequest` field, where FailedRequest contains the individual
update request and the `google.rpc.Status`. This was rejected because echoing
the request payload back in response is discouraged due to additional
challenges around user data sensitivity.

[aip-122-parent]: ./0122.md#fields-representing-a-resources-parent
[request-message]: ./0134.md#request-message

## Changelog

- **2025-03-06**: Changed recommendation to allow partial success, along with
detailed guidance
- **2022-06-02:** Changed suffix descriptions to eliminate superfluous "-".
- **2020-09-16**: Suggested annotating `parent` and `requests` fields.
- **2020-08-27**: Removed parent recommendations for top-level resources.
Expand Down
Loading