Skip to content

Fix static IP subnet mismatch when adding network to nic_group#2716

Open
vlast3k wants to merge 2 commits intocloudfoundry:mainfrom
vlast3k:fix/static-ip-subnet-match-in-nic-group
Open

Fix static IP subnet mismatch when adding network to nic_group#2716
vlast3k wants to merge 2 commits intocloudfoundry:mainfrom
vlast3k:fix/static-ip-subnet-match-in-nic-group

Conversation

@vlast3k
Copy link
Copy Markdown

@vlast3k vlast3k commented Apr 20, 2026

Summary

  • When adding a new network to existing instances in the same nic_group, the placement planner picked the first unclaimed static IP in the AZ without checking which cloud subnet it belonged to. This caused CPI errors (Networks in nic_group have different subnet_ids) when an AZ has multiple cloud subnets with interleaved static IP ranges.
  • Store cloud_properties from the containing subnet on each StaticIpToAzs entry. When selecting an IP for a network that shares a nic_group with an already-assigned sibling, constrain the lookup to IPs on a subnet with matching cloud_properties.
  • Fail with a clear error when no matching IP is available, instead of silently assigning a mismatched one.

What changed

networks_to_static_ips.rb

  • StaticIpToAzs struct now includes cloud_properties
  • create() records the subnet's cloud_properties for each IP
  • Added find_by_network_az_and_cloud_properties and next_ip_for_network_with_cloud_properties

static_ips_availability_zone_picker.rb

  • create_network_plan_with_az looks up the cloud_properties of an already-assigned sibling in the same nic_group, then constrains IP selection to matching subnets
  • Added nic_group_sibling_cloud_properties helper

Tests

  • networks_to_static_ips_spec.rb: cloud_properties storage and filtered lookup
  • static_ips_availability_zone_picker_spec.rb: multi-subnet nic_group pairing, mismatch error, no-nic_group unchanged behavior

Context

On AWS, all networks in a nic_group must resolve to the same ENI subnet. Landscapes with multiple /24 subnets per AZ and interleaved static IP ranges hit this when enabling IPv6 (adding a second network to the nic_group). BOSH assigned IPv6 IPs on a different subnet than the existing IPv4 IPs because it only filtered by AZ, not by cloud subnet.

Verified against a production cloud config: paired IPv4/IPv6 subnets sharing a nic_group have identical cloud_properties (same AWS subnet ID, same security groups), because AWS subnets are dual-stack and a nic_group maps to a single ENI. Full cloud_properties hash equality is the correct comparison — narrowing to just the subnet key would be fragile and CPI-specific.

Test plan

  • Existing placement planner specs pass (no behavioral change when nic_group is not used)
  • New spec: two networks in nic_group, two subnets per AZ with different cloud_properties — IPs are paired on matching subnets
  • New spec: mismatched static IPs (no IP on sibling subnet) raises clear error
  • New spec: without nic_group, behavior unchanged
  • Integration test on a landscape with multi-subnet AZ + dual-stack nic_group

When adding a new network to existing instances in the same nic_group,
the placement planner picked the first unclaimed static IP in the AZ
without checking which cloud subnet it belonged to. This caused CPI
errors ("Networks in nic_group have different subnet_ids") when an AZ
has multiple cloud subnets with interleaved static IP ranges.

Store cloud_properties from the containing subnet on each StaticIpToAzs
entry. When selecting an IP for a network that shares a nic_group with
an already-assigned sibling, constrain the lookup to IPs on a subnet
with matching cloud_properties. Fail with a clear error when no
matching IP is available.

Fixes: BOSH-1922
@coderabbitai
Copy link
Copy Markdown

coderabbitai Bot commented Apr 20, 2026

No actionable comments were generated in the recent review. 🎉

ℹ️ Recent review info
⚙️ Run configuration

Configuration used: Organization UI

Review profile: ASSERTIVE

Plan: Pro

Run ID: 99ef08c5-99c0-4f7c-9fb2-886e44279d1b

📥 Commits

Reviewing files that changed from the base of the PR and between 2c66bf3 and 4bdaa97.

📒 Files selected for processing (1)
  • src/bosh-director/lib/bosh/director/deployment_plan/placement_planner/static_ips_availability_zone_picker.rb

Walkthrough

The PR extends static IP placement to track and match subnet cloud_properties. StaticIpToAzs now includes a cloud_properties field. NetworksToStaticIps.create stores subnet cloud_properties for each IP and adds selection queries: next_ip_for_network_with_cloud_properties and find_by_network_az_and_cloud_properties. StaticIpsAvailabilityZonePicker obtains sibling-subnet cloud_properties for nic-grouped networks and routes IP lookups through the cloud-property-aware queries, raising NetworkReservationError when no matching IP is found. Unit tests were added to verify cloud_properties storage and sibling-subnet matching and failure behavior.

Suggested reviewers

  • rkoster
  • s4heid
  • fmoehler
🚥 Pre-merge checks | ✅ 2 | ❌ 1

❌ Failed checks (1 warning)

Check name Status Explanation Resolution
Docstring Coverage ⚠️ Warning Docstring coverage is 0.00% which is insufficient. The required threshold is 80.00%. Write docstrings for the functions missing them to satisfy the coverage threshold.
✅ Passed checks (2 passed)
Check name Status Explanation
Title check ✅ Passed The title clearly and specifically identifies the main change: fixing static IP subnet mismatch when adding a network to a nic_group, which aligns with the core problem and solution described throughout the PR.
Description check ✅ Passed The PR description addresses all required sections: explains what changed (summary, code changes, tests), provides context (AWS nic_group requirements, multi-subnet scenario), includes test plan, and covers breaking changes. All critical information is present and well-structured.

✏️ Tip: You can configure your own custom pre-merge checks in the settings.

✨ Finishing Touches
🧪 Generate unit tests (beta)
  • Create PR with unit tests

Thanks for using CodeRabbit! It's free for OSS, and your support helps us grow. If you like it, consider giving us a shout-out.

❤️ Share

Comment @coderabbitai help to get the list of available commands and usage tips.

Copy link
Copy Markdown

@coderabbitai coderabbitai Bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 2

🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.

Inline comments:
In
`@src/bosh-director/lib/bosh/director/deployment_plan/placement_planner/static_ips_availability_zone_picker.rb`:
- Around line 288-300: The helper nic_group_sibling_cloud_properties may
accidentally match the current network as its own sibling; add a defensive early
check inside the loop to skip when sibling_network == network so it only
considers other networks, keeping the rest of the logic (matching nic_group and
checking plan.reservation.ip and subnet) unchanged; update the loop in
nic_group_sibling_cloud_properties to continue to next when sibling_network is
the same object as the provided network.
- Around line 118-129: The error message raised when static_ip_to_azs is nil
uses a nic_group-specific message even when sibling_cloud_props is nil; update
the branch inside the desired_instance.az.nil? block to raise a different
message depending on whether sibling_cloud_props is present: if
sibling_cloud_props is truthy, raise Bosh::Director::NetworkReservationError
with the subnet/nic_group message using network.nic_group (matching the existing
nic_group wording), otherwise raise the generic "Failed to distribute static
IPs..." style message used in the else-branch; adjust the code paths around
sibling_cloud_props,
`@networks_to_static_ips.next_ip_for_network_with_cloud_properties` and
`@networks_to_static_ips.next_ip_for_network` so the error raised mirrors the
else-branch behavior.
🪄 Autofix (Beta)

Fix all unresolved CodeRabbit comments on this PR:

  • Push a commit to this branch (recommended)
  • Create a new PR with the fixes

ℹ️ Review info
⚙️ Run configuration

Configuration used: Organization UI

Review profile: ASSERTIVE

Plan: Pro

Run ID: 0e44e9cf-74b2-4434-8c77-eb01b375d94e

📥 Commits

Reviewing files that changed from the base of the PR and between 0dbe149 and 2c66bf3.

📒 Files selected for processing (4)
  • src/bosh-director/lib/bosh/director/deployment_plan/placement_planner/networks_to_static_ips.rb
  • src/bosh-director/lib/bosh/director/deployment_plan/placement_planner/static_ips_availability_zone_picker.rb
  • src/bosh-director/spec/unit/bosh/director/deployment_plan/placement_planner/networks_to_static_ips_spec.rb
  • src/bosh-director/spec/unit/bosh/director/deployment_plan/placement_planner/static_ips_availability_zone_picker_spec.rb

@vlast3k
Copy link
Copy Markdown
Author

vlast3k commented Apr 20, 2026

Validation: cloud_properties hash equality is safe

A potential concern was raised during review: is full cloud_properties == cloud_properties hash equality too strict? Could paired IPv4/IPv6 subnets in the same nic_group have slightly different cloud_properties?

Verified against a production AWS cloud config. When two BOSH networks share a nic_group (e.g., IPv4-ext and IPv6-single-ext), their subnets in each AZ reference the same AWS subnet — because AWS subnets are dual-stack by default and a nic_group maps to a single ENI. The cloud_properties hashes are identical:

# IPv4 network, z1, subnet 1
cloud_properties:
  security_groups: [sg-aaa, sg-bbb, sg-ccc, sg-ddd]
  subnet: subnet-12345

# IPv6 network, z1, subnet 1
cloud_properties:
  security_groups: [sg-aaa, sg-bbb, sg-ccc, sg-ddd]
  subnet: subnet-12345

All 8 subnets across both networks were 1:1 matched — same subnet ID, same security_groups list. The only differences between the two BOSH subnets are in non-cloud_properties fields (range, gateway, dns, static).

Conclusion: Full hash equality is correct. Narrowing the comparison to just the subnet key would be fragile — it assumes the cloud_properties schema, which varies by CPI (AWS vs GCP vs Azure vs vSphere).

- Differentiate error when static IP lookup fails: nic_group-specific
  message when sibling_cloud_props is present, generic message otherwise.
  Mirrors the else-branch error handling.
- Skip self-match in nic_group_sibling_cloud_properties to prevent
  the current network from matching itself as a sibling.
@rkoster rkoster requested review from a team and neddp April 23, 2026 15:15
@beyhan beyhan moved this from Pending Merge | Prioritized to Pending Review | Discussion in Foundational Infrastructure Working Group Apr 30, 2026
@vlast3k
Copy link
Copy Markdown
Author

vlast3k commented May 4, 2026

The PR was successfully validated in a test landscape. The challenge to reproduce it was the fact that if the IPs are properly distributed across the VMs, and subnets it does not trigger. So they had to be intentionally mixed, what actually happens in real life, when VMs are added and removed and assigned VMs from subnets (we faced it in 7+ old CF Deployment)
@neddp

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

Status: Pending Review | Discussion

Development

Successfully merging this pull request may close these issues.

4 participants