Fix static IP subnet mismatch when adding network to nic_group#2716
Fix static IP subnet mismatch when adding network to nic_group#2716vlast3k wants to merge 2 commits intocloudfoundry:mainfrom
Conversation
When adding a new network to existing instances in the same nic_group,
the placement planner picked the first unclaimed static IP in the AZ
without checking which cloud subnet it belonged to. This caused CPI
errors ("Networks in nic_group have different subnet_ids") when an AZ
has multiple cloud subnets with interleaved static IP ranges.
Store cloud_properties from the containing subnet on each StaticIpToAzs
entry. When selecting an IP for a network that shares a nic_group with
an already-assigned sibling, constrain the lookup to IPs on a subnet
with matching cloud_properties. Fail with a clear error when no
matching IP is available.
Fixes: BOSH-1922
|
No actionable comments were generated in the recent review. 🎉 ℹ️ Recent review info⚙️ Run configurationConfiguration used: Organization UI Review profile: ASSERTIVE Plan: Pro Run ID: 📒 Files selected for processing (1)
WalkthroughThe PR extends static IP placement to track and match subnet Suggested reviewers
🚥 Pre-merge checks | ✅ 2 | ❌ 1❌ Failed checks (1 warning)
✅ Passed checks (2 passed)
✏️ Tip: You can configure your own custom pre-merge checks in the settings. ✨ Finishing Touches🧪 Generate unit tests (beta)
Thanks for using CodeRabbit! It's free for OSS, and your support helps us grow. If you like it, consider giving us a shout-out. Comment |
There was a problem hiding this comment.
Actionable comments posted: 2
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.
Inline comments:
In
`@src/bosh-director/lib/bosh/director/deployment_plan/placement_planner/static_ips_availability_zone_picker.rb`:
- Around line 288-300: The helper nic_group_sibling_cloud_properties may
accidentally match the current network as its own sibling; add a defensive early
check inside the loop to skip when sibling_network == network so it only
considers other networks, keeping the rest of the logic (matching nic_group and
checking plan.reservation.ip and subnet) unchanged; update the loop in
nic_group_sibling_cloud_properties to continue to next when sibling_network is
the same object as the provided network.
- Around line 118-129: The error message raised when static_ip_to_azs is nil
uses a nic_group-specific message even when sibling_cloud_props is nil; update
the branch inside the desired_instance.az.nil? block to raise a different
message depending on whether sibling_cloud_props is present: if
sibling_cloud_props is truthy, raise Bosh::Director::NetworkReservationError
with the subnet/nic_group message using network.nic_group (matching the existing
nic_group wording), otherwise raise the generic "Failed to distribute static
IPs..." style message used in the else-branch; adjust the code paths around
sibling_cloud_props,
`@networks_to_static_ips.next_ip_for_network_with_cloud_properties` and
`@networks_to_static_ips.next_ip_for_network` so the error raised mirrors the
else-branch behavior.
🪄 Autofix (Beta)
Fix all unresolved CodeRabbit comments on this PR:
- Push a commit to this branch (recommended)
- Create a new PR with the fixes
ℹ️ Review info
⚙️ Run configuration
Configuration used: Organization UI
Review profile: ASSERTIVE
Plan: Pro
Run ID: 0e44e9cf-74b2-4434-8c77-eb01b375d94e
📒 Files selected for processing (4)
src/bosh-director/lib/bosh/director/deployment_plan/placement_planner/networks_to_static_ips.rbsrc/bosh-director/lib/bosh/director/deployment_plan/placement_planner/static_ips_availability_zone_picker.rbsrc/bosh-director/spec/unit/bosh/director/deployment_plan/placement_planner/networks_to_static_ips_spec.rbsrc/bosh-director/spec/unit/bosh/director/deployment_plan/placement_planner/static_ips_availability_zone_picker_spec.rb
Validation:
|
- Differentiate error when static IP lookup fails: nic_group-specific message when sibling_cloud_props is present, generic message otherwise. Mirrors the else-branch error handling. - Skip self-match in nic_group_sibling_cloud_properties to prevent the current network from matching itself as a sibling.
|
The PR was successfully validated in a test landscape. The challenge to reproduce it was the fact that if the IPs are properly distributed across the VMs, and subnets it does not trigger. So they had to be intentionally mixed, what actually happens in real life, when VMs are added and removed and assigned VMs from subnets (we faced it in 7+ old CF Deployment) |
Summary
nic_group, the placement planner picked the first unclaimed static IP in the AZ without checking which cloud subnet it belonged to. This caused CPI errors (Networks in nic_group have different subnet_ids) when an AZ has multiple cloud subnets with interleaved static IP ranges.cloud_propertiesfrom the containing subnet on eachStaticIpToAzsentry. When selecting an IP for a network that shares anic_groupwith an already-assigned sibling, constrain the lookup to IPs on a subnet with matchingcloud_properties.What changed
networks_to_static_ips.rbStaticIpToAzsstruct now includescloud_propertiescreate()records the subnet'scloud_propertiesfor each IPfind_by_network_az_and_cloud_propertiesandnext_ip_for_network_with_cloud_propertiesstatic_ips_availability_zone_picker.rbcreate_network_plan_with_azlooks up the cloud_properties of an already-assigned sibling in the same nic_group, then constrains IP selection to matching subnetsnic_group_sibling_cloud_propertieshelperTests
networks_to_static_ips_spec.rb: cloud_properties storage and filtered lookupstatic_ips_availability_zone_picker_spec.rb: multi-subnet nic_group pairing, mismatch error, no-nic_group unchanged behaviorContext
On AWS, all networks in a
nic_groupmust resolve to the same ENI subnet. Landscapes with multiple/24subnets per AZ and interleaved static IP ranges hit this when enabling IPv6 (adding a second network to the nic_group). BOSH assigned IPv6 IPs on a different subnet than the existing IPv4 IPs because it only filtered by AZ, not by cloud subnet.Verified against a production cloud config: paired IPv4/IPv6 subnets sharing a
nic_grouphave identicalcloud_properties(same AWS subnet ID, same security groups), because AWS subnets are dual-stack and anic_groupmaps to a single ENI. Fullcloud_propertieshash equality is the correct comparison — narrowing to just thesubnetkey would be fragile and CPI-specific.Test plan