RHEL-149879: [viostor] Limit num_cpus and max_cpus to MAX_CPU #1518
benyamin-codez wants to merge 1 commit into virtio-win:master
Conversation
Limits the maximum CPU count to the global macro constant `MAX_CPU`. The bot has identified a potential out-of-bounds memory access on systems with more than 256 CPUs, because the `ADAPTER_EXTENSION` members `vq` and `processing_srbs` are limited to `MAX_CPU` (which is 256) but `num_queues` is not. This was previously managed by casting `num_cpus` to `num_queues` as USHORT. PR virtio-win#1502 removes this cast and reimplements `num_queues` as ULONG. This PR proposes an alternate solution: limit the `num_queues` member by limiting the `num_cpus` and `max_cpus` variables to `MAX_CPU`. This seems the appropriate place to limit CPU counts. Signed-off-by: benyamin-codez <115509179+benyamin-codez@users.noreply.github.com>
Activity
[Jenkins CI]: Can one of the admins verify this patch?
Yan,

Please do let me know if you would prefer I add this one to PR #1502.

Best regards,
Code Review
This pull request correctly addresses a potential out-of-bounds memory access by limiting the num_cpus and max_cpus variables to the MAX_CPU constant. The approach is sound and effectively prevents the issue on systems with more than 256 CPUs. I have one suggestion to improve the implementation to avoid potential side effects related to macro expansion, which can lead to multiple function calls.
Hi @benyamin-codez, I think the assumption of having only 256 CPUs is outdated: https://learn.microsoft.com/en-us/windows-server/get-started/locks-limits?tabs=full-comparison&pivots=windows-server-2025

Your PR using ULONG for the number of queues is a good direction. I don't think we need to limit the driver. I think that the full solution is:
Hi Yan,

iirc, the 256 limit was the previous SeaBIOS default limit due to the xAPIC 256-bit ID bitmap. This was then bumped to 288 vCPUs (in 2016 iinm) when using the q35 machine type with IOMMU and EIM enabled (QEMU v2.8). We now have 1024 vCPUs with QEMU v8.2+, but YMMV as this is configuration dependent.

The legacy vCPU hot-plug/unplug was recently cleaned up (in December last year) and new QEMU versions will now just use the modern interface. The legacy code applied to machine types prior to 2.7 and was limited by:

#define ACPI_CPU_HOTPLUG_ID_LIMIT 256

So if we intend to continue supporting machine versions prior to 2.7, then this will remain a limit with older versions of QEMU that might still be in production. Surely, this would have to be an extreme corner case though...?!?

AFAICT, the reality is that we have no effective means to glean a vCPU limit from QEMU from inside the driver anyway. Given this, and the above-mentioned limit variability, I agree that it does make sense to remove the limit from the driver.

We should be able to populate […]. Dynamic allocation (and cleanup) would be required for the nested […].

Regarding registry manipulation, I did this previously in a […]. A foreseeable problem is that we will have the wrong KAFFINITY mask for the […].

In any case, I'll need to sort out the registry PR first (#1297 - I am working on that one at the moment) and include a global value only parameter, which is easy enough.

I propose we initially implement a limit on […]. Repeating for […]:

// this one:
#define VIRTIO_SCSI_QUEUE_LAST VIRTIO_SCSI_REQUEST_QUEUE_0 + MAX_CPU
// should be:
#define VIRTIO_SCSI_QUEUE_LAST MAX_CPU - VIRTIO_SCSI_REQUEST_QUEUE_0

Anyway, do you see any issues with the above approach? Perhaps more importantly, did you want to merge this PR in the interim to cover the out-of-bounds memory access scenarios, i.e. when […]?

Best regards,
@benyamin-codez I don't see a big problem with legacy deployments, as in any case the number of queues must be specified by the host. Regarding the "-smp" issue (or a better term would be the "CPU hot-plug scenario") - sure, it should be tested.
@YanVugenfirer - I just wanted to make sure this question didn't get missed.

The previous behaviour (prior to PR #1502 merging) would limit the […].

If you want to keep that behaviour for now - as the alternate solution will take some time to prove - I will switch this PR to ready for review with a view to merging it ASAP. I highly recommend this approach to avoid breakage in the wild. Otherwise I will close this PR early and raise a new PR in due course with the proposed unlimited solution.
@benyamin-codez Let's keep the existing behavior.
I'm somewhat concerned that a new release without this PR merged will break systems with high vCPU counts. Can we please run some HCK-CI tests and keep this one moving?
A further note wrt the alternate (long-term) solution: We will still be limited by the MSI-X […]. As QEMU appears to be presently limited to 1024 vCPUs, ... The alternative is just to set it to the maximum of 2,048. I recommend retaining the registry entry as it is useful for testing. Thoughts?
@benyamin-codez In order not to go back to this code in a couple of years, I suggest dynamic allocation of all the data structures related to the number of queues.
This is true, although iinm, the QEMU […].
I can probably come up with a first cut by the end of the quarter... |