Entries tagged with performance

Entry tags:

vCenter Performance Deep Dive

https://youtu.be/aLtJ_sC0FSU

Entry tags:

performance,
vmware

From correspondence:
Let's look at your question in more detail. The host reports that VM memory usage "supposedly" exceeds the default thresholds for 10 minutes (85% warning, 95% critical), and the situation repeats on a number of VMs. Having read the esxtop bible, you can confidently state that the host is not overloaded, and that swapping and the other joys of memory management are asleep. The VMs in question have a typical configuration, are not subject to custom limits or reservations, and do not use extra features such as latency high, vGPU, PCI passthrough, or FT, which would force a reservation on us. Apparently there are also no complaints from end consumers, even if we are talking about a home lab or the server room of some small "OOO Kolokolchik" company. The options you have left are:
a) a detailed check of the real state of affairs from inside the guest OS, of the absence of software with its own memory management, and so on
b) reading ALL release notes / known issues / fixed sh1t from VMware ESXi 6.5.0, build 4564106, all the way to ESXi 6.5 EP 16, build 14874964
c) the classic: ignore the alarms and live happily
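
To make the alarm logic above concrete, here is a minimal sketch of a "threshold held for 10 minutes" check. It is a simplified model, not vCenter's actual implementation: the function name `alarm_state`, the one-sample-per-minute default, and the "every sample in the window must be at or above the threshold" rule are assumptions made for illustration. Only the 85% / 95% / 10 minute values come from the text.

```python
WARNING_PCT = 85.0   # default warning threshold quoted in the text
CRITICAL_PCT = 95.0  # default critical threshold quoted in the text
WINDOW_MIN = 10      # the condition has to hold for this many minutes


def alarm_state(samples, interval_min=1):
    """Classify a series of memory-usage percentages (oldest first).

    Hypothetical model: the alarm is raised only if every sample in the
    last WINDOW_MIN minutes is at or above the threshold.
    """
    needed = WINDOW_MIN // interval_min
    if needed <= 0 or len(samples) < needed:
        return "ok"  # not enough history to say anything
    lowest = min(samples[-needed:])
    if lowest >= CRITICAL_PCT:
        return "critical"
    if lowest >= WARNING_PCT:
        return "warning"
    return "ok"
```

A single dip below the threshold inside the window resets the state in this model, which is why short spikes alone should not trigger the alarm.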

esxtop bible -
Interpreting esxtop Statistics
https://communities.vmware.com/docs/DOC-9279

Interpreting esxtop 4.1 Statistics
https://communities.vmware.com/docs/DOC-11812

ESXTOP
http://www.yellow-bricks.com/esxtop/

Assuming all the other items have been done, we smoothly run face-first into https://kb.vmware.com/s/article/2149496
VMware VM memory usage heuristic over-reporting on ESXi 6.5 (2149496)

And for general background, there is also this: https://kb.vmware.com/s/article/2149787
Memory usage alarm triggers for certain types of Virtual Machines in ESXi 6.x (2149787)

Entry tags:

CPU load distribution for networking, part 2

Part one: https://robopet3.dreamwidth.org/44547.html
Continued.
John Savill answers readers' questions
Dynamic Virtual Machine Queue (DVMQ) enables multiple VMQ queues in the network adapter so that they can be assigned to different virtual machines, and each VMQ queue can be assigned to a different processor in the system. This spreads the network load across several local processors and helps the virtual switch distribute traffic faster, because specific VMQ queues are bound to a particular virtual machine. Note that the host and each virtual machine are now bound to a single queue and therefore to a single core. This means that each of them is limited to the bandwidth a single core can sustain, which is typically about 5–6 Gbit/s on modern processors. More details are available at: http://blogs.technet.com/b/networking/archive/2013/09/10/vmq-deep-dive-1-of-3.aspx

https://www.osp.ru/winitpro/2015/09/13046825/
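
The one-queue-one-core limit described above leads to simple arithmetic: how many queues (and therefore cores) a given throughput would need. The sketch below is only an illustration. The function name `min_queues` is hypothetical, the 5 Gbit/s default is the lower end of the 5–6 Gbit/s range quoted above, and real per-core throughput depends on the hardware and workload.

```python
import math


def min_queues(target_gbps, per_core_gbps=5.0):
    """Smallest number of queues (one core each) needed for target_gbps.

    per_core_gbps is an assumption taken from the 5-6 Gbit/s range quoted
    in the text; measure it on your own hardware.
    """
    if target_gbps <= 0 or per_core_gbps <= 0:
        raise ValueError("throughput values must be positive")
    return math.ceil(target_gbps / per_core_gbps)
```

With a single VMQ per VM, anything above one queue's worth of traffic cannot be reached by that VM, which is the motivation for spreading a VM across several queues.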

When working with it, you also need to take into account The Minimum Root, or "Minroot" Configuration
https://docs.microsoft.com/en-us/windows-server/virtualization/hyper-v/manage/manage-hyper-v-minroot-2016#the-minimum-root-or-minroot-configuration

and the need for configuration: Plan the Use of vRSS
https://docs.microsoft.com/en-us/windows-server/networking/technologies/vrss/vrss-plan

Entry tags:

Once more on CPU load distribution for networking

It is no secret that part of the network processing load can be moved to the network adapter. This covers well-known features such as TCP Chimney Offload / Receive Side Scaling / Network Direct Memory Access
as well as TCP Segmentation Offload (TSO) / Large Receive Offload (LRO).
You can start reading about them here:
* Information about the TCP Chimney Offload, Receive Side Scaling, and Network Direct Memory Access features in Windows Server 2008
https://support.microsoft.com/en-us/help/951037/information-about-the-tcp-chimney-offload-receive-side-scaling-and-net

* Poor network performance or high network latency on Windows virtual machines (2008925)
https://kb.vmware.com/s/article/2008925

* Understanding TCP Segmentation Offload (TSO) and Large Receive Offload (LRO) in a VMware environment (2055140)
https://kb.vmware.com/s/article/2055140

Networking was improved in MS Server 2016:
Virtual machine multi queues (VMMQ): a single VM can now be given several hardware queues, which improves performance.
https://www.vmgu.ru/news/microsoft-windows-server-2016-hyper-v

Virtual Machine Multiple Queues (VMMQ), formerly known as Hardware vRSS, is a NIC offload technology that provides scalability for processing network traffic of a VPort in the host (root partition) of a virtualized node. In essence, VMMQ extends the native RSS feature to the VPorts that are associated with the physical function (PF) of a NIC including the default VPort.

VMMQ is available for the VPorts exposed in the host (root partition) regardless of whether the NIC is operating in SR-IOV or VMQ mode. VMMQ is a feature available in Windows Server 2016.
https://docs.mellanox.com/pages/viewpage.action?pageId=12007112

Nevertheless, things do not always work "out of the box" the way you would like, and you need to read about
Setting the Number of RSS Processors
https://docs.microsoft.com/en-us/windows-hardware/drivers/network/setting-the-number-of-rss-processors

Performance tuning for low-latency packet processing
https://docs.microsoft.com/en-us/windows-server/networking/technologies/network-subsystem/net-sub-performance-tuning-nics

Conservative RSS Profile assigns 2 CPUs when 1 RSS Queue is chosen (link)

and Broadcom RSS and VMQ Tuning on Windows Servers
https://www.broadcom.com/support/knowledgebase/1211161326328/rss-and-vmq-tuning-on-windows-servers

For Hyper-V, you also need to study Dynamic Virtual Machine Queue (dVMQ) and Dynamic Virtual Machine Multi-Queue (d.VMMQ).
https://github.com/microsoft/SDN/commit/749427c97f6abaf12ac4ebe191d62978857ae9f6
https://www.chelsio.com/wp-content/uploads/resources/t6-100g-dvmmq-windows.pdf

Synthetic Accelerations in a Nutshell – Windows Server 2019
https://techcommunity.microsoft.com/t5/networking-blog/synthetic-accelerations-in-a-nutshell-windows-server-2019/ba-p/653976

For VMware, these tasks fall under NetQueue, for 5.5, 6.5 and later.
NetQueue takes advantage of the ability of some network adapters to deliver network traffic to the system in multiple receive queues that can be processed separately, allowing processing to be scaled to multiple CPUs, improving receive-side networking performance.
https://docs.vmware.com/en/VMware-vSphere/5.5/com.vmware.vsphere.networking.doc/GUID-6B708D13-145F-4DDA-BFB1-39BCC7CD0897.html
https://docs.vmware.com/en/VMware-vSphere/6.5/com.vmware.vsphere.networking.doc/GUID-6B708D13-145F-4DDA-BFB1-39BCC7CD0897.html

Their configuration, enabling and disabling are described separately, as are previously existing problems, in particular:
March 23, 2017
Receive Side Scaling is not functional for vmxnet3 on Windows 8 and Windows 2012 Server or later. This issue is caused by an update for the vmxnet3 driver that addressed RSS features added in NDIS version 6.30 rendering the functionality unusable. It is observed in VMXNET3 driver versions from 1.6.6.0 to 1.7.3.0.
The Windows Receive Side Scaling (RSS) feature is not functional on virtual machines running VMware Tools versions 9.10.0 up to 10.1.5
https://blogs.vmware.com/apps/2017/03/rush-post-vmware-tools-rss-incompatibility-issues.html

And then the fix:
due to an update for the vmxnet3 driver that addressed RSS features added in NDIS version 6.30 rendered the functionality unusable. NDIS 6.30 is supported in Windows 8, Windows 2012 Server and later
https://kb.vmware.com/s/article/2149587

This article is also worth reading:
VMware Tools 10.2.5: Changes to VMXNET3 driver settings

It was finally resolved in mid-2017 with the release of VMware Tools 10.1.7. However, only vmxnet3 driver version 1.7.3.7 in VMware Tools 10.2.0 was recommended by VMware for Windows and Microsoft Business Critical applications.

Few months after, VMware introduces the following changes to vmxnet3 driver version 1.7.3.8:

Receive Side Scaling is enabled by default,
The default value of the Receive Throttle is set to 30.
https://virtualnomadblog.com/2018/04/04/vmware-tools-10-2-5/

Entry tags:

Storage performance - vscsiStats

Getting started with vscsiStats
https://cormachogan.com/2013/07/10/getting-started-with-vscsistats/

Using vscsiStats for Storage Performance Analysis
https://communities.vmware.com/docs/DOC-10095

Entry tags:

azure,
performance

Azure disk speed limits for Dv3 and Dsv3-series

Azure disk speed limits for Dv3 and Dsv3-series
- Max temp storage throughput: IOPS/Read MBps/Write MBps
- Max NICs/Network bandwidth

https://docs.microsoft.com/en-us/azure/virtual-machines/dv3-dsv3-series

Entry tags:

Once more on memory - Memory usage alarm triggers for certain types of Virtual Machines in ESXi 6.x

Memory usage alarm triggers for certain types of Virtual Machines in ESXi 6.x (2149787)
ESXi's active memory metric, despite being called "Memory Utilization" or "Memory Usage" in different parts of the UI, is in no way related to the in-guest memory metrics. It doesn't show how much guest OS memory is available nor how much guest memory is in an "active" working set or "resident". It is only used for making memory reclamation decisions in addition to other resource controls like shares, limits and reservation
https://kb.vmware.com/s/article/2149787

Entry tags:

Frequently asked: VAAI and the Unlimited VMs per Datastore Urban Myth

VAAI and the Unlimited VMs per Datastore Urban Myth
One of the oldest debates in VMware lore is “How many virtual machines should I place on each datastore?” For this discussion, the context is block storage (as opposed to NFS). There were all sorts of opinions as well as technical constraints to be considered. There was the tried and true rule of thumb answer of 10-15-20 which has more than stood the test of time. The best qualified answer was usually: “Whatever fits best for your consolidated environment” which translates to “it depends” and an invoice in consulting language.
http://www.boche.net/blog/2013/02/28/vaai-and-the-unlimited-vms-per-datastore-urban-myth/

Russian translation:
The myth of unlimited VM placement on a VMFS datastore with VAAI
https://vmind.ru/2013/03/12/myth-unlimited-vm-vmfs-datastore-vaai/

Additional links from the comments:
I have documented them at our blog site,
http://www.purestorage.com/blog/virtualization-and-flash-blog-post-3/
and at
http://www.purestorage.com/blog/1000-vms-demo-vmworld-2011/

The question of VM-per-datastore limits is not that simple; limits still appear in the documentation:

However, in most circumstances and environments, a target of 15 to 25 virtual machines per
datastore is the conservative recommendation. By maintaining a smaller number of virtual machines per
datastore, potential for I/O contention is greatly reduced, resulting in more consistent performance across the
environment.
https://www.dellemc.com/sl-si/collaterals/unauth/technical-guides-support-information/PowerVault_ME4_Series_and_VMware_vSphere.pdf

The situation is described in more detail in a somewhat more recent article:
Understanding VMware ESXi Queuing and the FlashArray
https://www.codyhosterman.com/2017/02/understanding-vmware-esxi-queuing-and-the-flasharray/

and
Setting the Maximum Outstanding Disk Requests for virtual machines (1268)
https://kb.vmware.com/s/article/1268

Entry tags:

performance,
vmware

SQL in VMware and performance

Following up on an investigation:
If a cluster contains servers with different CPU frequencies, you can run into a problem with the Read Time Stamp Counter (RDTSC).
The problem shows up as a sharp drop in performance.
Description here:
https://communities.vmware.com/thread/154837
There is a known problem with RDTSC virtualization. By default, VMware virtualizes RDTSC but "monitor_control.virtual_rdtsc" option allows to disable RDTSC interception to improve time measurement resolution in VM. Disabled RDTSC virtualization may cause guest system to hangup at boot that is mentioned here:

http://www.vmware.com/pdf/WS6_Performance_Tuning_and_Benchmarking.pdf
http://www.vmware.com/pdf/vmware_timekeeping.pdf

Guest Windows hangs at boot because HAL timer initialization functions (HalpPmTimerScaleTimers, HalpScaleTimers) set TSC to zero several times to use its absolute value for time calculations instead of simply calculating the difference without resetting TSC. If RDTSC is virtualized, it returns a relatively small value because WRMSR (used to set TSC to zero) is virtualized too. If RDTSC is not virtualized, guest system receives host TSC value that is usually very big and cause divide overflow.

A recommended workaround is to start guest system with RDTSC virtualized, wait until it boots, suspend it, disable RDTSC virtualization then resume the VM. Since TSC is zeroed only several times at boot, guest can successfully use host TSC values later.

The cause of the problem is described here:
"My old laptop is several times more powerful than your production server"
https://habr.com/ru/post/496612/

Entry tags:

performance,
vmware

VMware Capacity Planner is done, move to Live Optics

Since VMware Capacity Planner is "done", people in the know advised moving to Live Optics
More details:
Using Live Optics to size a VSAN Deployment
https://virtualdeets.wordpress.com/2018/08/16/using-live-optics-to-size-a-vsan-deployment/

New HCI Assessment powered by Live Optics
https://blogs.vmware.com/virtualblocks/2018/05/01/vmware-hci-assessment/

Entry tags:

performance,
vmware

Vmware turbo boost settings

Intel Turbo Boost technology in VMware ESXi. The %Aperf/Mperf counter.
https://it-pilot.ru/2018/08/24/intel-turbo-boost-vmware-esxi/

Does the VMware ESXi 6.7 hypervisor use the processor's Intel Turbo Boost capabilities?
https://www.vmgu.ru/news/vmware-esxi-and-intel-turbo-boost

Performance Best Practices for VMware vSphere 6.5
https://www.vmware.com/content/dam/digitalmarketing/vmware/en/pdf/techpaper/performance/Perf_Best_Practices_vSphere65.pdf

Entry tags:

Exchange Load Generator 2013

Exchange Load Generator is a simulation tool to measure the impact of MAPI, OWA, ActiveSync, IMAP, POP and SMTP clients on Exchange servers.

Entry tags:

Crystal Mark

However for an All-Flash array (or really, anything more than a single disk) it is a completely unsuitably tool. Non-Unique data, insufficient queuing, limited dataset size and limited runtime mean that any results generated are going to be completely unrealistic.

As a tool to test the performance of a single HDD or SSD in a laptop, CrystalDiskMark likely does a fairly reasonably job.
https://blog.docbert.org/flash-testing-tools-crystaldiskmark/

Entry tags:

VMware performance

HOL-2004-01-SDC - Mastering vSphere Performance
https://labs.hol.vmware.com/HOL/catalogs/lab/6424

https://docs.vmware.com/en/VMware-vSphere/6.7/vsphere-esxi-vcenter-server-67-resource-management-guide.pdf

https://www.vmware.com/content/dam/digitalmarketing/vmware/en/pdf/techpaper/performance/vsphere-esxi-vcenter-server-67-performance-best-practices.pdf

Entry tags:

Different raid6 penalty

Normal RAID 6 write penalty = 6
Full stripe write penalty = N/(N-2), where N is the number of disks in the RAID group
Additional topics:
1. Sequential write
2. FS cluster size & stripe size
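
The two penalties above can be turned into a back-end IOPS estimate. This is a minimal sketch under stated assumptions: the function name `raid6_backend_iops` is hypothetical, N is taken to be the number of disks in the group (N-2 data plus 2 parity), reads are assumed to cost one back-end I/O each, and caching is ignored.

```python
def raid6_backend_iops(frontend_reads, frontend_writes, disks, full_stripe=False):
    """Estimate back-end IOPS for a RAID 6 group.

    Small random write: penalty of 6 (3 reads + 3 writes per front-end write).
    Full stripe write:  penalty of N/(N-2), because N-2 data blocks cost
                        N disk writes and nothing has to be read first.
    """
    if disks < 4:
        raise ValueError("RAID 6 needs at least 4 disks")
    penalty = disks / (disks - 2) if full_stripe else 6
    return frontend_reads + frontend_writes * penalty
```

For example, 1000 front-end writes on an 8-disk group cost 6000 back-end I/Os as random writes but only about 1333 as full stripe writes, which is why sequential writes and the alignment of file system cluster size to stripe size matter.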

Entry tags:

Vmware speed and BIOS optimization

BIOS Settings HP DL380 Gen10 for VMware vSphere ESXi

http://www.running-system.com/bios-settings-hp-dl380-gen10/

Performance Best Practices for VMware vSphere 6.7
VMware ESXi 6.7
vCenter Server 6.7
July 27, 2018
https://www.vmware.com/content/dam/digitalmarketing/vmware/en/pdf/techpaper/performance/vsphere-esxi-vcenter-server-67-performance-best-practices.pdf

Entry tags:

ESXi host latency

A good post about latency on the ESXi host side.
esxtop lets you look at 4 latency-related metrics: GAVG, DAVG, KAVG and QAVG. These are latency as seen by the VM, latency from the host to the storage array, latency at the ESXi kernel level, and latency inside the ESXi queues. But the last two metrics are not that simple.
QAVG is supposedly part of KAVG, yet sometimes QAVG turns out to be larger than KAVG. Also, the first link in that post leads to another great post, this one about queues in ESXi.
#latency #VMware #ESXi #performance

https://www.codyhosterman.com/2018/03/what-is-the-latency-stat-qavg/
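
The relationships between these counters can be sketched as a small sanity check. It assumes the commonly documented identity GAVG = DAVG + KAVG and that QAVG is expected to be a component of KAVG. The function name `latency_notes` and the tolerance value are hypothetical, and all inputs are in milliseconds as esxtop reports them.

```python
def latency_notes(gavg, davg, kavg, qavg, tolerance_ms=0.05):
    """Return a list of remarks about one set of esxtop latency values (ms)."""
    notes = []
    # Guest latency should be device latency plus kernel latency.
    if abs(gavg - (davg + kavg)) > tolerance_ms:
        notes.append("GAVG differs from DAVG + KAVG")
    # Queue latency is expected to be part of kernel latency; the linked
    # post discusses why it can still be reported as larger.
    if qavg > kavg:
        notes.append("QAVG exceeds KAVG")
    return notes
```

An empty list means the sample is internally consistent; "QAVG exceeds KAVG" is the anomaly the linked post discusses.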