PowerEdge: Useful "NVIDIA-SMI" Queries for Troubleshooting

Summary: This article shows useful "NVIDIA-SMI" queries for troubleshooting NVIDIA GPU cards.


Instructions

VBIOS Version

Query the VBIOS version of each device:

$ nvidia-smi --query-gpu=gpu_name,gpu_bus_id,vbios_version --format=csv

name, pci.bus_id, vbios_version
GRID K2, 0000:87:00.0, 80.04.D4.00.07
GRID K2, 0000:88:00.0, 80.04.D4.00.08
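For inventory scripts, adding noheader to the format option ("--format=csv,noheader") drops the header line, so each GPU becomes one comma-separated line. The POSIX sh sketch below assumes that output; the function names trim_leading and print_vbios are hypothetical, and the test input is the sample output shown above.

```shell
#!/bin/sh
# Hypothetical helper: strip leading spaces (nvidia-smi writes ", " between CSV fields).
trim_leading() {
  printf '%s' "${1#"${1%%[! ]*}"}"
}

# Hypothetical helper: read "name, bus_id, vbios" lines on stdin and
# print one "bus_id -> vbios (name)" line per GPU.
print_vbios() {
  while IFS=, read -r name bus_id vbios; do
    printf '%s -> %s (%s)\n' "$(trim_leading "$bus_id")" "$(trim_leading "$vbios")" "$name"
  done
}

# Live use (requires the NVIDIA driver):
#   nvidia-smi --query-gpu=gpu_name,gpu_bus_id,vbios_version --format=csv,noheader | print_vbios
```

Only the comma is used as the field separator, so product names that contain spaces, such as "GRID K2", stay intact.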

Query field descriptions:

timestamp
The timestamp of when the query was made, in the format "YYYY/MM/DD HH:MM:SS.msec".

gpu_name
The official product name of the GPU. This is an alphanumeric string. Available for all products.

gpu_bus_id
PCI bus ID as "domain:bus:device.function", in hex.

vbios_version
The BIOS of the GPU board.

Querying GPU Metrics for Host-Side Logging

This query is useful for monitoring GPU metrics on the hypervisor side.

This query works on both ESXi and XenServer.

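$ nvidia-smi --query-gpu=timestamp,name,pci.bus_id,driver_version,pstate,pcie.link.gen.max,pcie.link.gen.current,temperature.gpu,utilization.gpu,utilization.memory,memory.total,memory.free,memory.used --format=csv -l 5

When adding more fields to the query, make sure that no spaces are inserted between the comma-separated field names. The "-l 5" option repeats the query every 5 seconds until it is stopped.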
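One way to avoid stray spaces is to keep the field names in a list and let the shell join them with commas. This is a minimal POSIX sh sketch; the function name join_fields and the variable FIELDS are hypothetical, and the field names are the ones used in the query above.

```shell
#!/bin/sh
# Hypothetical helper: join all arguments with commas and no spaces.
# "$*" joins the arguments with the first character of IFS; the body
# runs in a subshell, so the IFS change does not leak to the caller.
join_fields() (
  IFS=,
  printf '%s\n' "$*"
)

FIELDS=$(join_fields \
  timestamp name pci.bus_id driver_version pstate \
  pcie.link.gen.max pcie.link.gen.current temperature.gpu \
  utilization.gpu utilization.memory \
  memory.total memory.free memory.used)

# Live use (requires the NVIDIA driver); sample every 5 seconds:
#   nvidia-smi --query-gpu="$FIELDS" --format=csv -l 5
```

Adding or removing a field is then a matter of editing one word in the list, and the joined string can never contain a space.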

Query field descriptions:

timestamp
The timestamp of when the query was made, in the format "YYYY/MM/DD HH:MM:SS.msec".

name
The official product name of the GPU. This is an alphanumeric string. Available for all products.

pci.bus_id
PCI bus ID as "domain:bus:device.function", in hex.

driver_version
The version of the installed NVIDIA display driver. This is an alphanumeric string.

pstate
The current performance state for the GPU. States range from P0 (maximum performance) to P12 (minimum performance).

pcie.link.gen.max
The maximum PCIe link generation possible with this GPU and system configuration. For example, if the GPU supports a higher PCIe generation than the system supports, this reports the system PCIe generation.

pcie.link.gen.current
The current PCIe link generation. This may be reduced when the GPU is not in use.

temperature.gpu
Core GPU temperature, in degrees C.

utilization.gpu
Percent of time over the past sample period during which one or more kernels was executing on the GPU. The sample period may be between 1 second and 1/6 second, depending on the product.

utilization.memory
Percent of time over the past sample period during which global (device) memory was being read or written. The sample period may be between 1 second and 1/6 second, depending on the product.

memory.total
Total installed GPU memory.

memory.free
Total free memory.

memory.used
Total memory allocated by active contexts.

You can get the full list of query fields by running: nvidia-smi --help-query-gpu


Using nvidia-smi for Logging

Short-Term Logging

Add the "-f <filename>" option to redirect the output to a file.

Prefix the command with "timeout -t <seconds>" to stop logging after <seconds> seconds.

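Make sure the query granularity is sized appropriately for the intended use:

Fine-grain GPU behavior
nvidia-smi "-l" value: 5 (5-second interval); timeout "-t" value: 600 (10-minute duration)

General GPU behavior
nvidia-smi "-l" value: 60 (1-minute interval); timeout "-t" value: 3600 (1-hour duration)

Broad GPU behavior
nvidia-smi "-l" value: 3600 (1-hour interval); timeout "-t" value: 86400 (24-hour duration)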
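Putting the two options together, the rows above translate into commands such as the ones printed by this sketch. The helper names sample_count and build_log_cmd, the shortened field list and the file names are hypothetical, and the script only prints the commands. It keeps the "timeout -t <seconds>" form given in this article (older BusyBox builds of timeout use it); GNU coreutils timeout instead takes the duration as its first argument, for example "timeout 600 nvidia-smi ...".

```shell
#!/bin/sh
# Hypothetical helper: samples produced by one run.
# $1 = nvidia-smi "-l" interval (s), $2 = timeout "-t" duration (s).
sample_count() {
  echo $(( $2 / $1 ))
}

# Hypothetical helper: print (not run) the logging command.
# $1 = interval (s), $2 = duration (s), $3 = log file for "-f".
build_log_cmd() {
  printf 'timeout -t %s nvidia-smi --query-gpu=timestamp,name,pci.bus_id,utilization.gpu,memory.used --format=csv -l %s -f %s\n' \
    "$2" "$1" "$3"
}

# Fine-grain GPU behavior: every 5 s for 10 minutes (120 samples).
build_log_cmd 5 600 gpu_fine.csv
# Broad GPU behavior: every hour for 24 hours (24 samples).
build_log_cmd 3600 86400 gpu_broad.csv
```

The sample counts (120, 60 and 24 for the three rows) are a quick check that a run captures enough data points without producing an oversized log.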


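Long-Term Logging

Create a shell script that automates the creation of log files, adding timestamp data both to the file name and to the query parameters.

Add a custom cron job to /var/spool/cron/crontabs that calls the script at the required interval.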
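A minimal sketch of such a script follows, assuming a POSIX shell and a writable log directory. The script name gpu_log.sh, the directory /var/log/gpu, the "run" argument, the 60-second interval and the 3300-second capture window are all hypothetical choices; adjust them to the granularity guidance above.

```shell
#!/bin/sh
# gpu_log.sh (hypothetical name): write one timestamped CSV log per run.
LOG_DIR=${LOG_DIR:-/var/log/gpu}   # assumed log location
# The "timestamp" field puts a time on every sample row.
QUERY=timestamp,name,pci.bus_id,driver_version,pstate,temperature.gpu,utilization.gpu,utilization.memory,memory.total,memory.free,memory.used

# Print a log path such as /var/log/gpu/gpu_20250722_140000.csv.
log_file_name() {
  printf '%s/gpu_%s.csv\n' "$LOG_DIR" "$(date +%Y%m%d_%H%M%S)"
}

run_capture() {
  mkdir -p "$LOG_DIR"
  # Sample every 60 s and stop after 3300 s, so hourly runs do not overlap.
  # This uses the article's "timeout -t" form; with GNU coreutils use
  # "timeout 3300 nvidia-smi ..." instead.
  timeout -t 3300 nvidia-smi --query-gpu="$QUERY" --format=csv -l 60 -f "$(log_file_name)"
}

# Capture only when called as "gpu_log.sh run", so the functions can be sourced.
if [ "${1:-}" = "run" ]; then
  run_capture
fi
```

A matching crontab line that starts a capture at the top of every hour could be "0 * * * * /path/to/gpu_log.sh run", where /path/to is a placeholder for wherever the script is stored.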


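Additional Low-Level Commands for Clocks and Power

Enable "Persistence" Mode

Unless Persistence Mode (PM) is enabled for the driver, all of the clock and power settings below are reset between program runs.

nvidia-smi commands also run faster when PM is enabled.

nvidia-smi -pm 1 - Makes clock, power, and other settings persist across program runs and driver invocations.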
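To confirm that the setting took effect, the persistence_mode query field reports Enabled or Disabled for each GPU. This sketch assumes "--query-gpu=index,persistence_mode --format=csv,noheader" output; the helper pm_disabled_gpus is hypothetical and prints the index of every GPU that still has persistence mode off.

```shell
#!/bin/sh
# Hypothetical helper: read "index, persistence_mode" lines on stdin and
# print the index of each GPU whose persistence mode is not Enabled.
pm_disabled_gpus() {
  while IFS=, read -r index mode; do
    mode=${mode# }   # nvidia-smi puts one space after each comma
    [ "$mode" = "Enabled" ] || printf '%s\n' "$index"
  done
}

# Live use (requires the NVIDIA driver; "-pm" normally needs root):
#   nvidia-smi --query-gpu=index,persistence_mode --format=csv,noheader | pm_disabled_gpus
#   nvidia-smi -pm 1   # then run the check again
```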


Clocks

nvidia-smi -ac <MEM clock,Graphics clock>
Set one of the supported clocks (application clocks)

nvidia-smi -q -d SUPPORTED_CLOCKS
View supported clocks

nvidia-smi -q -d CLOCK
View current clocks

nvidia-smi --auto-boost-default=ENABLED -i 0
Enable boosting of GPU clocks (K80 and later)

nvidia-smi -rac
Reset application clocks back to the base (default) values
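The "-ac" values must be a memory/graphics pair that "nvidia-smi -q -d SUPPORTED_CLOCKS" lists. Where the installed nvidia-smi offers "--query-supported-clocks" (check "nvidia-smi --help-query-supported-clocks"), the pairs can be read as CSV and selected by a script. The sketch below assumes "--query-supported-clocks=mem,gr --format=csv,noheader,nounits" output with one "memory, graphics" pair in MHz per line; the helper highest_clock_pair is hypothetical and simply takes the highest memory clock and the highest graphics clock listed with it.

```shell
#!/bin/sh
# Hypothetical helper: read "mem, gr" MHz pairs on stdin and print the
# pair with the highest memory clock and, for that memory clock, the
# highest graphics clock, as "mem,gr" (the form "-ac" expects).
highest_clock_pair() {
  tr -d ' ' | sort -t, -k1,1nr -k2,2nr | head -n 1
}

# Live use (requires the NVIDIA driver; "-ac" normally needs root):
#   PAIR=$(nvidia-smi -i 0 --query-supported-clocks=mem,gr --format=csv,noheader,nounits | highest_clock_pair)
#   nvidia-smi -i 0 -ac "$PAIR"
```

Picking the highest pair is only one policy; on a power- or thermally-constrained system a lower pair may be the better choice, and "nvidia-smi -rac" undoes the change.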

Power

nvidia-smi -pl N
Set the power cap, N in watts (maximum wattage the GPU will use)

nvidia-smi -pm 1
Enable persistence mode

nvidia-smi stats -i <device#> -d pwrDraw
Continuously monitor detailed statistics such as power draw

nvidia-smi --query-gpu=index,timestamp,power.draw,clocks.sm,clocks.mem,clocks.gr --format=csv -l 1
Continuously report time-stamped power and clock information
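For longer power captures, adding noheader,nounits to the format option leaves power.draw as a plain number that standard tools can summarize. The sketch below assumes CSV lines in the field order of the query above (index, timestamp, power.draw, ...); the function power_summary is hypothetical and prints the mean and peak draw in watts for each GPU index.

```shell
#!/bin/sh
# Hypothetical helper: summarize "index, timestamp, power.draw, ..." lines
# (--format=csv,noheader,nounits) into mean and peak power per GPU index.
# A non-numeric reading such as "[N/A]" is counted as 0 W by awk.
power_summary() {
  awk -F', *' '
    {
      idx = $1
      watts = $3 + 0
      sum[idx] += watts
      n[idx]++
      if (!(idx in peak) || watts > peak[idx]) peak[idx] = watts
    }
    END {
      for (i in n)
        printf "gpu %s: mean %.1f W, peak %.1f W, samples %d\n", i, sum[i] / n[i], peak[i], n[i]
    }' | sort
}

# Live use (requires the NVIDIA driver); stop the capture with Ctrl+C:
#   nvidia-smi --query-gpu=index,timestamp,power.draw,clocks.sm,clocks.mem,clocks.gr \
#     --format=csv,noheader,nounits -l 1 -f power.csv
#   power_summary < power.csv
```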

Other Useful Commands

nvidia-smi -q
Query all the GPUs seen by the driver and display all readable attributes for each GPU.

nvidia-smi
Display current GPU status, driver information, and a host of other statistics.

nvidia-smi -l
Scroll the output of nvidia-smi continuously until stopped.

nvidia-smi --query-gpu=index,timestamp,power.draw,clocks.sm,clocks.mem,clocks.gr --format=csv -l 1
Continuously provide time-stamped power and clock information.

nvidia-smi --query-gpu=gpu_name,gpu_bus_id,vbios_version --format=csv
Query the VBIOS version of each GPU in a system.

lspci -n | grep 10de
Determine whether the GPU is in compute mode or graphics mode.

nvidia-smi nvlink -s -i <device#>
Display the NVLink state for a specific GPU.

gpumodeswitch --listgpumodes
Display the ability of GRID 2.0 cards to switch between compute and graphics modes. The tool is not included in the standard CUDA or NVIDIA driver packages.

nvidia-smi -h
Display the nvidia-smi commands and their syntax.

nvidia-bug-report.sh
Generate a bug report (nvidia-bug-report.log.gz) to send to a Level 3 support technician or NVIDIA.

nvidia-smi --query-retired-pages=gpu_uuid,retired_pages.address,retired_pages.cause --format=csv
List retired memory pages with the GPU UUID, the address of each retired page, and the cause of retirement.

nvidia-smi stats
Display device statistics.

nvcc --version
Show the installed CUDA toolkit version.

nvidia-smi pmon
Display process statistics in scrolling format.

nvidia-smi nvlink -c -i <device#>
Display NVLink capabilities for a specific GPU.

gpumodeswitch --gpumode graphics
Change the GPU personality from compute to graphics (M6 and M60 GPUs).

gpumodeswitch --gpumode compute
Change the GPU personality from graphics to compute (M6 and M60 GPUs).
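The lspci check works because "lspci -n" prints the PCI class code in its second column: 0300 is a VGA-compatible (graphics) controller, 0302 is a 3D (compute) controller, and 10de is NVIDIA's vendor ID. The sketch below assumes the Linux pciutils output format (an illustrative line: "87:00.0 0302: 10de:13f2 (rev a1)"); the helper gpu_pci_modes is hypothetical, and other NVIDIA functions such as audio controllers are reported as "other".

```shell
#!/bin/sh
# Hypothetical helper: read "lspci -n" lines on stdin and label each
# NVIDIA (vendor 10de) device by its PCI class code.
gpu_pci_modes() {
  while read -r slot class ids rest; do
    case $ids in
      10de:*) ;;                # NVIDIA device: classify it
      *) continue ;;            # skip other vendors
    esac
    case $class in
      0300:) mode=graphics ;;   # VGA-compatible controller
      0302:) mode=compute ;;    # 3D controller
      *) mode=other ;;
    esac
    printf '%s %s\n' "$slot" "$mode"
  done
}

# Live use:
#   lspci -n | gpu_pci_modes
```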

Affected Products

PowerEdge XR2, OEMR R640, OEMR R650, OEMR R650xs, OEMR R6525, OEMR R660, OEMR XL R660, OEMR R660xs, OEMR R6625, OEMR R740, OEMR XL R740, OEMR R740xd, OEMR XL R740xd, OEMR R740xd2, OEMR R7425, OEMR R750, OEMR R750xa, OEMR R750xs, OEMR R7525, OEMR R760, OEMR R760xa, OEMR R760XD2, OEMR XL R760, OEMR R760xs, OEMR R7625, OEMR R840, OEMR R860, OEMR R940, OEMR R940xa, OEMR R960, OEMR T440, OEMR T550, OEMR T560, OEMR T640, OEMR XL R660xs, OEMR XL R6625, OEMR XL R6725, OEMR XL R760xs, OEMR XL R7625, OEMR XL R7725, OEMR XR11, OEMR XR12, OEMR XR5610, OEMR XR7620, PowerEdge HS5610, PowerEdge HS5620, PowerEdge R640, PowerEdge R6415, PowerEdge R650, PowerEdge R650xs, PowerEdge R6525, PowerEdge R660, PowerEdge R660xs, PowerEdge R6625, PowerEdge R670, PowerEdge R740, PowerEdge R740XD, PowerEdge R740XD2, PowerEdge R7425, PowerEdge R750, PowerEdge R750XA, PowerEdge R750xs, PowerEdge R7525, PowerEdge R760, PowerEdge R760XA, PowerEdge R760xd2, PowerEdge R760xs, PowerEdge R7625, PowerEdge R770, PowerEdge R7725, PowerEdge R840, PowerEdge R860, PowerEdge R940, PowerEdge R940xa, PowerEdge R960, PowerEdge T440, PowerEdge T550, PowerEdge T560, PowerEdge T640, PowerEdge XR11, PowerEdge XR12, PowerEdge XR5610, PowerEdge XR7620 ...
Article Properties
Article Number: 000190243
Article Type: How To
Last Modified: 22 Jul 2025
Version: 3