Here we use nvidia/Llama-3.1-Nemotron-70B-Instruct-HF as an example to illustrate the problem.
The code is as follows:
    import torch
    from transformers import AutoModelForCausalLM
    from collections import defaultdict

    # Detect the number of available GPUs
    NUM_GPUS = torch.cuda.device_count()
    print(f"NUM_GPUS: {NUM_GPUS}")

    MODEL_ID = "nvidia/Llama-3.1-Nemotron-70B-Instruct-HF"
    print(f"Load Model {MODEL_ID} ... ")
    model = AutoModelForCausalLM.from_pretrained(
        MODEL_ID,
        device_map="auto",
        torch_dtype=torch.bfloat16
    )

    # Per-layer statistics: parameter count, memory size, and device
    layerwise_stats = defaultdict(lambda: {'num_params': 0, 'size_mb': 0, 'device': None})

    # Total memory used on each device (GPU/CPU)
    device_memory_usage = defaultdict(float)

    # Iterate over the model parameters
    for name, param in model.named_parameters():
        # Group by layer name, e.g. model.layers.0; parameters outside the decoder
        # layers are grouped by their first name component (model, lm_head)
        layer_name = '.'.join(name.split('.')[:3]) if 'layers' in name else name.split('.')[0]
        param_size = param.numel() * param.element_size() / 1024 / 1024  # memory footprint in MB

        layerwise_stats[layer_name]['num_params'] += param.numel()  # total parameter count of the layer
        layerwise_stats[layer_name]['size_mb'] += param_size  # total memory size of the layer
        layerwise_stats[layer_name]['device'] = param.device  # device of the layer (last parameter seen)

        # Accumulate the total memory used on each device
        device_memory_usage[param.device] += param_size

    # Print the per-layer statistics
    for layer_name, stats in layerwise_stats.items():
        print(f"Layer: {layer_name} | Total parameters: {stats['num_params']:,} | Total memory size: {stats['size_mb']:.2f} MB | Device: {stats['device']}")

    # Compute and print the model's total parameter count and total memory size
    total_params = sum(stats['num_params'] for stats in layerwise_stats.values())
    total_size = sum(stats['size_mb'] for stats in layerwise_stats.values())
    print(f"\nTotal number of parameters in the model: {total_params:,}")
    print(f"Total memory size of the model: {total_size:.2f} MB")

    # Print the total memory used on each device
    print("\nMemory usage per device:")
    for device, memory in device_memory_usage.items():
        print(f"Device: {device} | Total memory used: {memory:.2f} MB")
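The grouping rule used in the loop can be checked in isolation, without loading the model. The sketch below is only an illustration: `layer_key` is a hypothetical helper name, and the parameter names are the ones a Llama-style checkpoint in transformers typically exposes, listed here as an assumption rather than read from the actual model.

```python
def layer_key(name: str) -> str:
    """Apply the same grouping expression as the script (hypothetical helper)."""
    parts = name.split('.')
    return '.'.join(parts[:3]) if 'layers' in name else parts[0]


# Assumed, typical parameter names of a Llama-style model
names = [
    "model.embed_tokens.weight",
    "model.layers.0.self_attn.q_proj.weight",
    "model.layers.79.mlp.down_proj.weight",
    "model.norm.weight",
    "lm_head.weight",
]

keys = [layer_key(n) for n in names]
for n, k in zip(names, keys):
    print(f"{n} -> {k}")
```

Under this rule the embedding and the final norm both fall under the single key `model`, so the device recorded for that entry is simply the device of whichever of the two parameters was visited last.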
The output is as follows:
    NUM_GPUS: 6
    Load Model nvidia/Llama-3.1-Nemotron-70B-Instruct-HF ... 
    Loading checkpoint shards: 100%|█████████████████████████████████████████████████████████████████████████| 30/30 [08:41<00:00, 17.38s/it]
    Some parameters are on the meta device because they were offloaded to the cpu.
    Layer: model | Total parameters: 1,050,681,344 | Total memory size: 2004.02 MB | Device: cuda:5
    Layer: model.layers.0 | Total parameters: 855,654,400 | Total memory size: 1632.03 MB | Device: cuda:0
    Layer: model.layers.1 | Total parameters: 855,654,400 | Total memory size: 1632.03 MB | Device: cuda:0
    Layer: model.layers.2 | Total parameters: 855,654,400 | Total memory size: 1632.03 MB | Device: cuda:0
    Layer: model.layers.3 | Total parameters: 855,654,400 | Total memory size: 1632.03 MB | Device: cuda:0
    Layer: model.layers.4 | Total parameters: 855,654,400 | Total memory size: 1632.03 MB | Device: cuda:0
    Layer: model.layers.5 | Total parameters: 855,654,400 | Total memory size: 1632.03 MB | Device: cuda:0
    Layer: model.layers.6 | Total parameters: 855,654,400 | Total memory size: 1632.03 MB | Device: cuda:0
    Layer: model.layers.7 | Total parameters: 855,654,400 | Total memory size: 1632.03 MB | Device: cuda:0
    Layer: model.layers.8 | Total parameters: 855,654,400 | Total memory size: 1632.03 MB | Device: cuda:0
    Layer: model.layers.9 | Total parameters: 855,654,400 | Total memory size: 1632.03 MB | Device: cuda:0
    Layer: model.layers.10 | Total parameters: 855,654,400 | Total memory size: 1632.03 MB | Device: cuda:1
    Layer: model.layers.11 | Total parameters: 855,654,400 | Total memory size: 1632.03 MB | Device: cuda:1
    Layer: model.layers.12 | Total parameters: 855,654,400 | Total memory size: 1632.03 MB | Device: cuda:1
    Layer: model.layers.13 | Total parameters: 855,654,400 | Total memory size: 1632.03 MB | Device: cuda:1
    Layer: model.layers.14 | Total parameters: 855,654,400 | Total memory size: 1632.03 MB | Device: cuda:1
    Layer: model.layers.15 | Total parameters: 855,654,400 | Total memory size: 1632.03 MB | Device: cuda:1
    Layer: model.layers.16 | Total parameters: 855,654,400 | Total memory size: 1632.03 MB | Device: cuda:1
    Layer: model.layers.17 | Total parameters: 855,654,400 | Total memory size: 1632.03 MB | Device: cuda:1
    Layer: model.layers.18 | Total parameters: 855,654,400 | Total memory size: 1632.03 MB | Device: cuda:1
    Layer: model.layers.19 | Total parameters: 855,654,400 | Total memory size: 1632.03 MB | Device: cuda:1
    Layer: model.layers.20 | Total parameters: 855,654,400 | Total memory size: 1632.03 MB | Device: cuda:1
    Layer: model.layers.21 | Total parameters: 855,654,400 | Total memory size: 1632.03 MB | Device: cuda:1
    Layer: model.layers.22 | Total parameters: 855,654,400 | Total memory size: 1632.03 MB | Device: cuda:1
    Layer: model.layers.23 | Total parameters: 855,654,400 | Total memory size: 1632.03 MB | Device: cuda:1
    Layer: model.layers.24 | Total parameters: 855,654,400 | Total memory size: 1632.03 MB | Device: cuda:2
    Layer: model.layers.25 | Total parameters: 855,654,400 | Total memory size: 1632.03 MB | Device: cuda:2
    Layer: model.layers.26 | Total parameters: 855,654,400 | Total memory size: 1632.03 MB | Device: cuda:2
    Layer: model.layers.27 | Total parameters: 855,654,400 | Total memory size: 1632.03 MB | Device: cuda:2
    Layer: model.layers.28 | Total parameters: 855,654,400 | Total memory size: 1632.03 MB | Device: cuda:2
    Layer: model.layers.29 | Total parameters: 855,654,400 | Total memory size: 1632.03 MB | Device: cuda:2
    Layer: model.layers.30 | Total parameters: 855,654,400 | Total memory size: 1632.03 MB | Device: cuda:2
    Layer: model.layers.31 | Total parameters: 855,654,400 | Total memory size: 1632.03 MB | Device: cuda:2
    Layer: model.layers.32 | Total parameters: 855,654,400 | Total memory size: 1632.03 MB | Device: cuda:2
    Layer: model.layers.33 | Total parameters: 855,654,400 | Total memory size: 1632.03 MB | Device: cuda:2
    Layer: model.layers.34 | Total parameters: 855,654,400 | Total memory size: 1632.03 MB | Device: cuda:2
    Layer: model.layers.35 | Total parameters: 855,654,400 | Total memory size: 1632.03 MB | Device: cuda:2
    Layer: model.layers.36 | Total parameters: 855,654,400 | Total memory size: 1632.03 MB | Device: cuda:2
    Layer: model.layers.37 | Total parameters: 855,654,400 | Total memory size: 1632.03 MB | Device: cuda:2
    Layer: model.layers.38 | Total parameters: 855,654,400 | Total memory size: 1632.03 MB | Device: cuda:3
    Layer: model.layers.39 | Total parameters: 855,654,400 | Total memory size: 1632.03 MB | Device: cuda:3
    Layer: model.layers.40 | Total parameters: 855,654,400 | Total memory size: 1632.03 MB | Device: cuda:3
    Layer: model.layers.41 | Total parameters: 855,654,400 | Total memory size: 1632.03 MB | Device: cuda:3
    Layer: model.layers.42 | Total parameters: 855,654,400 | Total memory size: 1632.03 MB | Device: cuda:3
    Layer: model.layers.43 | Total parameters: 855,654,400 | Total memory size: 1632.03 MB | Device: cuda:3
    Layer: model.layers.44 | Total parameters: 855,654,400 | Total memory size: 1632.03 MB | Device: cuda:3
    Layer: model.layers.45 | Total parameters: 855,654,400 | Total memory size: 1632.03 MB | Device: cuda:3
    Layer: model.layers.46 | Total parameters: 855,654,400 | Total memory size: 1632.03 MB | Device: cuda:3
    Layer: model.layers.47 | Total parameters: 855,654,400 | Total memory size: 1632.03 MB | Device: cuda:3
    Layer: model.layers.48 | Total parameters: 855,654,400 | Total memory size: 1632.03 MB | Device: cuda:3
    Layer: model.layers.49 | Total parameters: 855,654,400 | Total memory size: 1632.03 MB | Device: cuda:3
    Layer: model.layers.50 | Total parameters: 855,654,400 | Total memory size: 1632.03 MB | Device: cuda:3
    Layer: model.layers.51 | Total parameters: 855,654,400 | Total memory size: 1632.03 MB | Device: cuda:3
    Layer: model.layers.52 | Total parameters: 855,654,400 | Total memory size: 1632.03 MB | Device: cuda:4
    Layer: model.layers.53 | Total parameters: 855,654,400 | Total memory size: 1632.03 MB | Device: cuda:4
    Layer: model.layers.54 | Total parameters: 855,654,400 | Total memory size: 1632.03 MB | Device: cuda:4
    Layer: model.layers.55 | Total parameters: 855,654,400 | Total memory size: 1632.03 MB | Device: cuda:4
    Layer: model.layers.56 | Total parameters: 855,654,400 | Total memory size: 1632.03 MB | Device: cuda:4
    Layer: model.layers.57 | Total parameters: 855,654,400 | Total memory size: 1632.03 MB | Device: cuda:4
    Layer: model.layers.58 | Total parameters: 855,654,400 | Total memory size: 1632.03 MB | Device: cuda:4
    Layer: model.layers.59 | Total parameters: 855,654,400 | Total memory size: 1632.03 MB | Device: cuda:4
    Layer: model.layers.60 | Total parameters: 855,654,400 | Total memory size: 1632.03 MB | Device: cuda:4
    Layer: model.layers.61 | Total parameters: 855,654,400 | Total memory size: 1632.03 MB | Device: cuda:4
    Layer: model.layers.62 | Total parameters: 855,654,400 | Total memory size: 1632.03 MB | Device: cuda:4
    Layer: model.layers.63 | Total parameters: 855,654,400 | Total memory size: 1632.03 MB | Device: cuda:4
    Layer: model.layers.64 | Total parameters: 855,654,400 | Total memory size: 1632.03 MB | Device: cuda:4
    Layer: model.layers.65 | Total parameters: 855,654,400 | Total memory size: 1632.03 MB | Device: cuda:4
    Layer: model.layers.66 | Total parameters: 855,654,400 | Total memory size: 1632.03 MB | Device: cuda:5
    Layer: model.layers.67 | Total parameters: 855,654,400 | Total memory size: 1632.03 MB | Device: cuda:5
    Layer: model.layers.68 | Total parameters: 855,654,400 | Total memory size: 1632.03 MB | Device: cuda:5
    Layer: model.layers.69 | Total parameters: 855,654,400 | Total memory size: 1632.03 MB | Device: cuda:5
    Layer: model.layers.70 | Total parameters: 855,654,400 | Total memory size: 1632.03 MB | Device: cuda:5
    Layer: model.layers.71 | Total parameters: 855,654,400 | Total memory size: 1632.03 MB | Device: cuda:5
    Layer: model.layers.72 | Total parameters: 855,654,400 | Total memory size: 1632.03 MB | Device: cuda:5
    Layer: model.layers.73 | Total parameters: 855,654,400 | Total memory size: 1632.03 MB | Device: cuda:5
    Layer: model.layers.74 | Total parameters: 855,654,400 | Total memory size: 1632.03 MB | Device: cuda:5
    Layer: model.layers.75 | Total parameters: 855,654,400 | Total memory size: 1632.03 MB | Device: cuda:5
    Layer: model.layers.76 | Total parameters: 855,654,400 | Total memory size: 1632.03 MB | Device: cuda:5
    Layer: model.layers.77 | Total parameters: 855,654,400 | Total memory size: 1632.03 MB | Device: cuda:5
    Layer: model.layers.78 | Total parameters: 855,654,400 | Total memory size: 1632.03 MB | Device: cuda:5
    Layer: model.layers.79 | Total parameters: 855,654,400 | Total memory size: 1632.03 MB | Device: cuda:5
    Layer: lm_head | Total parameters: 1,050,673,152 | Total memory size: 2004.00 MB | Device: meta

    Total number of parameters in the model: 70,553,706,496
    Total memory size of the model: 134570.52 MB

    Memory usage per device:
    Device: cuda:0 | Total memory used: 18324.31 MB
    Device: cuda:1 | Total memory used: 22848.44 MB
    Device: cuda:2 | Total memory used: 22848.44 MB
    Device: cuda:3 | Total memory used: 22848.44 MB
    Device: cuda:4 | Total memory used: 22848.44 MB
    Device: cuda:5 | Total memory used: 22848.45 MB
    Device: meta | Total memory used: 2004.00 MB
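As a sanity check, the per-layer and total parameter counts reported above can be reproduced by hand from the architecture. The sketch below is a back-of-the-envelope calculation, not part of the original script: the hyperparameters (hidden size, MLP width, head counts, vocabulary size) are assumptions taken from the publicly documented Llama 3.1 70B configuration, and the projection layers are assumed to have no bias terms.

```python
# Assumed hyperparameters (Llama 3.1 70B configuration; not stated in the post)
hidden = 8192          # hidden size
intermediate = 28672   # MLP intermediate size
n_layers = 80          # number of decoder layers
n_heads = 64           # attention heads
n_kv_heads = 8         # key/value heads (grouped-query attention)
vocab = 128256         # vocabulary size

head_dim = hidden // n_heads
kv_dim = n_kv_heads * head_dim

# Attention: q_proj and o_proj are hidden x hidden, k_proj and v_proj are hidden x kv_dim
attn = 2 * hidden * hidden + 2 * hidden * kv_dim
# MLP: gate_proj, up_proj, down_proj
mlp = 3 * hidden * intermediate
# Two RMSNorm weight vectors per decoder layer
norms = 2 * hidden

per_layer = attn + mlp + norms
embed = vocab * hidden       # model.embed_tokens
final_norm = hidden          # model.norm
lm_head = vocab * hidden     # lm_head

total = n_layers * per_layer + embed + final_norm + lm_head

# bfloat16 stores each parameter in 2 bytes
per_layer_mb = per_layer * 2 / 1024 / 1024

print(f"per layer: {per_layer:,} parameters, {per_layer_mb:.2f} MB")
print(f"model (embed_tokens + norm): {embed + final_norm:,}")
print(f"lm_head: {lm_head:,}")
print(f"total: {total:,}")
```

With these assumed values the numbers agree with the `model.layers.*`, `model`, `lm_head`, and total lines in the output, which suggests the script is counting every parameter exactly once, including the `lm_head` that was offloaded and shows up on the `meta` device.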