Understanding the LLaMA-2 Model Structure (2)

5. How do you print the model parameters?

When you work with a deep learning framework such as PyTorch, there are several ways to print a model's parameters. Here are some commonly used ones:

Method 1: Print all of the model's parameters

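This method iterates over every parameter of the model and prints it. That is workable for a small model. For a large model like Llama2, however, it produces a very large amount of output, even though PyTorch abbreviates big tensors with "...".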

Save the following as test03.py in the newsrc directory:

from transformers import AutoModelForCausalLM

# Specify the model path (a Hugging Face Hub model ID or a local directory)
model_path = "meta-llama/Llama-2-7b-chat-hf"

# Load the model
hf_model = AutoModelForCausalLM.from_pretrained(model_path)

for param_tensor in hf_model.parameters():
    print(param_tensor)

Run test03.py:

python newsrc/test03.py
Loading checkpoint shards: 100%|██████████| 2/2 [01:35<00:00, 47.67s/it]
Parameter containing:
tensor([[ 1.1921e-06, -1.7881e-06, -4.2915e-06,  ...,  8.3447e-07,
         -6.4373e-06,  8.9407e-07],
        [ 1.8387e-03, -3.8147e-03,  9.6130e-04,  ..., -9.0332e-03,
          2.6550e-03, -3.7537e-03],
        [ 1.0193e-02,  9.7656e-03, -5.2795e-03,  ...,  2.9297e-03,
          4.0817e-04, -5.0964e-03],
        ...,
        [-1.3550e-02, -3.5095e-03, -1.8921e-02,  ..., -9.3384e-03,
          8.7891e-03, -1.2741e-03],
        [-1.0681e-02,  8.9722e-03,  1.2573e-02,  ..., -3.3691e-02,
         -1.6235e-02,  3.0212e-03],
        [-9.0942e-03, -1.8082e-03, -6.9809e-04,  ...,  3.8452e-03,
         -1.2085e-02,  7.2861e-04]], requires_grad=True)
Parameter containing:
tensor([[-0.0060, -0.0146, -0.0021,  ...,  0.0042,  0.0018, -0.0035],
        [ 0.0142, -0.0043,  0.0032,  ..., -0.0092, -0.0108,  0.0073],
        [-0.0137,  0.0121,  0.0002,  ...,  0.0061,  0.0181, -0.0030],
        ...,
        [ 0.0018,  0.0093, -0.0006,  ...,  0.0092, -0.0289,  0.0085],
        [ 0.0249,  0.0116,  0.0035,  ..., -0.0322, -0.0165, -0.0111],
        [-0.0136, -0.0067,  0.0016,  ...,  0.0176,  0.0175, -0.0083]],
       requires_grad=True)
...
Parameter containing:
tensor([[-0.0022, -0.0255, -0.0085,  ..., -0.0349, -0.0154,  0.0232],
        [ 0.0415,  0.0025, -0.0031,  ..., -0.0028, -0.0156,  0.0137],
        [-0.0136, -0.0101,  0.0239,  ...,  0.0304, -0.0019,  0.0238],
        ...,
        [ 0.0040,  0.0322, -0.0021,  ...,  0.0102,  0.0062,  0.0210],
        [-0.0006,  0.0483, -0.0078,  ..., -0.0070,  0.0210,  0.0219],
        [ 0.0258, -0.0342,  0.0166,  ...,  0.0068,  0.0016,  0.0269]],
       requires_grad=True)
Parameter containing:
tensor([0.4824, 0.4805, 0.4336,  ..., 0.4258, 0.4531, 0.4785],
       requires_grad=True)
Parameter containing:
tensor([0.4297, 0.4297, 0.4355,  ..., 0.4180, 0.4043, 0.4238],
       requires_grad=True)
Parameter containing:
tensor([1.8594, 1.8516, 1.7969,  ..., 1.7109, 1.8125, 1.5938],
       requires_grad=True)
Parameter containing:
tensor([[-0.0036,  0.0027, -0.0074,  ...,  0.0039, -0.0084,  0.0065],
        [-0.0311,  0.0449, -0.0029,  ..., -0.0228,  0.0147,  0.0320],
        [-0.0125,  0.0014,  0.0188,  ..., -0.0264,  0.0156, -0.0073],
        ...,
        [-0.0294, -0.0172, -0.0029,  ...,  0.0140, -0.0116, -0.0234],
        [ 0.0204,  0.0239,  0.0272,  ...,  0.0048, -0.0097, -0.0064],
        [ 0.0081, -0.0057,  0.0082,  ..., -0.0282, -0.0164,  0.0311]],
       requires_grad=True)
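The amount of output follows directly from the model structure. The sketch below is an illustration and not part of the original scripts. It rebuilds the parameter names in the order that named_parameters() lists them in Method 2; the helper llama_param_names and the PER_LAYER list are hypothetical. It then counts how many tensors Method 1 prints and uses itertools.islice to take only the first few items of a generator.

Judging by that order, the last four tensors in the output above are most likely layer 31's two RMSNorm weights, then model.norm.weight, then lm_head.weight.

```python
from itertools import islice

# Parameter names inside one decoder layer, in the order Method 2 printed them
# (hypothetical helper list reconstructed from that output).
PER_LAYER = [
    "self_attn.q_proj", "self_attn.k_proj", "self_attn.v_proj", "self_attn.o_proj",
    "mlp.gate_proj", "mlp.up_proj", "mlp.down_proj",
    "input_layernorm", "post_attention_layernorm",
]


def llama_param_names(num_layers):
    """Yield parameter names in the order named_parameters() listed them in Method 2."""
    yield "model.embed_tokens.weight"
    for i in range(num_layers):
        for sub in PER_LAYER:
            yield f"model.layers.{i}.{sub}.weight"
    yield "model.norm.weight"
    yield "lm_head.weight"


names = list(llama_param_names(32))  # num_hidden_layers = 32 for Llama-2-7B
print(len(names))   # 1 + 32 * 9 + 2 = 291 tensors printed by Method 1
print(names[-4:])   # the last four entries of the listing

# islice stops after n items, so only the first few names are generated.
for name in islice(llama_param_names(32), 3):
    print(name)
```

hf_model.parameters() is also a generator, so the same pattern, for example islice(hf_model.parameters(), 3), should print only the first three tensors instead of all 291.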

Method 2: Print the name and shape of each parameter

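Instead of printing the values of each parameter, this method prints each parameter's name and shape. This is very helpful for understanding the model's structure.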

Save the following as test04.py in the newsrc directory:

from transformers import AutoModelForCausalLM

# Specify the model path (a Hugging Face Hub model ID or a local directory)
model_path = "meta-llama/Llama-2-7b-chat-hf"

# Load the model
hf_model = AutoModelForCausalLM.from_pretrained(model_path)

for name, param in hf_model.named_parameters():
    print(f"{name}: {param.size()}")

Run test04.py:

python newsrc/test04.py
Loading checkpoint shards: 100%|███████████████| 2/2 [01:36<00:00, 48.46s/it]
model.embed_tokens.weight: torch.Size([32000, 4096])
model.layers.0.self_attn.q_proj.weight: torch.Size([4096, 4096])
model.layers.0.self_attn.k_proj.weight: torch.Size([4096, 4096])
model.layers.0.self_attn.v_proj.weight: torch.Size([4096, 4096])
model.layers.0.self_attn.o_proj.weight: torch.Size([4096, 4096])
model.layers.0.mlp.gate_proj.weight: torch.Size([11008, 4096])
model.layers.0.mlp.up_proj.weight: torch.Size([11008, 4096])
model.layers.0.mlp.down_proj.weight: torch.Size([4096, 11008])
model.layers.0.input_layernorm.weight: torch.Size([4096])
model.layers.0.post_attention_layernorm.weight: torch.Size([4096])
model.layers.1.self_attn.q_proj.weight: torch.Size([4096, 4096])
model.layers.1.self_attn.k_proj.weight: torch.Size([4096, 4096])
model.layers.1.self_attn.v_proj.weight: torch.Size([4096, 4096])
model.layers.1.self_attn.o_proj.weight: torch.Size([4096, 4096])
model.layers.1.mlp.gate_proj.weight: torch.Size([11008, 4096])
model.layers.1.mlp.up_proj.weight: torch.Size([11008, 4096])
model.layers.1.mlp.down_proj.weight: torch.Size([4096, 11008])
model.layers.1.input_layernorm.weight: torch.Size([4096])
model.layers.1.post_attention_layernorm.weight: torch.Size([4096])
...
model.layers.31.self_attn.q_proj.weight: torch.Size([4096, 4096])
model.layers.31.self_attn.k_proj.weight: torch.Size([4096, 4096])
model.layers.31.self_attn.v_proj.weight: torch.Size([4096, 4096])
model.layers.31.self_attn.o_proj.weight: torch.Size([4096, 4096])
model.layers.31.mlp.gate_proj.weight: torch.Size([11008, 4096])
model.layers.31.mlp.up_proj.weight: torch.Size([11008, 4096])
model.layers.31.mlp.down_proj.weight: torch.Size([4096, 11008])
model.layers.31.input_layernorm.weight: torch.Size([4096])
model.layers.31.post_attention_layernorm.weight: torch.Size([4096])
model.norm.weight: torch.Size([4096])
lm_head.weight: torch.Size([32000, 4096])
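Every line above has the form name: torch.Size([...]), so the printed listing can be analysed on its own. The following sketch assumes that format; parse_shape_line and component_counts are hypothetical helpers. It parses the layer-0 lines shown above and splits that layer's parameter count between attention, MLP and normalization.

```python
import re
from collections import Counter
from math import prod

# Layer-0 lines copied from the test04.py output above.
LAYER0 = """\
model.layers.0.self_attn.q_proj.weight: torch.Size([4096, 4096])
model.layers.0.self_attn.k_proj.weight: torch.Size([4096, 4096])
model.layers.0.self_attn.v_proj.weight: torch.Size([4096, 4096])
model.layers.0.self_attn.o_proj.weight: torch.Size([4096, 4096])
model.layers.0.mlp.gate_proj.weight: torch.Size([11008, 4096])
model.layers.0.mlp.up_proj.weight: torch.Size([11008, 4096])
model.layers.0.mlp.down_proj.weight: torch.Size([4096, 11008])
model.layers.0.input_layernorm.weight: torch.Size([4096])
model.layers.0.post_attention_layernorm.weight: torch.Size([4096])
"""

LINE_RE = re.compile(r"^(\S+): torch\.Size\(\[([\d, ]*)\]\)$")


def parse_shape_line(line):
    """Return (name, shape tuple) for one 'name: torch.Size([...])' line."""
    match = LINE_RE.match(line.strip())
    if match is None:
        raise ValueError(f"unrecognised line: {line!r}")
    name, dims = match.groups()
    shape = tuple(int(d) for d in dims.split(",") if d.strip())
    return name, shape


def component_counts(text):
    """Sum element counts per component: 'self_attn', 'mlp' or the norm name."""
    counts = Counter()
    for line in text.splitlines():
        if not line.strip():
            continue
        name, shape = parse_shape_line(line)
        component = name.split(".")[3]  # model.layers.<i>.<component>...
        counts[component] += prod(shape)
    return counts


counts = component_counts(LAYER0)
total = sum(counts.values())
for component, n in counts.items():
    print(f"{component}: {n:,} ({n / total:.1%})")
print(f"layer total: {total:,}")
```

Each decoder layer holds 202,383,360 parameters, and roughly two thirds of them sit in the MLP. Across 32 layers that comes to 6,476,267,520. With the model actually loaded, sum(p.numel() for p in hf_model.parameters()) gives the model-wide total directly.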

Method 3: Print the parameters of specific layers

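If you only care about the parameters of particular layers, you can access those submodules directly and print their parameters. The script below works in three steps:

1. It lists the model's top-level submodules.
2. It lists the submodules of the inner LlamaModel.
3. It prints the parameters of the output layer lm_head.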

Save the following as test05.py in the newsrc directory:

from transformers import AutoModelForCausalLM

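# Specify the model path (a Hugging Face Hub model ID or a local directory)
model_path = "meta-llama/Llama-2-7b-chat-hf"

# Load the model
hf_model = AutoModelForCausalLM.from_pretrained(model_path)

# 1: top-level submodules of LlamaForCausalLM
print("1:")
for name, module in hf_model.named_children():
    print(f"{name}: {module.__class__.__name__}")

# 2: submodules of the inner LlamaModel (hf_model.model)
print("2:")
for name, module in hf_model.model.named_children():
    print(f"{name}: {module.__class__.__name__}")

# 3: parameters of the output projection layer lm_head
print("3:")
for name, param in hf_model.lm_head.named_parameters():
    print(f"{name}: {param.size()}")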

Run test05.py:

python newsrc/test05.py
Loading checkpoint shards: 100%|██████████| 2/2 [01:35<00:00, 47.92s/it]
1:
model: LlamaModel
lm_head: Linear
2:
embed_tokens: Embedding
layers: ModuleList
norm: LlamaRMSNorm
3:
weight: torch.Size([32000, 4096])
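The dotted names printed in Method 2 map directly onto attribute access. Each segment is a submodule attribute, and numeric segments index into the layers ModuleList. The first decoder layer is therefore hf_model.model.layers[0], and its parameters can be printed via hf_model.model.layers[0].named_parameters().

The helper to_attribute_expr below is hypothetical and works on strings only. It converts a printed name into the equivalent Python expression without touching the model. In reasonably recent PyTorch versions, Module.get_submodule() and Module.get_parameter() also accept such dotted paths directly.

```python
def to_attribute_expr(param_name, root="hf_model"):
    """Turn a dotted parameter name into the attribute expression that reaches it.

    Numeric segments (e.g. the '0' in 'layers.0') are ModuleList indices,
    so they become [i] instead of .i, which would not be valid Python syntax.
    """
    expr = root
    for part in param_name.split("."):
        expr += f"[{part}]" if part.isdigit() else f".{part}"
    return expr


print(to_attribute_expr("model.layers.0.self_attn.q_proj.weight"))
# prints: hf_model.model.layers[0].self_attn.q_proj.weight
print(to_attribute_expr("lm_head.weight"))
# prints: hf_model.lm_head.weight
```

Looping over hf_model.model.layers[0].named_parameters() with the same print statement as in test05.py should list that layer's nine parameters. Their names are relative to the layer, for example self_attn.q_proj.weight.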

6. Print the model's configuration

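The configuration holds the model's hyperparameters and settings. The config object usually contains all the options used to create or initialize the model, such as model size, vocabulary size, embedding dimension and the number of attention heads. This information is essential for understanding the model's capabilities and design.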

Save the following as test06.py in the newsrc directory:

from transformers import AutoModelForCausalLM

# Specify the model path (a Hugging Face Hub model ID or a local directory)
model_path = "meta-llama/Llama-2-7b-chat-hf"

# Load the model
hf_model = AutoModelForCausalLM.from_pretrained(model_path)

print(hf_model.config)

Run test06.py:

python newsrc/test06.py
Loading checkpoint shards: 100%|████████| 2/2 [01:35<00:00, 47.94s/it]
LlamaConfig {
  "_name_or_path": "meta-llama/Llama-2-7b-chat-hf",
  "architectures": [
    "LlamaForCausalLM"
  ],
  "attention_bias": false,
  "attention_dropout": 0.0,
  "bos_token_id": 1,
  "eos_token_id": 2,
  "hidden_act": "silu",
  "hidden_size": 4096,
  "initializer_range": 0.02,
  "intermediate_size": 11008,
  "max_position_embeddings": 4096,
  "model_type": "llama",
  "num_attention_heads": 32,
  "num_hidden_layers": 32,
  "num_key_value_heads": 32,
  "pretraining_tp": 1,
  "rms_norm_eps": 1e-05,
  "rope_scaling": null,
  "rope_theta": 10000.0,
  "tie_word_embeddings": false,
  "torch_dtype": "float16",
  "transformers_version": "4.38.2",
  "use_cache": true,
  "vocab_size": 32000
}

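The LlamaConfig output above shows the key configuration parameters of the Llama2 model, which give insight into its architecture. The most important ones are explained below: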

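_name_or_path: the path or name the model was loaded from, here "meta-llama/Llama-2-7b-chat-hf".
architectures: the model architecture class, here "LlamaForCausalLM", meaning this is a causal (autoregressive) language model.
attention_dropout: the dropout rate used in the attention layers. It is 0.0 here, so no dropout is applied.
bos_token_id and eos_token_id: the IDs of the special tokens that mark the beginning (BOS) and end (EOS) of a text sequence.
hidden_act: the activation function, here "silu" (Sigmoid Linear Unit, also known as the Swish activation). LLaMA applies it in the gated feed-forward layer (gate_proj).
hidden_size: the dimension of the hidden states, here 4096. It is also the width of each token embedding and of each decoder layer's output.
intermediate_size: the size of the intermediate layer of the feed-forward network, here 11008.
max_position_embeddings: the maximum number of positions, here 4096, i.e. the longest sequence length the model is designed to handle. This is a length, not an embedding dimension.
model_type: the model type, here "llama".
num_attention_heads: the number of heads in the attention mechanism, here 32.
num_hidden_layers: the number of decoder layers (Transformer blocks), here 32.
vocab_size: the size of the vocabulary, here 32000.
torch_dtype: the data type in which the checkpoint weights were saved, here "float16" (half precision). Half precision halves memory use compared with float32 and can speed up computation. Note that from_pretrained loads the weights as float32 by default unless torch_dtype is passed (for example torch_dtype=torch.float16 or torch_dtype="auto"). The Method 1 output shows no dtype in the tensor printout, which is consistent with float32 being PyTorch's default dtype.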
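Several of these fields combine into quantities that the config does not list directly. The sketch below copies the relevant values from the LlamaConfig output above into a plain dict. This is an assumption made so that the example runs without downloading the model; AutoConfig.from_pretrained(model_path) from transformers reads the same config.json without loading the weights.

The sketch derives the per-head dimension and the number of query heads that share each key/value head. It also computes the total parameter count implied by the layer shapes printed in Method 2. The function derived_sizes is hypothetical.

```python
# Values copied from the LlamaConfig printed by test06.py.
config = {
    "hidden_size": 4096,
    "intermediate_size": 11008,
    "num_attention_heads": 32,
    "num_hidden_layers": 32,
    "num_key_value_heads": 32,
    "tie_word_embeddings": False,
    "vocab_size": 32000,
}


def derived_sizes(cfg):
    """Compute head size, query heads per KV head and parameter count from a Llama config dict."""
    h = cfg["hidden_size"]
    head_dim = h // cfg["num_attention_heads"]
    kv_dim = cfg["num_key_value_heads"] * head_dim         # output width of k_proj / v_proj
    attn = 2 * h * h + 2 * h * kv_dim                       # q_proj, o_proj, k_proj, v_proj
    mlp = 3 * h * cfg["intermediate_size"]                  # gate_proj, up_proj, down_proj
    norms = 2 * h                                           # two RMSNorm weight vectors
    per_layer = attn + mlp + norms
    embed = cfg["vocab_size"] * h
    lm_head = 0 if cfg["tie_word_embeddings"] else embed    # shared with embed_tokens when tied
    total = embed + cfg["num_hidden_layers"] * per_layer + h + lm_head  # + h: final model.norm
    return {
        "head_dim": head_dim,
        "queries_per_kv_head": cfg["num_attention_heads"] // cfg["num_key_value_heads"],
        "params_per_layer": per_layer,
        "total_params": total,
    }


for key, value in derived_sizes(config).items():
    print(f"{key}: {value:,}")
```

queries_per_kv_head is 1 here because num_key_value_heads equals num_attention_heads. Every head therefore has its own keys and values, which is standard multi-head attention. The total of 6,738,415,616 parameters, about 6.74 billion, is where the "7b" in the model name comes from.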

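This overview reveals some key properties of Llama2, including its depth, its width and technical details of how it operates. For example, the depth of 32 layers, the hidden size of 4096 and the 32 attention heads together determine the model's capacity and complexity. This makes it suitable for complex language understanding and generation tasks. Half precision (float16) is used to optimize performance and resource usage, especially on hardware that supports half-precision arithmetic.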
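A rough estimate makes the memory benefit of half precision concrete: weight memory is approximately the number of parameters times the bytes per element. The sketch below assumes 6,738,415,616 parameters. That figure comes from summing the shapes listed in Method 2, assuming the layers hidden behind "..." match the ones shown. The estimate covers weights only and ignores activations, the KV cache and framework overhead. weight_memory_gib is a hypothetical helper.

```python
# Assumed total parameter count of Llama-2-7B (sum of the shapes listed in Method 2).
NUM_PARAMS = 6_738_415_616

# Bytes per element for common PyTorch floating-point dtypes.
BYTES_PER_ELEMENT = {"float32": 4, "float16": 2, "bfloat16": 2}


def weight_memory_gib(num_params, dtype):
    """Approximate memory for the weights alone, in GiB (2**30 bytes)."""
    return num_params * BYTES_PER_ELEMENT[dtype] / 2**30


for dtype in BYTES_PER_ELEMENT:
    print(f"{dtype}: {weight_memory_gib(NUM_PARAMS, dtype):.2f} GiB")
```

This gives roughly 25.1 GiB for float32 and 12.6 GiB for float16. That difference is why the dtype chosen at load time matters on machines with limited RAM or GPU memory.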
