2025-12-08 14:48:25.729500 SYSTEM: Preparing env...
2025-12-08 14:48:26.393900 SYSTEM: Running...
2025-12-08 14:48:30.790700 /usr/bin/nvidia-smi
2025-12-08 14:48:30.791100 Excuting with GPU .
2025-12-08 14:48:30.855000 Clipping input data to the valid range for imshow with RGB data ([0..1] for floats or [0..255] for integers).
2025-12-08 14:48:30.867000 Clipping input data to the valid range for imshow with RGB data ([0..1] for floats or [0..255] for integers).
2025-12-08 14:48:30.946100 Clipping input data to the valid range for imshow with RGB data ([0..1] for floats or [0..255] for integers).
2025-12-08 14:48:30.957600 Clipping input data to the valid range for imshow with RGB data ([0..1] for floats or [0..255] for integers).
2025-12-08 14:48:31.049200 [WARNING] ME(59:140026271680320,MainProcess):2025-12-08-14:48:31.489.93 [mindspore/common/_decorator.py:40] 'TensorAdd' is deprecated from version 1.1 and will be removed in a future version, use 'Add' instead.
2025-12-08 14:48:38.251800 Complete the batch 1/36
2025-12-08 14:48:38.552200 Complete the batch 2/36
2025-12-08 14:48:38.640400 Complete the batch 3/36
2025-12-08 14:48:38.665300 Complete the batch 4/36
2025-12-08 14:48:38.836900 Complete the batch 5/36
2025-12-08 14:48:38.938600 Complete the batch 6/36
2025-12-08 14:48:39.040300 Complete the batch 7/36
2025-12-08 14:48:39.140400 Complete the batch 8/36
2025-12-08 14:48:39.235500 Complete the batch 9/36
2025-12-08 14:48:39.267200 Complete the batch 10/36
2025-12-08 14:48:39.379600 Complete the batch 11/36
2025-12-08 14:48:39.542200 Complete the batch 12/36
2025-12-08 14:48:39.644900 Complete the batch 13/36
2025-12-08 14:48:39.742600 Complete the batch 14/36
2025-12-08 14:48:39.771200 Complete the batch 15/36
2025-12-08 14:48:39.949000 Complete the batch 16/36
2025-12-08 14:48:40.052400 Complete the batch 17/36
2025-12-08 14:48:40.150100 Complete the batch 18/36
2025-12-08 14:48:40.265500 Complete the batch 19/36
2025-12-08 14:48:40.368900 Complete the batch 20/36
2025-12-08 14:48:40.752700 Complete the batch 21/36
2025-12-08 14:48:40.856400 Complete the batch 22/36
2025-12-08 14:48:40.949200 Complete the batch 23/36
2025-12-08 14:48:41.041500 Complete the batch 24/36
2025-12-08 14:48:41.133100 Complete the batch 25/36
2025-12-08 14:48:41.159800 Complete the batch 26/36
2025-12-08 14:48:41.251700 Complete the batch 27/36
2025-12-08 14:48:41.340200 Complete the batch 28/36
2025-12-08 14:48:41.370000 Complete the batch 29/36
2025-12-08 14:48:41.459200 Complete the batch 30/36
2025-12-08 14:48:41.542200 Complete the batch 31/36
2025-12-08 14:48:41.565900 Complete the batch 32/36
2025-12-08 14:48:41.645200 Complete the batch 33/36
2025-12-08 14:48:41.668500 Complete the batch 34/36
2025-12-08 14:48:41.752700 Complete the batch 35/36
2025-12-08 14:48:41.843000 Complete the batch 36/36
2025-12-08 14:48:45.939300 [HAMI-core Msg(59:140026271680320:libvgpu.c:837)]: Initializing.....
2025-12-08 14:48:46.144100 [HAMI-core Msg(59:140026271680320:libvgpu.c:856)]: Initialized
2025-12-08 14:48:48.461700 [HAMI-core Msg(59:140026271680320:memory.c:512)]: orig free=50195726336 total=50953846784 limit=8629780480 usage=170717856
2025-12-08 14:48:48.461800 [HAMI-core Msg(59:140026271680320:memory.c:512)]: orig free=50195726336 total=50953846784 limit=8629780480 usage=170717856
2025-12-08 14:48:48.461800 [HAMI-core Msg(59:140026271680320:memory.c:512)]: orig free=50195726336 total=50953846784 limit=8629780480 usage=170717856
2025-12-08 14:48:48.461800 [HAMI-core Msg(59:140026271680320:memory.c:512)]: orig free=50195726336 total=50953846784 limit=8629780480 usage=170717856
2025-12-08 14:48:48.786100 [HAMI-core Msg(59:140026172937984:memory.c:512)]: orig free=49121984512 total=50953846784 limit=8629780480 usage=1244459680
2025-12-08 14:48:48.791200 [HAMI-core Msg(59:140026172937984:memory.c:512)]: orig free=49121984512 total=50953846784 limit=8629780480 usage=1244459680
2025-12-08 14:58:29.606400 epoch: 1, time cost: 583.8056166172028, avg loss: 2.2416977882385254
2025-12-08 14:58:30.078100 epoch: 2, time cost: 0.30743908882141113, avg loss: 0.8196667432785034
2025-12-08 14:58:32.101800 epoch: 3, time cost: 0.41454625129699707, avg loss: 0.5328387022018433
2025-12-08 14:58:37.897600 epoch: 4, time cost: 0.568835973739624, avg loss: 0.4087119400501251
2025-12-08 14:58:39.444200 epoch: 5, time cost: 0.2193915843963623, avg loss: 0.33979251980781555
2025-12-08 14:58:44.097600 epoch: 6, time cost: 0.23917722702026367, avg loss: 0.2977721393108368
2025-12-08 14:58:44.508700 epoch: 7, time cost: 0.25797343254089355, avg loss: 0.2727084159851074
2025-12-08 14:58:44.909500 epoch: 8, time cost: 0.2284104824066162, avg loss: 0.258700430393219
2025-12-08 14:58:45.259200 epoch: 9, time cost: 0.19937777519226074, avg loss: 0.2505578398704529
2025-12-08 14:58:45.609600 epoch: 10, time cost: 0.20357894897460938, avg loss: 0.2481127828359604
2025-12-08 14:58:45.770800 validating the model...
2025-12-08 14:58:53.133300 {'acc': 0.8342013888888888, 'loss': 0.6257607564330101}
2025-12-08 14:58:53.173500 Chosen checkpoint is mobilenetv2-10.ckpt
2025-12-08 14:58:57.031200 加载模型路径: ./results/ckpt_mobilenetv2/mobilenetv2-10.ckpt
2025-12-08 14:58:58.486800 ./datasets/5fbdf571c06d3433df85ac65-momodel/garbage_26x100/val/00_01/00010.jpg Hats
2025-12-08 14:58:58.497800 ./datasets/5fbdf571c06d3433df85ac65-momodel/garbage_26x100/val/00_01/00037.jpg Hats
2025-12-08 14:58:58.507700 ./datasets/5fbdf571c06d3433df85ac65-momodel/garbage_26x100/val/00_01/00040.jpg Hats
2025-12-08 14:58:58.516400 ./datasets/5fbdf571c06d3433df85ac65-momodel/garbage_26x100/val/00_01/00055.jpg Hats
2025-12-08 14:58:58.524700 ./datasets/5fbdf571c06d3433df85ac65-momodel/garbage_26x100/val/00_01/00064.jpg Hats
2025-12-08 14:58:58.675500 /usr/bin/nvidia-smi
2025-12-08 14:58:58.675900 Excuting with GPU .
2025-12-08 14:59:03.697800 Traceback (most recent call last):
2025-12-08 14:59:03.697800   File "main.py", line 436, in <module>
2025-12-08 14:59:03.701500     print(predict(image_rgb))
2025-12-08 14:59:03.701600 NameError: name 'image_rgb' is not defined
2025-12-08 14:59:04.116700 [HAMI-core Msg(59:140026271680320:multiprocess_memory_limit.c:498)]: Calling exit handler 59
2025-12-08 14:59:04.761600 SYSTEM: Finishing...
2025-12-08 14:59:05.203800 SYSTEM: Error Exists!

📊 Main information in the log

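1. Environment ✅

  • GPU available: after locating /usr/bin/nvidia-smi, the script prints "Excuting with GPU ."

  • Start time: 2025-12-08 14:48:25

  • Framework: MindSpore. The log does not print a version number, only a deprecation warning that 'TensorAdd' should be replaced with 'Add'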

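2. Data visualization ⚠️

  • Clipping input data...: a matplotlib imshow warning that the RGB values fall outside the valid range ([0..1] for floats, [0..255] for integers). It does not affect training

  • The warning appears four times, so four images were presumably displayed; the log does not say which split they came from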

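3. Feature extraction ✅

  • All 36 batches completed (1/36 through 36/36)

  • About 0.1 s per batch on average: batches 1 through 36 finish within 3.6 s (14:48:38.25 to 14:48:41.84), which is fast on the GPU

  • Features are presumably saved to ./results/garbage_26x100_features/ (this path does not appear in the log itself)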

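4. GPU initialization and memory ✅

  • HAMI-core (libvgpu.c) reports the GPU memory state

  • Total memory: ~50.95 GB (total=50953846784 bytes)

  • Free memory: ~50.20 GB at first (free=50195726336), ~49.12 GB once training starts

  • Memory limit for this process: ~8.63 GB (limit=8629780480), so only part of the card is usable; usage rises from ~0.17 GB to ~1.24 GB

  • GPU initialized successfully ("Initialized")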
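
The byte counts in the HAMI-core lines are easier to read once converted. The following is a minimal sketch that assumes decimal gigabytes (1 GB = 10^9 bytes) for the figures quoted above; the variable and function names are illustrative and do not come from main.py.

```python
# Values copied from the "memory.c:512" lines in the log (all in bytes).
total_bytes = 50953846784
free_bytes = 50195726336
limit_bytes = 8629780480
usage_bytes = 1244459680  # usage reported after training starts


def to_gb(n_bytes):
    """Convert bytes to decimal gigabytes (1 GB = 10**9 bytes)."""
    return n_bytes / 1e9


def to_gib(n_bytes):
    """Convert bytes to binary gibibytes (1 GiB = 2**30 bytes)."""
    return n_bytes / 2**30


for name, value in [("total", total_bytes), ("free", free_bytes),
                    ("limit", limit_bytes), ("usage", usage_bytes)]:
    print(f"{name:>5}: {to_gb(value):6.2f} GB  ({to_gib(value):6.2f} GiB)")

# Share of the per-process limit that is already in use.
usage_fraction = usage_bytes / limit_bytes
print(f"usage / limit = {usage_fraction:.1%}")
```

The figure that matters for out-of-memory errors is the limit, not the card total: the process is capped well below the ~50 GB the device reports.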

5. Training (the core part) ✅✅✅

epoch: 1,  loss: 2.2417  # high loss in the first epoch (normal)
epoch: 2,  loss: 0.8197  # drops quickly
epoch: 3,  loss: 0.5328  # keeps dropping
epoch: 4,  loss: 0.4087
epoch: 5,  loss: 0.3398
epoch: 6,  loss: 0.2978
epoch: 7,  loss: 0.2727
epoch: 8,  loss: 0.2587
epoch: 9,  loss: 0.2506
epoch: 10, loss: 0.2481  # final loss

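Key observations

  • The loss falls from 2.24 to 0.25, a reduction of about 89%, so the model converges well

  • Epoch 1 is very slow (583.8 s), presumably because of one-time initialization

  • Later epochs are fast (0.2 to 0.6 s each)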
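
The figures in these observations can be recomputed directly from the epoch lines. The sketch below assumes the log format shown above ("epoch: N, time cost: T, avg loss: L"); the regular expression and variable names are illustrative and not part of main.py.

```python
import re

# Epoch lines copied from the log, with the timestamp prefix removed.
log_lines = [
    "epoch: 1, time cost: 583.8056166172028, avg loss: 2.2416977882385254",
    "epoch: 2, time cost: 0.30743908882141113, avg loss: 0.8196667432785034",
    "epoch: 3, time cost: 0.41454625129699707, avg loss: 0.5328387022018433",
    "epoch: 4, time cost: 0.568835973739624, avg loss: 0.4087119400501251",
    "epoch: 5, time cost: 0.2193915843963623, avg loss: 0.33979251980781555",
    "epoch: 6, time cost: 0.23917722702026367, avg loss: 0.2977721393108368",
    "epoch: 7, time cost: 0.25797343254089355, avg loss: 0.2727084159851074",
    "epoch: 8, time cost: 0.2284104824066162, avg loss: 0.258700430393219",
    "epoch: 9, time cost: 0.19937777519226074, avg loss: 0.2505578398704529",
    "epoch: 10, time cost: 0.20357894897460938, avg loss: 0.2481127828359604",
]

pattern = re.compile(r"epoch: (\d+), time cost: ([\d.]+), avg loss: ([\d.]+)")

# Each record is (epoch, seconds, average loss).
records = []
for line in log_lines:
    match = pattern.search(line)
    if match:
        records.append((int(match.group(1)),
                        float(match.group(2)),
                        float(match.group(3))))

first_loss = records[0][2]
last_loss = records[-1][2]
reduction = (first_loss - last_loss) / first_loss

# Epoch 1 is excluded because it includes the one-time startup cost.
later_times = [seconds for _, seconds, _ in records[1:]]

print(f"loss reduction: {reduction:.1%}")
print(f"epochs 2-10: {min(later_times):.2f} s to {max(later_times):.2f} s")
```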

6. Validation results 🎉

{'acc': 0.8342013888888888, 'loss': 0.6257607564330101}
  • Validation accuracy: 83.42%, a strong result

  • Validation loss: 0.6258 (higher than the training loss, which is normal)

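7. Checkpoint saving and inference test ✅

  • Checkpoint chosen: mobilenetv2-10.ckpt

  • Model loaded successfully from ./results/ckpt_mobilenetv2/mobilenetv2-10.ckpt

  • All five test images from val/00_01 are classified as "Hats", which is correct provided 00_01 is the Hats class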

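8. Final error ❌

NameError: name 'image_rgb' is not defined
  • Raised at main.py line 436 by print(predict(image_rgb)): the test code uses a variable name that was never defined

  • It does not affect the quality of the trained model, since it occurs after training, validation, and checkpoint selection

  • It is easy to fix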
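
The failure pattern and its fix can be reproduced in isolation. This is only a sketch: predict() and load_image() below are hypothetical stand-ins (the real predict() in main.py runs the network, and the log does not show how the image is loaded), and "example.jpg" is a placeholder path.

```python
def predict(image):
    """Hypothetical stand-in for predict() in main.py; always returns one label."""
    return "Hats"


def load_image(path):
    """Hypothetical loader; returns placeholder RGB data instead of reading a file."""
    return [[(0, 0, 0)]]


# The pattern from the traceback: the name is used before anything is bound to it.
caught = None
try:
    print(predict(image_rgb))
except NameError as err:
    caught = str(err)
    print("reproduced:", caught)

# The fix: bind the name first, then pass it to predict().
image_rgb = load_image("example.jpg")  # placeholder path, not from the log
label = predict(image_rgb)
print(label)
```

In the real script the equivalent change is to make sure the line before print(predict(image_rgb)) assigns image_rgb, or to pass the variable that already holds the loaded image.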

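🔍 Summary of key metrics

| Stage | Status | Assessment |
| GPU environment | ✅ OK | GPU acceleration in use |
| Feature extraction | ✅ Done | All 36 batches completed |
| Training convergence | ✅ Excellent | Loss reduced by 89% |
| Validation accuracy | ✅ 83.42% | Very good result |
| Inference test | ✅ Correct | 5/5 recognized correctly |
| Code error | ❌ Variable name | Simple fix |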

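⏱️ Time analysis

  1. Total: about 10 min 40 s (14:48:25 to 14:59:05)

  2. Feature extraction: about 11 s (14:48:31 to 14:48:42)

  3. Training for 10 epochs: about 10 minutes, of which epoch 1 alone takes 583.8 s

  4. Validation: about 7 s

  5. Inference test: about 1.5 s, including loading the checkpoint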
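
These durations follow from the timestamps at the start of each log line. The sketch below shows one way to compute them with the standard library; the choice of which lines mark the start and end of each phase is an assumption made here, not something the log states.

```python
from datetime import datetime

FMT = "%Y-%m-%d %H:%M:%S.%f"


def ts(text):
    """Parse a timestamp in the format used at the start of each log line."""
    return datetime.strptime(text, FMT)


# Timestamps copied from the log; the phase boundaries are an interpretation.
start = ts("2025-12-08 14:48:25.729500")         # SYSTEM: Preparing env...
first_batch = ts("2025-12-08 14:48:38.251800")   # Complete the batch 1/36
last_batch = ts("2025-12-08 14:48:41.843000")    # Complete the batch 36/36
val_start = ts("2025-12-08 14:58:45.770800")     # validating the model...
val_end = ts("2025-12-08 14:58:53.133300")       # {'acc': ..., 'loss': ...}
end = ts("2025-12-08 14:59:05.203800")           # SYSTEM: Error Exists!

total_seconds = (end - start).total_seconds()
batch_seconds = (last_batch - first_batch).total_seconds()
validation_seconds = (val_end - val_start).total_seconds()

print(f"total run:       {total_seconds:.1f} s")
print(f"batches 1 to 36: {batch_seconds:.1f} s")
print(f"validation:      {validation_seconds:.1f} s")
```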

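🎯 Assessment of your model's quality

Strengths:

  1. Good convergence: the training loss decreases in every epoch

  2. No severe overfitting: the validation loss is 0.6258 against a final training loss of 0.2481; the gap is noticeable, but validation accuracy remains high

  3. High accuracy: 83.42% is a strong result for what the dataset name garbage_26x100 suggests is a 26-class task

  4. Correct inference: all five test samples are predicted correctly, although five images from a single class is only a small check

Possible improvements:

  1. Epoch 1 is too slow (583.8 s), possibly because of initialization overhead

  2. The learning rate may be on the high side: the loss drops very quickly in the first epochs, though a fast early drop alone is weak evidence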

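💡 Suggested next steps

# Fix the variable-name error in the test code
# Change predict(image_rgb) to predict(image), or to whichever variable actually holds the loaded image

If you want a higher score:

# Try tuning the hyperparameters
config.epochs = 15  # train for a few more epochs
config.lr_max = 0.005  # lower the learning rate
config.decay_type = 'cosine'  # use cosine decay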
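
To see what a cosine schedule does to the learning rate, here is a minimal sketch of the general cosine-annealing formula. It is not necessarily how the script implements decay_type = 'cosine'. The function cosine_lr is hypothetical, and steps_per_epoch = 36 is an assumption borrowed from the 36 feature-extraction batches, since the log does not show the number of training steps per epoch.

```python
import math


def cosine_lr(step, total_steps, lr_max, lr_min=0.0):
    """Cosine annealing: start at lr_max and decay smoothly to lr_min."""
    progress = step / max(1, total_steps - 1)  # goes from 0.0 to 1.0
    return lr_min + 0.5 * (lr_max - lr_min) * (1 + math.cos(math.pi * progress))


epochs = 15            # matches config.epochs above
steps_per_epoch = 36   # assumption; not shown in the log
lr_max = 0.005         # matches config.lr_max above

total_steps = epochs * steps_per_epoch
schedule = [cosine_lr(step, total_steps, lr_max) for step in range(total_steps)]

# Print the learning rate at the start of every fifth epoch.
for epoch in range(0, epochs, 5):
    print(f"epoch {epoch + 1:2d}: lr = {schedule[epoch * steps_per_epoch]:.5f}")
print(f"final step: lr = {schedule[-1]:.5f}")
```

Compared with a constant rate, the schedule keeps the rate close to lr_max early on and shrinks it toward zero at the end, which is the usual motivation for pairing it with a few extra epochs.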
