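YOLOv5 retraining question

yong · December 4, 2023, 4:47 AM

Hello, I am training on images with yolov5.

My dataset has about 500,000 images, and training with the yolov5m model takes roughly 4 hours 30 minutes per epoch, so I would like to stop partway and continue training with the yolov5s model instead.
Is there a way to apply the weight file produced by training the m model to the yolov5s model and train from there?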

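bot · December 4, 2023, 4:58 AM

The answer below was automatically generated by OpenAI's GPT-4 model and may not be accurate.
If you spot an incorrect answer, please leave a correction in the comments.

Hello, and thank you for your question about training on images with yolov5.

To summarize: your dataset is very large, so training with yolov5m takes too long, and you want to switch to yolov5s. You are asking whether the weights already trained with yolov5m can be applied to yolov5s so that training can continue from them.

First, yolov5m and yolov5s differ in size and complexity, so weights trained with yolov5m cannot be used in yolov5s as they are. The two architectures are not the same: when the number of layers or their channel widths differ, the corresponding weights cannot be copied over directly.

However, part of the weights trained with yolov5m can be reused in yolov5s as a form of transfer learning, on the premise that features learned by a model's early layers tend to be general and remain useful. To do this, you must load only those yolov5m tensors whose name and shape both match a tensor in yolov5s.

The following example shows how to load a model's weights partially in PyTorch.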

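import torch
from models.yolo import Model  # run from the yolov5 repository root

num_classes = 10  # set to the number of classes in your dataset

# Load the trained yolov5m checkpoint and extract its state_dict
weights_m = torch.load('yolov5m_weights.pt', map_location='cpu')['model'].state_dict()

# Build a yolov5s model
model_s = Model(cfg='models/yolov5s.yaml', ch=3, nc=num_classes)

# Get the yolov5s model's state_dict
state_dict_s = model_s.state_dict()

# Keep only the yolov5m tensors whose name AND shape match a yolov5s tensor
new_state_dict = {k: v for k, v in weights_m.items()
                  if k in state_dict_s and v.shape == state_dict_s[k].shape}

# Load the selected weights into the yolov5s model
state_dict_s.update(new_state_dict)
model_s.load_state_dict(state_dict_s)

In the code above, 'yolov5m_weights.pt' is the checkpoint file trained with yolov5m, and Model(cfg='models/yolov5s.yaml', ...) builds the yolov5s model from the configuration file in the yolov5 repository. Adjust the paths and num_classes to match your own setup. Filtering on the key name alone is not enough: a tensor with the same name but a different shape makes load_state_dict raise a size mismatch error.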

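With this approach, the yolov5m tensors that match yolov5s are reused and everything else starts from its initial values. Because the two models use different channel widths, few tensors are likely to match, so whether this actually shortens training should be checked experimentally.

For more details, see the section on saving and loading models in the official PyTorch documentation.

I hope this answer helps; feel free to ask if you have further questions. My answer may be wrong, so I recommend consulting other sources as well.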


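bot · December 4, 2023, 5:05 AM

The answer below was automatically generated by Anthropic's Claude-2 model and may not be accurate.
If you spot an incorrect answer, please leave a correction in the comments.

Hello yong,

You mentioned that you are having difficulty training on images with yolov5: you were training yolov5m on a dataset of 500,000 images and stopped partway because each epoch took 4 hours 30 minutes.

As a possible workaround, you could try loading the weight file trained so far with yolov5m into a yolov5s model and continuing training from there, subject to the architecture caveat noted below. yolov5s is lighter than yolov5m, so training should be faster.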

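import torch

model = torch.hub.load('ultralytics/yolov5', 'custom', path='yolov5m.pt')  # load the checkpoint at this path (here, yolov5m weights)
model.yaml = 'yolov5s.yaml'  # overwrites a stored attribute only; the network itself is not rebuilt as yolov5s
model.train()  # set train mode

The intent of the code above is to load the yolov5m weights, switch the architecture to yolov5s, and set training mode. Note, however, that assigning to model.yaml does not rebuild any layers, so the loaded network keeps the yolov5m architecture and this snippet alone does not produce a yolov5s model.

In addition, because the two architectures are not identical, differences in layer sizes will cause errors when yolov5m weights are loaded into an actual yolov5s model. Transfer between models of similar size is generally more stable.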

I hope this helps. If you have further questions, please don't hesitate to leave a comment!


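bot · December 4, 2023, 5:06 AM

The answer below was automatically generated by OpenAI's GPT-4-Turbo model and may not be accurate.
If you spot an incorrect answer, please leave a correction in the comments.

Hello! I understand that you are training yolov5 on a large image dataset. Because training takes a long time, you want to switch to the smaller yolov5s model, and you would like to apply the weights already trained with yolov5m to yolov5s and continue training.

Because yolov5m and yolov5s have different structures, the weights cannot be carried over directly. Weights can only be transferred for the tensors the two models have in common, and the rest must be trained from scratch. PyTorch's load_state_dict accepts strict=False, which ignores missing and unexpected keys. It still raises an error when a key exists in both models with different shapes, so mismatched tensors must be filtered out before loading.

Below is example code that partially applies weights trained with yolov5m to a yolov5s model: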

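import torch
from models.yolo import Model  # run from the yolov5 repository root

num_classes = 10  # number of classes in your dataset

# Load the weights trained with yolov5m.
weights_m = torch.load('yolov5m_weights.pt', map_location='cpu')['model'].state_dict()

# Build the yolov5s architecture.
model_s = Model(cfg='models/yolov5s.yaml', ch=3, nc=num_classes)

# Keep only tensors whose name and shape both match yolov5s;
# strict=False alone still raises on shape mismatches.
state_dict_s = model_s.state_dict()
compatible = {k: v for k, v in weights_m.items()
              if k in state_dict_s and v.shape == state_dict_s[k].shape}

# Partially apply the yolov5m weights; strict=False tolerates the keys that were filtered out.
model_s.load_state_dict(compatible, strict=False)

# model_s can now be trained further.

In the code above, models/yolov5s.yaml is the configuration file that defines the yolov5s structure, and yolov5m_weights.pt is the weight file trained with yolov5m. The shape filter keeps only the tensors that fit yolov5s, and strict=False lets load_state_dict accept the keys that are then missing.

This approach keeps whatever yolov5m features can be transferred while training proceeds with the faster yolov5s model. How much it helps, and the resulting model performance, must be verified experimentally.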
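
How load_state_dict treats strict=False when shapes differ can be checked on a toy example. The sketch below assumes PyTorch is installed and uses two small stand-in networks rather than YOLOv5; make_net and filter_compatible are hypothetical helper names introduced only for illustration. The two networks share layer names but differ in channel width, which is the situation between a larger and a smaller YOLOv5 variant.

```python
import torch
from torch import nn


def make_net(width: int) -> nn.Sequential:
    """Toy stand-in: identical layer names for every width, different channel counts."""
    return nn.Sequential(
        nn.Conv2d(3, width, kernel_size=3),
        nn.BatchNorm2d(width),
        nn.Conv2d(width, 8, kernel_size=1),
    )


def filter_compatible(src_state: dict, dst_state: dict) -> dict:
    """Keep only source tensors whose key and shape both match the destination."""
    return {
        k: v for k, v in src_state.items()
        if k in dst_state and v.shape == dst_state[k].shape
    }


wide = make_net(64)    # plays the role of the larger model
narrow = make_net(32)  # plays the role of the smaller model

# strict=False ignores missing/unexpected keys, but a shape mismatch still raises.
try:
    narrow.load_state_dict(wide.state_dict(), strict=False)
    raised = False
except RuntimeError:
    raised = True

# After filtering by shape, loading succeeds, but very little is transferred.
kept = filter_compatible(wide.state_dict(), narrow.state_dict())
result = narrow.load_state_dict(kept, strict=False)

print("unfiltered load raised:", raised)
print("tensors transferred:", sorted(kept))
print("tensors left at initial values:", len(result.missing_keys))
```

In this toy case only the final bias and a BatchNorm counter survive the filter, because every other tensor depends on the channel width. As far as I recall, the YOLOv5 training script applies a similar name-and-shape intersection when it loads the file passed to --weights, so it may be worth checking how many items it reports as transferred before relying on this approach.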

For more information, see the official PyTorch documentation or the yolov5 GitHub repository.

PyTorch official documentation: PyTorch 2.6 documentation
yolov5 GitHub repository: ultralytics/yolov5 (YOLOv5 🚀 in PyTorch > ONNX > CoreML > TFLite)

I hope this answer helps; please feel free to ask if you have further questions.


yong · December 4, 2023, 8:20 AM

import torch
from models.yolo import Model

num_classes = 10  # set the number of classes for your dataset

# Load the yolov5l model instance
weights_l_model = torch.load('D:/yolov5/yolov5l_weight.pt', map_location='cpu')['model']

# Extract the state_dict of the yolov5l model instance
weights_l = weights_l_model.state_dict()

# Create the yolov5s model
model_s = Model(cfg='D:/yolov5/models/yolov5s.yaml', ch=3, nc=num_classes)

# Get the yolov5s model's state_dict
state_dict_s = model_s.state_dict()

# From the yolov5l state_dict, select only the weights of layers compatible with yolov5s
# e.g. when selecting only the initial layers' weights
compatible_layers = [k for k in state_dict_s.keys() if k in weights_l]
new_state_dict = {k: weights_l[k] for k in compatible_layers}

# Load the selected weights into the yolov5s model
model_s.load_state_dict(new_state_dict, strict=False)

# model_s can now be used to continue training.

When I ran the code above, I got:
RuntimeError Traceback (most recent call last)
Cell In[17], line 24
21 new_state_dict = {k: weights_l[k] for k in compatible_layers}
23 # Load the selected weights into the yolov5s model
---> 24 model_s.load_state_dict(new_state_dict, strict=False)
26 # model_s can now be used to continue training.

File C:\ProgramData\Anaconda3\envs\yolov5\lib\site-packages\torch\nn\modules\module.py:1223, in Module.load_state_dict(self, state_dict, strict)
1218 error_msgs.insert(
1219 0, 'Missing key(s) in state_dict: {}. '.format(
1220 ', '.join('"{}"'.format(k) for k in missing_keys)))
1222 if len(error_msgs) > 0:
-> 1223 raise RuntimeError('Error(s) in loading state_dict for {}:\n\t{}'.format(
1224 self.__class__.__name__, "\n\t".join(error_msgs)))
1225 return _IncompatibleKeys(missing_keys, unexpected_keys)

RuntimeError: Error(s) in loading state_dict for DetectionModel:
size mismatch for model.0.conv.weight: copying a param with shape torch.Size([64, 3, 6, 6]) from checkpoint, the shape in current model is torch.Size([32, 3, 6, 6]).
size mismatch for model.0.bn.weight: copying a param with shape torch.Size([64]) from checkpoint, the shape in current model is torch.Size([32]).
size mismatch for model.0.bn.bias: copying a param with shape torch.Size([64]) from checkpoint, the shape in current model is torch.Size([32]).
size mismatch for model.0.bn.running_mean: copying a param with shape torch.Size([64]) from checkpoint, the shape in current model is torch.Size([32]).
size mismatch for model.0.bn.running_var: copying a param with shape torch.Size([64]) from checkpoint, the shape in current model is torch.Size([32]).
size mismatch for model.1.conv.weight: copying a param with shape torch.Size([128, 64, 3, 3]) from checkpoint, the shape in current model is torch.Size([64, 32, 3, 3]).
size mismatch for model.1.bn.weight: copying a param with shape torch.Size([128]) from checkpoint, the shape in current model is torch.Size([64]).
size mismatch for model.1.bn.bias: copying a param with shape torch.Size([128]) from checkpoint, the shape in current model is torch.Size([64]).
size mismatch for model.1.bn.running_mean: copying a param with shape torch.Size([128]) from checkpoint, the shape in current model is torch.Size([64]).
size mismatch for model.1.bn.running_var: copying a param with shape torch.Size([128]) from checkpoint, the shape in current model is torch.Size([64]).
size mismatch for model.2.cv1.conv.weight: copying a param with shape torch.Size([64, 128, 1, 1]) from checkpoint, the shape in current model is torch.Size([32, 64, 1, 1]).
size mismatch for model.2.cv1.bn.weight: copying a param with shape torch.Size([64]) from checkpoint, the shape in current model is torch.Size([32]).
size mismatch for model.2.cv1.bn.bias: copying a param with shape torch.Size([64]) from checkpoint, the shape in current model is torch.Size([32]).
size mismatch for model.2.cv1.bn.running_mean: copying a param with shape torch.Size([64]) from checkpoint, the shape in current model is torch.Size([32]).
size mismatch for model.2.cv1.bn.running_var: copying a param with shape torch.Size([64]) from checkpoint, the shape in current model is torch.Size([32]).
size mismatch for model.2.cv2.conv.weight: copying a param with shape torch.Size([64, 128, 1, 1]) from checkpoint, the shape in current model is torch.Size([32, 64, 1, 1]).
size mismatch for model.2.cv2.bn.weight: copying a param with shape torch.Size([64]) from checkpoint, the shape in current model is torch.Size([32]).
size mismatch for model.2.cv2.bn.bias: copying a param with shape torch.Size([64]) from checkpoint, the shape in current model is torch.Size([32]).
size mismatch for model.2.cv2.bn.running_mean: copying a param with shape torch.Size([64]) from checkpoint, the shape in current model is torch.Size([32]).
size mismatch for model.2.cv2.bn.running_var: copying a param with shape torch.Size([64]) from checkpoint, the shape in current model is torch.Size([32]).
size mismatch for model.2.cv3.conv.weight: copying a param with shape torch.Size([128, 128, 1, 1]) from checkpoint, the shape in current model is torch.Size([64, 64, 1, 1]).
size mismatch for model.2.cv3.bn.weight: copying a param with shape torch.Size([128]) from checkpoint, the shape in current model is torch.Size([64]).
size mismatch for model.2.cv3.bn.bias: copying a param with shape torch.Size([128]) from checkpoint, the shape in current model is torch.Size([64]).
size mismatch for model.2.cv3.bn.running_mean: copying a param with shape torch.Size([128]) from checkpoint, the shape in current model is torch.Size([64]).
size mismatch for model.2.cv3.bn.running_var: copying a param with shape torch.Size([128]) from checkpoint, the shape in current model is torch.Size([64]).
size mismatch for model.2.m.0.cv1.conv.weight: copying a param with shape torch.Size([64, 64, 1, 1]) from checkpoint, the shape in current model is torch.Size([32, 32, 1, 1]).
size mismatch for model.2.m.0.cv1.bn.weight: copying a param with shape torch.Size([64]) from checkpoint, the shape in current model is torch.Size([32]).
size mismatch for model.2.m.0.cv1.bn.bias: copying a param with shape torch.Size([64]) from checkpoint, the shape in current model is torch.Size([32]).
size mismatch for model.2.m.0.cv1.bn.running_mean: copying a param with shape torch.Size([64]) from checkpoint, the shape in current model is torch.Size([32]).
size mismatch for model.2.m.0.cv1.bn.running_var: copying a param with shape torch.Size([64]) from checkpoint, the shape in current model is torch.Size([32]).
size mismatch for model.2.m.0.cv2.conv.weight: copying a param with shape torch.Size([64, 64, 3, 3]) from checkpoint, the shape in current model is torch.Size([32, 32, 3, 3]).
size mismatch for model.2.m.0.cv2.bn.weight: copying a param with shape torch.Size([64]) from checkpoint, the shape in current model is torch.Size([32]).
size mismatch for model.2.m.0.cv2.bn.bias: copying a param with shape torch.Size([64]) from checkpoint, the shape in current model is torch.Size([32]).
size mismatch for model.2.m.0.cv2.bn.running_mean: copying a param with shape torch.Size([64]) from checkpoint, the shape in current model is torch.Size([32]).
size mismatch for model.2.m.0.cv2.bn.running_var: copying a param with shape torch.Size([64]) from checkpoint, the shape in current model is torch.Size([32]).
size mismatch for model.3.conv.weight: copying a param with shape torch.Size([256, 128, 3, 3]) from checkpoint, the shape in current model is torch.Size([128, 64, 3, 3]).
size mismatch for model.3.bn.weight: copying a param with shape torch.Size([256]) from checkpoint, the shape in current model is torch.Size([128]).
size mismatch for model.3.bn.bias: copying a param with shape torch.Size([256]) from checkpoint, the shape in current model is torch.Size([128]).
size mismatch for model.3.bn.running_mean: copying a param with shape torch.Size([256]) from checkpoint, the shape in current model is torch.Size([128]).
size mismatch for model.3.bn.running_var: copying a param with shape torch.Size([256]) from checkpoint, the shape in current model is torch.Size([128]).
size mismatch for model.4.cv1.conv.weight: copying a param with shape torch.Size([128, 256, 1, 1]) from checkpoint, the shape in current model is torch.Size([64, 128, 1, 1]).
size mismatch for model.4.cv1.bn.weight: copying a param with shape torch.Size([128]) from c.......

This is the error I get.
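
A long mismatch list like this can be summarized with a short script instead of being read line by line. This is only a sketch: summarize_mismatches and parse_shape are hypothetical helper names, and the sample text is three lines copied from the error output above.

```python
import re

# Three lines copied from the error output above.
ERROR_TEXT = """\
size mismatch for model.0.conv.weight: copying a param with shape torch.Size([64, 3, 6, 6]) from checkpoint, the shape in current model is torch.Size([32, 3, 6, 6]).
size mismatch for model.1.conv.weight: copying a param with shape torch.Size([128, 64, 3, 3]) from checkpoint, the shape in current model is torch.Size([64, 32, 3, 3]).
size mismatch for model.3.bn.weight: copying a param with shape torch.Size([256]) from checkpoint, the shape in current model is torch.Size([128]).
"""

# Captures the parameter name, the checkpoint shape, and the current-model shape.
PATTERN = re.compile(
    r"size mismatch for (?P<name>[\w.]+): .*?torch\.Size\(\[(?P<ckpt>[\d, ]*)\]\)"
    r".*?torch\.Size\(\[(?P<model>[\d, ]*)\]\)"
)


def parse_shape(text: str) -> tuple:
    """Turn '64, 3, 6, 6' into (64, 3, 6, 6)."""
    return tuple(int(part) for part in text.split(",") if part.strip())


def summarize_mismatches(error_text: str) -> list:
    """Return (name, checkpoint_shape, model_shape, first-dimension ratio) per message."""
    rows = []
    for match in PATTERN.finditer(error_text):
        ckpt = parse_shape(match["ckpt"])
        model = parse_shape(match["model"])
        rows.append((match["name"], ckpt, model, ckpt[0] / model[0]))
    return rows


for name, ckpt, model, ratio in summarize_mismatches(ERROR_TEXT):
    print(f"{name}: checkpoint {ckpt} vs model {model} (x{ratio:g})")
```

For these three lines the first dimension of the checkpoint tensor is twice that of the model being loaded into, which points to a uniform difference in channel width rather than a problem with individual layers.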

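bongmo · December 4, 2023, 11:34 PM

I think you should first look at the network structures of YOLOv5s and YOLOv5m.
The link below shows how the networks are composed.

You will see that the layout is the same but depth_multiple and width_multiple differ. From this, the layers appear to be arranged the same way but with different channel sizes.

In other words, it looks difficult to use the m model's weights for training the s model.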
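
The effect of the two multipliers can be sketched numerically. The snippet below is an illustration under stated assumptions, not code taken from the repository: the multiplier values and the base channel and repeat counts are the ones I understand the yolov5s/m/l configuration files to use, and scaled_channels and scaled_repeats are hypothetical helpers that mimic the scaling rule (round the width up to a multiple of 8, round the repeat count with a minimum of 1).

```python
import math

# (depth_multiple, width_multiple); assumed to match the repository's model YAML files.
MULTIPLES = {
    "yolov5s": (0.33, 0.50),
    "yolov5m": (0.67, 0.75),
    "yolov5l": (1.00, 1.00),
}

BASE_CHANNELS = [64, 128, 256, 512, 1024]  # backbone conv output channels before scaling
BASE_REPEATS = [3, 6, 9, 3]                # backbone C3 repeat counts before scaling


def scaled_channels(base: int, width_multiple: float, divisor: int = 8) -> int:
    """Scale a channel count and round it up to a multiple of `divisor`."""
    return math.ceil(base * width_multiple / divisor) * divisor


def scaled_repeats(base: int, depth_multiple: float) -> int:
    """Scale a repeat count, never dropping below one block."""
    return max(round(base * depth_multiple), 1) if base > 1 else base


for name, (depth, width) in MULTIPLES.items():
    channels = [scaled_channels(c, width) for c in BASE_CHANNELS]
    repeats = [scaled_repeats(n, depth) for n in BASE_REPEATS]
    print(f"{name}: channels={channels} repeats={repeats}")
```

Under these assumptions yolov5s gets 32, 64 and 128 channels where the full-width model gets 64, 128 and 256, the same pairs that appear for model.0, model.1 and model.3 in the error above. The differing repeat counts also mean that some blocks exist in the larger model but have no counterpart in the smaller one.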