onert-micro training api #12996

chunseoklee · 2024-05-14T04:47:41Z

No description provided.

Signed-off-by: Chunseok Lee <chunseok.lee@samsung.com>

jhman-yun · 2024-05-20T06:05:58Z

onert-micro/onert-micro/include/onert-micro-train.h

+  // 4. check loss
+  float loss;
+  om_train_get_loss(ctx, 0, &loss);
+


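As discussed in the 5/17 Knox meeting, I'd like to make an additional request regarding EarlyStop, based on this PR.
As you know, EarlyStop is a feature that keeps tracking the global minimum loss at roughly this point. (The app could also configure which metric to use, loss or accuracy.)

If this step's loss has not improved on (gone below) the global minimum loss, increment a count.

If the count exceeds the number N specified by the app, stop training (early stop) even if the maximum epoch has not been reached -> further training is judged to be pointless.

If it does not add much burden in terms of complexity, schedule, or other costs, I think it would be structurally better to implement the EarlyStop feature at this level.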

I'd like to keep onert-micro as simple as possible, since each variant introduces more maintenance cost. IMHO, it is better to implement this at the platform (aifw) level. That is, onert-micro provides basic features like this API, and AIFW can provide high-level features by assembling them.
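For illustration, the EarlyStop rule described above could be sketched at the platform level roughly as follows. This is only a sketch under assumptions: `early_stop`, `early_stop_init`, and `early_stop_update` are hypothetical names that do not exist in onert-micro or AIFW, and resetting the count on improvement is an assumption the comment above does not spell out.

```c
#include <float.h>
#include <stdbool.h>

/* Hypothetical platform-level helper; not part of onert-micro. */
typedef struct {
  float best_loss; /* global minimum loss seen so far */
  int count;       /* steps without improvement */
  int patience;    /* N: limit specified by the app */
} early_stop;

static void early_stop_init(early_stop *es, int patience)
{
  es->best_loss = FLT_MAX;
  es->count = 0;
  es->patience = patience;
}

/* Feed one loss value; returns true when training should stop. */
static bool early_stop_update(early_stop *es, float loss)
{
  if (loss < es->best_loss)
  {
    es->best_loss = loss;
    es->count = 0; /* assumption: the count resets on improvement */
  }
  else
  {
    es->count++;
  }
  return es->count > es->patience;
}
```

In a training loop, the value read with om_train_get_loss(ctx, 0, &loss) from the draft header would be passed to early_stop_update after each step, and the loop would exit once it returns true.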

jhman-yun · 2024-05-20T06:59:47Z

onert-micro/onert-micro/include/onert-micro-train.h

+  }
+  else {
+    om_train_save_as_inferencemodel(ctx, PATH);
+  }


Couldn't every checkpoint be saved as an inference-capable artifact each time? I'm curious whether the cost difference is significant.

I'd like to unify the inference and checkpoint formats, but not now.
Moreover, once a checkpoint is converted into an inference model, it is no longer trainable.

But the app should validate the original model against the new model whenever a checkpoint happens.
How is that possible without an updated file? (IMO, at least some kind of temp file is necessary.)

om_train_save_as_inferencemodel(ctx, PATH); is for producing an inference model. To validate the current model (during training), you can use om_train_inference(om_context *context);

jhman-yun · 2024-05-20T07:04:10Z

onert-micro/onert-micro/include/onert-micro-train.h

+  // 4. check loss
+  float loss;
+  om_train_get_loss(ctx, 0, &loss);
+


The timing of checkpoint saves is usually determined by the user's settings, so I'm sharing this for reference.

```python
best_model_name = f"./inverter_model/{self.get_model_name()}_{str(now_ts)}.h5"

mc = ModelCheckpoint(best_model_name, monitor="val_loss", save_best_only=True, mode="min", verbose=0)
```

=> The checkpoint is saved to the best_model_name path only when the monitored "val_loss" is the best so far under the "min" criterion.
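The same save-best-only policy could be expressed on the C side along these lines. This is a hedged sketch, not an existing onert-micro or AIFW API: `monitor_mode`, `best_monitor`, `best_monitor_init`, and `best_monitor_should_save` are hypothetical names chosen for illustration.

```c
#include <float.h>
#include <stdbool.h>

/* Hypothetical counterpart of the "min"/"max" mode in the Python snippet. */
typedef enum { MONITOR_MIN, MONITOR_MAX } monitor_mode;

typedef struct {
  monitor_mode mode;
  float best; /* best monitored value seen so far */
} best_monitor;

static void best_monitor_init(best_monitor *m, monitor_mode mode)
{
  m->mode = mode;
  /* Start from the worst possible value so the first sample always wins. */
  m->best = (mode == MONITOR_MIN) ? FLT_MAX : -FLT_MAX;
}

/* Returns true when `value` is a new best, i.e. a checkpoint should be saved. */
static bool best_monitor_should_save(best_monitor *m, float value)
{
  bool improved = (m->mode == MONITOR_MIN) ? (value < m->best) : (value > m->best);
  if (improved)
    m->best = value;
  return improved;
}
```

The app would call best_monitor_should_save with each validation loss and write a checkpoint only when it returns true.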

chunseoklee · 2024-05-21T09:08:07Z

Let's try to use the same API as onert ( https://github.com/Samsung/ONE/blob/master/runtime/onert/api/nnfw/include/nnfw_experimental.h and https://github.com/Samsung/ONE/blob/master/runtime/onert/api/nnfw/include/nnfw.h ). I will update the PR soon.

Taejun-Kwon · 2024-05-22T01:34:16Z

onert-micro/onert-micro/include/onert-micro-train.h

+  }
+  else {
+    om_train_save_as_inferencemodel(ctx, PATH);
+  }


But the app should validate the original model against the new model whenever a checkpoint happens.
How is that possible without an updated file? (IMO, at least some kind of temp file is necessary.)

Taejun-Kwon · 2024-05-22T01:35:44Z

onert-micro/onert-micro/include/onert-micro-train.h

+ *
+ * @return  @c OM_STATUS_NO_ERROR if successful
+ */
+OM_STATUS om_train_compile(om_context *ctx);


Some more parameters need to be added, as in the API set I introduced in the last meeting (late, loss, metrics...).

I will add a config API like set_train_info. But note that this config is optional; basically, the circle model itself contains the training info (loss, ...).

Taejun-Kwon · 2024-05-22T01:37:46Z

onert-micro/onert-micro/include/onert-micro-train.h

+ *                        If it is nullptr, it will not change shape and batch size
+ * @return  @c OM_STATUS_NO_ERROR if successful
+ */
+OM_STATUS om_train_set_input(om_context *ctx, uint32_t index, const void *input, int size);


This can be combined with om_train_inference, I think.

As mentioned before, I'd like to keep one API per role, so I hope this API stays as it is. The only difference between your suggestion and the current function is a single om_train_set_input call.

chunseoklee · 2024-05-22T05:35:25Z

FYI, we will update the checkpoint API based on #12997.

chunseoklee · 2024-06-04T16:23:35Z

While switching to nnfw's API, I found that we need an extra (output buffer) copy to use nnfw_set_output on the onert-micro side: onert-micro uses an internal buffer for output, while onert uses an output buffer allocated by the user.
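To make the extra copy concrete, here is a minimal sketch of what such a step might look like. It assumes nothing about the real onert-micro internals: `copy_output_to_user` and its parameters are hypothetical and only illustrate moving data from a runtime-owned buffer into a user-allocated one.

```c
#include <stddef.h>
#include <string.h>

/*
 * Hypothetical helper: copy a runtime-owned output buffer into a
 * user-allocated buffer. Returns 0 on success, -1 if an argument is
 * NULL or the user buffer is too small.
 */
static int copy_output_to_user(const void *internal, size_t internal_size,
                               void *user, size_t user_size)
{
  if (internal == NULL || user == NULL || user_size < internal_size)
    return -1;
  memcpy(user, internal, internal_size);
  return 0;
}
```

A runtime that writes outputs directly into the user's buffer avoids this step, which is the difference between onert and onert-micro noted above.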

chunseoklee · 2024-06-07T11:07:30Z

The API implementation based on #13107 will be on https://github.com/chunseoklee/ONE/commits/v3

chunseoklee added the PR/NO TEST label (Tell CI to not run test) May 14, 2024

chunseoklee force-pushed the onert-micro-api-draft branch from 721ea7a to bcec239 May 14, 2024 10:20

onert-micro c api

332cabb

Signed-off-by: Chunseok Lee <chunseok.lee@samsung.com>

chunseoklee force-pushed the onert-micro-api-draft branch 3 times, most recently from dcbf274 to 6da9827 May 16, 2024 10:59

remove tensorinfo and tensor type

d09603b

chunseoklee force-pushed the onert-micro-api-draft branch from 6da9827 to d09603b May 16, 2024 10:59

jhman-yun reviewed May 20, 2024

Taejun-Kwon reviewed May 22, 2024

rename onert-micro-train.h to onert-micro.h

4c4e0c7

sync with https://github.com/chunseoklee/ONE/tree/v3

3ae5968

chunseoklee force-pushed the onert-micro-api-draft branch from f5fb6a7 to 3ae5968 June 10, 2024 02:52
