LazyMergeKit - Tensor model.final_layernorm.weight required but not present in model ... #49

Closed
Venkman42 opened this issue Feb 25, 2024 · 14 comments

Comments

@Venkman42

Hi there, I'm trying to merge Phi-2 models using the following config:
```
MODEL_NAME = "..."
yaml_config = """
models:
  - model: microsoft/phi-2
    # no parameters necessary for base model
  - model: rhysjones/phi-2-orange
    parameters:
      density: 0.5
      weight: 0.5
  - model: cognitivecomputations/dolphin-2_6-phi-2
    parameters:
      density: 0.5
      weight: 0.3
merge_method: ties
base_model: microsoft/phi-2
parameters:
  normalize: true
dtype: float16
"""
```

but I get the following error:

`RuntimeError: Tensor model.final_layernorm.weight required but not present in model rhysjones/phi-2-orange`

I tried with lxuechen/phi-2-dpo before, instead of phi-2-orange, but got the same error.

I'm executing on Google Colab with a CPU runtime and trust_remote_code set to true.

Can someone help and tell me if I'm doing something wrong, or if it just doesn't work with Phi?

Here is the full log:
```
mergekit-yaml config.yaml merge --copy-tokenizer --allow-crimes --out-shard-size 1B --lazy-unpickle --trust-remote-code
Warmup loader cache:   0% 0/3 [00:00<?, ?it/s]
Fetching 10 files: 100% 10/10 [00:00<00:00, 9925.00it/s]
Warmup loader cache:  33% 1/3 [00:00<00:00, 5.18it/s]
Fetching 11 files: 100% 11/11 [00:00<00:00, 71977.14it/s]
Warmup loader cache:  67% 2/3 [00:00<00:00, 5.58it/s]
Fetching 10 files: 100% 10/10 [00:00<00:00, 31583.61it/s]
Warmup loader cache: 100% 3/3 [00:00<00:00, 5.69it/s]
  0% 1/2720 [00:00<00:02, 1276.42it/s]
Traceback (most recent call last):
  File "/usr/local/bin/mergekit-yaml", line 8, in <module>
    sys.exit(main())
  File "/usr/local/lib/python3.10/dist-packages/click/core.py", line 1157, in __call__
    return self.main(*args, **kwargs)
  File "/usr/local/lib/python3.10/dist-packages/click/core.py", line 1078, in main
    rv = self.invoke(ctx)
  File "/usr/local/lib/python3.10/dist-packages/click/core.py", line 1434, in invoke
    return ctx.invoke(self.callback, **ctx.params)
  File "/usr/local/lib/python3.10/dist-packages/click/core.py", line 783, in invoke
    return __callback(*args, **kwargs)
  File "/content/mergekit/mergekit/options.py", line 76, in wrapper
    f(*args, **kwargs)
  File "/content/mergekit/mergekit/scripts/run_yaml.py", line 47, in main
    run_merge(
  File "/content/mergekit/mergekit/merge.py", line 90, in run_merge
    for _task, value in exec.run():
  File "/content/mergekit/mergekit/graph.py", line 191, in run
    res = task.execute(**arguments)
  File "/content/mergekit/mergekit/io/tasks.py", line 73, in execute
    raise RuntimeError(
RuntimeError: Tensor model.final_layernorm.weight required but not present in model rhysjones/phi-2-orange
```

@mlabonne
Owner

This problem might come from the fact that Microsoft changed the architecture after phi-2's release. The models that were fine-tuned still use the old one. It might work if you find a copy of the old base model. See the difference in mergekit's architecture definitions.
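
For illustration, here is roughly how the two layouts diverge. This is an approximate reconstruction based only on the tensor names that appear in the errors in this thread, not an exhaustive mapping:

```python
# Illustrative only: a few tensor names under the two phi-2 layouts.
# The "old" names are taken from errors quoted in this thread; the "new"
# equivalents are assumptions based on the reworked microsoft/phi-2 checkpoint.
OLD_LAYOUT = [                        # pre-rework fine-tunes (e.g. phi-2-orange)
    "transformer.h.17.mlp.fc1.bias",
    "lm_head.ln.weight",
]
NEW_LAYOUT = [                        # current microsoft/phi-2
    "model.layers.17.mlp.fc1.bias",
    "model.final_layernorm.weight",
]
```

The merge fails as soon as mergekit asks one model for a tensor name that only exists under the other layout.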

@Venkman42
Author

Thanks, I'll try it with the old version. Oddly, I got the same error when attempting a passthrough merge between phi-2 and a DeepSeek model, except there the error was for the DeepSeek model. Is it not possible to merge LLMs with different architectures using passthrough in general? Is there a blog post where you already go into this that I haven't seen?

@Venkman42
Author

> This problem might come from the fact that Microsoft changed the architecture after phi-2's release. The models that were fine-tuned still use the old one. It might work if you find a copy of the old base model. See the difference in mergekit's architecture definitions.

I just tried it again with amgadhasan/phi-2 as the base model, which should be the old phi-2, but now I got this error:
`RuntimeError: Tensor lm_head.ln.weight required but not present in model amgadhasan/phi-2`

Do I need to change a setting in LazyMergeKit so it pulls the configuration for the old phi-2?

@mlabonne
Owner

Looks like you still don't have the same tensors in all of your models. You can quickly check the names of your layers on the model card by clicking on the arrow next to "safetensors".
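
If you'd rather check this programmatically than through the model card, here is a minimal sketch (my own, not from the thread) that lists tensor names with huggingface_hub and safetensors. Note that it downloads the weight shards, and the repo id is just an example:

```python
# Minimal sketch: print every tensor name in a Hugging Face model repo.
# Assumes the repo stores its weights as .safetensors shards; downloading
# them can take a while for multi-GB models.
from huggingface_hub import hf_hub_download, list_repo_files
from safetensors import safe_open

repo_id = "rhysjones/phi-2-orange"  # example repo

shards = [f for f in list_repo_files(repo_id) if f.endswith(".safetensors")]
for shard in shards:
    path = hf_hub_download(repo_id, filename=shard)
    with safe_open(path, framework="pt") as f:
        for name in f.keys():
            print(name)  # e.g. transformer.h.0.mlp.fc1.weight vs model.layers.0.mlp.fc1.weight
```

Comparing the printed prefixes (`transformer.*` vs `model.*`) across the models you want to merge shows quickly whether they share a layout.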

@Venkman42
Author

> Looks like you still don't have the same tensors in all of your models. You can quickly check the names of your layers on the model card by clicking on the arrow next to "safetensors".

Thanks for the hint, I'll check that on my next attempt :)

@Venkman42
Author

> Looks like you still don't have the same tensors in all of your models. You can quickly check the names of your layers on the model card by clicking on the arrow next to "safetensors".

I got it to work with another Phi-2 model. I think you were right: the models I was merging had different tensor names (transformer...) than Microsoft's phi-2 (model....).

I got it to merge; now I'm trying to create GGUF files using your notebook, but I get the following error:
```
Filtering content: 100% (3/3), 5.17 GiB | 59.74 MiB/s, done.
Loading model file Phiter/model-00001-of-00003.safetensors
Loading model file Phiter/model-00001-of-00003.safetensors
Loading model file Phiter/model-00002-of-00003.safetensors
Loading model file Phiter/model-00003-of-00003.safetensors
Traceback (most recent call last):
  File "/content/llama.cpp/convert.py", line 1483, in <module>
    main()
  File "/content/llama.cpp/convert.py", line 1419, in main
    model_plus = load_some_model(args.model)
  File "/content/llama.cpp/convert.py", line 1280, in load_some_model
    model_plus = merge_multifile_models(models_plus)
  File "/content/llama.cpp/convert.py", line 730, in merge_multifile_models
    model = merge_sharded([mp.model for mp in models_plus])
  File "/content/llama.cpp/convert.py", line 709, in merge_sharded
    return {name: convert(name) for name in names}
  File "/content/llama.cpp/convert.py", line 709, in <dictcomp>
    return {name: convert(name) for name in names}
  File "/content/llama.cpp/convert.py", line 684, in convert
    lazy_tensors: list[LazyTensor] = [model[name] for model in models]
  File "/content/llama.cpp/convert.py", line 684, in <listcomp>
    lazy_tensors: list[LazyTensor] = [model[name] for model in models]
KeyError: 'transformer.h.17.mlp.fc1.bias'
ggml_init_cublas: GGML_CUDA_FORCE_MMQ:   no
ggml_init_cublas: CUDA_USE_TENSOR_CORES: yes
ggml_init_cublas: found 1 CUDA devices:
  Device 0: Tesla T4, compute capability 7.5, VMM: yes
main: build = 2270 (4804215c)
main: built with cc (Ubuntu 11.4.0-1ubuntu1~22.04) 11.4.0 for x86_64-linux-gnu
main: quantizing 'Phiter/phiter.fp16.bin' to 'Phiter/phiter.Q4_K_M.gguf' as Q4_K_M
llama_model_quantize: failed to quantize: failed to open Phiter/phiter.fp16.bin: No such file or directory
main: failed to quantize model from 'Phiter/phiter.fp16.bin'
ggml_init_cublas: GGML_CUDA_FORCE_MMQ:   no
ggml_init_cublas: CUDA_USE_TENSOR_CORES: yes
ggml_init_cublas: found 1 CUDA devices:
  Device 0: Tesla T4, compute capability 7.5, VMM: yes
main: build = 2270 (4804215c)
main: built with cc (Ubuntu 11.4.0-1ubuntu1~22.04) 11.4.0 for x86_64-linux-gnu
main: quantizing 'Phiter/phiter.fp16.bin' to 'Phiter/phiter.Q5_K_M.gguf' as Q5_K_M
llama_model_quantize: failed to quantize: failed to open Phiter/phiter.fp16.bin: No such file or directory
main: failed to quantize model from 'Phiter/phiter.fp16.bin'
```

Do you see what the problem is here?
The fp16.bin couldn't be created, but why?

I used the Colab from this blog article of yours:
https://mlabonne.github.io/blog/posts/Quantize_Llama_2_models_using_ggml.html

@mlabonne
Owner

Cool! You should be able to make GGUF versions of the model. Once again, maybe a problem with the old architecture? I can't really help you with that, unfortunately.

@Venkman42
Author

Oh okay, I guess I'll try my luck by asking the people who made the GGUFs for DolphinPhi and PhiOrange how they did it.

Thanks a lot for helping me troubleshoot :) Information for this kind of task is still hard to find, so I really appreciate you answering my rookie questions :)

@Venkman42
Author

I got it working now thanks to some help from another friendly Hugging Face user. I had to use an older version of llama.cpp, run convert-hf-to-gguf.py first, and then quantize the result with quantize (https://huggingface.co/brittlewis12/phi-2-orange-GGUF/discussions/1)
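
For anyone landing here later, a rough sketch of that workflow; the script name, flags, and paths are my assumptions and may differ between llama.cpp revisions:

```python
# Rough sketch: convert the merged HF checkpoint to an fp16 GGUF with an older
# llama.cpp checkout, then quantize it. Paths and flags are illustrative.
import subprocess

merged_dir = "Phiter"                               # output folder of the mergekit run
fp16_gguf = f"{merged_dir}/phiter.fp16.gguf"

# 1. HF safetensors -> fp16 GGUF (uses the HF-aware converter, not convert.py)
subprocess.run(
    ["python", "llama.cpp/convert-hf-to-gguf.py", merged_dir,
     "--outfile", fp16_gguf, "--outtype", "f16"],
    check=True,
)

# 2. fp16 GGUF -> Q4_K_M GGUF with llama.cpp's quantize binary
subprocess.run(
    ["llama.cpp/quantize", fp16_gguf, f"{merged_dir}/phiter.Q4_K_M.gguf", "Q4_K_M"],
    check=True,
)
```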

Thanks again for all your help. I finally have my first working merge now thanks to you :)
Feel free to check out my little Phiter https://huggingface.co/Venkman42/Phiter

I gave it a test run and so far I'm quite satisfied with the results. At least it doesn't seem to perform worse than the base models.

Would you kindly do me the honor of running an eval on it for YALL?

@mlabonne
Owner

Haha well done! Sure, running the eval now :)

@Venkman42
Author

Thank you :) I'm curious how it will score
Btw, how long do these evals usually take for smaller models? And what hardware do you run them on?

@mlabonne
Owner

Congrats, new SOTA among phi-2 fine-tunes: https://huggingface.co/spaces/mlabonne/Yet_Another_LLM_Leaderboard 🎉

So I just use LLM AutoEval. It took 2 hours and 18 minutes to evaluate Phiter on an RTX 3090.

@Venkman42
Author

Damn, I had a feeling it was good, but I didn't think it would outsmart both base models on all benchmarks and even outperform the phixtral models.

Oh okay, so it doesn't take that much compute. Maybe I'll try running it myself sometime.
Thanks again for taking the time 😊

I'm gonna close this issue now 😁

@Venkman42
Author

I credited you on my model card for helping me troubleshoot, I hope that's okay :)
