AI | Open Source

Mistral Says Mixtral, Its New Open Source LLM, Matches or Outperforms Llama 2 70B and GPT3.5 on Most Benchmarks (mistral.ai)

Open source model startup Mistral AI released a new LLM last week with nothing but a torrent link. It has now offered some details about Mixtral, the new LLM. From a report: Mistral AI continues its mission to deliver the best open models to the developer community. Moving forward in AI requires taking new technological turns beyond reusing well-known architectures and training paradigms. Most importantly, it requires making the community benefit from original models to foster new inventions and usages.

Today, the team is proud to release Mixtral 8x7B, a high-quality sparse mixture-of-experts model (SMoE) with open weights, licensed under Apache 2.0. Mixtral outperforms Llama 2 70B on most benchmarks with 6x faster inference. It is the strongest open-weight model with a permissive license and the best model overall regarding cost/performance trade-offs. In particular, it matches or outperforms GPT3.5 on most standard benchmarks.

Mixtral has the following capabilities:
1. It gracefully handles a context of 32k tokens.
2. It handles English, French, Italian, German and Spanish.
3. It has strong performance in code generation.
4. It can be fine-tuned into an instruction-following model that achieves a score of 8.3 on MT-Bench.



Comments:
  • by Anonymous Coward

    the model is not open and it's yet another .ai domain that does not make its code or training data public. This is not even open source.

    • by WolphFang ( 1077109 ) <m.conrad.202@g m a i l .com> on Monday December 11, 2023 @11:55AM (#64073399)

      So... the model weights being published on huggingface don't count for anything? (https://huggingface.co/mistralai/Mixtral-8x7B-Instruct-v0.1)

      Also, the blog clearly states: Licensed under Apache 2.0.

    • by DrYak ( 748999 ) on Monday December 11, 2023 @12:13PM (#64073465) Homepage

      I am probably missing something, as I'm not in that field, but...

      the model is not open

      The /. summary literally begins by mentioning that the model was already made available via torrent.
      And currently the website has a download link [mistral.ai] mentioning a permissive license: Apache 2.0.

      and it's yet another .ai domain

      that's correct

      that does not make its code

      Mixtral is the model - the weights. It's the data itself (not the code).
      According to their release, it should be possible to use this data with vLLM (once a pull request has been merged).
      Others have reported running the 4-bit quantized weights on llama.cpp
      - i.e. there is openly available 3rd-party code you can use to run this data.
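      As a rough illustration (not from the release itself), this is what running the published weights with vLLM could look like once that pull request lands; the prompt and sampling settings below are made up:

          from vllm import LLM, SamplingParams

          # Load the open weights published on Hugging Face (linked above).
          llm = LLM(model="mistralai/Mixtral-8x7B-Instruct-v0.1")
          params = SamplingParams(temperature=0.7, max_tokens=128)

          # Generate a completion for a single prompt.
          outputs = llm.generate(["Explain a sparse mixture of experts in one paragraph."], params)
          print(outputs[0].outputs[0].text)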

      That model is also available on their own servers.
      Running it there instead of setting up your own does cost money, but that makes sense both from a business point of view (some money has to come in) and given the hardware/runtime costs (servers).

      or training data public.

      Yes, the training data set is described very vaguely: "Mixtral is pre-trained on data extracted from the open Web".

      This is not even open source.

      Basically, you could freely run Mixtral on your own server, using the data they provide links to (under a permissive license) and 3rd-party software to run it with.
      That's still much more open than what OpenAI has been doing with anything GPT-3 or later.

      Or are people expecting a release of the method itself used to train this model?
      I.e., the dataset and the code used during training, so people could train their own similar model by tweaking either the training set or the code that generates the model?
      (As opposed to doing further modifications to the already-trained model.)

      • by fph il quozientatore ( 971015 ) on Monday December 11, 2023 @12:33PM (#64073541)
        Aren't the pre-trained weights basically the equivalent of a compiled binary? You can run them as they are, but they are an unmodifiable black box, and you don't have access to the source material in case you want to tweak the process that generated them.
        By Stallman standards I wouldn't call this "open source" at all.
        • Aren't the pre-trained weights basically the equivalent of a compiled binary? You can run them as they are, but they are an unmodifiable black box, and you don't have access to the source material in case you want to tweak the process that generated them.

          Having access to the data and pipeline that produced a model is of no use to most people because they don't have the money or resources to execute it anyway.

          From the perspective of merging, tweaking, adapting, tuning... ad nauseam, a pre-trained model is going to be way more useful to most people than anything else. If your goal is to create a model capable of X, you can either train one from scratch or you can train an existing model to do what you want. The computational cost difference between starting from scratch and starting from an existing model is enormous.

          • I see no difference between source and binary: most people just need the binary, and will only download that. But you can distribute them both, and you should do that if you want to call your product "open source".

            And, in any case, people don't have the money or resources to train the model *today*. That's not an excuse not to distribute the source.

              • The core purpose of open source initiatives is that the users have the freedom to run, copy, distribute, study, change and improve the software.

              I see no difference between source and binary: {...} you can distribute them both, and you should do that if you want to call your product "open source".

              From my outsider's perspective, it seems that the norm with deep models (like the various Stable Diffusion variants or the present LLM) is exactly what the parent poster mentions: to "merg[e], tweak[], adapt[], tun[e]... ad nauseam pre-trained [models]".
              What you need is the weights, and a license that allows you to further tweak them.

              e.g.: if you want a specialized language model, you start from an existing general-purpose one and fine-tune it rather than training from scratch.

        • by ceoyoyo ( 59147 )

          No, they're modifiable. The point of having the trained weights is that you can train them for your application. They even mention it in the summary (#4).
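          To make that concrete, here is a hedged sketch of parameter-efficient (LoRA) fine-tuning with the Hugging Face transformers and peft libraries; the checkpoint name is the published one, while the adapter settings and the training-loop suggestion are assumptions for illustration:

              from transformers import AutoModelForCausalLM, AutoTokenizer
              from peft import LoraConfig, get_peft_model

              base = "mistralai/Mixtral-8x7B-v0.1"  # open weights published by Mistral
              tokenizer = AutoTokenizer.from_pretrained(base)
              model = AutoModelForCausalLM.from_pretrained(base, device_map="auto")

              # Train small low-rank adapter matrices instead of all of the expert weights.
              lora = LoraConfig(r=16, lora_alpha=32,
                                target_modules=["q_proj", "v_proj"],
                                task_type="CAUSAL_LM")
              model = get_peft_model(model, lora)
              model.print_trainable_parameters()
              # ...then run an ordinary training loop (e.g. transformers.Trainer) on your own data.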

          I'd argue it's open. It's not open *source*, because there's no source code; it's not scientifically open, because they don't give you all the information required to replicate it; and both Stallman and the OSI would hate it because of pillars, freedoms, etc. Like that PETA poster with the animals lined up: draw your line between edible and not.

          Objectively,

      • That seems like freeware to me rather than Free Open Source Software.
        You take the source code (training data), and the compiler (training algorithm), and you end up with a binary (the model weights). You only receive the model weights.

        • The difference is that, due to the vastly different resources needed:
          - most software enthusiasts study, change and improve software by editing the code and recompiling (few people do binary patching);
          - most AI enthusiasts study, change and improve models by tweaking, further training or mix-n-matching pre-trained models (very few people burn the gazillion cycles to do retraining from scratch).

          It's as if AI were in a strange "opposite world" where everyone and their dog can simply patch binaries, but it would take enormous resources to rebuild from source.

  • Seriously, who chooses these stories? Is there really nothing at all happening right now?

    • Seriously, who chooses these stories? Is there really nothing at all happening right now?

      You've been around here long enough to know the answer to that first question. Do you ever sample the Firehose and vote on submissions, or submit stories yourself? Granted, this place ain't what it used to be, and likely it will never be great again. But we can make it a bit better ourselves with a bit of effort.

  • Back in the day, Winamp also whipped the llama's ass...

  • Couldn't they have called it Mxyzptlk? At least then it would have some relation to their LLM.

  • Winamp [youtube.com], is that you?
  • by WaffleMonster ( 969671 ) on Monday December 11, 2023 @03:21PM (#64074079)

    I think it would be awesome to see much larger MoEs, say 32x7 instead of 8x7, if that would make a substantive difference in model quality, given that multi-channel DDR5 is way cheaper than VRAM. You can get reasonable performance on CPU this way and, where it makes sense, still offload the common layers to VRAM.
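    For what it's worth, that CPU-plus-partial-VRAM setup is roughly what the llama.cpp bindings already expose; a small sketch, where the GGUF filename, context size and layer count are assumptions rather than recommendations:

        from llama_cpp import Llama  # llama-cpp-python bindings

        llm = Llama(
            model_path="mixtral-8x7b-instruct-v0.1.Q4_K_M.gguf",  # hypothetical local 4-bit quantized file
            n_ctx=32768,       # Mixtral's 32k context window
            n_gpu_layers=20,   # offload some layers to VRAM, keep the rest in system RAM
        )
        out = llm("Q: What is a sparse mixture of experts?\nA:", max_tokens=128)
        print(out["choices"][0]["text"])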
