AI Open Source

Meet 'Smaug-72B': The New King of Open-Source AI (venturebeat.com) 37

An anonymous reader shares a report: A new open-source language model has claimed the throne of the best in the world, according to the latest rankings from Hugging Face, one of the leading platforms for natural language processing (NLP) research and applications.

The model, called "Smaug-72B," was released publicly today by the startup Abacus AI, which helps enterprises solve difficult problems in the artificial intelligence and machine learning space. Smaug-72B is technically a fine-tuned version of "Qwen-72B," another powerful language model that was released just a few months ago by Qwen, a team of researchers at Alibaba Group.

What's most noteworthy about today's release is that Smaug-72B outperforms GPT-3.5 and Mistral Medium, two of the most advanced proprietary large language models, developed by OpenAI and Mistral respectively, in several of the most popular benchmarks. Smaug-72B also surpasses Qwen-72B, the model from which it was derived, by a significant margin in many of these evaluations.

This discussion has been archived. No new comments can be posted.

Meet 'Smaug-72B': The New King of Open-Source AI

Comments Filter:
  • Oh, great (Score:5, Funny)

    by 93 Escort Wagon ( 326346 ) on Wednesday February 07, 2024 @03:03PM (#64222810)

    With a name like "Smaug", it can't help but eventually turn evil...

    Good job, guys.

  • More impressive (Score:5, Interesting)

    by ceoyoyo ( 59147 ) on Wednesday February 07, 2024 @03:13PM (#64222836)

    More impressive are the models that score almost as high, probably within error, and are 1/5th the size.

    • My question would be, how does one explain the difference in size? Is it possible that the smaller models ace the tests because their training happens to cover the things that are part of the test? Would these smaller models struggle with a broader range of test inputs or subject matter?

      • by ceoyoyo ( 59147 )

        The tests are pretty broad, but it is possible to specialize, and some of the models are explicitly specialized in certain areas. Leaderboards that don't keep their tests secret and allow multiple tries are mostly bullshit, but if you're going to play look at the pretty numbers you might as well look at all of them.

        These models are all derivatives of a few source designs, so there's not much difference in actual architecture. You can improve efficiency by choosing your training material and how it's presented.

    • Re:More impressive (Score:5, Informative)

      by Rei ( 128717 ) on Wednesday February 07, 2024 @10:32PM (#64223814) Homepage

      A word of caution about these leaderboards: you can train to the test. You can literally just download the eval datasets, reformat them, and then use them as datasets in your finetune. I don't trust these leaderboards much at all.

      Also, "open source" is kind of a stretch. It's based on Qwen, whose license is similar to the LLaMA 2 license: viral (generations can only be used to train derivatives of itself), and if any project becomes big enough, you have to negotiate with the owner (in this case, Alibaba), who holds all the cards, on licensing terms.
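      A quick sketch of the train-to-the-test problem described above: a hypothetical contamination check that flags training examples sharing long n-grams with an eval set. Function names, the n-gram length, and the toy data are all illustrative, not anyone's actual methodology.

```python
# Rough eval-contamination check: flag training examples that share a
# long n-gram with any eval question. Threshold (n=8) is illustrative.
def ngrams(text, n=8):
    toks = text.lower().split()
    return {" ".join(toks[i:i + n]) for i in range(len(toks) - n + 1)}

def contaminated(train_examples, eval_examples, n=8):
    eval_grams = set()
    for ex in eval_examples:
        eval_grams |= ngrams(ex, n)
    # Any training example sharing an n-gram with the eval set is suspect.
    return [ex for ex in train_examples if ngrams(ex, n) & eval_grams]

train = ["the quick brown fox jumps over the lazy dog every single day"]
evals = ["q: the quick brown fox jumps over the lazy dog means what"]
print(contaminated(train, evals))  # the training example is flagged
```

      Real leaderboard audits use similar n-gram or embedding-overlap heuristics; the point is only that such checks are possible, and their absence is what makes the rankings hard to trust.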

      • by Shaitan ( 22585 )

        "viral (generations can only be used to train derivatives of itself)"

        I'm skeptical of enforceability here. The copyright on output belongs to the user [or whoever is paying them] and not the owner of the IP on the machine. If it belongs to anyone at all given the copyright office's take that AI generated output isn't copyrightable.

        • by ceoyoyo ( 59147 )

          The copyright isn't on the output, it's on the model. The owner can restrict how you can license derivatives of the model. Same as software.

          • by Shaitan ( 22585 )

            "viral (generations can only be used to train derivatives of itself)"

            To clarify I'm interpreting generations here to refer to generated responses to prompts and not generations of the same model being loaded and then further trained because that would render the statement I'm replying to meaningless... generations in the latter sense ARE by definition derivatives of itself.

            "The copyright isn't on the output, it's on the model. The owner can restrict how you can license derivatives of the model. Same as software."

            • by Rei ( 128717 )

              A person can put (nearly) whatever terms of use they want on their product, including how you're allowed to use it and what you're allowed to do with things you make from it. And for anyone thinking, "Well, I'll just let some other person break the license and use what he created to train my model, then it's him that broke it, *I* never agreed to any license!", you should be aware that secondary liability is indeed a "thing"; you don't have to have a signed relationship with the infringer (it can be implicit).

              • by Shaitan ( 22585 )

                "A person can put (nearly) whatever terms of use they want on their product, including how you're allowed to use it and what you're allowed to do with things you make from it."

                Yes, but those terms aren't legally binding. If you sell shoes you can list how you'd like me to use them all day on the box, but none of that is binding; you have no authority to set conditions on how I use my property. One doesn't need a license for a product which is legally obtained. Copyright, on the other hand, deals with creative works.

                • by Rei ( 128717 )

                  A license is a contract. Legally. You have to agree to the contract to use the product (a product which otherwise you have no legal right to even possess). If you think you can just ignore parts of a contract you don't like, you have a nasty surprise coming for you if you're ever sued.

                  • by Shaitan ( 22585 )

                    Why do you think you need to agree to a contract in order to legally possess or use a work which is copyrighted? I'd venture just about everything in your home is covered under copyright protection. You don't need any sort of license from La-Z-Boy to own and sit in a La-Z-Boy chair, or to do anything else with the chair for that matter; even giving it away or selling it is covered under the first sale doctrine of copyright.

                    You only need a license if you want to engage in behavior which is restricted by copyright.

              • by Shaitan ( 22585 )

                Yes, but those terms aren't legally binding. If you sell shoes you can list how you'd like me to use them all day on the box, but none of that is binding; you have no authority to set conditions on how I use my property. One doesn't need a license for a product which is legally obtained. Copyright, on the other hand, deals with creative works; nobody owns them or has any rights by default. I had an idea for a hamster spatula... look, you have it now as well. You can't use it in some weird furry sex games!

                Not only do

            • by ceoyoyo ( 59147 )

              Facebook's don't-use-our-model-to-train-your-model terms are probably unenforceable, but who knows what a judge might think. Is training a model using the output of another model "studying the behaviour and output of existing software?" Do you really want to get sued by Facebook?

              If I were considering going down that route I think I'd just use Apple's unencumbered model instead.

              • by Shaitan ( 22585 )

                "Is training a model using the output of another model "studying the behaviour and output of existing software?" Do you really want to get sued by Facebook?"

                No, it is just studying the output of existing software.

                I have no interest in doing anything with their model. I just note and highlight things like this when I spot them to raise general awareness. Sadly, if otherwise unenforceable terms like this persist long enough they can be spun as standing and widespread industry practice and upheld de facto.

  • by Viol8 ( 599362 ) on Wednesday February 07, 2024 @03:55PM (#64222932) Homepage

    I know it's a trained neural net, but is that simply data or does it include the actual runtime execution software too? Or do they use all the same base software in the same sense that all linux binaries require the linux kernel to run?

    • Re:What is a model? (Score:5, Informative)

      by omnichad ( 1198475 ) on Wednesday February 07, 2024 @04:04PM (#64222950) Homepage

      They distribute the model in the standardized Safetensors format. You can actually run the model on any software that can load a text model like that. Most of them are Python-based because of the easier GPU compute access.

      • by neoRUR ( 674398 )

        It's also about 130 GB to download, so unless you have a powerful computer with a lot of memory, it's not really usable. There are other 7-billion-parameter models that are just as useful.

        • Sure, 7-billion-parameter models might be smaller. This one is 72 billion. I don't think the whole model has to be loaded into RAM with the way this is set up.

          • by Rei ( 128717 )

            Yes, as a general rule, the whole model has to be loaded (there are some setups out there for having layers out on disk, but if you think running on system memory is slow...).

            Note that you can get quantizations of most models, which are much, much smaller.
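            Back-of-envelope math shows why quantization matters at this scale (weights only; the KV cache and activations need memory on top of this):

```python
# Approximate weight storage for a 72B-parameter model at various precisions.
params = 72e9
for bits in (16, 8, 5, 4):
    gib = params * bits / 8 / 2**30
    print(f"{bits}-bit: ~{gib:.0f} GiB")
# 16-bit works out to ~134 GiB (the "about 130 Gig" download mentioned above);
# 5-bit is ~42 GiB, small enough to split between a GPU and system RAM.
```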

        • "powerful computer and memory" needs to be defined in some system requirements area.
          In general, I think these models need a lot of work to be made easier to understand and set up by people. I've had success with Stable Diffusion and its numerous models, running them locally to generate some nice images, but that's mostly thanks to the excellent step-by-step documentation I was able to find.

          I looked at Smaug's ancestor:

          How to Use

          # pip install transformers==4.35.2
          import torch
          from transformers import AutoModelForCausalLM, AutoTokenizer

          # Downloads the tokenizer and the full set of weights from the Hugging Face hub
          tokenizer = AutoTokenizer.from_pretrained("moreh/MoMo-72B-lora-1.8.7-DPO")
          model = AutoModelForCausalLM.from_pretrained(
              "moreh/MoMo-72B-lora-1.8.7-DPO"
          )

          That is all. I admit I understand almost none of it.

          • Powerful computer is an understatement at 72 billion parameters. There's a reason there are no simple how-to instructions: the only people running these are experts already. Even 24 GB of VRAM is probably half the minimum size.

        • This is like some weird slap in the face from the universe. My new PC has 128GB of memory. :)

        • It's also about 130 GB to download, so unless you have a powerful computer with a lot of memory, it's not really usable. There are other 7-billion-parameter models that are just as useful.

          You don't have to download 16-bit versions of the models if just doing inference. Grab the 5-bit quantized version that drops size down to only about 40 GB or so. There is very little quality difference between 16 and 5 bits.

          With something like llama.cpp you can split roughly 40 GB between RAM/VRAM so part runs on the GPU and the rest spills over to the CPU.

          If you don't have a huge system 70B models will be slow but people should at least still be able to use it without spending a fortune on hardware. Ove
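          The RAM/VRAM split described above might look like the following llama.cpp invocation. The model filename is hypothetical, and --n-gpu-layers should be tuned to whatever fits your VRAM:

```shell
# Offload roughly half the transformer layers to the GPU; the remainder
# runs on the CPU from system RAM. A 5-bit 72B model needs ~40+ GB total.
./main -m smaug-72b-q5_k_m.gguf --n-gpu-layers 40 -p "Hello, Smaug."
```

          More layers offloaded means faster generation, so raising --n-gpu-layers until VRAM is nearly full is the usual approach.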

    • by ceoyoyo ( 59147 )

      A model is a big equation. The parameters are the unknown variables. Training is estimating values for them. When somebody says "the model has 72 billion parameters and you can download it here" what they mean is that there are 72 billion of those variables, estimated values for them are stored in a file, and you can download that file.

      You also need some code to take those parameters, substitute them into the equation, and evaluate it. The evaluation is pretty standard, but you need a bit of code that actually specifies the particular equation.
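      As a toy version of the "big equation" idea: a two-parameter model y = w*x + b, with its estimated parameter values saved to and reloaded from a file. Names and values are illustrative; a real LLM has ~72 billion such parameters instead of two.

```python
import json

# Toy "model": the equation y = w*x + b with two parameters.
params = {"w": 3.0, "b": 1.0}           # "training" produced these estimates

with open("model.json", "w") as f:      # the downloadable file is just
    json.dump(params, f)                # the parameter values

with open("model.json") as f:           # inference code reloads them and
    p = json.load(f)                    # evaluates the equation

def predict(x):
    return p["w"] * x + p["b"]

print(predict(2.0))  # -> 7.0
```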

    • by Rei ( 128717 )

      Any inference software that supports Qwen. If you're looking for a web interface, try text-generation-webui.

  • I mean, this tech is no breakthrough, and it's not even usable in many contexts. If you also discount the contexts where it works but is not useful, not a lot remains.

  • With a name like that, why do I think of the Alien movies, and why oh why do I find that apt as applied to "AI" stuff?
