World's Fastest Supercomputer Coming To US in 2021 From Cray, AMD (cnet.com) 89

The "exascale" computing race is getting a new entrant called Frontier, a $600 million machine with Cray and AMD technology that could become the world's fastest when it arrives at Oak Ridge National Laboratory in 2021. From a report: Frontier should be able to perform 1.5 quintillion calculations per second, a level called 1.5 exaflops and enough to claim the performance crown, the Energy Department announced Tuesday. Its speed will be about 10 times faster than that of the current record holder on the Top500 supercomputer ranking, the IBM-built Summit machine, also at Oak Ridge, and should surpass a $500 million, 1-exaflops Cray-Intel supercomputer called Aurora to be built in 2021 at Argonne National Laboratory. There's no guarantee the US will win the race to exascale machines -- those that cross the 1-exaflop threshold -- because China, Japan and France each could have exascale machines in 2020. At stake is more than national bragging rights: It's also about the ability to perform cutting-edge research in areas like genomics, nuclear physics, cosmology, drug discovery, artificial intelligence and climate simulation.

  • They missed the extra $1 billion that will be needed for the nuclear power plant next door to keep the thing running!
  • Other than the "because we can" reason (which, sure, I suppose is reason enough to do it all by itself), how fast is fast enough? Several orgs are constantly racing to build ever-faster machines that can process more and more data. Constantly. Like every few years (at most) Oak Ridge announces a new plan for a new machine. Can they not get more use out of their current supercomputers? When is fast fast enough?

    • As far as computationally intensive problems go, it's turtles all the way up.

    • by godrik ( 1287354 ) on Tuesday May 07, 2019 @09:36AM (#58551788)

      Well, the lifespan of a supercomputer is about 5 years. Also, energy efficiency keeps on improving, and energy consumption is about 1/3 of the total cost of the system (the other two-thirds being roughly the system itself and manpower); a back-of-envelope energy sketch follows at the end of this comment. The machines are packed and scientists can definitely use the cycles. So it is not surprising that we keep building/refreshing them.

      A much harder question to answer is whether we need one 1-exaflop computer or 1,000 1-petaflop computers. A thousand 1-petaflop computers would be quite valuable and are easier to build.
      But there are scientific questions that you can only answer on a big system. I worked with physicists about 10 years ago, and some of their calculations required about 2PB of main memory. At the time, that was about half the memory of the biggest supercomputer in the US.
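
      As promised above, a back-of-envelope energy sketch in Python. The power draw and electricity price are illustrative assumptions only, not figures from this comment or the article; the point is simply that energy is a major, recurring line item over a machine's lifespan.

          # Hypothetical numbers, for illustration only.
          power_mw = 30.0          # assumed average power draw (MW) for an exascale-class machine
          years = 5                # typical lifespan, per the comment above
          price_per_kwh = 0.07     # assumed electricity price in dollars per kWh

          kwh = power_mw * 1000 * 24 * 365 * years     # MW -> kWh over the lifespan
          print(f"Energy over {years} years: ~${kwh * price_per_kwh / 1e6:.0f}M")
          # ~$92M with these assumptions; the actual share of total cost depends heavily on the inputs.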

      • by Chozabu ( 974192 )
        What kind of calculation were they doing that required about 2PB of main memory?
        • by Pascoea ( 968200 )

          What kind of calculation were they doing that required about 2PB of main memory?

          Probably one with some big numbers. But yes, I would also love to know the serious answer to that question. It makes my head hurt to think about crunching that much data in one sitting...

        • They were attempting to run a preview build of Microsoft Office 2020. Unfortunately, it went to swap and started thrashing.

        • by godrik ( 1287354 ) on Tuesday May 07, 2019 @12:58PM (#58553060)

          The basic answer is that they were trying to eigensolve a sparse matrix that is 2PB large. You don't want to go to disk on a matrix that big, or you waste all your time in I/O. Recomputing the matrix is too expensive, so you can't do that either. Just the vectors you need to keep in your eigensolver are dozens of GB in size. So really your only solution is to store the sparse matrix in (distributed) memory (a toy sketch of the pattern is at the end of this comment). Actually, that use case was one of those considered when designing what became burst-buffer technology.

          The matrix itself comes from the expansion of the Schrödinger equation for a particular atom when doing ab initio, no-core configuration-interaction calculations. (These are just words to me; I don't really know what they mean.) The number of rows and columns grows exponentially with the number of particles that make up the atom, and the number of non-zeros grows with the type of interactions (2-body, 3-body, ...) you consider. If you look at boron-10 (which is smaller than what they were interested in), with a Schrödinger equation truncated to a point that is probably not useful for doing physics (but helpful for running some tests), the matrix was about 1TB large and you need to keep 30GB of vectors.

          I am not a physicist, so I can't tell you much more than that, and I apologize if I misrepresented the physics aspect of the question. The paper is here: https://iopscience.iop.org/art... [iop.org] In case you wonder, I am author number 8.

          I am not sure they managed to run the 2PB case, because getting access to these machines is difficult. But they were looking at smaller atoms and running 100TB-scale problems.
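
          A toy sketch of that pattern using SciPy's iterative sparse eigensolver on a small random symmetric matrix. It only illustrates the shape of the computation (a sparse matrix plus a handful of dense vectors held in memory); it is not the actual physics code, which runs distributed across many nodes.

              import scipy.sparse as sp
              from scipy.sparse.linalg import eigsh

              # Tiny stand-in for the huge sparse symmetric matrix described above.
              n = 10_000
              a = sp.random(n, n, density=1e-3, format="csr", random_state=42)
              h = (a + a.T) * 0.5          # symmetrize, like a Hamiltonian

              # Lanczos-style iterative solve: only the sparse matrix and a few dense
              # vectors ever need to live in memory, which is why the real problem is
              # limited by (distributed) memory capacity rather than by disk.
              # (A real calculation would target the lowest-energy states, typically
              # with shift-invert; "LM" is used here just to keep the toy fast.)
              vals, _ = eigsh(h, k=5, which="LM")
              print("extreme eigenvalues:", vals)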

      • Please let's stop calling clusters "supercomputers".

    • Can they not get more use out of their current super-computers? When is fast fast enough?

      For the sorts of problems you use a supercomputer for, there is no useful limit to "fast enough". There are always problems that require more computational power than we currently possess, or that take long enough to solve on our current machines that those machines stay busy and thus unavailable for other problems while they crunch away. It's essentially an opportunity cost - by solving one problem you necessarily have to wait to solve another until the processor time is available. So faster machines let you get more problems solved in the same amount of time.

    • The sky is the limit with these things. You are talking about simulations where there really isn't such a thing as "enough" resolution.
      • by Anonymous Coward

        No, the sky is not the limit here. This is the point. Strong scaling on these systems is severely impacted by the communication of data between each node during a job. The more nodes, the greater this overhead. What is being attempted is to find ways to: A) make this overhead smaller, and B) make the computation portion of the code significantly more robust (not just faster) so that communication has less impact on overall runtimes. (A toy strong-scaling model is sketched at the end of this comment.)

        These things require large systems to test, along with new interconnect types.
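
        As mentioned, a toy strong-scaling model with made-up constants: total compute work is fixed, it gets divided across the nodes, and a per-node communication charge grows with the node count. None of the numbers are from a real system.

            compute_seconds = 1_000_000.0   # assumed total serial compute time (s)
            comm_per_node = 0.5             # assumed communication cost added per node (s)

            for nodes in (100, 1_000, 10_000, 100_000):
                runtime = compute_seconds / nodes + comm_per_node * nodes
                print(f"{nodes:>7} nodes: ~{runtime:,.0f} s")
            # Speedup stalls and then reverses once communication dominates, which is
            # why interconnects and communication-avoiding algorithms matter at this scale.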

        • by Trogre ( 513942 )

          Umm, I think the GP was talking about the sky being the limit for computational demand, not capacity.

          There are plenty of research disciplines that will happily saturate any supercomputer resources you throw at them.

    • by AHuxley ( 892839 )
      To simulate a nation's thermonuclear weapons.
      i.e., the massive amount of math needed for new climate change and agricultural "calculations".
  • It's also about the ability to perform cutting-edge research in areas like genomics, nuclear physics, cosmology, drug discovery, artificial intelligence and climate simulation.

    And cryptograp4\|&.,k@.
    no carrier

  • can it run Crysis? {ducks}
  • How quickly can it break the 128-bit encryption keys that most HTTPS sessions use today? Plug this baby into Echelon and let's see what we see!

    Oh wait - it's a "research system". /s
    • by Anonymous Coward

      You wouldn't use a general-purpose supercomputer for cracking TLS ciphers; it would be horribly inefficient. You'd use, at the very least, FPGAs, or dedicated silicon for the more common ciphers.

      Something like this:
      https://en.wikipedia.org/wiki/EFF_DES_cracker

      That thing crushes DES and was built by a couple of small companies and the EFF. You think the NSA would buy off-the-shelf hardware and announce it in a press release? They have their own chip fab, you know.

    • Oh wait - it's a "research system". /s

      huh?

      Yes it is. Such a system is a bit crap for cracking encryption keys.
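
      For scale, a back-of-envelope estimate in Python of brute-forcing a single 128-bit key at Frontier's quoted rate, generously (and unrealistically) assuming one key trial per floating-point operation:

          keyspace = 2 ** 128              # ~3.4e38 possible 128-bit keys
          rate = 1.5e18                    # Frontier's quoted 1.5 exaflop/s
          seconds = (keyspace / 2) / rate  # on average you search half the keyspace
          years = seconds / (3600 * 24 * 365)
          print(f"~{years:.1e} years")     # on the order of 3.6e12 years

      Even with that wildly generous assumption, it comes out to a few hundred times the age of the universe, which is the point: these machines are built for simulation, not key cracking.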

  • by Eravnrekaree ( 467752 ) on Tuesday May 07, 2019 @11:18AM (#58552488)

    The Cray supercomputers run a SUSE-based distribution. SUSE is very advanced and rock solid, with a fast package-management system gained during the Novell days, so it is well suited. Nearly all supercomputers run Linux, which makes it impressive that Linux runs everything from a smartphone to a supercomputer. With all of the scalability improvements Linux is getting with io_uring etc., and the improved flexibility, configurability, and control systemd offers, it's becoming a very flexible enterprise-grade OS.

    • by Anonymous Coward

      The Cray supercomputers run a SUSE-based distribution. SUSE is very advanced and rock solid, with a fast package-management system gained during the Novell days, so it is well suited. Nearly all supercomputers run Linux, which makes it impressive that Linux runs everything from a smartphone to a supercomputer. With all of the scalability improvements Linux is getting with io_uring etc., and the improved flexibility, configurability, and control systemd offers, it's becoming a very flexible enterprise-grade OS.

      Umm, no. The package system came from Ximian, which Novell bought. Systemd is a desktop product that should have stayed out of the enterprise.

    • systemd is garbage that Redhat squatted and shat on the enterprise world.

      The SLES-based "Cray Linux Environment" could work fine without it - in fact, better without it, from what I've seen of the SLES servers I have to manage. systemd is bloated, unnecessary crap in the enterprise space, and when things go wrong it impedes recovery and troubleshooting.

  • Even for CNET this is a bad article. I particularly enjoyed the "1.5 quintillion calculations per second, a level called 1.5 exaflops".

  • ... as demonstrated by Watson's dismal performance [businessinsider.com]

    IBM is shuttering its Watson AI tool for drug discovery

    • by godrik ( 1287354 )

      For sure. But the codes running on those machines are already using the best algorithms that we know for these problems. So you get to a point where going bigger is the only option. (Besides not computing it at all.)

  • It's running Linux, I presume?
