
Nvidia Pascal GP100 GPU To Rock 4 TFLOPS Double Precision, 12 TFLOPS Single Precision Processing Power (techtimes.com) 45

New information emerged regarding Nvidia's Pascal GPU, covering the total compute performance of the much-anticipated FinFET-based chip. Based on a number of slides from an independent researcher, the Nvidia Pascal GP100 features Stacked DRAM (1 TB/s) giving it as much as 12 TFLOPs of Single-Precision (FP32) compute performance. The flagship GPU is purportedly able to provide four TFLOPs of Double-Precision (FP64) compute performance as well.

  • by zenlessyank ( 748553 ) on Sunday February 21, 2016 @06:18PM (#51554041)
    Stick these in a dual processor 18 core Xeon board with some nice fiber channel flash storage and then we can really play some solitaire.
  • Shockwave flash has crashed after autoplaying an ad with music. Twice.

    Can someone link to a real website?

    • I see nothing, but I'm running Ghostery. Ghostery only detects 5 ad networks, and one of 'em is the Twitter button. The rest of the items seem harmless.

      You may, as suggested by AC below, want to remove some malware from your system.

      • I run Ghostery as well and I paused the blocking to test it out. At least one of the initial networks is responsible for loading other networks which seem to load other crap in turn. It didn't take more than 10 seconds before the count was over 100. At that point, stuff started auto-playing and making noise so I shut the tab, but I wouldn't be surprised if more shit kept getting pulled in. I don't know which is the bad apple, but it's pretty damned clear that it's out of control.
        • Eew...

          Yes, after a bit closer examination I clicked on one of the links in the ads that were reasonably well-behaved, and it led me straight into a number of sites registered at the nr. 1 destination for crooks and criminals - straight up fraudulent websites.

          So since they apparently don't mind that criminals advertise on their site, they probably don't mind that some of them have "drive-by payloads" either. It's probably just a number that pops up irregularly. Nice...

          That site is indeed best avoided.

  • by PhrostyMcByte ( 589271 ) <phrosty@gmail.com> on Sunday February 21, 2016 @06:52PM (#51554207) Homepage

    This new chip is potentially quite a large step up in raw compute performance. Their current flagship Titan X is pushing 6 TFLOPS of single-precision and 192 GFLOPS of double-precision.

    They're clearly aiming high for 4K and VR performance here.

    • by Arkh89 ( 2870391 )

      Note that both the Titan and the Titan Z have better DP performance than the Titan X (1 TFlops and 1.5 TFlops IIRC). I am hoping that they will stop crippling the DP on their "gaming" boards though (or at least do it to a lesser extent than the current 1/20~1/32x).
      Also, it is nice to see that the global memory bandwidth will go 4x from this generation (~250GB/s).

      • The Titan was 1.5 TFlops, while the Titan Black was 1.7 TFlops. The Z is listed at 2.7 TFlops, but it's two chips on a single card while the others are single-chip.

        I'd also like to see their gaming cards get better DP performance, but I'd be very surprised if we actually got the reported 1/4x. The Titan was 1/3x.

    • Re:For comparison (Score:5, Interesting)

      by thogard ( 43403 ) on Sunday February 21, 2016 @07:49PM (#51554503) Homepage

      Those numbers make it look like they were using a 32x32 hardware multiplier-adder and the new one uses a 64x64. Multiplying is a great example of how a 2x increase in transistor density from Moore's law can result in something far greater than a 2x real speed increase. To do a 64x64 multiply on an 8-bit CPU (like the 6809, which had an 8x8 multiply instruction) you would have to do 56 separate multiplies (for the significand) and then 16 sums before a number of other sums and shifts to get the exponent normalized. Each of those instructions would take 2 to 11 CPU cycles. A 16-bit hardware multiplier would reduce 56 mul operations to 16 and a 32-bit hardware multiplier would reduce it to 4. The barrel multiplier is often the largest structure in the ALU part of even a modern CPU. They show up on photos of modern chips as the largest rectangular area that isn't cache or memory controllers.
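
The scaling described above can be sketched with a plain schoolbook multiply that splits each operand into limbs the width of the hardware multiplier and counts the limb-by-limb products. This is only an illustration under simple assumptions: the function names are hypothetical, it models unsigned integers rather than a real floating-point unit, and it ignores the additions, shifts, and exponent handling. Under this model the counts are 16 for a 16-bit multiplier and 4 for a 32-bit one, as in the comment, while an 8-bit multiplier gives 64 for full 64-bit operands (49 for a 53-bit significand), so the figure of 56 is not reproduced here.

```python
def limb_count(operand_bits: int, limb_bits: int) -> int:
    """Number of limb_bits-wide pieces needed to hold operand_bits bits."""
    return -(-operand_bits // limb_bits)  # ceiling division


def schoolbook_multiply(a: int, b: int, operand_bits: int, limb_bits: int):
    """Multiply two unsigned operands using only limb_bits x limb_bits products.

    Returns (product, number_of_limb_multiplies). Assumes a and b both fit
    in operand_bits bits.
    """
    limbs = limb_count(operand_bits, limb_bits)
    mask = (1 << limb_bits) - 1
    product = 0
    multiplies = 0
    for i in range(limbs):
        a_limb = (a >> (i * limb_bits)) & mask
        for j in range(limbs):
            b_limb = (b >> (j * limb_bits)) & mask
            # One "hardware" multiply, shifted into position and accumulated.
            product += (a_limb * b_limb) << ((i + j) * limb_bits)
            multiplies += 1
    return product, multiplies


if __name__ == "__main__":
    a, b = 0xFFFFFFFFFFFFFFFF, 0x123456789ABCDEF0
    for width in (8, 16, 32, 64):
        result, count = schoolbook_multiply(a, b, 64, width)
        assert result == a * b
        print(f"{width:2d}-bit multiplier: {count} multiplies")
```

The count grows with the square of the number of limbs, which is why doubling the multiplier width cuts the multiply count by a factor of four.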

    • by Junta ( 36770 )

      Note that Maxwell consciously screwed DP performance. You have to go back to Kepler for decent DP.

    • by AHuxley ( 892839 )
      Yes, 4K will be the test at 60 fps, 120 and beyond. No more SLI needed :)
  • by PopeRatzo ( 965947 ) on Sunday February 21, 2016 @07:02PM (#51554275) Journal

    How many TFLOPS do I need to run the latest AAA games?

    All of them.

    • by Z80a ( 971949 )

      And how many to make em actually good?

      • by Shinobi ( 19308 )

        Depends on the game, doesn't it? With X-Plane and other sims, the answer is: ALL OF THEM! since you can make the flight models far more advanced, and also include more advanced radar simulation etc.

  • Can it run Crysis?

    At 8K?

  • A full beowulf cluster of those!

  • Unrelated metrics (Score:2, Insightful)

    by Anonymous Coward

    "Nvidia Pascal GP100 features Stacked DRAM (1 TB/s) giving it as much as 12 TFLOPs of Single-Precision (FP32) compute performance"

    Theoretical memory bandwidth has no impact on theoretical floating point performance.
    It would've been better to say something about the core count and clock.
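
A rough sketch of the point above: theoretical peak throughput is usually estimated from core count and clock, and memory bandwidth does not appear in the formula at all. The function name is hypothetical, and the example inputs (3072 cores at 1.0 GHz, the commonly listed Titan X figures) are assumptions that do not come from this thread. A fused multiply-add is counted as two floating-point operations per core per cycle.

```python
def peak_tflops(cores: int, clock_ghz: float, flops_per_core_per_cycle: int = 2) -> float:
    """Theoretical peak in TFLOPS from core count and clock alone.

    cores * clock_ghz gives billions of core-cycles per second; each cycle
    is assumed to retire one fused multiply-add, counted as 2 operations.
    """
    gflops = cores * clock_ghz * flops_per_core_per_cycle
    return gflops / 1000.0


if __name__ == "__main__":
    # Assumed Titan X figures (not from the thread): 3072 cores at 1.0 GHz.
    print(f"{peak_tflops(3072, 1.0):.3f} TFLOPS single precision")
```

With those assumed inputs the estimate comes out near the 6 TFLOPS quoted for the Titan X elsewhere in the thread.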

  • by hankwang ( 413283 ) on Monday February 22, 2016 @02:57AM (#51556591) Homepage
    FLoating-point Operations Per Second. It makes no sense to speak of one FLOP, two FLOPs, as the S is not for plural.
    • FLoating-point Operations Per Second. It makes no sense to speak of one FLOP, two FLOPs, as the S is not for plural.

      This comment is endorsed by Pedantic-Man(tm)!
