The Almighty Buck Bug Software

How To Lose $172,222 a Second For 45 Minutes 327

An anonymous reader writes "Investment firm Knight Capital made headlines in 2012 for losing over $400 million on the New York Stock Exchange because of problems with their algorithmic trading software. Now, the owner of a Python programming blog noticed the release of a detailed SEC report into exactly what went wrong (PDF). It shows how a botched update rollout combined with useless or nonexistent process guidelines cost the company over $172,000 a second for over 45 minutes. From the report: 'When Knight used the Power Peg code previously, as child orders were executed, a cumulative quantity function counted the number of shares of the parent order that had been executed. This feature instructed the code to stop routing child orders after the parent order had been filled completely. In 2003, Knight ceased using the Power Peg functionality. In 2005, Knight moved the tracking of cumulative shares function in the Power Peg code to an earlier point in the SMARS code sequence. Knight did not retest the Power Peg code after moving the cumulative quantity function to determine whether Power Peg would still function correctly if called. ... During the deployment of the new code, however, one of Knight's technicians did not copy the new code to one of the eight SMARS computer servers. Knight did not have a second technician review this deployment and no one at Knight realized that the Power Peg code had not been removed from the eighth server, nor the new RLP code added. Knight had no written procedures that required such a review.'"
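The cumulative quantity check quoted from the report can be illustrated with a small sketch. This is not Knight's code: the `ParentOrder` class, the `route_children` function, and the `max_children` cap are hypothetical names invented for illustration, and every child order is assumed to fill immediately.

```python
from dataclasses import dataclass


@dataclass
class ParentOrder:
    """A large order that is worked by sending smaller child orders."""
    symbol: str
    quantity: int      # total shares requested
    filled: int = 0    # cumulative quantity executed so far

    def remaining(self) -> int:
        return self.quantity - self.filled


def route_children(parent: ParentOrder, child_size: int,
                   check_cumulative: bool, max_children: int = 20) -> int:
    """Send child orders and return how many were sent.

    With check_cumulative=True, routing stops once the parent is filled.
    With check_cumulative=False, nothing ties routing to the fill count,
    so it only stops at the hypothetical max_children cap.
    """
    sent = 0
    while sent < max_children:
        if check_cumulative and parent.remaining() <= 0:
            break
        if check_cumulative:
            size = min(child_size, parent.remaining())
        else:
            size = child_size
        parent.filled += size  # assumption: every child fills immediately
        sent += 1
    return sent


checked = ParentOrder("XYZ", 1000)
unchecked = ParentOrder("XYZ", 1000)
sent_checked = route_children(checked, 100, check_cumulative=True)
sent_unchecked = route_children(unchecked, 100, check_cumulative=False)
```

Without the check, the only thing that stops the loop here is the artificial cap, which a real router would not necessarily have.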
This discussion has been archived. No new comments can be posted.

  • by Tablizer ( 95088 ) on Tuesday October 22, 2013 @07:17PM (#45207921) Journal

    See, the private sector can blow money faster than the public sector (ObamaCare site).

    • by TsuruchiBrian ( 2731979 ) on Tuesday October 22, 2013 @07:59PM (#45208169)
      The wealth that the money represented was not "lost", but rather redistributed. Efficient redistribution of wealth is a strength of the private sector, not a weakness. The private sector is good at redistributing money where it needs to go for economic growth. This company was not exercising an appropriate level of caution and its money was redistributed elsewhere.
      • by techno-vampire ( 666512 ) on Tuesday October 22, 2013 @08:19PM (#45208273) Homepage
        I think you're being a tad too literal here. As far as Knight was concerned the money was lost because they didn't have it any longer and they had nothing to show for it.
        • I was responding to a comment that suggested this was a failure of the private sector. I responded that this is an example of what the private sector is supposed to be doing, and actually does quite well. When you talk about 1 company in the private sector losing money, this doesn't mean the private sector as a whole lost wealth (e.g. what happened here). When you talk about the public sector losing money, it almost always translates to lost wealth.

          As a society we don't need to care what happens to some

          • Re: (Score:3, Insightful)

            by Waffle Iron ( 339739 )

            It is usually just wasted.

            By your own previous logic, money can never be "wasted", just redistributed.

            Every dollar that the government collects in taxes or creates out of thin air is redistributed back into the private sector.

            • by khallow ( 566160 )
              The mugger redistributes money as well. If value isn't created as part of the "redistribution" it's a waste.
          • by TubeSteak ( 669689 ) on Tuesday October 22, 2013 @11:24PM (#45209159) Journal

            As a society we don't need to care what happens to some random company.

            If JP Morgan collapsed tonight, we (as a society) would certainly care about what happens.
            Why? Because some "random companies" are so big that their troubles would shake the (inter)national economy.

            This is a great strength of the private sector, and this property is what is referenced by the phrase "the market is self-regulating".

            The market is not self regulating, unless it is self regulating towards oligopolies, oligopsonies, cartels, and general shittiness.
            Greenspan Concedes Error on Regulation [nytimes.com]
            October 23, 2008

            But on Thursday, almost three years after stepping down as chairman of the Federal Reserve, a humbled Mr. Greenspan admitted that he had put too much faith in the self-correcting power of free markets and had failed to anticipate the self-destructive power of wanton mortgage lending.

            "Those of us who have looked to the self-interest of lending institutions to protect shareholders equity, myself included, are in a state of shocked disbelief," he told the House Committee on Oversight and Government Reform.

            On a day that brought more bad news about rising home foreclosures and slumping employment, Mr. Greenspan refused to accept blame for the crisis but acknowledged that his belief in deregulation had been shaken.

            I could quote the entire article, hell his entire testimony. [scribd.com]
            There was no room in his ideology for private companies to intentionally abandon risk management and externalize the risk by selling it off.
            So despite his attempts to mince words, the results of his Ayn Randian ideology ended up being exactly what one would historically expect from not having meaningful regulation.

            • by khallow ( 566160 )

              The market is not self regulating, unless it is self regulating towards oligopolies, oligopsonies, cartels, and general shittiness.

              The market wasn't the problem in the Greenspan case. It was easy lending by the Federal Reserve combined with really high leverage. The first is the fault of the Fed, the very group led by Greenspan. The second is the fault of federal regulators who, let us note, are part of the largest monopoly in the world and not subject to market forces.

              So despite his attempts to mince words, the results of his Ayn Randian ideology ended up being exactly what one would historically expect from not having meaningful regulation.

              I suppose that might be true, but why would one expect markets to instantly compensate for changes in government regulation?

              There's also the problem of what would have

            • If JP Morgan collapsed tonight, we (as a society) would certainly care about what happens. Why? Because some "random companies" are so big that their troubles would shake the (inter)national economy.

              1. If our country is contingent upon a particular company to survive, then I don't think it fits the definition of "some random company".

              2. The reason we have companies like this is because they got preferential treatment (e.g. regulations that benefited them over their competitors, along with bailouts, etc)

              3. There is no reason that we need to have a system with too big to fail companies, we just don't yet have the political will to remove corrupt and inept politicians who help foster the current environme

      • It's like a Catholic catechism. No matter what data is presented, win loss or draw, it's evidence of how efficient the free market is. Well done.

      • by Errol backfiring ( 1280012 ) on Wednesday October 23, 2013 @08:53AM (#45211745) Journal
        In this on-line gambling system, money has lost all its real value. One of the very problems we have now is that a bit of on-line gambling can be traded for real work. And of course, there is so much money to be made with on-line gambling that the financial institutions left the real economy behind long ago. Nowadays, finance has almost nothing to do with the economy anymore.
    • Eh that's the point of capitalism. People who fail at things go out of business, and others who don't fail as much replace them. Maybe this would have put Knight out of business. Great, then another company can fill that spot. Yet what happens if the government fails? They can just raise taxes or create money to cover the gap, thus making everybody else in the nation poorer.
  • by mythosaz ( 572040 ) on Tuesday October 22, 2013 @07:18PM (#45207925)

    I'm sure someone will chime in and claim to be the em-effing Change MASTER! but this seems like an ordinary error, the sort I've seen a hundred times before, where one server is a tiny bit wonky, and during the change, something doesn't happen as expected. Normally it's an "inexpensive" error, where some of your VPN users get randomly disconnected... ...and sometimes it's the sort of error where you lose half a billion dollars an hour by HFT trading... ...badly.

    • by Qzukk ( 229616 )

      It's true that mistakes can be made and accidents can happen.

      It's also true that if you bother to try, you might fix some of those mistakes and catch those accidents before they happen.

      Knight Capital didn't even bother to try. At least they managed to find private investors to bail them out after the SEC refused.

      • Exactly right. It's like when the rovers landed on Mars years ago and one of them was down for nearly a month because NASA had never tested the OS on the rover for a full 30 days and the bug that took it out only arose after it had been running 30 days. It's an 80 billion dollar project and they didn't test the software for 30 days? Sometimes you just have to wonder what the fuck people are thinking.

    • by Alan Shutko ( 5101 ) on Tuesday October 22, 2013 @08:55PM (#45208441) Homepage

      There were a number of errors made here.

      1. They failed to deploy to one of eight servers
      2. They failed to automate the deployment to the servers such that it would be impossible to miss a server without knowing.
      3. They didn't have a step between code deployment and production activation where they could validate all 8 servers. For instance, in our company, we deploy the prod code to the prod servers but leave them in a "stage" environment, where the production load balancer doesn't hit those instances. Once we've validated, we then switch the load balancers to point to the correct instance.
      4. They failed to quickly back out a change when they realized it was having problems. In fact, they backed out the part on seven servers but not the flag that was being sent to the servers, which made things worse.
      5. They failed to have a risk-mitigation backstop in place which would have prevented these orders from being submitted once they hit a certain amount, and which was required by SEC Rule 15c3-5(c)(1)(i).

      There were a lot of places that you could put in a control to prevent or limit the effect of these kinds of errors, and that's the lesson people need to learn. Yes, mistakes happen! But try to make it hard to make a mistake, easy to recover from a mistake, and really easy to NOTICE a mistake.
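Point 5, the risk-mitigation backstop, could look roughly like the sketch below. It is only an illustration of "stop submitting orders once a threshold is hit": the `RiskBackstop` class, its fields, and the limit value are hypothetical, and nothing here is taken from the text of SEC Rule 15c3-5 or from Knight's systems.

```python
class ExposureLimitExceeded(Exception):
    """Raised when an order would breach the limit or trading is halted."""


class RiskBackstop:
    """Pre-trade check that halts all order flow once a notional limit is hit."""

    def __init__(self, max_gross_exposure: float):
        self.max_gross_exposure = max_gross_exposure
        self.gross_exposure = 0.0
        self.halted = False

    def check(self, shares: int, price: float) -> None:
        """Call before sending an order; raises instead of letting it through."""
        if self.halted:
            raise ExposureLimitExceeded("trading halted, manual reset required")
        notional = abs(shares) * price
        if self.gross_exposure + notional > self.max_gross_exposure:
            # Latch the halt so later orders are rejected too.
            self.halted = True
            raise ExposureLimitExceeded("order would exceed exposure limit")
        self.gross_exposure += notional


backstop = RiskBackstop(max_gross_exposure=1000.0)
backstop.check(10, 50.0)   # 500 of exposure, accepted
backstop.check(10, 50.0)   # 1000 of exposure, accepted (exactly at the limit)
try:
    backstop.check(1, 1.0)  # would push exposure past the limit
    rejected = False
except ExposureLimitExceeded:
    rejected = True
```

The halt is latched on purpose: once the limit trips, a human has to look before anything else goes out, which is the "easy to NOTICE" part.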

    • by raymorris ( 2726007 ) on Tuesday October 22, 2013 @11:07PM (#45209097) Journal

      To err is human. To screw up 100,000 things per second requires root.

  • by Anonymous Coward on Tuesday October 22, 2013 @07:18PM (#45207931)

    This level of trading does not do the market any good, and puts individual investors at a severe disadvantage against firms like this.

    It can be stopped. And it should be stopped. And the only reason it is not being stopped is because too many rich and powerful people make too much money on it.

    • This level of trading does not do the market any good, and puts individual investors at a severe disadvantage against firms like this.

      It can be stopped. And it should be stopped. And the only reason it is not being stopped is because too many rich and powerful people make too much money on it.

      I'd go as far as 10 seconds just to eliminate the possibility of assholes pulling shit with different rules for rounding, or horseshit like "our clock was slightly off", etc.

    • by Anonymous Coward on Tuesday October 22, 2013 @07:42PM (#45208067)

      Personally, I believe if you're going to buy stock in a company then you should be required to hold said stock for at least 24 hours, if not much longer. The stock market was created to allow people to invest their money in a company, thus allowing that company to use that money to grow, which should result in a return (or loss). It was not designed for gambling, which is what it has become.

    • by ShanghaiBill ( 739463 ) on Tuesday October 22, 2013 @07:45PM (#45208077)

      You are confused. "Algorithmic trading" and "High Frequency Trading" are two different things, used for two different purposes. This was caused by AT, and you are complaining about HFT. That makes no sense.

      • "SMARS is an automated, high speed, algorithmic router that sends orders into the market for execution"

        All "High Frequency Trading" is by definition "Algorithmic trading", though the reverse doesn't necessarily hold.

      • by mysidia ( 191772 )

        This was caused by AT, and you are complaining about HFT.

        If it was not HFT; then why were the orders not being presented for human approval?

      • It's all black magic to the Luddites, and thus must be banned.

        • "It's all black magic to the Luddites, and thus must be banned."

          Are you saying that a gang of "market makers" aka high priests aka railroad barons aka oligarchs, should run the economy because only they know the esoteric and unseen illusions in the system which they themselves engineered?

          "magic" indeed. sounds more and more like gaming the system every time i hear about it.

          • Are you saying that a gang of "market makers" aka high priests aka railroad barons aka oligarchs, should run the economy because only they know the esoteric and unseen illusions in the system which they themselves engineered?

            Translation: "I have no idea what I'm talking about, so I'll just throw around some big words to make myself look intelligent". (Protip: It has rather the opposite effect.)

    • Re: (Score:3, Interesting)

      by Anonymous Coward

      Just tax EVERY transaction. Done.

    • This is not HFT and as such a small delay would not have changed anything. It went on for 45 minutes, not 45 microseconds.

    • by Skapare ( 16644 )

      The method I have proposed to fix this is "cycle trading". Traders submit their buy/sell requests for a trade cycle that happens every minute. Each cycle's trades go to completion if possible, based on buy/sell amounts and prices (bid/ask). Requests will be left over if they cannot be bought or sold in that cycle. They can be flagged to hang in there for the next cycle, or canceled, or changed.

      But instead of a one minute cycle, let's do a one day cycle.

    • by mysidia ( 191772 )

      This level of trading does not do the market any good, and puts individual investors at a severe disadvantage against firms like this.

      Well; it does do the market good... it helps with eliminating inefficiencies.

      However; I am of the opinion that there should be a minimum "execution delay"; with all trades timestamped.

      Trades should be cleared at 90 second increments; with no trade younger than 120 seconds eligible to be executed. After 90 seconds have elapsed; the trade should become non-

    • by girlintraining ( 1395911 ) on Tuesday October 22, 2013 @08:50PM (#45208421)

      Eh, other posters have already pointed out that you're referencing high frequency trading, not algorithmic trading, so this is offtopic. Nonetheless... where exactly do you think this 1 second delay should be put in, and what would it accomplish? Make the wires "longer"; That would mean less contention for premier data centers in NYC. In one second, you can send a signal around the world five times over. But that doesn't help with the propagation of trade data from which the trades are based on; By adding all that extra lag only in terms of trade execution, but not market data, you're potentially putting billions of dollars at risk as trades are now following market data, instead of running concurrently with them. Think of it this way: You swipe your card to pay for gas. The price shown is $3.55. But when you start the pump, the price drops to $3.54. But you started the pump a second too late, so you're billed a penny more than the guy who waited a split second. Now, multiply this a few million times and suddenly you've got a market crisis. It's the same if you lag the market data but allow trades at full speed.

      Let's say you put this one second latency in for both sides; trade execution and market data. How exactly do you synchronize the data when the price itself is determined by trades -- you potentially have more trades waiting to be executed than you have shares... the price is now in some kind of weird state whereby it cannot be accurate until the trades are complete, yet as the trades complete the price is trading. Now you've turned a tiny amount of speculation into a massive amount of speculation. You've made the problem a thousand times worse!

      You see, no matter where you put in your "one second delay", you're reducing liquidity, increasing costs, and causing money to be lost out of the system. Your idiotic attempt to help the "little guy" has resulted in utter chaos at best, and only made it harder for him at worst!

      High volume trade is just margin trading; Buying low and selling high. Now there's a lot of macroeconomic theory to go into what I say next, too much for a slashdot post, but fundamentally... the more trade there is, the more wealth there is. Lots of trades mean the market is healthy. It means money is moving... and the more money moves, the more it trades hands, the more value that money has. The only time money loses value is when it sits in an account doing nothing. It's like potential energy versus kinetic energy. You cannot harness the power of something that isn't moving.

      Every time I hear people bitch about high volume trading and "the little guy" I die a little inside; It shows a shocking lack of understanding of how markets actually operate, and how these sorts of trades benefit everyone by improving liquidity. The last economic crisis, in fact, the core of all economic crises, is the lack of money moving. You can't invest because nothing is producing. You can't produce because nobody's investing. These kinds of Mexican standoffs are what lead to recessions and depressions. Liquidity is at the very heart of any boom, and its absence at the heart of every bust.

      The reason why the "rich and powerful" have created a wealth gap is because money isn't trading hands. There's no trade going on -- the middle class isn't buying anything new, they're just paying off old debts. The upper class are the only ones with any liquidity, and they're holding onto money because there's nothing to invest in; If nobody's buying anything, what then is the point of investment? There's no return then. And the poor... they can't invest. They're living hand to mouth, paycheck to paycheck... economically, they're useless. They'll spend every dollar they're handed on the same things every day -- food, shelter, clothing, gas, rent... these things are essential to daily life, but they don't grow an economy. To get economic growth, you need people buying laptops, cars, services, luxury items.

      And what started all of this? Ironically, it was a small segment of the population -- th

      • I wouldn't add lag, I'd bucket it. 2 second windows. All trades in a 2-second window are collected by the exchange but not executed yet. When the window closes you take all the collected trades, shuffle them into a random order and execute them as if received in that order. Then wait for the next window to close and the next batch of trades to be processed.

        Yes, this is going to thoroughly screw over anyone trying to take advantage of changes in prices over sub-2-second timeframes. The sensible reaction to t
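The bucketing idea in this comment (collect for 2 seconds, shuffle, then execute) can be sketched as follows. This is a toy under stated assumptions: orders are simple `(timestamp, order_id)` pairs, the `bucket_orders` name is made up, and matching or execution logic is left out entirely.

```python
import random
from collections import defaultdict


def bucket_orders(orders, window_seconds=2.0, seed=None):
    """Group (timestamp, order_id) pairs into fixed windows, shuffling each.

    Returns a list of batches in chronological window order. Inside a
    batch, arrival order is deliberately discarded by a random shuffle.
    """
    rng = random.Random(seed)
    windows = defaultdict(list)
    for timestamp, order_id in orders:
        windows[int(timestamp // window_seconds)].append(order_id)

    batches = []
    for index in sorted(windows):
        batch = windows[index]
        rng.shuffle(batch)  # being first within the window gains nothing
        batches.append(batch)
    return batches


orders = [(0.1, "a"), (1.9, "b"), (2.0, "c"), (3.5, "d"), (4.2, "e")]
batches = bucket_orders(orders, seed=1)
```

Each batch would then be handed to the exchange's normal matching logic "as if received in that order"; an order at exactly 2.0 seconds lands in the second window.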

      • by sjames ( 1099 )

        The orders all go into a black box. At the next tick of the clock, they are matched up and not before. Until then, nobody knows what the orders are except those who posted them. Where is money lost? Where does data and trades get out of sync?

        Make the interval 5 minutes and you have a system moving at a reasonable human speed. Suddenly success in the market can be had by any PC on the internet. No need at all to be in a very special rent seeking datacenter. If something goes wrong, it can be caught and corre

  • Validation? (Score:5, Funny)

    by egcagrac0 ( 1410377 ) on Tuesday October 22, 2013 @07:26PM (#45207977)

    Damn the validation. Full speed to prod!

    • What does this have to do with "validation"? TFA talks about *partial* deployment, one server didn't get the update.

      • No testing, no test deployment in a mirrored environment, no one validated that the deployment was correct.

        If these things are the heart of your income, you take a hash of everything pre-deploy and post-deploy, and make sure it matches everywhere it has to.

        Then you validate that the hash list matches everywhere. That's just common sense. I've worked plenty of places where income was not directly tied to what was deployed, and I developed a manually intensive way to do exactly the same thing - make sure you de
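A minimal sketch of the hash check described above, using Python's standard `hashlib`. The file contents are kept in memory to keep the example self-contained, and the server names (`smars1` to `smars8`), the file name, and both helper functions are hypothetical.

```python
import hashlib
from typing import Dict, List


def manifest(files: Dict[str, bytes]) -> Dict[str, str]:
    """Map each file name to the SHA-256 hex digest of its contents."""
    return {name: hashlib.sha256(data).hexdigest() for name, data in files.items()}


def find_mismatched_servers(expected: Dict[str, str],
                            deployed: Dict[str, Dict[str, str]]) -> List[str]:
    """Return the servers whose deployed manifest differs from the release."""
    return sorted(name for name, found in deployed.items() if found != expected)


# Hypothetical release and a stale copy left behind on one server.
release = {"router.py": b"new RLP code"}
stale = {"router.py": b"old Power Peg code"}

expected = manifest(release)
deployed = {f"smars{i}": manifest(release) for i in range(1, 8)}
deployed["smars8"] = manifest(stale)

mismatched = find_mismatched_servers(expected, deployed)
```

In practice the manifests would be computed on each server from the files on disk and compared before the new code is activated; a non-empty result blocks the rollout.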

      • Yeah, sounds more like a botched migration than any failure to test. And that was allowed to go forward because of a lacking in their formal migration process.

        There are a lot of growing pains when moving to entire systems of new technology. Testing the technology in pieces (unit testing) and as a whole (system testing) is already a given. For things that really matter though (like money), the processes themselves need to be tested and refined. I've seen a lot of processes go to production, i.e. used on prod

  • by onyxruby ( 118189 ) <onyxruby@cSTRAWomcast.net minus berry> on Tuesday October 22, 2013 @07:30PM (#45207993)

    No proper change management, no peer review, no proper lab testing. Dev should always reflect production to the greatest reasonable level. No proper maintenance windows. You should never be surprised by a change in production. This is a case study in incompetence and the failure to execute industry best practices. I'm guessing the guy or gal who raised the best practices flag was ignored as being inconvenient or too expensive.

    If I'd done this kind of thing when I was working with the exchanges I would have been fired in a heartbeat. Whoever failed to utilize best practices, or whoever failed to allow the utilization of best practices had damn well better have been fired. This is incompetence of the highest level and a perfect example of why ITIL based best practices were born.

    • It's very easy to do all of these things, and then have one server get missed from your production change list. ...and your validation check *seems* OK.

    • No proper change management, no peer review, no proper lab testing.

      Also, there appears to have been no run-time checks or assertions. I have seen high reliability systems where 90% of the code is various run-time sanity checks, and only 10% implements the actual functionality. And that was for systems with a failure cost of way less than $400M.
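As a rough illustration of the run-time sanity checks mentioned here, the sketch below validates a child order before it is sent. The function name, the specific checks, and the 5% price band are assumptions chosen for the example, not anything taken from the report.

```python
def sanity_check(child_qty, parent_remaining, price, reference_price,
                 max_deviation=0.05):
    """Return a list of problems with a child order; empty means it may be sent."""
    problems = []
    if child_qty <= 0:
        problems.append("non-positive quantity")
    if child_qty > parent_remaining:
        problems.append("child exceeds remaining parent quantity")
    if reference_price <= 0:
        problems.append("invalid reference price")
    elif abs(price - reference_price) / reference_price > max_deviation:
        problems.append("price outside allowed band")
    return problems


ok = sanity_check(100, 500, 10.0, 10.1)
overfill = sanity_check(100, 0, 10.0, 10.0)
bad = sanity_check(0, 10, 20.0, 10.0)
```

The second call is the interesting one: a child order for a parent with nothing left to fill is rejected regardless of which code path generated it.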

    • by Billly Gates ( 198444 ) on Tuesday October 22, 2013 @07:56PM (#45208155) Journal

      Dude, Wall Street traders demand software engineers get things done in minutes and hours!

      There is no process as it all had to be done yesterday if you ask anyone who has worked in that environment.

      They pay top dollar and change core algorithms on the fly, as each millisecond costs money to a competitor who has a more efficient trading algorithm that steals from their own HFT network.

      So when one skims and manipulates the prices in milliseconds, they steal from those who have process-engineered designs and are too slow to react.

      If it messes up they get bailed out by the SEC anyway and they can just fire the programmer.

    • I'm sure they were fired. That's how scapegoating works. You tell them about the problem, they ignore you, it fails as you said it would, they fire you.
    • No proper change management, no peer review, no proper lab testing. Dev should always reflect production to the greatest reasonable level. No proper maintenance windows. You should never be surprised by a change in production. This is a case study in incompetence and the failure to execute industry best practices. I'm guessing the guy or gal who raised the best practices flag was ignored as being inconvenient or too expensive.

      If I'd done this kind of thing when I was working with the exchanges I would have been fired in a heartbeat. Whoever failed to utilize best practices, or whoever failed to allow the utilization of best practices had damn well better have been fired. This is incompetence of the highest level and a perfect example of why ITIL based best practices were born.

      I didn't read TfA, but from TfS, none of what you said would solve this problem, or a better way to put it is they all could have actually taken place to a reasonable degree.

      Is it generally expected or practical to test combinations of versions of the same software in a cluster? Only automated testing could catch a problem like that, and you'd need a simulated production workload.
      A "reasonable" development environment would NEVER reach that far. That is a very above average QA environment.

      Of course everyb

  • IT Debt (Score:5, Insightful)

    by Anonymous Coward on Tuesday October 22, 2013 @07:48PM (#45208099)

    They had some code that processed orders in a special way. There was a flag on the order they could set that would trigger that code. We will call this Power Peg. They later moved away from that functionality but it still existed in the system. It sat there for years untested and unused. 9 years later they added new functionality and decided to reuse that same flag. The new code also disabled Power Peg.

    When they pushed the new code into production, they missed a server. That missed server still had Power Peg looking for that flag. Orders started setting that flag and it was processed correctly on all but one server. But that last server was placing orders incorrectly. The logic that Power Peg used was not valid anymore. In a panic they rolled back the code on the servers. Not knowing that Power Peg was the issue, they now had all the servers running Power Peg again.
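The flag reuse described above can be shown with a toy model of a mixed-version cluster. Everything here is hypothetical: the flag value `"P"`, the handler names, and the idea that each server simply returns a label for the code path it would run.

```python
REUSED_FLAG = "P"  # hypothetical flag value shared by old and new code


def old_server(order):
    """Server still running the old build: the flag triggers Power Peg."""
    return "power_peg" if order.get("flag") == REUSED_FLAG else "normal"


def new_server(order):
    """Server running the new build: the same flag now means RLP."""
    return "rlp" if order.get("flag") == REUSED_FLAG else "normal"


def route(order, servers):
    """Ask every server which code path it would take for this order."""
    return [handler(order) for handler in servers]


# Seven servers got the new code, one was missed.
cluster = [new_server] * 7 + [old_server]
flagged = route({"flag": REUSED_FLAG}, cluster)
unflagged = route({}, cluster)

# The rollback described in the comment: every server back on the old
# build while orders still carry the flag.
rolled_back = route({"flag": REUSED_FLAG}, [old_server] * 8)
```

Orders without the flag behave identically everywhere, which is why the stale server looked healthy until the new flag started arriving.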

  • Knight had no written procedures that required such a review. And as the deity of your choice only knows, you are not required to expect uncommon sense from your employees.
  • by RightwingNutjob ( 1302813 ) on Tuesday October 22, 2013 @08:43PM (#45208387)
    Didn't RTFA, but summary makes me go WTF in several places:
    1. Python. I thought all the quants liked C, assembler, and even VHDL for their high frequency stuff. No matter.
    2. "2nd technician to review". If this were flight hardware or a bridge or skyscraper, there would be a second "technician" to review and at least one "engineer" to personally sign off that what was built/deployed is a) done right and b) is what you want.
    3. "no written procedures". There are a very small number of things in life about which it is absolutely imperative to keep a rod firmly up one's ass: a. moving machinery, b. formal mathematics, and c. hundreds of millions of dollars of your clients' and shareholders' money.
  • And nothing of value was lost.
  • the body of some programmer is at the bottom of a river.
  • Of all the things that probably shouldn't be tested in production...! I want to write my own trading bot, but I'm not going to give it control of my entire portfolio before I thoroughly test it.
  • I don't always test my code, but when I do, I do it in production!
