
BP Gulf of Mexico Rig Lacked Alarm Systems 92

Posted by Soulskill
from the eh-we'll-keep-an-eye-on-it dept.
DMandPenfold writes "BP's monitoring IT systems on the failed Deepwater Horizon oil rig relied too heavily on engineers following complex data for long periods of time, instead of providing automatic warning alerts. That is a key verdict of the Oil Spill Commission, the authority tasked by President Barack Obama to investigate the Gulf of Mexico disaster."
This discussion has been archived. No new comments can be posted.

  • As opposed to... (Score:5, Interesting)

    by toejam13 (958243) on Saturday January 08, 2011 @01:58PM (#34805480)

    Three Mile Island, where the complaint was that there were too many alarms going off.

    • by Gruturo (141223) on Saturday January 08, 2011 @02:03PM (#34805506)

      Three Mile Island, where the complaint was that there were too many alarms going off.

      Yeah, surprisingly, alarms have to be neither missing nor useless (by being irrelevant, hard to understand, going off for the wrong reasons, presenting the wrong scenario, not correlating causes, etc.).

      Who'd have thunk it.

      • by ColdWetDog (752185) on Saturday January 08, 2011 @02:21PM (#34805606) Homepage
        Truly amazing, indeed. Too lazy to look it up, but earlier reports had shown that Transocean (the rig owners, not BP like the stupid article mentions) had shut down many automatic warning systems because of too many false positives.

        It's not like we've never seen this sort of thing before ...

        "You are about to do something."

        CANCEL, or ALLOW?
      • by timeOday (582209)
        Easier said than done. Always triggering the right alarm - and only the right alarm - amounts to creating a system that somehow knows exactly how to handle any situation, no matter how complex. Otherwise it will certainly trigger alarms for secondary problems without knowing they have a common cause, or trigger alarms for problems that would normally be serious enough to warrant attention, but don't rise to that level due to dire circumstances.

        And if you could make a perfect detector like that, you'd ha

        • "Easier said than done."

          Of course, or there wouldn't be a business around it.

          "Always triggering the right alarm - and only the right alarm - amounts to creating a system that somehow knows exactly how to handle any situation, no matter how complex."

          Wrong. It amounts to getting rid of false assumptions and not trying to sell a solution as the magic snake oil that will end each and every problem. Triggering the right alarm and only the right alarm is as easy as:

          1) Known situation: manage automatically
          2) Unknown sit

          • by sirsnork (530512)
            Not to mention, you don't even have to take it "live" until the no. 2 items are down to single digits per month. Just run it next to the existing systems and have someone monitor it until you're down to very low numbers for no. 2.
    • Right. The monitoring systems should summarize things to a point that human operators can reliably understand the situation with a reasonable amount of detail, not too much or too little. If they don't do this, then they are badly designed.
      • When Three Mile Island happened (1979), 4 MHz Z80s were state-of-the-art. Industrial control systems have come a long, long, long way since then.
    • I was thinking much the same thing... There probably is a happy medium, but it's going to be really really hard to hit.

      Sitting at a monitoring console hour after hour, day after day, is very tiring and wearing. So systems that monitor for trends and alert the operator are very valuable for cutting through that. But on the flip side, it becomes very easy to depend more and more on the automated systems and less and less on knowledge of the system, environment, and equipment. TANSTAAFL.

      Discla

    • If you're responsible for keeping a system up and you have the ability to trigger alarms then you should! If you're generating false positive alarms then the solution is not to disregard or disable the alarms! You should be investigating every single alarm, documenting it and then fixing the reason why it happened. Alarms can be caused by a misconfiguration of the alarm system which is just as critical as a real alarm. If your fire alarm goes off every day would you fix it or would you start disregardin
      • "Excuses that there were too many false positives just means that people needed to fix the false positives instead of ignoring or disabling them!"

        While I'm with your overall message, you seem to forget that for this to work, bonuses and penalties need to be aligned; when they are not, things like this are expected to happen.

        I.e., I certainly should care about each and every raised alarm, and I'm even told to do so. *But* I'm not paid to take care of raised alarms as soon as I can but to accomplish a differe

    • Re:As opposed to... (Score:5, Interesting)

      by omglolbah (731566) on Saturday January 08, 2011 @04:49PM (#34806988)

      Indeed. Alarm suppression is a complex thing to set up in many cases. I personally work in the business and know how much thought goes into the alarm handling of the plants operating in Norwegian waters.

      One example of a "simple" suppression case is that if Controller A goes down, you do not need to tell the operator that ALL signals on this controller are in "bad quality" or out of bounds. What you need to tell them is that the controller is down, and which systems are affected (which they will see on their displays as valves change color or some such; our system uses white asterisks and white color to indicate that something is 'dead').

      More complex cases are things like not throwing alarms for low flow rates in pipes where the valves are closed, or not throwing electrical alarms on equipment set to maintenance mode.
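
A minimal sketch of the suppression rules described above, assuming a made-up alarm record and hypothetical tag names; it is not how any particular control system implements this, only an illustration of "report the dead controller once, and skip alarms that are expected":

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Alarm:
    tag: str         # I/O point name (hypothetical naming scheme)
    controller: str  # controller that hosts the tag
    kind: str        # e.g. "BAD_QUALITY", "LOW_FLOW", "ELECTRICAL"


def suppress(alarms, controllers_down, closed_valve_tags, maintenance_tags):
    """Return only the alarms an operator should actually see."""
    shown = []
    # One summary alarm per dead controller replaces all of its tag alarms.
    for ctrl in sorted(controllers_down):
        shown.append(Alarm(tag=ctrl, controller=ctrl, kind="CONTROLLER_DOWN"))
    for a in alarms:
        if a.controller in controllers_down:
            continue  # already covered by the CONTROLLER_DOWN summary
        if a.kind == "LOW_FLOW" and a.tag in closed_valve_tags:
            continue  # no flow is expected behind a closed valve
        if a.kind == "ELECTRICAL" and a.tag in maintenance_tags:
            continue  # equipment is deliberately in maintenance mode
        shown.append(a)
    return shown
```

Here `closed_valve_tags` and `maintenance_tags` are assumed to be sets maintained elsewhere (from valve position feedback and work permits, say); the hard part in practice is keeping those sets trustworthy, not the filtering itself.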

      Regardless of all this, there should be an alarm system that has priorities.

      Pri 1 alarms require IMMEDIATE attention, such as a dangerous triple-high alarm (HHH or 3H) on a tank, pressure or temperature, or a controller going down.
      Pri 2 would be alarms that could develop into Pri 1 if not handled within a few minutes (H/HH alarms, etc.).
      Pri 3 would be what we call "pre-alarms": things that could cause process upset or issues down the line, like a low flow of coolant even though the temperature of the equipment being cooled hasn't started rising yet, or a low level in a fuel tank.
      Pri 4 we usually assign to maintenance issues, like two redundant sensors having more than 0.5% deviation between them (but not enough to cause a real alarm). Things that should be looked at, but within a day or so.
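
As a rough illustration of the scheme above, the sketch below maps a measurement to Pri 1 or Pri 2 from its H/HH/HHH limits and flags the Pri 4 "redundant sensors disagree" case. The function names, the limit values in the usage note, and the choice to express deviation relative to the mean of the two readings are all assumptions made for the example:

```python
def level_priority(value, h, hh, hhh):
    """Return (priority, label) for a high-side alarm, or None if no limit is hit."""
    if value >= hhh:
        return (1, "HHH")  # needs immediate attention
    if value >= hh:
        return (2, "HH")   # could become Pri 1 within minutes
    if value >= h:
        return (2, "H")
    return None


def sensors_disagree(a, b, max_pct=0.5):
    """Pri 4 check: do two redundant readings differ by more than max_pct percent?"""
    mean = (a + b) / 2
    if mean == 0:
        return a != b
    return abs(a - b) / abs(mean) * 100 > max_pct
```

For example, with hypothetical limits of 70/80/90, a reading of 85 yields `(2, "HH")`, and readings of 100.0 and 101.0 from a redundant pair are flagged for maintenance while 100.0 and 100.4 are not.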

      Being able to filter alarms like this helps immensely during an emergency. This is an old system with a limited number of 'alarm groups' and 'priority levels' but it still works fairly well. Operators can see what happens even with several hundred alarms going off at the same time. On our simulator we did a fun test where we tripped 70% of the plant (about 18000 distinct 'tags' or io points went into Bad quality and several thousand in alarm).
      The operators were able to stop the cascade failure and no pipe burst in the simulator :)
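
The filtering idea can be sketched in a few lines, assuming alarms arrive as `(priority, tag, message)` tuples (a format invented for this example): show the urgent ones sorted by priority, and collapse everything else into per-priority counts so it stays visible without flooding the screen.

```python
from collections import Counter


def operator_view(alarms, max_priority=1):
    """alarms: list of (priority, tag, message); a lower number is more urgent."""
    urgent = sorted(a for a in alarms if a[0] <= max_priority)
    hidden = Counter(a[0] for a in alarms if a[0] > max_priority)
    return urgent, dict(hidden)
```

During a plant trip an operator might call this with `max_priority=1` and widen it to 2 or 3 once the situation is stable.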

      Shit -will- hit the fan. It is always nice to be able to filter it so that only the important shit actually hits the wall :p

    • by arivanov (12034)

      Wrong analogy. BT is a UK company.

      Read job adverts for this class of UK company IT architects on jobserve. They are _VERY_ explicit that the job of the architect is only to shop-n-ship. There is no allowance to collect reqs for a made-to-order job or spec-out an in-house system. If it is not supported by an off the shelf package it will not be. Period. The "We are not software developers" mantra taken to its ultimate limit.

      My educated guess that the hodgepodge of systems delivered by 3 subcontractors for th

  • by Anonymous Coward

    I mean, IT is always the irresponsible bad guy, right? It couldn't be someone else told them not to do it because it took too long, or was a waste of money, or...

    • by nomadic (141991)
      I mean, IT is always the irresponsible bad guy, right?

      On slashdot IT is never the bad guy. It's always some mythical manager who must have ordered them to do what they do. Why can nobody here ever believe a programmer/engineer/IT guy was incompetent?
  • Just another whitewash...

  • Hm...lack of alarms...leading to a catastrophic engineering failure...where have I heard this story before...
  • I don't even want to know how much tax payer money was pissed away for that "key verdict" - having worked with quite a few monitoring and alarm systems for years I can tell you that most of the time "automatic alarms" get ignored and in fact can cause worse problems when an actual real alarm does occur because of how the operators tune them out - seems like they completely missed the mark on this - the real problem is most likely where you would expect it, the people running the system - human error I am s

    • I don't even want to know how much tax payer money was pissed away for that "key verdict" - having worked with quite a few monitoring and alarm systems for years I can tell you that most of the time "automatic alarms" get ignored and in fact can cause worse problems when an actual real alarm does occur because of how the operators tune them out - seems like they completely missed the mark on this - the real problem is most likely where you would expect it, the people running the system - human error I am sure !

      I think everyone's familiar with that phenomenon regarding the alarm that cried wolf due to all the car alarms. Rarely do people even turn their head when they hear a car alarm.

      I think I'm gonna make a "Let's blame IT!" t-shirt 'cause it's a pretty popular theme. Seems to me that the hardware for detecting the problems was there, but the software required "the right person to be looking at the right data at the right time" which sounds vaguely like "the software requires training". If the data output is coming a

      • by hedwards (940851) on Saturday January 08, 2011 @03:12PM (#34805946)

        I think everyone's familiar with that phenomenon regarding the alarm that cried wolf due to all the car alarms. Rarely do people even turn their head when they hear a car alarm.

        Competent professionals don't do that. The problem with car alarms is that they aren't aimed at professionals, competent or otherwise, they're aimed at the general public and the mechanism they use isn't typically going to assure that anything is going on.

        Competent professionals like the ones that are supposed to be running rigs should know to check them out every time and not turn the alarm off without ascertaining that the alarm is in fact false. Disabling an alarm should only be done when there are adequate contingency plans in place to handle the condition if it happens and how they would respond.

        I used to work security at a high rise and we'd often times have alarms turned off on portions of the building. It was the only way to ensure that under certain circumstances that work wouldn't cause a false alarm. It was done in a controlled way with plans in place to make sure that there was somebody keeping an eye on it while the work was being done, and that the alarms would be turned back on when they could be.

        And every time that building had an alarm go off which wasn't a known cause, it was always investigated promptly. Alarms that go off repeatedly need to be fixed, not disabled.
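
One way to picture "turned off in a controlled way, and turned back on" is an inhibit that always carries an owner, a reason, and an expiry. The class below is a hypothetical sketch of that idea, not a description of any real building or rig system:

```python
import time


class InhibitRegistry:
    """Tracks temporarily disabled alarms; every inhibit expires on its own."""

    def __init__(self):
        self._inhibits = {}  # tag -> (owner, reason, expires_at)

    def inhibit(self, tag, owner, reason, duration_s, now=None):
        now = time.time() if now is None else now
        self._inhibits[tag] = (owner, reason, now + duration_s)

    def is_inhibited(self, tag, now=None):
        now = time.time() if now is None else now
        entry = self._inhibits.get(tag)
        if entry is None:
            return False
        if now >= entry[2]:
            del self._inhibits[tag]  # expired: the alarm is live again
            return False
        return True
```

The `now` parameter exists only so the behaviour can be checked without waiting; the design point is that nobody has to remember to re-enable the alarm.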

        • by thegarbz (1787294)
          The problem is who is the competent professional who is working on alarms?

          Is it the maintenance team who is backlogged with bullshit alarms that go off under normal process conditions because someone decided that it would work to prevent some disaster which may occur?
          Is it the process / technical team who decided yet another alarm will be cheaper than re-designing the process to meet the safety guidelines?
          Is it the console operator who has gone mental at the alarm going off constantly in the middle of
    • by Rob the Bold (788862) on Saturday January 08, 2011 @03:12PM (#34805954)

      I don't even want to know how much tax payer money was pissed away for that "key verdict" - having worked with quite a few monitoring and alarm systems for years I can tell you that most of the time "automatic alarms" get ignored and in fact can cause worse problems when an actual real alarm does occur because of how the operators tune them out - seems like they completely missed the mark on this - the real problem is most likely where you would expect it, the people running the system - human error I am sure !

      You don't even have to ignore the alarm that isn't there. But I don't think the "alert" that we're discussing is the big klaxon/flashing sign reading "OIL LEAK," or an oil pressure light with electrical tape over it. What the article indicates was missing was an automatic method of indicating that a failure was imminent. As far as the cost of determining this: learning from mistakes can be expensive. Not learning from mistakes is likely even more so.

    • by DMiax (915735)
      It seems to me that the verdict states that the probability and possible damages of a human error were too high, due to poor planning of safety features. I don't think it is really much different from what you say.
  • Is common practice everywhere "why buy a 5 dollar alarm when we can force some engineer to watch figures for days on end?" Gosh people hate engineers for no reason.

    • do you mean "why buy a $5 alarm when we could pay an engineer thousands of dollars a year to do the same thing?"

      I have a phrase that you should practice: "would you like fries with that?"
      and "paper or plastic, sir?"

      very good now again, in Chinese.

      gosh people PAY engineers for no reason...

    • by omglolbah (731566)

      Unfortunately, a single alarm configuration on a "tag" could cost anywhere from 10k to 100k dollars.

      The configuration isn't all that hard or time consuming but the testing of the system after the modification is brutal. At least here where it has to be certified to be allowed into operation ;)

      • If it costs that much, you are doing it wrong. A good engineering team should be able to make something work very well for only a few hundred to a few thousand dollars.

        • by omglolbah (731566)

          Doing the change: 3-4 hours of work.

          Organizing the update to the controller in the field?
          - Requires a look into what could be influenced by the change
          - Requires in some cases an 'offline' load of the controller which can only be done at a time of a maintenance downtime (once a year at most, sometimes every 2-4 years)

          Documentation:
          - Documentation of what functionality changes for operators
          - Update of system configuration diagrams
          - Update of various tag info in the plant documentation system

          Install:
          - A job pa

  • I wonder if the US government would go after it quite so much ? There does seem an attempt to play up the blame on BP and not the part played by Halliburton & others.
    • by Anonymous Coward

      Two names: Exxon Valdez.

      There was a huge shit storm when they fucked up

      So to answer your question, yes.

      BP were the boys in charge and when it comes down to it, it was up to them to keep Halliburton et al. in line, so it was their fault. And it was also the regulators' fault for dropping the ball and letting a big corp make them their bitches; which is usually the case with all US Government agencies.

      • by AGMW (594303)

        ... BP were the boys in charge and when it comes down to it, it was up to them to keep Halliburton et al. in line, so it was their responsibility. ...

        Fixed that for you ...

  • When will we get a governing body that can punish or apply fines for this and enforce those fines or punishments...seriously, we need to evolve with these types of companies that spit all over international laws (or lack of)

    • When will we get a governing body that can punish or apply fines for this and enforce those fines or punishments

      Two words: regulatory capture.

      • by stewski (1455665)

        I wonder when such investigations will occur in areas where Americans aren't affected? How is the behaviour of companies such as Exxon in the Niger delta [guardian.co.uk] being tracked, oh wait it isn't. Still that doesn't matter, because it doesn't affect fat American business men!

        • I wonder when such investigations will occur in areas where Americans aren't affected? How is the behaviour of companies such as Exxon in the Niger delta [guardian.co.uk] being tracked, oh wait it isn't. Still that doesn't matter, because it doesn't affect fat American business men!

          That's just silly. If a foreign corporation is allowed to do business in your country, it is your government that should perform due diligence and make sure that said corporation is obeying local regulations. If it doesn't, then it should take appropriate action, whatever that might be.

          • by stewski (1455665)

            So if an individual (which a corporation is legally termed) behaves objectionably abroad it is no business of the government from which the individual came? Don't get me wrong, "when in Rome" and all that is fine. But how would the US government react to a US corporation working in North Korea on weapons development? I mean, all the work would obey local regulations...

            Your suggestion suggests a level of naivety that I would categorise as disingenuous; to the point of drawing parallels to three monkeys c

              • So many good points, I would hate to bring it to an end, but I believe there should be a single international sanctions regime for matters that affect the environment in ways that could indirectly affect other nations (like this spill)... and that governing body should be forceful enough to make everyone think twice (like the US bypassing the NATO sanction not to invade, sort of like "we heard you but don't care and will still do this"...). Can you imagine if they could actually come

  • like
    1) someone has alarm systems available but no one wants to buy them
    2) and they saw the disaster as a good opportunity to sell more of them
    3) and announcing that Deepwater Horizon lacked them sounds like a good business plan
    4) just to guarantee that they will have customers for a longer period of time
    5) government is going to make them mandatory for any such operations
    6)
    7) profit

  • Nagios (Score:4, Funny)

    by IceCreamGuy (904648) on Saturday January 08, 2011 @03:02PM (#34805870) Homepage
    Haven't they been on Nagios Exchange recently? check_catastrophe.pl has been out for like 3 years!

    check_catastrophe.pl -H blowout-preventer716.haliburton.com -w ANY_LEAKS -c ANY_FRIGGIN_LEAKS
  • Lots of educated engineers, and this probably could have been fixed with a daemonized perl script that could send a trap to an snmp monitor if conditions got beyond a certain point. Or something like that. I'm sure they had more complex monitoring software, but obviously missed something simple along the way.
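
A threshold check of the kind this comment has in mind might look roughly like the sketch below. It only classifies a reading against warning and critical limits, using the exit-code convention of Nagios plugins (0 OK, 1 WARNING, 2 CRITICAL); the function name and limits are hypothetical, and actually sending an SNMP trap is left out.

```python
OK, WARNING, CRITICAL = 0, 1, 2  # exit codes used by Nagios plugins


def check_threshold(value, warn, crit):
    """Return (status_code, message) for a reading against two limits."""
    if value >= crit:
        return CRITICAL, f"CRITICAL - value {value} >= {crit}"
    if value >= warn:
        return WARNING, f"WARNING - value {value} >= {warn}"
    return OK, f"OK - value {value}"
```

A daemon would call this on every sample and notify only when the status changes; as the replies point out, the check itself is the easy part compared with getting people to act on it.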
    • by hedwards (940851)
      Bad idea, the issue wasn't that the alarms were broken so much as they were ignored for going off too frequently. And rather than address the issue of the frequent occurrences they opted to shut them off. It's unlikely that you're going to solve that by programming around that. Programming around it is more or less the same thing as turning the alarms off or ignoring them.
      • by digsbo (1292334)

        Operator: "Disk alarm - disk is at 80% capacity."

        Manager: "Increase the threshold to 90%."

        • by omglolbah (731566)

          Operator: "I can't do that, that has to be run through the PCDA office and certified by the technical staff first."

          Manager: "Ok, I'll submit the paperwork"

          PCDA: "This is a bad idea, let's fix it instead..."

          Or something like that is how it goes here :p
          If it even passes the manager. Most of the time the technical staff handles the alarms without telling any 'manager'. The operator responsible for the shift has authority over the day to day operation without any manager interference.

          You can't operate if non-techi

  • BP's monitoring IT systems on the failed Deepwater Horizon oil rig relied too heavily on engineers following complex data for long periods of time, instead of providing automatic warning alerts.

    So, in other words, let's replace engineers who are on the spot and have some feel for what is going on with software that might not know what to do when something bad happens, and is dependent upon settings provided by people who apparently weren't able to recognize the signs of disaster until it was too late anyways. Regardless, I have the feeling there were plenty of alarm systems involved in this disaster, and I'll wager that the relevant ones were either incorrectly programmed or were turned off becaus

    • Don't replace the engineer. Give them tools that enhance their ability to see impending problems and predict the output of the system in its current state. However, even given the best tools, if someone chooses to ignore the warnings and override the automation then "accidents" will happen.
      • Don't replace the engineer

        I wasn't saying that, but it looks like that report is just another example of blaming the technical people for systemic failures of management.

        Typical. Absolutely typical.

      • Re: (Score:2, Insightful)

        by omglolbah (731566)
        It all comes down to redundant barriers.

        A B C
        1 ->-0-->--| |
        | | 0
        2 ->-0 | |
        3 ->-0-->--0-->--|
        | | 0

        A, B and C are various barriers.
        A = Automation (automatic shutdown on severe alarms etc)
        B = Procedures (Check X before doing Y)
        C = Operator Training

        As you can see her
        • "We don't want to replace anyone but we -do- want to add more barriers!"

          Who is "we"? For all that matters, the manager is not part of "we": all he wants is his bonuses.

          • by omglolbah (731566)

            I'm sadly not allowed to disclose the company name due to an NDA, but it is one of the largest in northern Europe.

            At the particular company where I work we fucking HATE the shoddy work and failed procedures of this disaster. It makes us all look like asshats.

            The people in charge of the technical things here are actually not the people who are trying to get bonuses. The government oversight on the security of such sites and rigs is so strong as to be borderline anal. And personally I am fine with that. I woul

            • "The people in charge of the technical things here are actually not the people who are trying to get bonuses."

              That's why I asked for your definition of "we". Of course the engineers dislike appearing like asshats.

              "The government oversight on the security of such sites"

              So being a representative democracy, I'd say government is the kind of "we" to be in control in managing such externalities instead of "we", the high managers that get the bonuses.

              Of course, your government is one of those damn communist ones,

  • by Anonymous Coward

    I don't have a source. But CNN has coverage that engineers warned that the blowout preventers were going to leak, and BP ignored them. This is a corporate failure, as much as it is a technical one.

    • by omglolbah (731566)

      Yep.

      Both are bad.. Together they are absolutely cataclysmic.

      Complete failure of barriers here. Have a gander at my other comment about the idea behind those barriers.

      http://news.slashdot.org/comments.pl?sid=1942186&cid=34807134 [slashdot.org]

    • by AGMW (594303)

      I don't have a source. But CNN has coverage that engineers warned that the blowout preventers were going to leak, and BP ignored them. This is a corporate failure, as much as it is a technical one.

      I certainly saw engineers from Transocean, or was it Halliburton, saying something like that. Luckily we can obviously trust those engineers because they (and the company they work for) have nothing to gain from saying it.

      Of course, it could be argued that if those engineers, who presumably worked for Transocean (who owned and operated the rig) knew there was a problem and did nothing about it then they, and the company they work for, are left holding the smoking gun!
      Unless we allow the "ve vere only fol

  • Does it seem a little wrong to call it an 'IT system'? Control system, SCADA, or embedded system maybe, but IT?
    • Does it seem a little wrong to call it an 'IT system'? Control system, SCADA, or embedded system maybe, but "IT?"

      Was not Information moving around? Was not that Information moving around by Technical means?

      Automatic control systems are IT, Supervisory Control And Data Acquisition systems are IT, signaling embedded systems are IT.

  • by AGMW (594303) on Saturday January 08, 2011 @05:31PM (#34807514) Homepage
    Wasn't it Transocean that owned and operated the rig? If so, perhaps the story could better be titled:

    Transocean Gulf of Mexico Rig, leased to BP, lacked Alarm Systems

    • by geekbrad (1595727)
      Technically, as in "what does the paperwork say", of course, you're right. Though the Deepwater Horizon had drilled under lease to BP since it was built - before Transocean was even involved. Your headline makes it sound like BP just borrowed a screwdriver from them, rather than having had exclusive use of this rig since inception.
  • by magus_melchior (262681) on Saturday January 08, 2011 @05:47PM (#34807736) Journal

    They had this exact problem with Texas City-- they didn't do maintenance on the systems, so a subsystem overfilled with volatile hydrocarbons with no alarms going off at all-- and when one alert sounded at the monitoring area, they ignored it. They didn't invest the (relatively) small cost of installing a flare (to burn off excess), so the excess hydrocarbons spilled out into the open. Cost-cutting and an incredibly cavalier approach to maintenance from the London management generated a fucking fuel-air bomb in Texas.

    This is one instance where the Brit management, when they changed to Hayward, should have told their investors to "fuck off-- er, give us a few years" and spend the necessary money to get their facilities up to snuff, or decommission the facilities that are too costly to maintain. Alas, profit motive proved more powerful than basic empathy or responsibility.

    • by thegarbz (1787294)
      Different problem, different situation. An alarm is only valuable if it's actioned. The problems at Texas were, as you put it, cavalier approaches, not just to maintenance but to everything. Someone's too much of a man to follow the instructions and instead relies on instinct to guide them. What use is a high level alarm on a fractionation tower when operators will routinely and against procedure start up the unit with the level instrument overfilled? When starting up an average unit there can literally be hundreds of al
