Microsoft Media Music

The Exact Cause of the Zune Meltdown 465

Posted by kdawson
from the off-by-one-every-four dept.
An anonymous reader writes "The Zune 30 failure became national news when it happened just three days ago. The source code for the bad driver leaked soon after, and now, someone has come up with a very detailed explanation for where the code was bad as well as a number of solutions to deal with it. From a coding/QA standpoint, one has to wonder how this bug was missed if the quality assurance team wasn't slacking off. Worse yet: this bug affects every Windows CE device carrying this driver."

  • Wow. (Score:4, Funny)

    by LeadLine (1278328) on Sunday January 04, 2009 @06:08PM (#26323741)

    It wasn't a bug! It was an unexpected feature!

    Microsoft is taking a stance against teenagers blowing their ears out with loud music.

    • A simpler reason (Score:3, Insightful)

      by kaiwai (765866)

      How about a much simpler reason - it plain well sucks giant donkey balls.

      We're talking about a device which only works with Windows, is only available in a small number of countries (I don't give a shit about the music service - you can put music on it without a fucking music service so the need to 'roll out the service' is a bullshit excuse) and the software sucks balls.

      It's a top to bottom epic failure - and it's in the mold of Microsoft NEVER to learn from these failures or more correctly, learn from its riva

  • Warning, Y2.1K bug. (Score:4, Informative)

    by LostCluster (625375) * on Sunday January 04, 2009 @06:09PM (#26323749)

    Just before anybody claims to have a foolproof solution to leap years, make sure you test against the year 2100. It's a multiple of four, but also a multiple of 100 that's not a multiple of 400... and therefore NOT a leap year.

    • Re: (Score:3, Funny)

      by Yvan256 (722131)

      What about other years which are multiples of four, also multiples of 100 but not multiples of 400?

      • by LostCluster (625375) * on Sunday January 04, 2009 @06:16PM (#26323829)

        Here's your 500 year plan:

        1900 - multiple of 100, not a multiple of 400, no leap day.
        2000 - was a multiple of 100, but also a multiple of 400 so we still had a leap day.
        2100 - see above
        2200 - not a multiple of 400, no leap day.
        2300 - not a multiple of 400, no leap day
        2400 - multiple of 400, so have the leap day anyway.
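
The century rules in the list above can be turned into a small self-check. This is a minimal sketch, not the driver's code: `is_leap_year` and `count_rule_mismatches` are hypothetical names, and the expected values are copied straight from the list.

```c
#include <stddef.h>

/* Gregorian rule: every 4th year, except centuries not divisible by 400. */
int is_leap_year(int year)
{
    if (year % 400 == 0)
        return 1;
    if (year % 100 == 0)
        return 0;
    return (year % 4 == 0);
}

/* Expected results for the century years listed in the post above. */
static const struct { int year; int leap; } century_cases[] = {
    {1900, 0}, {2000, 1}, {2100, 0}, {2200, 0}, {2300, 0}, {2400, 1}
};

/* Returns how many table entries disagree with is_leap_year(). */
int count_rule_mismatches(void)
{
    size_t i;
    int mismatches = 0;

    for (i = 0; i < sizeof century_cases / sizeof century_cases[0]; i++) {
        if (is_leap_year(century_cases[i].year) != century_cases[i].leap)
            mismatches++;
    }
    return mismatches;
}
```

A return value of zero from `count_rule_mismatches()` means the function agrees with every century in the list, including 2100.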

        • by Anonymous Coward on Sunday January 04, 2009 @07:29PM (#26324435)

          For Slashdotters you lot seem pretty confident the Zune is going to be around for awhile.

        • by Anonymous Coward on Sunday January 04, 2009 @10:14PM (#26325637)

          I can't help but imagine how I would be directed by work to "solve" this problem.

          First, they would tell me that it's too difficult, expensive, and complicated to implement the correct solution. Even if I gave them a working prototype, they wouldn't change their minds.

          Then they would tell me "just assume every 100th year is not a leap year." So I would do that instead. In the time from 2100 to 2400, they would say that "a better solution is due to come out next quarter." They would say this every quarter for 299 years.

          In 2399, they would finally give me permission to fix the problem. But the leap year-calculating code works, and they don't want me to mess with it. Instead, they'd tell me to add a test when the program starts to see what year it is. If it's 2400, then it will refuse to run. (We'll definitely have a better solution in place by Q1. Definitely.)

          But the program often runs for an extended period of time without being restarted, so it's possible that someone will start it in December 2399 and it will still be running in February/March 2400. Management has a simple fix for this one: calculate the average run time for the program, add a margin of error, and use that to determine the actual "upper limit" on when the program is allowed to start. My boss would be really excited about this, because it would allow us to refine our earlier not-after-January-1st estimate to be "completely accurate."

          Unfortunately, we don't know the average run time for this program. So I'm told to add code to it to track when it starts and ends and store the results in a file. When the program starts, it examines that file (in addition to recording its own start time), calculates the average run time, adds 10% (there are still director-level meetings about whether we should round up to the nearest hour or day), and subtracts that value from February 28th, 2400. If the current timestamp is greater than or equal to the result we got from that, the program won't start.

          That's pretty good, but my boss would be worried about the program crashing. If that happens, after all, we won't know the program's end time -- never mind that it's November by now and there's no chance of getting useful data no matter what -- so instead of logging an end time, the program logs a heartbeat every minute. Now, you can determine when the program ended -- to within a minute! -- simply by looking at the heartbeat timestamps. When you encounter a gap of more than 1 minute (plus a small margin of error), you know the program ended. This has the bonus, my boss tells me, of simplifying the design by only requiring you to log one type of message to the file. He also assures me that this "telemetry data" has the potential to be "really useful for data mining." He talks about adding information on CPU time consumed, memory in use, I/Os, all sorts of stuff, then putting it in a database to be retrieved later. I manage to talk him out of it by pointing out that "the better solution [with which I am completely uninvolved] will be out in just a few months, so you should just make sure it makes it into that instead."

          Not that I'm bitter.

    • Re: (Score:3, Insightful)

      by Gothmolly (148874)

      No need to hard-code, there's an established algorithm for computing this.

    • Re: (Score:3, Insightful)

      by gcnaddict (841664)
      The code in the freescale driver actually covers this. Check the IsLeapYear() function in the code (line 162):

      static int IsLeapYear(int Year)
      {
          int Leap;

          Leap = 0;
          if ((Year % 4) == 0) {
              Leap = 1;
              if ((Year % 100) == 0) {
                  Leap = (Year % 400) ? 0 : 1;
              }
          }

          return (Leap);
      }
    • by Bozdune (68800) on Sunday January 04, 2009 @10:23PM (#26325681)

      If my code's still running in 2100, our society has got way bigger problems than me not figuring leap years correctly.

  • And you are a programmer, I would highly recommend the book Deep C Secrets [amazon.com]. It's partly practical and partly culture. It covers some well (and lesser) known bugs that while very small and "stupid" had very real consequences.

  • Import calendar? (Score:5, Insightful)

    by TurtleBlue (202905) on Sunday January 04, 2009 @06:18PM (#26323841)

    "From a coding/QA standpoint, one has to wonder how this bug was missed if the quality assurance team wasn't slacking off."

    I can't remember the last time a QA department was asked to test date functions... but then again, I can't remember the last time anyone wrote their own Leap Year calendaring calculator from scratch.

    I'm sure there are a hundred reasons to do it (licensing being one of them) but really, when was the last time you didn't just import calendaring from another library and call it a day?

    Please clarify to me if this is something at the hardware driver level: I honestly don't know. If this were me, my own bosses wouldn't ask "Why didn't QA catch this", as much as "why are you wasting time writing your own calendar code? And then why didn't you flag it as functionality that needed to be tested?"

    • Re: (Score:3, Interesting)

      by gcnaddict (841664)
      Well, that might explain why the same clock doesn't exist in the subsequent Zunes, but who knows.

      I'm more disturbed by one of the comments [aeroxp.org] and the subsequent reply.
    • Re: (Score:3, Insightful)

      by nedlohs (1335013)

      For fuck's sake, how do you manage to read that part and not read the part before it: "source code for the bad driver".

      The QA person/people testing the driver for the real time clock better damn well be testing date and time stuff.

    • Re:Import calendar? (Score:5, Informative)

      by Anonymous Coward on Sunday January 04, 2009 @06:31PM (#26323953)

      It is driver code supplied by the manufacturer of the hardware platform on which the Zune and a couple of other devices are built. This platform includes a real-time clock which counts seconds since midnight and days since 1/1/1980. Considering that hardware component prices are cut-throat, there is probably no quality management for the software whatsoever. If it appears to work, it ships.

      • by TurtleBlue (202905) on Sunday January 04, 2009 @06:47PM (#26324099)

        Thanks - that makes a tad more sense. I see everyone running around blaming Microsoft for the code since their name is on the product, even if it was a 3rd party vendor. They certainly are still liable for all the busted Zunes, but I couldn't imagine Microsoft didn't have *some* C leap-year code sitting around that actually worked, and could be compiled for any chip they wanted.

        Microsoft still has to take the hit up front, but then they'll sue or "renegotiate contracts" with the vendor that supplied the bad driver code, based on what it costs them.

        I'm still shocked that the manufacturer couldn't dig up *some* free/open calendaring code that was around pre-2004. But hey, at least we know they were honest about not ripping off some other source code and calling it their own.

    • Re:Import calendar? (Score:5, Informative)

      by nato10 (600871) on Sunday January 04, 2009 @06:50PM (#26324123)

      This is kernel-level code -- part of the OEM Abstraction Layer -- that is used to read the current time from the RTC, hence it is hardware-specific. RTCs on other processors, or Freescale-based devices using external RTCs, may implement the OemGetRealTime () function differently than Freescale has done here (the buggy ConvertDays () function is just a helper function).

    • Re:Import calendar? (Score:4, Informative)

      by AuMatar (183847) on Sunday January 04, 2009 @07:30PM (#26324453)

      It was found in driver code. Part of the goal of driver code is to be as lean and mean as possible- most embedded devices do not have a lot of rom space- what they have is measured in MBs, not GBs. Remember not all the world is cell phones and mp3 players. In that case writing your own leap year function is the correct answer- existing calendar libraries likely have far more functionality than you need and would blow out your size. Given a choice between statically linking an entire calendaring library and writing a simple IsLeapYear function, writing the leap year function is the correct choice for that environment.

      • Re:Import calendar? (Score:5, Informative)

        by profplump (309017) <zach-slashjunk@kotlarek.com> on Sunday January 04, 2009 @07:56PM (#26324603)

        It's really probably not. Most of the basic calendar functions in libc (or glibc or dietlibc or uClibc) were written for 8 MHz machines running with 1 MB of system memory -- they'd do just fine on your embedded system.
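
As an illustration of the library route, here is a rough sketch that leans on the standard `gmtime()` for the calendar arithmetic. It assumes a POSIX-style `time_t` (seconds since 1970-01-01 UTC), which ISO C does not guarantee and which a kernel-level driver may not have available; `year_and_yday_from_days` and the two constants are names invented for this example.

```c
#include <time.h>

/* Assumption: time_t counts seconds since 1970-01-01 00:00:00 UTC. */
#define SECONDS_PER_DAY 86400L
#define UNIX_TIME_1980  315532800L  /* 3652 days after the Unix epoch */

/*
 * day is 1-based: day 1 is January 1, 1980.
 * On success stores the calendar year and the 1-based day of that year
 * and returns 0; returns -1 if gmtime() cannot represent the time.
 */
int year_and_yday_from_days(long day, int *year, int *day_of_year)
{
    time_t t = (time_t)(UNIX_TIME_1980 + (day - 1) * SECONDS_PER_DAY);
    struct tm *tm = gmtime(&t);  /* note: returns a shared static buffer */

    if (tm == NULL)
        return -1;

    *year = tm->tm_year + 1900;      /* tm_year is years since 1900 */
    *day_of_year = tm->tm_yday + 1;  /* tm_yday is 0-based */
    return 0;
}
```

With a 32-bit `time_t` this stops working in 2038, so it is a trade of one date limit for another rather than a free win.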

      • Re:Import calendar? (Score:4, Informative)

        by TrekkieGod (627867) on Sunday January 04, 2009 @08:30PM (#26324849) Homepage Journal

        It was found in driver code. Part of the goal of driver code is to be as lean and mean as possible

        He failed. In the function in question he had the number of days since Jan 1, 1980. At the end of the loop, he was supposed to have the number of years since 1980 + the number of days since the beginning of the current year. His solution was to iterate the year beginning from 1980, check to see if it's a leap year, then subtract 365 or 366 days accordingly. The loop would supposedly continue until the desired state is achieved but, because of the bug, became an infinite loop at the end of leap years.
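
For readers who want to see the failure mode, here is a reconstruction of the loop shape described above. It is a sketch based on that description, not a verbatim copy of the driver: `convert_days_buggy`, `is_leap_year`, and the `max_iter` guard are assumptions, and the guard exists only so the example reports the hang instead of reproducing it.

```c
static int is_leap_year(int year)
{
    return (year % 4 == 0) && ((year % 100 != 0) || (year % 400 == 0));
}

/*
 * days is 1-based: day 1 is January 1, 1980.
 * Returns the number of loop iterations used, or -1 if the loop made
 * no progress within max_iter iterations (the real code would hang).
 */
int convert_days_buggy(int days, int *year_out, int *days_out, int max_iter)
{
    int year = 1980;
    int iter = 0;

    while (days > 365) {
        if (++iter > max_iter)
            return -1;  /* stuck: days was not reduced */

        if (is_leap_year(year)) {
            if (days > 366) {
                days -= 366;
                year += 1;
            }
            /* days == 366 in a leap year: nothing changes, loop repeats */
        } else {
            days -= 365;
            year += 1;
        }
    }

    *year_out = year;
    *days_out = days;
    return iter;
}
```

Day 10593 counted from January 1, 1980 is December 31, 2008. It leaves `days == 366` in a leap year, which is exactly the case where neither branch makes progress.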

        Not only was his function not "lean and mean" but it actually gets more expensive to run every year that passes :)

        I'm also curious as to why 1980 is the epoch, but that's not as important.

        • Re: (Score:3, Informative)

          by lewiscr (3314)

          I'm also curious as to why 1980 is the epoch, but that's not as important.

          MS-DOS defines the epoch as Jan 1, 1980.
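
One place the 1980 epoch shows up concretely is the 16-bit date field used by MS-DOS and FAT directory entries, which stores the year as a 7-bit offset from 1980. A minimal sketch of that layout follows; the function names are made up for illustration, and nothing here is taken from the Zune driver.

```c
#include <stdint.h>

/* Bits 15-9: year - 1980, bits 8-5: month (1-12), bits 4-0: day (1-31). */
uint16_t dos_pack_date(int year, int month, int day)
{
    return (uint16_t)(((year - 1980) << 9) | (month << 5) | day);
}

int dos_date_year(uint16_t packed)
{
    return 1980 + (packed >> 9);
}

int dos_date_month(uint16_t packed)
{
    return (packed >> 5) & 0x0F;
}

int dos_date_day(uint16_t packed)
{
    return packed & 0x1F;
}
```

Seven bits of year offset means this format runs out after 2107.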

    • Re:Import calendar? (Score:5, Informative)

      by jellomizer (103300) on Sunday January 04, 2009 @10:44PM (#26325811)

      Or from a more basic standpoint...
      People make mistakes.

      When testing leap year code against a data set, you like to see that you have a February 29th and a March 1st, and that the days of the week are updated after the leap day. December 31st isn't at the top of the list of days to check for leap year code.

      Secondly, coding for dates and times, even with good prebuilt libraries, is a pain. Unfortunately, time and date are not really good mathematical functions. There are 365 days a year, except for every 4 years, when there are 366. The year is split into months, each with a different length of 28, 29, 30, or 31 days. Then we have 7-day weeks, which do not divide nicely into any greater time unit (except for the 28-day month, which only happens once a year... except in a leap year). Now each day has 24 hours, split into two 12-hour segments; each hour is split into 60 minutes, and each minute into 60 seconds. Then finally, below the second, we can start using the metric nicety in programming. Oh! Oh! Don't forget about time zones and daylight saving time (which is different per country and state, and follows political lines more than geographic lines). And if you are going at high speeds for aerospace applications, those crazy Einstein theories come into play.
      Now, no one really takes the same approach to following all these crazy rules, and having a common library is still tricky because we all do different math calculations. Also, when you do a time++, do you want one more second, like in Unix/Linux OS development, or one extra day, like in Microsoft SQL? Then you may need to sort these values or run a quick search/filter on them. American time format doesn't do a good job at this, so we need to switch to European formats. All in all, it is a lot of tough coding, and all of it is tough to QA because you need to test all the times to truly know that there are no bugs in it.

  • "Leaked"...? (Score:5, Informative)

    by Anonymous Coward on Sunday January 04, 2009 @06:18PM (#26323843)

    It's an open source driver from Freescale.

  • by feepness (543479) on Sunday January 04, 2009 @06:23PM (#26323893) Homepage

    From a coding/QA standpoint, one has to wonder how this bug was missed if the quality assurance team wasn't slacking off.

    MSFT's QA team hasn't been slacking off. They haven't slacked on since about the mid 90s.

    • You proceed from a false assumption... that Microsoft ever really had a QA team in the first place.
      • Re: (Score:3, Funny)

        "You proceed from a false assumption... that Microsoft ever really had a QA team in the first place."

        They do - it's called "customers."

  • QA?? (Score:2, Interesting)

    by Gorimek (61128)

    This kind of bug is where TDD shines. If you don't write any code unless you have a test that forces you to, it's very hard to produce this bug type.

    (TDD = Test Driven Development [wikipedia.org])

    • This kind of bug is where TDD shines.

      I'm not so sure. Let's look at the timeline without TDD:

      1) Microsoft writes method (say, one hour).
      2) Microsoft discovers bug on December 31st, 2008
      3) Microsoft spends one hour fixing bug (assuming documentation and source control and test of fix)

      Now let's look at that timeline with TDD:

      1) Microsoft writes method (let's say one hour again)
      2) Microsoft writes test for method. Test includes random dates but not December 31st, 2008. One hour.
      3) Microsoft discovers bug on

      • Re: (Score:3, Interesting)

        by Gorimek (61128)

        There is some confusion here

        Now let's look at that timeline with TDD:

        1) Microsoft writes method (let's say one hour again)
        2) Microsoft writes test for method. Test includes random dates but not December 31st, 2008. One hour.

        That is not Test Driven Development.

        In TDD, you actually let the tests drive the development. You first write a test, then the simplest code that will satisfy it. Then add more tests/assertions and modify the code. Rinse and repeat until you've run out of edge cases. For a function like t

  • by msgmonkey (599753) on Sunday January 04, 2009 @06:30PM (#26323935)

    For example, I had some code I developed on Windows CE 4.2 .NET which kept on hanging when calling the FindWindow() function.

    Turns out that trying to find a window by class name will hang (this version of) CE every time, even though you would have thought it's a very much used function call and would be caught by QA.

    So no I'm not surprised at all that this bug got through.

    • Re: (Score:2, Insightful)

      by Anonymous Coward

      If you dealt with Platform Builder you could see the WindowsCE source code. There were some amazing bugs in there. One that is more relevant that I saw in WindowsCE 4.x was the daylight savings changeover code.

      They had a function that returned whether or not time needed to be added for daylight savings. The function would have a list of changeover times for various regions. If the time was past the first changeover time of the year the function would add time, if the function was past the second changeover

  • by canuck57 (662392) on Sunday January 04, 2009 @06:32PM (#26323969)

    Looking at that code, it never had effective code review or Q/A. If I was the manager responsible I would be looking up those who signed off on the code in the last review. I spotted not one but 4 issues in that code, and would not doubt that more exist. Second off, there are much simpler ways of doing this in the C libraries, and simplicity has value.

    But the design, I suspect, is very flawed. Why not use asctime() and rely on its more proven calculations of leap year and the like via the OS libraries?

    And when you see something like this, you know someones brain was in the off position:

    day -= 366;
    year += 1;

  • Lines 122, 521, 690, 710, and 748 scare me; gotos in C code...
    • by KiltedKnight (171132) * on Sunday January 04, 2009 @06:43PM (#26324061) Homepage Journal
      Ever written code for an OS or device driver? You use them there... frequently... as "get me the frack out of here because of a fatal error"...

      Never mind that if done properly, there is nothing wrong with using a goto statement... just make sure that you only move in one direction... ideally "down" towards the end of the function, not somewhere else in the whole program.

    • by concernedadmin (1054160) on Sunday January 04, 2009 @06:45PM (#26324089)

      Lines 122, 521, 690, 710, and 748 scare me; gotos in C code...

      They've used one form of a goto that's actually quite readable and useful. Would you rather have:

      if (condition1 && condition2) {
          /* boilerplate code with a return */
      }

      if (issue1 || issue2) {
          /* same repeated boilerplate code with a return */
      }

      or

      if (condition1 && condition2) {
          goto cleanup;
      }

      if (issue1 || issue2) {
          goto cleanup;
      }

      cleanup:
          /* just one instance of this code,
             no need for duplication of efforts */

      Believe it or not, there are useful reasons to use goto, and Microsoft happened to use goto for the right reason here. The Linux kernel also happens to use this practice to boost the readability of the code.

      • Re: (Score:3, Insightful)

        by jd142 (129673)

        Why not this:

        function cleanup():void
        { //do something
        }

        if (condition1 && condition2) {
        cleanup();
        }

        if (issue1 || issue2) {
        cleanup();
        }

        If something is done exactly the same way twice, that's a function. Heck, if something is sufficiently complicated that it makes the main code easier to read, that's a function too in my book.

        Of course, I prefer my braces to line up vertically, so what do I know?

        • by AuMatar (183847) on Sunday January 04, 2009 @07:19PM (#26324367)

          Because cleanup doesn't have access to the local variables of the calling function. This means they need to be passed in. The result is a very obscure function that takes in half a dozen or more variables and gets difficult to maintain since its purpose makes absolutely no sense without the context in the calling function (not to mention easy to have bugs- forget to check just one pointer for null before using it and you're into undefined behavior, which may only occur in rare error conditions making it difficult to test for). Using a cleanup function like that just isn't practical.
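
To make the point about local variables concrete, here is a small sketch of the single-exit `goto cleanup` idiom being discussed. The function `concat_twice` and its buffers are invented for this example; the only claim is that the cleanup label can release locals that a separate `cleanup()` function would have to receive as parameters.

```c
#include <stdlib.h>
#include <string.h>

/*
 * Hypothetical helper: writes src followed by src into out, going through
 * two temporary heap buffers so there is something to clean up.
 * Returns 0 on success, -1 on any failure.
 */
int concat_twice(const char *src, char *out, size_t out_size)
{
    int status = -1;
    char *first = NULL;
    char *second = NULL;
    size_t len;

    if (src == NULL || out == NULL)
        goto cleanup;

    len = strlen(src);
    if (2 * len + 1 > out_size)
        goto cleanup;

    first = malloc(len + 1);
    if (first == NULL)
        goto cleanup;

    second = malloc(len + 1);
    if (second == NULL)
        goto cleanup;

    memcpy(first, src, len + 1);
    memcpy(second, src, len + 1);
    memcpy(out, first, len);
    memcpy(out + len, second, len + 1);  /* copies the terminating NUL too */
    status = 0;

cleanup:
    /* free(NULL) is a no-op, so every exit path can share this code. */
    free(second);
    free(first);
    return status;
}
```

Every jump moves downward to one label, which is the "only move in one direction" discipline mentioned earlier in the thread.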

          • by QRDeNameland (873957) on Sunday January 04, 2009 @08:35PM (#26324885)

            The addition of a single bool avoids both the specialized cleanup() function and the goto:

            bool needs_cleanup = false;

            if (condition1 && condition2) {
                needs_cleanup = true;
            }

            if (issue1 || issue2) {
                needs_cleanup = true;
            }

            if (needs_cleanup) {
                // clean up local vars exactly as you would
                // have done under the cleanup: label with the goto
            }

            • Re: (Score:3, Interesting)

              by dkf (304284)

              The addition of a single bool avoids both the specialized cleanup() function and the goto:

              [...]

              The problem with that is that it tends (in real code) to either greatly increase the depth of nesting of the code (making it harder to understand) or, worse yet, spray a great collection of various flags indicating the various types of cleanup required and what conditions have and haven't been checked. Making conditions more complex isn't a good plan at all for maintenance, since it increases the difference between the pseudo-code for the function (i.e. the level that you think about, without all the grotty

  • by gnasher719 (869701) on Sunday January 04, 2009 @06:36PM (#26323995)

    Both the original code and the various corrections in the article don't catch what the algorithm is supposed to do, and therefore create code that is too complicated.

    The essence of the algorithm is this: We start with number of days since 1/Jan/1980, with the first day having the number one. We want to end up with the correct year, with a day number relative to the first day of that year, with the first day again having the number one. So we set year = 1980. And as long as day is greater than the number of days in that year, we can't have the right value yet, so we change day and year accordingly. This produces a very simple loop:

    for (;;) {
        int daysInYear = IsLeapYear (year) ? 366 : 365;
        if (day <= daysInYear) break;
        day -= daysInYear; year += 1;
    }

    This is what Knuth called an "N + 1/2" loop: A loop pattern where a more or less substantial bit of code has to be executed at the beginning of the loop before we can decide whether the loop needs exiting or continuing. By following the "N+1/2 loop" pattern we avoid repeating the same code (with possible small changes) completely. And that exactly was the problem here: The same code was used twice but slightly differently (one set number of days = 365, the other made it dependent on whether the year was a leap year or not). The solutions given in the article all contain repeated code; either two loop exits, or a duplicated calculation of the number of days in a year.
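
The loop above can be wrapped into a complete function to check the boundary cases. This is a sketch under the post's own assumptions (day 1 is January 1, 1980); `split_epoch_day`, `struct year_day`, and `is_leap_year` are hypothetical names, not the driver's.

```c
struct year_day {
    int year;  /* calendar year, 1980 or later */
    int day;   /* 1-based day within that year */
};

static int is_leap_year(int year)
{
    return (year % 4 == 0) && ((year % 100 != 0) || (year % 400 == 0));
}

/* day is 1-based: day 1 is January 1, 1980. */
struct year_day split_epoch_day(int day)
{
    struct year_day result;
    int year = 1980;

    for (;;) {
        /* Computed once per iteration, so the exit test and the
           subtraction can never disagree about the year's length. */
        int days_in_year = is_leap_year(year) ? 366 : 365;

        if (day <= days_in_year)
            break;
        day -= days_in_year;
        year += 1;
    }

    result.year = year;
    result.day = day;
    return result;
}
```

Day 10593 is December 31, 2008, the input that hung the original loop; here it comes out as day 366 of 2008.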

  • integer function f_isleap(year)
    IMPLICIT NONE
    c
    c Purpose :: Return 0 if a year is NOT leap year and a 1 otherwise.
    c
    c Description: Every fourth year is a leap year,
    c but NOT when divisible by 100, except if the year
    c is divisible by 400.
    c
    integer Year
    if((MOD(Year,400).eq.0) .or.
    % ((MOD(Year,4).eq.0) .and.

  • by nato10 (600871) on Sunday January 04, 2009 @06:41PM (#26324051)

    This code is actually from the Windows CE OAL (OEM Abstraction Layer), part of the code that reads the current time from the RTC. As such, the implementation is hardware-dependent, which is why there isn't a standard implementation of this function for Windows CE.

    In addition, this code is in a portion of Windows CE source code provided by a device's BSP developer, not by Microsoft. In most cases, Windows CE BSP developers start with sample BSPs written by a processor's manufacturer -- in this case, Freescale -- and then improve it.

    It turns out that this bug is specific to Freescale's BSP -- sample Windows CE BSPs for other processors don't have it -- and other Freescale devices using Windows CE will only have this issue if their developers used this code verbatim. Since sample BSPs provided by processor manufacturers are often of poor quality, many Windows CE developers typically rewrite such functions. In other words, the impact of this particular bug may be quite limited, which may be why there haven't been reports of this issue on other devices.

    In this particular case, though, Microsoft (or a contractor) was the Zune's BSP developer, so they certainly should have caught this.

    • by TheSunborn (68004) <tiller AT daimi DOT au DOT dk> on Sunday January 04, 2009 @06:58PM (#26324203)

      I still wonder: Why is code that translates a number of days into a year hardware dependent?

      Getting the number of seconds since the epoch is hardware dependent, but translating this to other time measurements should not be, unless they are building a time machine.

      • Re: (Score:3, Interesting)

        by petermgreen (876956)

        Different RTC chips measure time in different ways. This particular one used time of day and a day count, afaict. Some, however, give you a time broken down into hours, minutes, seconds, days, months and years.

        So if the API was designed around the latter style of RTC chip, the hardware vendor would have to write code to convert to the format the API expects, and when writing driver code you generally can't just go and call your regular libraries.
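
As a small illustration of the conversion being described, this sketch turns a seconds-since-midnight counter (the style of RTC mentioned earlier in the thread) into the broken-down hours/minutes/seconds form. `struct clock_time` and `split_seconds_of_day` are invented names, not part of any Windows CE or Freescale API.

```c
struct clock_time {
    int hour;    /* 0-23 */
    int minute;  /* 0-59 */
    int second;  /* 0-59 */
};

/* seconds_of_day is expected to be in the range 0..86399. */
struct clock_time split_seconds_of_day(unsigned long seconds_of_day)
{
    struct clock_time t;

    t.hour = (int)(seconds_of_day / 3600UL);
    t.minute = (int)((seconds_of_day % 3600UL) / 60UL);
    t.second = (int)(seconds_of_day % 60UL);
    return t;
}
```

The time-of-day half is pure division and remainder; it is the day-count half, with its uneven year lengths, where the Zune code went wrong.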

  • Not so uncommon (Score:3, Interesting)

    by fermion (181285) on Sunday January 04, 2009 @07:05PM (#26324265) Homepage Journal
    These functions that are only used once in a great while are the devil to test. I think anyone who has programmed in any complex situation will have to admit to one of these silly bugs, and maybe even the bug going to production.

    What I see here is a really convoluted piece of code to perform a really simple task. There are a lot of constants that are written as literals. If there is a #define orginyear, then why not #define daysperyear and #define daysperleapyear? The first is used only once, while the rest are used twice.

    In any case, the fundamental problem is not encapsulating data. This is quite a common error in code architecture. In this case, this function knows a lot of things it does not need to know. It knows about leap years and numbers of days, and all this confuses the reader. The layout of the function already has the overhead of a function call, so why do we not let this overhead work for us by returning not the proxy leap year boolean, but what we actually want, which is the number of days in this year?

    int daysperyear;
    for (;;)
    {
        daysperyear = howmanydaysthisyear(year);
        if (days > daysperyear)
        {
            days -= daysperyear;
            year++;
        }
        else
        {
            break;
        }
    }

    In this case all days per year information and leap year information is encapsulated in a single function, and the top function does not need to know about either. This, I think, is writing quality into code, and not depending on QA to catch mistakes common to novice programmers. No guarantee this will work as is; it is just pseudocode, and I have not even checked the logic completely.

  • by kabloom (755503) on Sunday January 04, 2009 @07:53PM (#26324579) Homepage

    The proper way to do this would be with division and modulus, which gives you a nice constant time solution even if you're still using your Zune in 2108. They ought to read Calendrical Calculations [amazon.com] by Nachum Dershowitz and Ed Reingold and learn how to do this properly.
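
A sketch of what a division-and-modulus version might look like, using the widely known decomposition of the Gregorian calendar into 400-, 100-, 4-, and 1-year cycles. It is not taken from the book or from the driver; `split_epoch_day_fast`, `days_before_year`, and `struct year_day` are hypothetical names, and day 1 is assumed to be January 1, 1980.

```c
struct year_day {
    int year;  /* calendar year */
    int day;   /* 1-based day within that year */
};

#define DAYS_PER_400_YEARS 146097L
#define DAYS_PER_100_YEARS 36524L
#define DAYS_PER_4_YEARS   1461L

/* Days from 0001-01-01 up to (not including) January 1 of `year`. */
static long days_before_year(long year)
{
    long y = year - 1;
    return y * 365 + y / 4 - y / 100 + y / 400;
}

/* day is 1-based: day 1 is January 1, 1980. Runs in constant time. */
struct year_day split_epoch_day_fast(long day)
{
    struct year_day result;
    long n, n400, n100, n4, n1;

    /* 0-based day number counted from 0001-01-01 (proleptic Gregorian). */
    n = days_before_year(1980) + (day - 1);

    n400 = n / DAYS_PER_400_YEARS;  n %= DAYS_PER_400_YEARS;
    n100 = n / DAYS_PER_100_YEARS;  n %= DAYS_PER_100_YEARS;
    n4   = n / DAYS_PER_4_YEARS;    n %= DAYS_PER_4_YEARS;
    n1   = n / 365;                 n %= 365;

    result.year = (int)(400 * n400 + 100 * n100 + 4 * n4 + n1 + 1);

    if (n100 == 4 || n1 == 4) {
        /* The extra day at the end of a 4- or 400-year cycle:
           December 31 of the preceding leap year. */
        result.year -= 1;
        result.day = 366;
    } else {
        result.day = (int)n + 1;
    }
    return result;
}
```

The cost is the same for a date in 1980 as for one in 2100, and there is no loop that can fail to terminate.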

  • WWSD? (Score:3, Interesting)

    by qazwart (261667) on Sunday January 04, 2009 @08:05PM (#26324671) Homepage

    Way back in the pre-Cambrian days when I actually was a decent C programmer, there was a book chock-full of algorithms. I can't remember now if it was the "Stevens" book or the "Stevenson" book. It was our bible. Our guide. The holiest book in our bookshelf. Whenever we got the yen to do some programming, we always took out the "Stevens" book and asked ourselves "What Would Stevens Do?"

    In this day, is there not one such book or place where someone says "Gee, I have to write some code that will calculate the date, day of week, and year from a fixed day. I wonder if I can look up this bit of code in some reference book, and do it right the first time?"

    And then the second question: Why in the heck does the Zune care a fig about today's date? I believe there's some other device on the market that rhymes with "Shapple ShiPod" that does something similar to the Zune and yet doesn't care one whit about today's date. I won't claim that particular device is error free, but I bet you a couple of doughnuts that it won't freeze up the day before a big holiday because it doesn't realize that 2008 has 366 days in it.

  • Not QA's fault (Score:4, Insightful)

    by Sleepy (4551) on Sunday January 04, 2009 @08:41PM (#26324943) Homepage

    "evidence of QA.. slacking off"

    These comments routinely come from two groups:

    1) Software Developers
    2) Joe the Plumber

    Or put another way: elitism or ignorance.

    If a software division is letting QA "test" all on their own, that's a recipe for disaster... and it's the head of engineering at fault.

    See, software testing does not occur in a vacuum, no more than developers code without a list of requirements from Sales or Marketing.

    Engineering takes the requirements and uses them to produce an agreed-upon set of specifications.

    QA follows the same model... they take the software specs and derive a set of effective tests.... tests which are agreed upon by Engineering, and signed off on.

    When I did QA, it was mostly for startups who lacked this kind of process. The result was QA was always 2 steps behind software that continually morphed: hardware changed, or the customer changed their mind. I'm not placing the blame on any 1 group here... I come from Support, then QA, and now develop. Startups can be rough.

    But at the end of the day, not documenting and agreeing on what the product and tests should be will cost you big time.. maybe 7 out of 10 times.
