
Data Science Community Rocked by Pet Adoption Contest Cheating Scandal (vice.com)

A team of programmers scraped a pet adoption website to cheat in a $10,000 contest that was intended to help shelter pets get adopted. From a report: Kaggle, an online data science community that regularly hosts machine learning competitions with prizes often in the tens of thousands of dollars, has uncovered a cheating scandal involving a winning team. The Google subsidiary announced late last week that the winner of a competition involving a pet adoption site had been disqualified from the contest for fraudulently obtaining and obscuring test set data. The fact that a team cheated in a competition nominally intended to help shelter animals also raises questions about whether the people who participate in machine learning competitions like Kaggle are actually interested in making the world a better place, or whether they simply want to win prize money and climb virtual leaderboards.

The competition asked contestants to develop algorithms to predict the rate of pet adoption based on pet listings from PetFinder.my, a Malaysian pet adoption site. The goal, according to the competition, was to help discover what makes a shelter pet's online profile appealing for adopters. The winning team's entry would be "adapted into AI tools that will guide shelters and rescuers around the world on improving their pet profiles' appeal, reducing animal suffering and euthanization," the competition site said. The algorithm from BestPetting, the first place team, seemed to almost perfectly predict the rate of adoption for the test set against which the submissions were evaluated, winning with a nearly perfect score of 0.912 (out of 1.0). As a reward for their winning solution, the team of three was awarded the top prize of $10,000. Nine months after the close of the competition, however, one observant teenager found that the impressive results were too good to be true.
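The summary gives the winning score as 0.912 out of 1.0 but does not name the metric. Assuming the contest was scored with quadratic weighted kappa over five ordinal adoption-speed classes (our understanding of this PetFinder.my competition, not something stated here), a minimal Python sketch of that metric shows what "out of 1.0" means: 1.0 is perfect agreement, 0 is chance level, and negative values are worse than chance.

```python
from collections import Counter


def quadratic_weighted_kappa(actual, predicted, num_classes=5):
    """Cohen's kappa with quadratic weights; labels are ints in [0, num_classes)."""
    n = len(actual)
    if n == 0 or n != len(predicted):
        raise ValueError("need two equal-length, non-empty label lists")
    # Observed matrix O[i][j]: number of rows with actual=i and predicted=j.
    observed = [[0] * num_classes for _ in range(num_classes)]
    for a, p in zip(actual, predicted):
        observed[a][p] += 1
    # Expected matrix under chance: outer product of the two label histograms,
    # scaled so it sums to n like the observed matrix.
    hist_a = Counter(actual)
    hist_p = Counter(predicted)
    scale = (num_classes - 1) ** 2
    weighted_observed = 0.0
    weighted_expected = 0.0
    for i in range(num_classes):
        for j in range(num_classes):
            w = (i - j) ** 2 / scale  # quadratic penalty: far misses cost more
            weighted_observed += w * observed[i][j]
            weighted_expected += w * hist_a[i] * hist_p[j] / n
    if weighted_expected == 0:
        return 1.0  # both lists are one identical class: trivially perfect
    return 1.0 - weighted_observed / weighted_expected
```

Because a prediction that is off by two classes costs four times as much as one that is off by one, honest models on noisy data like this tend to land well below 1.0; commenters below report that the rest of the leaderboard sat around 0.4.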

This discussion has been archived. No new comments can be posted.

  • Why altruistic? (Score:5, Insightful)

    by thegarbz ( 1787294 ) on Wednesday January 15, 2020 @11:54AM (#59623288)

    Why is it everything that needs to be done has to be done for altruism? Let people climb leaderboards for self-centred reasons. If they happen to make the world a better place as they go then all for them.

    I don't do work on safety systems to give me a cushy feeling inside and hug my co-workers every day. I improve safety while climbing the technical ladder to get paid, that doesn't change the results.

    Screw these cheaters and their cheat data, but no... It doesn't in the slightest raise questions about making the world a better place.

    • by optikos ( 1187213 ) on Wednesday January 15, 2020 @12:23PM (#59623358)

      Let people climb leaderboards for self-centred reasons.

      Um, you do realize that the inferior technology won, which would mean that, in the field-deployment of the inferior technology, excessive amounts of imperfect matches/cuteness-identifications will occur compared to the actually-well-performing technology. Hence, good animals die, while mediocre animals get a meh response from potential adopters, which in turn causes mediocre animals to die too. The cheaters aren't cheating legit human beings out of money as much as the cheaters are cheating good animals out of life.

      • by cusco ( 717999 )

        I'd put these guys in the same classification as the scum who sell fake anti-malarial drugs. There is something seriously wrong with some people.

      • "cheating good animals out of life."?
        Settle down there, Francis.

        While I get your point about this competition cheating result leading to an inferior tech being adopted...this isn't brain surgery we're talking about.

        This was an algorithm to determine how effective pet-shelter advertisements were at getting the pets adopted. They weren't trying to make them ACCURATE, they were trying to make them APPEALING. As much as this cheater-algorithm likely didn't do that as well as a real one would have, at the same

      • I don't believe you are being fair to the poster. The discussion at hand is: what is the reason for doing the competition?

        It's already understood that people cheated, that it's wrong and it's corrupted the outcome. I am writing on Slashdot where I would like to think people like to know that the data is real and no one is cheating the system.

        The question really seems to be asking, "are we doing it for the money?"

        To which the poster replies (and I summarize), "yes, doing it for the money is why they do it."

        I happe

      • It is not clear from the article whether they cheated by obtaining the most, and most accurate, data possible to train their AI with, or by creating an AI that only returns the correct response for the test data and nothing else.

        Also, since an AI that predicts the adoptability of a pet advertisement is rather worthless outside of this contest, does it really matter?

      • Um, you do realize that the inferior technology won

        You missed my point. The inferior technology won due to cheating. All technologies were however developed. Additionally my point was more general: just because a single cheater was exposed does not mean that the incentive to climb a leaderboard automatically means that the world won't be made a better place as a result of personal self interest.

    • by DRJlaw ( 946416 )

      Why is it everything that needs to be done has to be done for altruism?

      Not everything has to be done for altruism, but this was to be done for altruism. Per the report:

      The goal, according to the competition, was to help discover what makes a shelter pet's online profile appealing for adopters. The winning team's entry would be "adapted into AI tools that will guide shelters and rescuers around the world on improving their pet profiles' appeal, reducing animal suffering and euthanization," the competition site said.

      • by Brain-Fu ( 1274756 ) on Wednesday January 15, 2020 @01:06PM (#59623504) Homepage Journal

        If they wanted only contestants motivated by altruism, they should have not offered a monetary reward.

        They offered a monetary reward and then were surprised that some contestants just did it for the money.

        • by cusco ( 717999 )

          No, they were surprised that some contestants cheated. The whole **point** was to bring in talent that would otherwise not be interested in helping shelter animals but who would be motivated by money.

          Why is this not obvious?

          • The world is full of people willing to cheat for money. People even break the law for money.

            If you offer money, you will attract such people. There shouldn't be any reason for surprise.

            The expectation that you can post a monetary reward, and then only receive a pool of contestants who are doing it for altruistic reasons, is naive.

            • by cusco ( 717999 )

              Well, maybe I'm naive but it does surprise me that people are willing to steal funds meant to assist shelter animals. Of course then I was also surprised that there were people who were selling fake antibiotics and anti-malarial medicines to the poor.

            • Well, what I suspect is the following:

              In the future, we will have very secure data sources in place before these competitions.

              I think of this as the first of many steps toward creating a fairer testing field, much like when the USA started getting rid of snake-oil peddlers and created the FDA.

              "The expectation that you can post a monetary reward, and then only receive a pool of contestants who are doing it for altruistic reasons, is naive." this is such a true statement and should be ingrained i

          • Why is this not obvious?

            If you want people to care, you have to translate it into some sort of "think of the other puppies."

            There will be lots of hyperbole.

            Why is this not obvious?

        • by DRJlaw ( 946416 )

          They offered a monetary reward and then were surprised that some contestants just did it for the money.

          Which "some contestants" are you referring to?

          "On Twitter, Pleskov apologized on behalf of his team, and noted that he intended to return the prize money to PetFinder.my. 'For me, it was never about the money but rather about the Kaggle points: a constant struggle of becoming #1 in rating had compromised my judgment.'"

          • Did Pleskov apologize and offer to return the money before, or after, he got caught?

            I'll go ahead and make my point...acting noble after one gets caught is no evidence of noble intent. People do that to try and lessen the punishment.

              YES, someone finally said why they are admitting it: they "do that to try and lessen the punishment."

              Collect the money, then don't let them into the game again.

        • Not necessarily. Some people would still enter the competition just so that they could possibly win it. Many of the entrants were probably motivated to see how their abilities stack up against other competitors as opposed to doing it because they particularly care about animals. They'd compete whether it's about matching people with rescue animals or matching them with used cars.

          The article even includes a quote from one of the people on the team that cheated who claims that they cheated to earn more poi
          • Yes, an event which both makes the world better AND offers a reward will attract both kinds of person (altruists, and profit-seekers). But it makes no sense to include a monetary reward and then expect that everyone attracted will be doing it for altruistic reasons.

            Same goes for the glory of winning, though money tends to be a more effective motivator, generally speaking, when the amount is high enough.

    • Actually, a better question would be: even if the competition had been completely fair, how is this altruistic? Presumably, the people reading these descriptions have pretty much already decided to adopt one. Hence all this algorithm will do is change which pet is adopted, and I'm not sure I see how the world benefits from dog A instead of dog B being adopted. What you need for society to benefit is something to convince people to adopt in the first place.
    • You may not do your work on safety systems to give you a cushy feeling, but are you going to cut corners and release a faulty safety system just so you can win the highest-output-employee-of-the-year award?

      I work in healthcare because it is a steady job, and for the most part I get paid what I feel I am worth. However, I am not going to skimp on patient care quality just to beat a metric. Even if I know I can get away with it, I still won't do it. It is called professional ethics.

      That said Altruistic actio

    • by Ranbot ( 2648297 )

      Why is it everything that needs to be done has to be done for altruism? Let people climb leaderboards for self-centred reasons. If they happen to make the world a better place as they go then all for them.

      Good things absolutely can happen from self-centered individuals without explicit altruism. BUT, in this case the self-centered leaderboard climber didn't make anything that helped the world be a better place. It was just a cheat, plain and simple. If you RTFA you would have read this: "their submission would have scored [about] 100th place... without the cheat."

      If there's any silver-lining it's that Kaggle and similar groups hopefully learned to be more aware and careful of cheating, so that future com

    • Why is it everything that needs to be done has to be done for altruism?

      It doesn't but altruistic reasons yield the best (honest) results.

      Let people climb leaderboards for self-centred reasons.

      While they can and are allowed to, this invites people to game the competitions rather than actually solve the problem presented.

      If they happen to make the world a better place as they go then all for them.

      As previously pointed out, [slashdot.org] the winning system did not make the world a better place as it was in reality an inferior system.

    • by lazarus ( 2879 )

      Why is it that nothing is done for altruistic reasons anymore? Once upon a time we had to rely on each other to survive and looking out for the good of your neighbour was something you just did. It was strange not to.

      Now everybody is a fucking sociopath.

      You may work on safety systems because they pay you money to do it, and that is the only reason. But you may find that you are missing out on a much happier life by being invested emotionally in the difference you are making. Work is a big part of our ex

    • It's the cheating for an altruistic cause that has sand in our panties.

      Their submission would have scored ~ 100th place with a score of 0.427526 without the cheat

  • by Anonymous Coward
  • by EmagGeek ( 574360 ) on Wednesday January 15, 2020 @12:04PM (#59623314) Journal

    Do you mean to tell me that when money is on the line, people will cheat to get it??? You don't say...

    • As shown by the numerous scams out there, which often target at-risk people, frequently by posing as some sort of charity.

  • Who has even heard of "Kaggle"? This does show the stupidity of ML though. You are just training algorithms and assuming it applies to all data thrown in afterwards. This works fine, until it doesn't work at all.

    • Hopefully that was a joke or troll post. Otherwise you just announced to the world that you are an idiot. lol
      • by cusco ( 717999 )

        Yeah, he does that pretty frequently.

      • Yeah, I am an idiot because I don't know about some contest website for people who learned how to use Google's proprietary technology to train networks. The data science community is much larger than some Millennials playing around with APIs.

    • by Ksevio ( 865461 )

      People familiar with machine learning research have heard of Kaggle. You seem to have at least the basics down (that the training data needs to be representative of the test data), so maybe you'd find interest in one of the on-line courses

      • I've been doing ML for longer than most people and I have never heard of that website. What "data science community" are they talking about? We don't all have time to play games and win prizes.

  • This has a very simple solution. Don't store any test data in advance, have a deadline then collect your test data after the deadline.
    • by ceoyoyo ( 59147 )

      Yeah. There was a whole bunch of hand wringing about how this besmirches Kaggle's reputation in machine learning. It should. If the test sets are so easy to get hold of then presumably other teams have "cheated" in the past, in ways that are slightly harder to detect than the klutzy one used in this case.

      Kaggle is kind of a CF. Their system allows all kinds of shady behaviour that, instead of fixing, they seem to expect their contestants to avoid through honour or something. One of the leaders of the team i

      • That's why a lot of the contests now have explicit rules against using external data. However, it does seem like the moderators should have questioned the results, because the leaderboard shows the current winning score at around 0.4. A score of 0.9 with no close competitors should be enough of a red flag to require a kernel run before the prize money is approved.
        • by ceoyoyo ( 59147 )

          How could you ever enforce that?

          If you insist on running contests the correct way to do it is to have a well-protected test set, have an anything goes policy, and use that test set exactly once, at the end of the contest. Anything else is wishful thinking. Or scamming your cut from the contest sponsors.
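The suggestions in this thread (collect the test labels only after the deadline, score each submission once against a protected test set, and treat a winner far ahead of the field as a red flag) fit together in a short Python sketch. It is hypothetical throughout: the `Listing` record, its `outcome_date` field, the `SealedHoldout` class, and the 0.2 margin are illustrative assumptions, not anything Kaggle is known to do.

```python
from dataclasses import dataclass
from datetime import date
from typing import Callable, Dict, List, Optional, Tuple


@dataclass(frozen=True)
class Listing:
    # Hypothetical record: an ID, model features, the date the adoption
    # outcome became known, and the outcome label itself.
    listing_id: str
    features: Tuple[float, ...]
    outcome_date: date
    label: int


def post_deadline_test_set(listings: List[Listing], deadline: date) -> List[Listing]:
    # Keep only outcomes observed after the submission deadline, so the
    # labels did not exist anywhere to be scraped while the contest ran.
    return [row for row in listings if row.outcome_date > deadline]


class SealedHoldout:
    """Scores each team exactly once; teams never see the test labels."""

    def __init__(self, test_set: List[Listing],
                 metric: Callable[[List[int], List[int]], float]) -> None:
        self._test = test_set
        self._metric = metric
        self._scores: Dict[str, float] = {}

    def score(self, team: str, predict: Callable[[Tuple[float, ...]], int]) -> float:
        if team in self._scores:
            raise RuntimeError(f"{team} has already used its one evaluation")
        actual = [row.label for row in self._test]
        # The team's model only ever receives features, never labels.
        predicted = [predict(row.features) for row in self._test]
        self._scores[team] = self._metric(actual, predicted)
        return self._scores[team]

    def suspicious_winner(self, margin: float = 0.2) -> Optional[str]:
        # Red flag: the top score beats the runner-up by more than `margin`.
        ranked = sorted(self._scores.items(), key=lambda kv: kv[1], reverse=True)
        if len(ranked) >= 2 and ranked[0][1] - ranked[1][1] > margin:
            return ranked[0][0]
        return None
```

Handing the host a prediction function rather than handing teams the test file is the point of the design: the test rows, and especially their labels, never leave the host. The margin check cannot prove cheating on its own (a genuine breakthrough would also trip it), so it only marks the winner for a closer look, such as the kernel rerun suggested above.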

  • Each 'contest' is different, so the aggregation is stupid. Just give everyone a participation badge.

  • The fact that a team cheated in a competition nominally intended to help shelter animals also raises questions about whether the people who participate in machine learning competitions like Kaggle are actually interested in making the world a better place, or whether they simply want to win prize money and climb virtual leaderboards.

    Seriously? Is the editor in Junior High? If people wanted to use their technical skills for altruistic reasons, they wouldn't need monetary prizes or leaderboards. Those are called incentives, and people respond to incentives. A lot of people respond to financial incentives, and quite a few also respond to raw competitiveness (hence prize money and "gamification").

    The opportunity to help a charity operate its business more effectively would also be an incentive to some people, and the Venn diagrams of the d

    • by ceoyoyo ( 59147 )

      Oh, but #AIforgood while you're busy making better surveillance tools.

    • by cusco ( 717999 )

      The altruism comes from the people putting up the prize money, they want to find new ways to help shelter animals and there aren't enough people with the knowledge to do this work with the same interest. Not sure why this isn't obvious to most people.

      • by crgrace ( 220738 )

        The altruism comes from the people putting up the prize money, they want to find new ways to help shelter animals and there aren't enough people with the knowledge to do this work with the same interest. Not sure why this isn't obvious to most people.

        Well, yes, of course. But the summary sentence I had an issue with was about the contest participants, not the people who put up the prize money.

    • by DRJlaw ( 946416 )

      ...but come on, why offer a money prize if the point of the contest was "altruism"?

      Because, as most of us already know [voxeu.org], altruism and money are not an "either or," they are complementary components of compensation.

      Or did you somehow miss the whole [careersingovernment.com] members of the military, public education, civil service, etc. are "noble" for not maximizing their private sector earning potential thing?

      • by crgrace ( 220738 )

        Because, as most of us already know, altruism and money are not an "either or," they are complementary components of compensation.

        I mentioned this. Did you read the part of my comment about Venn diagrams? But if you offer a money prize, you shouldn't be surprised that at least some of the competitors would be in it for that money prize. The issue I had was that this fact somehow "called into question" whether the ML community was altruistic. That's just dumb. All it showed was that these particular individuals wanted the money enough to cheat. It says absolutely nothing about the ML community as a whole.

        Or did you somehow miss the whole members of the military, public education, civil service, etc. are "noble" for not maximizing their private sector earning potential thing?

        I'm guessing you're being sarcastic about n

    • *Exactly!* An inordinate number of posts over the last few years have seemed to come from entitled and clueless children rather than sensible adults. This is a very good example; the idiotic "world is on fire, panic now!" posts are another, and the "what happened to programming, you know, hacking stuff for fun?" posts yet another. It's like talking to a bunch of 10-year-olds most of the time.

  • by tommeke100 ( 755660 ) on Wednesday January 15, 2020 @12:52PM (#59623456)
    There are data leaks in pretty much all Kaggle competitions.
    Someone usually uncovers where the data is from, and uses that extra data in their algorithm (or at least their training). I saw this happening for a financial forecast contest a couple of years ago as well as the building energy consumption prediction contest that ended recently (where teams uncovered the actual buildings and data sets of these buildings online).
    I mean, scraping the data of the actual website makes a lot of sense, there usually are no rules against using external data, I'm sure they aren't the only team who did this.
    Also, it's pretty much the job of the data scientist to have as much data as possible for their ML training.
    Obviously they tried to obfuscate this here, so there's definitely foul play, but after years of Kaggle competitions they could do a better job at keeping their test sets private really.
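The point above, that competition data can usually be traced back to its public source, suggests a cheap check. A host (or a team wondering whether its "external data" is really the test set) can fingerprint each listing and measure how many test listings also appear in the scraped data. This is a minimal Python sketch; the `name` and `description` fields are hypothetical stand-ins for whatever identifies a listing, and exact hashing will miss listings that were edited after scraping.

```python
import hashlib
from typing import Dict, Iterable


def fingerprint(listing: Dict[str, str]) -> str:
    # Normalize case and whitespace so trivially reformatted copies still match.
    # "name" and "description" are hypothetical identifying fields.
    text = "|".join(" ".join(listing.get(k, "").lower().split())
                    for k in ("name", "description"))
    return hashlib.sha256(text.encode("utf-8")).hexdigest()


def overlap_fraction(test_rows: Iterable[Dict[str, str]],
                     external_rows: Iterable[Dict[str, str]]) -> float:
    """Fraction of test listings whose fingerprint also occurs in the external data."""
    external = {fingerprint(row) for row in external_rows}
    test = [fingerprint(row) for row in test_rows]
    if not test:
        return 0.0
    return sum(fp in external for fp in test) / len(test)
```

A high overlap does not by itself mean anyone cheated (where external data is allowed, using it is within the rules), but it does tell the host that the test set is no longer private.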
  • So this article seems to ask the reader to consider the morality of hacking this website in the context of a competition in which participants apply machine learning to the psychology of pet adoption? Oh, they took their eyes off "making the world a better place" for a moment there? Really??

    Sorry, all I could think about was how far up their asses all their heads must be. Especially whoever wrote this, but anyone else who took it seriously, too. This is fertile ground for the writers of "Silicon Valley".

  • The shock everyone is feeling is not just that someone would cheat but who the cheater was. Pavel was an inspiration to many of us. It's like seeing Steve Wozniak indicted for investor fraud.
    • I mean, results "too good to be true" are something I would expect any halfway competent "Data Scientist" to spot immediately. What use are these people if they cannot even get the basics right? I did know that "Data Science" is not really something smart people go into, but this level of incompetence is even worse than I expected.

    • "Data Science" to these types of people means you know how to use Google's "AI" APIs. Not exactly people who actually know anything, but the types who jump on some bandwagon.

      • by gweihir ( 88907 )

        That would make a lot of sense. Coming up with some statistic from some data is easy and things like the Google AI stuff make it even easier. Coming up with a statistic that actually describes reality in a meaningful way is anything but easy.

  • How they did it (Score:5, Informative)

    by Sarten-X ( 1102295 ) on Wednesday January 15, 2020 @02:00PM (#59623694) Homepage

    TFS omits the very next line from TFA, which explains exactly how the perpetrators cheated:

    Benjamin Minixhofer, an Austrian machine learning enthusiast who placed sixth in the pet adoption competition, volunteered to help the company integrate the winning solutions into PetFinder.my’s website. In doing so, he discovered that the BestPetting team obtained PetFinder.my’s testing data, likely by scraping data from Kaggle or PetFinder.my, then encoded and decoded that data into their algorithm to obfuscate their illicit advantage.
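The quoted description is brief. As an illustration only, one common shape for this kind of leak is a table of scraped labels, keyed on a hash of each listing and shipped in encoded form, that silently overrides the model's prediction for any listing it recognizes. The Python sketch below is hypothetical and is not a reconstruction of the BestPetting team's code; the function names, the SHA-256 key, and the base64/JSON encoding are all assumptions.

```python
import base64
import hashlib
import json
from typing import Callable, Dict

# Hypothetical illustration of label leakage; not the BestPetting team's code.


def listing_key(description: str) -> str:
    # Hash a normalized description so the table holds no readable text.
    return hashlib.sha256(description.strip().lower().encode("utf-8")).hexdigest()


def encode_table(table: Dict[str, int]) -> str:
    # The leaked labels become an opaque blob rather than a readable table,
    # which is what makes the advantage hard to spot in a code review.
    return base64.b64encode(json.dumps(table).encode("utf-8")).decode("ascii")


def decode_table(blob: str) -> Dict[str, int]:
    return json.loads(base64.b64decode(blob))


def leaky_predict(description: str, model: Callable[[str], int], blob: str) -> int:
    table = decode_table(blob)
    # Recognized test listings get the scraped label; everything else
    # falls back to the honest model.
    return table.get(listing_key(description), model(description))
```

On listings that were never scraped, the lookup misses and only the model's own output remains, which fits the figure commenters quote above: roughly 100th place, about 0.43, without the cheat.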

  • and expect him/her to react with altruism and anything other than "oOoOoOo... gimme gimme gimme!"? Sounds pretty unrealistic to me.
  • by grep -v '.*' * ( 780312 ) on Wednesday January 15, 2020 @04:48PM (#59624298)

    winning with a nearly perfect score of 0.912 (out of 1.0)

    Meh, I'd give them maybe a B- for that little of an effort.

    An OUTSTANDING score should be _much_ above the norm, say 1.2 or perhaps 1.1 out of a total of 1.0. Otherwise you're just a know-nothing, no-ambition average plebe who doesn't even deserve self-esteem, never mind getting into college.
