Data Science Community Rocked by Pet Adoption Contest Cheating Scandal (vice.com)
A team of programmers scraped a pet adoption website to cheat in a $10,000 contest that was intended to help shelter pets get adopted. From a report: Kaggle, an online data science community that regularly hosts machine learning competitions with prizes often in the tens of thousands of dollars, has uncovered a cheating scandal involving a winning team. The Google subsidiary announced late last week that the winner of a competition involving a pet adoption site had been disqualified from the contest for fraudulently obtaining and obscuring test set data. The fact that a team cheated in a competition nominally intended to help shelter animals also raises questions about whether the people who participate in machine learning competitions like Kaggle are actually interested in making the world a better place, or whether they simply want to win prize money and climb virtual leaderboards.
The competition asked contestants to develop algorithms to predict the rate of pet adoption based on pet listings from PetFinder.my, a Malaysian pet adoption site. The goal, according to the competition, was to help discover what makes a shelter pet's online profile appealing for adopters. The winning team's entry would be "adapted into AI tools that will guide shelters and rescuers around the world on improving their pet profiles' appeal, reducing animal suffering and euthanization," the competition site said. The algorithm from BestPetting, the first-place team, seemed to predict the rate of adoption almost perfectly for the test set against which the submissions were evaluated, winning with a nearly perfect score of 0.912 (out of 1.0). As a reward for their winning solution, the team of three was awarded the top prize of $10,000. Nine months after the close of the competition, however, one observant teenager found that the impressive results were too good to be true.
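The summary does not say what the 0.912 measures. Assuming it is a quadratic weighted kappa over ordinal adoption-speed classes (my understanding of the PetFinder.my competition's metric, not something the article states), the sketch below shows how such a score is computed. Kappa tops out at 1.0 but can go below zero, so "out of 1.0" describes the ceiling, not the full range. The function name and the five-class setup in the usage line are illustrative assumptions.

```python
def quadratic_weighted_kappa(y_true, y_pred, num_classes):
    """Cohen's kappa with quadratic weights for ordinal labels 0..num_classes-1.

    Returns 1.0 for perfect agreement, about 0.0 for chance-level agreement,
    and negative values for worse-than-chance agreement.
    """
    n = len(y_true)
    if n == 0 or n != len(y_pred):
        raise ValueError("inputs must be non-empty and of equal length")
    if num_classes < 2:
        raise ValueError("need at least two classes")

    # Observed confusion matrix: observed[i][j] counts (true=i, predicted=j).
    observed = [[0] * num_classes for _ in range(num_classes)]
    for t, p in zip(y_true, y_pred):
        observed[t][p] += 1

    # Marginal label counts for the true and predicted labels.
    true_counts = [sum(row) for row in observed]
    pred_counts = [sum(observed[i][j] for i in range(num_classes))
                   for j in range(num_classes)]

    weighted_observed = 0.0
    weighted_expected = 0.0
    for i in range(num_classes):
        for j in range(num_classes):
            # Quadratic penalty: being off by two classes costs four times as
            # much as being off by one.
            weight = (i - j) ** 2 / (num_classes - 1) ** 2
            # Count expected in cell (i, j) if predictions were independent of truth.
            expected = true_counts[i] * pred_counts[j] / n
            weighted_observed += weight * observed[i][j]
            weighted_expected += weight * expected

    if weighted_expected == 0:
        raise ValueError("kappa undefined: all ratings fall in one class")
    return 1.0 - weighted_observed / weighted_expected
```

Usage: quadratic_weighted_kappa([0, 1, 2, 3, 4], [0, 1, 2, 3, 4], 5) returns 1.0, and fully reversed predictions give roughly -1.0. Quadratic weighting is the usual choice for ordinal labels because it punishes a prediction that is far off much more than one that is close.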
Re: (Score:2)
This isn't actually helping the animals. If an AI can tell them how to write an ad to make one pet more appealing, then that pet may be adopted instead of some other pet. But that isn't eliminating any suffering, it is just shifting it to a different animal.
Re:Why are idiots putting on these contests? (Score:4, Funny)
Yes, because making ads for pets more appealing will do absolutely nothing to increase the demand for pets. Demand for pets is entirely inelastic and immutable, as proven by Sveriges Riksbank Prize in Economic Sciences in Memory of Alfred Nobel winner ShanghaiBill.
Re: (Score:1)
Data Science Community Rocked by Pet Adoption Contest Cheating Scandal
That headline writer was going for a strong runner-up for most incomprehensible word salad.
Probably got tired of writing about current US politics.
Re: (Score:2)
Re: (Score:2)
The contest was about analyzing the "old" methods to determine which were 1-7 and which were 8-10, and it seems to have been assumed that changing the listings for 8-10 would change the total number of pets adopted, which seems a silly assumption.
Re: (Score:2)
It's a bit more complex than that. Some animals are naturally more adoptable. So, they don't need as much help. Some are far less adoptable and need help to garner interest or they will simply linger in a shelter or foster care until it's too late for them.
This is true, and not necessarily because the animal has any behavior problems either. Dark-colored dogs and cats are statistically adopted less often than light-colored ones. It's called Black Dog Syndrome and it's very real. ( https://en.wikipedia.org/wiki/... [wikipedia.org] )
Re: (Score:3)
This isn't actually helping the animals. ...pet may be adopted instead of some other pet. But that isn't eliminating any suffering, it is just shifting it to a different animal.
Counterpoint: Many prospective pet owners debate getting a dog or cat from a breeder or a rescue. If the latter is more appealing because of more effective ads then more animals will be adopted from rescues. The breeders will read the market and respond by reducing pure-bred supply, so while the total number of pets will probably stay relatively constant* a higher percentage will shift to keeping animals from suffering. A net paw-sitive. (Sorry, I couldn't resist)
* - One could also hypothesize that if peo
Re: (Score:2)
Ranbot posited:
Many prospective pet owners debate getting a dog or cat from a breeder or a rescue. If the latter is more appealing because of more effective ads then more animals will be adopted from rescues. The breeders will read the market and respond by reducing pure-bred supply, so while the total number of pets will probably stay relatively constant* a higher percentage will shift to keeping animals from suffering. A net paw-sitive. (Sorry, I couldn't resist)
Except it doesn't work that way in the real world - at least, not in the USA, it doesn't.
Let me begin by stating that every dog we've ever adopted has been a rescue of one sort or another - except the latest addition to our little pack. (I'll explain about him in a moment.)
What you are failing to take into account are the Petlands of the world. Petland gets essentially all its puppies from puppy mills [humanesociety.org]. Puppy mill operators exploit their breeding stock in ways that would turn the stomach of al
Re: (Score:2)
This. And most animal activists' goals are not to move as much product as fast as possible. Anyone who works with shelters knows the bad that comes from pairing pets with poor owners. It is pretty obvious that any shelter not in it just to make a profit would want any sort of minimal advertising to be as unappealing as possible.
Why altruistic? (Score:5, Insightful)
Why is it everything that needs to be done has to be done for altruism? Let people climb leaderboards for self-centred reasons. If they happen to make the world a better place as they go, then all for them.
I don't do work on safety systems to give me a cushy feeling inside and hug my co-workers every day. I improve safety while climbing the technical ladder to get paid; that doesn't change the results.
Screw these cheaters and their cheat data, but no... It doesn't in the slightest raise questions about making the world a better place.
Re:Why altruistic? (Score:5, Interesting)
Let people climb leaderboards for self-centred reasons.
Um, you do realize that the inferior technology won, which would mean that, once the inferior technology is deployed in the field, there will be far more imperfect matches/cuteness-identifications than with the technology that actually performs well. Hence, good animals die, while mediocre animals get a meh response from potential adopters, which in turn causes mediocre animals to die too. The cheaters aren't cheating legit human beings out of money as much as the cheaters are cheating good animals out of life.
Re: (Score:2)
I'd put these guys in the same classification as the scum who sell fake anti-malarial drugs. There is something seriously wrong with some people.
Re: (Score:2)
"cheating good animals out of life."?
Settle down there, Francis.
While I get your point about this competition cheating result leading to an inferior tech being adopted...this isn't brain surgery we're talking about.
This was an algorithm to determine how effective pet-shelter advertisements were at getting the pets adopted. They weren't trying to make them ACCURATE, they were trying to make them APPEALING. As much as this cheater-algorithm likely didn't do that as well as a real one would have, at the same
Re: (Score:2)
I don't believe you are being fair to the poster. The discussion at hand is: what is the reason for doing the competition?
It's already understood that people cheated, that it's wrong, and that it's corrupted the outcome. I am writing on Slashdot, where I would like to think people like to know that the data is real and no one is cheating the system.
The question really seems to ask, "are we doing it for the money?"
To which the poster replies (and I summarize), "yes, doing it for the money is why they do it."
I happe
Re: (Score:2)
It is not clear in the article whether they cheated by obtaining the most, and the most accurate, data possible to train their AI with, or by creating an AI that will only return the correct response to the test data and nothing else.
Also, since an AI that predicts the adoptability of a pet advertisement is rather worthless outside of this contest, does it really matter?
Re: (Score:2)
Um, you do realize that the inferior technology won
You missed my point. The inferior technology won due to cheating, but all the technologies were still developed. Additionally, my point was more general: just because a single cheater was exposed does not mean that the incentive to climb a leaderboard automatically prevents personal self-interest from making the world a better place.
Re: (Score:2)
Not everything has to be done for altruism, but this was to be done for altruism. Per the report:
Re:Why altruistic? (Score:5, Informative)
If they wanted only contestants motivated by altruism, they should have not offered a monetary reward.
They offered a monetary reward and then were surprised that some contestants just did it for the money.
Re: (Score:3)
No, they were surprised that some contestants cheated. The whole **point** was to bring in talent that would otherwise not be interested in helping shelter animals but who would be motivated by money.
Why is this not obvious?
Re: (Score:2)
The world is full of people willing to cheat for money. People even break the law for money.
If you offer money, you will attract such people. There shouldn't be any reason for surprise.
The expectation that you can post a monetary reward, and then only receive a pool of contestants who are doing it for altruistic reasons, is naive.
Re: (Score:2)
Well, maybe I'm naive but it does surprise me that people are willing to steal funds meant to assist shelter animals. Of course then I was also surprised that there were people who were selling fake antibiotics and anti-malarial medicines to the poor.
Re: (Score:2)
From my point of view fraud=theft. They committed fraud to steal the competition, which was meant to benefit shelters in placing their animals. Now the shelters and their animals have something useless and the scammers have the money.
Re: (Score:2)
Well, what I suspect is the following:
In the future, we will have very secure data sources prior to these competitions.
I think of this as the first of many steps toward creating a fairer testing field, very similar to when the USA started getting rid of the snake-oil peddlers and created the FDA.
"The expectation that you can post a monetary reward, and then only receive a pool of contestants who are doing it for altruistic reasons, is naive." this is such a true statement and should be ingrained i
Re: (Score:2)
Why is this not obvious?
If you want the police the care, you have to translate it into some sort of "think of the other puppies."
There will be lots of hyperbole.
Why is this not obvious?
Re: (Score:2)
Which "some contestants" are you referring to?
"On Twitter, Pleskov apologized on behalf of his team, and noted that he intended to return the prize money to PetFinder.my. 'For me, it was never about the money but rather about the Kaggle points: a constant struggle of becoming #1 in rating had compromised my judgment.'"
Re: (Score:2)
Did Pleskov apologize and offer to return the money before, or after, he got caught?
I'll go ahead and make my point...acting noble after one gets caught is no evidence of noble intent. People do that to try and lessen the punishment.
Re: (Score:2)
YES, someone posted the reason why they are admitting it: they "do that to try and lessen the punishment."
Collect the money, then don't let them into the game again.
Re: (Score:2)
The article even includes a quote from one of the people on the team that cheated who claims that they cheated to earn more poi
Re: (Score:2)
Yes, an event which both makes the world better AND offers a reward will attract both kinds of person (altruists, and profit-seekers). But it makes no sense to include a monetary reward and then expect that everyone attracted will be doing it for altruistic reasons.
Same goes for the glory of winning, though money tends to be a more effective motivator, generally speaking, when the amount is high enough.
How is this altruistic? (Score:2)
Re: (Score:2)
You may not do your work on safety system to give you a cushy feeling, but are you going to cut corners and release a faulty safety system, just so you can win the highest output employee of the year award?
I work in healthcare because it is a steady job, and for the most part I get paid what I feel I am worth. However, I am not going to skimp on patient care quality just to beat a metric. Even if I know I can get away with it, I still won't do it. It is called professional ethics.
That said Altruistic actio
Re: (Score:2)
Why is it everything that needs to be done has to be done for altruism? Let people climb leaderboards for self-centred reasons. If they happen to make the world a better place as they go, then all for them.
Good things absolutely can happen from self-centered individuals without explicit altruism. BUT, in this case the self-centered leaderboard climber didn't make anything that helped make the world a better place. It was just a cheat, plain and simple. If you RFTA you would have read this: "their submission would have scored [about] 100th place... without the cheat."
If there's any silver-lining it's that Kaggle and similar groups hopefully learned to be more aware and careful of cheating, so that future com
Re: (Score:2)
RFTA
LOL... typo... RTFA
Re: (Score:2)
Why is it everything that needs to be done has to be done for altruism?
It doesn't but altruistic reasons yield the best (honest) results.
Let people climb leaderboards for self-centred reasons.
While they can and are allowed to, this invites people to game the competitions rather than actually solve the problem presented.
If they happen to make the world a better place as they go then all for them.
As previously pointed out, [slashdot.org] the winning system did not make the world a better place as it was in reality an inferior system.
Re: (Score:2)
Why is it that nothing is done for altruistic reasons anymore? Once upon a time we had to rely on each other to survive and looking out for the good of your neighbour was something you just did. It was strange not to.
Now everybody is a fucking sociopath.
You may work on safety systems because they pay you money to do it, and that is the only reason. But you may find that you are missing out on a much happier life by being invested emotionally in the difference you are making. Work is a big part of our ex
Re: (Score:2)
It's the cheating for an altruistic cause that has sand in our panties.
Their submission would have scored ~ 100th place with a score of 0.427526 without the cheat
Campbells Law (Score:1)
Do you mean to tell me... (Score:3, Funny)
Do you mean to tell me that when money is on the line, people will cheat to get it??? You don't say...
Re: (Score:2)
As shown by the numerous types of scams out there, often targeting at-risk people.
They often pose as some sort of charity.
Rocked? (Score:2)
Who has even heard of "Kaggle"? This does show the stupidity of ML though. You are just training algorithms and assuming it applies to all data thrown in afterwards. This works fine, until it doesn't work at all.
Re: Rocked? (Score:2)
Re: (Score:2)
Yeah, he does that pretty frequently.
Re: (Score:2)
Yeah, I am an idiot because I don't know about some contest website for people who learned how to use Google's proprietary technology to train networks. The data science community is much larger than some Millennials playing around with APIs.
Re: (Score:2)
People familiar with machine learning research have heard of Kaggle. You seem to have at least the basics down (that the training data needs to be representative of the test data), so maybe you'd find interest in one of the on-line courses
Re: (Score:3)
I've been doing ML for longer than most people and I have never heard of that website. What "data science community" are they talking about? We don't all have time to play games and win prizes.
Simple solution (Score:1)
Re: (Score:2)
Yeah. There was a whole bunch of hand wringing about how this besmirches Kaggle's reputation in machine learning. It should. If the test sets are so easy to get hold of then presumably other teams have "cheated" in the past, in ways that are slightly harder to detect than the klutzy one used in this case.
Kaggle is kind of a CF. Their system allows all kinds of shady behaviour that, instead of fixing, they seem to expect their contestants to avoid through honour or something. One of the leaders of the team i
Re: Simple solution (Score:2)
Re: (Score:2)
How could you ever enforce that?
If you insist on running contests, the correct way to do it is to have a well-protected test set, have an anything-goes policy, and use that test set exactly once, at the end of the contest. Anything else is wishful thinking. Or scamming your cut from the contest sponsors.
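The commenter's "use the test set exactly once" rule can be sketched as an organizer-side wrapper. This is a minimal illustration under assumptions: the class and its names are hypothetical, plain accuracy stands in for whatever metric a real contest uses, and in practice the labels would sit on a server contestants cannot reach.

```python
from typing import Callable, Sequence


def accuracy(y_true: Sequence[int], y_pred: Sequence[int]) -> float:
    """Fraction of predictions that exactly match the labels."""
    return sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)


class OneShotHoldout:
    """Hypothetical organizer-side wrapper around a private test set.

    The labels stay inside this object and it refuses to score more than once,
    so no one can probe the test set by resubmitting. The leading underscore
    is only a Python naming convention, not real protection.
    """

    def __init__(self, labels: Sequence[int],
                 metric: Callable[[Sequence[int], Sequence[int]], float]):
        self._labels = list(labels)
        self._metric = metric
        self._used = False

    def score(self, predictions: Sequence[int]) -> float:
        if self._used:
            raise RuntimeError("holdout set has already been scored once")
        # Validate before marking as used, so a malformed file doesn't burn the one shot.
        if len(predictions) != len(self._labels):
            raise ValueError("prediction count does not match holdout size")
        self._used = True
        return self._metric(self._labels, predictions)
```

Usage: final = OneShotHoldout([1, 0, 1, 1], accuracy); final.score([1, 0, 0, 1]) returns 0.75, and any second call raises RuntimeError. The length check runs before the set is marked as used, so a malformed submission does not consume the single scoring.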
Get rid of the points and leaderboards (Score:2)
Each 'contest' is different, so the aggregation is stupid. Just give everyone a participation badge.
How is this even a question? (Score:2)
The fact that a team cheated in a competition nominally intended to help shelter animals also raises questions about whether the people who participate in machine learning competitions like Kaggle are actually interested in making the world a better place, or whether they simply want to win prize money and climb virtual leaderboards.
Seriously? Is the editor in Junior High? If people wanted to use their technical skills for altruistic reasons, they wouldn't need monetary prizes or leaderboards. Those are called incentives, and people respond to incentives. A lot of people respond to financial incentives, and quite a few also respond to raw competitiveness (hence prize money and "gamification").
The opportunity to help a charity operate its business more effectively would also be an incentive to some people, and the Venn diagrams of the d
Re: (Score:2)
Oh, but #AIforgood while you're busy making better surveillance tools.
Re: (Score:3)
The altruism comes from the people putting up the prize money, they want to find new ways to help shelter animals and there aren't enough people with the knowledge to do this work with the same interest. Not sure why this isn't obvious to most people.
Re: (Score:2)
The altruism comes from the people putting up the prize money, they want to find new ways to help shelter animals and there aren't enough people with the knowledge to do this work with the same interest. Not sure why this isn't obvious to most people.
Well, yes, of course. But the summary sentence I had an issue with was about the contest participants, not the people who put up the prize money.
Re: (Score:2)
Because, as most of us already know [voxeu.org], altruism and money are not an "either or," they are complementary components of compensation.
Or did you somehow miss the whole [careersingovernment.com] members of the military, public education, civil service, etc. are "noble" for not maximizing their private sector earning potential thing.
Re: (Score:2)
Because, as most of us already know, altruism and money are not an "either or," they are complementary components of compensation.
I mentioned this. Did you read the part of my comment about Venn diagrams? But if you offer a money prize, you shouldn't be surprised that at least some of the competitors would be in it for that money prize. The issue I had was that this fact somehow "called into question" whether the ML community was altruistic. That's just dumb. All it showed was that these particular individuals wanted the money enough to cheat. It says absolutely nothing about the ML community as a whole.
Or did you somehow miss the whole members of the military, public education, civil service, etc. are "noble" for not maximizing their private sector earning potential thing.
I'm guessing you're being sarcastic about n
Re: (Score:2)
*Exactly!* An inordinate number of posts over the last few years have seemed to come from entitled and clueless children, rather than sensible adults. This is a very good example; the idiotic "world is on fire, panic now!" posts are another, and so is the "what happened to programming, you know, hacking stuff for fun?" one. It's like talking to a bunch of 10-year-olds most of the time.
That's called learning (Score:2)
Humans cheat too.
Data Leaks in almost all Kaggle competitions (Score:4, Informative)
Someone usually uncovers where the data is from, and uses that extra data in their algorithm (or at least their training). I saw this happening for a financial forecast contest a couple of years ago as well as the building energy consumption prediction contest that ended recently (where teams uncovered the actual buildings and data sets of these buildings online).
I mean, scraping the data from the actual website makes a lot of sense; there usually are no rules against using external data, and I'm sure they aren't the only team who did this.
Also, it's pretty much the job of the data scientist to have as much data as possible for their ML training.
Obviously they tried to obfuscate this here, so there's definitely foul play, but after years of Kaggle competitions they really could do a better job of keeping their test sets private.
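The leak described here, test rows that also exist somewhere public, is something an organizer could audit before launch. A minimal sketch, assuming the organizer has its own test records and a scraped dump of the public site; the field names in the usage line are hypothetical, and the check only catches exact matches after trimming and lowercasing.

```python
import hashlib
from typing import Iterable, Mapping, Sequence


def fingerprint(record: Mapping[str, object], fields: Sequence[str]) -> str:
    """Hash a normalized subset of fields so formatting quirks don't hide a match."""
    parts = [str(record.get(f, "")).strip().lower() for f in fields]
    # \x1f (unit separator) keeps ("ab", "c") distinct from ("a", "bc").
    return hashlib.sha256("\x1f".join(parts).encode("utf-8")).hexdigest()


def leaked_fraction(test_rows: Sequence[Mapping[str, object]],
                    external_rows: Iterable[Mapping[str, object]],
                    fields: Sequence[str]) -> float:
    """Share of test rows whose fingerprint also appears in an external dump."""
    if not test_rows:
        return 0.0
    external = {fingerprint(r, fields) for r in external_rows}
    hits = sum(fingerprint(r, fields) in external for r in test_rows)
    return hits / len(test_rows)
```

Usage: with test = [{"name": "Milo", "description": "Friendly cat "}, {"name": "Rex", "description": "Loves walks"}] and scraped = [{"name": "milo", "description": "friendly cat"}], leaked_fraction(test, scraped, ("name", "description")) returns 0.5. Hashing is not required for exact matching (comparing normalized tuples works too), but it lets fingerprints be shared without circulating the raw test records.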
Silicon Valley (Score:2)
So this article seems to ask the reader to consider the morality of hacking this website in the context of a competition in which participants apply machine learning to the psychology of pet adoption? Oh, they took their eyes off "making the world a better place" for a moment there? Really??
Sorry, all I could think about was how far up their asses all their heads must be. Especially whoever wrote this, but anyone else who took it seriously, too. This is fertile ground for the writers of "Silicon Valley".
Dude. But why? (Score:1)
It took them 9 months? (Score:2)
I mean, results "too good to be true" are something I would expect any halfway competent "Data Scientist" to spot immediately. What use are these people if they cannot even get the basics right? I did know that "Data Science" is not really something smart people go into, but this level of incompetence is even worse than I expected.
Re: (Score:2)
"Data Science" to these types of people means you know how to use Google's "AI" APIs. Not exactly people who actually know anything, but the types who jump on some bandwagon.
Re: (Score:2)
That would make a lot of sense. Coming up with some statistic from some data is easy and things like the Google AI stuff make it even easier. Coming up with a statistic that actually describes reality in a meaningful way is anything but easy.
How they did it (Score:5, Informative)
TFS omits the very next line from TFA, that explains exactly how the perpetrators cheated:
Benjamin Minixhofer, an Austrian machine learning enthusiast who placed sixth in the pet adoption competition, volunteered to help the company integrate the winning solutions into PetFinder.my’s website. In doing so, he discovered that the BestPetting team obtained PetFinder.my’s testing data, likely by scraping data from Kaggle or PetFinder.my, then encoded and decoded that data into their algorithm to obfuscate their illicit advantage.
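The report says the team encoded the leaked test data into their algorithm, and it surfaced when the solution was integrated into the live site. A simple sanity check that would expose this kind of memorization is to score the model on listings it could not have seen, such as ones posted after the contest closed. The sketch below assumes such fresh listings are available; make_memorizer is a hypothetical toy, not the team's code, and accuracy stands in for the contest's actual metric.

```python
from typing import Callable, Hashable, Mapping, Sequence, Tuple


def accuracy(y_true: Sequence[int], y_pred: Sequence[int]) -> float:
    """Fraction of exact matches (a stand-in for the contest's real metric)."""
    return sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)


def make_memorizer(lookup: Mapping[Hashable, int],
                   fallback: int) -> Callable[[Hashable], int]:
    """Toy 'model' that has absorbed leaked labels: it returns the stored label
    for a listing it has seen and a constant guess for everything else."""
    return lambda listing_id: lookup.get(listing_id, fallback)


def score_gap(predict: Callable[[Hashable], int],
              old_ids: Sequence[Hashable], old_labels: Sequence[int],
              new_ids: Sequence[Hashable], new_labels: Sequence[int],
              metric=accuracy) -> Tuple[float, float, float]:
    """Score on the original test set, score on never-seen listings, and the drop."""
    old = metric(old_labels, [predict(i) for i in old_ids])
    new = metric(new_labels, [predict(i) for i in new_ids])
    return old, new, old - new
```

Usage: score_gap(make_memorizer({"a": 1, "b": 2, "c": 0}, fallback=2), ["a", "b", "c"], [1, 2, 0], ["d", "e", "f", "g"], [0, 1, 2, 3]) returns (1.0, 0.25, 0.75). A genuine model usually loses a little on new data; a collapse this large is the "too good to be true" signal.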
Dangle $10k in front of teenager... (Score:2)
winning with a nearly perfect score of 0.912 (Score:3)
winning with a nearly perfect score of 0.912 (out of 1.0)
Meh, I'd give them maybe a B- for that little of an effort.
An OUTSTANDING score should be _much_ above the norm, say 1.2 or perhaps 1.1 out of a total of 1.0. Otherwise you're just a know-nothing, no-ambition average plebe who doesn't even deserve self-esteem, never mind getting into college.