AI Education Technology

Software Takes On School Science Tests In Search For Common Sense

holy_calamity writes: Making software take school tests designed for human kids can help the quest for machines with common sense, say researchers at the Allen Institute for Artificial Intelligence. They've made software called Aristo that scores 75 percent on the multiple choice questions that make up most of New York State's 4th grade science exam. The researchers are urging other research groups to pit their best software against school tests, too, to provide a way to benchmark progress and spur competition.

  • by pjbgravely ( 751384 ) <pjbgravely2 AT gmail DOT com> on Wednesday September 09, 2015 @06:21PM (#50490625) Homepage Journal
    Good sense is no longer common.
    • I'm seeing the slow devolution into the blob people of WALL-E

    • by Anonymous Coward

      'Common Sense' never was 'common', or we would not have a particular phrase for it.

      'Common Sense' as an idea arises when 'others' do not realize what 'we' know; therefore we are free to look down on them as 'substandard' because they lack common knowledge.

      In the case of 4th grade science tests, I would be much more inclined to question the validity of the tests if they are so complex that a machine designed to answer tests cannot perform above a C average. It is much more likely that the author of the tests…

      • by OzPeter ( 195038 )

        I would be much more inclined to question the validity of the tests if they are so complex that a machine designed to answer tests cannot perform above a C average.

        The point of TFA is not so much that the AI does badly, but rather that the author of the AI (^W Slashvertisement) wants to use the tests as a benchmark. From TFA:

        Aristo is being developed by researchers at the Allen Institute for Artificial Intelligence in Seattle, who want to give machines a measure of common sense about the world. The institute’s CEO, Oren Etzioni, says the best way to benchmark the development of their digital offspring is to use tests designed for schoolchildren. He’s trying to convince other AI researchers to adopt standardized school tests as a way to measure progress in the field.

        • Tests are a terrible benchmark for AI. It would be easy to get ~75% correct just by looking for keywords and simple pattern matching, without actually understanding the question, or using any AI.

          Winograd Schemas [wikipedia.org] are a better test. You can't get them right with tricks; they test actual understanding.
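
          To make the parent's two claims concrete, here is a hypothetical sketch (the study-guide sentences and the picker are invented for illustration; Aristo's real pipeline is more sophisticated): a few lines of keyword overlap can answer a typical 4th-grade factual question, yet the same trick offers nothing on a Winograd schema.

```python
# Naive keyword-overlap picker (hypothetical sketch, not Aristo):
# choose the answer whose words best overlap a study-guide sentence.
STUDY_GUIDE = [
    "the pistil is the female reproductive part of a flower",
    "water boils at 212 degrees fahrenheit at sea level",
]

def pick_answer(question: str, choices: list[str]) -> str:
    def overlap(choice: str) -> int:
        words = set((question + " " + choice).lower().split())
        return max(len(words & set(s.split())) for s in STUDY_GUIDE)
    return max(choices, key=overlap)

print(pick_answer(
    "Which part of the flower is the female reproductive organ",
    ["stamen", "pistil", "filament", "pistol"],
))  # -> "pistil"

# A Winograd schema gives such tricks no purchase:
#   "The trophy didn't fit in the suitcase because it was too big."
# Both "trophy" and "suitcase" sit right there in the sentence; picking
# the referent of "it" takes world knowledge about sizes and fitting,
# not keyword counts.
```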

          • Re:Hopeless (Score:4, Insightful)

            by narcc ( 412956 ) on Thursday September 10, 2015 @01:34AM (#50492529) Journal

            It would be easy to get ~75% correct just by looking for keywords and simple pattern matching, without actually understanding the question,

            Pattern matching, without any understanding, is state-of-the-art AI.

            • Sufficiently advanced pattern matching is understanding.

              • by narcc ( 412956 )

                That remains to be seen.

                • Nope, it's easy enough to prove. Suppose the AI can match the pattern of neural connections in a person who understands a concept -- then that pattern matching is at the very least understanding and perhaps more. Of course, I expect the pattern matching required to be considered understanding is much simpler than this.

                  • by narcc ( 412956 )

                    It's still just idle speculation. Beliefs without evidence are still beliefs without evidence, no matter how reasonable you believe them to be.

                    • There is a ridiculous amount of evidence that neurons are how we think.

                    • by narcc ( 412956 )

                      Which is completely unrelated to your initial claims, which you'll quickly discover if you follow my advice here: Consider first what you've proposed, then ask what evidence exists that supports the specific model to which you've alluded.

                      Then do some reading. A lot of reading, I suspect. You'll discover that what you believe is nothing more than idle speculation, with no evidence to support the claim you made in your earlier post. A lot of work has been done along those lines; none of it has yet proven fruitful.

      • I'm not certain you understand what the term means. You might want to Google it. Aristotle most certainly had a different interpretation of the phrase.

      • Re:Hopeless (Score:4, Informative)

        by mythosaz ( 572040 ) on Wednesday September 09, 2015 @07:34PM (#50490981)

        Test-taking is a skill, and most test-givers include clues (and even answers) in their tests. Some test-givers, of course, mean to give these clues; many are oblivious to it. Here are some of the bigger lessons I remember from my test-taking classes.

        Multiple choice questions, for example (which is what this software uses), often have choices like:

        Stamen
        Pistil
        Filament
        Pistol

        While some test-givers might include the homophone pistol as a red herring, words like that are a clue that the answer isn't Stamen or Filament, but that you're expected to know how to spell "Pistil."

        Similarly, if you read page 2 of a test, you might find more detailed questions regarding the pistil, questions that might spell out exactly what that part of the flower does, solidifying the answer.

        Numbers in the middle of ranges are more likely correct, as are exact numbers near round ones (e.g. water boils at: a. 10, b. 100, c. 200, d. 212, e. 2000; the exact 212 beside the round 200 is the tell).

        Long answers, when not absurd, are generally correct.

        Middle answers, when not randomized by test software, are more likely to be true.

        A pair of similar answers (see above: Pistil, Pistol) generally narrows you down to 50/50.

        "Absolutes" in true-false questions are almost always false, and true is more common than false.

        Continuity errors like using the wrong article (a/an) often narrow choices.

        Some test-writers who don't randomize also don't repeat answers, or never repeat beyond a limit. Patterns may emerge after simple processes reveal some of the clues.

        ----

        After practice in this test-taking class, we all took multiple choice exams on a variety of complex subjects and passed them.
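
        A few of these heuristics are mechanical enough to sketch in code. The weights and the heuristic subset below are invented for illustration; this is not the class's method, nor Aristo's:

```python
# Toy multiple-choice scorer using a few of the heuristics above.
# Weights are illustrative guesses, not tuned values.
def heuristic_pick(choices: list[str]) -> str:
    scores = {c: 0.0 for c in choices}

    # "Long answers, when not absurd, are generally correct."
    scores[max(choices, key=len)] += 1.0

    # "A pair of similar answers generally narrows you down to 50/50":
    # bump near-identical word pairs like pistil/pistol.
    for a in choices:
        for b in choices:
            if (a != b and not a.isdigit() and len(a) == len(b)
                    and sum(x != y for x, y in zip(a, b)) <= 2):
                scores[a] += 1.5

    # "Exact numbers near general numbers" (212 beside 200) look right.
    nums = [c for c in choices if c.isdigit()]
    for c in nums:
        if int(c) % 10 != 0 and any(int(o) % 10 == 0 for o in nums if o != c):
            scores[c] += 2.0

    return max(choices, key=scores.get)

print(heuristic_pick(["stamen", "pistil", "filament", "pistol"]))
# -> "pistil" (the pistil/pistol pair outscores "filament"; the tie
#    inside the pair is exactly the 50/50 described above)
print(heuristic_pick(["10", "100", "200", "212", "2000"]))  # -> "212"
```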

        • by Macdude ( 23507 )

          I wish I knew about this when I was in school, I wasted so much time learning the subject matter...

    • You posted your tagline from your blog here, which means

      1) You must be correct
      2) There's no point in continuing to do anything

      I guess that wraps it up. pjbgravely has spoken, there's nothing we can do. Let's just all commit mass extinction and get it over with. We're only killing time until the inevitable heat death of the universe, after all.

      Wait, wait a sec. When was it common?

  • There's no such thing as "Common sense". It's just a myth to oppress those with different opinions.

    And if you actually depend on people's common sense for things to work, you're doing it wrong.

    • There's no such thing as "Common sense". It's just a myth to oppress those with different opinions.

      And if you actually depend on people's common sense for things to work, you're doing it wrong.

      And when "common sense and "science" are paired together, it's usually codespeak for creationism.

  • by phantomfive ( 622387 ) on Wednesday September 09, 2015 @06:52PM (#50490787) Journal
    This quote from the article (by a professor in New York) is all that needs to be said:

    “What’s difficult for humans is very different from what’s difficult for machines,” says Davis, who also works on giving software common sense. “Standardized tests for humans don’t get very good coverage of the kinds of problems that are hard for computers.”

  • Actually using these as benchmarks would bring "teach the test" to a whole new level.
  • 75%? Everybody knows that you get 25% when you just guess randomly... So being able to add another 50% isn't all that amazing.

    Understand how they do this, though... They have taken the existing study guides and constructed an algorithm that does basic word association. Multiple choice tests are written to have one right answer, one plausible answer, and two answers which are distracters, designed to trick you. So the trick to multiple choice when you don't know the answer is to identify the distracters.
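
    Back-of-envelope, with illustrative numbers of my own (not from TFA): if word association settles half the questions outright and the rest come down to a coin flip between the right answer and the plausible one, the reported 75 percent falls out directly.

```python
# Illustrative arithmetic for 4-choice questions (assumed numbers, not TFA's).
p_random = 1 / 4     # blind guessing: 25%
f_known = 0.5        # assume word association settles half the questions
p_coinflip = 1 / 2   # rest: distracters eliminated, 50/50 on the final two
expected = f_known + (1 - f_known) * p_coinflip
print(p_random, expected)  # 0.25 0.75
```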

    • Now, if they only would teach KIDS how to take multiple choice tests using similar techniques, THAT would be something worthwhile....

      I don't think it would.

  • ... fail at this once a week while watching Are You Smarter Than A 5th Grader.

  • This is clearly a step in outsourcing childhood. Why have real children take a test when both the test-taking and the test generation can be automated? The kids don't need to go to school, but can work at home, i.e. play video games and eat junk food.

    Standardized test scores will go up, so the education establishment will look good. Plus they can fire all the teachers and replace them with hourly contract workers on H-1B visas who will work even more unpaid overtime than current teachers. English fluency…

  • Test taking skills do not equal common sense.

    Multiple choice tests aren't that hard to pass, even if you don't know the material. Typically, there are four choices. Two are usually so obviously NOT the answer that they can be easily discounted. Then it's just a matter of guessing which of the remaining two is more likely to be correct.

    If I can use this technique to pass a test on a subject I know nothing about, then a machine certainly doesn't have to have common sense to duplicate the feat.

    • by moeinvt ( 851793 )

      I don't see how that technique enables you to pass a test on a subject you know nothing about.
      Even assuming you can correctly eliminate two of the choices from 100% of the questions, you're guessing between the remaining choices. Over a sufficient number of questions, that technique will therefore tend to result in a score of ~50%, which is typically not a passing grade.

        • Even if you are taking a test on a subject in which you aren't an expert, you will generally know at least a few of the answers. Those questions raise your chances of getting a passing grade. No, it's not a foolproof method, but my point was that multiple choice tests are nearly always flawed, and often do more to test a person's test-taking ability than their actual knowledge.
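
          Formalizing the parent's point (my notation, not from the thread): if you know a fraction k of the answers outright and coin-flip the rest after eliminating two choices, the expected score is

```latex
E[\text{score}] = k + \frac{1 - k}{2}
```

          so it climbs past 50% as soon as k > 0; k = 0.4 already gives 0.7, a pass on many grading scales.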

  • ...I would suggest we make electing human beings to public office illegal.
