Open Source Privacy Programming Security Software The Internet

Avast Launches Open-Source Decompiler For Machine Code (techspot.com) 113

Greg Synek reports via TechSpot: To help with the reverse engineering of malware, Avast has released an open-source version of its machine-code decompiler, RetDec, that has been under development for over seven years. RetDec supports a variety of architectures aside from those used on traditional desktops including ARM, PIC32, PowerPC and MIPS. As Internet of Things devices proliferate throughout our homes and inside private businesses, being able to effectively analyze the code running on all of these new devices becomes a necessity to ensure security. In addition to the open-source version found on GitHub, RetDec is also being provided as a web service.

Simply upload a supported executable or machine code and get a reasonably rebuilt version of the source code. It is not possible to retrieve the exact original code of any executable compiled to machine code but obtaining a working or almost working copy of equivalent code can greatly expedite the reverse engineering of software. For any curious developers out there, a REST API is also provided to allow third-party applications to use the decompilation service. A plugin for IDA disassembler is also available for those experienced with decompiling software.

  • PIC32 and MIPS!

    It's like a PIC32 isn't actually a MIPS based MCU.... oh wait, it is.

    • by Anonymous Coward on Wednesday December 13, 2017 @08:39PM (#55735421)

      Get over yourself and stop complaining about things being given away to you for free. It's a shame that people complain about open source software when it's being given to them for free. The decompiler could have never been released to the public or released as a closed source program. Your complaint about the architectures it supports or doesn't support totally rings hollow.

      • by sad_ ( 7868 )

        Get over yourself and stop complaining about things being given away to you for free. It's a shame that people complain about open source software when it's being given to them for free.

        isn't that true. what i find even more amazing is that those same people mostly never complain about shortcomings of commercial software.

    • by J053 ( 673094 ) <J053@[ ]ngri-la.cx ['sha' in gap]> on Wednesday December 13, 2017 @08:39PM (#55735423) Homepage Journal

      ...but no x86_64.

      • by bws111 ( 1216812 )

        Or any other 64 bit arch.

      • ...but no x86_64.

        yet.

      • by jonwil ( 467024 ) on Thursday December 14, 2017 @02:31AM (#55736513)

        The thing is open source, if you really want x86-64, grab the code and write something :)

        • The thing is open source, if you really want x86-64, grab the code and write something :)

          x86 is hard to decompile. It doesn't have fixed-length instructions, so it is difficult to figure out where opcodes begin and end. It is even possible to write code that executes two different sequences of instructions by offsetting the instruction pointer by a byte. I don't think any decompiler could deobfuscate that.
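To make the "offset by a byte" point concrete, here is a minimal sketch in Python. It is not how RetDec or any real disassembler works: `decode_one` and `disassemble` are hypothetical helpers that recognise only four real 32-bit x86 encodings (`B8 imm32`, `31 C0`, `40`, `C3`) and print everything else as a raw byte. The six-byte buffer is made up for illustration.

```python
# Toy decoder for a handful of real 32-bit x86 opcodes. Anything it does not
# recognise is printed as a raw byte. Illustrative only.

def decode_one(code: bytes, off: int) -> tuple[str, int]:
    """Return (text, length) for the instruction starting at offset off."""
    op = code[off]
    if op == 0xB8 and off + 5 <= len(code):              # mov eax, imm32
        imm = int.from_bytes(code[off + 1:off + 5], "little")
        return f"mov eax, {imm:#010x}", 5
    if op == 0x31 and code[off + 1:off + 2] == b"\xc0":  # xor eax, eax
        return "xor eax, eax", 2
    if op == 0x40:                                       # inc eax (32-bit mode)
        return "inc eax", 1
    if op == 0xC3:                                       # ret
        return "ret", 1
    return f"db {op:#04x}", 1


def disassemble(code: bytes, start: int) -> list[str]:
    """Decode straight through the buffer, beginning at offset start."""
    out = []
    off = start
    while off < len(code):
        text, size = decode_one(code, off)
        out.append(f"{off:02x}: {text}")
        off += size
    return out


# Made-up buffer: the immediate of the first mov hides a second code path.
CODE = bytes.fromhex("b831c040c3c3")
from_0 = disassemble(CODE, 0)  # what a dumper sees starting at byte 0
from_1 = disassemble(CODE, 1)  # what the CPU runs if it jumps to byte 1

for line in from_0 + ["--"] + from_1:
    print(line)
```

The same bytes read as `mov eax, imm32; ret` from offset 0 and as `xor eax, eax; inc eax; ret; ret` from offset 1, which is why a tool has to know every entry point before it can claim to have listed the code.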

          • by DrYak ( 748999 ) on Thursday December 14, 2017 @08:27AM (#55737477) Homepage

            x86 is hard to decompile. It doesn't have fixed-length instructions, so it is difficult to figure out where opcodes begin and end. It is even possible to write code that executes two different sequences of instructions by offsetting the instruction pointer by a byte. I don't think any decompiler could deobfuscate that.

            The simple code dumper that comes with a garden-variety debugger won't easily deobfuscate that. (You need to manually ask the debugger to start dumping from each of the two overlapping points.)

            That's why the best decompilers available in the 90s used some sort of virtual machine to follow the execution flow, so they could spot this kind of "frame shift" (that's actually a biology term; I've forgotten what the proper CS term is) and also understand a bit of self-modifying code.
            (Basically, the decompiler will notice that various parts of the code make calls into the same region but at an odd offset, and will automatically try dumping from each overlapping point.)

            It also makes it possible to give variables actually useful labels/names (calling something "sound_frequency" instead of "var184" because, by following the data flow, the decompiler realizes that this is the parameter that is output to the PC-Speaker tone generator).

            Sourcer by V-Com [wordpress.com] was one such good decompiler.
            (I managed to learn quite a ton of tricks, like PCM playback on the PC Speaker, tweaked graphics modes, etc., simply by using sr to inspect interesting executables.
            I even managed to disinfect a cracked game that was sadly being distributed infected with some virus.)

    • If it does PIC32-specific things like decoding that chip's MMIOs, that's a nice feature over simply decoding MIPS object files.

    • PIC32 binaries are pronounced with more of a guttural accent than MIPS ones.

  • A debugger does this (Score:3, Interesting)

    by Snotnose ( 212196 ) on Wednesday December 13, 2017 @08:50PM (#55735491)
    Back in the late 70's I loaded TRS-80 games into my debugger, it also let me dump the results into a text file. Finding things like "jump to label_foo" helped, but was not the be-all end-all.

    The killer was when I debugged my TRS-80 BASIC interpreter in ROM. You'd have some 3 byte instruction, "jump here", then somewhere else you'd have a 3 byte instruction "jump into the middle of this 3 byte instruction to do something completely different". My understanding is Bill did those, but for all the evil he did I have major respect for his coding abilities.

    I beat a lot of games running my debugger on them. 90% sure it was called TRS-MON, but wouldn't bet my retirement on it.
    • Wow, I just came to the comment section to talk about using a disassembler on the trs-80 to beat games. Is that you capn K?
    • by Tony Isaac ( 1301187 ) on Thursday December 14, 2017 @12:16AM (#55736235) Homepage

      One problem with a lot of those old debuggers and disassemblers was that they weren't that smart about what they were looking at. You often had to tell them a range of memory to disassemble, and they would blindly treat everything they saw as code, even if it was actually data. This was partly a problem because in those days, code and data weren't so neatly divided from one another, everything could live anywhere in memory. It was actually common for software to "poke" data into memory and then execute it. Ah, the good old days.
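The "blindly treat everything as code" problem can be sketched in a few lines of Python. This is an assumption-laden toy, not any particular debugger: `decode_one`, `linear_sweep` and `recursive_traversal` are hypothetical names, the decoder knows only four real 32-bit x86 encodings (`EB rel8`, `B8 imm32`, `31 C0`, `C3`), and the buffer is invented so that two data bytes sit between a jump and its target.

```python
# Linear sweep versus following control flow, on a made-up buffer where two
# data bytes sit between a jump and the code it jumps to. Illustrative only.

def decode_one(code: bytes, off: int):
    """Return (text, length, kind, target); kind is 'fall', 'jump' or 'stop'."""
    op = code[off]
    if op == 0xEB and off + 2 <= len(code):              # jmp rel8
        rel = int.from_bytes(code[off + 1:off + 2], "little", signed=True)
        target = off + 2 + rel
        return f"jmp {target:#04x}", 2, "jump", target
    if op == 0xB8 and off + 5 <= len(code):              # mov eax, imm32
        imm = int.from_bytes(code[off + 1:off + 5], "little")
        return f"mov eax, {imm:#010x}", 5, "fall", None
    if op == 0x31 and code[off + 1:off + 2] == b"\xc0":  # xor eax, eax
        return "xor eax, eax", 2, "fall", None
    if op == 0xC3:                                       # ret
        return "ret", 1, "stop", None
    return f"db {op:#04x}", 1, "fall", None


def linear_sweep(code: bytes):
    """Decode every byte range in order, whether or not it is ever executed."""
    out, off = [], 0
    while off < len(code):
        text, size, _, _ = decode_one(code, off)
        out.append((off, text))
        off += size
    return out


def recursive_traversal(code: bytes, entry: int = 0):
    """Decode only what is reachable from entry by following jumps."""
    seen = {}
    work = [entry]
    while work:
        off = work.pop()
        while 0 <= off < len(code) and off not in seen:
            text, size, kind, target = decode_one(code, off)
            seen[off] = text
            if kind == "jump":      # unconditional: continue at the target only
                work.append(target)
                break
            if kind == "stop":
                break
            off += size
    return sorted(seen.items())


# jmp over two data bytes (b8 c3), then the real code: xor eax, eax; ret
CODE = bytes.fromhex("eb02b8c331c0c3")
swept = linear_sweep(CODE)
followed = recursive_traversal(CODE)
print(swept)
print(followed)
```

The sweep decodes the data byte `b8` as the start of a `mov` and swallows the real instructions into its immediate, while the traversal skips the data and recovers `xor eax, eax; ret`. Code that is poked into memory at run time defeats both, since it is not in the file at all.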

      • by AmiMoJo ( 196126 ) on Thursday December 14, 2017 @03:55AM (#55736663) Homepage Journal

        Indeed, poking code is often the fastest way to do stuff on those older systems where memory bandwidth and CPU clocks are very limited.

        We called it speedcode back in the day. Say you wanted to calculate and plot a load of points on the screen. Normally you would calculate the coordinates, store them and then later pass a reference to some plotting function. To do it faster you could turn calls to the plot function into an unrolled series of instructions, and instead of reading the coordinates every time just poke them directly into the immediate instruction op-codes.

        • We called it self-modifying code. It was really useful for handling interrupts on low end chips like the 6502. In the same sort of way you described, you could STA/STX/STY the register values in the bytes after the LDA/LDY/LDX opcodes at the end of the interrupt handler to save intermediate storage.
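The 6502 trick described above can be simulated in Python. This is a rough sketch under stated assumptions: `run` is a hypothetical five-opcode interpreter (the opcode bytes A9, A2, 8D, 8E and 40 are the real 6502 encodings), the handler bytes and the $0600 load address are made up, and the stack and flags that a real RTI restores are not modelled.

```python
# An interrupt handler that saves A and X by storing them into the operand
# bytes of its own closing LDA #imm / LDX #imm instructions. Illustrative only.

ORG = 0x0600
HANDLER = bytes.fromhex(
    "8d0b06"  # 0600: STA $060B   patch operand of the LDA at $060A
    "8e0d06"  # 0603: STX $060D   patch operand of the LDX at $060C
    "a9ff"    # 0606: LDA #$FF    handler body clobbers A
    "a2ff"    # 0608: LDX #$FF    ... and X
    "a900"    # 060A: LDA #$00    operand overwritten at run time
    "a200"    # 060C: LDX #$00    operand overwritten at run time
    "40"      # 060E: RTI
)


def run(mem: bytearray, pc: int, a: int, x: int) -> tuple[int, int]:
    """Execute until RTI and return the final (A, X)."""
    while True:
        op = mem[pc]
        if op == 0xA9:                      # LDA #imm
            a = mem[pc + 1]
            pc += 2
        elif op == 0xA2:                    # LDX #imm
            x = mem[pc + 1]
            pc += 2
        elif op == 0x8D:                    # STA abs (little-endian address)
            mem[mem[pc + 1] | (mem[pc + 2] << 8)] = a
            pc += 3
        elif op == 0x8E:                    # STX abs
            mem[mem[pc + 1] | (mem[pc + 2] << 8)] = x
            pc += 3
        elif op == 0x40:                    # RTI: end of handler in this toy
            return a, x
        else:
            raise ValueError(f"unsupported opcode {op:#04x} at {pc:#06x}")


mem = bytearray(0x10000)
mem[ORG:ORG + len(HANDLER)] = HANDLER
a_out, x_out = run(mem, ORG, a=0x12, x=0x34)
print(hex(a_out), hex(x_out), mem[ORG:ORG + len(HANDLER)].hex())
```

After the run the handler's own bytes have changed, so a static dump of the original image shows `LDA #$00` where the CPU actually executed `LDA #$12`. That is the part a purely static disassembler cannot see.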
  • by shess ( 31691 ) on Wednesday December 13, 2017 @09:02PM (#55735537) Homepage

    Perhaps if you built a fingerprint based on the structure of calls across functions, you could map it back to source code from github. Not that malware is generally posted to github, but I'd be surprised if they didn't use a TON of third_party libraries, and factoring all of those out would make what's left easier to understand and also let you focus better.

    • the structure of calls across functions

      Recognizing code flow was a staple of the best decompilers back in the 90s:
      e.g. being able to recognize a certain code pattern (a sequence of ports smashing) as a high-level abstraction (initializing sound hardware).

      Your idea would certainly be the 2010s-era equivalent. (= This portion looks like code reuse from "Zstd" decompressor)
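One way to read the "fingerprint based on the structure of calls" idea is as a name-independent hash of each function's position in the call graph. The sketch below is only an assumption about how that could look, not something RetDec is known to do: `fingerprints` and `match` are hypothetical helpers, the two call graphs are invented, and the hashing is a simple Weisfeiler-Lehman-style relabelling.

```python
import hashlib

# Label each function by the shape of what it calls, ignoring names, then look
# the labels up in a table built from known library source. Illustrative only.

def fingerprints(calls: dict[str, list[str]], rounds: int = 2) -> dict[str, str]:
    """Map function name -> structural label. Every callee must be a key."""
    label = {f: str(len(cs)) for f, cs in calls.items()}  # start: callee count
    for _ in range(rounds):
        # New label = hash of own label plus the sorted labels of the callees.
        label = {
            f: hashlib.sha256(
                (label[f] + "|" + ",".join(sorted(label[c] for c in cs))).encode()
            ).hexdigest()[:12]
            for f, cs in calls.items()
        }
    return label


def match(binary: dict[str, list[str]], library: dict[str, list[str]]) -> dict[str, str]:
    """Map binary functions to library names where the label is unambiguous."""
    by_fp: dict[str, list[str]] = {}
    for name, fp in fingerprints(library).items():
        by_fp.setdefault(fp, []).append(name)
    return {
        f: by_fp[fp][0]
        for f, fp in fingerprints(binary).items()
        if len(by_fp.get(fp, [])) == 1   # skip labels shared by several functions
    }


# Made-up call graph of a small decompression library built from source.
LIBRARY = {
    "decompress": ["read_header", "decode_block"],
    "read_header": [],
    "decode_block": ["read_bits", "copy_match"],
    "read_bits": [],
    "copy_match": [],
}
# Made-up call graph recovered from a stripped binary that links that library.
BINARY = {
    "sub_402000": ["sub_401000", "sub_402100"],
    "sub_402100": [],
    "sub_401000": ["sub_401100", "sub_401200"],
    "sub_401100": [],
    "sub_401200": ["sub_401300", "sub_401400"],
    "sub_401300": [],
    "sub_401400": [],
}
matched = match(BINARY, LIBRARY)
print(matched)
```

Leaf functions all look alike at this level, so they are deliberately left unmatched. A real tool would need more features (instruction patterns, constants, strings) to tell them apart, and inlining or different compiler flags would change the graph.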

  • One of the big issues with decompilers is that compilers do not generate the same output for the same input. In addition, multiple versions of a compiler and different flags yield different results as well. After some thought, I've come to the conclusion that the only viable solution is to build a neural network that can detect and compensate for all the idiosyncrasies, using many different test cases (and their binaries) as training data. Ultimately be able to return not only the most likely version of the

  • Probably also helpful when searching for vulnerabilities?

  • AVR, MSP and L106 (Tensilica/ESP8266) missing...

    Especially for MSP, there seem to be a lot of products using it (Honeywell thermostats, Ikea lighting)...

  • encryption (Score:5, Funny)

    by bugs2squash ( 1132591 ) on Wednesday December 13, 2017 @10:53PM (#55735969)
    unfortunately it de-compiles the machine code to perl.
  • I ran some of my own ARM code through this. While I did build with -Os, I did not strip the .elf. The source it produced was a reasonable approximation of what I wrote, but it was far from legible. Little things like using hexadecimal for memory addresses are a minor nitpick, but I found it had trouble even with basic interrupt handlers. I would have expected something aimed at embedded systems to do a better job of this, but still... very interesting (and very fast)!

  • no mention in the article of what the decompiler actually decompiles to ..
    • Re: (Score:3, Informative)

      by Anonymous Coward

      "no mention in the article of what the decompiler actually decompiles to .."

      According to https://github.com/avast-tl/retdec:
      Output in two high-level languages: C and a Python-like language.

  • UltraEdit (a text editor) will show all the text in a file; one can fairly guess a file's function with just that.

    Long ago there was a program called "Peek" that showed all the text in a file with none of the hex/high ASCII that UltraEdit also shows; W2K broke it and I've missed it ever since.

    I'll be giving this program a try.
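Pulling just the readable text out of a binary, as described above, takes only a few lines of Python. This is a minimal sketch and not a reimplementation of Peek or UltraEdit: `printable_strings` is a hypothetical helper, the four-character minimum is an arbitrary choice, and the sample bytes are made up.

```python
import re

def printable_strings(data: bytes, min_len: int = 4) -> list[str]:
    """Return every run of at least min_len printable ASCII characters."""
    pattern = re.compile(rb"[\x20-\x7e]{%d,}" % min_len)
    return [m.group().decode("ascii") for m in pattern.finditer(data)]


# Made-up bytes standing in for the start of an executable.
SAMPLE = b"\x7fELF\x02\x01\x00\x00/lib/ld.so\x00\x90\x90usage: %s <file>\x00ab\x00"
found = printable_strings(SAMPLE)
print(found)
```

For a real file, pass the result of `open(path, "rb").read()` instead of `SAMPLE`. Short runs such as "ELF" and "ab" are dropped by the length threshold, which is what keeps random byte noise out of the listing.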

  • Of course, this only helps the 5 of us left who still code in ASM. "Kids these days" seem to think that ASM "sucks" because "it's old". If the language doesn't have trait based generics, zero cost abstractions, and a partridge in a pear tree then again it's "old" and it "sucks". It's entertaining to watch your average 20-something java/python/PHP coder try to take on ASM. Their efforts generally don't last more than about five minutes when they find out they have to build their own control structures, and m
  • How does this project compare to the existing machine code compilers, namely Valgrind's VEX library and Qemu's tiny code generator (https://wiki.qemu.org/Documentation/TCG [qemu.org])?
