Avast Launches Open-Source Decompiler For Machine Code (techspot.com) 113
Greg Synek reports via TechSpot: To help with the reverse engineering of malware, Avast has released an open-source version of its machine-code decompiler, RetDec, that has been under development for over seven years. RetDec supports a variety of architectures aside from those used on traditional desktops including ARM, PIC32, PowerPC and MIPS. As Internet of Things devices proliferate throughout our homes and inside private businesses, being able to effectively analyze the code running on all of these new devices becomes a necessity to ensure security. In addition to the open-source version found on GitHub, RetDec is also being provided as a web service.
Simply upload a supported executable or machine code and get a reasonably rebuilt version of the source code. It is not possible to retrieve the exact original code of any executable compiled to machine code but obtaining a working or almost working copy of equivalent code can greatly expedite the reverse engineering of software. For any curious developers out there, a REST API is also provided to allow third-party applications to use the decompilation service. A plugin for IDA disassembler is also available for those experienced with decompiling software.
Wow! So many architectures! (Score:1, Offtopic)
PIC32 and MIPS!
It's like a PIC32 isn't actually a MIPS based MCU.... oh wait, it is.
Re: Wow! So many architectures! (Score:5, Insightful)
Get over yourself and stop complaining about things being given away to you for free. It's a shame that people complain about open source software when it's being given to them for free. The decompiler could have never been released to the public or released as a closed source program. Your complaint about the architectures it supports or doesn't support totally rings hollow.
Re: (Score:2)
Get over yourself and stop complaining about things being given away to you for free. It's a shame that people complain about open source software when it's being given to them for free.
Isn't that true. What I find even more amazing is that those same people mostly never complain about shortcomings of commercial software.
Re:Wow! So many architectures! (Score:5, Informative)
...but no x86_64.
Re: (Score:3)
Or any other 64 bit arch.
Re: (Score:3)
It's accurate. According to retdec.com, RetDec only supports 32-bit architectures.
Re: (Score:2)
...but no x86_64.
yet.
Re:Wow! So many architectures! (Score:4, Insightful)
The thing is open source, if you really want x86-64, grab the code and write something :)
Re: (Score:3)
The thing is open source, if you really want x86-64, grab the code and write something :)
x86 is hard to decompile. It doesn't have fixed-length instructions, so it is difficult to figure out where opcodes begin and end. It is even possible to write code that can execute two different sequences of instructions by offsetting the instruction pointer by a byte. I don't think any decompiler could deobfuscate that.
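To make the "offset by a byte" point concrete, here is a minimal sketch in Python. The decoder is a hypothetical toy that only knows four hand-picked 32-bit x86 encodings (`mov eax, imm32`, `xor eax, eax`, `inc eax`, `ret`); it is nowhere near a real disassembler and has nothing to do with how RetDec works. It only shows that the same six bytes decode to two different instruction streams depending on the starting offset.

```python
import struct

# Six bytes that read differently depending on where decoding starts.
CODE = bytes([0xB8, 0x31, 0xC0, 0x40, 0xC3, 0xC3])

def decode(code, offset):
    """Decode a tiny subset of 32-bit x86 from offset until the first ret."""
    out = []
    pc = offset
    while pc < len(code):
        op = code[pc]
        if op == 0xB8 and pc + 5 <= len(code):
            # mov eax, imm32: opcode plus a 4-byte little-endian immediate
            (imm,) = struct.unpack_from("<I", code, pc + 1)
            out.append((pc, f"mov eax, 0x{imm:08x}"))
            pc += 5
        elif op == 0x31 and pc + 2 <= len(code) and code[pc + 1] == 0xC0:
            # xor eax, eax: opcode 0x31 with ModRM byte 0xC0
            out.append((pc, "xor eax, eax"))
            pc += 2
        elif op == 0x40:
            # inc eax (single-byte form, 32-bit mode only)
            out.append((pc, "inc eax"))
            pc += 1
        elif op == 0xC3:
            out.append((pc, "ret"))
            break
        else:
            # Anything outside the toy subset is emitted as a raw data byte.
            out.append((pc, f"db 0x{op:02x}"))
            pc += 1
    return out

if __name__ == "__main__":
    for start in (0, 1):
        print(f"decoding from offset {start}:")
        for addr, text in decode(CODE, start):
            print(f"  {addr:02x}: {text}")
```

Starting at offset 0 the immediate of the `mov` swallows the bytes that, starting at offset 1, form `xor eax, eax` and `inc eax`.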
Re: (Score:2)
Even if you ignore the few 32-bit instructions in thumb it is still common to interleave data with the code.
The difference is that with x86 you can interleave code with code.
You can't do that with RISC.
Decompilers are not simple debuggers/dumpers (Score:4, Informative)
x86 is hard to decompile. It doesn't have fixed-length instructions, so it is difficult to figure out where opcodes begin and end. It is even possible to write code that can execute two different sequences of instructions by offsetting the instruction pointer by a byte. I don't think any decompiler could deobfuscate that.
The simple code dumper that comes with a garden-variety debugger won't easily deobfuscate that. (You need to manually ask the debugger to start dumping from each of the 2 overlapping points.)
That's why the best decompilers available in the 90s used some sort of virtual machine to follow the execution flow, so they could distinguish this kind of "frame shift" (that's actually a biology term; I've forgotten what the proper CS term is) and also understand a bit of self-modifying code.
(Basically, the decompiler will notice that various parts of the code make calls into the same region but at an odd offset, and will automatically try dumping from each overlapping point.)
This also makes it possible to put actually useful labels/names on variables (calling something "sound_frequency" instead of "var184" because, by following the data flow, the decompiler realizes that this is the parameter that is output to the PC-Speaker tone generator).
Sourcer by V-Com [wordpress.com] was one such good decompiler.
(I managed to learn quite a ton of tricks like PCM playback on the PC Speaker, tweaked graphical modes, etc. simply by using sr to inspect interesting executables.
I even managed to disinfect a cracked game that was sadly being distributed infected with some virus.)
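A rough sketch of the "follow the execution flow" idea, in Python. This is not how Sourcer or any real tool is implemented; it is a toy worklist traversal over a hypothetical length table covering five 32-bit x86 opcodes. It starts at one entry point, queues every `call` target it sees, and then reports instruction starts that fall inside an instruction decoded from another path, which is the overlap described above.

```python
import struct

# Lengths for a tiny, hand-picked subset of 32-bit x86 (an assumption for the demo).
# 0x31 is treated as 2 bytes, which only holds for register-to-register ModRM bytes.
LENGTHS = {0xE8: 5, 0xB8: 5, 0x31: 2, 0x40: 1, 0xC3: 1}

# call +1 ; mov eax, imm32 ; ret  -- the call lands one byte inside the mov.
CODE = bytes([0xE8, 0x01, 0x00, 0x00, 0x00,
              0xB8, 0x31, 0xC0, 0x40, 0xC3,
              0xC3])

def trace(code, entry=0):
    """Follow control flow from entry; return {offset: instruction length}."""
    starts = {}
    work = [entry]
    while work:
        pc = work.pop()
        while pc < len(code) and pc not in starts:
            op = code[pc]
            length = LENGTHS.get(op)
            if length is None or pc + length > len(code):
                break  # unknown opcode or truncated instruction: stop this path
            starts[pc] = length
            if op == 0xE8:
                # call rel32: target is relative to the next instruction
                (rel,) = struct.unpack_from("<i", code, pc + 1)
                target = pc + length + rel
                if 0 <= target < len(code):
                    work.append(target)
            if op == 0xC3:
                break  # ret ends this path
            pc += length
    return starts

def overlapping(starts):
    """Offsets that begin strictly inside another decoded instruction."""
    return sorted({b for b in starts
                   for a, n in starts.items() if a < b < a + n})

if __name__ == "__main__":
    found = trace(CODE)
    print("instruction starts:", sorted(found))
    print("overlapping starts:", overlapping(found))
```

A plain linear dump from offset 0 would never visit the second stream; it only shows up because the call target is followed.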
Sub-architectures have value (Score:1)
If it does PIC32-specific functionality like decoding that chip's MMIOs, that's a nice feature over simply decoding MIPS object files.
Re: (Score:1)
PIC32 binaries are pronounced with more of a guttural accent than MIPS ones.
A debugger does this (Score:3, Interesting)
The killer was when I debugged my TRS-80 BASIC interpreter in ROM. You'd have some 3 byte instruction, "jump here", then somewhere else you'd have a 3 byte instruction "jump into the middle of this 3 byte instruction to do something completely different". My understanding is Bill did those, but for all the evil he did I have major respect for his coding abilities.
I beat a lot of games running my debugger on them. 90% sure it was called TRS-MON, but wouldn't bet my retirement on it.
Re: (Score:2)
I'm guessing the reverse engineered C++ code is gonna cost a hella amount of time to reverse engineer the reverse engineered code the tool generates.
I've reverse engineered C. C++? Not seeing how a tool is gonna be a lot of help. Basing this on going from C to ASM being pretty straightforward. Going from C++ to C is problematic, especially as you are going C++ -> ASM as opposed to C++ -> C.
Does care (Score:2)
a decompiler won't care whether you compiled a C++, assembler, C or whatever language the program being reversed was compiled on.
It will care, because some languages (e.g. C++) have specific data structures and mechanisms (vtables) to handle language-specific features (inheritance of virtual member functions), which could be detected by a dedicated plugin (i.e. instead of spewing a weird mess of nested "struct" and pointer-to-pointers, it can recognize that this is just a call to a virtual method).
(For the few hipsters out there: think of the difference between Vala and the corresponding GObject pure-C code.)
Re: (Score:1)
Re:A debugger does this (Score:4, Insightful)
One problem with a lot of those old debuggers and disassemblers was that they weren't that smart about what they were looking at. You often had to tell them a range of memory to disassemble, and they would blindly treat everything they saw as code, even if it was actually data. This was partly a problem because in those days, code and data weren't so neatly divided from one another, everything could live anywhere in memory. It was actually common for software to "poke" data into memory and then execute it. Ah, the good old days.
Re:A debugger does this (Score:5, Interesting)
Indeed, poking code is often the fastest way to do stuff on those older systems where memory bandwidth and CPU clocks are very limited.
We called it speedcode back in the day. Say you wanted to calculate and plot a load of points on the screen. Normally you would calculate the coordinates, store them and then later pass a reference to some plotting function. To do it faster you could turn calls to the plot function into an unrolled series of instructions, and instead of reading the coordinates every time just poke them directly into the immediate instruction op-codes.
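For anyone who never did this, a minimal sketch of the idea in Python, generating 6502 machine code into a byte buffer. The opcodes are the real 6502 encodings for `LDA #imm`, `STA abs` and `RTS`, but the helper names and the screen addresses are made up for illustration, and the buffer is never executed here.

```python
LDA_IMM = 0xA9   # 6502: LDA #imm8
STA_ABS = 0x8D   # 6502: STA abs16 (address stored little-endian)
RTS = 0x60       # 6502: return from subroutine

def build_speedcode(addresses):
    """Unroll one LDA #0 / STA addr pair per address.

    Returns the code buffer and the offsets of the immediate bytes,
    so they can be patched later without rebuilding the routine.
    """
    code = bytearray()
    slots = []
    for addr in addresses:
        slots.append(len(code) + 1)  # immediate byte sits right after the opcode
        code += bytes([LDA_IMM, 0x00, STA_ABS, addr & 0xFF, addr >> 8])
    code.append(RTS)
    return code, slots

def poke_values(code, slots, values):
    """Poke new values straight into the immediate operands."""
    for slot, value in zip(slots, values):
        code[slot] = value & 0xFF

if __name__ == "__main__":
    # Hypothetical screen addresses, chosen only for the demo.
    code, slots = build_speedcode([0x0400, 0x0401])
    poke_values(code, slots, [1, 2])
    print(code.hex(" "))
```

The per-frame cost is then just the pokes plus the straight-line stores, with no loop counter, no indexing and no table reads.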
Re: (Score:3)
Should crossref with github. (Score:5, Interesting)
Perhaps if you built a fingerprint based on the structure of calls across functions, you could map it back to source code from github. Not that malware is generally posted to github, but I'd be surprised if they didn't use a TON of third_party libraries, and factoring all of those out would make what's left easier to understand and also let you focus better.
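A very rough sketch of what such a fingerprint could look like, in Python. Everything here is an assumption for illustration: the call graph is a plain dict from function id to callee ids, and the "shape" of a function is just its number of callees plus the sorted callee counts of those callees. A real matcher would need something much more robust against inlining and compiler differences.

```python
import hashlib

def fingerprint(call_graph):
    """Hash the call structure while ignoring function names.

    call_graph: dict mapping a function id to a list of callee ids.
    """
    def shape(fn):
        callees = call_graph.get(fn, [])
        # (own out-degree, sorted out-degrees of the callees)
        return (len(callees),
                tuple(sorted(len(call_graph.get(c, [])) for c in callees)))

    shapes = sorted(shape(fn) for fn in call_graph)
    return hashlib.sha256(repr(shapes).encode()).hexdigest()

if __name__ == "__main__":
    # Hypothetical library source with real names...
    source = {"main": ["parse", "emit"], "parse": ["read"], "emit": [], "read": []}
    # ...and the same structure as it might appear in a stripped binary.
    stripped = {"sub_401000": ["sub_401200", "sub_401100"],
                "sub_401100": ["sub_401300"],
                "sub_401200": [], "sub_401300": []}
    print(fingerprint(source) == fingerprint(stripped))
```

Since names never enter the hash, the stripped binary and the named source produce the same fingerprint as long as the call structure survives compilation.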
Old technique, actually (Score:2)
the structure of calls across functions
Recognizing code flow was a staple of the best decompilers back in the 90s:
e.g. being able to recognize a certain code pattern (a sequence of ports smashing) as a high-level abstraction (initializing sound hardware).
Your idea would certainly be the 2010s-era equivalent. (= This portion looks like code reuse from "Zstd" decompressor)
Re: (Score:2)
gcc --reverse prog -o prog.c
Re: (Score:2)
Re: (Score:2)
eFast (Score:2)
I am assuming your version of APK Hosts File Engine 10++ 32/64-bit is MALWARE.
I'm guessing others have tested it in a sandbox for malicious behavior. Do you assume Intel and AMD CPUs contain malware? And if you do, do you use them despite said assumption?
So why not just open source it
If this post is to be believed [slashdot.org], APK doesn't want people adding malware, building it, and distributing it, like eFast did with Chromium [slashdot.org].
The other option is for some Slashdot user to make a free replacement. Does the functionality described in this specification [pineight.com] appear useful?
Re: (Score:2)
If this post is to be believed [slashdot.org], APK doesn't want people adding malware, building it, and distributing it
Since you seem to have a little reading comprehension issue, let me copypaste the question again:
What would stop someone from creating a malicious software and naming it APK Hosts File Engine 10++ 32/64-bit?
Code signing with trust on first use (Score:2)
What would stop someone from creating a malicious software and naming it APK Hosts File Engine 10++ 32/64-bit?
The fact that its hash wouldn't match that of the existing APK Hosts File Engine 10++ 32/64-bit posted all over forums.
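The check itself is only a few lines. A minimal sketch in Python, where the file contents and the "known" digest are stand-ins made up for the demo rather than anything published for the real program:

```python
import hashlib

def sha256_hex(data):
    """Return the SHA-256 digest of a bytes object as lowercase hex."""
    return hashlib.sha256(data).hexdigest()

def matches_known_hash(data, known_hex):
    """True only if the bytes hash to the digest published for the real file."""
    return sha256_hex(data) == known_hex.strip().lower()

if __name__ == "__main__":
    # Stand-in bytes; a real check would read the downloaded file instead.
    authentic = b"authentic build"
    impostor = b"malicious build"          # same name, same length, different bytes
    published = sha256_hex(authentic)       # the digest "posted all over forums"

    print(matches_known_hash(authentic, published))
    print(matches_known_hash(impostor, published))
```

Note that the impostor has the same length as the original, so a size check alone would not tell them apart, while the hash comparison does.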
Now if you replace "10" with "11" in your question, you have a more interesting problem: how to distinguish subsequent versions of the same publisher's application from an impostor's malware. The publisher of the authentic application could generate a self-signed code signing certificate and sign each version of all of its programs. Then each user would configure his devices to "Trust other
Re: (Score:2)
Re: (Score:2)
ClamAV Possibly Unwanted Application. While not necessarily malicious, the scanned file presents certain characteristics which depending on the user policies and environment may or may not represent a threat. For full details see: https://www.clamav.net/documen... [clamav.net] . Symantec reputation Suspicious.Insight
Sounds like malware to me.
Re: (Score:2)
This is probably a waste of time, but ... When you're typing a message and say "see the p.s. below", it means you _know_ at that point that you will be having a p.s. But in that case you could just place the text where you are, and not _need_ a p.s.
Re: (Score:1)
Re: (Score:2)
Not every program packed with UPX [wikipedia.org] is a virus.
Re: (Score:2)
Good programs use exe packers too as I said
Name one that's from this decade.
Re: (Score:2)
I don't have to prove a negative. That's like saying "prove that god doesn't exist". The onus is on those who make claims to back them up, not on others to disprove them.
All it would take was one example to prove your claim. How hard would that be, if what you claimed were true?
Hint: Instead of posting URLs to posts that nobody will bother to follow, try to actually back up your wild claims with some actual meaningful text. Without bolding random words, without changing the subject and referring to it,
Re: (Score:2)
"I personally use a HOSTS file blocker produced from a genius called APK." by 110010001000 on Friday October 27, 2017
The irony in this is brilliant -- you're actually too stupid to realize that 110010001000 is the guy you're "arguing" with.
You probably even think that he genuinely thinks you're a genius rather than openly mocking you. Oh boy.
Now, why don't you stop with your obnoxious ads? Wasn't one of your marketing points that your shitware removes ads? Does it remove your spammed ads on /.?
Re: (Score:2)
No I'm not (it helps hide how I detect it) by obfuscation hiding functions/methods where I summon an .exe size for even 1 BYTE in sizecheck (no virus is that small)
You don't know much about writing viruses; that much is clear, because checking the size is a waste.
One popular approach for viruses is to put the original file elsewhere (where elsewhere can either be elsewhere on a file system, or for file systems that support it, in a resource fork or attribute list of the same file), and then pad the virus to the desired file length.
For weak CRCs, even change the padding to return the same CRC.
But worse, you also then prevent the binary from running on systems where the
Re: (Score:2)
It's known protection vs. reverse engineering: PROOF: "Packing an executable file is a way of compressing executable code firstly to minimize filesizes, but often it is also used to complicate the reverse engineering process"
Also known as "security through obscurity", as I said in my post.
Re: (Score:2)
that proof's SO RIGHT you had to try "downmod hide it"
Um, no. I only have one account, and don't post as an anonymous coward, so I don't get to downmod anyone in this thread. It's others that downmodded your post, likely because of your incoherent ramblings being, well, "so wrong".
YOU DEFINITELY CAN'T, troll!
Think about it: Is everybody who disagrees with you a troll, or could it be that you are a smidgen paranoid?
combine with neural network (Score:2)
One of the big issues with decompilers is that compilers do not generate the same output for the same input. In addition, multiple versions of a compiler and different flags yield different results as well. After some thought, I've come to the conclusion that the only viable solution is to build a neural network that can detect and compensate for all the idiosyncrasies using many different test cases (and their binaries) as training data. Ultimately be able to return not only the most likely version of the
Re: (Score:2)
How the FUCK are you going to recover the variable names, the preprocessor directives, and the comments?
You don't need them. Really.
And what if the original program had inline assembler? What are you going to do with that?
It will generate code that does the same - it does not have to look the same, as long as what it does is the same.
double-edged? (Score:2)
Probably also helpful when searching for vulnerabilities?
other CPUs/archs are missing (Score:2)
AVR, MSP and L106 (Tensilica/ESP8266) missing...
Especially for MSP, there seem to be a lot of products using it (Honeywell thermostats, Ikea lighting)...
encryption (Score:5, Funny)
Interesting to watch this develop (Score:2)
I ran some of my own ARM code through this. While I did build with -Os, I did not strip the .elf. The source it produced was a reasonable approximation of what I wrote, but it was far from legible. Little things like using hexadecimal for memory addresses are a minor nitpick, but I found it had trouble even with basic interrupt handlers. I would have expected something aimed at embedded systems to do a better job of this, but still... very interesting (and very fast)!
decompiles INTO WHAT ? (Score:2)
Re: (Score:3, Informative)
"no mention in the article of what the decompiler actually decompiles to .."
According to https://github.com/avast-tl/retdec:
Output in two high-level languages: C and a Python-like language.
UltraEdit is my file "detector" (Score:2)
UltraEdit (text editor) will show all text in a file; one can fairly call a file's function with just that.
Long ago there was a program called "Peek" that showed all text in a file with none of the hex/high ASCII that UltraEdit also shows; W2K broke it and I've missed it ever since.
I'll be giving this program a try.
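Something in the spirit of "Peek" is easy to sketch in Python. This is not a reconstruction of that program, just a guess at the behaviour described: pull out runs of printable ASCII and drop everything else. The function name and the minimum run length of 4 are my own choices.

```python
import re

def printable_strings(data, min_len=4):
    """Return runs of at least min_len printable ASCII bytes found in data."""
    pattern = re.compile(rb"[\x20-\x7e]{%d,}" % min_len)
    return [m.decode("ascii") for m in pattern.findall(data)]

if __name__ == "__main__":
    # Made-up sample bytes; for a real file use open(path, "rb").read().
    sample = (b"\x00\x01MZ\x90\x00This program cannot be run in DOS mode.\r\n"
              b"\x00\xff\xfeabc\x00kernel32.dll\x00")
    for text in printable_strings(sample):
        print(text)
```

Short runs such as "MZ" and "abc" are filtered out by the length threshold, which keeps most of the noise from random binary data out of the listing.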
ASM? These kids don't need no stinking ASM! (Score:2)
Other machine code decompilers (Score:2)