Tuesday, June 1. 2010
Virtual Meta-Scripting Bytecode for PHP and JavaScript
Abstract: Both PHP and JavaScript are frequent targets when exploiting web applications. This article elaborates on the idea of building a virtual machine on top of each of the two languages, so that a single kind of bytecode can be executed by both VMs. Particular emphasis is put on designing the virtual machines to be well suited for code obfuscation in a post-exploitation scenario.
Tuesday, March 23. 2010
Asterisk and the Blink
It's just one of those spring days where you may wonder how to control your friendly BlinkenLights neighbourhood building with Asterisk. Ordinarily the installation comes with a control program called BlinkenLights Chaos Control Center (BLCCC), which can be seen as a jukebox controller handling movies and games. The BLCCC expects incoming ISDN phone calls to be relayed over a UDP-based protocol. A suitable Asterisk AGI application can now take over the role of a mediator between ISDN and the BLCCC, thus transparently substituting a real ISDN line.
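Hooking such a mediator into Asterisk boils down to a single AGI call in the dialplan. A minimal sketch (extension number and script name are made up here):

[blinkenlights]
exten => 42,1,Answer()
exten => 42,n,AGI(blccc-mediator.agi)
exten => 42,n,Hangup()

The AGI script then speaks Asterisk's AGI protocol on one side and the BLCCC's UDP protocol on the other.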
The PoC version of this mediator program is the newest extension to the PoC Telephony Application Suite.
Sunday, October 11. 2009
AVR SPI and USBasp
Wednesday, June 10. 2009
The Input Field
A spontaneous question: you see an input field. What do you type?
I asked this question and got several answers:
Hallo Du bist doof. 1 Who am I? srcjbhenoth (random)
foo bar bla boo test fnord 42 Mr. Foo Bar
"><script> ';drop database;-- 0000 admin abc aaa %00 ../.. `ls` 12345 테스트
Saturday, May 9. 2009
VirtualDocumentRoot and ImaginaryProtection
Apache's mod_vhost_alias provides a neat little feature for easing up virtual host configurations:
This module creates dynamically configured virtual hosts, by allowing the IP address and/or the Host: header of the HTTP request to be used as part of the pathname to determine what files to serve.
Now suppose the following not so uncommon scenario: Two IP addresses share the same host - let's say 10.0.0.10 and 10.0.0.11. One IP is meant for production services, the other IP may be totally unrelated. However, both IPs are configured for named virtual hosts with overlapping wildcards and share the same VirtualDocumentRoot:
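Such a setup might look roughly like this (a sketch - hostnames, paths and the exact access-control directives are made up):

<VirtualHost 10.0.0.10:80>
    ServerName foo.bar
    ServerAlias *.foo.bar
    VirtualDocumentRoot /var/www/%0
</VirtualHost>

<VirtualHost 10.0.0.11:80>
    ServerName beta.foo.bar
    ServerAlias *.foo.bar
    VirtualDocumentRoot /var/www/%0
    # only this vhost blocks the sensitive directory
    <LocationMatch "/protected/">
        Order allow,deny
        Deny from all
    </LocationMatch>
</VirtualHost>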
Our second virtualhost beta.foo.bar blocks access to /protected/ - the first virtualhost has no such protection. Let's check out beta's protection manually:
[telnet session to 10.0.0.11: a request for /protected/ with "Host: beta.foo.bar" is answered with Apache's 403 Forbidden error page ("-//IETF//DTD HTML 2.0//EN")]
Now, just for fun, the very same request goes to the other IP:
# telnet 10.0.0.10 80
[the very same request is answered with 200 OK and the protected file's contents, ETag "e8127-1c-4697bd6c57000"]
Please, keep that in mind when rolling out VirtualDocumentRoots on more than one IP with overlapping wildcard hostnames.
Wednesday, May 6. 2009
SSL and other stories
A short outline of the history: until the end of the 90s, strong cryptography was regarded by the USA (among others) as a military component (see WP/Export of crypto and USML). At the same time Netscape invented the encryption standard SSL, which served as the basis for TLS. Put together, the first public version of SSL (SSLv2) was deliberately designed to be weak so as not to fall under the US export restrictions. Current browsers use SSLv3 and TLSv1. The first Netscape browser with SSL was built with 128-bit symmetric encryption as a US-only variant and with 40 bits for export to the rest of the world. "Encryption" weakened like that can be broken within minutes today.
During one phase of the TLS protocol, server and client negotiate which cipher to use. The OpenSSL tool readily prints a list of the ciphers available to OpenSSL-based applications:
$ openssl ciphers -v ALL:eNULL
DHE-RSA-AES256-SHA      SSLv3 Kx=DH   Au=RSA Enc=AES(256) Mac=SHA1
DHE-DSS-AES256-SHA      SSLv3 Kx=DH   Au=DSS Enc=AES(256) Mac=SHA1
AES256-SHA              SSLv3 Kx=RSA  Au=RSA Enc=AES(256) Mac=SHA1
...
What you do not want to use is SSLv2, insecure 40-bit export ciphers, low-grade 56-bit or no encryption at all (eNULL), or no authentication (aNULL). This is what remains:
$ openssl ciphers -v 'ALL:!SSLv2:!LOW:!EXPORT:!eNULL:!aNULL'
DHE-RSA-AES256-SHA      SSLv3 Kx=DH   Au=RSA Enc=AES(256) Mac=SHA1
DHE-DSS-AES256-SHA      SSLv3 Kx=DH   Au=DSS Enc=AES(256) Mac=SHA1
AES256-SHA              SSLv3 Kx=RSA  Au=RSA Enc=AES(256) Mac=SHA1
...
Configuring an Apache web server accordingly takes just one line:
SSLCipherSuite ALL:!SSLv2:!LOW:!EXPORT:!eNULL:!aNULL
or, even better,
SSLCipherSuite HIGH:!SSLv2:!EXP:!aNULL:!eNULL
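As a complementary measure (a side note on top of the cipher list above), mod_ssl can also refuse the SSLv2 protocol handshake itself:

SSLProtocol all -SSLv2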
Side note: with the CipherFox extension, Firefox displays the cipher currently in use.
Today, roughly ten years after those absurd US export restrictions were dropped, web servers still answer requests with SSLv2 or with LOW/EXPORT ciphers. The OpenSSL tool is happy to test for that:
$ openssl s_client -no_tls1 -no_ssl3 -cipher EXP -connect www.my-bank.foo:443
...
New, SSLv2, Cipher is EXP-RC2-CBC-MD5
...
People, people, people - and my internet banking provider should feel addressed here - please switch off the old and insecure junk.
Monday, November 24. 2008
Playing hide and seek in a flash
Imagine a warm and bright Saturday afternoon in the summer you were just eight years old. Can you remember playing hide and seek with other children from the neighbourhood or from school? Everybody likes to be the one hiding somewhere. You choose a seemingly hidden spot and wait. After a while it becomes boring; if the spot is just too well concealed, you declare a time-out and win the round. However, once a hideout is known to any of the seekers, you will be found eventually. From the seeker's perspective, the most likely hiding spots are searched first, depending on where the seeker would hide if he were on the opposing team. Some even obvious spots will probably be missed during the first round, like right above the seekers' heads up in the trees. Seekers learn and will check there first in the next round. The hiding party is learning as well, always coming up with tons of new hideouts and ideas to conceal themselves even better. But they will all be found eventually. It is not surprising that discovering at least one person is rather easy if most of the group are hiding and few are seeking, so we'll assume the opposite: many are searching, few are hiding.
The same game may be applied to Flash/SWF. An attacker wants to execute fraudulent code on a victim's machine; in this case it is sufficient to execute arbitrary code inside someone's Flash player. The seeker's objective is (1) to recognise an attack, preferably before execution, and (2) to understand the threat in detail. Obviously, the attacker's role in this game consists of the corresponding counterparts: (1) hide the existence of the attack, at least until the code has been executed without being found, and (2) obfuscate the code to discourage easy analysis.
You may see certain similarities to the game virus writers and the antivirus industry have been playing for some time now. The word 'virus' in this context may stand for trojan horses, spyware, malware or any kind of unwanted software. The dominating virus detection technique - at least as far as static analysis is concerned - is a signature match against a dictionary of known viruses (see antivirus software). Once a virus has been identified, a fingerprint of its program code or parts of the code becomes a new signature for the dictionary. Round one of hide and seek goes to the seekers. The natural response to avoid signature detection is self-modifying code, otherwise known as polymorphic or metamorphic code (see computer virus).
Once again applied to Flash, a signature approach seems appropriate. Flash code can not (easily) alter and store itself, and even though Flash files are usually not stored locally for constant analysis by virus checkers, their static nature can be observed in the wild. But there is no reason why a server should not be able to recreate a different version of the same SWF for each request, which is something like outsourced metamorphism. So, attackers score round two.
For the analysis of non-static code with a static range of functionality, heuristic approaches come to mind. [Georgia Frantzeskou, Efstathios Stamatatos, and Stefanos Gritzalis - Supporting the Cybercrime Investigation Process: Effective Discrimination of Source Code Authors Based on Byte-Level Information - 2007] suggested a statistical classification method based on n-grams (see Ngram). The prime application of n-grams is language detection of written text: the occurrence of every sequence of N successive characters (including whitespace) of a text is counted and then compared against a reference count for a known language.
This classification method can be applied to Flash as well. Instead of N characters of a text, we take a sequence of N ABC opcodes (aka AVM2 bytecode). The figure shows a graph representation of several arbitrarily chosen SWF9/10 files and their distances based upon the n-gram analysis (n=3; edges above a distance threshold are hidden).
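Counting opcode n-grams only takes a few lines anyway. A rough sketch in Erlang (assuming the opcodes have already been extracted from the ABC blocks as a flat list; function names are mine):

%% sketch: build an n-gram frequency profile from a list of opcodes
ngram_profile(Opcodes, N) ->
    ngram_profile(Opcodes, N, dict:new()).

ngram_profile(Opcodes, N, Acc) when length(Opcodes) >= N ->
    Gram = lists:sublist(Opcodes, N),
    ngram_profile(tl(Opcodes), N, dict:update_counter(Gram, 1, Acc));
ngram_profile(_Opcodes, _N, Acc) ->
    dict:to_list(Acc).

Comparing two such profiles (e.g. by a relative distance over the shared n-grams) then yields the edge weights for a graph like the one described above.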
Three clusters become apparent: {9,10}, {17,16,12,15,20} and {3,13,19,11,18,5,21,7}. Clustering is an expression of similarity between the SWFs' bytecode. N-gram profiles contain characteristics of the compiler or IDE, the standard libraries and the code's author(s), each with a different intensity. I'd say that's another point for the seekers.
Now, in order to defy heuristics, two ways come to mind: either imitate another profile's appearance by adding NOPs and dead code, or hide the bytecode entirely. While imitation techniques can be countered by even more advanced filtering and statistical methods, we are going to explore hiding and obfuscation on the byte level. The AVM2 incorporates a byte loader, which can be used to load and evaluate ABC bytecode during runtime. Consider that we can hide our code anywhere inside the SWF or load seemingly unsuspicious data from external sources - e.g. a picture, a sound file, timing data or even data encoded as fake dead code. The data would then be transformed back into our original payload and handed over to the byte loader. Of course, our transformation algorithm and the byte loader itself must undergo such a procedure too, in order to look as harmless as possible. Fortunately there are numerous everyday tasks that are solved by data conversion and loading algorithms, so our few lines of code can be fingerprinted heuristically without arousing any suspicion.
With Alchemy, Adobe hands out a toolkit for fast ByteArray manipulation free of charge, which happens to lend itself nicely to the bytecode obfuscation/deobfuscation described above. Hence Adobe scores yet another point for the attackers - yay.
All these elaborations are by no means of a purely theoretical nature. erlswf has been specifically designed to match the needs of SWF bytecode analysis up to this point in this train of thought.
A few concluding remarks: The game of hide and seek goes on forever. If anyone wondered what our current state of the game was, my personal guess would put it somewhere near the end of round one. That means, there is more to look forward to and much more to come.
Thursday, September 11. 2008
Die Tüten-Tüte
Who knows the Tüten-Tüte (the bag of bags)? Nobody? Maybe the bag box, the pocket crate or the junk drawer? What does Nasenweissheit (nose wisdom) tell us about the Naseweiss (the know-it-all)? Only god nose? Can you be "criminated", or only discriminated? Can you only behave yourself ("sich benehmen"), or also betake yourself ("sich begeben")? Imagine someone calling out at the dinner table, in a commanding tone, "Begib dich!" ("Betake yourself!"). Or the airport announcement "Herr Müller, bitte begeben Sie sich!" - although at the airport it would be much funnier to hear "Herr Müller, bitte benehmen Sie sich!" ("Mr. Müller, please behave yourself!"). But that would make no sense in the context. I also imagine someone, in youthful recklessness, writing down the outcry over such an absurd announcement in his notepad. That, of course, is the Aufschreibär (the write-down bear). What might he look like? Maybe his face is totally scrunched up, maybe un-scrunched. The bags I collect in a bag of bags are scrunched up, too.
Thursday, September 4. 2008
django & CouchDB - a match made outside of heaven
First of all, if you don't already know django and CouchDB, take a look at their websites. You might ask "Why? Why this combination?". Both django as an application framework and CouchDB as a database engine are state-of-the-art technologies. So, why not? Searching the net, numerous forums and websites carry their users' silent wish to incorporate a CouchDB backend into django: 1 2 3
Let's take a closer look. django's backend engines are all SQL-based and suited to relational data organisation - oracle, mysql, postgres, sqlite. That means tables are created according to a data description and have relationships, e.g. a group contains many users and a user can be in many groups; both users and groups have predefined attributes such as a name. CouchDB on the other hand is document-based and schema-free. Each document can be structured differently. You just throw whatever data you have, serialised as a JSON object, into the database. That's it. A document could be an address, the details of a book in your personal library or any other data representable as JSON. As a bonus, each document may have any number of file attachments.
Now, in order to use django and CouchDB hand in hand there are two major strategies, each with its catches:
One. Develop a proper, seamlessly integrating django model backend using CouchDB. Since most database queries in django use either django's query class django.db.models.sql.query.Query or plain SQL, a new django model backend must either be able to parse SQL or implement all functions of this query class. (You could also re-implement each save() function of all uses of a django model for starters, but that would be the opposite of an abstracted model component.)
Two. Completely ignore the existence of a model abstraction and implement data storage directly in django views -- who needs MVC anyway; PHP versions 1-3 have taught us to implement everything inside a single view anyway. A nice example can be found here.
While you may already have thoughts about how easy it is to implement a SQL parser, map a relational model onto a document-based model and stick it all together into a django model backend (which - by the way - is quite possible), I found that django rather emphasises the "rapid" in rapid development. So, we'll linger with option number two for the moment. Let's see what we can use of the django world now:
- urlpatterns
- templates
- views
- the file upload handler
- sessions (with SESSION_ENGINE = 'django.contrib.sessions.backends.cache')
- caching (CACHE_BACKEND = 'locmem://')
- the authentication backend (hm?)
In order to use the authentication backend without a django model backend, sessions and caching must already be configured as above, django.contrib.sites must be disabled, and a custom auth backend must be implemented as documented. It is then advisable to prevent anyone from calling save() or get_and_delete_messages() on a User object, e.g. by overriding both with a stub that raises an exception.
Friday, May 2. 2008
나비 한글 입력기 & DVORAK & Linux
Having got used to comfortable keymap switching in MacOSX, my new eeepc initially lacked such functionality. The fairly common case of switching between DVORAK and 한글 input can be accomplished easily with a helper program such as 나비, which incidentally happens to be aware of both input methods. In order to use special characters (punctuation, dash, ...) within the 한글 input method, however, these keys should be mapped back to the standard US (qwerty) keymap. This file, placed under /usr/share/nabi/keyboard, may be of special interest.
Friday, April 18. 2008
web three point zero
It's official. Yesterday, Web3.0 was defined at the Netzladen:
Web3.0 launches on May 1st. Web3.0 is like Einstein's third world war. Web3.0 shines through the absence of any complexity. Web3.0 is only open from 9 to 5. Web3.0 has no pre-launch phase. Web3.0 needs no venture capital. Web3.0 comes right after Web2.9. Web3.0 always gets deleted by the nasty deletion nazis at Wikipedia. "Better to lose a good friend than to miss a good joke" is a Web3.0 phenomenon, too. Web3.0 has no beltline. Web3.0 has more lunch than launch. Jens is working on Web-PI - but that is a bit irrational. That the Tagesschau has links everywhere in all of its reports, like Wikipedia, is Web3.0. Let's found a Web3.0 startup. Web3.0 is hype-immune.
Wednesday, March 12. 2008
Pooling and Automated Code Distribution with Erlang
Erlang's pool module provides a very easy to use load-balancing pooling mechanism, implementing a master-slave paradigm with one master and many slaves. When the pool is started, the master tries to log in to all slave machines and start a slave on each of them (see slave(3)). At that point the pool is set up and basically ready to use. Step by step: first of all, a useful pool needs at least one additional slave node (with the pool module the master acts as a slave node at the same time). The full hostnames must be listed as erlang atoms in the file .hosts.erlang, which resides either in the current working directory or in your home directory. Example:
'foo.bar.priv'.
'blubb.bar.priv'.
(followed by a newline)
Make sure that it is possible to log in to all machines without a password prompt; ~/.ssh/authorized_keys and ~/.ssh/config might be of help here. If you get an error involving "ssh-askpass" later, try to log in manually first. In order to let the erlang nodes communicate with each other, their cookies must be synced. This can be done by setting ~/.erlang.cookie or by passing the command line argument -setcookie COOKIE. That's all for the basic setup. To try it out we could start erlang like so:
erl -pa boo -setcookie pooltest000 -name pooltest@`hostname` -rsh ssh
and start the pool by
pool:start(pooltest, lists:concat(["-setcookie ", erlang:get_cookie()])).
The argument -pa boo adds boo to the code search path and -rsh ssh tells pool to use ssh instead of rsh. Next, it would be nice to automatically distribute our local code base to all the slave nodes. Luckily the code module provides a simple way to do this:
...
{_Module, Binary, Filename} = code:get_object_code(Module),
rpc:call(Node, code, load_binary, [Module, Filename, Binary]),
...
Distributed processes can be easily created using pool:pspawn/3.
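Put together as a small helper (a sketch - module and function names are mine), pushing a locally compiled module to every node currently in the pool looks roughly like this:

-module(pooldist).
-export([distribute/1]).

%% sketch: load the local object code of Module on all pool nodes
distribute(Module) ->
    {_Module, Binary, Filename} = code:get_object_code(Module),
    [rpc:call(Node, code, load_binary, [Module, Filename, Binary])
     || Node <- pool:get_nodes()].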
A complete example:
Continue reading "Pooling and Automated Code Distribution with..." »
Monday, February 18. 2008
Erlang unscrambles SWF
Using the Erlang bit syntax it's an easy task to unpack the tags of an SWF file. With this thought in mind, erlswf has been specifically designed to analyse SWF tags and ActionScript bytecode for security issues such as the previously mentioned oversized branch offsets, or for pattern matching against URLs loaded during runtime. The toolkit could also be used to implement a transparent proxy filter for exchanging pictures inside Flash files on the fly. Or, if you had no choice but to accept prebuilt SWFs from a third party (e.g. ad hosters), it would still be possible to check for arbitrary conditions or restrictions prior to delivery. The other pure-erlang SWF library, eswf, places its emphasis on SWF construction and related data formats (AMF, ABC).
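To give an idea of how natural the bit syntax is for this task, here is a sketch of a tag splitter (not erlswf's actual API): the 16-bit tag header packs the tag code into its upper 10 bits and a short length into the lower 6 bits, where the value 0x3f signals that a 32-bit length field follows.

%% sketch: split an uncompressed SWF tag list into {Code, Data} pairs
tags(<<>>) -> [];
tags(<<CodeAndLength:16/little, Rest/binary>>) ->
    Code = CodeAndLength bsr 6,
    ShortLen = CodeAndLength band 16#3f,
    {Len, Rest1} = case ShortLen of
                       16#3f -> <<L:32/little, R/binary>> = Rest, {L, R};
                       _     -> {ShortLen, Rest}
                   end,
    <<Data:Len/binary, Rest2/binary>> = Rest1,
    [{Code, Data} | tags(Rest2)].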
Thursday, January 24. 2008
SWF in a nutshell and the malware tragedy
SWF - otherwise known as the Flash file format - recently caught my attention while discussing web security issues. It can be played in virtually any platform's browser nowadays, which makes it a perfect environment for cross-platform applications (including malware). But before exploring our options of how to exploit the format, let's get a brief insight into the binary structure of SWF. The file starts with the string FWS or CWS, followed by an 8-bit version number and a 32-bit file length field. In case of CWS all the remaining file contents are zlib compressed:
[FWS] [Version] [Length] [Data] or [CWS] [Version] [Length] [Zlib Data]
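As a small illustration, splitting off these fields is a two-clause function in Erlang's bit syntax (a sketch; the function name is made up, and the whole file is assumed to have been read into a binary):

%% sketch: decode signature, version, length and the (possibly
%% zlib-compressed) body of an SWF binary
swf_body(<<"FWS", Version, FileLength:32/little, Body/binary>>) ->
    {Version, FileLength, Body};
swf_body(<<"CWS", Version, FileLength:32/little, Compressed/binary>>) ->
    {Version, FileLength, zlib:uncompress(Compressed)}.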
The complete SWF specification can be found on Adobe's site (registration required), or here. Now, the uncompressed data part starts with a header followed by a list of tags.
[Header] [Tag] [Tag] ...
Each tag acts as a container for a datatype, e.g. a JPEG image, an RGB color or ActionScript bytecode. A tag starts with a tag type identifier and the tag's length, followed by arbitrary data.
[tag code and length (16 bits)] [data (length bytes)]
The complete swf looks like this:
[FWS/CWS] [Version] [Length] [ [Header] [[Tag Code + Length] [Tag Contents]] ... [0] ]
As indicated, the last tag is a tag with tag type 0 and length 0, hence a 16-bit representation of 0. If we wanted to analyze an SWF file, it would be best to uncompress where needed, parse the header and then break down each tag by its code first. When doing so with real world data we may encounter undocumented or unknown codes. There can be several reasons for these mysterious tag codes; for example the file could be corrupted or our parser could be incomplete. More likely, however, is either that a commonly used - yet undocumented - tag was used correctly from the programmer's point of view (tag type IDs 16, 29, 31, 38, 40, 42, 47, 49, 50, 51, 52, 63, 72), OR the tag was deliberately marked with an unknown code in order to hide bytecode or other data.
We'll go along with the latter case, so let's assume - just for a moment - that we are programming a malware flash file. As such, our code needs to avoid detection and should be obfuscated as well. The ActionScript 2 bytecode located inside doAction tags can issue a branch action (aka jump or goto), which is ordinarily used for loops and conditions. Each branch action comes with a relative address of the next action. Example:
0x00: action 1
0x01: some actions...
...
0x10: jump -0x10
Ominously, the branch offset (-0x10 in the example above) is not restricted to the current code block, but could jump into an entirely different tag instead, where the contents are executed as if they were a code block. Example:
0x100: tag1 header with unknown code
0x104: code in tag 1
...
0x200: doAction tag
0x204: jump -0x100
This way the code inside tag1 is hidden from ordinary SWF analyzer tools and can still be executed. In order to make it even harder to find the hidden code, random bytecode could be inserted in between actual bytecode, or dormant bytecode (which is never executed) could be used as a distraction. Fortunately this technique is also really easy to detect, since a checker only needs to be able to check for uncommon branch offsets; nevertheless, most disassemblers (such as flare) can be fooled. Another interesting way to hide code - by far not the last one - would be a base64 encoded SWF file embedded in an image tag as a data URI, such as
<img src="data:application/x-shockwave-flash;base64,..."/>
In the end it does not really matter which way your code is protected, or even whether it is hidden at all, because there is no security or malware check anywhere within a typical Flash advertising deployment process. An evil attacker could simply buy ad space from an ad broker; the delivered ad is then quickly checked (possibly manually) for style guidelines such as size or close buttons, and finally delivered to their ad servers. That's the end of the (slightly simplified) deployment process.
Let's explore a few technical possibilities of how to protect yourself from Flash malware. (Non-technical solutions such as contract fines or national law are not applicable to the anonymous evil hacker.) Java applets - for example - can carry signatures. Since there is no specified way to embed cryptographic signatures in SWF files, and since only few people would grasp a signature's relevance anyway, this is not a viable option here. Then there is a sort of capability whitelisting: the SWF file could be checked against a list of allowed capabilities - having obfuscated code hidden in unknown tags as described above would not be among them. The check could be done automatically on the client side (e.g. by a browser plugin) or by a proxy, either intermediately or on the server side. But such a capability filter is yet to be written.
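One building block of such a filter would be the branch offset check mentioned above. A sketch in Erlang, assuming the branch actions have already been decoded into {Position, InstructionSize, Offset} tuples relative to the start of a code block of BlockSize bytes (with offsets relative to the instruction following the branch):

%% sketch: flag branches whose target lies outside the enclosing code block
odd_branches(Branches, BlockSize) ->
    [B || {Pos, Size, Offset} = B <- Branches,
          Pos + Size + Offset < 0 orelse Pos + Size + Offset >= BlockSize].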
related URLs: https://www.flashsec.org/ http://osflash.org/
Sunday, December 16. 2007
yaws' json-rpc error reply
In case you ever wanted to develop a web application in Erlang and JavaScript, you have probably stumbled upon JSON-RPC. The idea of JSON-RPC is as trivial as it sounds: you assemble a JSON object describing a remote procedure call, which usually consists of a method name, parameters and a unique id for asynchronous calls. This might look like the following:
{"version":"1.1","method":"login","id":2,"params":["myusername","mypassword"]}
The string representation of a JSON object will then be sent to our yaws-JSON-RPC-server. The documentation describes a simple case which is always successful in returning the requested result. A JSON-RPC reply looks like this:
{"result": ..., "id": 2}
Using the recommended yaws_rpc module, our erlang program looks - in essence - somewhat like this:
out(A) ->
    ...
    yaws_rpc:handler_session(A2, {?MODULE, handler}).

handler(..., {call, login, Params}, .....) ->
    ...
    {true, 0, Session, {response, true}}.
Now assume that every once in a while your server function fails internally, let's say due to an unstable database connection. Naturally we have to reply with an error in this case. The error could - for example - be indicated by an HTTP return code other than 200 (200 = success). The handler function of our server code would then simply return the error like {error, "message", 500} instead of {true, ...} (the last line). Alternatively the error could be coded into the JSON-RPC reply like so:
{"id":6,"error":{"code":23,"message":"this and that"}}
Unfortunately this error reply cannot easily be produced from the handler's return value with the yaws_rpc module, unless it is patched: yaws_rpc.erl-1.73.diff yaws_rpc_fixed.erl
After applying the patch, a return value of {jsonrpcerror, 23, "this and that"} should do the trick.