Long before we had XML-based web services or JSON, people typically used binary protocols for the same kinds of tasks. These protocols often ran directly on top of TCP, which allowed them to be connection-oriented and even stateful. There are many good reasons to prefer XML over some proprietary binary format, but as is so often the case, binary has its advantages as well. Imagine you want to send an unsigned 32-bit integer to a web server. The largest number you can encode with 32 bits is 2 to the power of 32 minus one, or 4,294,967,295, which is a ten-digit number. In an XML-based format, such a number would take up 10 bytes, whereas in a binary format the same number could be encoded in only four. Admittedly, in real-life applications these space savings are usually nothing to write home about. The real issue with XML-based formats is that XML is a markup language and as such holds not only the actual data but also a description of its structure. Take a look at a random SOAP request and you'll notice that in many cases the markup uses up more space than the actual payload. JSON has addressed this problem: while it's still a self-describing plain-text format, it's far more compact than anything XML-based. Problem solved?
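For a concrete sense of the difference, the sketch below writes the largest unsigned 32-bit value once as decimal text, the way XML or JSON would carry it, and once as four raw bytes. The helper `toBigEndianBytes` is hypothetical, and big-endian byte order is an arbitrary choice made for this illustration.

```javascript
// Largest value representable in an unsigned 32-bit integer: 2^32 - 1.
const n = 4294967295;

// As text (XML or JSON), every decimal digit costs one byte in ASCII/UTF-8.
const asText = String(n);

// Hypothetical helper: the same value as four bytes, most significant byte first.
function toBigEndianBytes(value) {
  // >>> coerces the value to an unsigned 32-bit integer before shifting.
  return [
    (value >>> 24) & 0xff,
    (value >>> 16) & 0xff,
    (value >>> 8) & 0xff,
    value & 0xff,
  ];
}

const asBinary = toBigEndianBytes(n);

console.log(asText.length);   // 10
console.log(asBinary.length); // 4
```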

In reality, the format you use for your data doesn't make much of a difference. First off, you (hopefully) have a layer of abstraction between your code and XMLHttpRequest that takes whatever data structure you throw at it and serializes it into XML or JSON. If all goes well, you'll never even see the data in its serialized form, so whether it's particularly easy to read or complete gibberish shouldn't matter. Secondly, the real problem with web services is not the amount of data you transfer back and forth, but the fact that in a worst-case scenario you'll have to re-establish a TCP connection for every request you make. Not even JSON can prevent that, although HTTP keep-alive can mitigate it to some degree.

Motivation

So when I decided to try implementing a binary web service protocol in JavaScript, I didn't do it to solve a particular problem, but to see whether it was possible at all. The only advantage a binary protocol would give you is that someone sniffing your packets would find them a little harder to make sense of than a plain-text format. Then again, that wouldn't stop anyone with a little ambition. So this is really just a proof of concept that shows once more that JavaScript can do a lot more than some people give it credit for. It's not at all intended to be better, faster or more space-efficient than XML or JSON.

The gritty details

I come from a background of traditional (read: non-web) programming, so to me the problem of serializing a variable into a stream of bytes seemed trivial. If you've programmed in a language like C or C++, you'll know that these languages support pointers. Pointers aren't particularly popular anymore, but they let you do something that modern programming languages aren't so good at: treat a variable as something it's not. Let's say you have a 64-bit floating-point variable and you want to save it to a file or send it over a socket connection. To turn the variable into a stream of bytes, all you have to do is create a byte pointer and point it at the floating-point variable. You can then access the individual bytes of the floating-point variable and do whatever you want with them.
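JavaScript itself has no pointers, but since ECMAScript 2015 the language includes typed arrays, and two typed-array views over the same `ArrayBuffer` come very close to the byte-pointer trick: write through a `Float64Array`, read through a `Uint8Array`. The sketch below is only for comparison with the C approach; it is not the bit-level technique this implementation uses. As with a C pointer cast, the bytes come out in the platform's native byte order.

```javascript
// One 8-byte buffer, seen through two different views.
const buffer = new ArrayBuffer(8);
const asDouble = new Float64Array(buffer); // plays the role of the 64-bit float variable
const asBytes = new Uint8Array(buffer);    // plays the role of the byte pointer

asDouble[0] = Math.PI;

// Copy the raw bytes out, e.g. to write them to a file or socket.
// Their order is the platform's native byte order, just like a C pointer cast.
const raw = Array.from(asBytes);
console.log(raw.length); // 8

// The reverse direction: put the bytes into a fresh buffer and read them back as a float.
const restored = new Float64Array(new Uint8Array(raw).buffer)[0];
console.log(restored === Math.PI); // true
```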
Unfortunately, JavaScript is one of those modern programming languages that don't do pointers, so all of a sudden serializing a variable becomes a little more challenging. Challenging, but not impossible. I won't go into too much detail about how the serialization works, but I basically ended up serializing everything at the bit level, which isn't particularly difficult but requires a lot more code. If you care about the details of the implementation, just check out the source code (see the end of this page).
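Without pointers or typed arrays, the bytes have to be computed arithmetically. The sketch below is not the BISON source; it is a minimal illustration of the same idea for one case, assuming IEEE 754 double precision and big-endian output. It derives the sign, exponent and fraction fields of a JavaScript number and packs them into eight bytes. The name `encodeDouble` is hypothetical, and NaN and Infinity are rejected to keep it short.

```javascript
// Hypothetical helper: serialize a finite JavaScript number into the 8 bytes of its
// IEEE 754 double-precision representation (big-endian), using only arithmetic.
function encodeDouble(value) {
  if (!Number.isFinite(value)) {
    throw new RangeError("NaN and Infinity are not handled in this sketch");
  }
  const sign = value < 0 || Object.is(value, -0) ? 1 : 0;
  let x = Math.abs(value);
  let exponent = 0; // 11-bit biased exponent
  let fraction = 0; // 52-bit fraction, held as an exact integer

  if (x !== 0) {
    // Normalize x into [1, 2), counting the power of two in e.
    // Multiplying or dividing by 2 is exact here, so no precision is lost.
    let e = 0;
    while (x >= 2) { x /= 2; e += 1; }
    while (x < 1) { x *= 2; e -= 1; }

    if (e >= -1022) {
      // Normal number: implicit leading 1, exponent stored with a bias of 1023.
      exponent = e + 1023;
      fraction = (x - 1) * 2 ** 52;
    } else {
      // Subnormal number: exponent field 0, no implicit leading 1.
      fraction = x * 2 ** (e + 1074);
    }
  }

  // Split the 52-bit fraction into its top 20 and bottom 32 bits.
  const fracHigh = Math.floor(fraction / 2 ** 32);
  const fracLow = fraction % 2 ** 32;

  // High word: 1 sign bit, 11 exponent bits, 20 fraction bits.
  const high = ((sign << 31) | (exponent << 20) | fracHigh) >>> 0;

  const bytes = [];
  for (const word of [high, fracLow]) {
    bytes.push((word >>> 24) & 0xff, (word >>> 16) & 0xff, (word >>> 8) & 0xff, word & 0xff);
  }
  return bytes;
}

const hex = (bytes) => bytes.map((b) => b.toString(16).padStart(2, "0")).join(" ");
console.log(hex(encodeDouble(1)));    // 3f f0 00 00 00 00 00 00
console.log(hex(encodeDouble(-1.5))); // bf f8 00 00 00 00 00 00
```

Deserialization runs the same steps backwards: reassemble the two 32-bit words, pull the fields apart with shifts and masks, and rebuild the value as a fraction multiplied by a power of two.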

Once I had the serialization and deserialization figured out, it was time to put them to the test. So I wrote a PHP class to do the serialization and deserialization on the server and developed a little test application that would send a JavaScript object to the PHP script and have the script echo it back to the JavaScript client. That's when I ran into the first problem: the XMLHttpRequest object I used on the client side didn't seem to like null bytes. In many programming languages, most notably C, a null byte marks the end of a string, so when I sent my binary messages over XHR, everything after the first null byte was ignored. I wasn't going to give up that quickly, so I looked for a solution and found yEnc, a scheme for encoding binary data as text that is widely used on Usenet. Unlike base64, which inflates data by about a third, yEnc has very little overhead, and it gets rid of any null bytes. Once I had added yEnc to my serializer, my little test application finally worked.
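The core of yEnc is a simple byte-wise transformation: add 42 to each byte modulo 256, and if the result is one of a few critical values, emit the escape character `=` followed by that value plus 64. The sketch below shows only this core step on arrays of byte values; full yEnc also wraps lines and adds `=ybegin`/`=yend` header lines. The critical set used here (NUL, LF, CR and `=`) is the usual yEnc set and an assumption about what this serializer escapes; the function names are hypothetical.

```javascript
// Bytes that must not appear in the output: NUL, LF, CR and the escape char "=".
const CRITICAL = new Set([0x00, 0x0a, 0x0d, 0x3d]);
const ESCAPE = 0x3d; // "="

// Hypothetical helper: yEnc-style encoding of an array of byte values (0-255).
function yencEncode(bytes) {
  const out = [];
  for (const b of bytes) {
    const shifted = (b + 42) % 256;
    if (CRITICAL.has(shifted)) {
      // Escape: "=" followed by the shifted byte plus 64.
      out.push(ESCAPE, (shifted + 64) % 256);
    } else {
      out.push(shifted);
    }
  }
  return out;
}

// Hypothetical helper: undo the escaping and the +42 shift.
function yencDecode(encoded) {
  const out = [];
  for (let i = 0; i < encoded.length; i++) {
    let b = encoded[i];
    if (b === ESCAPE) {
      i += 1;
      b = (encoded[i] - 64 + 256) % 256;
    }
    out.push((b - 42 + 256) % 256);
  }
  return out;
}

const message = [0x00, 0xd6, 0x41]; // contains a null byte
const encoded = yencEncode(message);
console.log(encoded.includes(0)); // false
console.log(yencDecode(encoded)); // [ 0, 214, 65 ]
```

Only bytes that land on a critical value grow to two bytes, which is why the overhead stays small compared with base64's fixed four output bytes for every three input bytes.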

However, when I looked at the size of the messages I was sending, I quickly noticed yet another problem. The XMLHttpRequest object's send method automatically applies UTF-8 encoding to anything it sends. That may be fine for text messages, but what I was sending wasn't exactly text. UTF-8 encodes any character with a code point of 128 or above using two to four bytes; for the byte values 128 to 255 that turn up in a binary message, that means two bytes each. So for every byte of 128 or above that I thought I was sending, I was actually sending two. This is a problem I haven't been able to get around yet, and while sending and receiving work just fine, the messages are a lot bigger than they need to be. Typically they're about the same size as the equivalent JSON message, and sometimes they're even bigger.
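The effect is easy to measure. The sketch below assumes each byte of a binary message is handed to `send` as one character with the same code (code points 0 to 255), and uses the standard `TextEncoder` (available in browsers and Node.js) to count how many bytes UTF-8 produces. The helper name `utf8Size` is hypothetical.

```javascript
// Hypothetical helper: how many bytes a "byte string" occupies once UTF-8 encoded.
function utf8Size(bytes) {
  // One character per byte value, as when binary data is passed around as a string.
  const byteString = String.fromCharCode(...bytes);
  return new TextEncoder().encode(byteString).length;
}

console.log(utf8Size([0x41, 0x7f]));       // 2 (values below 128 stay one byte)
console.log(utf8Size([0x41, 0xe9, 0xff])); // 5 (0xE9 and 0xFF become two bytes each)

const everyByte = Array.from({ length: 256 }, (_, i) => i);
console.log(utf8Size(everyByte)); // 384 instead of 256
```

For uniformly distributed byte values, half of them are 128 or above, so a message grows by about 50% on average.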

The bottom line

So now I have a binary web service protocol and an implementation that sort of works but suffers from a problem I cannot fix, which makes the whole thing a lot less useful. I guess the most sensible thing to do now is to come up with a catchy name for it and get it out there. How about BISON (Binary Interchange Standard and Object Notation)? It sounds a bit like JSON and is very Web 2.0.
As for the "get it out there" part: I'm releasing the JavaScript and PHP source code as well as the documentation under the LGPL so you can play around with it. If anyone comes up with a solution to the UTF-8 problem, please let me know!