Working with Binary Data

Motivation

Splash assumes that most strings in a script are encoded to UTF-8. This is true for HTML content - even if the original response was not UTF-8, internally browser works with UTF-8, so splash:html result is always UTF-8.

When you return a Lua table from the main function Splash encodes it to JSON; JSON is a text protocol which can’t handle arbitrary binary data, so Splash assumes all strings are UTF-8 when returning a JSON result.

But sometimes it is necessary to work with binary data: for example, it could be raw image data returned by splash:png or a response body of a non-UTF-8 page returned by splash:http_get.

Binary Objects

To pass non-UTF8 data to Splash (returning it as a result of main or passing as arguments to splash methods) a script may mark it as a binary object using treat.as_binary function.

Some of the Splash functions already return binary objects: splash:png, splash:jpeg; response.body attribute is also a binary object.

A binary object can be returned as a main result directly. It is the reason the following example works (a basic render.png implementation in Lua):

-- basic render.png emulation
function main(splash)
    assert(splash:go(splash.args.url))
    return splash:png()
end

All binary objects have content-type attached. For example, splash:png result will have content-type image/png.

When returned directly, a binary object data is used as-is for the response body, and Content-Type HTTP header is set to the content-type of a binary object. So in the previous example the result will be a PNG image with a proper Content-Type header.

To construct your own binary objects use treat.as_binary function. For example, let’s return a 1x1px black GIF image as a response:

treat = require("treat")
base64 = require("base64")

function main(splash)
    local gif_b64 = "AQABAIAAAAAAAAAAACH5BAAAAAAALAAAAAABAAEAAAICTAEAOw=="
    local gif_bytes = base64.decode(gif_b64)
    return treat.as_binary(gif_bytes, "image/gif")
end

When main result is returned, binary object content-type takes a priority over a value set by splash:set_result_content_type. To override content-type of a binary object create another binary object with a required content-type:

lcoal treat = require("treat")
function main(splash)
    -- ...
    local img = splash:png()
    return treat.as_binary(img, "image/x-png") -- default was "image/png"
end

When a binary object is serialized to JSON it is auto-encoded to base64 before serializing. For example, it may happen when a table is returned as a main function result:

function main(splash)
    assert(splash:go(splash.args.url))

    -- result is a JSON object {"png": "...base64-encoded image data"}
    return {png=splash:png()}
end