Splash Scripts Reference¶
Warning
Scripting support is an experimental feature for early adopters; API could change in future releases.
splash object is passed to main function; via this object a script can control the browser. Think of it as of an API to a single browser tab.
splash:go¶
Go to an URL. This is similar to entering an URL in a browser address bar, pressing Enter and waiting until page loads.
Signature: ok, reason = splash.go{url, baseurl=nil}
Parameters:
- url - URL to load;
- baseurl - base URL to use, optional. When baseurl argument is passed the page is still loaded from url, but it is rendered as if it was loaded from baseurl: relative resource paths will be relative to baseurl, and the browser will think baseurl is in address bar.
Returns: ok, reason pair. If ok is nil then error happened during page load; reason provides an information about error type.
Two types of errors are reported (ok can be nil in two cases):
- There is nothing to render. This can happen if a host doesn’t exist, server dropped connection, etc. In this case reason is "error".
- Server returned a response with 4xx or 5xx HTTP status code. reason is "http<code>" in this case, i.e. for HTTP 404 Not Found reason is "http404".
Error handling example:
local ok, reason = splash:go("http://example.com")
if not ok:
if reason:sub(0,4) == 'http' then
-- handle HTTP errors
else
-- handle other errors
end
end
-- process the page
-- assert can be used as a shortcut for error handling
assert(splash:go("http://example.com"))
Errors (ok==nil) are only reported when “main” webpage request failed. If a request to a related resource failed then no error is reported by splash:go. To detect and handle such errors (e.g. broken image/js/css links, ajax requests failed to load) use splash:har.
splash:go follows all HTTP redirects before returning the result, but it doesn’t follow HTML <meta http-equiv="refresh" ...> redirects or redirects initiated by JavaScript code. To give the webpage time to follow those redirects use splash:wait.
splash:wait¶
Wait for time seconds. When script is waiting WebKit continues processing the webpage.
Signature: ok, reason = splash:wait{time, cancel_on_redirect=false, cancel_on_error=true}
Parameters:
- time - time to wait, in seconds;
- cancel_on_redirect - if true (not a default) and a redirect happened while waiting, then splash:wait stops earlier and returns nil, "redirect". Redirect could be initiated by <meta http-equiv="refresh" ...> HTML tags or by JavaScript code.
- cancel_on_error - if true (default) and an error which prevents page from being rendered happened while waiting (e.g. an internal WebKit error or a network error like a redirect to a non-resolvable host) then splash:wait stops earlier and returns nil, "error".
Returns: ok, reason pair. If ok is nil then the timer was stopped prematurely, and reason contains a string with a reason. Possible reasons are "error" and "redirect".
Usage example:
-- go to example.com, wait 0.5s, return rendered html, ignore all errors.
function main(splash)
splash:go("http://example.com")
splash:wait(0.5)
return {html=splash:html()}
end
By default wait timer continues to tick when redirect happens. cancel_on_redirect option can be used to restart the timer after each redirect. For example, here is a function that waits for a given time after each page load in case of redirects:
function wait_restarting_on_redirects(splash, time, max_redirects)
local redirects_remaining = max_redirects
while redirects_remaining do
local ok, reason = self:wait{time=time, cancel_on_redirect=true}
if reason ~= 'redirect' then
return ok, reason
end
redirects_remaining = redirects_remaining - 1
end
return nil, "too_many_redirects"
end
splash:jsfunc¶
Convert JavaScript function to a Lua callable.
Signature: lua_func = splash:jsfunc(func)
Parameters:
- func - a string which defines a JavaScript function.
Returns: a function that can be called from Lua to execute JavaScript code in page context.
Example:
function main(splash)
local get_div_count = splash:jsfunc([[
function (){
var body = document.body;
var divs = body.getElementsByTagName('div');
return divs.length;
}
]])
splash:go(splash.args.url)
return get_div_count()
end
Note how Lua [[ ]] string syntax is helpful here.
JavaScript functions may accept arguments:
local vec_len = splash:jsfunc([[
function(x, y) {
return Math.sqrt(x*x + y*y)
}
]])
return {res=vec_len(5, 4)}
Global JavaScript functions can be wrapped directly:
local pow = splash:jsfunc("Math.pow")
local twenty_five = pow(5, 2) -- 5^2 is 25
local thousand = pow(10, 3) -- 10^3 is 1000
Lua strings, numbers, booleans and tables can be passed as arguments; they are converted to JS strings/numbers/booleans/objects. Currently it is not possible to pass other Lua objects. For example, it is not possible to pass a wrapped JavaScript function or a regular Lua function as an argument to another wrapped JavaScript function.
Lua → JavaScript conversion rules:
Lua | JavaScript |
---|---|
string | string |
number | number |
boolean | boolean |
table | Object |
nil | undefined |
Function result is converted from JavaScript to Lua data type. Only simple JS objects are supported. For example, returning a function or a JQuery selector from a wrapped function won’t work.
JavaScript → Lua conversion rules:
JavaScript | Lua |
---|---|
string | string |
number | number |
boolean | boolean |
Object | table |
Array | table |
undefined | nil |
null | "" (an empty string) |
Date | string: date’s ISO8601 representation, e.g. 1958-05-21T10:12:00Z |
RegExp | table {_jstype='RegExp', caseSensitive=true/false, pattern='my-regexp'} |
function | an empty table {} (don’t rely on it) |
Function arguments and return values are passed by value. For example, if you modify an argument from inside a JavaScript function then the caller Lua code won’t see the changes, and if you return a global JS object and modify it in Lua then object won’t be changed in webpage context.
Note
The rule of thumb: if an argument or a return value can be serialized via JSON, then it is fine.
If a JavaScript function throws an error, it is re-throwed as a Lua error. To handle errors it is better to use JavaScript try/catch because some of the information about the error can be lost in JavaScript → Lua conversion.
splash:runjs¶
Execute a JavaScript snippet in page context and return the result of the last statement.
Signature: result = splash:runjs(snippet)
Parameters:
- snippet - a string with JavaScript source code to execute.
Returns: the result of the last statement in snippet, converted from JavaScript to Lua data types.
JavaScript → Lua conversion rules are the same as for splash:jsfunc.
splash:runjs is useful to evaluate short snippets of code or to execute some code without defining a wrapper function.
Example:
local title = splash:runjs("document.title")
splash:jsfunc() is more versatile because it allows to pass arguments to JavaScript functions; to do that with splash:runjs string formatting must be used. Compare:
-- Lua function to scroll window to (x, y) position.
function scroll_to(splash, x, y)
local js = string.format(
"window.scrollTo(%s, %s);",
tonumber(x),
tonumber(y)
)
return splash:runjs(js)
end
-- a simpler version using splash:jsfunc
function scroll_to2(splash, x, y)
local window_scroll = splash:jsfunc("window.scrollTo")
return window_scroll(x, y)
end
splash:html¶
Return a HTML snapshot of a current page (as a string).
Signature: html = splash:html()
Returns: contents of a current page (as a string).
Example:
-- A simplistic implementation of render.html endpoint
function main(splash)
splash:set_result_content_type("text/html; charset=utf-8")
assert(splash:go(splash.args.url))
return splash:html()
end
Nothing prevents us from taking multiple HTML snapshots. For example, let’s visit first 10 pages on a website, and for each page store initial HTML snapshot and an HTML snapshot after waiting 0.5s:
-- Given an url, this function returns a table with
-- two HTML snapshots: HTML right after page is loaded,
-- and HTML after waiting 0.5s.
function page_info(splash, url)
local ok, msg = splash:go(url)
if not ok then
return {ok=false, reason=msg}
end
local res = {before=splash:html()}
assert(splash:wait(0.5)) -- this shouldn't fail, so we wrap it in assert
res.after = splash:html() -- the same as res["after"] = splash:html()
res.ok = true
return res
end
-- visit first 10 http://example.com/pages/<num> pages,
-- return their html snapshots
function main(splash)
local result = {}
for i=1,10 do
local url = "http://example.com/pages/" .. page_num
result[i] = page_info(splash, url)
end
return result
end
splash:png¶
Return a width x height screenshot of a current page in PNG format.
Signature: png = splash:png{width=nil, height=nil}
Parameters:
- width - optional, width of a screenshot in pixels;
- height - optional, height of a screenshot in pixels.
Returns: PNG screenshot data.
TODO: document what default values mean
width and height arguments set a size of the resulting image, not a size of an area screenshot is taken of. For example, if the viewport is 1024px wide then splash:png{width=100} will return a screenshot of the whole viewport, but an image will be downscaled to 100px width.
To set the viewport size use splash:set_viewport method.
If the result of splash:png() is returned directly as a result of “main” function, the screenshot is returned as binary data:
-- A simplistic implementation of render.png endpoint
function main(splash)
splash:set_result_content_type("image/png")
assert(splash:go(splash.args.url))
return splash:png{
width=splash.args.width,
height=splash.args.height
}
end
If the result of splash:png() is returned as a table value, it is encoded to base64 to make it possible to embed in JSON and build a data:uri on a client (magic!):
function main(splash)
assert(splash:go(splash.args.url))
return {png=splash:png()}
end
If your script returns the result of splash:png() in a top-level "png" key (as we’ve done in a previous example) then Splash UI will display it as an image.
splash:har¶
Signature: har = splash:har()
Returns: information about pages loaded, events happened, network requests sent and responses received in HAR format.
If your script returns the result of splash:har() in a top-level "har" key then Splash UI will give you a nice diagram with network information (similar to “Network” tabs in Firefox or Chrome developer tools):
function main(splash)
assert(splash:go(splash.args.url))
return {har=splash:har()}
end
splash:history¶
Signature: entries = splash:history()
Returns: information about requests/responses for the pages loaded, in HAR entries format.
splash:history doesn’t return information about related resources like images, scripts, stylesheets or AJAX requests. If you need this information use splash:har.
Let’s get a JSON array with HTTP headers of the response we’re displaying:
function main(splash)
assert(splash:go(splash.args.url))
local entries = splash:history()
-- #entries means "entries length"; arrays in Lua start from 1
local last_entry = entries[#entries]
return {
headers = last_entry.response.headers
}
end
splash:set_result_content_type¶
Set Content-Type of a result returned to a client.
Signature: splash:set_result_content_type(content_type)
Parameters:
- content_type - a string with Content-Type header value.
If a table is returned by “main” function then splash:set_result_content_type has no effect: Content-Type of the result is set to application/json.
This function does not set Content-Type header for requests initiated by splash:go; this function is for setting Content-Type header of a result.
Example:
function main(splash)
splash:set_result_content_type("text/xml")
return [[
<?xml version="1.0" encoding="UTF-8"?>
<note>
<to>Tove</to>
<from>Jani</from>
<heading>Reminder</heading>
<body>Don't forget me this weekend!</body>
</note>
]]
end
splash:set_images_enabled¶
Enable/disable images.
Signature: splash:set_images_enabled(enabled)
Parameters:
- enabled - true to enable images, false to disable them.
By default, images are enabled. Disabling of the images can save a lot of network traffic (usually around ~50%) and make rendering faster. Note that this option can affect the JavaScript code inside page: disabling of the images may change sizes and positions of DOM elements, and scripts may read and use them.
Splash uses in-memory cache; cached images will be displayed even when images are disabled. So if you load a page, then disable images, then load a new page, then likely first page will display all images and second page will display some images (the ones common with the first page). Splash cache is shared between scripts executed in the same process, so you can see some images even if they are disabled at the beginning of the script.
Example:
function main(splash)
splash:set_images_enabled(false)
assert(splash:go("http://example.com"))
return {png=splash:png()}
end
splash:set_viewport¶
Set the browser viewport.
Signature: width, height = splash:set_viewport(size)
Parameters:
- size - string, width and height of the viewport. Format is "<width>x<heigth>", e.g. "800x600". It also accepts "full" as a value; "full" means that the viewport size will be auto-detected to fit the whole page (possibly very tall).
Returns: two numbers: width and height the viewport is set to, in pixels.
splash:set_viewport("full") should be called only after page is loaded, and some time passed after that (use splash:wait). This is an unfortunate restriction, but it seems that this is the only way to make rendering work reliably with size=”full”.
splash:png uses the viewport size.
Example:
function main(splash)
assert(splash:go("http://example.com"))
assert(splash:wait(0.5))
splash:set_viewport("full")
return {png=splash:png()}
end
splash.args¶
splash.args is a table with incoming parameters. It contains merged values from the orignal URL string (GET arguments) and values sent using application/json POST request.