Changes

3.4.1 (2020-01-09)

HTTP2 support is now disabled by default when using the default Splash engine, WebKit. We discovered that it does not work properly on some websites, which results in network399 errors or incorrect rendering (if those network399 errors happen for HTML resources such as style or script files).

It can be enabled with the http2 argument, and with request:set_http2_enabled or splash.http2_enabled in Lua scripts.
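As a rough illustration, the http2 argument can be passed like any other HTTP API argument. The sketch below only builds a render.html request URL and sends nothing. The Splash address (localhost:8050) and the helper name render_html_url are assumptions made for this example, and the 1/0 encoding of the flag is assumed to match how other on/off arguments are passed.

```python
from urllib.parse import urlencode

# Assumption: Splash is reachable at this address; adjust for your setup.
SPLASH_URL = "http://localhost:8050"


def render_html_url(page_url, http2=True):
    """Build a render.html URL with the http2 argument set explicitly."""
    # The flag is encoded as 1/0 (assumed encoding for on/off arguments).
    params = {"url": page_url, "http2": int(http2)}
    return f"{SPLASH_URL}/render.html?{urlencode(params)}"


print(render_html_url("https://example.com"))
```

Setting the argument explicitly on every request keeps behavior stable across Splash versions whose defaults differ.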

3.4 (2019-10-25)

In this release qtwebkit is updated to a more recent version. It is still the same rendering engine, but with some bugs fixed (e.g. handling of redirects where # is present), and with HTTP2 support enabled.

In addition to WebKit, Splash 3.4 got experimental Chromium support (v73.0.3683.105); it can be enabled per-request using the engine argument of the render.html, render.png and render.jpeg endpoints: engine=chromium. It is in pre-alpha stage and is not recommended for production use: many (most) features don’t work, and there are known bugs.
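A minimal sketch of selecting the engine per request, assuming a Splash instance at localhost:8050. The helper name engine_request_url and the ENGINE_ENDPOINTS set are hypothetical; the set simply mirrors the three endpoints named above. Nothing is sent over the network.

```python
from urllib.parse import urlencode

# Assumption: Splash is reachable at this address; adjust for your setup.
SPLASH_URL = "http://localhost:8050"

# Endpoints named in these notes as accepting the engine argument.
ENGINE_ENDPOINTS = {"render.html", "render.png", "render.jpeg"}


def engine_request_url(endpoint, page_url, engine="chromium"):
    """Build a request URL that selects a browser engine for one request."""
    if endpoint not in ENGINE_ENDPOINTS:
        raise ValueError(f"engine argument is not listed for {endpoint!r}")
    query = urlencode({"url": page_url, "engine": engine})
    return f"{SPLASH_URL}/{endpoint}?{query}"


print(engine_request_url("render.png", "https://example.com"))
```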

Main new features:

  • Splash now supports HTTP2, and it’s enabled by default. It can be disabled with http2 argument, and with request:set_http2_enabled or splash.http2_enabled in Lua scripts.
  • new --dont-log-args startup option allows replacing certain argument values with "***" in logs. Use it for sensitive data or for arguments with long values which you don’t want in logs, e.g. --dont-log-args=lua_source,mypassword. Note that sensitive data may still appear in logs, e.g. if you pass it via GET parameters instead of POST.

Other improvements and bug fixes:

  • --browser-engines startup option allows disabling browser engines globally.
  • Max allowed viewport size is increased.
  • For requests which are cancelled (e.g. because client closed a connection) GlobalTimeoutError error no longer appears in logs; it is CancelledError now instead.
  • In case of timeouts, the error dict returned to the user now contains a “remaining” field with the time remaining, in seconds. It should be negative in most cases (no time remaining => timeout happens). Requests are not cancelled at the exact timeout time; there is a small difference, and the “remaining” field gives visibility into that.
  • Better log messages on segfaults (faulthandler is enabled).
  • More robust handling of internal errors in the API.
  • DelayedCall objects are now tracked.
  • Fixed incorrect exception when error happens in splash:autoload() script.
  • Dockerfile is rewritten to use multi-stage builds; provision.sh script is split into several smaller scripts. This makes development easier, e.g. large downloads (qt, etc.) are now cached.
  • Testing improvements.

Dependency updates:

  • qtwebkit is updated to 5.212/1570542016 snapshot.
  • Qt is updated to 5.13.1; PyQT is updated to 5.13.1.
  • Ubuntu 18.04 is used as the base docker image.
  • Splash now uses Python 3.6.
  • Twisted is updated to 19.7.0.

3.3.1 (2019-02-21)

  • Fix a crash in splash:wait_for_resume - Splash used to crash when resume() or error() was called more than once, e.g. by delayed JS code;
  • new FAQ section about debugging Splash crashes.

3.3 (2019-02-06)

Backwards incompatible:

  • --manhole support is dropped for now: it was untested and not really documented, and it stopped working after software upgrades;
  • default --slots value is now 20 instead of 50 (which is still too high for most practical tasks).

New features:

Bug fixes:

  • fixed crash on pages which call window.prompt, prompts are discarded now;
  • fixed response.request.method and response.request.url in splash:on_response callbacks;
  • fixed an edge case with logging causing an exception;
  • proper log level is used for “image is trimmed vertically” message.

Other improvements:

  • qt5reactor is upgraded to 0.5 - this should slightly reduce idle CPU usage;
  • Twisted is upgraded from 16.1.0 to 18.9.0;
  • PyQT5 is upgraded from 5.9 to 5.9.2;
  • Pillow is upgraded to 5.4.1 - as a side effect, taking large JPEG screenshots should use slightly less RAM;
  • a workaround for JPEG + transparency on a web page is removed, as it seems to do nothing;
  • Splash-Jupyter is updated to latest jupyter (ipykernel==5.1.0, notebook==5.7.4);
  • testing improvements;
  • typo fixes and documentation improvements.

3.2 (2018-02-15)

HTML5 media (e.g. playback of <video> tags) is disabled by default in this release, because it was a source of some Splash crashes. This is backwards incompatible, as it can affect rendering. If you need the old behavior (it was working on sites you’re crawling), use either the html5_media=1 HTTP API argument or the splash.html5_media_enabled attribute to re-enable HTML5 media.

Other changes:

  • html5_media HTTP API argument and splash.html5_media_enabled attribute allow enabling/disabling HTML5 media;
  • splash.webgl_enabled attribute allows enabling/disabling WebGL;
  • splash.media_source_enabled attribute allows enabling/disabling Media Source Extension API;
  • --xvfb-screen-size Splash startup argument allows customizing xvfb screen size (it can sometimes be helpful to have it match the viewport size you’re using in a crawl);
  • documentation and test improvements.

3.1 (2018-01-31)

  • IndexedDB can be enabled by setting splash.indexeddb_enabled attribute to true in a Lua script;
  • Bengali and Assamese fonts are added to the default Docker image;
  • splash:runjs and splash:autoload are fixed for scripts which end with a line comment (//);
  • --ip startup argument allows setting the IP address Splash listens on;
  • Documentation and testing improvements.

3.0 (2017-07-06)

WebKit is upgraded in this Splash release - Splash now uses https://github.com/annulen/webkit instead of official (deprecated and unsupported) QtWebKit. Splash rendering engine is now similar to Safari from mid-2016. It fixes a lot of problems with compatibility, speed and quality of rendering.

Backwards incompatible changes:

  • there are rendering changes, as WebKit is upgraded;
  • wait argument for render.??? endpoints no longer increases timeout automatically. If you increase timeout by the wait value, requests to render.??? endpoints will work as before. Also, the 30s limit (10s prior to Splash 2.3.3) for wait argument is removed - you can set any wait value, as long as it is smaller than timeout.
  • Python 2 support is removed. You can still use Python 2 to make requests to Splash, but Splash server itself now runs on Python 3.4+.
  • element:mouse_click and element:mouse_hover now click/hover element center by default, not element top-left corner. Also, they scroll to the element being clicked/hovered if needed, to make it work when an element is outside the current viewport. These methods are now async; they wait for events to propagate (unlike splash:mouse_click and splash:mouse_hover).
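The wait/timeout change above can be handled on the client side by adding the wait value to the timeout explicitly. The following is a sketch under stated assumptions: render_args is a hypothetical helper, and its defaults (30s timeout, 90s maximum) are the default values mentioned in these notes, which a given Splash deployment may override.

```python
def render_args(url, wait, timeout=30.0, max_timeout=90.0):
    """Arguments for a render.* request that keep the pre-3.0 time budget.

    Before 3.0 the wait value was added to the timeout automatically;
    here the caller performs that addition explicitly.
    """
    adjusted = timeout + wait
    if adjusted > max_timeout:
        # Assumed limit; the server decides what it actually accepts.
        raise ValueError(
            f"timeout {adjusted}s exceeds the assumed limit of {max_timeout}s"
        )
    return {"url": url, "wait": wait, "timeout": adjusted}


print(render_args("http://example.com", wait=5))
```

Because the adjusted timeout is always larger than wait, the requirement that wait be smaller than timeout is satisfied automatically.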

New features:

  • An alternative way to access splash.args: it can be received as a second argument of main function (i.e. function main(splash, args) ...);
  • new run endpoint is an alternative to execute endpoint; it is almost the same, but it doesn’t require putting code into function main(splash, args) ... end;
  • new splash.scroll_position attribute allows getting and setting the window scroll position;
  • Qt is upgraded to 5.9.1, PyQT is upgraded to 5.9;
  • official Docker image now uses Ubuntu 16.04.
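To make the difference between the execute and run endpoints concrete, the sketch below prepares the JSON body for each. It is illustrative only: the Lua statements are an example script, and the helper names execute_payload and run_payload are hypothetical. The same statements are wrapped in function main(splash, args) ... end for execute and sent bare for run.

```python
import json

# Example Lua statements shared by both variants (illustrative script).
LUA_BODY = "assert(splash:go(args.url))\nreturn splash:html()"


def execute_payload(url):
    """JSON body for the execute endpoint: code is wrapped in main()."""
    lua_source = "function main(splash, args)\n" + LUA_BODY + "\nend"
    return json.dumps({"lua_source": lua_source, "url": url})


def run_payload(url):
    """JSON body for the run endpoint: the bare statements are sent as-is."""
    return json.dumps({"lua_source": LUA_BODY, "url": url})


print(execute_payload("https://example.com"))
print(run_payload("https://example.com"))
```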

Other changes and bug fixes:

  • default timeout limit (i.e. max allowed value) is increased from 60s to 90s; default timeout value is still 30s.
  • Lua sandbox: instruction count limit is increased further (10M instructions instead of 5M)
  • new docs section: Splash Lua API Overview;
  • new FAQ entries: How to send requests to Splash HTTP API?, Website is not rendered correctly;
  • Fixed an issue with splash:runjs: previously in case of an error it returned a table with error information. This approach didn’t play well with Lua assert, so now a string with an error message is returned instead. It was always documented that a string is returned by splash:runjs as a second value when error happens.
  • Fixed element:png and element:jpeg for elements outside current viewport;
  • DOM attributes and methods are documented as accessible on elements directly, without .node - i.e. splash:select('.my-element'):getAttribute('foo') instead of splash:select('.my-element').node:getAttribute('foo');
  • exposed element:scrollIntoViewIfNeeded() method;
  • improved validation of headers arguments in splash:go, splash:set_custom_headers, splash:http_get and splash:http_post;
  • Splash shouldn’t crash if an exception happens while creating a request in network manager;
  • cleanup of JS event handlers is improved;
  • documentation and testing improvements.

2.3.3 (2017-06-07)

  • WebGL support in default Docker image;
  • Maximum value for wait argument in render.??? endpoints is increased from 10 seconds to 30 seconds;
  • Lua sandbox limits (RAM and CPU) are raised;
  • documentation and testing improvements.

2.3.2 (2017-03-03)

  • security fix: Xvfb shouldn’t listen to tcp.

2.3.1 (2017-01-24)

  • Fixed proxy authentication for proxies set using ‘proxy’ HTTP argument;
  • minor documentation fixes.

2.3 (2016-12-01)

This release adds lots of scraping helpers to Splash: CSS selectors, form filling, easy access to HTML node attributes. Scraping helpers were implemented by Michael Manukyan as a Google Summer of Code 2016 project.

New features:

2.2.2 (2016-11-10)

This is a bug fix release:

  • Splash-Jupyter is fixed;
  • fix an issue with non-ascii HTTP status messages;
  • upgrade Pillow to 3.4.2.

2.2.1 (2016-10-17)

This is a bug fix release:

  • fix Splash UI in Chrome when serving from localhost;
  • upgrade adblockparser to 0.7 to support recent easylist filters;
  • upgrade Pillow to 3.3.3.

2.2 (2016-09-10)

New features:

Bug fixes:

Other changes:

  • internal cleanup of Lua <-> Python interaction;
  • Pillow library is updated in Docker image;
  • HarViewer is upgraded to a recent version.

2.1 (2016-04-20)

New features:

Bug fixes:

  • User-Agent is set correctly for requests with baseurl;
  • “download” links in Splash UI are fixed;
  • an issue with ad blockers preventing Splash UI from working is fixed.

2.0.3 (2016-03-04)

This is a bugfix release:

  • Splash Notebook is fixed to work with recent ipykernel versions;
  • segfaults in adblock middleware are fixed;
  • adblock parsing issues are fixed by upgrading adblockparser to v0.5;
  • fixed handling of adblock rules with ‘domain’ option: domain is now extracted from the page URL, not necessarily from ‘url’ Splash argument.

2.0.2 (2016-02-26)

This is a bugfix release:

  • an issue which may cause segfaults is fixed.

2.0.1 (2016-02-25)

This is a bugfix release:

  • XSS in HTTP UI is fixed;
  • Splash-Jupyter docker image is fixed.

2.0 (2016-02-21)

Splash 2.0 uses Qt 5.5.1 instead of Qt 4; it means the rendering engine now supports more HTML5 features and is more modern overall. Also, the official Docker image now uses Python 3 instead of Python 2. This work is largely done by Tarashish Mishra as a Google Summer of Code 2015 project.

Splash 2.0 release introduces other cool new features:

  • many Splash HTTP UI improvements;
  • better support for binary data;
  • built-in json and base64 libraries;
  • more control for result serialization (support for JSON arrays and raw bytes);
  • it is now possible to turn Private mode OFF at startup using command-line option or at runtime using splash.private_mode_enabled attribute;
  • _ping endpoint is added;
  • cookie handling is fixed;
  • downloader efficiency is improved;
  • request processing is stopped when client disconnects;
  • logging inside callbacks now uses proper verbosity;
  • sandbox memory limit for user objects is increased to 50MB;
  • some sandboxing issues are fixed;
  • splash:evaljs and splash:jsfunc results are sanitized better;
  • it is possible to pass arguments when starting Splash-Jupyter - it means now you can get a browser window for splash-jupyter when it is executed from docker;
  • proxy authentication is fixed;
  • logging improvements: logs now contain request arguments in JSON format; errors are logged;

There are backwards-incompatible changes to Splash Scripting: previously, different Splash methods were returning/receiving inconsistent response and request objects. For example, splash:http_get response was not in the same format as response received by splash:on_response callbacks. Splash 2.0 uses Request and Response objects consistently. Unfortunately this requires changes to existing user scripts:

  • replace resp = splash:http_get(...) and resp = splash:http_post(...) with resp = splash:http_get(...).info and resp = splash:http_post(...).info. Client code also may need to be changed: the default encoding of info['content']['text'] is now base64. If you used resp.content.text consider switching to response.body.
  • response object received by splash:on_response_headers and splash:on_response callbacks is changed: instead of response.request write response.request.info.

Serialization of JS objects in splash:jsfunc, splash:evaljs and splash:wait_for_resume is changed: circular objects are no longer returned, Splash doesn’t try to serialize DOM elements, and error messages are changed.

Splash no longer supports QT-based disk cache; it was disabled by default and its usage was discouraged since Splash 1.0. In Splash 2.0 the --cache command-line option is removed. For HTTP cache there are better options like Squid.

Another backwards-incompatible change is that Splash-as-a-proxy feature is removed. Please use regular HTTP API instead of this proxy interface. Of course, Splash will still support using proxies to make requests; these are two different features.

1.8 (2015-09-29)

New features:

Bug fixes and improvements:

  • fixed an issue: proxies were not applied for POST requests;
  • improved argument validation for various methods;
  • more detailed logs;
  • it is now possible to load a compatibility shim for window.localStorage;
  • code coverage integration;
  • improved Splash-Jupyter tests;
  • Splash-Jupyter is upgraded to Jupyter 4.0.

1.7 (2015-08-06)

New features:

Other changes:

  • HTTP error detection is improved;
  • MS fonts are added to the Docker image for better rendering quality;
  • Chinese fonts are added to the Docker image to enable rendering of Chinese websites;
  • validation of timeout and wait arguments is improved;
  • documentation: grammar is fixed in the tutorial;
  • assorted documentation improvements and code cleanups;
  • splash:set_images_enabled method is deprecated.

1.6 (2015-05-15)

The main new feature in Splash 1.6 is the splash:on_request function, which allows processing individual outgoing requests: log, abort, change them.

Other improvements:

  • a new _gc endpoint which allows clearing QWebKit caches;
  • Docker images are updated with more recent package versions;
  • HTTP arguments validation is improved;
  • serving Splash UI under HTTPS is fixed.
  • documentation improvements and typo fixes.

1.5 (2015-03-03)

In this release we introduce Splash-Jupyter - a web-based IDE for Splash Lua scripts with syntax highlighting, autocompletion and a connected live browser window. It is implemented as a kernel for Jupyter (IPython).

Docker images for Splash 1.5 are optimized - download size is much smaller than in previous releases.

Other changes:

  • splash:go() returned incorrect result after an unsuccessful splash:go() call - this is fixed;
  • Lua main function can now return multiple results;
  • there are testing improvements and internal cleanups.

1.4 (2015-02-10)

This release provides faster and more robust screenshot rendering, many improvements in Splash scripting engine and other improvements like better cookie handling.

From version 1.4 Splash requires Pillow (built with PNG support) to work.

There are backwards-incompatible changes in Splash scripts:

To upgrade check all splash:runjs() usages: if the returned result is used then replace splash:runjs() with splash:evaljs().

viewport=full argument is deprecated; use render_all=1.
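Client code that still sends viewport=full can be migrated mechanically. The helper below is a hypothetical sketch that rewrites a dict of request arguments; it leaves other viewport values (e.g. "1024x768") untouched.

```python
def migrate_viewport_arg(params):
    """Replace the deprecated viewport=full with render_all=1."""
    migrated = dict(params)  # copy, so the caller's dict is not modified
    if migrated.get("viewport") == "full":
        del migrated["viewport"]
        migrated["render_all"] = 1
    return migrated


old_args = {"url": "http://example.com", "viewport": "full", "wait": 0.5}
print(migrate_viewport_arg(old_args))
```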

New scripting features:

Other improvements:

  • --max-timeout option can be passed to Splash at startup to increase or decrease maximum allowed timeout value;
  • cookies are no longer shared between requests;
  • PNG rendering becomes more efficient: less CPU is spent on compression. The downside is that the returned PNG images become 10-15% larger;
  • there is an option (scale_method=vector) to resize images while painting to avoid pixel-based resize step - it can make taking a screenshot much faster on image-light webpages (up to several times faster);
  • when ‘height’ is set and image is downscaled the rendering is more efficient because Splash now avoids rendering unnecessary parts;
  • /debug endpoint tracks more objects;
  • testing setup improvements;
  • application/json POST requests handle invalid JSON better;
  • undocumented splash:go_and_wait() and splash:_wait_restart_on_redirects() methods are removed (they are moved to tests);
  • Lua sandbox is cleaned up;
  • long log messages from Lua are truncated in logs;
  • more detailed error info is logged;
  • example script in Splash UI is simplified;
  • stress tests now include PNG rendering benchmark.

Bug fixes:

  • default viewport size and window geometry are now set to 1024x768; this fixes PNG screenshots with viewport=full;
  • PNG rendering is fixed for huge viewports;
  • splash:go() argument validation is improved;
  • timer is properly deleted when an exception is raised in an errback;
  • redirects handling for baseurl requests is fixed;
  • reply is deleted only once when baseurl is used.

1.3.1 (2014-12-13)

This release fixes packaging issues with Splash 1.3.

1.3 (2014-12-04)

This release introduces experimental scripting support.

Other changes:

  • manhole is disabled by default in Debian package;
  • more objects are tracked in /debug endpoint;
  • “history” in render.json now includes “queryString” keys; it makes the output compatible with HAR entry format;
  • logging improvements;
  • improved timer cancellation.

1.2.1 (2014-10-16)

  • Dockerfile base image is downgraded to Ubuntu 12.04 to fix random crashes;
  • Debian/buildbot config is fixed to make Splash UI available when deployed from deb;
  • Qt / PyQt / sip / WebKit / Twisted version numbers are logged at startup.

1.2 (2014-10-14)

  • All Splash rendering endpoints now accept Content-Type: application/json POST requests with JSON-encoded rendering options as an alternative to using GET parameters;
  • headers parameter allows setting HTTP headers (including user-agent) for all endpoints - previously it was possible only in proxy mode;
  • js_source parameter allows executing JS in page context without application/javascript POST requests;
  • testing suite is switched to pytest, test running can now be parallelized;
  • viewport size changes are logged;
  • /debug endpoint provides leak info for more classes;
  • Content-Type header parsing is less strict;
  • documentation improvements;
  • various internal code cleanups.
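A sketch of a JSON POST request that combines the options above (JSON-encoded rendering options, headers, js_source). It builds the request object without sending it. The Splash address, the helper name build_render_request, the User-Agent string and the JS snippet are all assumptions for illustration.

```python
import json
from urllib.request import Request

# Assumption: Splash is reachable at this address; adjust for your setup.
SPLASH_URL = "http://localhost:8050"


def build_render_request(page_url, headers=None, js_source=None):
    """Build (but do not send) a JSON POST request to render.html."""
    options = {"url": page_url}
    if headers:
        options["headers"] = headers      # HTTP headers Splash should send
    if js_source:
        options["js_source"] = js_source  # JS to execute in page context
    return Request(
        f"{SPLASH_URL}/render.html",
        data=json.dumps(options).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )


req = build_render_request(
    "http://example.com",
    headers={"User-Agent": "my-crawler/1.0"},  # hypothetical value
    js_source="document.title = 'changed';",
)
print(req.get_method(), req.full_url)
```

Passing the returned object to urllib.request.urlopen would send the request to a running Splash instance.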

1.1 (2014-10-10)

  • A UI is added - it allows quickly checking Splash features.
  • Splash can now return requests/responses information in HAR format. See render.har endpoint and har argument of render.json endpoint. A simpler history argument is also available. With HAR support it is possible to get timings for various events, HTTP status code of the responses, HTTP headers, redirect chains, etc.
  • Processing of related resources is stopped earlier and more robustly in case of timeouts.
  • wait parameter changed its meaning: waiting now restarts after each redirect.
  • Dockerfile is improved: image is updated to Ubuntu 14.04; logs are shown immediately; it becomes possible to pass additional options to Splash and customize proxy/js/filter profiles; adblock filters are supported in Docker; versions of Python dependencies are pinned; Splash is started directly (without supervisord).
  • Splash now tries to start Xvfb automatically - no need for xvfb-run. This feature requires xvfbwrapper Python package to be installed.
  • Debian package improvements: Xvfb viewport matches default Splash viewport, and it is possible to change Splash options using the SPLASH_OPTS environment variable.
  • Documentation is improved: finally, there are some install instructions.
  • Logging: verbosity levels of several logging events are changed; data-uris are truncated in logs.
  • Various cleanups and testing improvements.
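As an illustration of what HAR output makes possible, the sketch below extracts the URL, HTTP status code and redirect target of each entry from a HAR document that has already been decoded from JSON. The field names follow the HAR format; the helper name summarize_har and the sample data are made up for this example and are not real Splash output.

```python
def summarize_har(har):
    """Return (url, status, redirect_url) for every entry in a HAR dict."""
    return [
        (
            entry["request"]["url"],
            entry["response"]["status"],
            entry["response"].get("redirectURL", ""),
        )
        for entry in har["log"]["entries"]
    ]


# Hypothetical, heavily trimmed HAR data describing one redirect.
sample_har = {
    "log": {
        "entries": [
            {
                "request": {"url": "http://example.com/"},
                "response": {"status": 301, "redirectURL": "http://example.com/home"},
            },
            {
                "request": {"url": "http://example.com/home"},
                "response": {"status": 200, "redirectURL": ""},
            },
        ]
    }
}

for row in summarize_har(sample_har):
    print(row)
```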

1.0 (2014-07-28)

Initial release.