- Upgraded some dependencies, fixing some crashes:
  * qtwebkit 5.212.0-alpha-4
  * Qt 5.14.1
  * PyQt 5.14.2
  * PyQtWebEngine 5.14.0
  * SIP 4.19.22
- It is now possible to build Splash with a custom qtwebkit binary or build
- Improved the error message about out-of-range viewports
- Enabled logs on Jupyter Notebook
- Fixed a few typos in the documentation
- Fixed Qt installation on Docker after upstream changes to the installer
HTTP2 support is now disabled by default when using the default Splash engine,
WebKit. We discovered that it does not work properly on some websites, which
causes network399 errors or incorrect rendering (if those
network399 errors happen for HTML resources such as style or script files).
It can be enabled with the http2 argument, and with request:set_http2_enabled or splash.http2_enabled in Lua scripts.
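As a minimal sketch of the http2 argument mentioned above, the helper below builds a render.html URL that sets it explicitly. The Splash address and the helper name `render_html_url` are assumptions for illustration; only the endpoint and the argument name come from these notes.

```python
from urllib.parse import urlencode

# Assumption: a Splash instance is reachable at this (hypothetical) address.
SPLASH_URL = "http://localhost:8050"


def render_html_url(page_url, http2=True):
    """Build a render.html URL that sets the http2 argument explicitly."""
    params = {"url": page_url, "http2": 1 if http2 else 0}
    return f"{SPLASH_URL}/render.html?{urlencode(params)}"


# Re-enable HTTP2 for a single request.
print(render_html_url("https://example.com"))
```

Fetching the resulting URL with any HTTP client would send the request; the sketch stops at building it so that it runs without a Splash server.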
In this release qtwebkit is updated to a more recent version.
It is still the same rendering engine, but with some bugs fixed
(e.g. handling of redirects where # is present),
and with HTTP2 support enabled.
In addition to WebKit, Splash 3.4 got experimental
Chromium support (v73.0.3683.105); it can be enabled per-request using the
engine argument of render.html, render.png and render.jpeg:
engine=chromium. It is in pre-alpha stage and is not recommended
for production use: many (most) features don’t work, and there are known bugs.
Main new features:
- Splash now supports HTTP2, and it’s enabled by default. It can be disabled with http2 argument, and with request:set_http2_enabled or splash.http2_enabled in Lua scripts.
- --dont-log-args startup option allows to replace certain argument values with "***" in logs. Use it for sensitive data or for arguments with long values which you don’t want in logs, e.g. --dont-log-args=lua_source,mypassword. Note that sensitive data may still appear in logs, e.g. if you pass it via GET parameters instead of POST.
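The effect of --dont-log-args on a logged request can be pictured with the sketch below. It is not Splash's implementation; `mask_args` is a hypothetical helper that only illustrates the described behavior of replacing the listed argument values with "***".

```python
def mask_args(args, dont_log=("lua_source", "mypassword")):
    """Return a copy of args with the listed argument values replaced by "***".

    Hypothetical illustration of what --dont-log-args=lua_source,mypassword
    does to the arguments shown in logs; names not listed are left untouched.
    """
    return {name: ("***" if name in dont_log else value)
            for name, value in args.items()}


request_args = {
    "url": "http://example.com",
    "lua_source": "function main(splash) return 1 end",
    "mypassword": "hunter2",
}
print(mask_args(request_args))
```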
Other improvements and bug fixes:
- --browser-engines startup option allows to disable browser engines globally;
- Max allowed viewport size is increased.
- For requests which are cancelled (e.g. because client closed a connection) GlobalTimeoutError error no longer appears in logs; it is CancelledError now instead.
- In case of timeouts, error dict returned to the user now contains “remaining” field with the time remaining, in seconds. It should be negative in most cases (no time remaining => timeout happens). Requests are cancelled not at exact timeout time, there is a small difference, and “remaining” field gives a visibility into that.
- Better log messages on segfaults (faulthandler is enabled).
- More robust handling of internal errors in the API.
- DelayedCall objects are now tracked.
- Fixed incorrect exception when error happens in
- Dockerfile is rewritten to use multi-stage builds; provision.sh script is split into several smaller scripts. This makes development easier, e.g. large downloads (qt, etc.) are now cached.
- Testing improvements.
- qtwebkit is updated to 5.212/1570542016 snapshot.
- Qt is updated to 5.13.1; PyQt is updated to 5.13.1.
- Ubuntu 18.04 is used as the base docker image.
- Splash now uses Python 3.6.
- Twisted is updated to 19.7.0.
- Fix a crash in splash:wait_for_resume - Splash used to crash when
error() are called more than once, e.g. by delayed JS code;
- new FAQ section about debugging Splash crashes.
- --manhole support is dropped for now: it was untested and not really documented, and it stopped working after software upgrades;
- default --slots value is now 20 instead of 50 (which is still too high for most practical tasks).
- splash:on_navigation_locked allows to register a function to be called before a request is discarded when navigation is locked.
- --disable-browser-caches command-line option allows to disable browser caching. See "Why are CSS styling and images missing from the .har archive?" for a use case.
- request_body and splash.request_body_enabled allow to enable request bodies in HAR output and splash:on_response callbacks.
- fixed crash on pages which call window.prompt, prompts are discarded now;
- fixed response.request.url in splash:on_response callbacks;
- fixed an edge case with logging causing an exception;
- proper log level is used for “image is trimmed vertically” message.
- qt5reactor is upgraded to 0.5 - this should slightly reduce idle CPU usage;
- Twisted is upgraded from 16.1.0 to 18.9.0;
- PyQt5 is upgraded from 5.9 to 5.9.2;
- Pillow is upgraded to 5.4.1 - as a side effect, taking large JPEG screenshots should use slightly less RAM;
- a workaround for JPEG + transparency on a web page is removed, as it seems to do nothing;
- Splash-Jupyter is updated to latest jupyter (ipykernel==5.1.0, notebook==5.7.4);
- testing improvements;
- typo fixes and documentation improvements.
HTML5 media (e.g.
<video> tags playback) is disabled by default in this
release, because it was a source of some of Splash crashes. This is
backwards incompatible, as it can affect rendering. If you need old
behavior (it was working on sites you’re crawling), use either
html5_media=1 HTTP API argument
or splash.html5_media_enabled attribute to re-enable HTML5 media.
- html5_media HTTP API argument and splash.html5_media_enabled attribute allow to enable/disable HTML5 media;
- splash.webgl_enabled attribute allows to enable/disable WebGL;
- splash.media_source_enabled attribute allows to enable/disable Media Source Extension API;
- --xvbf_screen_size Splash startup argument allows to customize xvfb screen size (it can sometimes help to have it match the viewport size you’re using in a crawl);
- documentation and test improvements.
- IndexedDB can be enabled by setting splash.indexeddb_enabled to true in a Lua script;
- Bengali and Assamese fonts are added to the default Docker image;
- splash:runjs and splash:autoload are fixed for scripts
which end with a line comment (
- --ip startup argument allows to set an IP address Splash listens on;
- Documentation and testing improvements.
WebKit is upgraded in this Splash release - Splash now uses https://github.com/annulen/webkit instead of official (deprecated and unsupported) QtWebKit. Splash rendering engine is now similar to Safari from mid-2016. It fixes a lot of problems with compatibility, speed and quality of rendering.
Backwards incompatible changes:
- there are rendering changes, as WebKit is upgraded;
- wait argument for render.??? endpoints
no longer increases timeout automatically.
If you increase
wait value requests to render.??? endpoints will work as before. Also, 30s limit (10s prior to Splash 2.3.3) for wait argument is removed - you can set any
wait value, as long as it is smaller than
- Python 2 support is removed. You can still use Python 2 to make requests to Splash, but Splash server itself now runs on Python 3.4+.
- element:mouse_click and element:mouse_hover now click/hover element center by default, not element top-left corner. Also, they scroll to the element being clicked/hovered if needed, to make it work when an element is outside the current viewport. These methods are now async; they wait for events to propagate (unlike splash:mouse_click and splash:mouse_hover).
- An alternative way to access splash.args: it can be received
as the second argument of the main function
(function main(splash, args) ...);
- new run endpoint is an alternative to execute endpoint; it is
almost the same, but it doesn’t require putting code into
function main(splash, args) ... end;
- new splash.scroll_position attribute allows to get and set window scroll position;
- Qt is upgraded to 5.9.1, PyQt is upgraded to 5.9;
- official Docker image now uses Ubuntu 16.04.
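The relationship between the run and execute endpoints described above can be sketched as follows. The helper name `wrap_for_execute` and the sample script are illustrative assumptions; the only point shown is that a run-style body is the same script without the `function main(splash, args) ... end` wrapper.

```python
import json


def wrap_for_execute(body):
    """Wrap a run-style script body in the main function that execute expects."""
    return "function main(splash, args)\n" + body + "\nend"


# Script body as it could be sent to the run endpoint (no main function).
run_body = "assert(splash:go(args.url))\nreturn splash:html()"

# JSON payloads for the two endpoints; "url" ends up in splash.args / args.
run_payload = json.dumps({"lua_source": run_body, "url": "http://example.com"})
execute_payload = json.dumps({
    "lua_source": wrap_for_execute(run_body),
    "url": "http://example.com",
})

print(wrap_for_execute(run_body))
```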
Other changes and bug fixes:
- default timeout limit (i.e. max allowed value)
is increased from 60s to 90s; default
timeout value is still 30s.
- Lua sandbox: instruction count limit is increased further (10M instructions instead of 5M)
- new docs section: Splash Lua API Overview;
- new FAQ entries: How to send requests to Splash HTTP API?, Website is not rendered correctly;
- Fixed an issue with splash:runjs: previously in case of an error
it returned a table with error information. This approach didn’t play well
with assert, so now a string with an error message is returned instead. It was always documented that a string is returned by splash:runjs as a second value when an error happens.
- Fixed element:png and element:jpeg for elements outside the current viewport;
- DOM attributes and methods are documented as accessible on
elements directly, without
- improved validation of
headers arguments in splash:go, splash:set_custom_headers, splash:http_get and splash:http_post;
- Splash shouldn’t crash if an exception happens while creating a request in network manager;
- cleanup of JS event handlers is improved;
- documentation and testing improvements.
- WebGL support in default Docker image;
- Maximum value for
wait argument in render.??? endpoints is increased from 10 seconds to 30 seconds;
- Lua sandbox limits (RAM and CPU) are raised;
- documentation and testing improvements.
- security fix: Xvfb shouldn’t listen to tcp.
- Fixed proxy authentication for proxies set using ‘proxy’ HTTP argument;
- minor documentation fixes.
This release adds lots of scraping helpers to Splash: CSS selectors, form filling, easy access to HTML node attributes. Scraping helpers were implemented by Michael Manukyan as a Google Summer of Code 2016 project.
- splash:select and splash:select_all methods which allow to execute CSS selectors;
This is a bug fix release:
- Splash-Jupyter is fixed;
- fix an issue with non-ascii HTTP status messages;
- upgrade Pillow to 3.4.2.
This is a bug fix release:
- fix Splash UI in Chrome when serving from localhost;
- upgrade adblockparser to 0.7 to support recent easylist filters;
- upgrade Pillow to 3.3.3.
- new splash:send_keys and splash:send_text methods allow to send native keyboard events to browser;
- new splash:with_timeout method allows to limit execution time of blocks of code;
- new splash.plugins_enabled attribute which allows to enable Flash; Flash is now available in Docker image, but it is still disabled by default.
- new splash.response_body_enabled attribute, request:enable_response_body method and response_body argument allows to access and export response bodies.
- fixed handling of splash:call_later, splash:on_request, splash:on_response and splash:on_response_headers callback arguments;
- fixed cleanup of various callbacks;
- fixed save_args in Python 2.x;
- internal cleanup of Lua <-> Python interaction;
- Pillow library is updated in Docker image;
- HarViewer is upgraded to a recent version.
- ‘region’ argument for splash:png and splash:jpeg methods allows to take screenshots of parts of pages;
- save_args and load_args parameters allow to save network traffic by caching large request arguments inside Splash server;
- new splash:mouse_click, splash:mouse_press, splash:mouse_release and splash:mouse_hover methods for sending mouse events to web pages.
- User-Agent is set correctly for requests with baseurl;
- “download” links in Splash UI are fixed;
- an issue with ad blockers preventing Splash UI from working is fixed.
This is a bugfix release:
- Splash Notebook is fixed to work with recent ipykernel versions;
- segfaults in adblock middleware are fixed;
- adblock parsing issues are fixed by upgrading adblockparser to v0.5;
- fixed handling of adblock rules with ‘domain’ option: domain is now extracted from the page URL, not necessarily from ‘url’ Splash argument.
This is a bugfix release:
- an issue which may cause segfaults is fixed.
This is a bugfix release:
- XSS in HTTP UI is fixed;
- Splash-Jupyter docker image is fixed.
Splash 2.0 uses Qt 5.5.1 instead of Qt 4; it means the rendering engine now supports more HTML5 features and is more modern overall. Also, the official Docker image now uses Python 3 instead of Python 2. This work is largely done by Tarashish Mishra as a Google Summer of Code 2015 project.
Splash 2.0 release introduces other cool new features:
- many Splash HTTP UI improvements;
- better support for binary data;
- built-in json and base64 libraries;
- more control for result serialization (support for JSON arrays and raw bytes);
- it is now possible to turn Private mode OFF at startup using command-line option or at runtime using splash.private_mode_enabled attribute;
- _ping endpoint is added;
- cookie handling is fixed;
- downloader efficiency is improved;
- request processing is stopped when client disconnects;
- logging inside callbacks now uses proper verbosity;
- sandbox memory limit for user objects is increased to 50MB;
- some sandboxing issues are fixed;
- splash:evaljs and splash:jsfunc results are sanitized better;
- it is possible to pass arguments when starting Splash-Jupyter - it means now you can get a browser window for splash-jupyter when it is executed from docker;
- proxy authentication is fixed;
- logging improvements: logs now contain request arguments in JSON format; errors are logged;
There are backwards-incompatible changes
to Splash Scripting: previously, different
Splash methods were returning/receiving inconsistent
response and request objects. For example, splash:http_get response was
not in the same format as
response received by splash:on_response
callbacks. Splash 2.0 uses Request and
Response objects consistently.
Unfortunately this requires changes to existing user scripts:
- replace resp = splash:http_get(...) and resp = splash:http_post(...) with resp = splash:http_get(...).info and resp = splash:http_post(...).info. Client code also may need to be changed: the default encoding of info['content']['text'] is now base64. If you used resp.content.text consider switching to response.body.
- response object received by splash:on_response_headers and splash:on_response callbacks is changed: instead of
Serialization of JS objects in splash:jsfunc, splash:evaljs and splash:wait_for_resume is changed: circular objects are no longer returned, Splash doesn’t try to serialize DOM elements, and error messages are changed.
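For the client-side change noted above (info['content']['text'] is now base64 by default), a client could decode the body roughly as in this sketch. The helper name `content_bytes` is hypothetical, and the "encoding" key is an assumption borrowed from the HAR content format rather than something these notes state.

```python
import base64


def content_bytes(info):
    """Return the response body as bytes from a HAR-like info dict.

    Assumption: info["content"] follows the HAR layout, where "encoding" is
    "base64" when "text" holds base64 data; otherwise "text" is plain text.
    """
    content = info["content"]
    text = content.get("text", "")
    if content.get("encoding") == "base64":
        return base64.b64decode(text)
    return text.encode("utf-8")


# Example info dict shaped like what a script could return from .info
sample_info = {
    "content": {
        "text": base64.b64encode(b"<html></html>").decode("ascii"),
        "encoding": "base64",
    }
}
print(content_bytes(sample_info))
```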
Splash no longer supports Qt-based disk cache; it was disabled by default
and its usage was discouraged since Splash 1.0, in Splash 2.0
command-line option is removed. For HTTP cache there are better options like
Another backwards-incompatible change is that Splash-as-a-proxy feature is removed. Please use regular HTTP API instead of this proxy interface. Of course, Splash will still support using proxies to make requests, these are two different features.
- POST requests support: http_method and body arguments for render endpoints; new splash:go arguments: formdata; new splash:http_post method.
- Errors are now returned in JSON format; error messages became more detailed; Splash UI now displays detailed error information.
- new splash:call_later method which allows to schedule tasks in future;
- new splash:on_response method allows to register a callback to be executed after each response;
- proxy can now be set directly, without using proxy profiles - there is a new proxy argument for render endpoints;
- more detailed splash:go errors: a new “render_error” error type can be reported;
- new splash:set_result_status_code method;
- new splash.resource_timeout attribute as a shortcut for
- new splash:get_version method;
- new splash:autoload_reset, splash:on_response_reset,
splash:har_reset methods and a new
reset=true argument for splash:har. They are most useful with Splash-Jupyter.
Bug fixes and improvements:
- fixed an issue: proxies were not applied for POST requests;
- improved argument validation for various methods;
- more detailed logs;
- it is now possible to load a compatibility shim for window.localStorage;
- code coverage integration;
- improved Splash-Jupyter tests;
- Splash-Jupyter is upgraded to Jupyter 4.0.
- render.jpeg endpoint and splash:jpeg function allow to take screenshots in JPEG format;
- splash:on_response_headers Lua function and allowed_content_types / forbidden_content_types HTTP arguments allow to discard responses earlier based on their headers;
- splash.images_enabled attribute to enable/disable images from Lua scripts;
- splash.js_enabled attribute to enable/disable JS processing from Lua scripts;
- splash:set_result_header method for setting custom HTTP headers returned to Splash clients;
- resource_timeout argument for setting network request timeouts in render endpoints;
- request:set_timeout(timeout) method (see splash:on_request) for setting request timeouts from Lua scripts;
- SOCKS5 proxy support: new ‘type’ argument
in proxy profile config files
and the request:set_proxy method (see splash:on_request)
- enabled HTTPS proxying;
- HTTP error detection is improved;
- MS fonts are added to the Docker image for better rendering quality;
- Chinese fonts are added to the Docker image to enable rendering of Chinese websites;
- validation of
wait arguments is improved;
- documentation: grammar is fixed in the tutorial;
- assorted documentation improvements and code cleanups;
- splash:set_images_enabled method is deprecated.
The main new feature in Splash 1.6 is splash:on_request function which allows to process individual outgoing requests: log, abort, change them.
- a new _gc endpoint which allows to clear QWebKit caches;
- Docker images are updated with more recent package versions;
- HTTP arguments validation is improved;
- serving Splash UI under HTTPS is fixed.
- documentation improvements and typo fixes.
In this release we introduce Splash-Jupyter - a web-based IDE for Splash Lua scripts with syntax highlighting, autocompletion and a connected live browser window. It is implemented as a kernel for Jupyter (IPython).
Docker images for Splash 1.5 are optimized - download size is much smaller than in previous releases.
- splash:go() returned incorrect result after an unsuccessful splash:go() call - this is fixed;
- main function can now return multiple results;
- there are testing improvements and internal cleanups.
This release provides faster and more robust screenshot rendering, many improvements in Splash scripting engine and other improvements like better cookie handling.
From version 1.4 Splash requires Pillow (built with PNG support) to work.
There are backwards-incompatible changes in Splash scripts:
- splash:set_viewport() is split into splash:set_viewport_size() and splash:set_viewport_full();
- old splash:runjs() method is renamed to splash:evaljs();
To upgrade check all splash:runjs() usages: if the returned result is used then replace splash:runjs() with splash:evaljs().
viewport=full argument is deprecated; use
New scripting features:
- it is now possible to write custom Lua plugins stored server-side;
- a restricted version of Lua
require is enabled in sandbox;
- splash:autoload() method for setting JS to load on each request;
- splash:wait_for_resume() method for interacting with async JS code;
- splash:lock_navigation() and splash:unlock_navigation() methods;
- splash:set_viewport() is split into splash:set_viewport_size() and splash:set_viewport_full();
- splash:get_viewport_size() method;
- splash:http_get() method for sending HTTP GET requests without loading result to the browser;
- splash:set_content() method for setting page content from a string;
- splash:get_cookies(), splash:add_cookie(), splash:clear_cookies(), splash:delete_cookies() and splash:init_cookies() methods for working with cookies;
- splash:set_user_agent() method for setting User-Agent header;
- splash:set_custom_headers() method for setting other HTTP headers;
- splash:url() method for getting current URL;
- splash:go() now accepts
- splash:runjs() method no longer returns the result of last computation;
- splash:get_perf_stats() method for getting Splash resource usage.
- --max-timeout option can be passed to Splash at startup to increase or decrease maximum allowed timeout value;
- cookies are no longer shared between requests;
- PNG rendering becomes more efficient: less CPU is spent on compression. The downside is that the returned PNG images become 10-15% larger;
- there is an option (scale_method=vector) to resize images while painting to avoid pixel-based resize step - it can make taking a screenshot much faster on image-light webpages (up to several times faster);
- when ‘height’ is set and image is downscaled the rendering is more efficient because Splash now avoids rendering unnecessary parts;
- /debug endpoint tracks more objects;
- testing setup improvements;
- application/json POST requests handle invalid JSON better;
- undocumented splash:go_and_wait() and splash:_wait_restart_on_redirects() methods are removed (they are moved to tests);
- Lua sandbox is cleaned up;
- long log messages from Lua are truncated in logs;
- more detailed error info is logged;
- example script in Splash UI is simplified;
- stress tests now include PNG rendering benchmark.
- default viewport size and window geometry are now set to 1024x768; this fixes PNG screenshots with viewport=full;
- PNG rendering is fixed for huge viewports;
- splash:go() argument validation is improved;
- timer is properly deleted when an exception is raised in an errback;
- redirects handling for baseurl requests is fixed;
- reply is deleted only once when baseurl is used.
This release fixes packaging issues with Splash 1.3.
This release introduces an experimental scripting support.
- manhole is disabled by default in Debian package;
- more objects are tracked in /debug endpoint;
- “history” in render.json now includes “queryString” keys; it makes the output compatible with HAR entry format;
- logging improvements;
- improved timer cancellation.
- Dockerfile base image is downgraded to Ubuntu 12.04 to fix random crashes;
- Debian/buildbot config is fixed to make Splash UI available when deployed from deb;
- Qt / PyQt / sip / WebKit / Twisted version numbers are logged at startup.
- All Splash rendering endpoints now accept Content-Type: application/json POST requests with JSON-encoded rendering options as an alternative to using GET parameters;
- headers parameter allows to set HTTP headers (including user-agent) for all endpoints - previously it was possible only in proxy mode;
- js_source parameter allows to execute JS in page context without
- testing suite is switched to pytest, test running can now be parallelized;
- viewport size changes are logged;
- /debug endpoint provides leak info for more classes;
- Content-Type header parsing is less strict;
- documentation improvements;
- various internal code cleanups.
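The JSON POST alternative to GET parameters described above can be sketched with the standard library alone. The Splash address, the helper name `render_request` and the sample option values are assumptions for illustration; the request is only constructed, not sent.

```python
import json
from urllib.request import Request


def render_request(endpoint, options, splash_url="http://localhost:8050"):
    """Build a POST request carrying JSON-encoded rendering options.

    Assumption: splash_url points at a (hypothetical) local Splash instance.
    """
    data = json.dumps(options).encode("utf-8")
    return Request(
        f"{splash_url}/{endpoint}",
        data=data,
        headers={"Content-Type": "application/json"},
        method="POST",
    )


req = render_request("render.html", {
    "url": "http://example.com",
    "wait": 0.5,
    # headers option, here setting a custom User-Agent
    "headers": {"User-Agent": "my-crawler"},
})
print(req.get_method(), req.full_url)
```

Passing `req` to `urllib.request.urlopen` would send it to a running Splash server.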
- A UI is added - it allows to quickly check Splash features.
- Splash can now return requests/responses information in HAR format. See render.har endpoint and har argument of render.json endpoint. A simpler history argument is also available. With HAR support it is possible to get timings for various events, HTTP status code of the responses, HTTP headers, redirect chains, etc.
- Processing of related resources is stopped earlier and more robustly in case of timeouts.
- wait parameter changed its meaning: waiting now restarts after each redirect.
- Dockerfile is improved: image is updated to Ubuntu 14.04; logs are shown immediately; it becomes possible to pass additional options to Splash and customize proxy/js/filter profiles; adblock filters are supported in Docker; versions of Python dependencies are pinned; Splash is started directly (without supervisord).
- Splash now tries to start Xvfb automatically - no need for xvfb-run.
This feature requires
xvfbwrapper Python package to be installed.
- Debian package improvements: Xvfb viewport matches default Splash viewport, it is possible to change Splash option using SPLASH_OPTS environment variable.
- Documentation is improved: finally, there are some install instructions.
- Logging: verbosity level of several logging events is changed; data-uris are truncated in logs.
- Various cleanups and testing improvements.