Linux + Docker¶
Pull the image:
$ sudo docker pull scrapinghub/splash
Start the container:
$ sudo docker run -p 5023:5023 -p 8050:8050 -p 8051:8051 scrapinghub/splash
Splash is now available at 0.0.0.0 at ports 8050 (http), 8051 (https) and 5023 (telnet).
OS X + Docker¶
Pull the image:
$ docker pull scrapinghub/splash
Start the container:
$ docker run -p 5023:5023 -p 8050:8050 -p 8051:8051 scrapinghub/splash
Figure out the ip address of boot2docker:
$ boot2docker ip
The VM's Host only interface IP address is: 192.168.59.103
Splash is available at the returned IP address at ports 8050 (http), 8051 (https) and 5023 (telnet).
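One way to confirm that the server is reachable is to request a page over the HTTP port. The sketch below is only an illustration: `splash_url` and `splash_status` are hypothetical helpers, and it assumes `curl` is installed and that Splash answers plain HTTP requests on port 8050.

```shell
#!/bin/sh
# Hypothetical helper: build the base URL of a Splash instance.
# $1 = host (e.g. the address printed by `boot2docker ip`), $2 = port (default 8050).
splash_url() {
    printf 'http://%s:%s/\n' "$1" "${2:-8050}"
}

# Hypothetical helper: print the HTTP status code returned by the server.
# curl exits non-zero if nothing answers within 5 seconds.
splash_status() {
    curl --silent --output /dev/null --max-time 5 \
         --write-out '%{http_code}\n' "$(splash_url "$1" "$2")"
}

# Example (not run here): splash_status 192.168.59.103 8050
```

On Linux the host would be `localhost` instead of the boot2docker address.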
Ubuntu 12.04 (manual way)¶
Install system dependencies:
$ sudo add-apt-repository -y ppa:pi-rho/security
$ sudo apt-get update
$ sudo apt-get install libre2-dev
$ sudo apt-get install netbase ca-certificates liblua5.2-dev \
      python python-dev build-essential libicu48 \
      xvfb libqt4-webkit python-twisted python-qt4
TODO: install Python dependencies using pip, clone repo, chdir to it, start splash.
To run the server execute the following command:
python -m splash.server
Run python -m splash.server --help to see the available options.
By default, Splash API endpoints listen on port 8050 on all available
IPv4 addresses. To change the port use
python -m splash.server --port=5000
# install PyQt4 (Splash is tested on PyQT 4.9.x and PyQT 4.11.x)
# and the following packages:
twisted
qt4reactor
psutil
adblockparser >= 0.4
re2 >= 0.2.21
xvfbwrapper
Pillow

# for scripting support
lupa >= 1.1
funcparserlib >= 0.3.6

# the following libraries are only required by tests
pytest
pyOpenSSL
requests >= 1.0
jsonschema >= 2.0
strict-rfc3339
docker pull scrapinghub/splash will give you the latest stable Splash
release. To obtain the latest development version use
docker pull scrapinghub/splash:master. Specific Splash versions
are also available, e.g.
docker pull scrapinghub/splash:1.5.
Customizing Dockerized Splash¶
Passing Custom Options¶
To run Splash with custom options pass them to docker run, after the image name.
For example, let’s increase log verbosity:
$ docker run -p 8050:8050 scrapinghub/splash -v3
To see all possible options pass
--help. Not all options will work the
same inside Docker: changing ports doesn’t make sense (use docker run options
instead), and paths are paths in the container.
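The split between the two kinds of options can be sketched as follows. `splash_run_cmd` is a hypothetical helper that only prints a command line; it is meant to show that Docker options (such as port publishing) go before the image name and Splash options go after it, not to replace typing the command directly.

```shell
#!/bin/sh
# Hypothetical helper: print a `docker run` command line with Docker options
# placed before the image name and Splash options placed after it.
# $1 = Docker options, $2 = image, $3 = Splash options.
splash_run_cmd() {
    printf 'docker run %s %s %s\n' "$1" "$2" "$3"
}

# Publish Splash's HTTP port on host port 5000 and raise log verbosity.
splash_run_cmd '-p 5000:8050' 'scrapinghub/splash' '-v3'
# prints: docker run -p 5000:8050 scrapinghub/splash -v3
```

Here the host port is changed with Docker's -p option, while Splash itself keeps listening on 8050 inside the container.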
To set custom Request Filters use the Docker -v option. First, create a folder with request filters on your local filesystem, then make it available to the container:
$ docker run -p 8050:8050 -v <my-filters-dir>:/etc/splash/filters scrapinghub/splash
Replace <my-filters-dir> with the path to your local folder with request filters.
Docker Data Volume Containers can also be used. Check https://docs.docker.com/userguide/dockervolumes/ for more info.
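Docker expects the host side of a -v bind mount to be an absolute path. As a minimal sketch, the hypothetical helper `splash_filters_cmd` below resolves a local folder to its absolute path and prints the matching command; it does not start a container itself.

```shell
#!/bin/sh
# Hypothetical helper: print a `docker run` command that mounts a local
# request-filters folder into the container.
# $1 = local folder; it is resolved to an absolute path first.
# Returns non-zero if the folder does not exist.
splash_filters_cmd() {
    filters_dir=$(cd "$1" && pwd) || return 1
    printf 'docker run -p 8050:8050 -v %s:/etc/splash/filters scrapinghub/splash\n' \
        "$filters_dir"
}

# Example (not run here): splash_filters_cmd ./filters
```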
Proxy profiles and JavaScript profiles can be mounted the same way:
$ docker run -p 8050:8050 \
      -v <my-proxy-profiles-dir>:/etc/splash/proxy-profiles \
      -v <my-js-profiles-dir>:/etc/splash/js-profiles \
      scrapinghub/splash
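To set up custom Lua modules, mount a folder to /etc/splash/lua_modules and list the modules in --lua-sandbox-allowed-modules. This is a Splash option, so it goes after the image name:
$ docker run -p 8050:8050 \
      -v <my-lua-modules-dir>:/etc/splash/lua_modules \
      scrapinghub/splash \
      --lua-sandbox-allowed-modules 'module1;module2'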
Folder sharing (the -v option) doesn’t work on OS X and Windows.
It should be fixed in future Docker & Boot2Docker releases.
For now use one of the workarounds mentioned in issue comments
or clone Splash repo and customize its Dockerfile.
Splash in Production¶
In production you may want to daemonize Splash, start it on boot and restart
on failures. Since Docker 1.2 an easy way to do this is to use the --restart and
-d options together; another way to do that is to use standard tools
like upstart, systemd or supervisor.
Note that the --restart option won’t work without -d.
Please also take into account the memory usage: Splash uses an unbounded
in-memory cache and so it will eventually consume all RAM. A workaround is
to restart the process when it uses too much memory; there is Splash
--maxrss option for that. You can also add Docker's --memory option
to the mix.
In production it is a good idea to pin the Splash version: instead of
scrapinghub/splash it is usually better to use something like
scrapinghub/splash:1.6.
The final command for starting a long-running Splash server which uses up to 4GB RAM and daemonizes & restarts itself could look like this:
$ docker run -d -p 8050:8050 --memory=4.5G --restart=always scrapinghub/splash:1.6 --maxrss 4000
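The relationship between the two memory limits can be sketched with a small script. `splash_prod_cmd` is a hypothetical helper that only prints a command; the 500 MB gap between --maxrss and --memory is an assumption that roughly mirrors the 4000 / 4.5G pair in the command above, presumably so that Splash restarts itself before Docker's limit is reached.

```shell
#!/bin/sh
# Hypothetical helper: print a production `docker run` command.
# $1 = Splash --maxrss limit in MB, $2 = Splash image tag.
# Assumption: the container limit is set 500 MB above --maxrss.
splash_prod_cmd() {
    maxrss_mb=$1
    tag=$2
    memory_mb=$((maxrss_mb + 500))
    printf 'docker run -d -p 8050:8050 --memory=%sm --restart=always scrapinghub/splash:%s --maxrss %s\n' \
        "$memory_mb" "$tag" "$maxrss_mb"
}

splash_prod_cmd 4000 1.6
# prints: docker run -d -p 8050:8050 --memory=4500m --restart=always scrapinghub/splash:1.6 --maxrss 4000
```

The right amount of headroom depends on the workload; treat the figure here as a starting point rather than a recommendation.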
Building Local Docker Images¶
To build your own Docker image, check out the Splash source code using git, then execute the following command from the Splash source root:
$ docker build -t my-local-splash .
To build Splash-Jupyter Docker image use this command:
$ docker build -t my-local-splash-jupyter -f dockerfiles/splash-jupyter/Dockerfile .
You may have to change the FROM line in dockerfiles/splash-jupyter/Dockerfile
if you want it to be based on your local Splash Docker image.
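Changing the base image can also be scripted instead of edited by hand. The sketch below is one possible approach, not part of Splash: `with_base_image` is a hypothetical helper, and it assumes the Dockerfile contains a single FROM line and that the image name contains no `|` or `&` characters.

```shell
#!/bin/sh
# Hypothetical helper: print a Dockerfile with its FROM line pointed at
# another base image. $1 = path to the Dockerfile, $2 = new base image.
# The original file is left untouched; the result goes to stdout.
with_base_image() {
    sed "s|^FROM .*|FROM $2|" "$1"
}

# Example (not run here), using the image name from the build command above:
#   with_base_image dockerfiles/splash-jupyter/Dockerfile my-local-splash > Dockerfile.local
#   docker build -t my-local-splash-jupyter -f Dockerfile.local .
```

Writing the result to a separate file keeps the checked-out repository clean, at the cost of having to regenerate it when the upstream Dockerfile changes.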