Liveweb Proxy Configuration

The liveweb-proxy can be configured using various command-line options and/or a config file.

Config file can be specified as:

$ liveweb-proxy -c liveweb.ini

or:

$ liveweb-proxy --config liveweb.ini

This section describes the available config settings. For each config setting, there is a command line option with the same name.

For example, config setting archive-format is available as command line argument –archive-format.

The config file is specified in INI format. Here is a sample config file.

[liveweb]

archive-format = arc

output-directory = /tmp/records

dns-timeout = 2s

Archive Settings

archive-format

Specifies the archive format. Should be one if arc or warc.

The default value is arc.

Warning

As of now only arc is supported.

output-directory

Output directory to write ARC/WRC files. Default value is “records”.

filename-pattern

The pattern of the filename specified as Python string formatting template. The default value is live-%(timestamp)s-%(serial)05d.arc.gz.

Available substitutions are timestamp, serial, pid, fqdn (fully qualified domain name) and port.

filesize-limit

The limit on the size of file. If a file crosses this size, it will be closed a new file will be created to write new records.

num-writers

The number of concurrent writers.

The default value is 1.

Cache Settings

cache

Type of cache to use. Available options are redis, sqlite and none.

The default value is none.

redis-host

redis-port

redis-db

Redis host, port and db number. Used only when cache=redis.

redis-expire-time

Expire time to set in redis. Used only when cache=redis.

The default value is 1h (1 hour).

redis-max-record-size

Maximum allowed size of a record that can be cached. Used only when cache=redis.

The default value is 100KB.

sqlite-db

Path to the sqlite database to use. This option is valid only when cache=sqlite.

The default value is liveweb.db.

Timeouts and Resource Limits

default-timeout

This is the default timeout value for connect-timeout, initial-data-timeout and read-timeout.

The default value is 10s.

dns-timeout

Specifies the max amount of time can a DNS resolution can take.

Python doesn’t support a way to specify DNS timeout. On Linux, the dns timeout can be specified via the RES_OPTIONS environment variable. This enviroment variable is set at the startup of the application based on this config setting.

If unspecified, the DNS timeout is decided by the system default behavior.

See resolv.conf man page for more details.

connect-timeout

Specifies the connect timeout in seconds. Connections that take longer to establish will be aborted.

initial-data-timeout

Specifies the maximum time allowed before receiving initial data (HTTP headers) from the remote server.

read-timeout

Specifies the read timeout in seconds. This indicates the idle time. If no data is received for more than this time, the request will fail.

max-request-time

Specifies the total amout of time a HTTP request can take. If it takes more than this, the current request will fail.

The default value is 2m.

max-response-size

Specifies the maximum allowed size of response.

The default value is 100MB.

Other Settings

user-agent

Specifies the value of the User-Agent request header.

The default value is ia_archiver(OS-Wayback).

http-passthrough

This is a boolean parameter, setting it to true will make it work like a http proxy with archiving. Useful for testing and recording personal browsing.

Project Versions

Table Of Contents

Previous topic

Development Setup

Next topic

Error Codes

This Page