**es-public-proxy**: simple read-only HTTP reverse-proxy for exposing an
Elasticsearch node to the public internet

* type-safe de-serialization and re-serialization of all user data
* single-binary, easy to install
* simple configuration with sane defaults
* low overhead in network latency and compute resources
* optional CORS headers for direct browser requests
* SSL, transport compression, load-balancing, observability, and rate-limiting
  are left to other tools like nginx, caddy, or HAProxy
* free software forever: AGPLv3+ license

The Elasticsearch REST API is powerful, well documented, and has client library
implementations for many programming languages. For datasets and services which
contain only public information, it would be convenient to provide direct
access to at least a subset of the API for anybody to take advantage of. The
Elasticsearch maintainers warn against this behavior, on the basis that the API
is not designed for public use. Recent versions of Elasticsearch have an
authentication/authorization subsystem, and there are third-party plugins for
read-only access (such as [ReadonlyREST](https://readonlyrest.com/)), but these
solutions require careful configuration and knowledge of which endpoints are
"safe" for users. Elasticsearch accepts request bodies on `GET` requests, and
one proposed solution is to filter to only `GET` requests using a reverse proxy
like nginx. However, some safe endpoints (such as deleting scroll objects)
require other HTTP verbs, and most browsers do not support `GET` bodies, so
this is only a partial hack.

`es-public-proxy` is intended to be a simple and reliable alternative for the
use case of exposing popular search queries on specific indices to the public
web. HTTP requests are parsed and filtered in a safe, compiled language (Rust),
then only safe queries are re-serialized and forwarded to the backend search
instance listening on a different port.
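
For example, a simple full-text search request through the proxy might look
like the following. This is a hypothetical request: the index name `papers`
and the listening port 9292 are illustrative, and it assumes the proxy
configuration permits `_search` on that index.

    curl -s 'http://localhost:9292/papers/_search' \
        -H 'Content-Type: application/json' \
        -d '{"query": {"match": {"title": "open access"}}, "size": 10}'

The proxy parses the JSON body, checks that the index and query type are
allowed, then re-serializes the request and forwards it to the backend.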

Note that clients can, of course, still submit "expensive" queries of various
kinds which will slow down the host. Some of these can be disabled in the
elasticsearch configuration (though this would disable those queries for all
connections, not just those via the proxy). Some query types are simply not
supported by this proxy. In the future the proxy could gain configuration
parameters and smarter parsing of some query types (like `query_string`) to
prevent even more classes of expensive queries.


## Installation

On Debian/Ubuntu Linux systems, the easiest way to get started is to download
and install an unsigned `.deb` from
<https://archive.org/download/es-public-proxy-deb>. This will include a
manpage, configuration file, and systemd unit file. After installing, edit the
configuration file (`/etc/es-public-proxy.toml`) and start the service like:

    sudo systemctl start es-public-proxy
    sudo systemctl enable es-public-proxy

On other platforms you can install and run on a per-user basis using the rust
toolchain with:

    cargo install es-public-proxy
    es-public-proxy --example-config > example.toml

    # edit the configuration file

    es-public-proxy --config example.toml

There is also a Dockerfile, but it isn't actively used and hasn't been pushed
to any image repository, in part because it is unclear how best to inject
configuration into a docker image. You can build the image with:

    docker build -f extra/Dockerfile .


## Configuration

In all cases you will want to explicitly enumerate all of the indices which
should have public access. There is an `unsafe_all_indices` option intended
for prototyping, but it may allow access to additional non-index API
endpoints.
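
As a sketch, a minimal configuration file might look like the following. The
key names shown here (other than `unsafe_all_indices`, mentioned above) are
illustrative assumptions, not the actual schema; run
`es-public-proxy --example-config` to see the real options:

    # hypothetical sketch; see `es-public-proxy --example-config` for real keys
    bind_addr = "127.0.0.1:9292"
    upstream_addr = "127.0.0.1:9200"

    # explicitly enumerate the indices which should be publicly readable
    allow_indices = ["papers", "datasets"]

    # prototyping only; may expose additional non-index API endpoints
    #unsafe_all_indices = true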

One simple deployment pattern is to run `nginx`, `es-public-proxy`, and
`elasticsearch` all on the same server. In this configuration, `nginx` would
listen on all network interfaces on ports 80 and 443, handle SSL termination
and redirects from port 80 to 443, and add transport compression, client
request body size limits, etc. `es-public-proxy` would listen on localhost
port 9292 and connect back to elasticsearch on localhost port 9200.
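
The corresponding nginx server blocks for this pattern might look roughly like
the following sketch (the hostname and certificate paths are placeholders):

    server {
        listen 443 ssl;
        server_name search.example.org;
        ssl_certificate     /etc/letsencrypt/live/search.example.org/fullchain.pem;
        ssl_certificate_key /etc/letsencrypt/live/search.example.org/privkey.pem;

        # transport compression and body limits live here, not in the proxy
        gzip on;
        client_max_body_size 64k;

        location / {
            proxy_pass http://127.0.0.1:9292;
        }
    }

    server {
        # redirect plain HTTP to HTTPS
        listen 80;
        server_name search.example.org;
        return 301 https://$host$request_uri;
    }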


## Limitations

Not all of the elasticsearch API has been implemented yet. In general, this
service is likely to be stricter than elasticsearch itself in parsing and
corner cases. For example:

* URL query parameters like `?human` must be expanded into an explicit boolean
  like `?human=true` (see the example after this list)
* in some cases where elasticsearch allows shortening a full object to a plain
  string, this proxy requires the full object format
* index patterns in configuration are not supported
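
For example, the bare flag form of a query parameter is rejected, while the
explicit boolean form is forwarded (the index name `papers` is illustrative):

    # accepted by elasticsearch directly, but rejected by the proxy:
    curl 'http://localhost:9292/papers/_search?human'

    # accepted by both:
    curl 'http://localhost:9292/papers/_search?human=true'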


## Development

To build this package you need the Rust toolchain installed. We target stable
Rust, 2018 edition, version 1.49+.

Re-compiling the manpage requires [scdoc](https://git.sr.ht/~sircmpwn/scdoc).

Building a Debian package (`.deb`) requires the `cargo-deb` plugin, which you
can install with: `cargo install cargo-deb`

A Makefile is included to wrap common development commands, for example:

    make test
    make lint
    make deb

Contributions are welcome! We would prefer to keep the number of dependency
crates low (eg, we don't currently use a CLI argument parsing library), but
are open to discussion. When sending patches or merge requests, it is helpful
(but not required) if you include test coverage, re-run `cargo fmt`, and
acknowledge the license terms ahead of time.

The Minimum Supported Rust Version (MSRV) is 1.49.