**es-public-proxy**: simple HTTP reverse-proxy for exposing an Elasticsearch
node to the public internet

* type-safe de-serialization and re-serialization of all user data
* single-binary, easy to install
* simple configuration with sane defaults
* low overhead in network latency and compute resources
* optional CORS headers for direct browser requests
* SSL, transport compression, load-balancing, observability, and rate-limiting
  are left to other tools like nginx, Caddy, or HAProxy
* free software forever: AGPLv3+ license

The Elasticsearch REST API is powerful, well documented, and has client library
implementations for many programming languages. For datasets and services which
contain only public information, it would be convenient to provide direct
access to at least a subset of the API for anybody to take advantage of. The
Elasticsearch maintainers warn against this practice, on the basis that the API
is not designed for public use.

Recent versions of Elasticsearch have an authentication/authorization
subsystem, and there are third-party plugins for read-only access (such as
[ReadonlyREST](https://readonlyrest.com/)), but these solutions require careful
configuration and knowledge of which endpoints are "safe" for users.
Elasticsearch accepts request bodies on `GET` requests, and one proposed
solution is to allow only `GET` requests through a reverse proxy like nginx.
However, some safe endpoints (such as deleting scroll objects) require other
HTTP verbs, and most browsers do not support `GET` bodies, so this is only a
partial workaround.

`es-public-proxy` is intended to be a simple and reliable alternative for the
use case of exposing popular search queries on specific indices to the public
web. HTTP requests are parsed and filtered in a safe, compiled language (Rust),
then only safe queries are re-serialized and forwarded to the backend search
instance listening on a different port.
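
The filtering step described above can be sketched roughly as follows. This is a minimal illustration using only the Rust standard library, not the proxy's actual implementation: the `Method` enum, the `is_allowed` function, and the two routes shown are hypothetical names chosen for this example, and the sketch leaves out the parsing and re-serialization of request bodies.

```rust
/// HTTP methods this sketch distinguishes (hypothetical; not the proxy's real types).
#[derive(Debug, Clone, Copy, PartialEq)]
enum Method {
    Get,
    Post,
    Delete,
    Other,
}

/// Decide whether a request may be forwarded to the backend.
/// `allowed_indices` stands in for the indices enumerated in the configuration.
fn is_allowed(method: Method, path: &str, allowed_indices: &[&str]) -> bool {
    // Drop any query string, then split "/index/_search" into non-empty segments.
    let path = path.split('?').next().unwrap_or("");
    let segments: Vec<&str> = path.split('/').filter(|s| !s.is_empty()).collect();

    match (method, segments.as_slice()) {
        // Deleting a scroll object is safe, but it is not a GET request.
        (Method::Delete, ["_search", "scroll"]) => true,
        // Searches are only allowed against explicitly listed indices.
        (Method::Get, [index, "_search"]) | (Method::Post, [index, "_search"]) => {
            allowed_indices.contains(index)
        }
        // Anything not explicitly recognized is rejected.
        _ => false,
    }
}
```

With an allow-list of `["my-index"]`, this sketch accepts `GET /my-index/_search` and `DELETE /_search/scroll`, and rejects `GET /other/_search`. A filter that only lets `GET` through could not express the second case, which is why rejecting by default and allowing specific method and path combinations is a better fit.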

Note that clients can still submit "expensive" queries of various kinds which
will slow down the host. Some of these can be disabled in the Elasticsearch
configuration (this disables those queries for all connections, not just those
made via the proxy). Some query types are simply not supported by this proxy.
In the future the proxy could gain configuration parameters and smarter parsing
of some query types (like `query_string`) to try to prevent even more expensive
queries.
## Installation
On Debian/Ubuntu Linux systems, the easiest way to get started is to download
and install an unsigned `.deb` from
<https://archive.org/download/es-public-proxy-deb>. The package includes a
manpage, configuration file, and systemd unit file. After installing, edit the
configuration file (`/etc/es-public-proxy.toml`), then start and enable the
service:

    sudo systemctl start es-public-proxy
    sudo systemctl enable es-public-proxy

On other platforms you can install and run on a per-user basis using the Rust
toolchain:

    cargo install es-public-proxy
    es-public-proxy --example-config > example.toml
    # edit the configuration file
    es-public-proxy --config example.toml

There is also a Dockerfile, but it isn't actively used and hasn't been pushed
to any image repository. For example, it is not yet clear how best to inject
configuration into a Docker image. You can build the image with:

    docker build -f extra/Dockerfile .

## Configuration
In all cases you will want to explicitly enumerate all of the indices that
should have public access. There is an `unsafe_all_indices` option intended for
prototyping, but this may allow access to additional non-index API endpoints.

One simple deployment pattern is to put `nginx`, `es-public-proxy`, and
`elasticsearch` all on the same server. In this configuration, `nginx` would
listen on all network interfaces on ports 80 and 443, handle SSL upgrade
redirects from 80 to 443, add transport compression, restrict client body
payload size, etc. `es-public-proxy` would listen on localhost port 9292, and
connect back to Elasticsearch on localhost port 9200.

## Limitations
Not all of the Elasticsearch API has been implemented yet. In general, this
service is likely to be stricter than Elasticsearch in parsing and in corner
cases. For example:

* URL query parameters like `?human` must be expanded into a boolean like `?human=true`
* In some cases where Elasticsearch allows abbreviating a full object to a
  string, this proxy requires the full object format
* Index patterns in configuration are not supported
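
The first limitation above, strict boolean query parameters, can be illustrated with a short sketch. The function name `strict_bool_param` and its error messages are assumptions made for this example and are not taken from the proxy's source; the sketch only shows what "must be expanded into a boolean" could mean when parsing a query string by hand.

```rust
/// Look up a boolean URL query parameter, accepting only explicit values.
/// Returns Ok(None) when the key is absent, and Err when the key is present
/// without an explicit `true` or `false` (for example a bare `?human`).
fn strict_bool_param(query: &str, key: &str) -> Result<Option<bool>, String> {
    for pair in query.split('&').filter(|p| !p.is_empty()) {
        // Split "name=value" at the first '='; a bare "name" has no value part.
        let mut parts = pair.splitn(2, '=');
        let name = parts.next().unwrap_or("");
        if name != key {
            continue;
        }
        return match parts.next() {
            Some("true") => Ok(Some(true)),
            Some("false") => Ok(Some(false)),
            Some(other) => Err(format!("invalid value for '{}': '{}'", key, other)),
            None => Err(format!("parameter '{}' needs an explicit value, like '{}=true'", key, key)),
        };
    }
    Ok(None)
}
```

In this sketch, the query string `human=true` yields `Ok(Some(true))`, a bare `human` yields an error, and a query string that does not mention `human` yields `Ok(None)`.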
## Development
To build this package you need the Rust toolchain installed. We target stable
Rust, 2018 edition, version 1.45+.

Re-compiling the manpage requires [scdoc](https://git.sr.ht/~sircmpwn/scdoc).

Building a Debian package (`.deb`) requires the `cargo-deb` plugin, which you
can install with: `cargo install cargo-deb`

A Makefile is included to wrap common development commands, for example:

    make test
    make lint
    make deb

Contributions are welcome! We would prefer to keep the number of dependent
crates low (e.g., the project doesn't currently use a CLI argument parsing
library), but are open to discussion. When sending patches or merge requests,
it is helpful (but not required) if you can include test coverage, re-run
`cargo fmt`, and acknowledge the license terms ahead of time.