Diffstat (limited to 'README.md')
-rw-r--r-- | README.md | 158 |
1 files changed, 116 insertions, 42 deletions
@@ -1,43 +1,117 @@
-**es-public-proxy**: Elasticsearch API proxy intended to be exposed to the
-public internet (or any non-localhost clients) for safe read-only queries
-
-This is intended as a simple alternative to other "read-only" plugins or
-authentication solutions for elasticsearch. A benefit of keeping the
-elasticsearch API itself, instead of building an application-layer wrapper, is
-that there already exist client libraries, tools, and integrations in many
-languages.
-
-Plan:
-
-- single Rust executable
-- fast and simple enough to never impact performance or latency
-- TOML configuration
-- some modern async/await framework
-- use official elasticsearch crate? or just reqwest?
-- small subset of total public API: get, search, scroll
-- per-index permissions
-- return response bodies untouched
-- parse queries with serde JSON, then re-serialize
-
-Stretch or future goals:
-
-- parsing Lucene `query_string`
-- provide an alternate simpler API
-- query caching
-- index aliases and routing
-- version mapping (eg, expose 7.x API for 6.x index)
-
-Non-features:
-
-- TLS (use a general purpose reverse proxy)
-
-## Deployment
-
-The imagined use case is that you have elasticsearch proper listening only to
-localhost connections with plain HTTP. This makes administration easy from
-authenticated local UNIX users. No non-localhost connections to elasticsearch
-are allowed, even from trusted clients. This daemon runs as a small sidecar
-proxy on localhost, listening on a public port. All non-localhost clients
-direct queries through the proxy, which parses the query, ensures it is "safe",
-then passes through to backend.
+**es-public-proxy**: simple HTTP reverse-proxy for exposing an Elasticsearch
+node to the public internet
+
+* type-safe de-serialization and re-serialization of all user data
+* single-binary, easy to install
+* simple configuration with sane defaults
+* low-overhead in network latency and compute resources
+* optional CORS headers for direct browser requests
+* SSL, transport compression, load-balancing, observability, and rate-limiting
+  are left to other tools like nginx, caddy, or HAProxy
+* free software forever: AGPLv3+ license
+
+The Elasticsearch REST API is powerful, well documented, and has client library
+implementations for many programming languages. For datasets and services which
+contain only public information, it would be convenient to provide direct
+access to at least a subset of the API for anybody to take advantage of. The
+Elasticsearch maintainers warn against this behavior, on the basis that the API
+is not designed for public use. Recent versions of Elasticsearch have an
+authentication/authorization subsystem, and there are third-party plugins for
+read-only access (such as [ReadonlyREST](https://readonlyrest.com/)), but these
+solutions require careful configuration and knowledge of which endpoints are
+"safe" for users. Elasticsearch accepts request bodies on `GET` requests, and
+one proposed solution is to filter to only `GET` requests using a reverse proxy
+like nginx. However, some safe endpoints (such as deleting scroll objects)
+require other HTTP verbs, and most browsers do not support `GET` bodies, so
+this is only a partial hack.
+
+`es-public-proxy` is intended to be a simple and reliable alternative for the
+use case of exposing popular search queries on specific indices to the public
+web. HTTP requests are parsed and filtered in a safe, compiled language (Rust),
+then only safe queries are re-serialized and forwarded to the backend search
+instance listening on a different port.
+
+Note that of course clients can still submit "expensive" queries of various
+kinds which will slow down the host. Some of these can be disabled in
+the elasticsearch configuration (this would disable those queries for all
+connections, not just via the proxy). Some query types are simply not supported
+by this proxy. In the future the proxy could gain configuration parameters and
+smarter parsing of some query types (like `query_string`) to try and prevent
+even more expensive queries.
+
+
+## Installation
+
+On Debian/Ubuntu Linux systems, the easiest way to get started is to download
+and install an unsigned `.deb` from
+<https://archive.org/download/es-public-proxy-deb>. This will include a
+manpage, configuration file, and systemd unit file. After installing, edit the
+configuration file (`/etc/es-public-proxy.toml`) and start the service like:
+
+    sudo systemctl start es-public-proxy
+    sudo systemctl enable es-public-proxy
+
+On other platforms you can install and run on a per-user basis using the rust
+toolchain with:
+
+    cargo install es-public-proxy
+    es-public-proxy --example-config > example.toml
+
+    # edit the configuration file
+
+    es-public-proxy --config example.toml
+
+There is also a Dockerfile, but it isn't actively used and hasn't been pushed
+to any image repository. Eg, unsure how best to inject configuration into a
+docker image. You can build the image with:
+
+    docker build -f extra/Dockerfile .
+
+
+## Configuration
+
+In all cases you will want to explicitly enumerate all of the indices to have
+public access. There is an `unsafe_all_indices` option intended for prototyping,
+but this may allow access to additional non-index API endpoints.
+
+One simple deployment pattern is to put `nginx`, `es-public-proxy`, and
+`elasticsearch` all on the same server. In this configuration, `nginx` would
+listen on all network interfaces on ports 80 and 443, and handle SSL upgrade
+redirects from 80 to 443, as well as add transport compression, restrict client
+body payload limits, etc. `es-public-proxy` would listen on localhost port
+9292, and connect back to elasticsearch on localhost port 9200.
+
+
+## Limitations
+
+Not all of the elasticsearch API has been implemented yet. In general, this
+service is likely to be more strict in parsing and corner-cases. For example:
+
+* URL query parameters like `?human` must be expanded into a boolean like `?human=true`
+* Some cases where elasticsearch will allow short-cutting a full object into a
+  string, this proxy requires the full object format
+* index patterns in configuration are not supported
+
+
+## Development
+
+To build this package you need the rust toolchain installed. We target stable
+Rust, 2018 edition, version 1.45+.
+
+Re-compiling the manpage requires [scdoc](https://git.sr.ht/~sircmpwn/scdoc).
+
+Building a Debian package (`.deb`) requires the `cargo-deb` plugin, which you
+can install with: `cargo install cargo-deb`
+
+A Makefile is included to wrap common development commands, for example:
+
+    make test
+    make lint
+    make deb
+
+Contributions are welcome! Would prefer to keep the number of dependent crates
+low (eg, don't currently use a CLI argument parsing library), but open to
+discussion. When sending patches or merge requests, it is helpful (but not
+required) if you can include test coverage, re-run `cargo fmt`, and acknowledge
+the license terms ahead of time.
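
Note on the Limitations section of the new README: since bare flags like `?human` must be written as `?human=true`, a client may want to normalize its query string before sending a request through the proxy. The following is a minimal client-side sketch using only the Rust standard library. The function name `expand_bare_flags` is hypothetical and not part of es-public-proxy, and the sketch assumes every parameter written without `=` is a boolean flag and that the input is already percent-encoded.

```rust
/// Expand bare query flags (e.g. `human`) into explicit booleans (`human=true`).
///
/// Hypothetical client-side helper, not part of es-public-proxy.
/// Assumption: any parameter without `=` is a boolean flag.
/// `query` is the part of the URL after `?`, without the leading `?`.
pub fn expand_bare_flags(query: &str) -> String {
    query
        .split('&')
        // Skip empty segments such as those produced by "a&&b" or "".
        .filter(|part| !part.is_empty())
        .map(|part| {
            if part.contains('=') {
                // Already in key=value form: pass through unchanged.
                part.to_string()
            } else {
                // Bare flag: make the boolean value explicit.
                format!("{}=true", part)
            }
        })
        .collect::<Vec<_>>()
        .join("&")
}
```

For instance, a client could call `expand_bare_flags("human&size=10")` and append the result after `?` when building the request URL. Parameters that already carry a value are left untouched, so the helper can be applied to any query string without checking it first.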