author     Bryan Newbold <bnewbold@archive.org>   2020-08-26 20:22:04 -0700
committer  Bryan Newbold <bnewbold@archive.org>   2020-08-26 20:22:04 -0700
commit     ce3cf99f4de059bb204c1a5846df3be6a3fe5f0c (patch)
tree       1659250a38fa833b99b52f889a8727fbe56821f5 /README.md
parent     972a23c9c9561828af0824d50e031b99e3027c7d (diff)
update README
Diffstat (limited to 'README.md')
-rw-r--r--  README.md  158
1 file changed, 116 insertions(+), 42 deletions(-)
diff --git a/README.md b/README.md
index 5920b9c..620d039 100644
--- a/README.md
+++ b/README.md
@@ -1,43 +1,117 @@
-**es-public-proxy**: Elasticsearch API proxy intended to be exposed to the
-public internet (or any non-localhost clients) for safe read-only queries
-
-This is intended as a simple alternative to other "read-only" plugins or
-authentication solutions for elasticsearch. A benefit of keeping the
-elasticsearch API itself, instead of building an application-layer wrapper, is
-that there already exist client libraries, tools, and integrations in many
-languages.
-
-Plan:
-
-- single Rust executable
-- fast and simple enough to never impact performance or latency
-- TOML configuration
-- some modern async/await framework
-- use official elasticsearch crate? or just reqwest?
-- small subset of total public API: get, search, scroll
-- per-index permissions
-- return response bodies untouched
-- parse queries with serde JSON, then re-serialize
-
-Stretch or future goals:
-
-- parsing Lucene `query_string`
-- provide an alternate simpler API
-- query caching
-- index aliases and routing
-- version mapping (eg, expose 7.x API for 6.x index)
-
-Non-features:
-
-- TLS (use a general purpose reverse proxy)
-
-## Deployment
-
-The imagined use case is that you have elasticsearch proper listening only to
-localhost connections with plain HTTP. This makes administration easy from
-authenticated local UNIX users. No non-localhost connections to elasticsearch
-are allowed, even from trusted clients. This daemon runs as a small sidecar
-proxy on localhost, listening on a public port. All non-localhost clients
-direct queries through the proxy, which parses the query, ensures it is "safe",
-then passes through to backend.
+**es-public-proxy**: simple HTTP reverse-proxy for exposing an Elasticsearch
+node to the public internet
+
+* type-safe de-serialization and re-serialization of all user data
+* single-binary, easy to install
+* simple configuration with sane defaults
+* low overhead in network latency and compute resources
+* optional CORS headers for direct browser requests
+* SSL, transport compression, load-balancing, observability, and rate-limiting
+  are left to other tools like nginx, caddy, or HAProxy
+* free software forever: AGPLv3+ license
+
+The Elasticsearch REST API is powerful, well documented, and has client library
+implementations for many programming languages. For datasets and services which
+contain only public information, it would be convenient to provide direct
+access to at least a subset of the API for anybody to take advantage of. The
+Elasticsearch maintainers warn against this behavior, on the basis that the API
+is not designed for public use. Recent versions of Elasticsearch have an
+authentication/authorization subsystem, and there are third-party plugins for
+read-only access (such as [ReadonlyREST](https://readonlyrest.com/)), but these
+solutions require careful configuration and knowledge of which endpoints are
+"safe" for users. Elasticsearch accepts request bodies on `GET` requests, and
+one proposed solution is to filter to only `GET` requests using a reverse
+proxy like nginx (sketched below). However, some safe endpoints (such as
+deleting scroll objects) require other HTTP verbs, and most browsers do not
+support `GET` bodies, so this is only a partial workaround.
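+
+For illustration, such a `GET`-only filter might look like the following in
+nginx (a sketch of the approach being described, not a recommendation):
+
+    location / {
+        # reject every HTTP verb except GET (and, implicitly, HEAD)
+        limit_except GET {
+            deny all;
+        }
+        proxy_pass http://127.0.0.1:9200;
+    }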
+
+`es-public-proxy` is intended to be a simple and reliable alternative for the
+use case of exposing popular search queries on specific indices to the public
+web. HTTP requests are parsed and filtered in a safe, compiled language (Rust),
+then only safe queries are re-serialized and forwarded to the backend search
+instance listening on a different port.
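+
+For example, a simple search request through the proxy might look like the
+following, assuming the proxy listens on port 9292 (as in the deployment
+example below) and a hypothetical index named `my-index` has been enabled:
+
+    curl -s http://localhost:9292/my-index/_search \
+        -H 'Content-Type: application/json' \
+        -d '{"query": {"match": {"title": "open access"}}}'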
+
+Note that clients can still submit "expensive" queries of various kinds which
+will slow down the host. Some of these can be disabled in the elasticsearch
+configuration (though this disables those queries for all connections, not
+just those via the proxy). Some query types are simply not supported by this
+proxy. In the future the proxy could gain configuration parameters and
+smarter parsing of some query types (like `query_string`) to try and prevent
+even more expensive queries.
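+
+As one example, recent elasticsearch releases (7.7 and later) have a setting
+that rejects script, wildcard, and other expensive query types, configurable
+in `elasticsearch.yml`:
+
+    # applies to every connection, not just those arriving via the proxy
+    search.allow_expensive_queries: false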
+
+
+## Installation
+
+On Debian/Ubuntu Linux systems, the easiest way to get started is to download
+and install an unsigned `.deb` from
+<https://archive.org/download/es-public-proxy-deb>. This will include a
+manpage, configuration file, and systemd unit file. After installing, edit the
+configuration file (`/etc/es-public-proxy.toml`) and start the service like:
+
+ sudo systemctl start es-public-proxy
+ sudo systemctl enable es-public-proxy
+
+On other platforms you can install and run on a per-user basis using the Rust
+toolchain:
+
+ cargo install es-public-proxy
+ es-public-proxy --example-config > example.toml
+
+ # edit the configuration file
+
+ es-public-proxy --config example.toml
+
+There is also a Dockerfile, but it isn't actively used and hasn't been pushed
+to any image repository; for example, it is not yet clear how best to inject
+configuration into a docker image. You can build the image with:
+
+ docker build -f extra/Dockerfile .
+
+
+## Configuration
+
+In all cases you will want to explicitly enumerate all of the indices to have
+public access. There is an `unsafe_all_indices` option intended for
+prototyping, but it may allow access to additional non-index API endpoints.
+A minimal configuration might look like the sketch below.
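+
+The key names in this sketch are illustrative assumptions, not the real
+schema; run `es-public-proxy --example-config` for the actual annotated
+defaults:
+
+    # hypothetical key names, for illustration only
+    bind_addr = "localhost:9292"
+    backend_url = "http://localhost:9200"
+
+    # explicitly enumerate the indices exposed to the public
+    enable_indices = ["my-index", "other-index"]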
+
+One simple deployment pattern is to put `nginx`, `es-public-proxy`, and
+`elasticsearch` all on the same server. In this configuration, `nginx` would
+listen on all network interfaces on ports 80 and 443, and handle SSL upgrade
+redirects from 80 to 443, as well as add transport compression, restrict client
+body payload limits, etc. `es-public-proxy` would listen on localhost port
+9292, and connect back to elasticsearch on localhost port 9200.
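+
+A rough nginx server block for this pattern might look like the following
+sketch (assuming TLS certificates are already configured; most details
+omitted):
+
+    server {
+        listen 443 ssl;
+        server_name search.example.org;
+
+        # forward public traffic to es-public-proxy, never directly to
+        # elasticsearch on port 9200
+        location / {
+            proxy_pass http://127.0.0.1:9292;
+        }
+    }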
+
+
+## Limitations
+
+Not all of the elasticsearch API has been implemented yet. In general, this
+service is likely to be more strict in parsing and corner-cases. For example:
+
+* URL query parameters like `?human` must be expanded into a boolean like `?human=true`
+* In some cases where elasticsearch allows short-cutting a full object into a
+  string, this proxy requires the full object format (see the example below)
+* Index patterns in configuration are not supported
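+
+For example, elasticsearch accepts both a string shorthand and a full object
+form for a `match` query; a strict parser like this proxy may accept only the
+latter (whether this particular shorthand is rejected is an illustrative
+assumption):
+
+    # shorthand form, accepted by elasticsearch:
+    {"query": {"match": {"title": "hello"}}}
+
+    # full object form, accepted by both elasticsearch and the proxy:
+    {"query": {"match": {"title": {"query": "hello"}}}}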
+
+
+## Development
+
+To build this package you need the Rust toolchain installed. We target stable
+Rust, 2018 edition, version 1.45+.
+
+Re-compiling the manpage requires [scdoc](https://git.sr.ht/~sircmpwn/scdoc).
+
+Building a Debian package (`.deb`) requires the `cargo-deb` plugin, which you
+can install with: `cargo install cargo-deb`
+
+A Makefile is included to wrap common development commands, for example:
+
+ make test
+ make lint
+ make deb
+
+Contributions are welcome! We would prefer to keep the number of dependent
+crates low (eg, we don't currently use a CLI argument parsing library), but
+are open to discussion. When sending patches or merge requests, it is helpful
+(but not required) if you include test coverage, re-run `cargo fmt`, and
+acknowledge the license terms ahead of time.