author | Bryan Newbold <bnewbold@robocracy.org> | 2021-07-27 19:54:11 -0700 |
---|---|---|
committer | Bryan Newbold <bnewbold@robocracy.org> | 2021-07-27 19:54:11 -0700 |
commit | bfdbbdd50ab06d28a2099e408ff154b0ce1cbc4b (patch) | |
tree | 95e09e1cc9917d24ebfe59d67fecc5e76f40e3ec | |
parent | ed56037d929d50abab707ee5eb9f583789a8ac7a (diff) | |
start CHANGELOG for refs work
-rw-r--r-- | CHANGELOG.md | 5 |
-rw-r--r-- | guide/src/SUMMARY.md | 2 |
-rw-r--r-- | guide/src/reference_graph.md | 9 |
-rw-r--r-- | guide/src/search_api.md | 29 |
4 files changed, 45 insertions, 0 deletions
diff --git a/CHANGELOG.md b/CHANGELOG.md
index ffa4a8b3..3b171fa5 100644
--- a/CHANGELOG.md
+++ b/CHANGELOG.md
@@ -16,6 +16,11 @@ See also:
 
 ## Unreleased
 
+### Added
+
+- reference graph views, based on fuzzy reference dataset in `cgraph` and
+  `fatcat-scholar` projects, stored in elasticsearch index
+
 ### Fixed
 
 - viewing deleted release entities no longer result in 500 error
diff --git a/guide/src/SUMMARY.md b/guide/src/SUMMARY.md
index ffc80ac2..c7d12cb0 100644
--- a/guide/src/SUMMARY.md
+++ b/guide/src/SUMMARY.md
@@ -8,6 +8,7 @@
  - [Goals and Related Projects](./goals.md)
  - [Data Model](./data_model.md)
  - [Editing Workflow](./workflow.md)
+ - [Reference Graph](./reference_graph.md)
  - [Sources of Metadata](./sources.md)
  - [Implementation and Infrastructure](./implementation.md)
  - [Roadmap](./roadmap.md)
@@ -21,6 +22,7 @@
  - [Release](./entity_release.md)
  - [Work](./entity_work.md)
  - [Public API](./http_api.md)
+ - [Search API](./search_api.md)
  - [Bulk Exports](./bulk_exports.md)
  - [Cookbook](./cookbook.md)
  - [Contributing](./contributing.md)
diff --git a/guide/src/reference_graph.md b/guide/src/reference_graph.md
new file mode 100644
index 00000000..3b773150
--- /dev/null
+++ b/guide/src/reference_graph.md
@@ -0,0 +1,9 @@
+
+# Reference Graph
+
+As a new feature, fuzzy-matched references are available on an "inbound" and
+"outbound" basis in the web interface.
+
+The backend reference graph is available via the [Search API](./search_api.md)
+under the `fatcat_ref` index.
+
diff --git a/guide/src/search_api.md b/guide/src/search_api.md
new file mode 100644
index 00000000..91b7c8e9
--- /dev/null
+++ b/guide/src/search_api.md
@@ -0,0 +1,29 @@
+
+# Search API
+
+The Elasticsearch indices used to power metadata search, statistics, and graphs
+on the fatcat web interface are exposed publicly at
+`https://search.fatcat.wiki`. Third parties can make queries using the
+Elasticsearch API, which is well documented online and has client libraries in
+many programming languages.
+
+A thin proxy (`es-public-proxy`) filters requests to avoid expensive queries
+which could cause problems for search queries on the web interface, but most of
+the Elasticsearch API is supported, including powerful aggregation queries.
+
+There is a short delay between updates to the fatcat catalog (via the main API)
+and updates to the search index.
+
+Notable indices include:
+
+- `fatcat_release`: release entity metadata
+- `fatcat_container`: container entity metadata
+- `fatcat_ref`: reference graph
+
+Schemas for these indices can be fetched directly from the index (eg,
+`https://search.fatcat.wiki/fatcat_release/_mapping`), and are versioned in the
+fatcat git repository under `fatcat:extra/eleasticsearch/`. They are a
+simplification and transform of the regular entity schemas, and include some
+synthesized fields (such as "preservation status" for releases). Note that the
+search schemas are likely to change over time with less notice and stability
+guarantees than the primary catalog API schema.
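
For readers who want to try the Search API described in the new `search_api.md`, the following is a minimal sketch rather than an official client. It uses only the Python standard library and assumes the public proxy accepts the standard Elasticsearch `_search` endpoint with a JSON `query_string` body sent via POST; that is ordinary Elasticsearch usage, but this commit does not confirm it for `es-public-proxy`. The helper names (`mapping_url`, `build_search_request`, `run_search`) are hypothetical and exist only for illustration.

```python
import json
import urllib.request

# Base URL taken from the guide text in this commit.
SEARCH_BASE = "https://search.fatcat.wiki"


def mapping_url(index: str) -> str:
    """Return the URL that serves the schema (mapping) of one index."""
    return f"{SEARCH_BASE}/{index}/_mapping"


def build_search_request(index: str, query: str, size: int = 3) -> urllib.request.Request:
    """Prepare, but do not send, a query_string search against one index.

    Assumption: the proxy allows POST to /<index>/_search with a JSON body.
    """
    body = {"query": {"query_string": {"query": query}}, "size": size}
    return urllib.request.Request(
        f"{SEARCH_BASE}/{index}/_search",
        data=json.dumps(body).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def run_search(index: str, query: str, size: int = 3) -> list:
    """Send the search and return the raw hits (this performs a network call)."""
    request = build_search_request(index, query, size)
    with urllib.request.urlopen(request, timeout=30) as response:
        # Standard Elasticsearch responses nest results under hits.hits.
        return json.load(response)["hits"]["hits"]
```

Calling `run_search("fatcat_release", "some title words")` would send the query to the live service. Because the guide warns that search schemas may change with little notice, it seems prudent to fetch `mapping_url("fatcat_ref")` first and check field names before relying on them.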