
              __       _            _   
             / _| __ _| |_ ___ __ _| |_ 
            | |_ / _` | __/ __/ _` | __|
            |  _| (_| | || (_| (_| | |_ 
            |_|  \__,_|\__\___\__,_|\__|

       perpetual access to the scholarly record


[![pipeline status](https://gitlab.com/bnewbold/fatcat/badges/master/pipeline.svg)](https://gitlab.com/bnewbold/fatcat/commits/master)
[![coverage report](https://gitlab.com/bnewbold/fatcat/badges/master/coverage.svg)](https://gitlab.com/bnewbold/fatcat/commits/master)

This repository contains source code for 'fatcat', an editable catalog of
published written works (mostly journal articles), with a focus on tracking
the location and status of full-text copies to ensure "perpetual access".

The [RFC](./fatcat-rfc.md) is the original design document, and the best place
to start for technical background. There is a work-in-progress "guide" at
<https://guide.fatcat.wiki>; the canonical public location of this repository
is <https://github.com/internetarchive/fatcat>.

The public production web interface is <https://fatcat.wiki>.

See the `LICENSE` file for detailed permissions and licensing of both the
Python and Rust code. In short, the auto-generated client libraries are
permissively released, while the API server and web interface are strong
copyleft (AGPLv3).

## Building and Tests

There are three main components:

- backend API server and database (in Rust)
- API client libraries and bots (in Python)
- front-end web interface (in Python; built on API and library)

Automated integration tests run under GitLab CI (see `.gitlab-ci.yml`) on the
Internet Archive's internal (not public) infrastructure.

See `./python/README.md` and `./rust/README.md` for details on building,
running, and testing these components.

The Python client library, which is automatically generated from the API
schema, lives under `./python_client/`.

## Status

- SQL and HTTP API schemas
    - [x] Basic entities
    - [x] one-to-many and many-to-many entities
    - [x] JSON(B) "extra" metadata fields
    - [x] full rev1 schema for all entities
    - [x] file sets and web captures
    - [x] editgroup review: annotations
- HTTP API Server
    - [x] base32 encoding of UUID identifiers
    - [x] inverse many-to-many helpers (files-by-release, release-by-creator)
    - [x] Authentication (e.g., accounts, OAuth2, JWT)
    - [x] Authorization (aka roles)
- Web Interface
    - [x] Migrate Python codebase
    - [ ] Creation and editing of all entities
- Other
    - [x] Elasticsearch schema
    - [x] Basic logging
    - [x] Swagger-UI
    - [x] Bulk metadata exports
    - [x] Sentry (error reporting)
    - [x] Metrics
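
As an illustration of the "base32 encoding of UUID identifiers" item in the
list above, here is a minimal Python sketch of how a 128-bit UUID can be
rendered as a 26-character base32 string and decoded again. The function names
`uuid_to_ident` and `ident_to_uuid` are hypothetical, and the exact scheme
(RFC 4648 alphabet, lowercased, padding stripped) is an assumption made for
this example; consult the Rust and Python sources for the actual behavior.

```python
import base64
import uuid

IDENT_LEN = 26  # ceil(128 bits / 5 bits per base32 character)


def uuid_to_ident(value: str) -> str:
    """Encode a UUID string as a lowercase, unpadded base32 identifier."""
    raw = uuid.UUID(value).bytes  # 16 raw bytes
    # b32encode pads 16 bytes out to 32 characters; drop the 6 "=" pad chars
    return base64.b32encode(raw)[:IDENT_LEN].decode("ascii").lower()


def ident_to_uuid(ident: str) -> str:
    """Decode an identifier produced by uuid_to_ident back to a UUID string."""
    if len(ident) != IDENT_LEN:
        raise ValueError("expected a 26-character identifier")
    # restore the padding that was stripped during encoding
    raw = base64.b32decode(ident.upper() + "======")
    return str(uuid.UUID(bytes=raw))


if __name__ == "__main__":
    original = str(uuid.uuid4())
    ident = uuid_to_ident(original)
    assert ident_to_uuid(ident) == original
    print(original, "->", ident)
```

Compared with the 36-character hyphenated hex form, the base32 form is shorter
and case-insensitive, which tends to suit identifiers that appear in URLs.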