## Top-Level

- type: _doc
- key: keyword
- key_type: keyword (work or page)
- `work_id`
- biblio: obj
- fulltext: obj
- sim: obj
- abstracts: nested
    - body
    - lang
- releases: nested (TBD)
- access
- tags: array of keywords

TODO:
- summary fields to index "everything" into?
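As a rough illustration, the top-level fields above could be written as an Elasticsearch-style mapping. This is only a sketch: the `keyword` type for `work_id`, the `text` type for the abstract `body`, and the name `WORK_MAPPING` are assumptions not stated in this proposal, and the sub-fields of `biblio`, `fulltext`, `sim`, and `access` are left out.

```python
import json


def keyword():
    """Return a fresh keyword field definition."""
    return {"type": "keyword"}


# Hypothetical mapping for the top-level work document.
WORK_MAPPING = {
    "properties": {
        "key": keyword(),
        "key_type": keyword(),       # "work" or "page"
        "work_id": keyword(),        # assumption: type not given in the proposal
        "biblio": {"type": "object"},
        "fulltext": {"type": "object"},
        "sim": {"type": "object"},
        "abstracts": {
            "type": "nested",
            "properties": {
                "body": {"type": "text"},   # assumption: analyzed text
                "lang": keyword(),          # assumption: language code as keyword
            },
        },
        "releases": {"type": "nested"},     # TBD in the proposal
        "access": {"type": "object"},       # may become nested later
        "tags": keyword(),                  # arrays need no dedicated type
    }
}

if __name__ == "__main__":
    print(json.dumps(WORK_MAPPING, indent=2))
```

Elasticsearch has no dedicated array type, so `tags` is declared as a plain keyword field and can hold several values.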

## Biblio

Mostly matches existing `fatcat_release` schema.

- `release_id`
- `release_revision`
- `title`
- `subtitle`
- `original_title`
- `release_date`
- `release_year`
- `withdrawn_status`
- `language`
- `country_code`
- `volume` (etc)
- `volume_int` (etc)
- `first_page`
- `first_page_int`
- `pages`
- `doi` etc
- `number` (etc)

NEW:
- `preservation_status`

[etc]

- `license_slug`
- `publisher` (etc)
- `container_name` (etc)
- `container_id`
- `container_issnl`
- `container_issn` (array)
- `contrib_names`
- `affiliations`
- `creator_ids`
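
The proposal does not say how the `_int` variants (`volume_int`, `first_page_int`) are derived. One possible approach, sketched here under the assumption that they hold the leading integer of the string field, is the following; `leading_int` is a hypothetical helper name.

```python
import re
from typing import Optional


def leading_int(value: Optional[str]) -> Optional[int]:
    """Parse the leading run of digits from a string field.

    Returns None when the value is missing or does not start with digits
    (for example an article number such as "e1234").
    """
    if not value:
        return None
    match = re.match(r"\s*(\d+)", value)
    return int(match.group(1)) if match else None


# Hypothetical biblio fragment, for illustration only.
biblio = {"volume": "12", "first_page": "123"}
biblio["volume_int"] = leading_int(biblio["volume"])
biblio["first_page_int"] = leading_int(biblio["first_page"])
```

Keeping the original string alongside the integer lets non-numeric values still be displayed, while the `_int` field supports sorting and range queries.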

## Fulltext

- `status`: web, sim, shadow
- `body`
- `lang`
- `file_mimetype`
- `file_sha1`
- `file_id`
- `thumbnail_url`

## Abstracts

Nested object with:

- body
- lang

For prototyping, perhaps just make it an object with `body` as an array.

Only index one abstract per language.
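
A minimal sketch of the one-abstract-per-language rule follows. Which abstract to keep when a language has several is not specified here; this sketch assumes the first one encountered wins, and `one_abstract_per_lang` is a hypothetical name.

```python
from typing import Dict, List, Optional


def one_abstract_per_lang(
    abstracts: List[Dict[str, Optional[str]]],
) -> List[Dict[str, Optional[str]]]:
    """Keep only the first abstract seen for each language.

    Abstracts with no "lang" value are grouped under None, so at most
    one of them is kept as well.
    """
    kept: Dict[Optional[str], Dict[str, Optional[str]]] = {}
    for abstract in abstracts:
        lang = abstract.get("lang")
        if lang not in kept:
            kept[lang] = abstract
    # dicts preserve insertion order, so input order is retained
    return list(kept.values())


abstracts = [
    {"body": "First English abstract", "lang": "en"},
    {"body": "Résumé en français", "lang": "fr"},
    {"body": "Second English abstract", "lang": "en"},
]
deduped = one_abstract_per_lang(abstracts)
```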

## SIM (Microfilm)

Enough details to construct a link or do a lookup. Note that we might be doing
CDL status lookups on SERPs.

Also pass through archive.org metadata here (collection-level and item-level).

## Access

Start with a plain object; maybe switch to nested later?

- `status`: direct, cdl, repository, publisher, loginwall, paywall, etc
- `mimetype`
- `access_url`
- `file_url`
- `file_id`
- `release_id`