1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
37
38
39
40
41
42
43
44
45
46
47
48
49
50
51
52
53
54
55
56
57
58
59
60
61
62
63
64
65
66
67
68
69
70
71
72
73
74
75
76
77
78
79
80
81
82
83
84
85
86
87
88
89
|
Status: implemented
# Release External ID Refactor
Goal is to make the external identifier "namespace" (number of external
identifiers supported) less scarce, while still allowing fast lookups. Adding
new identifiers would still require API schema and rust code changes.
This change would also bring the number of columns on `release_rev` back under
32, which makes diesel happier.
Unclear whether existing extids (like DOI) should get merged and the old
columns dropped. This could be done consistently by, eg, a rust worker process
that re-writes tables, probably in multiple stages (eg, first copy, then update
code to read from new location, then drop old columns and indices). Perhaps for
very hot/popular columns (like DOI, and maybe ISBN13?) it is better to have
separate columns/indices.
It would be possible to roll this out as a SQL change first, with identifiers
still at the top level of the API schema, then later switch the API schema. Not
sure this is worth it though.
## New API
All identifiers as text
release_entity
ext_ids (required)
doi
pmid
pmcid
wikidata_qid
core
isbn13
arxiv
jstor
[...]
## New SQL Schema
Something like:
release_rev_extid (
release_rev_id UUID foreign key to release_rev.id
extid_type TEXT
extid_value TEXT
)
PRIMARY KEY (release_rev_id, extid_type)
INDEX (extid_type, extid_value)
### Existing prod Column Use
Queries like:
fatcat_prod=# SELECT COUNT(*) FROM release_rev WHERE wikidata_qid IS NOT NULL;
13460723
Results:
wikidata_qid: 13460723
isbn13: 1
core_id: 7160477
arxiv_id: 3
jstor_id: 0
pmcid: 3688945
Keep in SQL:
- `doi`
- `pmid`
- `pmcid`
- `wikidata_qid`
- `core_id`
Drop columns:
- `isbn13`
- `arxiv_id`
- `jstor_id`
In new table:
- isbn13
- arxiv
- jstor
- mag
- ark
- dblp
|