'extra' fields:

    doaj
        as_of: datetime of most recent check; if not set, not actually in DOAJ
        seal: bool
        work_level: bool (are work-level publications deposited with DOAJ?)
        archiving: array, can include 'library' or 'other'
    road
        as_of: datetime of most recent check; if not set, not actually in ROAD
    pubmed (TODO: delete?)
        as_of: datetime of most recent check; if not set, not actually indexed in PubMed
    norwegian (TODO: drop this?)
        as_of: datetime of most recent check; if not set, not actually in the Norwegian register
        id (integer)
        level (integer; 0-2)
    kbart
        lockss
            year_rle
            volume_rle
        portico
            ...
        clockss
            ...
    sherpa_romeo
        color
    jstor
        year_rle
        volume_rle
    scopus
        id
        TODO: print/electronic distinction?
    wos
        id
    doi
        crossref_doi: DOI of the title in Crossref (if it exists)
        prefixes: array of strings (DOI prefixes, up to the '/'; any registrar, not just Crossref)
    ia
        sim
            nap_id
            year_rle
            volume_rle
        longtail: boolean
        homepage
            as_of: datetime of last attempt
            url
            status: HTTP/heritrix status of homepage crawl

    issnp: string
    issne: string
    coden: string
    abbrev: string
    oclc_id: string (TODO: lookup?)
    lccn_id: string (TODO: lookup?)
    dblb_id: string
    default_license: slug
    original_name: native name (if name is translated)
    platform: hosting platform: OJS, WordPress, SciELO, etc
    mimetypes: array of strings (eg, 'application/pdf', 'text/html')
    first_year: year (integer)
    last_year: year (integer); only set if publishing has stopped
    primary_language: single ISO code, or 'mixed'
    languages: array of ISO codes
    region: TODO: continent/world-region
    nation: shortcode of nation
    discipline: TODO: highest-level subject; "life science", "humanities", etc
    field: TODO: narrower description of field
    subjects: TODO?
    url: homepage
    is_oa: boolean. If true, can assume all releases under this container are "Open Access"
    TODO: domains, if exclusive?
    TODO: fulltext_regex, if a known pattern?
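
To make the nesting concrete, here is a minimal sketch of what a populated 'extra' object might look like as a Python dict. Every value is invented for illustration (the ISSN, DOI, URL, and dates are placeholders, not real records), the ISO 8601 datetime strings are an assumption since the notes do not fix a datetime format, and `doi_prefix` is a hypothetical helper that shows the "up to the '/'" rule for `prefixes`.

```python
import json


def doi_prefix(doi: str) -> str:
    """Return the part of a DOI before the first '/' (hypothetical helper)."""
    return doi.split("/", 1)[0]


# Placeholder title-level DOI, not a real record.
title_doi = "10.1234/example-journal"

example_extra = {
    "doaj": {
        "as_of": "2019-01-01T00:00:00Z",  # assumed datetime format
        "seal": False,
        "work_level": True,
        "archiving": ["library"],
    },
    "kbart": {
        "lockss": {
            "year_rle": [[2004, 2010], 2012],
            "volume_rle": [[1, 7], 9],
        },
    },
    "doi": {
        "crossref_doi": title_doi,
        "prefixes": [doi_prefix(title_doi)],
    },
    "issnp": "1234-5678",  # placeholder ISSN
    "platform": "OJS",
    "mimetypes": ["application/pdf", "text/html"],
    "first_year": 2004,
    "primary_language": "en",
    "languages": ["en"],
    "url": "https://journal.example.org/",
    "is_oa": True,
}

# The structure uses only JSON-compatible types, so it serializes directly.
print(json.dumps(example_extra, indent=2, sort_keys=True))
```

Sub-objects such as `road` or `pubmed` are simply left out here, which matches the rule that a missing `as_of` means the title is not in that index.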
For KBART, etc:
    We "over-count" on the assumption that works with "in-progress" status will soon actually be preserved.
    Year and volume spans (year_rle, volume_rle) are run-length-encoded arrays built from integers:
      - a bare integer means that year (or volume) is preserved
      - an array of length 2 means everything between the two numbers (inclusive) is preserved
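
The run-length encoding described above can be sketched as a small decoder. This is an illustration under the two stated rules only; the function names `expand_rle` and `is_preserved` are hypothetical and do not come from the original notes.

```python
from typing import List, Sequence, Union

# One entry of a span list: a single year/volume, or an inclusive [start, end] pair.
RleEntry = Union[int, Sequence[int]]


def expand_rle(spans: Sequence[RleEntry]) -> List[int]:
    """Expand a run-length-encoded span list into individual integers."""
    values: List[int] = []
    for entry in spans:
        if isinstance(entry, int):
            # A bare integer: that single year (or volume) is preserved.
            values.append(entry)
        elif len(entry) == 2:
            # A pair: everything between the two numbers, inclusive.
            start, end = entry
            values.extend(range(start, end + 1))
        else:
            raise ValueError(f"unexpected RLE entry: {entry!r}")
    return values


def is_preserved(spans: Sequence[RleEntry], value: int) -> bool:
    """Check whether a year or volume falls inside any encoded span."""
    for entry in spans:
        if isinstance(entry, int):
            if entry == value:
                return True
        elif len(entry) == 2 and entry[0] <= value <= entry[1]:
            return True
    return False


year_rle = [1995, [2001, 2003], 2010]
print(expand_rle(year_rle))
print(is_preserved(year_rle, 2002), is_preserved(year_rle, 2000))
```

Checking membership directly against the spans avoids expanding long ranges, which matters little for years but keeps the check cheap for large volume ranges.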