Looking at the Microsoft Academic Graph schema: https://docs.microsoft.com/en-us/academic-services/graph/reference-data-schema
My take-aways from this are:
- should allow storing raw affiliations today in release_contrib rows, and
someday have a foreign key to an institution there
- maybe should have an "original_title" field for releases? Though it could go
in 'extra' (along with subtitle)
- have a well-known 'extra' key for saving citation context in references
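A minimal sketch of what these take-aways could look like as data, assuming hypothetical Python dataclasses. The class names, the `institution_id` placeholder, and the `citation_context` key are illustrative only and not an existing fatcat API.

```python
from dataclasses import dataclass, field
from typing import Any, Dict, List, Optional


@dataclass
class ReleaseContrib:
    # Hypothetical contributor row: the raw affiliation string is stored
    # today; institution_id is a placeholder for a future foreign key.
    raw_name: str
    raw_affiliation: Optional[str] = None
    institution_id: Optional[str] = None


@dataclass
class ReleaseRef:
    # Hypothetical reference row; citation context lives under a
    # well-known 'extra' key rather than a dedicated column.
    target_title: Optional[str] = None
    extra: Dict[str, Any] = field(default_factory=dict)


@dataclass
class Release:
    title: str
    contribs: List[ReleaseContrib] = field(default_factory=list)
    refs: List[ReleaseRef] = field(default_factory=list)
    # In this sketch 'original_title' and 'subtitle' go in 'extra'.
    extra: Dict[str, Any] = field(default_factory=dict)


CITATION_CONTEXT_KEY = "citation_context"  # hypothetical well-known key


def add_citation_context(ref: ReleaseRef, context: str) -> None:
    """Append a snippet of text surrounding the citation to the reference."""
    ref.extra.setdefault(CITATION_CONTEXT_KEY, []).append(context)


release = Release(
    title="An Example Paper",
    contribs=[ReleaseContrib(raw_name="A. Author", raw_affiliation="Example University")],
    refs=[ReleaseRef(target_title="Some Cited Work")],
    extra={"original_title": "Ein Beispielpapier", "subtitle": "A Case Study"},
)
add_citation_context(release.refs[0], "as shown previously [1], the effect holds")
```

Keeping `institution_id` nullable means rows can be created now with only the raw string and linked to an institution entity later without a schema change.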
## Data Model (high-level)
Includes rich affiliations (at the per-paper level) and "field of study"
tagging.

No work/release distinction.

There are URLs, but no file-level metadata.

Full abstracts are not stored, for legal reasons.
## Details (lower-level)
Across many entities, there are both "normalized" and "display" names.

Some stats are pre-aggregated: paper and citation counts.
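A rough sketch of the two ideas above: deriving a normalized lookup key from a display name, and precomputing per-entity counts. The normalization rules here (strip accents, casefold, drop punctuation, collapse whitespace) are an assumption for illustration, not MAG's documented procedure.

```python
import re
import unicodedata
from collections import Counter
from typing import Iterable, Tuple


def normalize_name(display_name: str) -> str:
    """Derive a lookup key from a display name (assumed rules, not MAG's)."""
    # Decompose accented characters and drop the combining marks.
    decomposed = unicodedata.normalize("NFKD", display_name)
    no_accents = "".join(c for c in decomposed if not unicodedata.combining(c))
    # Casefold, replace punctuation with spaces, then collapse whitespace.
    cleaned = re.sub(r"[^\w\s]", " ", no_accents.casefold())
    return " ".join(cleaned.split())


def paper_counts(authorships: Iterable[Tuple[str, str]]) -> Counter:
    """Precompute per-author paper counts from (paper_id, author_id) rows."""
    # De-duplicate rows so a repeated authorship is counted only once.
    return Counter(author_id for _, author_id in set(authorships))


print(normalize_name("Université de Montréal"))
```

Storing the normalized form alongside the display form lets matching happen on the key while the display name is shown to users unchanged.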
#### Affiliations

Institution names: "normalized" vs. "display".

Includes a "GRID" ID?

What is the WikiPage field? A Wikipedia URL?
#### Authors
Stores each author's "last known" affiliation.
#### Field of Study
Fields of study form a nested hierarchy.
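A minimal sketch of walking such a hierarchy, assuming it is stored as (parent, child) edge rows and that a field may have more than one parent. The edge format and the example field names are made up for illustration.

```python
from collections import defaultdict
from typing import Dict, Iterable, List, Set, Tuple


def build_parent_index(edges: Iterable[Tuple[str, str]]) -> Dict[str, Set[str]]:
    """Map each child field to its set of direct parents."""
    parents: Dict[str, Set[str]] = defaultdict(set)
    for parent, child in edges:
        parents[child].add(parent)
    return parents


def ancestors(field_id: str, parents: Dict[str, Set[str]]) -> Set[str]:
    """Return every field above field_id, following all parent links."""
    seen: Set[str] = set()
    stack: List[str] = [field_id]
    while stack:
        current = stack.pop()
        for parent in parents.get(current, ()):
            if parent not in seen:
                seen.add(parent)
                stack.append(parent)
    return seen


edges = [
    ("computer science", "machine learning"),
    ("computer science", "information retrieval"),
    ("mathematics", "statistics"),
    ("statistics", "machine learning"),
]
parent_index = build_parent_index(edges)
print(sorted(ancestors("machine learning", parent_index)))
# prints ['computer science', 'mathematics', 'statistics']
```

With a hierarchy like this, tagging a paper with a narrow field implicitly tags it with every ancestor, so only the most specific tags need to be stored.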
#### Citations
The "Context" table stores... presumably the text around the citation itself.

The "References" table stores little metadata about the citation itself.
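A sketch of how the two tables might be combined, assuming references are bare (citing paper, cited paper) ID pairs and contexts are (citing paper, cited paper, text) rows. These column shapes are assumptions, not the exact MAG schema.

```python
from collections import defaultdict
from typing import Any, Dict, Iterable, List, Tuple


def attach_contexts(
    references: Iterable[Tuple[int, int]],
    contexts: Iterable[Tuple[int, int, str]],
) -> List[Dict[str, Any]]:
    """Group citation-context snippets onto their (paper, reference) pair."""
    by_pair: Dict[Tuple[int, int], List[str]] = defaultdict(list)
    for paper_id, ref_id, text in contexts:
        by_pair[(paper_id, ref_id)].append(text)
    # A reference with no recorded context gets an empty list.
    return [
        {"paper_id": paper_id, "ref_id": ref_id, "contexts": by_pair.get((paper_id, ref_id), [])}
        for paper_id, ref_id in references
    ]


refs = attach_contexts(
    references=[(1, 10), (1, 11)],
    contexts=[(1, 10, "as in [10]"), (1, 10, "see also [10]")],
)
```

Since the references rows carry almost no metadata, the context snippets are one of the few extra signals available per citation, which is why a well-known 'extra' key for them seems worthwhile.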
#### Papers
Paper URLs now have types (an int).

Titles: "Paper Title" / "Original Title" / "Book Title".

Year and date are stored separately (same as fatcat).

First and last page are stored separately.

"Original Venue" (a string), presumably the name of the container/journal.

Has arbitrary resources (URLs).
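A rough sketch of mapping a paper row like this onto a release-like dict. The input keys are loosely modeled on the fields listed above, and the output keys (`release_year`, `release_date`, `pages`, `extra.container_name`) are illustrative assumptions rather than a confirmed import mapping.

```python
from datetime import date
from typing import Any, Dict, Optional


def join_pages(first: Optional[str], last: Optional[str]) -> Optional[str]:
    """Combine separate first/last page values into a single 'pages' string."""
    if first and last and first != last:
        return f"{first}-{last}"
    return first or last or None


def paper_to_release(paper: Dict[str, Any]) -> Dict[str, Any]:
    """Map a hypothetical paper row to a hypothetical release dict."""
    release: Dict[str, Any] = {
        "title": paper.get("paper_title"),
        # Year and date kept separately, as in both schemas.
        "release_year": paper.get("year"),
        "pages": join_pages(paper.get("first_page"), paper.get("last_page")),
        "extra": {},
    }
    paper_date = paper.get("date")
    if paper_date is not None:
        release["release_date"] = paper_date.isoformat()
    if paper.get("original_title"):
        release["extra"]["original_title"] = paper["original_title"]
    if paper.get("original_venue"):
        # Venue is only a free-text string, so it is not a container link.
        release["extra"]["container_name"] = paper["original_venue"]
    return release


release = paper_to_release({
    "paper_title": "An Example Paper",
    "original_title": "Ein Beispielpapier",
    "year": 2019,
    "date": date(2019, 3, 1),
    "first_page": "12",
    "last_page": "34",
    "original_venue": "Journal of Examples",
})
```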