author | Bryan Newbold <bnewbold@robocracy.org> | 2019-01-22 12:37:55 -0800 |
---|---|---|
committer | Bryan Newbold <bnewbold@robocracy.org> | 2019-01-22 12:37:55 -0800 |
commit | c364e3cb9c55d36771e274cbac3d8825798b1612 | |
tree | aa7f3751e2bb8977c3d0f5d701fd1d8236234564 /notes/schema/mag_schema_comparison.txt | |
parent | aaf87ac3ad1355dcc1f534ce8e104e23a99999a6 | |
MAG schema notes
Diffstat (limited to 'notes/schema/mag_schema_comparison.txt')
-rw-r--r-- | notes/schema/mag_schema_comparison.txt | 65 |
1 files changed, 65 insertions, 0 deletions
diff --git a/notes/schema/mag_schema_comparison.txt b/notes/schema/mag_schema_comparison.txt
new file mode 100644
index 00000000..0328ff7e
--- /dev/null
+++ b/notes/schema/mag_schema_comparison.txt
@@ -0,0 +1,65 @@
+
+Looking at the Microsoft Academic Graph schema: https://docs.microsoft.com/en-us/academic-services/graph/reference-data-schema
+
+My take-aways from this are:
+
+- should allow storing raw affiliations today in release_contrib rows, and some
+  day have a foreign key to institution there
+- maybe should have an "original_title" field for releases? though could go in
+  'extra' (along with subtitle)
+- have a well-known 'extra' key to use for saving citation context in references
+
+
+## Data Model (high-level)
+
+Includes rich affiliation (at the per-paper level) and "field of study"
+tagging.
+
+No work/release distinction.
+
+There are URLs, but no file-level metadata.
+
+Don't store full abstracts for legal reasons.
+
+
+## Details (lower-level)
+
+Across many entities, there are "normalized" and "display" names.
+
+Some stats are aggregated: paper and citation counts
+
+#### Affiliations
+
+Institution names: "normalized" vs. "display"
+
+"GRID" id?
+
+What is the WikiPage? Wikipedia?
+
+#### Authors
+
+Saves "last known" affiliation.
+
+#### Field of Study
+
+Nested hierarchy
+
+#### Citations
+
+"Context" table stores... presumably text around the citation itself.
+
+"References" table stores little metadata about the citation itself.
+
+#### Papers
+
+Paper URLs now have types (an int).
+
+"Paper Title" / "Original Title" / "Book Title"
+
+Year and Date separately (same as fatcat)
+
+Stores first and last page separately.
+
+"Original Venue" (string), presumably name of the container/journal
+
+Has arbitrary resources (URLs)
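
The three take-aways in the notes could look roughly like the sketch below. This is illustrative only, not fatcat's actual schema: `release_contrib`, `original_title`, `subtitle`, and `extra` come from the notes, while the class names, the `raw_affiliation` and `institution_id` fields, and the `citation_context` key are hypothetical names chosen for the example.

```python
from dataclasses import dataclass, field
from typing import Any, Dict, Optional

# Assumed name for the "well-known" key; the notes do not pick one.
CITATION_CONTEXT_KEY = "citation_context"


@dataclass
class ReleaseContrib:
    """Hypothetical stand-in for a release_contrib row."""
    raw_name: str
    # Raw affiliation string, stored as-is today.
    raw_affiliation: Optional[str] = None
    # Placeholder for a possible future foreign key to an institution.
    institution_id: Optional[str] = None


@dataclass
class Reference:
    """Hypothetical stand-in for a reference (citation) row."""
    index: int
    extra: Dict[str, Any] = field(default_factory=dict)


def set_citation_context(ref: Reference, text: str) -> None:
    """Save the text surrounding a citation under the well-known key."""
    ref.extra[CITATION_CONTEXT_KEY] = text


def release_extra(original_title: Optional[str] = None,
                  subtitle: Optional[str] = None) -> Dict[str, str]:
    """Build a release 'extra' dict holding only the fields that are set."""
    extra: Dict[str, str] = {}
    if original_title:
        extra["original_title"] = original_title
    if subtitle:
        extra["subtitle"] = subtitle
    return extra
```

Keeping `original_title` and the citation context in 'extra' avoids a schema migration now, and the nullable `institution_id` leaves room for the foreign key later.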