Diffstat (limited to 'notes/schema')
-rw-r--r-- notes/schema/alignments.txt 18
-rw-r--r-- notes/schema/mag_schema_comparison.txt 65
2 files changed, 82 insertions, 1 deletions
diff --git a/notes/schema/alignments.txt b/notes/schema/alignments.txt
index e7678d93..7fc37606 100644
--- a/notes/schema/alignments.txt
+++ b/notes/schema/alignments.txt
@@ -27,9 +27,25 @@ Specifically, the "variables" and type definitions: <http://docs.citationstyles.
 - rights/license (for explicit OA)
 - version (eg, for software, standards)
 - url (eg, for blog posts and other web content; canonical only)
+- authority (for things like patents)
+- collection_title (for book series)
+- short_title
+- edition (eg, "4th")
+- event (eg, conference)
+- chapter_number
+- submitted
+
+"extra" for citations:
+- most of the above, or any fields from 'release'
+- authors (an array)
+- url
+- issue, volume, date, edition
+- accessed_date
+
+release_date aligns with... 'issued'? not original-date
+pages aligns with 'page'. Should this be 'locator'?
 other things:
-- align cite-items even closer with CSL? assuming this is what crossref is doing
 - anything specially needed for a blog post? url (original/canonical)?
 - press_release
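
Review note on the alignments.txt hunk above: a minimal sketch of how the listed release fields might be renamed to CSL variables, assuming `release_date` maps to `issued` and `pages` maps to `page` (both are still open questions in the notes). The underscore-style keys are taken from the notes; the function names, the dict layout, and the ISO date strings are assumptions for illustration, not existing code in this repository.

```python
from datetime import date

# Hypothetical mapping: release-style field name -> CSL variable name.
# Left-hand names come from the notes; right-hand names are CSL variables.
RELEASE_TO_CSL = {
    "short_title": "title-short",
    "collection_title": "collection-title",
    "authority": "authority",
    "edition": "edition",
    "event": "event",
    "chapter_number": "chapter-number",
    "version": "version",
    "url": "URL",
    "volume": "volume",
    "issue": "issue",
    "pages": "page",  # assumption: 'page', not 'locator'
}

# Date-valued fields need the CSL-JSON "date-parts" structure.
DATE_FIELDS = {
    "release_date": "issued",  # assumption: 'issued', not 'original-date'
    "submitted": "submitted",
    "accessed_date": "accessed",
}


def csl_date(iso_string):
    """Convert an ISO 8601 date string (YYYY-MM-DD) to a CSL-JSON date."""
    d = date.fromisoformat(iso_string)
    return {"date-parts": [[d.year, d.month, d.day]]}


def release_to_csl(release):
    """Rename known fields of a release-like dict; skip missing or null ones."""
    csl = {}
    for src, dst in RELEASE_TO_CSL.items():
        if release.get(src) is not None:
            csl[dst] = release[src]
    for src, dst in DATE_FIELDS.items():
        if release.get(src):
            csl[dst] = csl_date(release[src])
    return csl
```

If `pages` turns out to fit `locator` better, only the one entry in the hypothetical `RELEASE_TO_CSL` table would need to change.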
diff --git a/notes/schema/mag_schema_comparison.txt b/notes/schema/mag_schema_comparison.txt
new file mode 100644
index 00000000..0328ff7e
--- /dev/null
+++ b/notes/schema/mag_schema_comparison.txt
@@ -0,0 +1,65 @@
+
+Looking at the Microsoft Academic Graph schema: https://docs.microsoft.com/en-us/academic-services/graph/reference-data-schema
+
+My take-aways from this are:
+
+- should allow storing raw affiliations today in release_contrib rows, and some
+ day have a foreign key to institution there
+- maybe should have an "original_title" field for releases? though could go in
+ 'extra' (along with subtitle)
+- have a well-known 'extra' key to use for saving citation context in references
+
+
+## Data Model (high-level)
+
+Includes rich affiliation (at the per-paper level) and "field of study"
+tagging.
+
+No work/release distinction.
+
+There are URLs, but no file-level metadata.
+
+Doesn't store full abstracts, for legal reasons.
+
+
+## Details (lower-level)
+
+Across many entities, there are "normalized" and "display" names.
+
+Some stats are aggregated: paper and citation counts
+
+#### Affiliations
+
+Institution names: "normalized" vs. "display"
+
+"GRID" id?
+
+What is the WikiPage? Wikipedia?
+
+#### Authors
+
+Saves "last known" affiliation.
+
+#### Field of Study
+
+Nested hierarchy
+
+#### Citations
+
+"Context" table stores... presumably text around the citation itself.
+
+"References" table stores little metadata about the citation itself.
+
+#### Papers
+
+Paper URLs now have types (an int).
+
+"Paper Title" / "Original Title" / "Book Title"
+
+Year and Date separately (same as fatcat)
+
+Stores first and last page separately.
+
+"Original Venue" (string), presumably name of the container/journal
+
+Has arbitrary resources (URLs)
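
Review note on the "Papers" section above: a rough sketch of how a MAG paper row might be folded into a release-like dict, joining the separate first and last page values into a single `pages` string and keeping the original title under `extra`, as the take-aways suggest. The column names (`PaperTitle`, `OriginalTitle`, `Year`, `Date`, `FirstPage`, `LastPage`) are assumed from the headings in the notes, and `release_year`, `join_pages`, and `mag_paper_to_release` are hypothetical names.

```python
def join_pages(first, last):
    """Combine separate first/last page values into one 'pages' string."""
    first = (first or "").strip()
    last = (last or "").strip()
    if first and last and first != last:
        return f"{first}-{last}"
    # Only one side present (or both equal): return it, else None.
    return first or last or None


def mag_paper_to_release(row):
    """Map an assumed MAG paper row (a dict) to a release-like dict."""
    extra = {}
    original_title = row.get("OriginalTitle")
    if original_title and original_title != row.get("PaperTitle"):
        # Stored under 'extra' rather than as a top-level field.
        extra["original_title"] = original_title
    return {
        "title": row.get("PaperTitle"),
        "release_year": row.get("Year"),
        "release_date": row.get("Date"),
        "pages": join_pages(row.get("FirstPage"), row.get("LastPage")),
        "extra": extra,
    }
```

Year and date are passed through unchanged here, since the notes say both schemas keep them as separate fields.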