Diffstat (limited to 'notes/schema')
-rw-r--r-- | notes/schema/alignments.txt | 18 |
-rw-r--r-- | notes/schema/mag_schema_comparison.txt | 65 |
2 files changed, 82 insertions, 1 deletions
diff --git a/notes/schema/alignments.txt b/notes/schema/alignments.txt
index e7678d93..7fc37606 100644
--- a/notes/schema/alignments.txt
+++ b/notes/schema/alignments.txt
@@ -27,9 +27,25 @@ Specifically, the "variables" and type definitions: <http://docs.citationstyles.
 - rights/license (for explicit OA)
 - version (eg, for software, standards)
 - url (eg, for blog posts and other web content; canonical only)
+- authority (for things like patents)
+- collection_title (for book series)
+- short_title
+- edition (eg, "4th")
+- event (eg, conference)
+- chapter_number
+- submitted
+
+"extra" for citations:
+- most of the above, or any fields from 'release'
+- authors (an array)
+- url
+- issue, volume, date, edition
+- accessed_date
+
+release_date aligns with... 'issued'? not original-date
+pages aligns with 'page'. Should this be 'locator'?
 
 other things:
-- align cite-items even closer with CSL? assuming this is what crossref is doing
 - anything specially needed for a blog post? url (original/canonical)?
 - press_release
 
diff --git a/notes/schema/mag_schema_comparison.txt b/notes/schema/mag_schema_comparison.txt
new file mode 100644
index 00000000..0328ff7e
--- /dev/null
+++ b/notes/schema/mag_schema_comparison.txt
@@ -0,0 +1,65 @@
+
+Looking at the Microsoft Academic Graph schema: https://docs.microsoft.com/en-us/academic-services/graph/reference-data-schema
+
+My take-aways from this are:
+
+- should allow storing raw affiliations today in release_contrib rows, and some
+  day have a foreign key to institution there
+- maybe should have an "original_title" field for releases? though could go in
+  'extra' (along with subtitle)
+- have a well-known 'extra' key to use saving citation context in references
+
+
+## Data Model (high-level)
+
+Includes rich affiliation (at the per-paper level) and "field of study"
+tagging.
+
+No work/release distinction.
+
+There are URLs, but no file-level metadata.
+
+Don't store full abstracts for legal reasons.
+
+
+## Details (lower-level)
+
+Across many entities, there are "normalized" and "display" names.
+
+Some stats are aggregated: paper and citation counts
+
+#### Affiliations
+
+Institution names: "normalized" vs. "display"
+
+"GRID" id?
+
+What is the WikiPage? Wikipedia?
+
+#### Authors
+
+Saves "last known" affiliation.
+
+#### Field of Study
+
+Nested hierarchy
+
+#### Citations
+
+"Context" table stores... presumably text around the citation itself.
+
+"References" table stores little metadata about the citation itself.
+
+#### Papers
+
+Paper URLs now have types (an int).
+
+"Paper Title" / "Original Title" / "Book Title"
+
+Year and Date separately (same as fatcat)
+
+Stores first and last page separately.
+
+"Original Venue" (string), presumably name of the container/journal
+
+Has arbitrary resources (URLs)
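
The field alignments noted in alignments.txt could be sketched roughly as below. This is an illustration only, not fatcat code: the function name `release_to_csl`, the two lookup tables, and the underscore-to-hyphen rule for CSL variable names are all assumptions. Only the pairings release_date/'issued' and pages/'page' come from the notes, and the notes themselves leave open whether 'locator' would be a better target for pages.

```python
# Sketch only: the field names come from the notes above; the function,
# the tables, and the hyphenation rule are hypothetical.

# Pairings stated in the notes (pages -> 'page' is still an open question there).
EXPLICIT = {
    "release_date": "issued",
    "pages": "page",
}

# Fields assumed to map onto a CSL variable of the same name, with
# underscores replaced by hyphens (an assumption about CSL naming).
PASSTHROUGH = ("authority", "collection_title", "edition", "event", "chapter_number")


def release_to_csl(release: dict) -> dict:
    """Rename known release keys to CSL-style keys; drop everything else."""
    csl = {}
    for key, value in release.items():
        if key in EXPLICIT:
            csl[EXPLICIT[key]] = value
        elif key in PASSTHROUGH:
            csl[key.replace("_", "-")] = value
    return csl


example = release_to_csl({
    "release_date": "2019-01-01",
    "pages": "12-34",
    "collection_title": "Lecture Notes",
    "ident": "abc",  # not a known field, so it is dropped
})
```

Values are copied through unchanged here; a real conversion would also need to reshape dates and contributor lists, which this sketch does not attempt.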