author | Bryan Newbold <bnewbold@robocracy.org> | 2018-09-20 20:20:43 -0700 |
---|---|---|
committer | Bryan Newbold <bnewbold@robocracy.org> | 2018-09-20 20:20:43 -0700 |
commit | 182413ad4946d715aabf67c396d688fbb5d1c0eb (patch) | |
tree | 7f4c748b527c96d21fdd99a6c9f8a47908f076b7 /guide/src/sources.md | |
parent | da8911b029f06023d5d8f8aad3cc845583e6d708 (diff) | |
progress on guide
Diffstat (limited to 'guide/src/sources.md')
-rw-r--r-- | guide/src/sources.md | 28 |
1 files changed, 28 insertions, 0 deletions
diff --git a/guide/src/sources.md b/guide/src/sources.md
index e70306d4..b8853d8a 100644
--- a/guide/src/sources.md
+++ b/guide/src/sources.md
@@ -1 +1,29 @@
 # Sources
+
+The core metadata bootstrap sources, by entity type, are:
+
+- `releases`: Crossref metadata, with DOIs as the primary identifier, and
+  PubMed (central), Wikidata, and [CORE][] identifiers cross-referenced
+- `containers`: munged metadata from the DOAJ, ROAD, and Norwegian journal
+  list, with ISSN-Ls as the primary identifier. ISSN provides an "ISSN to
+  ISSN-L" mapping to normalize electronic and print ISSN numbers.
+- `creators`: ORCID metadata and identifier.
+
+Initial `file` metadata and matches (file-to-release) come from earlier
+Internet Archive matching efforts, and in particular efforts to extract
+bibliographic metadata from PDFs (using GROBID) and fuzzy match (with
+conservative settings) to Crossref metadata.
+
+[CORE]: https://core.ac.uk
+
+The intent is to continuously ingest and merge metadata from a small number of
+large (~2-3 million or more records) general-purpose aggregators and catalogs
+in a centralized fashion, using bots, and then support volunteers and
+organizations in writing bots to merge high-quality metadata from field or
+institution-specific catalogs.
+
+Provenance information (where the metadata comes from, or who "makes specific
+claims") is stored in edit metadata in the data model. Value-level attribution
+can be achieved by looking at the full edit history for an entity as a series of
+patches.
+
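The last paragraph of the added text describes value-level attribution as something recovered by reading an entity's edit history as a series of patches. A minimal Python sketch of that idea follows. It is an illustration only: the record layout (`editor`, `source`, `patch`), the function name `attribute_values`, and the sample data are all hypothetical and are not taken from the fatcat data model.

```python
from typing import Any, Dict, List

# Hypothetical edit record: provenance fields ("editor", "source") plus a
# "patch" dict holding only the fields that this edit changed.
Edit = Dict[str, Any]


def attribute_values(edit_history: List[Edit]) -> Dict[str, Dict[str, Any]]:
    """Replay edits oldest-first and record which edit last set each field."""
    attribution: Dict[str, Dict[str, Any]] = {}
    for edit in edit_history:
        for field, value in edit["patch"].items():
            if value is None:
                # Assumption: a None value in a patch means the field was removed.
                attribution.pop(field, None)
            else:
                attribution[field] = {
                    "value": value,
                    "editor": edit["editor"],
                    "source": edit["source"],
                }
    return attribution


# Made-up history: a bot imports a record, then a volunteer corrects the title.
history = [
    {
        "editor": "crossref-bot",
        "source": "crossref",
        "patch": {"title": "An Example Paper", "doi": "10.1234/example"},
    },
    {
        "editor": "volunteer-1",
        "source": "manual",
        "patch": {"title": "An Example Paper (Corrected)"},
    },
]

result = attribute_values(history)
for field, info in sorted(result.items()):
    print(f"{field}: {info['value']!r} (set by {info['editor']}, via {info['source']})")
```

In this sketch the current value of each field is attributed to the most recent edit that touched it, so no per-value provenance needs to be stored on the entity itself.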