author | Bryan Newbold <bnewbold@robocracy.org> | 2023-01-04 19:55:30 -0800 |
---|---|---|
committer | Bryan Newbold <bnewbold@robocracy.org> | 2023-01-04 20:18:25 -0800 |
commit | 276ac2aa24166660bc6ffe7601cee44b5d848dae (patch) | |
tree | 8a35ce06e7ab9e6755b24abc41dee1115cf62788 /proposals/2020_ir_importer.spn | |
parent | ee46c33544941a5104182a2e221e841a32cbbf78 (diff) | |
proposals: update status; add some old ones; consistent file names
Diffstat (limited to 'proposals/2020_ir_importer.spn')
-rw-r--r-- | proposals/2020_ir_importer.spn | 25 |
1 files changed, 25 insertions, 0 deletions
diff --git a/proposals/2020_ir_importer.spn b/proposals/2020_ir_importer.spn
new file mode 100644
index 00000000..ad561d7b
--- /dev/null
+++ b/proposals/2020_ir_importer.spn
@@ -0,0 +1,25 @@
+
+status: brainstorm
+
+Institutional Repository Importer
+=================================
+
+Want to import content from IRs. Same general workflow for CORE, SHARE, BASE,
+other aggregators.
+
+Filter input to only works with known/ingested fulltext.
+
+Lookup file by hash. If found, skip for now. In future might do
+mapping/matching.
+
+Lookup by primary id (eg, CORE ident). If existing, can skip if it has file, or
+add file/location directly.
+
+Two indirect lookups: by external ident (DOI, PMID), or fuzzy search match. If
+we get either of these, want to do release/work grouping correctly.
+
+1. if we are certain of IR copy stage, then compare with existing release,
+   and/or lookup entire work for releases with same stage. update release or
+   add new release under same work.
+2. not sure of IR copy stage. guess stage from sherpa/romeo color and proceed
+   to insert/update.
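
The lookup order described in the proposal can be sketched as a single decision function. This is only an illustration of the control flow, not fatcat code. Every name here (`IRRecord`, `Release`, `Catalog`, `decide`, the action strings) is hypothetical, the catalog is an in-memory stand-in for real API lookups, and the `ROMEO_STAGE_GUESS` mapping from sherpa/romeo color to stage is a placeholder guess. The proposal does not say what happens when no lookup matches, so the sketch just reports `"no-match"`.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict, List, Optional, Set

# Placeholder guess only: the proposal says to guess stage from the
# sherpa/romeo color but does not define the mapping.
ROMEO_STAGE_GUESS = {"green": "accepted", "blue": "accepted", "yellow": "submitted"}


@dataclass
class IRRecord:
    """Hypothetical record from an aggregator such as CORE."""
    core_id: str
    sha1: Optional[str] = None         # hash of ingested fulltext, if any
    doi: Optional[str] = None
    pmid: Optional[str] = None
    stage: Optional[str] = None        # None when the IR copy stage is uncertain
    romeo_color: Optional[str] = None


@dataclass
class Release:
    work_id: str
    stage: str
    has_file: bool = False


@dataclass
class Catalog:
    """In-memory stand-in for catalog lookups (assumption, not a real API)."""
    file_hashes: Set[str] = field(default_factory=set)
    by_core_id: Dict[str, Release] = field(default_factory=dict)
    by_ext_id: Dict[str, Release] = field(default_factory=dict)  # "doi:..." / "pmid:..."
    releases: List[Release] = field(default_factory=list)


def decide(rec: IRRecord, cat: Catalog,
           fuzzy_match: Optional[Callable[[IRRecord], Optional[Release]]] = None) -> str:
    # Filter input to only works with known/ingested fulltext.
    if not rec.sha1:
        return "filter-out"
    # Lookup file by hash; if found, skip for now.
    if rec.sha1 in cat.file_hashes:
        return "skip"
    # Lookup by primary id; skip if it already has a file, else add the file.
    existing = cat.by_core_id.get(rec.core_id)
    if existing is not None:
        return "skip" if existing.has_file else "add-file"
    # Indirect lookups: external ident first, then fuzzy search.
    match = None
    for key in (f"doi:{rec.doi}" if rec.doi else None,
                f"pmid:{rec.pmid}" if rec.pmid else None):
        if key and key in cat.by_ext_id:
            match = cat.by_ext_id[key]
            break
    if match is None and fuzzy_match is not None:
        match = fuzzy_match(rec)
    if match is None:
        return "no-match"  # not specified by the proposal
    # Use the known stage, or fall back to a guess from the romeo color.
    stage = rec.stage or ROMEO_STAGE_GUESS.get(rec.romeo_color or "")
    # Compare with the matched release and the other releases of its work.
    same_stage = any(r.work_id == match.work_id and r.stage == stage
                     for r in [match] + cat.releases)
    return "update-release" if same_stage else "add-release-under-work"
```

Returning a plain action string keeps the decision separate from the write path, so the grouping logic can be exercised without touching a live catalog.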