Diffstat (limited to 'proposals')
| -rw-r--r-- | proposals/20190911_v04_schema_tweaks.md | 9 |
| -rw-r--r-- | proposals/2020_elasticsearch_schemas.md | 3 |
| -rw-r--r-- | proposals/2020_metadata_cleanups.md | 28 |
3 files changed, 30 insertions, 10 deletions
diff --git a/proposals/20190911_v04_schema_tweaks.md b/proposals/20190911_v04_schema_tweaks.md
index 916e8816..f5253519 100644
--- a/proposals/20190911_v04_schema_tweaks.md
+++ b/proposals/20190911_v04_schema_tweaks.md
@@ -19,6 +19,7 @@ SQL (and API, and elasticsearch):
 - TODO: release: switch how pages work? first/last?
 - TODO: indication of peer-review process? at release or container level?
 - TODO: container: separate canonical and disambiguating titles (?)
+- TODO: container: "imprint" field?
 - TODO: release inter-references using SCHOLIX/Datacite schema
     https://zenodo.org/record/1120265
     https://support.datacite.org/docs/connecting-research-outputs#section-related-identifiers
@@ -37,11 +38,3 @@ API endpoints:
   specific editor
 
 See `2020_search_improvements` for elasticsearch-only schema updates.
-
-- releases *may* need an "_all" field (or `biblio`?) containing most fields to
-  make some search experiences work
-- releases should include volume, issue, pages
-- releases *could* include reference and creator fatcat identifier lists, as a
-  faster/cheaper mechanism for doing reverse lookups
-- doi_prefix
-- doi_registrar (?)
diff --git a/proposals/2020_elasticsearch_schemas.md b/proposals/2020_elasticsearch_schemas.md
index d931efd3..83db884f 100644
--- a/proposals/2020_elasticsearch_schemas.md
+++ b/proposals/2020_elasticsearch_schemas.md
@@ -18,6 +18,7 @@ Simple additions:
 - OA license slug (?)
 - `doi_prefix`
 - `doi_registrar` (based on extra)
+- `first_author` (surname; for matching)
 
 "Array" keyword types for reverse lookups:
 
@@ -51,6 +52,8 @@ able to do a better job of indicating OA status/policy for published works.
 Not clear if this should be for "published" only, or whether we should try to
 handle embargo time spans and dates.
 
+Maybe also container `sherpa_romeo` color as a field?
+
 
 ## Release Merged Default Field
 
diff --git a/proposals/2020_metadata_cleanups.md b/proposals/2020_metadata_cleanups.md
index e53c47d3..cf6b08e5 100644
--- a/proposals/2020_metadata_cleanups.md
+++ b/proposals/2020_metadata_cleanups.md
@@ -45,7 +45,8 @@ is of the compressed body, not the actual inner file).
 The current file URL metadata has a few warts:
 
 - inconsistent or incorrect tagging of URL "rel" type. It is possible we should
-  just strip/skip this tag and always recompute from scratch
+  just strip/skip this tag and always recompute from scratch. Or target just
+  those domains with >= 1% of links, or top 100 domains
 - duplicate URLs (lack of normalization):
     - `http://example.com/file.pdf`
     - `http://example.com:80/file.pdf`
@@ -72,7 +73,8 @@ a reasonable constraint, but am open to other opinions. I think that all web
 URLs should be normalized for issues like `jsessionid` and `:80` port
 specification.
 
-In user interface we should limit to a single wayback link, and single link per domain.
+In user interface we should limit to a single wayback link, and single link per
+domain.
 
 NOTE: "host" means the fully qualified domain hostname; domain means the
 "registered" part of the domain.
@@ -82,6 +84,8 @@ NOTE: "host" means the fully qualified domain hostname; domain means the
 
 At some point, had many "NULL" publishers.
 
+"NA" in ISSNe, ISSNp. Eg: <https://fatcat.wiki/container/s3gm7274mfe6fcs7e3jterqlri>
+
 "Type" coverage should be improved.
 
 "Publisher type" (infered in various ways in chocula tool) could be included in
@@ -107,3 +111,23 @@ A partial list:
     - "Full title page with Editorial board (with Elsevier tree)"
     - "Advisory Board Editorial Board"
 
+
+## Very Long Titles
+
+These are likely stubs, but the title is also "just too long". Could stash full
+title in `extra`?
+
+- https://fatcat.wiki/release/4b7swn2zsvguvkzmt
+    => crossref updated
+
+## Abstracts
+
+Bad:
+
+- https://qa.fatcat.wiki/release/nwd5kkilybf5vdhm3iduvhvbvq
+- https://qa.fatcat.wiki/release/rkigixosmvgcvmlkb5aqeyznim
+
+Very long:
+
+- https://qa.fatcat.wiki/release/s2cafgwepvfqnjp4xicsx6amsa
+
