author Bryan Newbold <bnewbold@robocracy.org> 2020-01-22 13:41:11 -0800
committer Bryan Newbold <bnewbold@robocracy.org> 2020-01-22 13:41:11 -0800
commit 2e3988fcf6441bef7ee4b030e499fd129e7cb189
tree a80218dec84b48b5ff7486a3ebe7502530f98fd0 /proposals
parent da64fa0b36218d7f9726aa98dff0e834c1845193
more TODO/proposal cleanup
Diffstat (limited to 'proposals')
-rw-r--r-- proposals/20190911_v04_schema_tweaks.md | 9
-rw-r--r-- proposals/2020_elasticsearch_schemas.md | 3
-rw-r--r-- proposals/2020_metadata_cleanups.md | 28
3 files changed, 30 insertions, 10 deletions
diff --git a/proposals/20190911_v04_schema_tweaks.md b/proposals/20190911_v04_schema_tweaks.md
index 916e8816..f5253519 100644
--- a/proposals/20190911_v04_schema_tweaks.md
+++ b/proposals/20190911_v04_schema_tweaks.md
@@ -19,6 +19,7 @@ SQL (and API, and elasticsearch):
- TODO: release: switch how pages work? first/last?
- TODO: indication of peer-review process? at release or container level?
- TODO: container: separate canonical and disambiguating titles (?)
+- TODO: container: "imprint" field?
- TODO: release inter-references using SCHOLIX/Datacite schema
https://zenodo.org/record/1120265
https://support.datacite.org/docs/connecting-research-outputs#section-related-identifiers
@@ -37,11 +38,3 @@ API endpoints:
specific editor
See `2020_search_improvements` for elasticsearch-only schema updates.
-
-- releases *may* need an "_all" field (or `biblio`?) containing most fields to
- make some search experiences work
-- releases should include volume, issue, pages
-- releases *could* include reference and creator fatcat identifier lists, as a
- faster/cheaper mechanism for doing reverse lookups
-- doi_prefix
-- doi_registrar (?)
diff --git a/proposals/2020_elasticsearch_schemas.md b/proposals/2020_elasticsearch_schemas.md
index d931efd3..83db884f 100644
--- a/proposals/2020_elasticsearch_schemas.md
+++ b/proposals/2020_elasticsearch_schemas.md
@@ -18,6 +18,7 @@ Simple additions:
- OA license slug (?)
- `doi_prefix`
- `doi_registrar` (based on extra)
+- `first_author` (surname; for matching)
"Array" keyword types for reverse lookups:
@@ -51,6 +52,8 @@ able to do a better job of indicating OA status/policy for published works.
Not clear if this should be for "published" only, or whether we should try to
handle embargo time spans and dates.
+Maybe also container `sherpa_romeo` color as a field?
+
## Release Merged Default Field
diff --git a/proposals/2020_metadata_cleanups.md b/proposals/2020_metadata_cleanups.md
index e53c47d3..cf6b08e5 100644
--- a/proposals/2020_metadata_cleanups.md
+++ b/proposals/2020_metadata_cleanups.md
@@ -45,7 +45,8 @@ is of the compressed body, not the actual inner file).
The current file URL metadata has a few warts:
- inconsistent or incorrect tagging of URL "rel" type. It is possible we should
- just strip/skip this tag and always recompute from scratch
+ just strip/skip this tag and always recompute from scratch. Or target just
+ those domains with >= 1% of links, or top 100 domains
- duplicate URLs (lack of normalization):
- `http://example.com/file.pdf`
- `http://example.com:80/file.pdf`
@@ -72,7 +73,8 @@ a reasonable constraint, but am open to other opinions. I think that all web
URLs should be normalized for issues like `jsessionid` and `:80` port
specification.
-In user interface we should limit to a single wayback link, and single link per domain.
+In user interface we should limit to a single wayback link, and single link per
+domain.
NOTE: "host" means the fully qualified domain hostname; domain means the
"registered" part of the domain.
@@ -82,6 +84,8 @@ NOTE: "host" means the fully qualified domain hostname; domain means the
At some point, had many "NULL" publishers.
+"NA" in ISSNe, ISSNp. Eg: <https://fatcat.wiki/container/s3gm7274mfe6fcs7e3jterqlri>
+
"Type" coverage should be improved.
"Publisher type" (infered in various ways in chocula tool) could be included in
@@ -107,3 +111,23 @@ A partial list:
- "Full title page with Editorial board (with Elsevier tree)"
- "Advisory Board Editorial Board"
+
+## Very Long Titles
+
+These are likely stubs, but the title is also "just too long". Could stash full
+title in `extra`?
+
+- https://fatcat.wiki/release/4b7swn2zsvguvkzmt
+ => crossref updated
+
+## Abstracts
+
+Bad:
+
+- https://qa.fatcat.wiki/release/nwd5kkilybf5vdhm3iduvhvbvq
+- https://qa.fatcat.wiki/release/rkigixosmvgcvmlkb5aqeyznim
+
+Very long:
+
+- https://qa.fatcat.wiki/release/s2cafgwepvfqnjp4xicsx6amsa
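
The URL normalization called for in the `2020_metadata_cleanups.md` hunk (dropping `jsessionid` and a redundant `:80` port so that duplicate file URLs collapse to one form) might look roughly like the Python sketch below. This is an illustration under stated assumptions, not code from fatcat: `normalize_url`, `DEFAULT_PORTS`, and `JSESSIONID_RE` are hypothetical names. It handles only the path-parameter form of `jsessionid` (`;jsessionid=...`), and treating `:443` on `https` as redundant is an extra assumption the proposal does not mention.

```python
import re
from urllib.parse import urlsplit, urlunsplit

# Hypothetical helper, not part of fatcat. Covers the two warts named in the
# proposal: explicit default ports and `;jsessionid=` path parameters.
DEFAULT_PORTS = {"http": 80, "https": 443}  # the https entry is an assumption
JSESSIONID_RE = re.compile(r";jsessionid=[^?#/;]*", re.IGNORECASE)


def normalize_url(url: str) -> str:
    """Return a normalized form of `url` for duplicate detection."""
    parts = urlsplit(url.strip())
    scheme = parts.scheme.lower()
    # `hostname` is already lowercased and excludes any port or userinfo.
    # Limitation of this sketch: userinfo is dropped and IPv6 literals lose
    # their brackets, so such URLs would need separate handling.
    host = parts.hostname or ""
    netloc = host
    # Keep the port only when it differs from the scheme's default.
    if parts.port is not None and DEFAULT_PORTS.get(scheme) != parts.port:
        netloc = f"{host}:{parts.port}"
    # Strip the session identifier from the path, leaving the query intact.
    path = JSESSIONID_RE.sub("", parts.path)
    return urlunsplit((scheme, netloc, path, parts.query, parts.fragment))
```

With this sketch, the two example URLs listed in the proposal (`http://example.com/file.pdf` and `http://example.com:80/file.pdf`) normalize to the same string, so a de-duplication pass could compare normalized forms rather than raw URLs.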
+