update TODO file

author: Bryan Newbold <bnewbold@archive.org> 2023-01-04 21:24:57 -0800
committer: Bryan Newbold <bnewbold@archive.org> 2023-01-04 21:28:26 -0800
commit: cf2f45bd1b35ba0ce82ba9426ae90bade46d27d7 (patch)
tree: 3949b23224df8cf07fd4910368cd3973bdedc313 /TODO.txt
parent: 38ba3d6a9c5138afbd82a1a5025f43e08bbba6a2 (diff)
download: fatcat-scholar-cf2f45bd1b35ba0ce82ba9426ae90bade46d27d7.tar.gz fatcat-scholar-cf2f45bd1b35ba0ce82ba9426ae90bade46d27d7.zip
1 file changed, 34 insertions, 32 deletions
diff --git a/TODO.txt b/TODO.txt
index 9931491..68618a4 100644
--- a/TODO.txt
+++ b/TODO.txt
@@ -1,57 +1,59 @@
 
-more accessibility:
-- try lynx
-- lynx filters: "no form action defined"
-- lynx filters: duplicated (would be good to de-dupe in general)
-- "missing image" alt tag
-- "hit count" should not be <h3> (see also lynx)
-- fulltext download: <a> alt/title
-- all footer headers should be same header class
-- should SERP page have an <h1>? "Search Results"? hidden?
-- search ARIA pages
-- search filter labels not linked correctly?
-- orange tag labels: low contrast?
-- next/previous grey low-contrast
-
+- have i18n stuff set 'Content-Language' in response header
+- only add id_ to PDF replay (not other content types)
+- add 'hdl' to schema (release extid)
 
-language notes:
-- french would open up canadian partnerships?
-- UN languages (there are 6)
+human names:
+    https://www.librarian.net/stax/5222/what-you-learn-in-library-school-whats-in-a-name/
 
+- refactor publisher domain/link lookup into python code (not jinja2 template logic)
+- async elasticsearch: https://elasticsearch-py.readthedocs.io/en/v7.10.1/async.html
+    => not until elasticsearch-dsl support exists
+- adopt a better "remove tags" library in clean_str() (replace bs4)
+    https://stackoverflow.com/a/57648173/4682349
 
-- Onion-Location header
-- es 7.x library breaks QA searching
-    => setup small public QA index somewhere public
-- <meta> description
-- canonical links?
-- "clear filters" link/button
-- jinja2: "if xyz is defined" better than "if xyz"
-- "default" translation option (clear prefix, use browser default)
-- detect browser-requested languages for default language
+work in progress:
+- SIM respect 'noindex' (verify)
+- SIM fetch much less metadata (changes in API?)
 
 copy editing:
 - "how it works" page
 - "Contribute" -> "How To Participate"
 - web.archive.org not found -> resolved
-- link to "known issues" from alpha warning?
-- /alpha page, include known issues there
+
+workers:
+- fetch worker: filter by changelog or 'updated' datetime
+- work deletion: some bundle version/variant to allow deleting ES documents?
+- SIM updater
+x what is the process for updating issue DB? cronjob? at least have a makefile target
+    => not really a problem now that SIM is ~done
+
+SIM:
+x SIM parallelism
+x fatcat lookup: ISSN/ISSN-L
+- ambiguity. sim_pubid only extra? are both ISSNs available?
+- add page count to issn-db (?)
 
 content/pipeline:
-- continuous update worker from fatcat
 - add gzip to intermediate files pipeline commands
-- parallelize SIM indexing
 - makefile targets for bulk ingest
 
 cleanups:
-- "web assets" (CSS etc) in this repo or on *.archive.org in general
+x "web assets" (CSS etc) in this repo or on *.archive.org in general
+x have "json to IntermediateBundle" be a helper method, instead of multiple implementations
 - better typing/annotation of work pipeline
 - test coverage
 - use settings.toml for defaults of CLI args
 
 ponder:
-- smaller author font size (?)
 - "search inside" phrasing
 - "counts" target to summarize (to console)
+- Onion-Location header
+- <meta> description
+- canonical links?
+- jinja2: "if xyz is defined" better than "if xyz"
+- "default" translation option (clear prefix, use browser default)
+- should SERP page have an <h1>? "Search Results"? hidden?
 
 data quality:
 - handle sim_issue items with multiple issues in single item (eg, issue="3-4")
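The last data-quality item (SIM `sim_issue` items whose issue value, such as "3-4", covers more than one issue) could be handled by expanding numeric ranges before matching. Below is a minimal sketch in Python. It assumes the issue value arrives as a plain string. `split_issue_range` and `MAX_ISSUES_PER_ITEM` are hypothetical names and are not part of fatcat-scholar. The cap of 12 is an arbitrary guard against misreading year or page ranges as issue ranges.

```python
import re
from typing import List, Optional

# Hypothetical cap (assumption): a single scanned item is unlikely to bundle
# more than this many issues, so larger "ranges" are left untouched.
MAX_ISSUES_PER_ITEM = 12

# Matches values like "3-4" or "3/4", tolerating surrounding whitespace.
_RANGE_RE = re.compile(r"^\s*(\d+)\s*[-/]\s*(\d+)\s*$")


def split_issue_range(issue: Optional[str]) -> List[str]:
    """Hypothetical helper: expand a combined issue string into single issues.

    Returns [] for missing or blank values, a list of issue numbers for a
    plausible ascending range, and otherwise the stripped original value.
    """
    if issue is None or not issue.strip():
        return []
    m = _RANGE_RE.match(issue)
    if m:
        start, end = int(m.group(1)), int(m.group(2))
        if start < end and (end - start + 1) <= MAX_ISSUES_PER_ITEM:
            return [str(n) for n in range(start, end + 1)]
    # Not a range we trust: keep the value as-is for manual review.
    return [issue.strip()]


if __name__ == "__main__":
    print(split_issue_range("3-4"))
    print(split_issue_range("Spring"))
```

Falling back to the original string, rather than raising, keeps unusual values such as "Spring" or "1-100" visible for later cleanup instead of dropping them silently.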