From 99cc7de073baee53bb97075377906743d364ab84 Mon Sep 17 00:00:00 2001
From: Bryan Newbold
Date: Mon, 2 Jan 2023 19:16:09 -0800
Subject: proposals: update status; include some brainstorm-only docs

---
 proposals/20201103_xml_ingest.md | 19 +------------------
 1 file changed, 1 insertion(+), 18 deletions(-)

(limited to 'proposals/20201103_xml_ingest.md')

diff --git a/proposals/20201103_xml_ingest.md b/proposals/20201103_xml_ingest.md
index 25ec973..34e00b0 100644
--- a/proposals/20201103_xml_ingest.md
+++ b/proposals/20201103_xml_ingest.md
@@ -1,22 +1,5 @@
-status: wip
-
-TODO:
-x XML fulltext URL extractor (based on HTML biblio metadata, not PDF url extractor)
-x differential JATS XML and scielo XML from generic XML?
-    application/xml+jats is what fatcat is doing for abstracts
-    but it should be application/jats+xml?
-    application/tei+xml
-    if startswith " " => JATS
-x refactor ingest worker to be more general
-x have ingest code publish body to kafka topic
-x write a persist worker
-/ create/configure kafka topic
-- test everything locally
-- fatcat: ingest tool to create requests
-- fatcat: entity updates worker creates XML ingest requests for specific sources
-- fatcat: ingest file import worker allows XML results
-- ansible: deployment of persist worker
+status: deployed
 
 XML Fulltext Ingest
 ====================
 
-- 
cgit v1.2.3