diff options
Diffstat (limited to 'proposals/20201103_xml_ingest.md')
-rw-r--r-- | proposals/20201103_xml_ingest.md | 19 |
1 files changed, 1 insertions, 18 deletions
diff --git a/proposals/20201103_xml_ingest.md b/proposals/20201103_xml_ingest.md index 25ec973..34e00b0 100644 --- a/proposals/20201103_xml_ingest.md +++ b/proposals/20201103_xml_ingest.md @@ -1,22 +1,5 @@ -status: wip - -TODO: -x XML fulltext URL extractor (based on HTML biblio metadata, not PDF url extractor) -x differential JATS XML and scielo XML from generic XML? - application/xml+jats is what fatcat is doing for abstracts - but it should be application/jats+xml? - application/tei+xml - if startswith "<article " and "<article-meta>" => JATS -x refactor ingest worker to be more general -x have ingest code publish body to kafka topic -x write a persist worker -/ create/configure kafka topic -- test everything locally -- fatcat: ingest tool to create requests -- fatcat: entity updates worker creates XML ingest requests for specific sources -- fatcat: ingest file import worker allows XML results -- ansible: deployment of persist worker +status: deployed XML Fulltext Ingest ==================== |