path: root/proposals
Commit message                                                      Author         Date        Files  Lines
* xml: re-encode XML docs into UTF-8 for persisting                 Bryan Newbold  2020-11-03  1      -1/+18
* XML ingest proposal                                               Bryan Newbold  2020-11-03  1      -0/+64
* commit WIP HTML ingest proposal                                   Bryan Newbold  2020-11-03  1      -0/+97
* store no-capture URLs in terminal_url                             Bryan Newbold  2020-10-12  1      -0/+36
* seaweedfs proposal: fix typos and wording                         Martin Czygan  2020-07-01  1      -9/+11
* tweak pdf_meta SQL schema                                         Bryan Newbold  2020-06-17  1      -5/+5
* tweak kafka topic names and seaweedfs layout                      Bryan Newbold  2020-06-17  1      -3/+4
* pdf thumbnail+text+meta proposal                                  Bryan Newbold  2020-06-17  1      -0/+327
* Merge branch 'martin-seaweed-s3' into 'master'                    bnewbold       2020-05-26  1      -0/+424
|\
| * notes on seaweedfs (s3 backend)                                 Martin Czygan  2020-05-20  1      -0/+424
* | NSQ for job task manager/scheduler                              Bryan Newbold  2020-04-28  1      -0/+79
|/
* ingest: add force_recrawl flag to skip historical wayback lookup  Bryan Newbold  2020-03-02  1      -0/+1
* move edit_extra path to top-level                                 Bryan Newbold  2020-02-18  1      -2/+1
* include rel and oa_status in ingest request 'extra'               Bryan Newbold  2020-02-18  1      -0/+4
* move pdf_trio results back under key in JSON/Kafka                Bryan Newbold  2020-02-13  1      -15/+18
* pdftrio JSON object as top-level in Kafka results                 Bryan Newbold  2020-02-12  1      -16/+16
* pdftrio basic python code                                         Bryan Newbold  2020-02-12  1      -2/+2
* pdftrio proposal and start on schema+kafka                        Bryan Newbold  2020-02-12  1      -0/+101
* 2020q1 fulltext ingest plans                                      Bryan Newbold  2020-01-29  1      -0/+272
* clarify ingest result schema and semantics                        Bryan Newbold  2020-01-15  1      -23/+34
* clarify pmc/pmcid pairing                                         Bryan Newbold  2020-01-14  1      -3/+3
* yet more tweaks to ingest proposal                                Bryan Newbold  2020-01-02  1      -3/+2
* update ingest proposal source/link naming                         Bryan Newbold  2019-12-13  1      -16/+26
* sql schema change proposals                                       Bryan Newbold  2019-12-11  1      -0/+40
* pdftotext proposal                                                Bryan Newbold  2019-12-11  1      -0/+123
* update ingest proposal                                            Bryan Newbold  2019-12-11  1      -11/+145
* add structure of ingest proposal                                  Bryan Newbold  2019-11-13  1      -0/+129