path: root/proposals
Commit message [Author, Age, Files, Lines]
* tweak pdf_meta SQL schema [Bryan Newbold, 2020-06-17, 1 file, -5/+5]
|
* tweak kafka topic names and seaweedfs layout [Bryan Newbold, 2020-06-17, 1 file, -3/+4]
|
* pdf thumbnail+text+meta proposal [Bryan Newbold, 2020-06-17, 1 file, -0/+327]
|
* Merge branch 'martin-seaweed-s3' into 'master' [bnewbold, 2020-05-26, 1 file, -0/+424]
|\    notes on seaweedfs (s3 backend)
| |   See merge request webgroup/sandcrawler!28
| * notes on seaweedfs (s3 backend) [Martin Czygan, 2020-05-20, 1 file, -0/+424]
| |   Notes gathered during seaweedfs setup and test runs.
* | NSQ for job task manager/scheduler [Bryan Newbold, 2020-04-28, 1 file, -0/+79]
|/
* ingest: add force_recrawl flag to skip historical wayback lookup [Bryan Newbold, 2020-03-02, 1 file, -0/+1]
|
* move edit_extra path to top-level [Bryan Newbold, 2020-02-18, 1 file, -2/+1]
|
* include rel and oa_status in ingest request 'extra' [Bryan Newbold, 2020-02-18, 1 file, -0/+4]
|
* move pdf_trio results back under key in JSON/Kafka [Bryan Newbold, 2020-02-13, 1 file, -15/+18]
|
* pdftrio JSON object as top-level in Kafka results [Bryan Newbold, 2020-02-12, 1 file, -16/+16]
|     To be same as GROBID results
* pdftrio basic python code [Bryan Newbold, 2020-02-12, 1 file, -2/+2]
|     This is basically just a copy/paste of GROBID code, only simpler!
* pdftrio proposal and start on schema+kafka [Bryan Newbold, 2020-02-12, 1 file, -0/+101]
|
* 2020q1 fulltext ingest plans [Bryan Newbold, 2020-01-29, 1 file, -0/+272]
|
* clarify ingest result schema and semantics [Bryan Newbold, 2020-01-15, 1 file, -23/+34]
|
* clarify pmc/pmcid pairing [Bryan Newbold, 2020-01-14, 1 file, -3/+3]
|
* yet more tweaks to ingest proposal [Bryan Newbold, 2020-01-02, 1 file, -3/+2]
|
* update ingest proposal source/link naming [Bryan Newbold, 2019-12-13, 1 file, -16/+26]
|
* sql schema change proposals [Bryan Newbold, 2019-12-11, 1 file, -0/+40]
|
* pdftotext proposal [Bryan Newbold, 2019-12-11, 1 file, -0/+123]
|
* update ingest proposal [Bryan Newbold, 2019-12-11, 1 file, -11/+145]
|
* add structure of ingest proposal [Bryan Newbold, 2019-11-13, 1 file, -0/+129]
      Still needs some details flushed out