| | Commit message | Author | Age | Files | Lines |
|---|---|---|---|---|---|
| * | seaweedfs proposal: fix typos and wording | Martin Czygan | 2020-07-01 | 1 | -9/+11 | 
| * | tweak pdf_meta SQL schema | Bryan Newbold | 2020-06-17 | 1 | -5/+5 | 
| * | tweak kafka topic names and seaweedfs layout | Bryan Newbold | 2020-06-17 | 1 | -3/+4 | 
| * | pdf thumbnail+text+meta proposal | Bryan Newbold | 2020-06-17 | 1 | -0/+327 | 
| * | Merge branch 'martin-seaweed-s3' into 'master' | bnewbold | 2020-05-26 | 1 | -0/+424 |
| \|\\ | notes on seaweedfs (s3 backend). See merge request webgroup/sandcrawler!28 | | | | |
| \| * | notes on seaweedfs (s3 backend) | Martin Czygan | 2020-05-20 | 1 | -0/+424 |
| \| \| | Notes gathered during seaweedfs setup and test runs. | | | | |
| * \| | NSQ for job task manager/scheduler | Bryan Newbold | 2020-04-28 | 1 | -0/+79 |
| \|/ | | | | | |
| * | ingest: add force_recrawl flag to skip historical wayback lookup | Bryan Newbold | 2020-03-02 | 1 | -0/+1 | 
| * | move edit_extra path to top-level | Bryan Newbold | 2020-02-18 | 1 | -2/+1 | 
| * | include rel and oa_status in ingest request 'extra' | Bryan Newbold | 2020-02-18 | 1 | -0/+4 | 
| * | move pdf_trio results back under key in JSON/Kafka | Bryan Newbold | 2020-02-13 | 1 | -15/+18 | 
| * | pdftrio JSON object as top-level in Kafka results | Bryan Newbold | 2020-02-12 | 1 | -16/+16 | 
| | To be same as GROBID results | | | | |
| * | pdftrio basic python code | Bryan Newbold | 2020-02-12 | 1 | -2/+2 | 
| | This is basically just a copy/paste of GROBID code, only simpler! | | | | |
| * | pdftrio proposal and start on schema+kafka | Bryan Newbold | 2020-02-12 | 1 | -0/+101 | 
| * | 2020q1 fulltext ingest plans | Bryan Newbold | 2020-01-29 | 1 | -0/+272 | 
| * | clarify ingest result schema and semantics | Bryan Newbold | 2020-01-15 | 1 | -23/+34 | 
| * | clarify pmc/pmcid pairing | Bryan Newbold | 2020-01-14 | 1 | -3/+3 | 
| * | yet more tweaks to ingest proposal | Bryan Newbold | 2020-01-02 | 1 | -3/+2 | 
| * | update ingest proposal source/link naming | Bryan Newbold | 2019-12-13 | 1 | -16/+26 | 
| * | sql schema change proposals | Bryan Newbold | 2019-12-11 | 1 | -0/+40 | 
| * | pdftotext proposal | Bryan Newbold | 2019-12-11 | 1 | -0/+123 | 
| * | update ingest proposal | Bryan Newbold | 2019-12-11 | 1 | -11/+145 | 
| * | add structure of ingest proposal | Bryan Newbold | 2019-11-13 | 1 | -0/+129 | 
| | Still needs some details fleshed out | | | | |
