| | Commit message | Author | Age | Files | Lines |
---|---|---|---|---|---|
* | pdf_trio persist fixes from prod | Bryan Newbold | 2020-02-19 | 1 | -4/+4 |
* | include rel and oa_status in ingest request 'extra' | Bryan Newbold | 2020-02-18 | 1 | -1/+1 |
* | pdftrio basic python code | Bryan Newbold | 2020-02-12 | 1 | -0/+57 |
| | This is basically just a copy/paste of GROBID code, only simpler! | | | | |
* | fix bug where ingest_request extra fields not persisted | Bryan Newbold | 2020-02-05 | 1 | -1/+2 |
* | persist grobid: actually, status_code is required | Bryan Newbold | 2020-01-21 | 1 | -1/+1 |
| | Instead of working around when missing, force it to exist but skip in database insert section. Disk mode still needs to check if blank. | | | | |
* | persist: work around GROBID timeouts with no status_code | Bryan Newbold | 2020-01-21 | 1 | -1/+1 |
* | persist: fix dupe field copying | Bryan Newbold | 2020-01-15 | 1 | -1/+8 |
| | In testing hit: `AttributeError: 'str' object has no attribute 'get'` | | | | |
* | persist worker: implement updated ingest result semantics | Bryan Newbold | 2020-01-15 | 1 | -1/+1 |
* | small fixups to SandcrawlerPostgrestClient | Bryan Newbold | 2020-01-14 | 1 | -1/+10 |
* | db: move duplicate row filtering into DB insert helpers | Bryan Newbold | 2020-01-02 | 1 | -0/+25 |
* | fix DB import counting | Bryan Newbold | 2020-01-02 | 1 | -4/+5 |
* | fix small errors found by pylint | Bryan Newbold | 2020-01-02 | 1 | -1/+1 |
* | db: fancy insert/update separation using postgres xmax | Bryan Newbold | 2020-01-02 | 1 | -15/+30 |
* | improve DB helpers | Bryan Newbold | 2020-01-02 | 1 | -26/+81 |
| | - return insert/update row counts<br>- implement ON CONFLICT ... DO UPDATE on some tables | | | | |
* | start work on DB connector and minio client | Bryan Newbold | 2020-01-02 | 1 | -0/+141 |