path: root/python/sandcrawler/db.py
Commit message (Author, Age, Files, Lines)
* pdftrio basic python code (Bryan Newbold, 2020-02-12, 1 file, -0/+57)
    This is basically just a copy/paste of GROBID code, only simpler!
* fix bug where ingest_request extra fields not persisted (Bryan Newbold, 2020-02-05, 1 file, -1/+2)
* persist grobid: actually, status_code is required (Bryan Newbold, 2020-01-21, 1 file, -1/+1)
    Instead of working around when missing, force it to exist but skip in database insert section. Disk mode still needs to check if blank.
* persist: work around GROBID timeouts with no status_code (Bryan Newbold, 2020-01-21, 1 file, -1/+1)
* persist: fix dupe field copying (Bryan Newbold, 2020-01-15, 1 file, -1/+8)
    In testing hit: AttributeError: 'str' object has no attribute 'get'
* persist worker: implement updated ingest result semantics (Bryan Newbold, 2020-01-15, 1 file, -1/+1)
* small fixups to SandcrawlerPostgrestClient (Bryan Newbold, 2020-01-14, 1 file, -1/+10)
* db: move duplicate row filtering into DB insert helpers (Bryan Newbold, 2020-01-02, 1 file, -0/+25)
* fix DB import counting (Bryan Newbold, 2020-01-02, 1 file, -4/+5)
* fix small errors found by pylint (Bryan Newbold, 2020-01-02, 1 file, -1/+1)
* db: fancy insert/update separation using postgres xmax (Bryan Newbold, 2020-01-02, 1 file, -15/+30)
* improve DB helpers (Bryan Newbold, 2020-01-02, 1 file, -26/+81)
    - return insert/update row counts
    - implement ON CONFLICT ... DO UPDATE on some tables
* start work on DB connector and minio client (Bryan Newbold, 2020-01-02, 1 file, -0/+141)
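
Several commits in this log mention duplicate row filtering, ON CONFLICT ... DO UPDATE, and separating inserts from updates using postgres xmax. The sketch below illustrates how those ideas can fit together; it is not the code in db.py. The table name `example_table`, its columns, and the helper names `dedupe_rows` and `count_inserts_updates` are all hypothetical. It relies on two PostgreSQL behaviors: a single INSERT ... ON CONFLICT DO UPDATE statement fails if the same conflict key appears twice in the batch (one plausible reason to filter duplicates first), and the `xmax` system column returned by such a statement is commonly observed to be 0 for freshly inserted rows and non-zero for rows that were updated.

```python
from typing import Hashable, Iterable, List, Sequence, Tuple, Union

# Hypothetical upsert statement; table and column names are illustrative only.
# RETURNING xmax yields one value per affected row.
UPSERT_SQL = """
    INSERT INTO example_table (key, value)
    VALUES (%s, %s)
    ON CONFLICT (key) DO UPDATE SET value = EXCLUDED.value
    RETURNING xmax;
"""


def dedupe_rows(rows: Iterable[Sequence], key_index: int = 0) -> List[Sequence]:
    """Keep only the last row seen for each key.

    Output order follows the first appearance of each key, because dict
    preserves insertion order and overwriting a value does not move the key.
    """
    latest: dict = {}
    for row in rows:
        key: Hashable = row[key_index]
        latest[key] = row
    return list(latest.values())


def count_inserts_updates(xmax_values: Iterable[Union[int, str]]) -> Tuple[int, int]:
    """Split RETURNING xmax results into (inserted, updated) counts.

    Assumption: xmax == 0 means the row was inserted, anything else means
    it was updated. Values are passed through int() because a driver may
    hand back the xid as a string.
    """
    values = [int(x) for x in xmax_values]
    inserts = sum(1 for x in values if x == 0)
    return inserts, len(values) - inserts
```

In use, a batch would be passed through `dedupe_rows` before being sent to the database with `UPSERT_SQL`, and the returned `xmax` values would be passed to `count_inserts_updates` to report row counts.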