path: root/python/sandcrawler/db.py
Commit message | Author | Age | Files | Lines
* db (postgrest): actually use an HTTP session | Bryan Newbold | 2021-11-04 | 1 | -12/+24
    Not as important with GET as POST, I think, but still best practice.
* glue, utils, and worker code for crossref and grobid_refs | Bryan Newbold | 2021-11-04 | 1 | -3/+106
* make fmt (black 21.9b0) | Bryan Newbold | 2021-10-27 | 1 | -112/+151
* lint collection membership (last lint for now) | Bryan Newbold | 2021-10-26 | 1 | -1/+1
* more progress on type annotations | Bryan Newbold | 2021-10-26 | 1 | -3/+3
* more progress on type annotations and linting | Bryan Newbold | 2021-10-26 | 1 | -11/+11
* start adding python type annotations to db and persist code | Bryan Newbold | 2021-10-26 | 1 | -95/+120
* make fmt | Bryan Newbold | 2021-10-26 | 1 | -82/+68
* python: isort all imports | Bryan Newbold | 2021-10-26 | 1 | -1/+2
* persist support for ingest platform table, using existing persist worker | Bryan Newbold | 2021-10-15 | 1 | -1/+67
* add crossref postgrest fetch support to python db helpers | Bryan Newbold | 2021-06-02 | 1 | -0/+9
* update default postgrest ('db') API endpoint | Bryan Newbold | 2021-04-09 | 1 | -1/+1
* tweak html_meta SQL schema | Bryan Newbold | 2020-11-03 | 1 | -12/+19
* html: start on SQL table | Bryan Newbold | 2020-11-03 | 1 | -0/+44
* fixes and tweaks from testing locally | Bryan Newbold | 2020-06-17 | 1 | -0/+47
* pdf_trio persist fixes from prod | Bryan Newbold | 2020-02-19 | 1 | -4/+4
* include rel and oa_status in ingest request 'extra' | Bryan Newbold | 2020-02-18 | 1 | -1/+1
* pdftrio basic python code | Bryan Newbold | 2020-02-12 | 1 | -0/+57
    This is basically just a copy/paste of GROBID code, only simpler!
* fix bug where ingest_request extra fields not persisted | Bryan Newbold | 2020-02-05 | 1 | -1/+2
* persist grobid: actually, status_code is required | Bryan Newbold | 2020-01-21 | 1 | -1/+1
    Instead of working around when missing, force it to exist but skip in database insert section. Disk mode still needs to check if blank.
* persist: work around GROBID timeouts with no status_code | Bryan Newbold | 2020-01-21 | 1 | -1/+1
* persist: fix dupe field copying | Bryan Newbold | 2020-01-15 | 1 | -1/+8
    In testing hit: AttributeError: 'str' object has no attribute 'get'
* persist worker: implement updated ingest result semantics | Bryan Newbold | 2020-01-15 | 1 | -1/+1
* small fixups to SandcrawlerPostgrestClient | Bryan Newbold | 2020-01-14 | 1 | -1/+10
* db: move duplicate row filtering into DB insert helpers | Bryan Newbold | 2020-01-02 | 1 | -0/+25
* fix DB import counting | Bryan Newbold | 2020-01-02 | 1 | -4/+5
* fix small errors found by pylint | Bryan Newbold | 2020-01-02 | 1 | -1/+1
* db: fancy insert/update separation using postgres xmax | Bryan Newbold | 2020-01-02 | 1 | -15/+30
* improve DB helpers | Bryan Newbold | 2020-01-02 | 1 | -26/+81
    - return insert/update row counts
    - implement ON CONFLICT ... DO UPDATE on some tables
* start work on DB connector and minio client | Bryan Newbold | 2020-01-02 | 1 | -0/+141
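
Several of the 2020-01-02 entries mention duplicate row filtering, ON CONFLICT ... DO UPDATE, and insert/update separation using postgres xmax. The sketch below shows one way those three ideas can fit together. It is not the code from db.py: the table name, column names, and helper functions are hypothetical, and the `VALUES %s` placeholder assumes a psycopg2 `execute_values`-style batch insert, which this log does not confirm. The xmax comparison relies on a commonly used PostgreSQL behavior (a freshly inserted row reports xmax = 0) rather than a documented guarantee.

```python
from typing import Any, Dict, List, Tuple

# Hypothetical table and columns, for illustration only (not taken from db.py).
# RETURNING (xmax = 0) yields True for rows that were newly inserted and
# False for rows that hit the ON CONFLICT ... DO UPDATE branch.
UPSERT_SQL = """
    INSERT INTO example_table (sha1hex, status)
    VALUES %s
    ON CONFLICT (sha1hex) DO UPDATE SET status = EXCLUDED.status
    RETURNING (xmax = 0) AS inserted;
"""


def dedupe_by_key(rows: List[Dict[str, Any]], key: str) -> List[Dict[str, Any]]:
    """Keep the last row seen for each key value.

    A single INSERT ... ON CONFLICT DO UPDATE statement cannot update the
    same row twice, so duplicates within one batch are filtered first.
    """
    by_key: Dict[Any, Dict[str, Any]] = {}
    for row in rows:
        by_key[row[key]] = row  # later rows overwrite earlier ones
    return list(by_key.values())


def count_inserts_updates(returned: List[Tuple[bool]]) -> Tuple[int, int]:
    """Split the RETURNING flags into (insert_count, update_count)."""
    inserts = sum(1 for (inserted,) in returned if inserted)
    return inserts, len(returned) - inserts
```

In use, a batch would be passed through `dedupe_by_key` before the statement is executed, and the rows returned by the database would be passed to `count_inserts_updates` to report how many rows were inserted versus updated.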