Commit message | Author | Age | Files | Lines | |
---|---|---|---|---|---|
* | ingest import: fix edit_extra path | Bryan Newbold | 2020-02-18 | 1 | -1/+1 |
| | |||||
* | ingest importer: edit_extra is a top-level key | Bryan Newbold | 2020-02-18 | 1 | -1/+1 |
| | |||||
* | ingest import: allow short version of corpus names | Bryan Newbold | 2020-02-18 | 1 | -0/+3 |
| | |||||
* | ingest importer: pass through link rel | Bryan Newbold | 2020-02-18 | 1 | -1/+6 |
| | |||||
* | check ingest_request_source existence for SPN as well as ingest | Bryan Newbold | 2020-02-06 | 1 | -0/+3 |
| | |||||
* | additional trusted link sources | Bryan Newbold | 2020-02-06 | 1 | -0/+3 |
| | |||||
* | add mag and s2 as trusted link sources | Bryan Newbold | 2020-02-06 | 1 | -1/+1 |
| | |||||
* | ingest worker: handle missing ingest_request_source | Bryan Newbold | 2020-02-06 | 1 | -0/+3 |
| | | | | | Seeing a bunch of these due to re-ingests not including this field because of an earlier persist bug. | ||||
* | fix trivial typo in file importer | Bryan Newbold | 2020-01-20 | 1 | -1/+1 |
| | |||||
* | ingest: improve tests, support old ingest results | Bryan Newbold | 2020-01-15 | 1 | -3/+12 |
| | |||||
* | update ingest worker for schema tweaks | Bryan Newbold | 2020-01-15 | 1 | -8/+15 |
| | | | | | | Should be backwards compatible with old ingest results. Fixed a bug with glutton ident detection. | ||||
* | ingest: allow more sources to auto-import | Bryan Newbold | 2020-01-15 | 1 | -1/+2 |
| | |||||
* | importers: control update behavior with more-standard flag | Bryan Newbold | 2020-01-06 | 1 | -1/+1 |
| | |||||
* | allow arabesque backfill ingests for some source types | Bryan Newbold | 2019-12-24 | 1 | -0/+5 |
| | |||||
* | fix spn/ingest importer duplication check | Bryan Newbold | 2019-12-22 | 1 | -6/+8 |
| | | | | | | Check was happening after the `return True` by mistake, allowing duplicates in SPN editgroups, and potentially in ingest request editgroups as well. | ||||
* | add ingest import file collision protection | Bryan Newbold | 2019-12-13 | 1 | -0/+6 |
| | | | | | | | | The common case is the same URL being submitted repeatedly during testing. This is only within-editgroup, and per importer (eg, won't work across spn importer "submitted" editgroups), but is better than nothing. | ||||
* | update ingest request schema | Bryan Newbold | 2019-12-13 | 1 | -2/+7 |
| | | | | | This is mostly changing ingest_type from 'file' to 'pdf', and adding 'link_source'/'link_source_id', plus some small cleanups. | ||||
* | remove default mimetype from ingest-file importer | Bryan Newbold | 2019-12-13 | 1 | -2/+1 |
| | | | | We really should just use file_meta result or nothing. | ||||
* | savepapernow result importer | Bryan Newbold | 2019-12-12 | 1 | -3/+64 |
| | | | | Based on ingest-file-results importer | ||||
* | add another ingest request source to whitelist | Bryan Newbold | 2019-12-10 | 1 | -2/+5 |
| | |||||
* | tweaks to file ingest importer | Bryan Newbold | 2019-12-03 | 1 | -3/+4 |
| | | | | | Allow overriding source filter whitelist (common case for CLI use); fix editgroup description env variable pass-through. | ||||
* | re-order ingest want() for better stats | Bryan Newbold | 2019-11-15 | 1 | -7/+10 |
| | |||||
* | project -> ingest_request_source | Bryan Newbold | 2019-11-15 | 1 | -6/+6 |
| | |||||
* | ingest importer fixes | Bryan Newbold | 2019-11-15 | 1 | -3/+4 |
| | |||||
* | more ingest importer comments and counts | Bryan Newbold | 2019-11-15 | 1 | -1/+28 |
| | |||||
* | ingest file result importer | Bryan Newbold | 2019-11-15 | 1 | -0/+134 |