Commit message | Author | Age | Files | Lines | |
---|---|---|---|---|---|
* | ingest importer: don't use glutton matches | Bryan Newbold | 2020-05-22 | 1 | -3/+3 |
| | | | | | | | Until reviewing I didn't realize we were even doing this currently. Hopefully this has not impacted too many imports: almost all ingests use an external identifier, so only those with identifiers not in fatcat, for whatever reason, would be affected. | ||||
* | ingest import: fix edit_extra path | Bryan Newbold | 2020-02-18 | 1 | -1/+1 |
| | |||||
* | ingest importer: edit_extra is a top-level key | Bryan Newbold | 2020-02-18 | 1 | -1/+1 |
| | |||||
* | ingest import: allow short version of corpus names | Bryan Newbold | 2020-02-18 | 1 | -0/+3 |
| | |||||
* | ingest importer: pass through link rel | Bryan Newbold | 2020-02-18 | 1 | -1/+6 |
| | |||||
* | check ingest_request_source existence for SPN as well as ingest | Bryan Newbold | 2020-02-06 | 1 | -0/+3 |
| | |||||
* | additional trusted link sources | Bryan Newbold | 2020-02-06 | 1 | -0/+3 |
| | |||||
* | add mag and s2 as trusted link sources | Bryan Newbold | 2020-02-06 | 1 | -1/+1 |
| | |||||
* | ingest worker: handle missing ingest_request_source | Bryan Newbold | 2020-02-06 | 1 | -0/+3 |
| | | | | | Seeing a bunch of these due to re-ingests not including this field because of an earlier persist bug. | ||||
* | fix trivial typo in file importer | Bryan Newbold | 2020-01-20 | 1 | -1/+1 |
| | |||||
* | ingest: improve tests, support old ingest results | Bryan Newbold | 2020-01-15 | 1 | -3/+12 |
| | |||||
* | update ingest worker for schema tweaks | Bryan Newbold | 2020-01-15 | 1 | -8/+15 |
| | | | | | | Should be backwards compatible with old ingest results. Fixed a bug with glutton ident detection. | ||||
* | ingest: allow more sources to auto-import | Bryan Newbold | 2020-01-15 | 1 | -1/+2 |
| | |||||
* | importers: control update behavior with more-standard flag | Bryan Newbold | 2020-01-06 | 1 | -1/+1 |
| | |||||
* | allow arabesque backfill ingests for some source types | Bryan Newbold | 2019-12-24 | 1 | -0/+5 |
| | |||||
* | fix spn/ingest importer duplication check | Bryan Newbold | 2019-12-22 | 1 | -6/+8 |
| | | | | | | Check was happening after the `return True` by mistake, allowing duplicates in SPN editgroups, and potentially in ingest request editgroups as well. | ||||
* | add ingest import file collision protection | Bryan Newbold | 2019-12-13 | 1 | -0/+6 |
| | | | | | | | | The common case is the same URL being submitted repeatedly during testing. This is only within-editgroup, and per importer (eg, won't work across spn importer "submitted" editgroups), but is better than nothing. | ||||
* | update ingest request schema | Bryan Newbold | 2019-12-13 | 1 | -2/+7 |
| | | | | | This is mostly changing ingest_type from 'file' to 'pdf', and adding 'link_source'/'link_source_id', plus some small cleanups. | ||||
* | remove default mimetype from ingest-file importer | Bryan Newbold | 2019-12-13 | 1 | -2/+1 |
| | | | | We really should just use file_meta result or nothing. | ||||
* | savepapernow result importer | Bryan Newbold | 2019-12-12 | 1 | -3/+64 |
| | | | | Based on ingest-file-results importer | ||||
* | add another ingest request source to whitelist | Bryan Newbold | 2019-12-10 | 1 | -2/+5 |
| | |||||
* | tweaks to file ingest importer | Bryan Newbold | 2019-12-03 | 1 | -3/+4 |
| | | | | | Allow overriding source filter whitelist (common case for CLI use); fix editgroup description env variable pass-through. | ||||
* | re-order ingest want() for better stats | Bryan Newbold | 2019-11-15 | 1 | -7/+10 |
| | |||||
* | project -> ingest_request_source | Bryan Newbold | 2019-11-15 | 1 | -6/+6 |
| | |||||
* | ingest importer fixes | Bryan Newbold | 2019-11-15 | 1 | -3/+4 |
| | |||||
* | more ingest importer comments and counts | Bryan Newbold | 2019-11-15 | 1 | -1/+28 |
| | |||||
* | ingest file result importer | Bryan Newbold | 2019-11-15 | 1 | -0/+134 |
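
The 2019-12-13 and 2019-12-22 entries describe within-editgroup collision protection and a bug where the duplicate check ran after `return True`. The ordering issue can be sketched roughly as follows. This is only an illustration: the class `IngestImporterSketch`, its methods, and the choice of SHA-1 as the dedup key are all assumptions made for the example and are not taken from the log or from the fatcat code.

```python
from typing import Dict, List, Set


class IngestImporterSketch:
    """Hypothetical stand-in for an importer; not the actual fatcat class."""

    def __init__(self) -> None:
        # Keys (assumed here: SHA-1) of files already queued in this editgroup
        self._seen: Set[str] = set()
        self.batch: List[Dict[str, str]] = []
        self.counts = {"insert": 0, "skip-collision": 0}

    def try_insert(self, file_entity: Dict[str, str]) -> bool:
        key = file_entity["sha1"]
        # The duplicate check must come before any return that accepts the
        # entity; placed after `return True` it would never execute.
        if key in self._seen:
            self.counts["skip-collision"] += 1
            return False
        self._seen.add(key)
        self.batch.append(file_entity)
        self.counts["insert"] += 1
        return True

    def finish_editgroup(self) -> List[Dict[str, str]]:
        # Protection is within-editgroup only, so the seen-set is reset
        # whenever the current batch is flushed.
        done, self.batch = self.batch, []
        self._seen.clear()
        return done
```

Submitting the same file twice to one instance queues it once and counts the second attempt under `skip-collision`; after `finish_editgroup()` the same file would be accepted again, which mirrors the "only within-editgroup" caveat in the commit message.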