path: root/python/fatcat_import.py
Commit message  Author  Date  Files  Lines
* add 'lxml' mode for large XML file import, and multi-tags  Bryan Newbold  2020-12-17  1  -2/+1
* implement remainder of DOAJ article importer  Bryan Newbold  2020-11-19  1  -0/+37
* ingest: initial 'web' worker implementation  Bryan Newbold  2020-11-05  1  -0/+42
* ingest: whitelist -> allowlist  Bryan Newbold  2020-11-05  1  -3/+3
* fixes and test coverage for file_meta importer  Bryan Newbold  2020-08-21  1  -1/+4
* initial implementation of file_meta importer  Bryan Newbold  2020-08-21  1  -0/+15
* lint (flake8) top-level python files  Bryan Newbold  2020-07-01  1  -1/+3
* Merge pull request #53 from EdwardBetts/spelling  bnewbold  2020-03-27  1  -2/+2
|\
| * Correct spelling mistakes  Edward Betts  2020-03-27  1  -2/+2
* | Merge branch 'martin-kafka-bs4-import' into 'master'  Martin Czygan  2020-03-10  1  -16/+18
|\ \
| |/
|/|
| * fatcat_import: address potential hanging, if stdin is empty  Martin Czygan  2020-03-09  1  -0/+2
| * more pubmed adjustments  Martin Czygan  2020-02-22  1  -1/+1
| * pubmed ftp harvest and KafkaBs4XmlPusher  Martin Czygan  2020-02-19  1  -16/+16
* | shadow import fixes from QA testing  Bryan Newbold  2020-02-13  1  -1/+1
* | basic shadow importer  Bryan Newbold  2020-02-13  1  -0/+15
|/
* refactor fatcat_import kafka group names  Bryan Newbold  2020-01-21  1  -13/+54
* fix trivial one-character typo in fatcat_import.py  Bryan Newbold  2020-01-17  1  -1/+1
* actually control pubmed updates with a flag  Bryan Newbold  2020-01-17  1  -0/+4
* add missing sentry/raven tags  Bryan Newbold  2020-01-10  1  -0/+6
* Merge branch 'martin-datacite-import'  Martin Czygan  2020-01-08  1  -0/+43
|\
| * datacite: fix typos  Martin Czygan  2020-01-07  1  -1/+1
| * datacite: remove --lang-detect flag  Martin Czygan  2020-01-03  1  -4/+0
| * datacite: use specific auth var  Martin Czygan  2019-12-28  1  -1/+1
| * datacite: add missing --extid-map-file flag  Martin Czygan  2019-12-28  1  -0/+4
| * improve datacite field mapping and import  Martin Czygan  2019-12-28  1  -1/+14
| * datacite: importer skeleton  Martin Czygan  2019-12-28  1  -0/+30
* | importers: control update behavior with more-standard flag  Bryan Newbold  2020-01-06  1  -1/+5
|/
* savepapernow result importer  Bryan Newbold  2019-12-12  1  -0/+24
* improve argparse usage  Bryan Newbold  2019-12-11  1  -18/+30
* tweaks to file ingest importer  Bryan Newbold  2019-12-03  1  -0/+6
* have ingest-file-results importer operate as crawl-bot  Bryan Newbold  2019-11-15  1  -1/+1
* better ingest-file-results import name  Bryan Newbold  2019-11-15  1  -1/+1
* ingest file result importer  Bryan Newbold  2019-11-15  1  -0/+34
* small fixes to confluent-kafka importers/workers  Bryan Newbold  2019-09-20  1  -1/+1
* convert importers to confluent-kafka library  Bryan Newbold  2019-09-20  1  -2/+3
* start chocula importer  Bryan Newbold  2019-09-03  1  -0/+14
* support extids in matched importer  Bryan Newbold  2019-06-20  1  -0/+4
* faster LargeFile XML importer for PubMed  Bryan Newbold  2019-05-29  1  -1/+1
* make pubmed ref lookups configurable  Bryan Newbold  2019-05-22  1  -1/+8
* creative importer for bulk JSTOR imports  Bryan Newbold  2019-05-22  1  -0/+18
* pubmed importer command and tweaks  Bryan Newbold  2019-05-22  1  -0/+25
* arxiv importer robustification and CLI impl  Bryan Newbold  2019-05-21  1  -0/+21
* JALC bulk file importer  Bryan Newbold  2019-05-21  1  -0/+21
* fix default mimetype (impacted pre-1923 files)  Bryan Newbold  2019-05-15  1  -1/+5
* editgroup description override  Bryan Newbold  2019-04-22  1  -1/+11
* minor arabesque tweaks  Bryan Newbold  2019-04-18  1  -12/+22
* arabesque importer using crawl-bot creds  Bryan Newbold  2019-04-18  1  -1/+1
* arabesque import tweaks  Bryan Newbold  2019-04-18  1  -0/+4
* early version of arabesque importer  Bryan Newbold  2019-04-12  1  -0/+28
* importer for CDL/DASH dat pilot dweb datasets  Bryan Newbold  2019-03-19  1  -1/+29