path: root/python/fatcat_tools/importers/common.py
Commit message (Author, Age, Files, Lines)
* add fuzzy matching helper to importer base class (Bryan Newbold, 2020-12-16, 1 file, -2/+62)
* more python normalizers, and move from importer common (Bryan Newbold, 2020-11-19, 1 file, -154/+4)
* remove spurious print statement (Bryan Newbold, 2020-09-03, 1 file, -1/+0)
* generic file entity clean-ups as part of file_meta importer (Bryan Newbold, 2020-09-02, 1 file, -0/+47)
* simple lint (flake8) fixes over python codebase (Bryan Newbold, 2020-07-23, 1 file, -1/+0)
* lint (flake8) tool python files (Bryan Newbold, 2020-07-01, 1 file, -13/+13)
* importers: clarify handling of ApiException (Bryan Newbold, 2020-05-22, 1 file, -4/+8)
* consistently use raw string prefix for regex (Bryan Newbold, 2020-04-17, 1 file, -1/+1)
* Merge pull request #53 from EdwardBetts/spelling (bnewbold, 2020-03-27, 1 file, -1/+1)
|\
| * Correct spelling mistakes (Edward Betts, 2020-03-27, 1 file, -1/+1)
* | Merge branch 'martin-kafka-bs4-import' into 'master' (Martin Czygan, 2020-03-10, 1 file, -0/+65)
|\ \
| |/
|/|
| * common: use smaller batch size since XML parsing may be slow (Martin Czygan, 2020-03-10, 1 file, -1/+1)
| * pubmed ftp harvest and KafkaBs4XmlPusher (Martin Czygan, 2020-02-19, 1 file, -0/+65)
* | add some more domain/rel URL mappings (Bryan Newbold, 2020-02-22, 1 file, -0/+9)
|/
* fix KafkaError worker reporting for partition errors (Bryan Newbold, 2020-01-29, 1 file, -1/+1)
* importers: control update behavior with more-standard flag (Bryan Newbold, 2020-01-06, 1 file, -0/+1)
* write diagnostic messages to stderr (Martin Czygan, 2019-12-16, 1 file, -2/+2)
* Merge branch 'martin-importers-common-doc-fix' into 'master' (Martin Czygan, 2019-12-14, 1 file, -13/+10)
|\
| * complete parse_record docstring (Martin Czygan, 2019-12-14, 1 file, -0/+6)
| * Update EntityImporter docstring. (Martin Czygan, 2019-12-13, 1 file, -13/+4)
* | revert accidentally commited test timing (Bryan Newbold, 2019-12-13, 1 file, -2/+2)
* | ensure importer description arg isn't clobbered (Bryan Newbold, 2019-12-12, 1 file, -1/+3)
* | flush importer editgroups every few minutes (Bryan Newbold, 2019-12-12, 1 file, -5/+20)
* | EntityImporter: submit (not accept) mode (Bryan Newbold, 2019-12-12, 1 file, -2/+14)
|/
* crude support for 'sandcrawler' kafka topics (Bryan Newbold, 2019-11-15, 1 file, -2/+3)
* refactor duplicated b32_hex function in importers (Bryan Newbold, 2019-10-08, 1 file, -0/+9)
* review/fix all confluent-kafka produce code (Bryan Newbold, 2019-09-20, 1 file, -1/+0)
* small fixes to confluent-kafka importers/workers (Bryan Newbold, 2019-09-20, 1 file, -10/+24)
* small kafka tweaks for robustness (Bryan Newbold, 2019-09-20, 1 file, -0/+3)
* convert importers to confluent-kafka library (Bryan Newbold, 2019-09-20, 1 file, -19/+71)
* refactor all python source for client lib name (Bryan Newbold, 2019-09-05, 1 file, -3/+3)
* fix Importer editgroup_extra pass-through (Bryan Newbold, 2019-09-05, 1 file, -2/+1)
* file rel: social -> academicsocial (Bryan Newbold, 2019-09-03, 1 file, -2/+2)
* better importer 'total' counting (Bryan Newbold, 2019-09-03, 1 file, -4/+2)
* make importer extid lookups faster by hiding (Bryan Newbold, 2019-05-29, 1 file, -2/+2)
* is_cjk() handles kanji better (Bryan Newbold, 2019-05-29, 1 file, -4/+6)
* faster LargeFile XML importer for PubMed (Bryan Newbold, 2019-05-29, 1 file, -0/+50)
* more MARC languages, and less verbose reporting (Bryan Newbold, 2019-05-24, 1 file, -3/+14)
* missing MARC/ISO languages (Bryan Newbold, 2019-05-22, 1 file, -0/+2)
* Gaelic! (Bryan Newbold, 2019-05-22, 1 file, -0/+3)
* creative importer for bulk JSTOR imports (Bryan Newbold, 2019-05-22, 1 file, -0/+22)
* bs4 XML parse cleanup (Bryan Newbold, 2019-05-22, 1 file, -0/+2)
* JALC bulk file importer (Bryan Newbold, 2019-05-21, 1 file, -0/+20)
* updates to pubmed importer (Bryan Newbold, 2019-05-21, 1 file, -1/+20)
* tweaks to new imports/tests (Bryan Newbold, 2019-05-21, 1 file, -5/+78)
* initial flesh out of JALC parser (Bryan Newbold, 2019-05-21, 1 file, -0/+36)
* python impl (Bryan Newbold, 2019-05-14, 1 file, -3/+3)
* add limits to match importers (Bryan Newbold, 2019-04-23, 1 file, -0/+3)
* archive.org isn't really a repository (Bryan Newbold, 2019-04-22, 1 file, -1/+3)
* mechanism to not double-update entities (Bryan Newbold, 2019-04-18, 1 file, -0/+3)