path: root/python/fatcat_tools/importers
Commit message | Author | Age | Files | Lines
* more update keys and cases for chocula importer | Bryan Newbold | 2020-08-04 | 1 | -5/+11
* fix key name mismatch in chocula importer | Bryan Newbold | 2020-08-04 | 1 | -1/+1
* fix issnl typo in pubmed | Bryan Newbold | 2020-07-23 | 1 | -1/+1
* remove isascii() work around definition in importers/datacite.py | Bryan Newbold | 2020-07-23 | 1 | -7/+1
* simple lint (flake8) fixes over python codebase | Bryan Newbold | 2020-07-23 | 5 | -17/+16
* Merge branch 'martin-datacite-duplicated-author-gh-59' into 'master' | bnewbold | 2020-07-11 | 1 | -6/+60
|\
| * datacite: resolve formatting issues in tests | Martin Czygan | 2020-07-10 | 16 | -72/+28
| |\
| * | datacite: there should be no index gaps | Martin Czygan | 2020-07-10 | 1 | -2/+8
| * | datacite: document contributor types | Martin Czygan | 2020-07-10 | 1 | -0/+25
| * | wip: contrib, GH59 | Martin Czygan | 2020-07-10 | 1 | -16/+22
| * | datacite: address duplicated contributor issue | Martin Czygan | 2020-07-07 | 1 | -0/+16
* | | datacite: mitigate sentry #44035 | Martin Czygan | 2020-07-10 | 1 | -0/+4
| |/
|/|
* | datacite: fix attribute error | Martin Czygan | 2020-07-07 | 1 | -1/+1
* | lint (flake8) tool python files | Bryan Newbold | 2020-07-01 | 16 | -72/+27
|/
* add new license mappings | Bryan Newbold | 2020-06-30 | 2 | -0/+27
* datacite: improve license mapping | Martin Czygan | 2020-06-30 | 1 | -9/+15
* datacite: hard cast possible date value to string | Martin Czygan | 2020-06-29 | 1 | -1/+1
* ingest importer: check that stage is consistent with release | Bryan Newbold | 2020-05-26 | 1 | -0/+5
* Merge remote-tracking branch 'github/master' | Bryan Newbold | 2020-05-22 | 1 | -2/+2
|\
| * Indentity is not the same this as equality in Python | Christian Clauss | 2020-05-14 | 1 | -2/+2
* | importers: clarify handling of ApiException | Bryan Newbold | 2020-05-22 | 3 | -4/+10
* | ingest importer: don't use glutton matches | Bryan Newbold | 2020-05-22 | 1 | -3/+3
* | datacite: fix type error | Martin Czygan | 2020-04-22 | 1 | -1/+3
* | datacite: fix a raw name constraint violation | Martin Czygan | 2020-04-20 | 1 | -0/+8
|/
* consistently use raw string prefix for regex | Bryan Newbold | 2020-04-17 | 1 | -1/+1
* pubmed: use untranslated title if translated not available | Bryan Newbold | 2020-04-01 | 1 | -0/+6
* importers: replace newlines in get_text() strings | Bryan Newbold | 2020-04-01 | 4 | -23/+25
* importers: more string/get_text swaps | Bryan Newbold | 2020-03-28 | 3 | -27/+27
* pubmed: bunch of .get_text() instead of .string | Bryan Newbold | 2020-03-28 | 1 | -12/+12
* Merge pull request #53 from EdwardBetts/spelling | bnewbold | 2020-03-27 | 3 | -7/+7
|\
| * Correct spelling mistakes | Edward Betts | 2020-03-27 | 3 | -7/+7
* | datacite: nameIdentifier corner case | Bryan Newbold | 2020-03-26 | 1 | -1/+2
* | jalc: avoid meaningless pages values | Bryan Newbold | 2020-03-23 | 1 | -4/+8
* | datacite: add year sanity restrictions | bnewbold | 2020-03-23 | 1 | -0/+7
* | pubmed: handle multiple ReferenceList | Bryan Newbold | 2020-03-20 | 1 | -1/+4
* | pubmed: update many more metadata fields | Bryan Newbold | 2020-03-19 | 1 | -0/+22
* | crossref: skip stub OUP title | Bryan Newbold | 2020-03-19 | 1 | -0/+8
* | Merge branch 'martin-kafka-bs4-import' into 'master' | Martin Czygan | 2020-03-10 | 2 | -1/+66
|\ \
| |/
|/|
| * common: use smaller batch size since XML parsing may be slow | Martin Czygan | 2020-03-10 | 1 | -1/+1
| * pubmed ftp harvest and KafkaBs4XmlPusher | Martin Czygan | 2020-02-19 | 2 | -1/+66
* | add some more domain/rel URL mappings | Bryan Newbold | 2020-02-22 | 1 | -0/+9
* | Merge branch 'bnewbold-shadow-import' | Bryan Newbold | 2020-02-19 | 2 | -0/+196
|\ \
| * | remove arabesque short wayback URL hack | Bryan Newbold | 2020-02-14 | 1 | -6/+0
| * | improve shadow import file url cleanup path | Bryan Newbold | 2020-02-13 | 1 | -2/+12
| * | shadow import fixes from QA testing | Bryan Newbold | 2020-02-13 | 1 | -0/+6
| * | shadow import: more filtering of file_meta fields | Bryan Newbold | 2020-02-13 | 1 | -0/+10
| * | basic shadow importer | Bryan Newbold | 2020-02-13 | 2 | -0/+176
| |/
* | ingest import: fix edit_extra path | Bryan Newbold | 2020-02-18 | 1 | -1/+1
* | ingest importer: edit_extra is a top-level key | Bryan Newbold | 2020-02-18 | 1 | -1/+1
* | ingest import: allow short version of corpus names | Bryan Newbold | 2020-02-18 | 1 | -0/+3