path: root/python/fatcat_tools/importers
Commit message | Author | Age | Files | Lines
* doaj: require container linkage for release import | Bryan Newbold | 2022-07-19 | 1 | -0/+4
* arxiv: work-around hack for strange title | Bryan Newbold | 2022-07-07 | 1 | -0/+8
* fileset ingest: handle missing/partial file-level metadata | Bryan Newbold | 2022-04-05 | 1 | -3/+3
* ingest importer: improved extra/edit_extra code flow | Bryan Newbold | 2022-04-05 | 1 | -20/+13
* fileset ingest: remove a TODO | Bryan Newbold | 2022-04-04 | 1 | -1/+0
* filesets: typo bugfix, and test 'mimetype' on entity, not extra | Bryan Newbold | 2022-04-04 | 1 | -1/+1
* fileset ingest: fix mimetype handling | Bryan Newbold | 2022-03-31 | 1 | -4/+5
* bugfix: logic flow in fileset release checking | Bryan Newbold | 2022-03-23 | 1 | -3/+6
* single-file variant of fileset importer for dataset attempts | Bryan Newbold | 2022-03-23 | 2 | -0/+202
* fix typo in fileset comparison helper | Bryan Newbold | 2022-03-23 | 1 | -1/+1
* ingest fileset fixes, and some test coverage | Bryan Newbold | 2022-03-23 | 2 | -13/+30
* dataset ingest: JSON object fixes | Bryan Newbold | 2022-03-22 | 1 | -5/+5
* datacite importer: skip container_id for some repository sources | Bryan Newbold | 2022-02-09 | 1 | -0/+34
* doaj importer: TODO note to skip some larger publishers | Bryan Newbold | 2022-02-09 | 1 | -0/+4
* crossref importer: skip affiliations lacking 'name' | Bryan Newbold | 2021-12-15 | 1 | -0/+3
* chocula importer: handle not-upper-case ISSNs | Bryan Newbold | 2021-11-30 | 1 | -2/+6
* chocula importer: handle broken ISSNs in extra metadata | Bryan Newbold | 2021-11-30 | 1 | -2/+7
* chocula importer: tweak counting, conditions for doing updates | Bryan Newbold | 2021-11-30 | 1 | -15/+7
* chocula importer: move issne/issnp 'extra' to top-level fields if doing updates | Bryan Newbold | 2021-11-30 | 1 | -0/+6
* chocula: don't do name cleanups in importer | Bryan Newbold | 2021-11-30 | 1 | -8/+2
* codespell fixes in python code (comments) | Bryan Newbold | 2021-11-24 | 1 | -2/+2
* Merge branch 'bnewbold-import-refactors' into 'master' | bnewbold | 2021-11-11 | 16 | -1380/+146
|\
| * refactor importer metadata tables into separate file; move some helpers around | Bryan Newbold | 2021-11-10 | 8 | -621/+25
| * importers: refactor imports of clean() and other normalization helpers | Bryan Newbold | 2021-11-10 | 12 | -95/+104
| * remove cdl_dash_dat and wayback_static importers | Bryan Newbold | 2021-11-10 | 3 | -510/+0
| * datacite import: store less subject metadata | Bryan Newbold | 2021-11-10 | 1 | -1/+7
| * importers: use clean_doi() in many more (all?) importers | Bryan Newbold | 2021-11-09 | 6 | -12/+29
| * remove deprecated extid sqlite3 lookup table feature from importers | Bryan Newbold | 2021-11-09 | 3 | -160/+0
* | Merge branch 'bnewbold-cleanups-nov2021' into 'master' | bnewbold | 2021-11-11 | 1 | -0/+9
|\ \
| * | imports: generic file cleanup removes exact duplicate URLs | Bryan Newbold | 2021-11-09 | 1 | -0/+9
| |/
* / pubmed: allow updates if PMCID does not exist yet | Bryan Newbold | 2021-11-10 | 1 | -1/+6
|/
* datacite importer: remove unused 'year_only' variable | Bryan Newbold | 2021-11-03 | 1 | -2/+3
* datacite: add comment about potential date parsing bug | Bryan Newbold | 2021-11-03 | 1 | -0/+1
* datacite importer: dateparser.date.DateDataParser() | Bryan Newbold | 2021-11-03 | 1 | -1/+1
* more involved type wrangling and fixes for importers | Bryan Newbold | 2021-11-03 | 3 | -12/+14
* typing: relatively simple type check fixes | Bryan Newbold | 2021-11-03 | 14 | -87/+82
* typing: initial annotations on importers | Bryan Newbold | 2021-11-03 | 22 | -274/+443
* importers: remove unused __main__ routine | Bryan Newbold | 2021-11-03 | 4 | -19/+0
* lint: resolve existing mypy type errors | Bryan Newbold | 2021-11-02 | 3 | -22/+27
* re-fix some lint issues after big 'fmt' | Bryan Newbold | 2021-11-02 | 1 | -2/+2
* fmt (black): fatcat_tools/ | Bryan Newbold | 2021-11-02 | 22 | -2115/+2578
* python: isort everything | Bryan Newbold | 2021-11-02 | 17 | -41/+70
* arabesque import 'hit' field is 1/0, not true/false | Bryan Newbold | 2021-11-02 | 1 | -2/+2
* lint: simple, safe inline lint fixes | Bryan Newbold | 2021-11-02 | 12 | -22/+21
* lint/fmt: remove all 'import *' | Bryan Newbold | 2021-11-02 | 5 | -21/+41
* re-fmt all the fatcat_tools __init__ files for readability | Bryan Newbold | 2021-11-02 | 1 | -17/+39
* small python tweaks for annotations, imports | Bryan Newbold | 2021-11-02 | 2 | -2/+6
* try some type annotations | Bryan Newbold | 2021-11-02 | 2 | -55/+63
* fix missing variable in fileset ingest | Bryan Newbold | 2021-11-02 | 1 | -2/+1
* WIP: more fileset ingest | Bryan Newbold | 2021-10-18 | 1 | -13/+21