path: root/python/fatcat_tools
Commit message (Author, Age, Files, Lines)
...
* datacite: add comment about potential date parsing bug (Bryan Newbold, 2021-11-03, 1 file, -0/+1)
|
* datacite importer: dateparser.date.DateDataParser() (Bryan Newbold, 2021-11-03, 1 file, -1/+1)
|   Perhaps this was a change when upgrading 'dateparser'?
* more involved type wrangling and fixes for importers (Bryan Newbold, 2021-11-03, 3 files, -12/+14)
|
* typing: relatively simple type check fixes (Bryan Newbold, 2021-11-03, 14 files, -87/+82)
|   These mostly add new variable names so that existing variables aren't overwritten with a new type; delay coercing '{}' or '[]' to 'None' until the last minute; add is-not-None checks to conditional clauses; and make similar small changes.
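The patterns this commit names can be illustrated with a small, hypothetical helper (not code from fatcat_tools; the function names and the `extra["keywords"]` layout are assumptions): a fresh variable instead of re-binding a `str` to a list, an explicit `is not None` check that lets mypy narrow an `Optional`, and coercion of an empty container to `None` only at the return.

```python
from typing import Dict, List, Optional


def parse_keywords(raw: Optional[str]) -> Optional[List[str]]:
    """Hypothetical helper illustrating mypy-friendly patterns."""
    # Explicit is-not-None check narrows Optional[str] to str for mypy.
    if raw is not None and raw.strip():
        # New variable name, rather than re-binding `raw` (a str) to a list.
        keywords: List[str] = [k.strip() for k in raw.split(";") if k.strip()]
    else:
        keywords = []
    # Coerce [] to None only at the last minute, so `keywords` keeps a
    # single, non-Optional type throughout the function body.
    return keywords or None


def build_extra(keywords: Optional[List[str]]) -> Optional[Dict[str, List[str]]]:
    """Hypothetical: build an 'extra' dict, returning None if it ends up empty."""
    extra: Dict[str, List[str]] = {}
    if keywords is not None:
        extra["keywords"] = keywords
    return extra or None
```

Keeping the working variable non-`Optional` until the return means every intermediate operation type-checks without extra guards.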
* typing: initial annotations on importers (Bryan Newbold, 2021-11-03, 22 files, -274/+443)
|   This commit just adds the type annotations; it doesn't fix code to make type checking pass.
* typing: first batch of python bulk type annotations (Bryan Newbold, 2021-11-03, 9 files, -69/+129)
|   While these changes are more delicate than simple lint changes, this specific batch of edits and annotations was *relatively* simple, and resulted in few code changes other than function signature additions.
* importers: remove unused __main__ routine (Bryan Newbold, 2021-11-03, 4 files, -19/+0)
|   These were perhaps used in initial development or testing? fatcat_import.py is the correct way to do these imports, even for testing/development.
* lint: resolve existing mypy type errors (Bryan Newbold, 2021-11-02, 8 files, -50/+86)
|   Adds annotations and reworks dataflow to satisfy existing mypy issues, without adding any additional type annotations to, e.g., function signatures. There will probably be many more type errors when annotations are all added.
* re-fix some lint issues after big 'fmt' (Bryan Newbold, 2021-11-02, 2 files, -4/+5)
|
* fmt (black): fatcat_tools/ (Bryan Newbold, 2021-11-02, 43 files, -3194/+4020)
|
* python: isort everything (Bryan Newbold, 2021-11-02, 32 files, -71/+116)
|
* arabesque import 'hit' field is 1/0, not true/false (Bryan Newbold, 2021-11-02, 1 file, -2/+2)
|
* lint: simple, safe inline lint fixes (Bryan Newbold, 2021-11-02, 18 files, -83/+82)
|   '==' vs 'is'; 'not a in b' vs 'a not in b'; etc.
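A minimal illustration of the two idioms listed in this commit; the function below is hypothetical and only shows the preferred forms, not code from fatcat_tools.

```python
from typing import Optional, Set


def describe(value: Optional[int], allowed: Set[int]) -> str:
    """Hypothetical example of the preferred comparison idioms."""
    # Compare against None with `is`, not `==`: `is` tests identity and
    # cannot be affected by a custom __eq__ on the compared object.
    if value is None:
        return "missing"
    # Prefer `a not in b` over `not a in b`; the two are equivalent,
    # but the former reads as a single membership test.
    if value not in allowed:
        return "rejected"
    return "ok"
```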
* lint/fmt: remove all 'import *' (Bryan Newbold, 2021-11-02, 5 files, -21/+41)
|
* entity transforms: add basic type annotations (Bryan Newbold, 2021-11-02, 1 file, -7/+19)
|
* ftfy 'fix_entities' argument has been renamed (Bryan Newbold, 2021-11-02, 1 file, -4/+4)
|
* hacks to work around new pylint false positives (Bryan Newbold, 2021-11-02, 1 file, -2/+3)
|
* cleanup imports after fatcat_tools.transforms change (Bryan Newbold, 2021-11-02, 1 file, -5/+8)
|
* re-fmt all the fatcat_tools __init__ files for readability (Bryan Newbold, 2021-11-02, 5 files, -30/+62)
|
* remove 'import *' from fatcat_tools (for transforms) (Bryan Newbold, 2021-11-02, 1 file, -2/+2)
|
* small python tweaks for annotations, imports (Bryan Newbold, 2021-11-02, 3 files, -3/+7)
|
* try some type annotations (Bryan Newbold, 2021-11-02, 4 files, -70/+79)
|
* reviewer: add annotations required by mypy (Bryan Newbold, 2021-11-02, 1 file, -2/+3)
|
* fix missing variable in fileset ingest (Bryan Newbold, 2021-11-02, 1 file, -2/+1)
|
* Merge branch 'bnewbold-import-fileset' (Bryan Newbold, 2021-11-02, 5 files, -4/+350)
|\
| * WIP: more fileset ingest (Bryan Newbold, 2021-10-18, 1 file, -13/+21)
| |
| * WIP: rel fixes (Bryan Newbold, 2021-10-14, 1 file, -6/+6)
| |
| * fileset ingest small tweaks (Bryan Newbold, 2021-10-14, 1 file, -21/+36)
| |
| * initial implementation of fileset ingest importers (Bryan Newbold, 2021-10-14, 2 files, -3/+224)
| |
| * ingest: handle datasets, components, other ingest types (Bryan Newbold, 2021-10-14, 1 file, -1/+15)
| |
| * generic fileset importer class, with test coverage (Bryan Newbold, 2021-10-14, 3 files, -0/+88)
| |
* | Merge branch 'bnewbold-match-get' (Bryan Newbold, 2021-11-02, 1 file, -3/+9)
|\ \
| * | access: populate thumbnail_url for PDFs (Bryan Newbold, 2021-10-18, 1 file, -3/+9)
| |/
* / pubmed: switch default http site to retrieve update files (Martin Czygan, 2021-10-15, 1 file, -2/+4)
|/    Proxy started to throw: "dial tcp: lookup ftp.ncbi.nlm.nih.gov on [::1]:53: read udp [::1]:45178->[::1]:53: read: connection refused". NIH has an HTTP version of its own; try to use that.
* dblp import: basic support for handles as identifiers (Bryan Newbold, 2021-10-13, 1 file, -1/+5)
|
* python: normalization/validation support for handle identifiers (hdl) (Bryan Newbold, 2021-10-13, 1 file, -0/+33)
|
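The handle normalization/validation commit above does not show its rules, so the following is only a sketch under stated assumptions: `clean_handle` is a hypothetical name, the accepted resolver prefixes are assumptions, and a valid handle is assumed to be `<prefix>/<suffix>` with a dot-separated numeric prefix. Case is preserved rather than guessed at.

```python
import re
from typing import Optional

# Assumed handle shape: numeric, dot-separated prefix, "/", non-empty suffix.
# This is an illustrative rule, not necessarily what fatcat_tools enforces.
HANDLE_RE = re.compile(r"\d+(\.\d+)*/\S+")

# Assumed input forms to strip before validating.
HANDLE_PREFIXES = (
    "https://hdl.handle.net/",
    "http://hdl.handle.net/",
    "hdl:",
)


def clean_handle(raw: str) -> Optional[str]:
    """Hypothetical normalizer: strip resolver URL or 'hdl:' prefix, then validate."""
    handle = raw.strip()
    for prefix in HANDLE_PREFIXES:
        if handle.lower().startswith(prefix):
            handle = handle[len(prefix):]
            break
    # fullmatch requires the whole string to fit the assumed shape.
    if not HANDLE_RE.fullmatch(handle):
        return None
    return handle
```

Because DOIs are themselves registered in the Handle System, a DOI such as `10.1234/abc` also passes this check; an importer that stores both identifiers separately would need an extra rule to tell them apart.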
* dblp import: fix typos in identifier parsing (Bryan Newbold, 2021-10-13, 1 file, -2/+1)
|
* python: partial importer utilization of new schema changes (Bryan Newbold, 2021-10-13, 3 files, -6/+18)
|
* python: implement ES schema changes (Bryan Newbold, 2021-10-13, 1 file, -4/+17)
|
* Merge branch 'bnewbold-ingest-tweaks' into 'master' (bnewbold, 2021-10-02, 3 files, -39/+106)
|\    ingest importer behavior tweaks
| |   See merge request webgroup/fatcat!120
| * kafka import: optional 'force-flush' mode for some importers (Bryan Newbold, 2021-10-01, 1 file, -0/+13)
| |   Behavior and motivation described in the kafka json import comment.
| * new SPN web (html) importer (Bryan Newbold, 2021-10-01, 2 files, -27/+81)
| |
| * ingest importer behavior tweaks (Bryan Newbold, 2021-10-01, 1 file, -8/+8)
| |   - change order of 'want()' checks, so that result counts are clearer
| |   - don't require GROBID success for file imports with SPN
| * importer common: more verbose logging (with counts) (Bryan Newbold, 2021-10-01, 1 file, -4/+4)
| |
* | datacite: skip empty abstracts (Martin Czygan, 2021-10-01, 1 file, -1/+4)
|/    Do not add abstracts where `clean` results in the empty string; this violates a constraint: `either abstract_sha1 or content is required`.
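The datacite fix can be sketched as a filter applied after cleaning. Here `clean` is a simplified, hypothetical stand-in (whitespace collapsing only; the importer's real helper does more), and abstracts are plain dicts rather than fatcat API objects.

```python
import re
from typing import Any, Dict, List, Optional


def clean(text: str) -> str:
    """Hypothetical stand-in for the importer's `clean`: collapse whitespace."""
    return re.sub(r"\s+", " ", text).strip()


def build_abstracts(raw_abstracts: List[str]) -> Optional[List[Dict[str, Any]]]:
    """Hypothetical: turn raw abstract strings into abstract dicts."""
    abstracts: List[Dict[str, Any]] = []
    for raw in raw_abstracts:
        content = clean(raw)
        # Skip abstracts that clean down to "": an entry with empty content
        # would violate "either abstract_sha1 or content is required".
        if not content:
            continue
        abstracts.append({"content": content, "mimetype": "text/plain"})
    # Return None rather than an empty list when nothing survives.
    return abstracts or None
```

Checking *after* cleaning matters: a raw string of only whitespace or markup is non-empty before `clean` runs, so a check on the raw input would let it through.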
* pubmed: work around a networking issue (Martin Czygan, 2021-09-09, 1 file, -24/+21)
|   Use an HTTP proxy (https://github.com/miku/ftpup) to fetch files from FTP, keeping some retry logic; also, hardcode the proxy path, as this should be a temporary workaround.
* pubmed: add option to ftp download with lftp (Martin Czygan, 2021-09-08, 1 file, -2/+31)
|   lftp is a classic command-line FTP client, and we hope that its retry capabilities are enough of a workaround for the current networking issue.
* pubmed harvester: add basic retry logic (Martin Czygan, 2021-08-20, 1 file, -8/+21)
|   Related to a previous issue with seemingly random EOFError from FTP connections, this patch wraps the "ftpretr" helper function with a basic retry.
|   Refs: fatcat-workers/issues/92151, fatcat-workers/issues/91102
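A generic retry wrapper of the kind this commit describes might look like the sketch below. The retry count, fixed delay, and the choice to retry only on `EOFError` are assumptions; the log does not show the actual parameters of the "ftpretr" wrapper.

```python
import time
from typing import Callable, Tuple, Type, TypeVar

T = TypeVar("T")


def with_retry(
    fn: Callable[[], T],
    retries: int = 3,
    delay: float = 1.0,
    retry_on: Tuple[Type[BaseException], ...] = (EOFError,),
) -> T:
    """Call fn(), retrying on the given exceptions with a fixed delay.

    Hypothetical sketch; re-raises the last exception once retries run out.
    """
    for attempt in range(retries):
        try:
            return fn()
        except retry_on:
            if attempt == retries - 1:
                raise  # out of attempts: surface the original error
            time.sleep(delay)
    raise ValueError("retries must be >= 1")
```

Usage would look like `with_retry(lambda: ftpretr(url))`, where `ftpretr(url)` stands in for the harvester's helper and its signature is assumed. Retrying only on a narrow exception tuple keeps genuine failures (bad paths, permission errors) from being silently retried.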
* refs: default to *not* consolidating works (Bryan Newbold, 2021-08-06, 1 file, -1/+1)
|   We don't handle counts for consolidated refs yet, so just don't consolidate. This should fix, e.g., "Showing 1-18 of 19" type UX confusion, with the trade-off that some works will be duplicated in inbound ref tables.
* refs: lint fixes (Bryan Newbold, 2021-07-27, 1 file, -0/+1)
|