Commit message | Author | Age | Files | Lines | |
---|---|---|---|---|---|
* | wayback short ts: add regression test for dupe URLs | Bryan Newbold | 2021-11-09 | 1 | -0/+44 |
| | |||||
* | short wayback ts: initial cleanup script implementation | Bryan Newbold | 2021-11-09 | 1 | -0/+251 |
| | |||||
* | cleanups: create a separate JsonLinePusher for cleanup workers (distinct base class) | Bryan Newbold | 2021-11-03 | 2 | -2/+19 |
| | |||||
* | datacite importer: remove unused 'year_only' variable | Bryan Newbold | 2021-11-03 | 1 | -2/+3 |
| | |||||
* | pubmed harvester: remove unused variables | Bryan Newbold | 2021-11-03 | 1 | -2/+2 |
| | |||||
* | pubmed harvester: explicit assertions to mark unreachable code paths | Bryan Newbold | 2021-11-03 | 1 | -0/+2 |
| | |||||
* | typing: add assertions to fatcat_tools code to make type assumptions explicit | Bryan Newbold | 2021-11-03 | 3 | -0/+3 |
| | |||||
* | typing: add annotations to remaining fatcat_tools code | Bryan Newbold | 2021-11-03 | 9 | -122/+186 |
| | | | | | Again, these are just annotations, no changes made to get type checks to pass | ||||
* | datacite: add comment about potential date parsing bug | Bryan Newbold | 2021-11-03 | 1 | -0/+1 |
| | |||||
* | datacite importer: dateparser.date.DateDataParser() | Bryan Newbold | 2021-11-03 | 1 | -1/+1 |
| | | | | Perhaps this was a change when upgrading 'dateparser'? | ||||
* | more involved type wrangling and fixes for importers | Bryan Newbold | 2021-11-03 | 3 | -12/+14 |
| | |||||
* | typing: relatively simple type check fixes | Bryan Newbold | 2021-11-03 | 14 | -87/+82 |
| | | | | | | | These mostly add new variable names so that existing variables aren't overwritten with a new type; delay coercing '{}' or '[]' to 'None' until the last minute; add is-not-None checks to conditional clauses; and make similar small changes. | ||||
* | typing: initial annotations on importers | Bryan Newbold | 2021-11-03 | 22 | -274/+443 |
| | | | | | This commit just adds the type annotations; it doesn't fix any code to make type checking pass. | ||||
* | typing: first batch of python bulk type annotations | Bryan Newbold | 2021-11-03 | 9 | -69/+129 |
| | | | | | | While these changes are more delicate than simple lint changes, this specific batch of edits and annotations was *relatively* simple, and resulted in few code changes other than function signature additions. | ||||
* | importers: remove unused __main__ routine | Bryan Newbold | 2021-11-03 | 4 | -19/+0 |
| | | | | | | These were perhaps used in initial development or testing? fatcat_import.py is the correct way to do these imports, even for testing/development. | ||||
* | lint: resolve existing mypy type errors | Bryan Newbold | 2021-11-02 | 8 | -50/+86 |
| | | | | | | | | | Adds annotations and reworks dataflow to satisfy existing mypy issues, without adding any additional type annotations to, e.g., function signatures. There will probably be many more type errors when annotations are all added. | ||||
* | re-fix some lint issues after big 'fmt' | Bryan Newbold | 2021-11-02 | 2 | -4/+5 |
| | |||||
* | fmt (black): fatcat_tools/ | Bryan Newbold | 2021-11-02 | 43 | -3194/+4020 |
| | |||||
* | python: isort everything | Bryan Newbold | 2021-11-02 | 32 | -71/+116 |
| | |||||
* | arabesque import 'hit' field is 1/0, not true/false | Bryan Newbold | 2021-11-02 | 1 | -2/+2 |
| | |||||
* | lint: simple, safe inline lint fixes | Bryan Newbold | 2021-11-02 | 18 | -83/+82 |
| | | | | '==' vs 'is'; 'not a in b' vs 'a not in b'; etc | ||||
* | lint/fmt: remove all 'import *' | Bryan Newbold | 2021-11-02 | 5 | -21/+41 |
| | |||||
* | entity transforms: add basic type annotations | Bryan Newbold | 2021-11-02 | 1 | -7/+19 |
| | |||||
* | ftfy 'fix_entities' argument has been renamed | Bryan Newbold | 2021-11-02 | 1 | -4/+4 |
| | |||||
* | hacks to work around new pylint false positives | Bryan Newbold | 2021-11-02 | 1 | -2/+3 |
| | |||||
* | cleanup imports after fatcat_tools.transforms change | Bryan Newbold | 2021-11-02 | 1 | -5/+8 |
| | |||||
* | re-fmt all the fatcat_tools __init__ files for readability | Bryan Newbold | 2021-11-02 | 5 | -30/+62 |
| | |||||
* | remove 'import *' from fatcat_tools (for transforms) | Bryan Newbold | 2021-11-02 | 1 | -2/+2 |
| | |||||
* | small python tweaks for annotations, imports | Bryan Newbold | 2021-11-02 | 3 | -3/+7 |
| | |||||
* | try some type annotations | Bryan Newbold | 2021-11-02 | 4 | -70/+79 |
| | |||||
* | reviewer: add annotations required by mypy | Bryan Newbold | 2021-11-02 | 1 | -2/+3 |
| | |||||
* | fix missing variable in fileset ingest | Bryan Newbold | 2021-11-02 | 1 | -2/+1 |
| | |||||
* | Merge branch 'bnewbold-import-fileset' | Bryan Newbold | 2021-11-02 | 5 | -4/+350 |
|\ | |||||
| * | WIP: more fileset ingest | Bryan Newbold | 2021-10-18 | 1 | -13/+21 |
| | | |||||
| * | WIP: rel fixes | Bryan Newbold | 2021-10-14 | 1 | -6/+6 |
| | | |||||
| * | fileset ingest small tweaks | Bryan Newbold | 2021-10-14 | 1 | -21/+36 |
| | | |||||
| * | initial implementation of fileset ingest importers | Bryan Newbold | 2021-10-14 | 2 | -3/+224 |
| | | |||||
| * | ingest: handle datasets, components, other ingest types | Bryan Newbold | 2021-10-14 | 1 | -1/+15 |
| | | |||||
| * | generic fileset importer class, with test coverage | Bryan Newbold | 2021-10-14 | 3 | -0/+88 |
| | | |||||
* | | Merge branch 'bnewbold-match-get' | Bryan Newbold | 2021-11-02 | 1 | -3/+9 |
|\ \ | |||||
| * | | access: populate thumbnail_url for PDFs | Bryan Newbold | 2021-10-18 | 1 | -3/+9 |
| |/ | |||||
* / | pubmed: switch default http site to retrieve update files | Martin Czygan | 2021-10-15 | 1 | -2/+4 |
|/ | | | | | | | Proxy started to throw: "dial tcp: lookup ftp.ncbi.nlm.nih.gov on [::1]:53: read udp [::1]:45178->[::1]:53: read: connection refused". NIH has an HTTP version of its own; try to use that. | ||||
* | dblp import: basic support for handles as identifiers | Bryan Newbold | 2021-10-13 | 1 | -1/+5 |
| | |||||
* | python: normalization/validation support for handle identifiers (hdl) | Bryan Newbold | 2021-10-13 | 1 | -0/+33 |
| | |||||
* | dblp import: fix typos in identifier parsing | Bryan Newbold | 2021-10-13 | 1 | -2/+1 |
| | |||||
* | python: partial importer utilization of new schema changes | Bryan Newbold | 2021-10-13 | 3 | -6/+18 |
| | |||||
* | python: implement ES schema changes | Bryan Newbold | 2021-10-13 | 1 | -4/+17 |
| | |||||
* | Merge branch 'bnewbold-ingest-tweaks' into 'master' | bnewbold | 2021-10-02 | 3 | -39/+106 |
|\ | | | | | | | | | ingest importer behavior tweaks. See merge request webgroup/fatcat!120 | ||||
| * | kafka import: optional 'force-flush' mode for some importers | Bryan Newbold | 2021-10-01 | 1 | -0/+13 |
| | | | | | | | | Behavior and motivation described in the kafka json import comment. | ||||
| * | new SPN web (html) importer | Bryan Newbold | 2021-10-01 | 2 | -27/+81 |
| | |