 | Commit message | Author | Age | Files | Lines | |
---|---|---|---|---|---|---|
... | ||||||
* | small type annotation things from additional packages | Bryan Newbold | 2021-10-27 | 2 | -5/+14 | |
| | ||||||
* | make fmt (black 21.9b0) | Bryan Newbold | 2021-10-27 | 18 | -1840/+2332 | |
| | ||||||
* | fileset: refactor out tables of helpers | Bryan Newbold | 2021-10-27 | 3 | -21/+19 | |
| | Having these objects invoked in tables resulted in a whole bunch of objects (including children) getting initialized, which seems like the wrong thing to do. Defer this until the actual ingest fileset worker is initialized. | |||||
* | fix type annotations for petabox body fetch helper | Bryan Newbold | 2021-10-26 | 5 | -8/+11 | |
| | ||||||
* | small type annotation hack | Bryan Newbold | 2021-10-26 | 1 | -1/+1 | |
| | ||||||
* | fileset: fix field renaming bug (caught by mypy) | Bryan Newbold | 2021-10-26 | 1 | -2/+2 | |
| | ||||||
* | fileset ingest: fix table name typo (via mypy) | Bryan Newbold | 2021-10-26 | 1 | -1/+1 | |
| | ||||||
* | update 'XXX' notes from fileset ingest development | Bryan Newbold | 2021-10-26 | 2 | -9/+6 | |
| | ||||||
* | bugfix: setting html_biblio on ingest results | Bryan Newbold | 2021-10-26 | 2 | -2/+2 | |
| | This was caught during lint cleanup. | |||||
* | lint collection membership (last lint for now) | Bryan Newbold | 2021-10-26 | 7 | -32/+32 | |
| | ||||||
* | ingest fileset: fix silly import typo | Bryan Newbold | 2021-10-26 | 1 | -1/+1 | |
| | ||||||
* | type annotations for persist workers; required some work | Bryan Newbold | 2021-10-26 | 1 | -66/+59 | |
| | Had to restructure and filter things a bit. Should be better behavior, but there might be some small changes. | |||||
* | ingest file HTTP API: fixes from type checking | Bryan Newbold | 2021-10-26 | 1 | -3/+3 | |
| | This code is deprecated and should be removed anyway, but it is still interesting to see the fixes. | |||||
* | more progress on type annotations | Bryan Newbold | 2021-10-26 | 8 | -34/+55 | |
| | ||||||
* | grobid: fix a bug with consolidate_mode header, exposed by type annotations | Bryan Newbold | 2021-10-26 | 1 | -1/+2 | |
| | ||||||
* | grobid: type annotations | Bryan Newbold | 2021-10-26 | 1 | -9/+19 | |
| | ||||||
* | type annotations on SandcrawlerWorker | Bryan Newbold | 2021-10-26 | 1 | -46/+57 | |
| | These annotations have a broad impact! Being conservative to start: Any-to-Any for process(), etc. | |||||
* | more progress on type annotations and linting | Bryan Newbold | 2021-10-26 | 8 | -49/+80 | |
| | ||||||
* | ia: more tweaks to delicate code to satisfy type checker | Bryan Newbold | 2021-10-26 | 1 | -10/+12 | |
| | Ran the 'live' wayback tests after this commit as a check, and they worked (once the FTP status code behavior change is fixed). | |||||
* | ia helpers: enforce max_redirects count correctly | Bryan Newbold | 2021-10-26 | 1 | -1/+1 | |
| | AKA, should run fetch even if max_redirects = 0; the first loop iteration is not a redirect. | |||||
* | ensure CDX request params are str, not int or datetime | Bryan Newbold | 2021-10-26 | 1 | -3/+6 | |
| | This might be a bugfix, changing CDX lookup behavior? | |||||
* | bugfix: was setting 'from' parameter as a tuple, not a string | Bryan Newbold | 2021-10-26 | 1 | -1/+1 | |
| | ||||||
* | start type annotating IA helper code | Bryan Newbold | 2021-10-26 | 1 | -37/+65 | |
| | ||||||
* | start adding python type annotations to db and persist code | Bryan Newbold | 2021-10-26 | 2 | -97/+124 | |
| | ||||||
* | flake8 clean (with current settings) | Bryan Newbold | 2021-10-26 | 7 | -24/+22 | |
| | ||||||
* | start handling trivial lint cleanups: unused imports, 'is None', etc | Bryan Newbold | 2021-10-26 | 15 | -97/+57 | |
| | ||||||
* | make fmt | Bryan Newbold | 2021-10-26 | 19 | -571/+741 | |
| | ||||||
* | ingest_html: update trafilatura TEI-XML output kwarg | Bryan Newbold | 2021-10-26 | 1 | -1/+1 | |
| | ||||||
* | python: isort all imports | Bryan Newbold | 2021-10-26 | 18 | -99/+108 | |
| | ||||||
* | more small fileset ingest tweaks | Bryan Newbold | 2021-10-26 | 2 | -6/+21 | |
| | ||||||
* | persist support for ingest platform table, using existing persist worker | Bryan Newbold | 2021-10-15 | 2 | -2/+129 | |
| | ||||||
* | improve fileset ingest integration with file ingest | Bryan Newbold | 2021-10-15 | 3 | -5/+24 | |
| | ||||||
* | more fileset iteration | Bryan Newbold | 2021-10-15 | 4 | -45/+80 | |
| | ||||||
* | move SPNv2 'simple_get' logic to SPN client | Bryan Newbold | 2021-10-15 | 3 | -52/+31 | |
| | ||||||
* | filesets: iteration of implementation and docs | Bryan Newbold | 2021-10-15 | 4 | -82/+148 | |
| | ||||||
* | fileset ingest: improve platform parsing | Bryan Newbold | 2021-10-15 | 1 | -12/+196 | |
| | ||||||
* | fileset ingest: improve error handling | Bryan Newbold | 2021-10-15 | 4 | -48/+106 | |
| | ||||||
* | initial implementation of zenodo platform import | Bryan Newbold | 2021-10-15 | 1 | -0/+100 | |
| | ||||||
* | initial figshare platform helper | Bryan Newbold | 2021-10-15 | 1 | -0/+95 | |
| | ||||||
* | improvements to platform helpers | Bryan Newbold | 2021-10-15 | 3 | -34/+44 | |
| | ||||||
* | component ingest support for dataverse files (individual) | Bryan Newbold | 2021-10-15 | 2 | -13/+31 | |
| | ||||||
* | progress on web ingest strategy | Bryan Newbold | 2021-10-15 | 3 | -12/+121 | |
| | ||||||
* | fileset ingest progress for dataverse | Bryan Newbold | 2021-10-15 | 4 | -23/+291 | |
| | ||||||
* | local-file version of gen_file_metadata | Bryan Newbold | 2021-10-15 | 2 | -2/+43 | |
| | ||||||
* | progress on dataset ingest | Bryan Newbold | 2021-10-15 | 4 | -122/+333 | |
| | ||||||
* | wrap up previous renaming work | Bryan Newbold | 2021-10-15 | 3 | -5/+3 | |
| | ||||||
* | progress on fileset/dataset ingest | Bryan Newbold | 2021-10-15 | 4 | -0/+403 | |
| | ||||||
* | refactoring; progress on filesets | Bryan Newbold | 2021-10-15 | 2 | -1/+7 | |
| | ||||||
* | rename some python files for clarity | Bryan Newbold | 2021-10-15 | 2 | -0/+0 | |
| | ||||||
* | pdf ingest: journals.uchicago.edu pattern | Bryan Newbold | 2021-10-11 | 1 | -0/+8 | |
| |