| Commit message | Author | Age | Files | Lines |
* | crossref persist: make GROBID ref parsing an option (not default) | Bryan Newbold | 2021-11-04 | 3 | -9/+33 |
* | glue, utils, and worker code for crossref and grobid_refs | Bryan Newbold | 2021-11-04 | 4 | -5/+212 |
* | iterated GROBID citation cleaning and processing | Bryan Newbold | 2021-11-04 | 1 | -27/+45 |
* | grobid citations: first pass at cleaning unstructured | Bryan Newbold | 2021-11-04 | 1 | -2/+34 |
* | initial crossref-refs via GROBID helper routine | Bryan Newbold | 2021-11-04 | 7 | -6/+839 |
* | pipenv: bump grobid_tei_xml version to 0.1.2 | Bryan Newbold | 2021-11-04 | 2 | -11/+11 |
* | pdftrio client: use HTTP session for POSTs | Bryan Newbold | 2021-11-03 | 1 | -1/+1 |
* | workers: use HTTP session for archive.org fetches | Bryan Newbold | 2021-11-03 | 1 | -3/+3 |
* | IA (wayback): actually use an HTTP session for replay fetches | Bryan Newbold | 2021-11-03 | 1 | -2/+3 |
* | updates/corrections to old small.json GROBID metadata example file | Bryan Newbold | 2021-10-27 | 1 | -6/+1 |
* | remove grobid2json helper file, replace with grobid_tei_xml | Bryan Newbold | 2021-10-27 | 7 | -224/+22 |
* | small type annotation things from additional packages | Bryan Newbold | 2021-10-27 | 2 | -5/+14 |
* | toolchain config updates | Bryan Newbold | 2021-10-27 | 3 | -10/+6 |
* | make fmt (black 21.9b0) | Bryan Newbold | 2021-10-27 | 57 | -3126/+3991 |
* | pipenv: flipflop from yapf back to black; more type packages; bump grobid_tei... | Bryan Newbold | 2021-10-27 | 2 | -27/+112 |
* | fileset: refactor out tables of helpers | Bryan Newbold | 2021-10-27 | 3 | -21/+19 |
* | fix type annotations for petabox body fetch helper | Bryan Newbold | 2021-10-26 | 5 | -8/+11 |
* | small type annotation hack | Bryan Newbold | 2021-10-26 | 1 | -1/+1 |
* | fileset: fix field renaming bug (caught by mypy) | Bryan Newbold | 2021-10-26 | 1 | -2/+2 |
* | fileset ingest: fix table name typo (via mypy) | Bryan Newbold | 2021-10-26 | 1 | -1/+1 |
* | update 'XXX' notes from fileset ingest development | Bryan Newbold | 2021-10-26 | 2 | -9/+6 |
* | bugfix: setting html_biblio on ingest results | Bryan Newbold | 2021-10-26 | 2 | -2/+2 |
* | lint collection membership (last lint for now) | Bryan Newbold | 2021-10-26 | 7 | -32/+32 |
* | commit updated flake8 lint configuration | Bryan Newbold | 2021-10-26 | 1 | -6/+10 |
* | ingest fileset: fix silly import typo | Bryan Newbold | 2021-10-26 | 1 | -1/+1 |
* | type annotations for persist workers; required some work | Bryan Newbold | 2021-10-26 | 1 | -66/+59 |
* | ingest file HTTP API: fixes from type checking | Bryan Newbold | 2021-10-26 | 1 | -3/+3 |
* | more progress on type annotations | Bryan Newbold | 2021-10-26 | 8 | -34/+55 |
* | grobid: fix a bug with consolidate_mode header, exposed by type annotations | Bryan Newbold | 2021-10-26 | 1 | -1/+2 |
* | grobid: type annotations | Bryan Newbold | 2021-10-26 | 1 | -9/+19 |
* | type annotations on SandcrawlerWorker | Bryan Newbold | 2021-10-26 | 1 | -46/+57 |
* | more progress on type annotations and linting | Bryan Newbold | 2021-10-26 | 11 | -55/+87 |
* | live tests: FTP wayback replay now returns 200, not 226 | Bryan Newbold | 2021-10-26 | 1 | -2/+2 |
* | ia: more tweaks to delicate code to satisfy type checker | Bryan Newbold | 2021-10-26 | 1 | -10/+12 |
* | ia helpers: enforce max_redirects count correctly | Bryan Newbold | 2021-10-26 | 1 | -1/+1 |
* | set CDX request params are str, not int or datetime | Bryan Newbold | 2021-10-26 | 1 | -3/+6 |
* | bugfix: was setting 'from' parameter as a tuple, not a string | Bryan Newbold | 2021-10-26 | 1 | -1/+1 |
* | start type annotating IA helper code | Bryan Newbold | 2021-10-26 | 1 | -37/+65 |
* | start adding python type annotations to db and persist code | Bryan Newbold | 2021-10-26 | 2 | -97/+124 |
* | Makefile: don't fail on isort error (consider these minor) | Bryan Newbold | 2021-10-26 | 1 | -1/+1 |
* | tweak flake8 config | Bryan Newbold | 2021-10-26 | 1 | -2/+11 |
* | flake8 clean (with current settings) | Bryan Newbold | 2021-10-26 | 9 | -25/+24 |
* | pipenv: import type annotations for requests and dateparser | Bryan Newbold | 2021-10-26 | 2 | -1/+19 |
* | start handling trivial lint cleanups: unused imports, 'is None', etc | Bryan Newbold | 2021-10-26 | 30 | -149/+86 |
* | make fmt | Bryan Newbold | 2021-10-26 | 59 | -1225/+1582 |
* | tweak lint/fmt settings | Bryan Newbold | 2021-10-26 | 2 | -4/+6 |
* | update pytest warning filters (they are pretty expansive) | Bryan Newbold | 2021-10-26 | 1 | -0/+3 |
* | ingest_html: update trafilatura TEI-XML output kwarg | Bryan Newbold | 2021-10-26 | 1 | -1/+1 |
* | python: isort all imports | Bryan Newbold | 2021-10-26 | 57 | -178/+207 |
* | add pyproject.toml (for isort and yapf config), and update 'lint' and 'fmt' m... | Bryan Newbold | 2021-10-26 | 2 | -3/+13 |