| | Commit message | Author | Age | Files | Lines |
---|---|---|---|---|---|
* | schema: optional 'fetched' field on bundles | Bryan Newbold | 2020-10-16 | 1 | -0/+2 |
* | make fmt | Bryan Newbold | 2020-09-13 | 1 | -6/+12 |
* | ref transform: support more GROBID fields | Bryan Newbold | 2020-09-13 | 1 | -1/+4 |
* | URL cleanup helper | Bryan Newbold | 2020-09-13 | 1 | -0/+28 |
* | heavy to refs command | Bryan Newbold | 2020-09-04 | 1 | -0/+36 |
* | handle small ints better (signed/unsigned abs size) | Bryan Newbold | 2020-08-12 | 1 | -1/+2 |
* | transform: more string cleaning | Bryan Newbold | 2020-08-12 | 1 | -12/+59 |
* | volume_int/issue_int as actual ints | Bryan Newbold | 2020-08-06 | 1 | -2/+2 |
* | handle integer conversion and bounding for ES schema | Bryan Newbold | 2020-08-06 | 1 | -9/+22 |
* | scrub_text: single-token strings skipped | Bryan Newbold | 2020-08-06 | 1 | -0/+4 |
* | strip ACKNOWLEDGEMENTS prefix | Bryan Newbold | 2020-08-06 | 1 | -0/+1 |
* | transform: catch more cases of null extra | Bryan Newbold | 2020-07-30 | 1 | -10/+10 |
* | abstracts: more prefixes to ignore | Bryan Newbold | 2020-07-27 | 1 | -0/+3 |
* | strip <em> tags explicitly | Bryan Newbold | 2020-07-21 | 1 | -0/+1 |
* | handle large/bad 'first_page' metadata | Bryan Newbold | 2020-06-29 | 1 | -0/+3 |
* | more conservative container_original_name | Bryan Newbold | 2020-06-29 | 1 | -0/+2 |
* | fix lint errors (and some small bugs) | Bryan Newbold | 2020-06-29 | 1 | -2/+1 |
* | fixes to schema parsing from prod | Bryan Newbold | 2020-06-29 | 1 | -9/+13 |
* | include GROBID-extracted abstracts in search documents | Bryan Newbold | 2020-06-29 | 1 | -0/+8 |
* | fetch pdftotext and pdf_meta from blobs, postgrest | Bryan Newbold | 2020-06-29 | 1 | -4/+5 |
* | commit production work-around (temporarily) | Bryan Newbold | 2020-06-04 | 1 | -1/+2 |
* | collapse pages by SIM issue | Bryan Newbold | 2020-06-04 | 1 | -0/+1 |
* | fmt | Bryan Newbold | 2020-06-04 | 1 | -0/+2 |
* | start some annotaition fixes for pytype | Bryan Newbold | 2020-06-03 | 1 | -1/+3 |
* | more flake8 | Bryan Newbold | 2020-06-03 | 1 | -1/+1 |
* | flake8 fixes (partial) | Bryan Newbold | 2020-06-03 | 1 | -1/+1 |
* | reformat python code with black | Bryan Newbold | 2020-06-03 | 1 | -38/+64 |
* | improve text scrubbing | Bryan Newbold | 2020-06-03 | 1 | -13/+21 |
* | add prefix scrubing (esp. for abstracts) | Bryan Newbold | 2020-05-21 | 1 | -0/+18 |
* | use beautiful soup for XML scrubing | Bryan Newbold | 2020-05-21 | 1 | -7/+6 |
* | be more inclusive of author names | Bryan Newbold | 2020-05-21 | 1 | -4/+4 |
* | fixes from manual testing | Bryan Newbold | 2020-05-20 | 1 | -7/+11 |
* | first pass transform from pipelines to ES schema | Bryan Newbold | 2020-05-20 | 1 | -0/+334 |