| | Commit message | Author | Date | Files | Lines (-/+) |
|---|---|---|---|---|---|
| * | schema: use container redirect as ident if defined | Bryan Newbold | 2021-11-30 | 1 | -2/+2 |
| | This is to handle containers which have been merged (redirected), but whose release entities have not yet been updated to point to the new "primary" container. | | | | |
| * | refactor use of grobid_tei_xml | Bryan Newbold | 2021-10-27 | 1 | -4/+5 |
| | | |||||
| * | scrub_text: remove unused mimetype arg | Bryan Newbold | 2021-10-27 | 1 | -1/+1 |
| | To resolve a warning caught by pytype. | | | | |
| * | replace classmethods with staticmethods | Bryan Newbold | 2021-10-27 | 1 | -2/+2 |
| | | |||||
| * | lint: small cleanups, mostly E711 and E713 | Bryan Newbold | 2021-10-27 | 1 | -10/+10 |
| | | |||||
| * | make fmt (black 21.9b0) | Bryan Newbold | 2021-10-27 | 1 | -1/+0 |
| | | |||||
| * | re-style imports (isort) on all core python files | Bryan Newbold | 2021-10-27 | 1 | -6/+7 |
| | | |||||
| * | better parsing of year as integer in refs pipeline | Bryan Newbold | 2021-07-26 | 1 | -2/+6 |
| | | |||||
| * | bibref: add version field; isbn13 -> isbn | Bryan Newbold | 2021-07-25 | 1 | -1/+2 |
| | | |||||
| * | refs transform: 1-index refs.index, not 0-index | Bryan Newbold | 2021-07-25 | 1 | -1/+1 |
| | This did not match the expectations/schema of the downstream refs pipeline (cgraph), nor the documented schema. Note that care is required when checking whether the index is set, to distinguish between `0` and `None` values. | | | | |
| * | refs: include (source) release_stage in output | Bryan Newbold | 2021-06-30 | 1 | -0/+1 |
| | | |||||
| * | schema: add 'crossref' to bundle schema, and add from_json() helper | Bryan Newbold | 2021-06-02 | 1 | -1/+20 |
| | from_json() refactor was an earlier TODO, to reduce duplication when updating fields on this class. | | | | |
| * | indexing: defer to creator.display_name over contrib.raw_name | Bryan Newbold | 2021-04-12 | 1 | -1/+3 |
| | | |||||
| * | catch HTML parsing error from within html (via bs4) | Bryan Newbold | 2021-02-01 | 1 | -2/+9 |
| | | |||||
| * | bugfix: container_sherpa_color not defined | Bryan Newbold | 2021-01-29 | 1 | -1/+1 |
| | | |||||
| * | make fmt | Bryan Newbold | 2021-01-25 | 1 | -1/+3 |
| | | |||||
| * | basic support for excluding web content from index | Bryan Newbold | 2021-01-22 | 1 | -0/+14 |
| | Based on particular patterns in metadata, or on exclusion lists in settings. | | | | |
| * | add container_sherpa_color field, and populate it | Bryan Newbold | 2021-01-22 | 1 | -18/+18 |
| | | |||||
| * | refactor DOI domain lookup into python code; expand table | Bryan Newbold | 2021-01-21 | 1 | -0/+14 |
| | | |||||
| * | citation: fixes to generic hack; remove bibtex hack | Bryan Newbold | 2021-01-21 | 1 | -31/+6 |
| | | |||||
| * | fixup: check for container.extra in indexing pipeline | Bryan Newbold | 2021-01-21 | 1 | -1/+3 |
| | | |||||
| * | fix indexing bug (false-y publisher_type?) | Bryan Newbold | 2021-01-18 | 1 | -0/+2 |
| | | |||||
| * | lint: fix small bugs and type annotations | Bryan Newbold | 2021-01-18 | 1 | -1/+2 |
| | | |||||
| * | small corrections to schema/transform | Bryan Newbold | 2021-01-16 | 1 | -1/+4 |
| | | |||||
| * | make fmt | Bryan Newbold | 2021-01-15 | 1 | -6/+6 |
| | | |||||
| * | crude bibtex and citation formatting, as a demo | Bryan Newbold | 2021-01-14 | 1 | -0/+49 |
| | | |||||
| * | schema: make fulltext body optional (e.g., for search results) | Bryan Newbold | 2021-01-14 | 1 | -1/+1 |
| | | |||||
| * | add support for new identifiers and size_bytes schema fields | Bryan Newbold | 2021-01-14 | 1 | -4/+13 |
| | | |||||
| * | add basic html fulltext support to fetch pipeline | Bryan Newbold | 2020-11-18 | 1 | -0/+1 |
| | | |||||
| * | schema: optional 'fetched' field on bundles | Bryan Newbold | 2020-10-16 | 1 | -0/+2 |
| | | |||||
| * | make fmt | Bryan Newbold | 2020-09-13 | 1 | -6/+12 |
| | | |||||
| * | ref transform: support more GROBID fields | Bryan Newbold | 2020-09-13 | 1 | -1/+4 |
| | | |||||
| * | URL cleanup helper | Bryan Newbold | 2020-09-13 | 1 | -0/+28 |
| | | |||||
| * | heavy to refs command | Bryan Newbold | 2020-09-04 | 1 | -0/+36 |
| | | |||||
| * | handle small ints better (signed/unsigned abs size) | Bryan Newbold | 2020-08-12 | 1 | -1/+2 |
| | | |||||
| * | transform: more string cleaning | Bryan Newbold | 2020-08-12 | 1 | -12/+59 |
| | | |||||
| * | volume_int/issue_int as actual ints | Bryan Newbold | 2020-08-06 | 1 | -2/+2 |
| | | |||||
| * | handle integer conversion and bounding for ES schema | Bryan Newbold | 2020-08-06 | 1 | -9/+22 |
| | | |||||
| * | scrub_text: single-token strings skipped | Bryan Newbold | 2020-08-06 | 1 | -0/+4 |
| | | |||||
| * | strip ACKNOWLEDGEMENTS prefix | Bryan Newbold | 2020-08-06 | 1 | -0/+1 |
| | | |||||
| * | transform: catch more cases of null extra | Bryan Newbold | 2020-07-30 | 1 | -10/+10 |
| | Also correctly pull issne/issnp from container.extra, not release.extra. | | | | |
| * | abstracts: more prefixes to ignore | Bryan Newbold | 2020-07-27 | 1 | -0/+3 |
| | | |||||
| * | strip <em> tags explicitly | Bryan Newbold | 2020-07-21 | 1 | -0/+1 |
| | | |||||
| * | handle large/bad 'first_page' metadata | Bryan Newbold | 2020-06-29 | 1 | -0/+3 |
| | This was causing Elasticsearch indexing errors. | | | | |
| * | more conservative container_original_name | Bryan Newbold | 2020-06-29 | 1 | -0/+2 |
| | | |||||
| * | fix lint errors (and some small bugs) | Bryan Newbold | 2020-06-29 | 1 | -2/+1 |
| | | |||||
| * | fixes to schema parsing from prod | Bryan Newbold | 2020-06-29 | 1 | -9/+13 |
| | | |||||
| * | include GROBID-extracted abstracts in search documents | Bryan Newbold | 2020-06-29 | 1 | -0/+8 |
| | | |||||
| * | fetch pdftotext and pdf_meta from blobs, postgrest | Bryan Newbold | 2020-06-29 | 1 | -4/+5 |
| | This replaces the temporary COVID-19 content hack with production content (text, thumbnail URLs) stored in postgrest and seaweedfs. | | | | |
| * | commit production work-around (temporarily) | Bryan Newbold | 2020-06-04 | 1 | -1/+2 |
| | | |||||

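The 2021-07-25 refs transform entry (1-indexing `refs.index`) hinges on a common Python pitfall: a truthiness check such as `if index:` treats a legitimate `0` exactly like `None`. The sketch below contrasts that check with an explicit `is None` test. Both function names are hypothetical, and this is an illustration of the pitfall, not the project's actual refs transform code.

```python
from typing import Optional


def one_index_buggy(raw_index: Optional[int]) -> Optional[int]:
    # Hypothetical: a truthiness test treats the first reference (0) as "unset".
    return raw_index + 1 if raw_index else None


def one_index(raw_index: Optional[int]) -> Optional[int]:
    """Convert a 0-indexed reference position into a 1-indexed refs.index value.

    Hypothetical helper; None (index unknown) is passed through unchanged.
    """
    # Compare against None explicitly so that 0 is still treated as "set".
    if raw_index is None:
        return None
    return raw_index + 1


print(one_index_buggy(0), one_index(0), one_index(None))  # None 1 None
```

The buggy variant silently drops the index of the very first reference, which is exactly the case that distinguishes `0` from `None`.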
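Several entries (integer conversion and bounding for the ES schema, volume_int/issue_int as actual ints, and the large/bad `first_page` fix that was causing Elasticsearch indexing errors) deal with free-text metadata that must become a bounded integer before indexing. A minimal sketch, assuming the target field is mapped as the Elasticsearch `short` type (signed 16-bit); the helper name `clean_small_int` and the chosen bound are hypothetical, not taken from the project.

```python
from typing import Optional

# Assumption: the target field is mapped as Elasticsearch `short`,
# whose range is -32768..32767. The real schema may use other types.
SHORT_MAX = 2**15 - 1


def clean_small_int(raw: Optional[str]) -> Optional[int]:
    """Parse free-text metadata (volume, issue, first_page) into a bounded int.

    Hypothetical helper: returns None for anything that is not a plain
    non-negative decimal number within the assumed field range.
    """
    if raw is None:
        return None
    raw = raw.strip()
    # isdecimal() (unlike isdigit()) rejects characters such as "²" that
    # int() cannot parse; ranges like "12-14" or "S1" are also rejected.
    if not raw.isdecimal():
        return None
    value = int(raw)
    # Drop out-of-range values rather than letting the whole document
    # fail to index.
    if value > SHORT_MAX:
        return None
    return value


for sample in ["12", " 7 ", "12-14", "123456", None]:
    print(repr(sample), "->", clean_small_int(sample))
```

Returning `None` rather than clamping keeps obviously bad values (such as a page number of 123456) out of the index instead of storing a misleading maximum.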