path: root/tests
Commit message | Author | Age | Files | Lines
* refs: include GROBID-parsed crossref refs | Bryan Newbold | 2021-12-06 | 1 | -0/+1
    This takes advantage of Crossref 'unstructured' refs which have been parsed using GROBID and stored in the sandcrawler database, as part of the sandcrawler crossref metadata pipeline.
* fetch GROBID-parsed refs along with crossref metadata | Bryan Newbold | 2021-12-06 | 1 | -1/+2
* Revert "pull GROBID refs along with crossref records into bundles" | Bryan Newbold | 2021-11-10 | 1 | -2/+1
    This reverts commit c164970449a392b5165d903d213c2bb51f2a187f. Didn't mean to merge this to master just yet.
* lint: disallow 'import *' even in tests | Bryan Newbold | 2021-11-10 | 2 | -4/+14
* pull GROBID refs along with crossref records into bundles | Bryan Newbold | 2021-11-10 | 1 | -1/+2
* refactor use of grobid_tei_xml | Bryan Newbold | 2021-10-27 | 2 | -3/+33
* replace grobid2json with grobid_tei_xml | Bryan Newbold | 2021-10-27 | 2 | -5/+11
    This first iteration uses the .to_legacy_dict() helpers for backwards compatibility
* lint: small cleanups, mostly E711 and E713 | Bryan Newbold | 2021-10-27 | 2 | -2/+2
* make fmt (black 21.9b0) | Bryan Newbold | 2021-10-27 | 2 | -3/+8
* re-style imports (isort) on all core python files | Bryan Newbold | 2021-10-27 | 7 | -7/+9
* web: access_redirect_fallback mechanism | Bryan Newbold | 2021-07-26 | 1 | -1/+102
    This adds a helper code path that "tries harder" to find an access link, by querying the fatcat API directly to look for any file from any release associated with the work.
    If it finds a match, it does the redirect as usual (but does log the incident).
    If no match can be found, there is now a more helpful access-specific 404 error page. If the *work* is a 404, the generic error page is shown.
* make fmt | Bryan Newbold | 2021-07-26 | 1 | -5/+13
* fix failing test after clean_doi() | Bryan Newbold | 2021-07-26 | 1 | -1/+1
* refs transform: many fixes | Bryan Newbold | 2021-07-25 | 2 | -1/+274
    - include year correctly (many cases)
    - test coverage for Crossref transform
    - pass-through 'edition' as 'version'
    - series-title parsed into title or container as appropriate
    - missing release stage
    - fix 0-index vs. 1-index ref index field
* refs transform: 1-index refs.index, not 0-index | Bryan Newbold | 2021-07-25 | 1 | -1/+1
    This was not matching expectations/schema of downstream refs pipeline (cgraph), and wasn't matching documented schema. Note care required when checking if the index is set, to distinguish between '0' and 'None' values.
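    The '0' versus 'None' caveat in the commit above can be illustrated with a minimal sketch. This is not the repository's code: the function names `number_refs` and `has_index` and the dict layout are hypothetical, chosen only to show why a truthiness check is unsafe for an index field.

```python
from typing import Any, Dict, List


def number_refs(raw_refs: List[str]) -> List[Dict[str, Any]]:
    """Assign 1-based index values to refs (hypothetical helper)."""
    return [{"index": i, "raw": raw} for i, raw in enumerate(raw_refs, start=1)]


def has_index(ref: Dict[str, Any]) -> bool:
    """True if an index is set; compares against None explicitly.

    A plain `if ref.get("index"):` would treat a legacy 0 index as unset,
    because 0 is falsy in Python.
    """
    return ref.get("index") is not None


refs = number_refs(["first ref", "second ref"])
legacy_ref = {"index": 0, "raw": "zero-indexed ref"}   # index present, value 0
unset_ref = {"index": None, "raw": "no index"}         # index genuinely missing
```

    With these definitions, `has_index(legacy_ref)` is true while `bool(legacy_ref["index"])` is false, which is the distinction the commit message warns about.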
* refs: include (source) release_stage in output | Bryan Newbold | 2021-06-30 | 1 | -9/+18
* commit missing elastic get example JSON files | Bryan Newbold | 2021-06-11 | 2 | -0/+174
* update citation_pdf_url HTML meta tag to new access URL style | Bryan Newbold | 2021-06-11 | 1 | -0/+1
* update access redirect URL endpoints | Bryan Newbold | 2021-06-11 | 1 | -19/+20
* lint fixes, and run fmt | Bryan Newbold | 2021-06-02 | 1 | -4/+1
* add 'crossref' hydration to work pipeline | Bryan Newbold | 2021-06-02 | 1 | -0/+16
    The immediate motivation is to include recent crossref refs in citation graph transforms. May also be valuable for researchers to have authoritative/publisher metadata in the bundle dumps.
* web: fixes to access redirect endpoints | Bryan Newbold | 2021-05-19 | 1 | -0/+11
* iterate on PDF redirect links | Bryan Newbold | 2021-05-17 | 1 | -3/+41
* iterate on access redirects and landing page implementation | Bryan Newbold | 2021-04-27 | 2 | -0/+123
    Small code refactors and minimal test coverage
* Revert undesirable changes | Christian Clauss | 2021-02-23 | 6 | -11/+11
* Modernize Python syntax with pyupgrade --py38-plus **/*.py | Christian Clauss | 2021-02-23 | 6 | -11/+11
* api: handle null 'q' parameter on search endpoint | Bryan Newbold | 2021-02-11 | 1 | -1/+5
* refactor ES configuration setting names | Bryan Newbold | 2021-01-25 | 1 | -1/+1
* api: fix /search test, and mypy error on implementation | Bryan Newbold | 2021-01-15 | 1 | -1/+11
* add mocks to work pipeline test | Bryan Newbold | 2021-01-14 | 1 | -1/+63
* add regression test for uvloop+httptools uvicorn problem | Bryan Newbold | 2021-01-05 | 1 | -0/+11
* improve Accept-Language header parsing | Bryan Newbold | 2020-12-02 | 1 | -0/+4
* fmt | Bryan Newbold | 2020-10-28 | 1 | -1/+0
* fixes to issue_db tests | Bryan Newbold | 2020-10-23 | 1 | -6/+3
* basic web search test | Bryan Newbold | 2020-10-23 | 2 | -1/+1701
* basic test for issue-db pipeline | Bryan Newbold | 2020-10-23 | 3 | -0/+30
* start test coverage for web interface | Bryan Newbold | 2020-10-22 | 2 | -0/+68
* improve test coverage | Bryan Newbold | 2020-10-22 | 5 | -0/+72
* minimum viable tests for GROBID XML parsing and refs transform | Bryan Newbold | 2020-09-14 | 3 | -0/+535
* another clean_str() test case | Bryan Newbold | 2020-08-12 | 1 | -0/+4
* transform: more string cleaning | Bryan Newbold | 2020-08-12 | 1 | -1/+19
* scrub_text: single-token strings skipped | Bryan Newbold | 2020-08-06 | 1 | -1/+1
* start some annotation fixes for pytype | Bryan Newbold | 2020-06-03 | 1 | -1/+1
* flake8-annotation linting | Bryan Newbold | 2020-06-03 | 3 | -4/+4
    Added some new annotations; need to finish more.
* flake8 fixes (partial) | Bryan Newbold | 2020-06-03 | 2 | -3/+0
* reformat python code with black | Bryan Newbold | 2020-06-03 | 3 | -13/+19
* improve text scrubbing | Bryan Newbold | 2020-06-03 | 1 | -0/+15
    Was going to use textpipe, but dependency was too large and failed to install with halfway modern GCC (due to CLD2 issue):
    https://github.com/GregBowyer/cld2-cffi/issues/12
    So instead basically pulled out the clean_text function, which is quite short.
* first pass transform from pipelines to ES schema | Bryan Newbold | 2020-05-20 | 1 | -1/+1
* initial progress on work pipeline | Bryan Newbold | 2020-05-16 | 1 | -2/+2
* crude djvu XML parsing | Bryan Newbold | 2020-05-16 | 2 | -0/+5158