Commit message  Author  Age  Files  Lines
...
* | updates to Makefile  Bryan Newbold  2020-07-01  3  -6/+33
| |
* | reviewer: fix bugs in common code found by mypy  Bryan Newbold  2020-07-01  1  -2/+3
| |
* | update TODO with some old examples  Bryan Newbold  2020-07-01  1  -0/+10
| |
* | commit old example notes  Bryan Newbold  2020-07-01  3  -0/+65
| |
* | JALC bulk edit notes from 2020-03-23  Bryan Newbold  2020-07-01  1  -0/+23
| |
* | commit example of an elasticsearch SQL query  Bryan Newbold  2020-07-01  1  -0/+8
| |
* | commit old README about bulk downloads  Bryan Newbold  2020-07-01  1  -0/+40
|/
* CLI proposal  Bryan Newbold  2020-06-30  1  -0/+124
|
* add new license mappings  Bryan Newbold  2020-06-30  2  -0/+27
|
* datacite: improve license mapping  Martin Czygan  2020-06-30  2  -9/+29
|     via "missed potential license", refs #58
* Merge branch 'martin-datacite-fix-strptime-36559' into 'master'  bnewbold  2020-06-29  2  -1/+2
|\
| |     datacite: hard cast possible date value to string
| |     See merge request webgroup/fatcat!59
| * datacite: hard cast possible date value to string  Martin Czygan  2020-06-29  2  -1/+2
|/
* remove accidentally-commited lines from rust Makefile  Bryan Newbold  2020-06-26  1  -3/+0
|
* disallow a specific unicode character from DOIs  Bryan Newbold  2020-06-26  1  -0/+6
|
* Merge branch 'martin-fulltext-checkbox-label' into 'master'  bnewbold  2020-06-17  1  -2/+2
|\
| |     make fulltext-only label clickable
| |     See merge request webgroup/fatcat!58
| * make fulltext-only label clickable  Martin Czygan  2020-06-16  1  -2/+2
|/
* Merge branch 'bnewbold-better-button-links' into 'master'  Martin Czygan  2020-06-05  5  -4/+19
|\
| |     better download button links
| |     See merge request webgroup/fatcat!57
| * use ES 'best_url' in file download pages  Bryan Newbold  2020-06-04  2  -2/+4
| |     Similar to recent change for release download pages.
| * ES schema: add best_url to file schema  Bryan Newbold  2020-06-04  2  -0/+13
| |     This will increase index size (URLs are often long in our corpus, and we
| |     have many file entities), but seems worth it.
| |
| |     Initially added `ia_url` as a second field, guaranteed to always be an
| |     *.archive.org URL, but `best_url` defaults to that anyways so didn't
| |     seem worthwhile.
| * re-use 'best pdf url' for release green button  Bryan Newbold  2020-06-04  1  -2/+2
| |     I thought this was the existing behavior, but it looks like we were just
| |     taking the first link from the first file.
| |
| |     In the future may refactor this out even further.
* | fix 'dev' target in python makefile  Bryan Newbold  2020-06-04  1  -1/+1
|/
* Merge remote-tracking branch 'origin/martin-harvest-fail-on-400'  Bryan Newbold  2020-05-29  1  -4/+0
|\
| |     Manually resolved conflicts:
| |       python/fatcat_tools/harvest/doi_registrars.py
| * harvest: fail on HTTP 400  Martin Czygan  2020-05-29  1  -4/+0
| |     In the past harvest of datacite resulted in occasional HTTP 400.
| |     Meanwhile, various API bugs have been fixed (most recently:
| |     https://github.com/datacite/lupo/pull/537,
| |     https://github.com/datacite/datacite/issues/1038).
| |
| |     Downside of ignoring this error was that state lives in kafka, which has
| |     limited support for deletion of arbitrary messages from a topic.
* | Merge branch 'martin-datacite-harvest-log-output' into 'master'  Martin Czygan  2020-05-29  1  -1/+1
|\ \
| | |     harvest: log the failed url
| | |     See merge request webgroup/fatcat!55
| * | harvest: log the failed url  Martin Czygan  2020-05-29  1  -1/+1
| |/
* | Merge branch 'martin-datacite-harvest-test-docs' into 'master'  Martin Czygan  2020-05-29  1  -3/+3
|\ \
| |/
|/|
| |     datacite: fix test docs
| |     See merge request webgroup/fatcat!54
| * datacite: fix test docs  Martin Czygan  2020-05-29  1  -3/+3
|/
* Merge branch 'bnewbold-ingest-stage' into 'master'  Martin Czygan  2020-05-28  3  -7/+46
|\
| |     verify release_stage in ingest importer
| |     See merge request webgroup/fatcat!52
| * ingest importer: check that stage is consistent with release  Bryan Newbold  2020-05-26  1  -0/+5
| |
| * regression test for release_stage mismatch with ingest request  Bryan Newbold  2020-05-26  2  -7/+41
| |
* | Merge branch 'bnewbold-harvest-state-next-span' into 'master'  Martin Czygan  2020-05-27  5  -7/+7
|\ \
| |/
|/|
| |     rename HarvestState.next() to HarvestState.next_span()
| |     See merge request webgroup/fatcat!53
| * rename HarvestState.next() to HarvestState.next_span()  Bryan Newbold  2020-05-26  5  -7/+7
|/
|     "span" short for "timespan" to harvest; there may be a better name to use.
|
|     Motivation for this is to work around a pylint error that .next() was not
|     callable. This might be a bug with pylint, but .next() is also a very
|     generic name.
* add work-in-progress Rust makefile  Bryan Newbold  2020-05-26  2  -2/+29
|
* add a work-in-progress python Makefile  Bryan Newbold  2020-05-26  1  -0/+24
|
* pylintrc: skip many spurious WTForm no-member errors  Bryan Newbold  2020-05-26  1  -0/+2
|
* HACK: try to squelch pylint in CI  Bryan Newbold  2020-05-26  1  -2/+2
|     Gitlab CI is showing lint errors like:
|
|         =================================== FAILURES ===================================
|         6316 _______________________ [pylint] tests/harvest_state.py ________________________
|         6317 E: 19,11: hs.next is not callable (not-callable)
|         6318 E: 33,11: hs.next is not callable (not-callable)
|         6319 E: 19,11: hs.next is not callable (not-callable)
|         [...]
|
|     this is confusing as we use pipenv with a lock, so I should see the exact
|     same errors locally.
|
|     This commit is a hack to try and fix this and unbreak builds until we can
|     debug further.
* sql: really don't double-dump requests  Bryan Newbold  2020-05-26  1  -1/+0
|     I guess we were dumping 3 times originally; already had an earlier commit
|     that removed one row from this README (that I copypaste to CLI every time)
* 2020-05-26 prod database size and stats  Bryan Newbold  2020-05-26  2  -0/+48
|
* HACK: skip pylint errors on lines that seem to be fine  Bryan Newbold  2020-05-22  3  -3/+3
|     It seems to be an inadvertently upgraded version of pylint saying that
|     these lines are not-callable.
* run flake8 in CI  Bryan Newbold  2020-05-22  1  -0/+1
|
* pipenv: add flake8  Bryan Newbold  2020-05-22  2  -183/+213
|
* Merge remote-tracking branch 'github/master'  Bryan Newbold  2020-05-22  3  -11/+11
|\
| * Merge pull request #55 from cclauss/patch-1  bnewbold  2020-05-22  3  -11/+11
| |\
| | |     Travis CI: Lint Python code for syntax errors and undefined names
| | * LICENSE.md: Properly capitalize brand names  Christian Clauss  2020-05-14  1  -4/+4
| | |
| | * Delete .travis.yml  Christian Clauss  2020-05-14  1  -6/+0
| | |
| | * Indentity is not the same this as equality in Python  Christian Clauss  2020-05-14  1  -2/+2
| | |
| | * Indentity is not the same this as equality in Python  Christian Clauss  2020-05-14  1  -5/+5
| | |
| | * python: 3.8  Christian Clauss  2020-05-13  1  -0/+2
| | |
| | * Travis CI: Lint Python code for syntax errors and undefined names  Christian Clauss  2020-05-13  1  -0/+4
| |/
* | importers: clarify handling of ApiException  Bryan Newbold  2020-05-22  3  -4/+10
| |     One of these (in ingest importer pipeline) is an actual bug, the others
| |     are just changing the syntax to be more explicit/conservative.
| |
| |     The ingest importer bug seems to have resulted in some bad file match
| |     imports; scale of impact is unknown.