Commit message | Author | Age | Files | Lines | |
---|---|---|---|---|---|
* | update TODO with some old examples | Bryan Newbold | 2020-07-01 | 1 | -0/+10 |
| | |||||
* | commit old example notes | Bryan Newbold | 2020-07-01 | 3 | -0/+65 |
| | |||||
* | JALC bulk edit notes from 2020-03-23 | Bryan Newbold | 2020-07-01 | 1 | -0/+23 |
| | |||||
* | commit example of an elasticsearch SQL query | Bryan Newbold | 2020-07-01 | 1 | -0/+8 |
| | |||||
* | commit old README about bulk downloads | Bryan Newbold | 2020-07-01 | 1 | -0/+40 |
| | |||||
* | CLI proposal | Bryan Newbold | 2020-06-30 | 1 | -0/+124 |
| | |||||
* | add new license mappings | Bryan Newbold | 2020-06-30 | 2 | -0/+27 |
| | |||||
* | datacite: improve license mapping | Martin Czygan | 2020-06-30 | 2 | -9/+29 |
| | | | | via "missed potential license", refs #58 | ||||
* | Merge branch 'martin-datacite-fix-strptime-36559' into 'master' | bnewbold | 2020-06-29 | 2 | -1/+2 |
|\ | | | | | | | | | datacite: hard cast possible date value to string See merge request webgroup/fatcat!59 | ||||
| * | datacite: hard cast possible date value to string | Martin Czygan | 2020-06-29 | 2 | -1/+2 |
|/ | |||||
* | remove accidentally-committed lines from rust Makefile | Bryan Newbold | 2020-06-26 | 1 | -3/+0 |
| | |||||
* | disallow a specific unicode character from DOIs | Bryan Newbold | 2020-06-26 | 1 | -0/+6 |
| | |||||
* | Merge branch 'martin-fulltext-checkbox-label' into 'master' | bnewbold | 2020-06-17 | 1 | -2/+2 |
|\ | | | | | | | | | make fulltext-only label clickable See merge request webgroup/fatcat!58 | ||||
| * | make fulltext-only label clickable | Martin Czygan | 2020-06-16 | 1 | -2/+2 |
|/ | |||||
* | Merge branch 'bnewbold-better-button-links' into 'master' | Martin Czygan | 2020-06-05 | 5 | -4/+19 |
|\ | | | | | | | | | better download button links See merge request webgroup/fatcat!57 | ||||
| * | use ES 'best_url' in file download pages | Bryan Newbold | 2020-06-04 | 2 | -2/+4 |
| | | | | | | | | Similar to recent change for release download pages. | ||||
| * | ES schema: add best_url to file schema | Bryan Newbold | 2020-06-04 | 2 | -0/+13 |
| | | | | | | | | | | | | | | | | | | This will increase index size (URLs are often long in our corpus, and we have many file entities), but seems worth it. Initially added `ia_url` as a second field, guaranteed to always be an *.archive.org URL, but `best_url` defaults to that anyways so didn't seem worthwhile. | ||||
| * | re-use 'best pdf url' for release green button | Bryan Newbold | 2020-06-04 | 1 | -2/+2 |
| | | | | | | | | | | | | | | I thought this was the existing behavior, but it looks like we were just taking the first link from the first file. In the future may refactor this out even further. | ||||
* | | fix 'dev' target in python makefile | Bryan Newbold | 2020-06-04 | 1 | -1/+1 |
|/ | |||||
* | Merge remote-tracking branch 'origin/martin-harvest-fail-on-400' | Bryan Newbold | 2020-05-29 | 1 | -4/+0 |
|\ | | | | | | | | | | | Manually resolved conflicts: python/fatcat_tools/harvest/doi_registrars.py | ||||
| * | harvest: fail on HTTP 400 | Martin Czygan | 2020-05-29 | 1 | -4/+0 |
| | | | | | | | | | | | | | | | | | | In the past harvest of datacite resulted in occasional HTTP 400. Meanwhile, various API bugs have been fixed (most recently: https://github.com/datacite/lupo/pull/537, https://github.com/datacite/datacite/issues/1038). Downside of ignoring this error was that state lives in kafka, which has limited support for deletion of arbitrary messages from a topic. | ||||
* | | Merge branch 'martin-datacite-harvest-log-output' into 'master' | Martin Czygan | 2020-05-29 | 1 | -1/+1 |
|\ \ | | | | | | | | | | | | | harvest: log the failed url See merge request webgroup/fatcat!55 | ||||
| * | | harvest: log the failed url | Martin Czygan | 2020-05-29 | 1 | -1/+1 |
| |/ | |||||
* | | Merge branch 'martin-datacite-harvest-test-docs' into 'master' | Martin Czygan | 2020-05-29 | 1 | -3/+3 |
|\ \ | |/ |/| | | | | | datacite: fix test docs See merge request webgroup/fatcat!54 | ||||
| * | datacite: fix test docs | Martin Czygan | 2020-05-29 | 1 | -3/+3 |
|/ | |||||
* | Merge branch 'bnewbold-ingest-stage' into 'master' | Martin Czygan | 2020-05-28 | 3 | -7/+46 |
|\ | | | | | | | | | verify release_stage in ingest importer See merge request webgroup/fatcat!52 | ||||
| * | ingest importer: check that stage is consistent with release | Bryan Newbold | 2020-05-26 | 1 | -0/+5 |
| | | |||||
| * | regression test for release_stage mismatch with ingest request | Bryan Newbold | 2020-05-26 | 2 | -7/+41 |
| | | |||||
* | | Merge branch 'bnewbold-harvest-state-next-span' into 'master' | Martin Czygan | 2020-05-27 | 5 | -7/+7 |
|\ \ | |/ |/| | | | | | rename HarvestState.next() to HarvestState.next_span() See merge request webgroup/fatcat!53 | ||||
| * | rename HarvestState.next() to HarvestState.next_span() | Bryan Newbold | 2020-05-26 | 5 | -7/+7 |
|/ | | | | | | | | | "span" short for "timespan" to harvest; there may be a better name to use. Motivation for this is to work around a pylint error that .next() was not callable. This might be a bug with pylint, but .next() is also a very generic name. | ||||
* | add work-in-progress Rust makefile | Bryan Newbold | 2020-05-26 | 2 | -2/+29 |
| | |||||
* | add a work-in-progress python Makefile | Bryan Newbold | 2020-05-26 | 1 | -0/+24 |
| | |||||
* | pylintrc: skip many spurious WTForm no-member errors | Bryan Newbold | 2020-05-26 | 1 | -0/+2 |
| | |||||
* | HACK: try to squelch pylint in CI | Bryan Newbold | 2020-05-26 | 1 | -2/+2 |
| | | | | | | | | | | | | | | | | | Gitlab CI is showing lint errors like the following, under `[pylint] tests/harvest_state.py`: `E: 19,11: hs.next is not callable (not-callable)`; `E: 33,11: hs.next is not callable (not-callable)`; `E: 19,11: hs.next is not callable (not-callable)` [...] This is confusing as we use pipenv with a lock, so I should see the exact same errors locally. This commit is a hack to try and fix this and unbreak builds until we can debug further. | ||||
* | sql: really don't double-dump requests | Bryan Newbold | 2020-05-26 | 1 | -1/+0 |
| | | | | | | I guess we were dumping 3 times originally; already had an earlier commit that removed one row from this README (that I copypaste to CLI every time) | ||||
* | 2020-05-26 prod database size and stats | Bryan Newbold | 2020-05-26 | 2 | -0/+48 |
| | |||||
* | HACK: skip pylint errors on lines that seem to be fine | Bryan Newbold | 2020-05-22 | 3 | -3/+3 |
| | | | | | It seems to be an inadvertently upgraded version of pylint saying that these lines are not-callable. | ||||
* | run flake8 in CI | Bryan Newbold | 2020-05-22 | 1 | -0/+1 |
| | |||||
* | pipenv: add flake8 | Bryan Newbold | 2020-05-22 | 2 | -183/+213 |
| | |||||
* | Merge remote-tracking branch 'github/master' | Bryan Newbold | 2020-05-22 | 3 | -11/+11 |
|\ | |||||
| * | Merge pull request #55 from cclauss/patch-1 | bnewbold | 2020-05-22 | 3 | -11/+11 |
| |\ | | | | | | | Travis CI: Lint Python code for syntax errors and undefined names | ||||
| | * | LICENSE.md: Properly capitalize brand names | Christian Clauss | 2020-05-14 | 1 | -4/+4 |
| | | | |||||
| | * | Delete .travis.yml | Christian Clauss | 2020-05-14 | 1 | -6/+0 |
| | | | |||||
| | * | Identity is not the same thing as equality in Python | Christian Clauss | 2020-05-14 | 1 | -2/+2 |
| | | | |||||
| | * | Identity is not the same thing as equality in Python | Christian Clauss | 2020-05-14 | 1 | -5/+5 |
| | | | |||||
| | * | python: 3.8 | Christian Clauss | 2020-05-13 | 1 | -0/+2 |
| | | | |||||
| | * | Travis CI: Lint Python code for syntax errors and undefined names | Christian Clauss | 2020-05-13 | 1 | -0/+4 |
| |/ | |||||
* | | importers: clarify handling of ApiException | Bryan Newbold | 2020-05-22 | 3 | -4/+10 |
| | | | | | | | | | | | | | | | | One of these (in ingest importer pipeline) is an actual bug, the others are just changing the syntax to be more explicit/conservative. The ingest importer bug seems to have resulted in some bad file match imports; scale of impact is unknown. | ||||
* | | ingest importer: don't use glutton matches | Bryan Newbold | 2020-05-22 | 1 | -3/+3 |
| | | | | | | | | | | | | | | Until reviewing I didn't realize we were even doing this currently. Hopefully this has not impacted too many imports: almost all ingests use an external identifier, so only those with identifiers not in fatcat for whatever reason would be affected. | ||||
* | | retro CHANGELOG entry for python client library pypi package | Bryan Newbold | 2020-05-14 | 1 | -0/+4 |
| | |