Commit message | Author | Age | Files | Lines | |
---|---|---|---|---|---|
* | improve previous commit (JATS abstract hack) | Bryan Newbold | 2019-12-03 | 1 | -4/+6 |
| | |||||
* | hack: remove enclosing JATS XML tags around abstracts | Bryan Newbold | 2019-12-03 | 1 | -1/+7 |
| | | | | | | The more complete fix is to actually render the JATS to HTML and display that. This is just to fix a nit with the most common case of XML tags in abstracts. | ||||
* | tweaks to file ingest importer | Bryan Newbold | 2019-12-03 | 2 | -3/+10 |
| | | | | | - allow overriding source filter whitelist (common case for CLI use) - fix editgroup description env variable pass-through | ||||
* | crossref is_update isn't what I thought | Bryan Newbold | 2019-12-03 | 1 | -6/+2 |
| | | | | | | | | I thought this would filter for metadata updates to an existing DOI, but actually "updates" are a type of DOI (eg, a retraction). TODO: handle 'updates' field. Should both do a lookup and set work_ident appropriately, and store in crossref-specific metadata. | ||||
* | bump required rust to 1.36 | Bryan Newbold | 2019-12-03 | 2 | -2/+2 |
| | | | | | | | | | | | | This isn't a fatcat rust requirement, but instead a diesel requirement, via rust-smallvec, which in v1.0 uses the alloc crate: https://github.com/servo/rust-smallvec/issues/73 I think the reason this came up now is that diesel-cli is an application and doesn't have a Cargo.lock file, and the build was updated. Using some binary mechanism to install these dependencies would be more robust, but feels like a yak shave right now. | ||||
* | update gitlab-ci to rust 1.34 | Bryan Newbold | 2019-12-03 | 1 | -1/+1 |
| | | | | | Apparently the rust:1.34-stretch image is gone from docker hub, and this was causing CI errors. | ||||
* | make file edit form hash values case insensitive | Bryan Newbold | 2019-12-02 | 1 | -0/+3 |
| | | | | | | | Test in previous commit. This fixes a user-reported 500 error when creating a file with SHA1/SHA256/MD5 hashes in upper-case. | ||||
* | add regression test for upper-case SHA-1 form submit | Bryan Newbold | 2019-12-02 | 1 | -0/+10 |
| | |||||
* | re-order ingest want() for better stats | Bryan Newbold | 2019-11-15 | 1 | -7/+10 |
| | |||||
* | project -> ingest_request_source | Bryan Newbold | 2019-11-15 | 3 | -9/+9 |
| | |||||
* | have ingest-file-results importer operate as crawl-bot | Bryan Newbold | 2019-11-15 | 1 | -1/+1 |
| | | | | As opposed to sandcrawler-bot | ||||
* | fix release.pmcid typo | Bryan Newbold | 2019-11-15 | 1 | -2/+2 |
| | |||||
* | better ingest-file-results import name | Bryan Newbold | 2019-11-15 | 1 | -1/+1 |
| | |||||
* | ingest importer fixes | Bryan Newbold | 2019-11-15 | 1 | -3/+4 |
| | |||||
* | more ingest importer comments and counts | Bryan Newbold | 2019-11-15 | 2 | -2/+29 |
| | |||||
* | crude support for 'sandcrawler' kafka topics | Bryan Newbold | 2019-11-15 | 1 | -2/+3 |
| | |||||
* | ingest file result importer | Bryan Newbold | 2019-11-15 | 5 | -2/+228 |
| | |||||
* | test for ingest transform | Bryan Newbold | 2019-11-15 | 1 | -0/+57 |
| | |||||
* | add ingest request feature to entity_updates worker | Bryan Newbold | 2019-11-15 | 2 | -4/+22 |
| | | | | | | | | | | | | Initially I was going to create a new worker to consume from the release update channel, but couldn't get the edit context ("is this a new release, or an update to an existing one?") from that channel. Currently there is a flag in source code to control whether we only do OA releases or all releases. Starting with OA only to start slow, but should probably default to all, and make this a config flag. Should probably also have a config flag to control this entire feature. Tested locally in dev. | ||||
* | add ingest request transform (and test) | Bryan Newbold | 2019-11-15 | 3 | -1/+68 |
| | |||||
* | update next schema tweaks proposal doc | Bryan Newbold | 2019-11-15 | 1 | -0/+1 |
| | |||||
* | Merge branch 'martin-search-results-pagination' into 'master' | Martin Czygan | 2019-11-15 | 6 | -20/+82 |
|\ | | | | | | | | | Add basic pagination to search results See merge request webgroup/fatcat!4 | ||||
| * | address test issue | Martin Czygan | 2019-11-15 | 1 | -2/+3 |
| | | |||||
| * | adjust search test case for new wording | Martin Czygan | 2019-11-14 | 1 | -2/+2 |
| | | | | | | | | > "Showing top " -> "Showing first " | ||||
| * | gray out inactive navigation links | Martin Czygan | 2019-11-14 | 1 | -2/+2 |
| | | | | | | | | | | | | | | | | As per [this issue](https://github.com/Semantic-Org/Semantic-UI/issues/1885#issuecomment-77619519), text colors are not supported in semantic ui. To not move text too much, gray out inactive links. | ||||
| * | move pagination into macros | Martin Czygan | 2019-11-14 | 3 | -43/+51 |
| | | | | | | | | | | | | | | | | | | Two new macros: * top_results(found) * bottom_results(found) wip: move pagination into macro | ||||
| * | Add basic pagination to search results | Martin Czygan | 2019-11-08 | 4 | -14/+67 |
| | | | | | | | | | | | | | | | | | | | | | | | | The "deep paging problem" imposes some limit, which currently is a hardcoded default value, `deep_page_limit=2000` in `do_search`. Elasticsearch can be configured, too: > Note that from + size can not be more than the index.max_result_window index setting, which defaults to 10,000. -- https://www.elastic.co/guide/en/elasticsearch/reference/current/search-request-body.html#request-body-search-from-size | ||||
* | | web: catch MacaroonInitException | Bryan Newbold | 2019-11-12 | 1 | -0/+4 |
| | | | | | | | | | | Caught one of these in sentry. Probably due to a crawler? Or typing gibberish in the token form. | ||||
* | | design notes for a larger database | Bryan Newbold | 2019-11-12 | 1 | -0/+81 |
| | | |||||
* | | old proposals for 'next' schema update | Bryan Newbold | 2019-11-12 | 1 | -0/+38 |
| | | |||||
* | | crossref patch bulk import | Bryan Newbold | 2019-11-12 | 2 | -0/+63 |
| | | |||||
* | | Merge branch 'martin-python-readme-es-note' into 'master' | bnewbold | 2019-11-08 | 1 | -0/+5 |
|\ \ | | | | | | | | | | | | | mention elasticsearch empty index setup See merge request webgroup/fatcat!3 | ||||
| * | | mention elasticsearch empty index setup | Martin Czygan | 2019-11-08 | 1 | -0/+5 |
| |/ | | | | | | | | | | When setting up with the defaults, everything works fine, except that the web search will try to access a local elasticsearch. Mention in the README how to create empty indices. | ||||
* | | crossref: accurate blank title counts | Bryan Newbold | 2019-11-05 | 1 | -0/+1 |
| | | |||||
* | | fix crossref component test | Bryan Newbold | 2019-11-04 | 1 | -1/+1 |
| | | |||||
* | | TODO idea: 'first seen' | Bryan Newbold | 2019-11-04 | 1 | -0/+1 |
| | | |||||
* | | crossref: component type | Bryan Newbold | 2019-11-04 | 1 | -1/+3 |
| | | |||||
* | | add 'component' as a release_type | Bryan Newbold | 2019-11-04 | 2 | -0/+3 |
| | | |||||
* | | crossref: count why skip happened | Bryan Newbold | 2019-11-04 | 1 | -1/+7 |
| | | | | | | | | | | | | Might skip based on release type (eg container, not a paper/release), or missing title, or other reasons. Over 7 million DOIs are getting skipped, curious why. | ||||
* | | crossref: don't skip on short/null subtitle | Bryan Newbold | 2019-11-04 | 1 | -1/+1 |
|/ | | | | This was a bug. Should only set subtitle blank, not skip the import. | ||||
* | note file fixup pushed in prod | Bryan Newbold | 2019-10-09 | 2 | -1/+64 |
| | |||||
* | move corpus changes to 'notes/bulk_edits' | Bryan Newbold | 2019-10-08 | 3 | -0/+285 |
| | |||||
* | commit file cleaner tests | Bryan Newbold | 2019-10-08 | 1 | -0/+58 |
| | |||||
* | file cleanup tweaks to actually run | Bryan Newbold | 2019-10-08 | 2 | -5/+4 |
| | |||||
* | refactor duplicated b32_hex function in importers | Bryan Newbold | 2019-10-08 | 3 | -21/+11 |
| | |||||
* | dict wrapper for entity_from_json() | Bryan Newbold | 2019-10-08 | 2 | -3/+7 |
| | |||||
* | new cleanup python tool/framework | Bryan Newbold | 2019-10-08 | 5 | -0/+300 |
| | |||||
* | CHANGELOG entry for previous commit | Bryan Newbold | 2019-10-03 | 1 | -0/+6 |
| | |||||
* | redirect direct entity underscore links | Bryan Newbold | 2019-10-03 | 2 | -0/+30 |
| | |||||
* | export raw affiliation strings for analysis | Bryan Newbold | 2019-10-03 | 1 | -0/+17 |
| |
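
The "make file edit form hash values case insensitive" commit above fixes a 500 error on upper-case SHA1/SHA256/MD5 input. The actual patch is not shown in this log, so the following is only a minimal sketch of the general idea, assuming hashes arrive as hex strings from a form. The helper name `normalize_hash` and the `HEX_LENGTHS` table are hypothetical and are not taken from the fatcat code base.

```python
import re

# Expected hex digest lengths per hash type (illustrative table, not fatcat's).
HEX_LENGTHS = {"md5": 32, "sha1": 40, "sha256": 64}


def normalize_hash(kind: str, value: str) -> str:
    """Strip whitespace, lower-case, and validate a hex digest.

    Hypothetical helper: lower-casing before validation means that
    upper-case user input is accepted instead of being rejected later.
    """
    if kind not in HEX_LENGTHS:
        raise ValueError(f"unknown hash type: {kind}")
    cleaned = value.strip().lower()
    # After lower-casing, only 0-9 and a-f of the exact length are valid.
    if not re.fullmatch(r"[0-9a-f]{%d}" % HEX_LENGTHS[kind], cleaned):
        raise ValueError(f"not a valid {kind} hex digest: {value!r}")
    return cleaned
```

Normalizing at the form boundary keeps stored values in a single canonical case, so later lookups by hash do not need case-insensitive comparisons.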