Commit message | Author | Age | Files | Lines | |
---|---|---|---|---|---|
* | tweaks to file ingest importer | Bryan Newbold | 2019-12-03 | 2 | -3/+10 |
| | | | | | - allow overriding source filter whitelist (common case for CLI use) - fix editgroup description env variable pass-through | ||||
* | crossref is_update isn't what I thought | Bryan Newbold | 2019-12-03 | 1 | -6/+2 |
| | | | | | | | | I thought this would filter for metadata updates to an existing DOI, but actually "updates" are a type of DOI (eg, a retraction). TODO: handle 'updates' field. Should both do a lookup and set work_ident appropriately, and store in crossref-specific metadata. | ||||
* | make file edit form hash values case insensitive | Bryan Newbold | 2019-12-02 | 1 | -0/+3 |
| | | | | | | | Test in previous commit. This fixes a user-reported 500 error when creating a file with SHA1/SHA256/MD5 hashes in upper-case. | ||||
* | add regression test for upper-case SHA-1 form submit | Bryan Newbold | 2019-12-02 | 1 | -0/+10 |
| | |||||
* | re-order ingest want() for better stats | Bryan Newbold | 2019-11-15 | 1 | -7/+10 |
| | |||||
* | project -> ingest_request_source | Bryan Newbold | 2019-11-15 | 3 | -9/+9 |
| | |||||
* | have ingest-file-results importer operate as crawl-bot | Bryan Newbold | 2019-11-15 | 1 | -1/+1 |
| | | | | As opposed to sandcrawler-bot | ||||
* | fix release.pmcid typo | Bryan Newbold | 2019-11-15 | 1 | -2/+2 |
| | |||||
* | better ingest-file-results import name | Bryan Newbold | 2019-11-15 | 1 | -1/+1 |
| | |||||
* | ingest importer fixes | Bryan Newbold | 2019-11-15 | 1 | -3/+4 |
| | |||||
* | more ingest importer comments and counts | Bryan Newbold | 2019-11-15 | 2 | -2/+29 |
| | |||||
* | crude support for 'sandcrawler' kafka topics | Bryan Newbold | 2019-11-15 | 1 | -2/+3 |
| | |||||
* | ingest file result importer | Bryan Newbold | 2019-11-15 | 5 | -2/+228 |
| | |||||
* | test for ingest transform | Bryan Newbold | 2019-11-15 | 1 | -0/+57 |
| | |||||
* | add ingest request feature to entity_updates worker | Bryan Newbold | 2019-11-15 | 2 | -4/+22 |
| | | | | | | | | | | | | | Initially was going to create a new worker to consume from the release update channel, but couldn't get the edit context ("is this a new release, or update to an existing") from that context. Currently there is a flag in source code to control whether we only do OA releases or all releases. Starting with OA only to start slow, but should probably default to all, and make this a config flag. Should probably also have a config flag to control this entire feature. Tested locally in dev. | ||||
* | add ingest request transform (and test) | Bryan Newbold | 2019-11-15 | 3 | -1/+68 |
| | |||||
* | Merge branch 'martin-search-results-pagination' into 'master' | Martin Czygan | 2019-11-15 | 6 | -20/+82 |
|\ | | | | | | | | | Add basic pagination to search results See merge request webgroup/fatcat!4 | ||||
| * | address test issue | Martin Czygan | 2019-11-15 | 1 | -2/+3 |
| | | |||||
| * | adjust search test case for new wording | Martin Czygan | 2019-11-14 | 1 | -2/+2 |
| | | | | | | | | > "Showing top " -> "Showing first " | ||||
| * | gray out inactive navigation links | Martin Czygan | 2019-11-14 | 1 | -2/+2 |
| | | | | | | | | | | | | | | | | As per [this issue](https://github.com/Semantic-Org/Semantic-UI/issues/1885#issuecomment-77619519), text colors are not supported in semantic ui. To not move text too much, gray out inactive links. | ||||
| * | move pagination into macros | Martin Czygan | 2019-11-14 | 3 | -43/+51 |
| | | | | | | | | | | | | | | | | | | Two new macros: * top_results(found) * bottom_results(found) wip: move pagination into macro | ||||
| * | Add basic pagination to search results | Martin Czygan | 2019-11-08 | 4 | -14/+67 |
| | | | | | | | | | | | | | | | | | | | | | | | | The "deep paging problem" imposes some limit, which currently is a hardcoded default value, `deep_page_limit=2000` in `do_search`. Elasticsearch can be configured, too: > Note that from + size can not be more than the index.max_result_window index setting, which defaults to 10,000. -- https://www.elastic.co/guide/en/elasticsearch/reference/current/search-request-body.html#request-body-search-from-size | ||||
* | | web: catch MacaroonInitException | Bryan Newbold | 2019-11-12 | 1 | -0/+4 |
| | | | | | | | | | | Caught one of these in sentry. Probably due to a crawler? Or typing gibberish in the token form. | ||||
* | | Merge branch 'martin-python-readme-es-note' into 'master' | bnewbold | 2019-11-08 | 1 | -0/+5 |
|\ \ | | | | | | | | | | | | | mention elasticsearch empty index setup See merge request webgroup/fatcat!3 | ||||
| * | | mention elasticsearch empty index setup | Martin Czygan | 2019-11-08 | 1 | -0/+5 |
| |/ | | | | | | | | | When setting up with the defaults, everything works fine, except that the web search will try to access a local elasticsearch. Mention in the README how to create empty indices. | ||||
* | | crossref: accurate blank title counts | Bryan Newbold | 2019-11-05 | 1 | -0/+1 |
| | | |||||
* | | fix crossref component test | Bryan Newbold | 2019-11-04 | 1 | -1/+1 |
| | | |||||
* | | crossref: component type | Bryan Newbold | 2019-11-04 | 1 | -1/+3 |
| | | |||||
* | | crossref: count why skip happened | Bryan Newbold | 2019-11-04 | 1 | -1/+7 |
| | | | | | | | | | | | | Might skip based on release type (eg container, not a paper/release), or missing title, or other reasons. Over 7 million DOIs are getting skipped, curious why. | ||||
* | | crossref: don't skip on short/null subtitle | Bryan Newbold | 2019-11-04 | 1 | -1/+1 |
|/ | | | | This was a bug. Should only set subtitle blank, not skip the import. | ||||
* | commit file cleaner tests | Bryan Newbold | 2019-10-08 | 1 | -0/+58 |
| | |||||
* | file cleanup tweaks to actually run | Bryan Newbold | 2019-10-08 | 2 | -5/+4 |
| | |||||
* | refactor duplicated b32_hex function in importers | Bryan Newbold | 2019-10-08 | 3 | -21/+11 |
| | |||||
* | dict wrapper for entity_from_json() | Bryan Newbold | 2019-10-08 | 2 | -3/+7 |
| | |||||
* | new cleanup python tool/framework | Bryan Newbold | 2019-10-08 | 5 | -0/+300 |
| | |||||
* | redirect direct entity underscore links | Bryan Newbold | 2019-10-03 | 2 | -0/+30 |
| | |||||
* | webface: extra <br> in container lookup links | Bryan Newbold | 2019-09-21 | 1 | -1/+1 |
| | |||||
* | remove duplicate style ref in container edit view | Bryan Newbold | 2019-09-20 | 1 | -5/+0 |
| | |||||
* | review/fix all confluent-kafka produce code | Bryan Newbold | 2019-09-20 | 6 | -27/+75 |
| | |||||
* | small fixes to confluent-kafka importers/workers | Bryan Newbold | 2019-09-20 | 8 | -26/+69 |
| | | | | | | | | - decrease default changelog pipeline to 5.0sec - fix missing KafkaException harvester imports - more confluent-kafka tweaks - updates to kafka consumer configs - bump elastic updates consumergroup (again) | ||||
* | update Pipfile.lock after confluent-kafka rebase | Bryan Newbold | 2019-09-20 | 1 | -1/+33 |
| | |||||
* | convert pipeline workers from pykafka to confluent-kafka | Bryan Newbold | 2019-09-20 | 3 | -125/+230 |
| | |||||
* | small kafka tweaks for robustness | Bryan Newbold | 2019-09-20 | 2 | -0/+5 |
| | |||||
* | convert importers to confluent-kafka library | Bryan Newbold | 2019-09-20 | 2 | -21/+74 |
| | |||||
* | bump max message size to ~20 MBytes | Bryan Newbold | 2019-09-20 | 2 | -0/+2 |
| | |||||
* | fixes to confluent-kafka harvesters | Bryan Newbold | 2019-09-20 | 3 | -20/+21 |
| | |||||
* | first draft harvesters using confluent-kafka | Bryan Newbold | 2019-09-20 | 3 | -48/+104 |
| | |||||
* | make default kafka env 'dev', not 'qa' | Bryan Newbold | 2019-09-20 | 2 | -4/+4 |
| | |||||
* | add confluent-kafka library (to replace pykafka) | Bryan Newbold | 2019-09-20 | 1 | -0/+1 |
| | |||||
* | handle more external identifiers in python | Bryan Newbold | 2019-09-18 | 2 | -14/+101 |
| | | | | | This makes it possible to, eg, paste an arxiv identifier or SHA-1 hash in the general search box and do a quick lookup. |
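
The external-identifier lookup described in the oldest commit above could look roughly like the following. This is a minimal sketch and not the code from that commit: the function name `guess_identifier` and the three regular expressions are hypothetical, chosen only to illustrate routing a search-box query to a DOI, arXiv, or SHA-1 lookup. It lower-cases hex hashes before matching, in the spirit of the later fix for upper-case hashes in the file edit form.

```python
import re

# Hypothetical patterns for illustration; the real importer/web code may differ.
DOI_RE = re.compile(r"^10\.\d{4,9}/\S+$")
ARXIV_RE = re.compile(r"^(arxiv:)?\d{4}\.\d{4,5}(v\d+)?$", re.IGNORECASE)
SHA1_RE = re.compile(r"^[0-9a-f]{40}$")


def guess_identifier(query):
    """Return (identifier_type, normalized_value), or None for a plain text query."""
    q = query.strip()
    if DOI_RE.match(q):
        # DOIs are case-insensitive, so normalize to lower-case
        return ("doi", q.lower())
    if ARXIV_RE.match(q):
        # drop the optional "arXiv:" prefix, keep any version suffix
        if q.lower().startswith("arxiv:"):
            q = q[len("arxiv:"):]
        return ("arxiv", q)
    if SHA1_RE.match(q.lower()):
        # accept upper-case hex and normalize it
        return ("sha1", q.lower())
    return None


if __name__ == "__main__":
    for example in ["10.1234/abc.DEF", "arXiv:1905.03769v2", "plain words"]:
        print(example, "->", guess_identifier(example))
```

A caller would fall back to a normal full-text search whenever `guess_identifier` returns `None`.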