|  | Commit message (Collapse) | Author | Age | Files | Lines | 
|---|
| | 
| 
| 
| 
| 
| | Check was happening after the `return True` by mistake, allowing
duplicates in SPN editgroups, and potentially in ingest request
editgroups as well. | 
| | |  | 
| |\  
| | 
| | 
| | 
| | | datacite release links and metadata expansion
See merge request webgroup/fatcat!15 | 
| |/  
|   
|   
|   
|   
|   
| | Small ergonomic changes for datacite releases:
- add a link to live/current datacite metadata (like we do for Crossref)
- expand "extra" metadata fields under 'datacite' dict in metadata view | 
| | |  | 
| | |  | 
| |\  
| | 
| | 
| | 
| | | pipenv updates
See merge request webgroup/fatcat!13 | 
| | | 
| | 
| | 
| | 
| | 
| | 
| | 
| | | loginpass patches got accepted upstream a while back, so don't need to
pin to a git version
ipython 7.10 seems to have problems installing, so restricting to
earlier 6.x versions | 
| | | 
| | 
| | 
| | 
| | 
| | 
| | 
| | 
| | 
| | 
| | 
| | 
| | 
| | 
| | 
| | 
| | 
| | 
| | 
| | 
| | 
| | 
| | 
| | 
| | | This prevents a test exception that presents like:
    tests/transform_csl.py:46:
    _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _
    fatcat_tools/transforms/csl.py:204: in citeproc_csl
        style_path = get_style_filepath(style)
    .venv/lib/python3.5/site-packages/citeproc_styles/__init__.py:74: in get_style_filepath
        if resource_exists(__name__, independent_style):
    .venv/lib/python3.5/site-packages/pkg_resources/__init__.py:1134: in resource_exists
        return get_provider(package_or_requirement).has_resource(resource_name)
    .venv/lib/python3.5/site-packages/pkg_resources/__init__.py:1404: in has_resource
        return self._has(self._fn(self.module_path, resource_name))
    _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _
    self = <pkg_resources.NullProvider object at 0x7f4f38c0bb00>
    path = '/home/bnewbold/code/fatcat/python/.venv/lib/python3.5/site-packages/citeproc_styles/styles/bibtex.csl'
        def _has(self, path):
            raise NotImplementedError(
    >           "Can't perform this operation for unregistered loader type"
            )
    E       NotImplementedError: Can't perform this operation for unregistered loader type | 
| | | 
| | 
| | 
| | 
| | 
| | | This is still manually tweaked. I believe I've isolated the source of
the CSL/citeproc_style import error to an upgrade of the 'pytest' module.
This commit upgrades all packages except pytest. | 
| |/ |  | 
| |\  
| | 
| | 
| | 
| | | guide fix: code and db uses release_stage
See merge request webgroup/fatcat!12 | 
| |/ |  | 
| |\  
| | 
| | 
| | 
| | | write diagnostic messages to stderr
See merge request webgroup/fatcat!10 | 
| |/  
|   
|   
|   
| | During debugging, it can be helpful to keep stdout (e.g. processing
results) and diagnostic messages separate. | 
| |\  
| | 
| | 
| | 
| | | Update EntityImporter docstring.
See merge request webgroup/fatcat!9 | 
| | | |  | 
| | | 
| | 
| | 
| | | I believe the required method is `parse_record`, not `parse`. | 
| | | 
| | 
| | 
| | 
| | 
| | 
| | 
| | | The common case is the same URL being submitted repeatedly during
testing.
This is only within-editgroup, and per importer (eg, won't work across
spn importer "submitted" editgroups), but is better than nothing. | 
| | | |  | 
| | | 
| | 
| | 
| | 
| | | This is mostly changing ingest_type from 'file' to 'pdf', and adding
'link_source'/'link_source_id', plus some small cleanups. | 
| | | 
| | 
| | 
| | | We really should just use file_meta result or nothing. | 
| | | 
| | 
| | 
| | | Also fix a spurious typo. | 
| | | |  | 
| | | |  | 
| | | |  | 
| | | 
| | 
| | 
| | | As a form of documentation | 
| | | 
| | 
| | 
| | | Based on ingest-file-results importer | 
| | | |  | 
| |/  
|   
|   
|   
| | For use with bots that don't have admin privileges, or where human
follow-up review is desired. | 
| |\  
| | 
| | 
| | 
| | | container-ingest tool
See merge request webgroup/fatcat!8 | 
| | | 
| | 
| | 
| | | Caught by Martin in review; Thanks! | 
| | | 
| | 
| | 
| | 
| | 
| | 
| | 
| | 
| | 
| | | --fatcat-api-url is clearer than --host-url
remove unimplemented --debug (copy/paste from webface argparse)
use formatter which will display 'default' parameters with --help
Thanks to Martin for pointing out the latter, which I've always wanted! | 
| | | 
| | 
| | 
| | 
| | 
| | 
| | 
| | 
| | 
| | 
| | | This gets rid of some messy error-handling code by properly configuring
the elasticsearch client to just not clean up scroll iterators when
accessing the public (prod or qa) search interfaces.
Leaving the scroll state around isn't ideal, so we still delete them if
possible (eg, connecting directly to elasticsearch).
Thanks to Martin for pointing out this solution in review. | 
| | | 
| | 
| | 
| | 
| | 
| | 
| | 
| | 
| | 
| | 
| | 
| | 
| | 
| | 
| | 
| | 
| | | The intent of this tool is to make it easy to enqueue ingest requests into
kafka, to be processed by a worker pool and eventually end up inserted
into fatcat (for ingest hits that pass various checks).
As a specific example use-case, we have pretty good coverage of eLife (a
prominent OA publisher), but have missed some publications in the past,
and have a large gap for the year 2019:
  https://fatcat.wiki/container/en4qj5ijrbf5djxx7p5zzpjyoq/coverage
This tool would make it trivial to enqueue all the missing releases to
be crawled.
Future variants on this tool could query for, eg, long-tail OA works. | 
| | | |  | 
| | | |  | 
| | | 
| | 
| | 
| | 
| | | These are low-level and high-level (respectively)
client wrappers for elasticsearch | 
| | | 
| | 
| | 
| | 
| | 
| | 
| | 
| | 
| | 
| | 
| | 
| | 
| | 
| | | Use --fatcat-api-url instead of (ambiguous) --host-url for commands that
aren't deployed/running via systemd.
TODO: update the other --host-url usage, and either roll-out change
consistently or support the old arg as an alias during cut-over
Use argparse.ArgumentDefaultsHelpFormatter (thanks Martin!)
Add help messages for all sub-commands, both as documentation and as a
way to get argparse to print available commands in a more readable
format. | 
| | | |  | 
| |/  
|   
|   
|   
| | - don't start kafka image until zookeeper is running
- set very liberal "watermarks" for elasticsearch disk monitoring | 
| | |  | 
| | 
| 
| 
| 
| 
| | This was causing 5xx errors in production and qa. Eg, at:
  https://qa.fatcat.wiki/release/aaaaaaaaaaaaarceaaaaaaaaai/history | 
| | |  | 
| | |  | 
| |\  
| | 
| | 
| | 
| | | Basic mocked test for crossref harvester
See merge request webgroup/fatcat!7 | 
| | | |  | 
| | | 
| | 
| | 
| | 
| | 
| | 
| | 
| | | producer creation/configuration should happen at __init__() time,
not in the 'daily' call.
This specific refactor was motivated by mocking out the producer in unit
tests. | 
| | | |  | 
| |\ \  
| |/  
|/|   
| |   
| | | increase max.message.bytes in container
See merge request webgroup/fatcat!5 |