# refcat (wip)

Citation graph related tasks.

* companion project: [skate](https://git.archive.org/martin/cgraph/-/tree/master/skate)

Objective: Given data about
[releases](https://guide.fatcat.wiki/entity_release.html) and references,
derive various artifacts, e.g.:

* a citation graph; nodes are releases and an edge is a citation (currently, this graph has about 50M nodes and 870M edges)
* a list of referenced entities, like ISSN (container), ISBN (book), URL (webpage), datasets (by URL, DOI, name, ...)

## Ongoing Notes

* [version 0](notes/version_0.md) (id only)
* [version 1](notes/version_1.md) (id plus title)
* [version 2](notes/version_2.md) (v1, full schema)
* [version 3](notes/version_3.md) (v2, unstructured)
* [version 4](notes/version_4.md) (v3, extra sources, qa)

## Deployment

We are testing a zipapp-based deployment (about 20s for packaging into a 10MB
zip file and copying it to the target).

Caveat: the development machine needs the same Python version (e.g. 3.8) as
the target, for example because of native dependencies. It is relatively easy
to have multiple versions of Python available with
[pyenv](https://github.com/pyenv/pyenv).

```
$ make refcat.pyz && rsync -avP refcat.pyz user@host:/usr/local/bin
```

On the target you can then call the tool. The first run will be slower (e.g.
4s); subsequent runs start up in around 1s.

```
$ refcat.pyz

              ____           __
   ________  / __/________ _/ /_
  / ___/ _ \/ /_/ ___/ __ `/ __/
 / /  /  __/ __/ /__/ /_/ / /_
/_/   \___/_/  \___/\__,_/\__/

Command line entry point for running various data tasks.

    $ refcat.pyz [COMMAND | TASK] [OPTIONS]

Commands: ls, ll, deps, tasks, files, config, cat, completion

To install completion run:

    $ source <(refcat.pyz completion)

VERSION   0.1.3
SETTINGS  /home/martin/.config/refcat/settings.ini
BASE      /magna/refcat
TMPDIR    /sandcrawler-db/tmp-refcat
SHIV_ROOT None

Bref                          OpenLibraryWorksSorted
BrefCombined                  Refcat
BrefOpenLibraryZipISBN        Refs
BrefSortedByWorkID            RefsArxiv
BrefZipArxiv                  RefsByWorkID
BrefZipDOI                    RefsDOI
BrefZipFuzzy                  RefsMapped
BrefZipOpenLibrary            RefsPMCID
BrefZipPMCID                  RefsPMID
BrefZipPMID                   RefsToRelease
FatcatArxiv                   RefsWithUnstructured
FatcatDOI                     RefsWithoutIdentifiers
FatcatMapped                  ReleaseExportExpanded
FatcatPMCID                   ReleaseExportReduced
FatcatPMID                    URLList
MAGPapers                     URLTabs
OpenLibraryAuthorMapping      URLTabsCleaned
OpenLibraryAuthors            UnmatchedMapped
OpenLibraryDump               UnmatchedOpenLibraryMatchTable
OpenLibraryEditions           UnmatchedRefs
OpenLibraryEditionsByWork     UnmatchedRefsToRelease
OpenLibraryEditionsMapped     UnmatchedResolveJournalNames
OpenLibraryEditionsToRelease  UnmatchedResolveJournalNamesMapped
OpenLibraryReleaseMapped      WikipediaCitationsMinimalDataset
OpenLibraryWorks
```

## Dependencies

![](notes/deps.png)

## TODO

* [ ] wrap up refcat
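
## Example: citation graph shape

The objective describes a citation graph whose nodes are releases and whose edges are citations. As a rough illustration of that shape only, here is a minimal in-memory sketch. It is not how refcat builds the graph: the function name `build_citation_graph` and the release identifiers are hypothetical, and a graph with hundreds of millions of edges would not be processed this way.

```python
from collections import defaultdict


def build_citation_graph(edges):
    """Build adjacency maps from (citing, cited) release identifier pairs.

    Hypothetical helper for illustration; not part of refcat.
    """
    outbound = defaultdict(set)  # release -> releases it cites
    inbound = defaultdict(set)   # release -> releases citing it
    for citing, cited in edges:
        outbound[citing].add(cited)
        inbound[cited].add(citing)
    return outbound, inbound


# Made-up release identifiers, for illustration only.
edges = [
    ("release-a", "release-b"),
    ("release-a", "release-c"),
    ("release-b", "release-c"),
]

outbound, inbound = build_citation_graph(edges)
nodes = set(outbound) | set(inbound)
edge_count = sum(len(targets) for targets in outbound.values())
print(len(nodes), "nodes,", edge_count, "edges")
```

Keeping both directions makes it cheap to ask either "what does this release cite" or "what cites this release"; using sets means a duplicated reference pair is counted as a single edge.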