# refcat (wip)
Citation graph related tasks.
* companion project: [skate](https://git.archive.org/martin/cgraph/-/tree/master/skate)
Objective: Given data about
[releases](https://guide.fatcat.wiki/entity_release.html) and references, derive
various artifacts, e.g.:
* a citation graph; nodes are releases and an edge is a citation (currently,
this graph has about 50M nodes and 870M edges)
* a list of referenced entities, like ISSN (container), ISBN (book), URL
(webpage), datasets (by URL, DOI, name, ...)
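
To make the graph model above concrete, here is a minimal sketch of a citation graph held as an adjacency map. It is an illustration only, not how refcat derives or stores the graph; the function names and the `release-*` identifiers are hypothetical.

```python
def build_citation_graph(edges):
    """Build an adjacency map from (citing, cited) release identifier pairs.

    Every release becomes a node, including releases that are only cited.
    Duplicate citations between the same pair collapse into one edge.
    """
    graph = {}
    for source, target in edges:
        graph.setdefault(source, set()).add(target)
        graph.setdefault(target, set())
    return graph


def graph_size(graph):
    """Return (number of nodes, number of edges)."""
    return len(graph), sum(len(targets) for targets in graph.values())


# Hypothetical identifiers; real release identifiers look different.
edges = [
    ("release-a", "release-b"),
    ("release-a", "release-c"),
    ("release-b", "release-c"),
    ("release-a", "release-b"),  # duplicate citation
]
graph = build_citation_graph(edges)
nodes, edge_count = graph_size(graph)
```

At the scale mentioned above (about 50M nodes and 870M edges), an in-memory map like this is unlikely to be practical; the sketch only shows the shape of the data.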
## Ongoing Notes
* [version 0](notes/version_0.md) (id only)
* [version 1](notes/version_1.md) (id plus title)
* [version 2](notes/version_2.md) (v1, full schema)
* [version 3](notes/version_3.md) (v2, unstructured)
## Deployment
We are testing a zipapp-based deployment (about 20s to package everything into
a 10MB zip file and copy it to the target).
Caveat: The development machine needs the same Python version (e.g. 3.7) as the
target, e.g. so that native dependencies match. It is relatively easy to have
multiple versions of Python available with [pyenv](https://github.com/pyenv/pyenv).
```
$ make refcat.pyz && rsync -avP refcat.pyz user@host:/usr/local/bin
```
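
The `make refcat.pyz` recipe itself is not shown here, so the following is only a sketch of how a `.pyz` archive can be produced with the standard library `zipapp` module. The helper names `build_pyz` and `python_tag` and the demo application are hypothetical; the actual build may use a different tool and bundle dependencies differently.

```python
import pathlib
import subprocess
import sys
import tempfile
import zipapp


def python_tag():
    """Return the running interpreter's major.minor version, e.g. '3.7'.

    Comparing this value on the development machine and on the target is
    one way to check the version caveat mentioned above.
    """
    return "{}.{}".format(*sys.version_info[:2])


def build_pyz(source_dir, target):
    """Package a directory containing __main__.py into an executable archive."""
    zipapp.create_archive(
        source_dir,
        target=target,
        interpreter="/usr/bin/env python3",  # shebang line written into the archive
        compressed=True,  # available since Python 3.7
    )
    return pathlib.Path(target)


# Demo with a throwaway application (assumption: not the refcat package).
with tempfile.TemporaryDirectory() as tmp:
    src = pathlib.Path(tmp) / "app"
    src.mkdir()
    (src / "__main__.py").write_text("print('hello from pyz')\n")
    archive = build_pyz(src, pathlib.Path(tmp) / "demo.pyz")
    output = subprocess.run(
        [sys.executable, str(archive)],
        capture_output=True,
        text=True,
        check=True,
    ).stdout
```

A single-file archive keeps deployment to one `rsync` call, at the cost of requiring a compatible interpreter on the target.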
On the target, run the archive directly. The first run is slower (e.g. 4s);
subsequent runs start in around 1s.
```
$ refcat.pyz
____ __
________ / __/________ _/ /_
/ ___/ _ \/ /_/ ___/ __ `/ __/
/ / / __/ __/ /__/ /_/ / /_
/_/ \___/_/ \___/\__,_/\__/
Command line entry point for running various data tasks.
General usage:
$ refcat TASK
BASE: /bigger/.cache
BiblioRef KeyDistribution RefsFatcatSortedKeys
BiblioRefFromJoin RefCounter RefsFatcatTitleLowerJoin
BiblioRefFuzzy Refcat RefsKeyStats
CommonDOIs RefsArxiv RefsPMCID
CommonTitles RefsDOIs RefsPMID
CommonTitlesLower RefsDOIsLower RefsReleasesMerged
FatcatArxiv RefsFatcatArxivJoin RefsTitleFrequency
FatcatDOIs RefsFatcatClusterVerify RefsTitles
FatcatDOIsLower RefsFatcatClusters RefsTitlesLower
FatcatPMCID RefsFatcatDOIJoin RefsToRelease
FatcatPMID RefsFatcatGroupJoin ReleaseExportExpanded
FatcatTitles RefsFatcatPMCIDJoin URLList
FatcatTitlesLower RefsFatcatPMIDJoin URLTabs
Input RefsFatcatRanked
```