author Martin Czygan <martin.czygan@gmail.com> 2021-03-21 00:36:54 +0100
committer Martin Czygan <martin.czygan@gmail.com> 2021-03-21 00:36:54 +0100
commit e00e979a8b144231ce16aafe6b8482e4104f5e37
tree 942af1fbb0eeb71625438a2aaa0b1d783b84db0e
parent c8d9268759f7da1e050658e135fac0c8f0b6fc53
initial import of python tasks
Diffstat (limited to 'python/README.md')
-rw-r--r-- python/README.md 68
1 files changed, 68 insertions, 0 deletions
diff --git a/python/README.md b/python/README.md
new file mode 100644
index 0000000..81db0b0
--- /dev/null
+++ b/python/README.md
@@ -0,0 +1,68 @@
+# refcat (wip)
+
+Citation graph related tasks.
+
+* companion repository: [skate](https://github.com/miku/skate)
+
+Objective: Given data about
+[releases](https://guide.fatcat.wiki/entity_release.html) and references,
+derive various artifacts, e.g.:
+
+* a citation graph; nodes are releases and an edge is a citation (currently, this graph has about 50M nodes and 870M edges)
+* a list of referenced entities, like ISSN (container), ISBN (book), URL (webpage), datasets (by URL, DOI, name, ...)
+
+## Ongoing Notes
+
+* [version 0](notes/version_0.md) (id only)
+* [version 1](notes/version_1.md) (id plus title)
+* [version 2](notes/version_2.md) (v1, full schema)
+
+## Deployment
+
+We are testing a zipapp-based deployment (packaging into a 10MB zip file and
+copying it to the target takes about 20s).
+
+Caveat: The development machine needs the same Python version (e.g. 3.7) as the
+target, because of native dependencies. It is relatively easy to have multiple
+versions of Python available with [pyenv](https://github.com/pyenv/pyenv).
+
+```
+$ make refcat.pyz && rsync -avP refcat.pyz user@host:/usr/local/bin
+```
+
+On the target you can run the archive directly (the first run is slower, around
+4s; subsequent runs start in about 1s):
+
+```
+$ refcat.pyz
+
+
+              ____           __
+   ________  / __/________ _/ /_
+  / ___/ _ \/ /_/ ___/ __ `/ __/
+ / /  /  __/ __/ /__/ /_/ / /_
+/_/   \___/_/  \___/\__,_/\__/
+
+Command line entry point for running various data tasks.
+
+General usage:
+
+    $ refcat TASK
+
+BASE: /bigger/.cache
+
+BiblioRef          KeyDistribution          RefsFatcatSortedKeys
+BiblioRefFromJoin  RefCounter               RefsFatcatTitleLowerJoin
+BiblioRefFuzzy     Refcat                   RefsKeyStats
+CommonDOIs         RefsArxiv                RefsPMCID
+CommonTitles       RefsDOIs                 RefsPMID
+CommonTitlesLower  RefsDOIsLower            RefsReleasesMerged
+FatcatArxiv        RefsFatcatArxivJoin      RefsTitleFrequency
+FatcatDOIs         RefsFatcatClusterVerify  RefsTitles
+FatcatDOIsLower    RefsFatcatClusters       RefsTitlesLower
+FatcatPMCID        RefsFatcatDOIJoin        RefsToRelease
+FatcatPMID         RefsFatcatGroupJoin      ReleaseExportExpanded
+FatcatTitles       RefsFatcatPMCIDJoin      URLList
+FatcatTitlesLower  RefsFatcatPMIDJoin       URLTabs
+Input              RefsFatcatRanked
+```