# cgraph
Code related to the scholarly citation graph; maintained by [martin@archive.org](mailto:martin@archive.org).
* python: mostly luigi tasks
* skate: various Go tools
Context: [fatcat](https://fatcat.wiki), "Mellon Grant" (20/21)
# Grant related tasks
* [ ] Link PID or DOI to archived versions
* [ ] URLs in corpus linked to best possible timestamp (GWB)
* [ ] Harvest all URLs in citation corpus
* [ ] Links between records w/o DOI (fuzzy matching)
* [ ] Publication of augmented citation graph, explore data mining, etc.
* [ ] Interlinkage with other sources, monographs, commercial publications, etc.
* [ ] Wikipedia (en) references linked to metadata or archived record
* [ ] Metadata records for often cited non-scholarly web publications
* [ ] Collaborations: I4OC, wikicite
# Current status
```
$ refcat.pyz BiblioRefV2
```
* schema: [https://git.archive.org/webgroup/fatcat/-/blob/10eb30251f89806cb7a0f147f427c5ea7e5f9941/proposals/2021-01-29_citation_api.md#schemas](https://git.archive.org/webgroup/fatcat/-/blob/10eb30251f89806cb7a0f147f427c5ea7e5f9941/proposals/2021-01-29_citation_api.md#schemas)
* matches via: doi, arxiv, pmid, pmcid, fuzzy title matches
* 717,435,777 edges (94% of open citation/crossref), 37G compressed, ~260G uncompressed
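
The match types listed above can be read as a cascade: exact identifier matches first, with a fuzzy title comparison as a fallback. The following is a minimal sketch of that idea, not the refcat implementation; the function names, the plain-dict record layout, and the 0.9 similarity threshold are all assumptions made for illustration.

```python
import difflib
import re

# Identifier keys tried in order; exact matches take priority over fuzzy ones.
# (Key names are assumed here, they mirror the match types listed above.)
ID_KEYS = ("doi", "arxiv", "pmid", "pmcid")


def normalize_title(title):
    """Lowercase and collapse non-alphanumeric runs into single spaces."""
    return re.sub(r"[^a-z0-9]+", " ", title.lower()).strip()


def match_reason(ref, release, threshold=0.9):
    """Return the name of the first criterion that matches, or None.

    `ref` and `release` are hypothetical dicts with optional identifier
    and "title" fields; `threshold` is an arbitrary illustrative cutoff.
    """
    for key in ID_KEYS:
        a, b = ref.get(key), release.get(key)
        if a and b and a.strip().lower() == b.strip().lower():
            return key
    ta, tb = ref.get("title"), release.get("title")
    if ta and tb:
        ratio = difflib.SequenceMatcher(
            None, normalize_title(ta), normalize_title(tb)
        ).ratio()
        if ratio >= threshold:
            return "fuzzy-title"
    return None
```

Calling `match_reason({"doi": "10.1/ABC"}, {"doi": "10.1/abc"})` returns `"doi"`, since identifiers are compared case-insensitively. A real pipeline would likely need stricter verification of fuzzy candidates (authors, year, container) than this title-only check.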
# Rough Notes
* [python/notes/version_0.md](python/notes/version_0.md)
* [python/notes/version_1.md](python/notes/version_1.md)
* [python/notes/version_2.md](python/notes/version_2.md)
* [python/notes/version_3.md](python/notes/version_3.md)