# cgraph

Code related to the scholarly citation graph; maintained by [martin@archive.org](mailto:martin@archive.org).

* python: mostly luigi tasks
* skate: various Go tools

Context: [fatcat](https://fatcat.wiki), "Mellon Grant" (20/21)

# Grant related tasks

* [ ] Link PID or DOI to archived versions
* [ ] URLs in corpus linked to best possible timestamp (GWB)
* [ ] Harvest all URLs in citation corpus
* [ ] Links between records w/o DOI (fuzzy matching)
* [ ] Publication of augmented citation graph, explore data mining, etc.
* [ ] Interlinkage with other sources, monographs, commercial publications, etc.
* [ ] Wikipedia (en) references metadata or archived record
* [ ] Metadata records for often cited non-scholarly web publications
* [ ] Collaborations: I4OC, wikicite

# Current status

```
$ refcat.pyz BiblioRefV2
```

* schema: [2021-01-29_citation_api.md#schemas](https://git.archive.org/webgroup/fatcat/-/blob/10eb30251f89806cb7a0f147f427c5ea7e5f9941/proposals/2021-01-29_citation_api.md#schemas)
* matches via: doi, arxiv, pmid, pmcid, fuzzy title matches
* 717,435,777 edges (94% of open citation/crossref), 37G compressed, ~260G uncompressed
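
The match order listed above can be illustrated with a small sketch. This is not the code used in this repository: the field names (`doi`, `arxiv`, `pmid`, `pmcid`, `title`), the function names, and the minimum title length are assumptions for illustration, and the title step is a plain normalized-string comparison standing in for real fuzzy matching.

```python
import re
from typing import Optional

# Identifier fields tried in order; the field names are assumptions.
ID_FIELDS = ("doi", "arxiv", "pmid", "pmcid")


def normalize_title(title: str) -> str:
    """Lowercase and drop everything except ASCII letters and digits."""
    return re.sub(r"[^a-z0-9]", "", title.lower())


def match_reason(ref: dict, release: dict) -> Optional[str]:
    """Return which criterion links a reference to a release, or None."""
    # Exact identifier matches first, compared case-insensitively.
    for field in ID_FIELDS:
        a, b = ref.get(field), release.get(field)
        if a and b and a.strip().lower() == b.strip().lower():
            return field
    # Fall back to a normalized title comparison (hypothetical stand-in
    # for fuzzy matching).
    ta = normalize_title(ref.get("title") or "")
    tb = normalize_title(release.get("title") or "")
    # Very short titles are too ambiguous to match on; 20 is an arbitrary cutoff.
    if len(ta) >= 20 and ta == tb:
        return "title"
    return None
```

Trying identifiers before titles keeps the cheap, unambiguous matches separate from the more error-prone ones, so each edge can carry the reason it was created.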

# Rough Notes

* [python/notes/version_0.md](python/notes/version_0.md)
* [python/notes/version_1.md](python/notes/version_1.md)
* [python/notes/version_2.md](python/notes/version_2.md)
* [python/notes/version_3.md](python/notes/version_3.md)