# cgraph

Scholarly citation graph related code; maintained by [martin@archive.org](mailto:martin@archive.org).

* python: mostly luigi tasks
* skate: various Go tools

Context: [fatcat](https://fatcat.wiki), "Mellon Grant" (20/21)

# Grant related tasks

* [ ] Link PID or DOI to archived versions
* [ ] URLs in corpus linked to best possible timestamp (GWB)
* [ ] Harvest all URLs in citation corpus
* [ ] Links between records w/o DOI (fuzzy matching)
* [ ] Publication of augmented citation graph, explore data mining, etc.
* [ ] Interlinkage with other sources: monographs, commercial publications, etc.
* [ ] Wikipedia (en) references metadata or archived record
* [ ] Metadata records for often cited non-scholarly web publications
* [ ] Collaborations: I4OC, wikicite

# Current status

```
$ refcat.pyz BiblioRefV2
```

* schema: [https://git.archive.org/webgroup/fatcat/-/blob/10eb30251f89806cb7a0f147f427c5ea7e5f9941/proposals/2021-01-29_citation_api.md#schemas](https://git.archive.org/webgroup/fatcat/-/blob/10eb30251f89806cb7a0f147f427c5ea7e5f9941/proposals/2021-01-29_citation_api.md#schemas)
* matches via: doi, arxiv, pmid, pmcid, fuzzy title matches
* 717,435,777 edges (94% of open citation/crossref), 37G compressed, ~260G uncompressed

# Rough Notes

* [python/notes/version_0.md](python/notes/version_0.md)
* [python/notes/version_1.md](python/notes/version_1.md)
* [python/notes/version_2.md](python/notes/version_2.md)
* [python/notes/version_3.md](python/notes/version_3.md)
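The "fuzzy title matches" listed under Current status link references to releases that share no identifier (doi, arxiv, pmid, pmcid). A minimal Python sketch of the idea, assuming a simple normalized-title key; `title_key` and `match_by_title` are hypothetical names, and this is not the refcat or skate implementation.

```python
import re
import unicodedata
from collections import defaultdict


def title_key(title: str) -> str:
    """Hypothetical normalization: strip accents, lowercase, drop non-alphanumerics."""
    # Decompose accented characters (e.g. "é" -> "e" + combining accent) and drop the marks.
    decomposed = unicodedata.normalize("NFKD", title)
    without_marks = "".join(c for c in decomposed if not unicodedata.combining(c))
    # Keep only ASCII letters and digits, so punctuation and spacing differences vanish.
    return re.sub(r"[^a-z0-9]", "", without_marks.lower())


def match_by_title(refs, releases):
    """Pair reference ids with release ids that share a normalized title key.

    refs and releases are iterables of (id, title) tuples (an assumed shape).
    """
    index = defaultdict(list)
    for release_id, title in releases:
        key = title_key(title)
        if key:  # skip empty keys, which would otherwise pair all untitled records
            index[key].append(release_id)
    matches = []
    for ref_id, title in refs:
        # .get() does not insert into the defaultdict, so lookups leave the index unchanged.
        for release_id in index.get(title_key(title), []):
            matches.append((ref_id, release_id))
    return matches


if __name__ == "__main__":
    refs = [("r1", "Attention Is All You Need"), ("r2", "Some Unmatched Title")]
    releases = [("rel_a", "Attention is all you need."), ("rel_b", "Deep Residual Learning")]
    print(match_by_title(refs, releases))  # [('r1', 'rel_a')]
```

Exact key matches on short or generic titles (e.g. "Introduction") can produce false positives, so a real pipeline would likely follow the key match with a verification step comparing authors, year, or container; that step is omitted here.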