# COCI Notes

* [https://opencitations.net/download](https://opencitations.net/download)
* [https://figshare.com/articles/dataset/Crossref_Open_Citation_Index_CSV_dataset_of_all_the_citation_data/6741422/9](https://figshare.com/articles/dataset/Crossref_Open_Citation_Index_CSV_dataset_of_all_the_citation_data/6741422/9)

> 6741422v9.zip [19G]

> Dump created on 2020-12-07. This dump includes information on:
>
> * 60,778,357 bibliographic resources;
> * 759,516,507 citation links.

```
extracted/2020-06-13T18_18_05_1-2.zip
extracted/2020-08-20T18_12_28_1-2.zip
extracted/2020-04-25T04_48_36_1-5.zip
extracted/2020-11-22T17_48_01_1-3.zip
extracted/2020-01-13T19_31_19_1-4.zip
extracted/2019-10-21T22_41_20_1-63.zip
```

* extracted to 79 CSV files

Raw data example.

```
oci,citing,cited,creation,timespan,journal_sc,author_sc
02003080406360106010101060909370200010237070005020502-02001000106361937231430122422370200000837000737000200,10.3846/16111699.2012.705252,10.1016/j.neucom.2008.07.020,2012-10-04,P3Y0M,no,no
02003080406360106010101060909370200010237070005020502-0200308040636010601016301060909370200000837093701080963010908,10.3846/16111699.2012.705252,10.3846/1611-1699.2008.9.189-198,2012-10-04,P4Y0M4D,yes,no
02003080406360106010101060909370200010237070005020502-02001000106361937102818141224370200000737000237000003,10.3846/16111699.2012.705252,10.1016/j.asieco.2007.02.003,2012-10-04,P5Y6M,no,no
02003080406360106010101060909370200010237070005020502-02003080406360106010101060909370200010137050505030808,10.3846/16111699.2012.705252,10.3846/16111699.2011.555388,2012-10-04,P1Y5M22D,yes,no
...
```

For comparison, we need also a DOI-DOI matching list.
Example approach:

* extract source-target release ident, sort by source ident
* from fatcat db dump, extract source id and ext ids, sort by source ident
* "zip together"

Unify CSV files:

```
$ zstdcat -T0 6741422v9.csv.zst | wc -l
759516506
```

Nomenclature:

* citing = source
* cited = target

Example:

```
10.3846/16111699.2012.720591,10.1016/0024-6301(96)00041-6
```

> citing: 10.3846/16111699.2012.720591, https://fatcat.wiki/release/52znjflg2bdd5h2q2icu3zjhki

> cited: 10.1016/0024-6301(96)00041-6, https://fatcat.wiki/release/mz6dkakhknd47h3skd7ttomwga

```
$ curl -s "localhost:9200/fatcat_ref_v02_20210716/_search?q=source_release_ident:52znjflg2bdd5h2q2icu3zjhki+AND+target_release_ident:mz6dkakhknd47h3skd7ttomwga" | jq .
{
  "took": 259,
  "timed_out": false,
  "_shards": {
    "total": 6,
    "successful": 6,
    "skipped": 0,
    "failed": 0
  },
  "hits": {
    "total": {
      "value": 1,
      "relation": "eq"
    },
    "max_score": 32.16953,
    "hits": [
      {
        "_index": "fatcat_ref_v02_20210716",
        "_type": "_doc",
        "_id": "52znjflg2bdd5h2q2icu3zjhki_2",
        "_score": 32.16953,
        "_source": {
          "indexed_ts": "2021-07-10T12:04:57Z",
          "match_provenance": "crossref",
          "match_reason": "doi",
          "match_status": "exact",
          "ref_index": 2,
          "ref_key": "cit0005",
          "source_release_ident": "52znjflg2bdd5h2q2icu3zjhki",
          "source_work_ident": "76yenkekovfh5bnvuxwvtvxy5q",
          "source_year": "2014",
          "target_release_ident": "mz6dkakhknd47h3skd7ttomwga",
          "target_work_ident": "um37w3kdcnhqvnp5jeh3mvhumy"
        }
      }
    ]
  }
}
```
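
The "zip together" step from the example approach could be sketched as a merge join over two sorted streams. This is only an illustration, not the code used for these notes: the function names `zip_sorted` and `doi_pairs` are hypothetical, small in-memory lists stand in for the sorted dump files, and Python's `sorted()` stands in for whatever external sort would be used at full scale. It assumes both inputs are sorted by ident with plain byte-wise ordering and that each ident has at most one DOI.

```python
from typing import Iterable, Iterator, List, Tuple

Pair = Tuple[str, str]


def zip_sorted(refs: Iterable[Pair], idents: Iterable[Pair]) -> Iterator[Pair]:
    """Merge join of (key, other) rows with (ident, doi) rows.

    Both inputs must be sorted by their first field. Yields (doi, other)
    for every row whose key has a DOI; rows without a DOI are skipped.
    """
    ident_iter = iter(idents)
    current = next(ident_iter, None)
    for key, other in refs:
        # Advance the ident stream until it catches up with the current key.
        while current is not None and current[0] < key:
            current = next(ident_iter, None)
        # Do not advance on a match, so repeated keys resolve to the same DOI.
        if current is not None and current[0] == key:
            yield current[1], other


def doi_pairs(refs: List[Pair], idents: List[Pair]) -> List[Pair]:
    """Turn (source ident, target ident) rows into (citing DOI, cited DOI)."""
    # Pass 1: resolve the source ident, then re-sort by target ident.
    by_target = sorted(
        (target, source_doi) for source_doi, target in zip_sorted(refs, idents)
    )
    # Pass 2: resolve the target ident and restore citing-first column order.
    return [
        (source_doi, target_doi)
        for target_doi, source_doi in zip_sorted(by_target, idents)
    ]


# Idents and DOIs taken from the example in these notes.
refs = [("52znjflg2bdd5h2q2icu3zjhki", "mz6dkakhknd47h3skd7ttomwga")]
idents = [
    ("52znjflg2bdd5h2q2icu3zjhki", "10.3846/16111699.2012.720591"),
    ("mz6dkakhknd47h3skd7ttomwga", "10.1016/0024-6301(96)00041-6"),
]

for citing, cited in doi_pairs(refs, idents):
    print(f"{citing},{cited}")
```

Two passes are needed because a single sort order can only resolve one side of each pair. The result has the same `citing,cited` shape as the unified COCI file, so the two lists can then be compared line by line.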