# Data issues specifically in Citation Graph

## Occurrence Download

* date: 2021-10-28
* status: open
* example: https://fatcat.wiki/release/ns4v2jvhgbhh7mbg45bjtpzway/refs-in

Symptom: Many datasets point to a single publication; e.g. all of them have
"Occurrence Download" as title.

Possible mitigation:

* [ ] extract all titles from fatcat
* [ ] find the most common titles, decide for each whether it should be blacklisted for the citation graph
* [ ] keep a blacklist of release idents to ignore in edges
* [ ] filter refcat, remove edges with blacklisted id as source (and target)

## Repeated entries

* date: 2021-04-19
* status: solved
* example: https://fatcat.wiki/release/lcarb5rg5vf3tk4hpvosja5sm4/refs-out

A DOI seems to be using the key, which leads to repeated entries.

> 2021-07-02: Solved, kind of. We get rid of various duplicates in a
> post-processing step. It would still be better to not generate these in the
> first place.

## Self references

* date: 2021-04-19
* status: solved
* example: https://fatcat.wiki/release/3fcp4pk7nfamvkbjekqam24bfq/refs-out

The source and target seem to be the same.

> 2021-07-02: Solved in post-processing, for now.

## Duplicated Edges

* date: 2021-04-20
* status: solved
* example: https://fatcat.wiki/release/22222736evcc7kdn3bleua3fge/refs-out
* found 16/1M

Source and target are the same, maybe DOI with ref key?

> 2021-07-02: Solved in post-processing, for now.
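
## Sketch: title blacklist and edge post-processing

The mitigation steps listed under "Occurrence Download", together with the post-processing mentioned for self references and duplicated edges, could look roughly like the sketch below. It is not the actual refcat code. The record shape (dicts with `ident` and `title`), the representation of an edge as a `(source, target)` tuple, and all function names are assumptions made for illustration only.

```python
from collections import Counter


def normalize_title(title):
    """Lowercase and trim a title so trivial variants are counted together."""
    return (title or "").strip().lower()


def most_common_titles(releases, min_count=2):
    """Return (title, count) pairs for titles seen at least min_count times.

    `releases` is assumed to be a list of dicts with "ident" and "title" keys
    (hypothetical shape, not the real fatcat schema).
    """
    counts = Counter(
        normalize_title(r.get("title")) for r in releases if r.get("title")
    )
    return [(t, n) for t, n in counts.most_common() if n >= min_count]


def blacklisted_idents(releases, blacklisted_titles):
    """Collect release idents whose title was manually put on the blacklist."""
    titles = {normalize_title(t) for t in blacklisted_titles}
    return {r["ident"] for r in releases if normalize_title(r.get("title")) in titles}


def clean_edges(edges, blacklist=frozenset()):
    """Yield edges, skipping blacklisted idents, self references and duplicates.

    `edges` is assumed to be an iterable of (source_ident, target_ident) tuples.
    """
    seen = set()
    for source, target in edges:
        if source in blacklist or target in blacklist:
            continue  # edge touches a blacklisted release
        if source == target:
            continue  # self reference
        if (source, target) in seen:
            continue  # duplicated edge
        seen.add((source, target))
        yield source, target


if __name__ == "__main__":
    # Made-up example data; idents are placeholders, not real fatcat idents.
    releases = [
        {"ident": "a", "title": "Occurrence Download"},
        {"ident": "b", "title": "occurrence download "},
        {"ident": "c", "title": "A Paper"},
    ]
    print(most_common_titles(releases))
    ignore = blacklisted_idents(releases, ["Occurrence Download"])
    edges = [("a", "c"), ("c", "c"), ("c", "d"), ("c", "d"), ("d", "b"), ("d", "c")]
    print(list(clean_edges(edges, ignore)))
```

The decision about which frequent titles end up on the blacklist stays manual here, as in the checklist: `most_common_titles` only produces candidates. Keeping all seen edges in a set is fine for a sample, but at full graph size a sort-based deduplication would likely be needed instead.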