# Data issues specifically in Citation Graph
## Occurrence Download
* date: 2021-10-28
* status: open
* example: https://fatcat.wiki/release/ns4v2jvhgbhh7mbg45bjtpzway/refs-in
Symptom: Many datasets point to a single publication, e.g. all of them titled "Occurrence Download".

Possible mitigation:
* [ ] extract all titles from fatcat
* [ ] find the most common titles, decide whether each should be blacklisted for the citation graph
* [ ] keep a blacklist of release idents to ignore in edges
* [ ] filter refcat, remove edges with blacklisted id as source (and target)
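
The mitigation steps above could be sketched roughly as follows. This is only an illustration, not the actual refcat pipeline: the record layout (dicts with `ident`, `title`, `source` and `target` keys) and both function names are assumptions made for this example.

```python
from collections import Counter


def most_common_titles(releases, n=10):
    """Count normalized titles over an iterable of release dicts.

    Assumes (hypothetically) that each release is a dict with an optional
    "title" key; releases without a title are skipped.
    """
    counts = Counter(
        r["title"].strip().lower() for r in releases if r.get("title")
    )
    return counts.most_common(n)


def filter_edges(edges, blocked_idents):
    """Drop edges whose source or target release ident is blacklisted.

    Assumes (hypothetically) that each edge is a dict with "source" and
    "target" keys holding release idents.
    """
    return [
        e
        for e in edges
        if e["source"] not in blocked_idents
        and e["target"] not in blocked_idents
    ]
```

The title counts are only meant to support the manual decision about what to blacklist; a frequent title is not necessarily a bad one.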
## Repeated entries
* date: 2021-04-19
* status: solved
* example: https://fatcat.wiki/release/lcarb5rg5vf3tk4hpvosja5sm4/refs-out
A DOI seems to be used as the ref key, which leads to repeated entries.
> 2021-07-02: Solved, kind of. We get rid of various duplicates in a
> post-processing step. It would still be better to not generate these in the
> first place.
## Self references
* date: 2021-04-19
* status: solved
* example: https://fatcat.wiki/release/3fcp4pk7nfamvkbjekqam24bfq/refs-out
The source and target seem to be the same.
> 2021-07-02: Solved in post-processing, for now.
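
A post-processing filter for self references might look like the sketch below. It is not the actual implementation; the edge layout (dicts with `source` and `target` release idents) and the function name are assumptions made for this example.

```python
def drop_self_references(edges):
    """Return only the edges whose source and target idents differ.

    Assumes (hypothetically) that each edge is a dict with "source" and
    "target" keys holding release idents.
    """
    return [e for e in edges if e["source"] != e["target"]]
```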
## Duplicated Edges
* date: 2021-04-20
* status: solved
* example: https://fatcat.wiki/release/22222736evcc7kdn3bleua3fge/refs-out
* found 16/1M
The same source and target pair appears in more than one edge, maybe a DOI combined with a ref key?
> 2021-07-02: Solved in post-processing, for now.
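
Removing duplicated edges in post-processing could be sketched as below, keeping the first edge seen for each pair. This is an illustration under assumptions, not the actual pipeline code: edges are taken to be dicts with hypothetical `source` and `target` keys, and the input is assumed to fit in memory.

```python
def dedupe_edges(edges):
    """Keep the first edge for each (source, target) pair, in input order.

    Assumes (hypothetically) that each edge is a dict with "source" and
    "target" keys holding release idents.
    """
    seen = set()
    unique = []
    for edge in edges:
        key = (edge["source"], edge["target"])
        if key in seen:
            continue  # an edge with this pair was already emitted
        seen.add(key)
        unique.append(edge)
    return unique
```

At the scale of a full citation graph, a sort-based approach on the serialized edges would probably be a better fit than an in-memory set.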