# Fatcat Production Import CHANGELOG
This file tracks major content (metadata) imports to the Fatcat production
database (at https://fatcat.wiki). It complements the code CHANGELOG file.
In general, changes that impact more than 50k entities will get logged here;
this file should probably get merged into the guide at some point.
This file should not turn into a TODO list!
## 2019-12
Inserted about 154k new arXiv release entities. Still no automatic daily
harvesting.
"Save Paper Now" importer running. This bot only *submits* editgroups for
review; it does not auto-accept them.
## 2019-11
Daily ingest of fulltext for OA releases now enabled. New file entities created
and merged automatically.
## 2019-10
Inserted 1.45m new release entities from Crossref which had been missed during
a previous gap in continuous metadata harvesting.
Updated 304,308 file entities to remove broken
"https://web.archive.org/web/None/*" URLs.
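A cleanup of this kind can be sketched as a simple prefix filter. This is only an illustration, not the script that was actually run: the function name `strip_broken_wayback_urls` is hypothetical, and the representation of a file entity's URLs as a list of dicts with `url` and `rel` keys is an assumption.

```python
# Prefix produced when a wayback timestamp was rendered as the string "None".
BROKEN_PREFIX = "https://web.archive.org/web/None/"


def strip_broken_wayback_urls(urls):
    """Return a new list without entries whose URL starts with BROKEN_PREFIX.

    `urls` is assumed to be a list of dicts like {"url": ..., "rel": ...};
    that shape is an assumption made for this sketch.
    """
    return [u for u in urls if not u["url"].startswith(BROKEN_PREFIX)]


if __name__ == "__main__":
    example = [
        {"url": "https://web.archive.org/web/None/http://example.com/a.pdf",
         "rel": "webarchive"},
        {"url": "http://example.com/a.pdf", "rel": "web"},
    ]
    print(strip_broken_wayback_urls(example))
```

Only entities whose URL list actually changes would need an update, which keeps the resulting editgroups small.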
## 2019-09
Created and updated metadata for tens of thousands of containers, using the
"chocula" pipeline.
## 2019-08
Merged/fixed roughly 100 container entities with invalid ISSN-L numbers (e.g.,
invalid ISSN checksum).
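For reference, the ISSN check character follows the standard weighted modulo-11 scheme, which can be sketched as below. The helper names `issn_check_char` and `is_valid_issn` are hypothetical and not taken from the Fatcat codebase; the sketch only checks format and checksum, not whether an ISSN is actually registered.

```python
import re

# NNNN-NNNC, where C is a digit or "X" (standing for the value 10).
ISSN_RE = re.compile(r"^[0-9]{4}-[0-9]{3}[0-9X]$")


def issn_check_char(first7):
    """Compute the check character for the first seven ISSN digits."""
    # Weights run 8, 7, ..., 2 over the seven digits.
    total = sum(int(d) * w for d, w in zip(first7, range(8, 1, -1)))
    check = (11 - total % 11) % 11
    return "X" if check == 10 else str(check)


def is_valid_issn(issn):
    """Return True if `issn` is well-formed and its check character matches."""
    issn = issn.strip().upper()
    if not ISSN_RE.match(issn):
        return False
    digits = issn.replace("-", "")
    return issn_check_char(digits[:7]) == digits[7]


if __name__ == "__main__":
    for candidate in ["0378-5955", "0378-5954", "2434-561X"]:
        print(candidate, is_valid_issn(candidate))
```

A check like this catches single-digit typos and most transpositions, which are a common source of invalid ISSN-L values.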
## 2019-04
Imported files (matched to releases by DOI) from Semantic Scholar
(`DIRECT-OA-CRAWL-2019` crawl).
- Arabesque importer
- crawl-bot
- `s2_doi.sqlite`
- TODO: archive.org link
- TODO: rough count
- TODO: date
Imported files (matched to releases by DOI) from pre-1923/pre-1909 items uploaded
by a user to archive.org.
- Matched importer
- internetarchive-bot (TODO:)
- TODO: archive.org link
- TODO: counts
- TODO: date
Imported files (matched to releases by DOI) from CORE.ac.uk
(`DIRECT-OA-CRAWL-2019` crawl).
Imported files (matched to releases by DOI) from the public web (including many
repositories) from the `UNPAYWALL` 2018 crawl.
## 2019-02
Bootstrapped!