File Ingest Mode: 'src'
=======================
Ingest type for the "source" of works in document form: for example, tarballs
of LaTeX source and figures, as published on arxiv.org and PubMed Central.

For now, the presumption is that each source is a single file (a `file` entity
in fatcat).
Initial mimetypes to allow:
- text/x-tex
- application/xml
- application/gzip
- application/x-bzip
- application/x-bzip2
- application/zip
- application/x-tar
- application/msword
- application/vnd.openxmlformats-officedocument.wordprocessingml.document
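
The allowlist above could be checked along these lines. This is a minimal sketch, not the actual importer or ingest code: `SRC_MIMETYPES`, `normalize_mimetype`, and `is_allowed_src_mimetype` are hypothetical names, and stripping parameters such as `; charset=...` before comparing is an assumption about how the mimetype arrives.

```python
# Hypothetical allowlist mirroring the list in this proposal.
SRC_MIMETYPES = frozenset(
    [
        "text/x-tex",
        "application/xml",
        "application/gzip",
        "application/x-bzip",
        "application/x-bzip2",
        "application/zip",
        "application/x-tar",
        "application/msword",
        "application/vnd.openxmlformats-officedocument.wordprocessingml.document",
    ]
)


def normalize_mimetype(raw: str) -> str:
    """Drop parameters (e.g. '; charset=utf-8') and lowercase the type."""
    return raw.split(";", 1)[0].strip().lower()


def is_allowed_src_mimetype(raw: str) -> bool:
    """Return True if the mimetype is on the 'src' allowlist."""
    return normalize_mimetype(raw) in SRC_MIMETYPES
```

Keeping the list in one set would let the fatcat importer and the ingest worker share the same definition, if that turns out to be desirable.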
## Fatcat Changes
In the file importer, allow the additional mimetypes for 'src' ingest.
Ingest might stay disabled on the fatcat side, at least initially, e.g. until
there is some notion of "file scope", or another way of treating 'src' tarballs
separately from PDFs and other fulltext formats.
## Ingest Changes
Allow additional terminal mimetypes for 'src' crawls.
## Examples
arxiv:2109.00954v1
fatcat:release_akzp2lgqjbcbhpoeoitsj5k5hy
https://arxiv.org/format/2109.00954v1
https://arxiv.org/e-print/2109.00954v1

arxiv:1912.03397v2
https://arxiv.org/format/1912.03397v2
https://arxiv.org/e-print/1912.03397v2
NOT: https://arxiv.org/pdf/1912.03397v2

pmcid:PMC3767916
https://ftp.ncbi.nlm.nih.gov/pub/pmc/oa_package/08/03/PMC3767916.tar.gz

For PMC, the directory prefixes in the package URL (`08/03` in the example
above) are not part of the PMCID, so ingest will need to look them up in one of
the .csv file lists.
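
Resolving the example identifiers to source URLs might look like the sketch below. The function names are hypothetical. It assumes new-style, versioned arXiv identifiers only, and that some cell in each row of the PMC file list holds a package path relative to `https://ftp.ncbi.nlm.nih.gov/pub/pmc/`. The column layout of the .csv is deliberately not assumed, so every cell is scanned.

```python
import csv
import re
from typing import Iterable, Optional

# Assumption: new-style, versioned arXiv identifiers such as "2109.00954v1".
ARXIV_ID_RE = re.compile(r"^(?:arxiv:)?(\d{4}\.\d{4,5}v\d+)$", re.IGNORECASE)

# Assumption: file-list paths are relative to this base URL.
PMC_BASE_URL = "https://ftp.ncbi.nlm.nih.gov/pub/pmc/"


def arxiv_eprint_url(ident: str) -> str:
    """Build the e-print (source) URL, not the /pdf/ URL, for an arXiv id."""
    match = ARXIV_ID_RE.match(ident.strip())
    if not match:
        raise ValueError(f"not a versioned new-style arXiv id: {ident!r}")
    return f"https://arxiv.org/e-print/{match.group(1)}"


def pmc_package_url(pmcid: str, file_list_lines: Iterable[str]) -> Optional[str]:
    """Find the package path for a PMCID by scanning file-list rows."""
    pmcid = pmcid.strip()
    if pmcid.lower().startswith("pmcid:"):
        pmcid = pmcid[len("pmcid:"):]
    suffix = f"/{pmcid}.tar.gz"
    for row in csv.reader(file_list_lines):
        for cell in row:
            path = cell.strip()
            if path.endswith(suffix):
                return PMC_BASE_URL + path.lstrip("/")
    return None  # PMCID not present in this file list
```

A linear scan is fine for a one-off lookup. For bulk ingest, loading the file list into a dict keyed by PMCID once would likely be the better choice.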