File Ingest Mode: 'src'
=======================
Ingest type for the "source" of works in document form: for example, tarballs
of LaTeX source and figures, as published on arxiv.org and PubMed Central.

For now, the presumption is that each source would be a single file (a `file`
entity in fatcat).

Initial mimetypes to allow:
- text/x-tex
- application/xml
- application/gzip
- application/x-bzip
- application/x-bzip2
- application/zip
- application/x-tar
- application/msword
- application/vnd.openxmlformats-officedocument.wordprocessingml.document
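
The list above could be applied as a simple set-membership check. What follows is a minimal sketch only: the names `SRC_MIMETYPES` and `is_allowed_src_mimetype` are hypothetical and are not taken from the fatcat or sandcrawler code, and stripping parameters such as `; charset=...` before comparing is an assumption about how mimetypes might arrive.

```python
# Hypothetical allowlist mirroring the mimetypes listed above.
SRC_MIMETYPES = frozenset(
    [
        "text/x-tex",
        "application/xml",
        "application/gzip",
        "application/x-bzip",
        "application/x-bzip2",
        "application/zip",
        "application/x-tar",
        "application/msword",
        "application/vnd.openxmlformats-officedocument.wordprocessingml.document",
    ]
)


def is_allowed_src_mimetype(mimetype: str) -> bool:
    """Return True if `mimetype` is acceptable for 'src' ingest.

    Any parameters (e.g. "; charset=utf-8") are dropped and the comparison
    is case-insensitive. Both choices are assumptions of this sketch.
    """
    base = mimetype.split(";", 1)[0].strip().lower()
    return base in SRC_MIMETYPES
```

With this sketch, `is_allowed_src_mimetype("application/gzip")` is true, while `application/pdf` is rejected, so PDFs stay with the existing fulltext ingest types.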
## Fatcat Changes
In the file importer, allow the additional mimetypes for 'src' ingest.
Ingest might stay disabled on the fatcat side, at least initially, e.g. until
there is some concept of "file scope", or another way of treating 'src'
tarballs separately from PDFs and other fulltext formats.
## Ingest Changes
Allow additional terminal mimetypes for 'src' crawls.
## Examples
    arxiv:2109.00954v1
    fatcat:release_akzp2lgqjbcbhpoeoitsj5k5hy
    https://arxiv.org/format/2109.00954v1
    https://arxiv.org/e-print/2109.00954v1

    arxiv:1912.03397v2
    https://arxiv.org/format/1912.03397v2
    https://arxiv.org/e-print/1912.03397v2
    NOT: https://arxiv.org/pdf/1912.03397v2

    pmcid:PMC3767916
    https://ftp.ncbi.nlm.nih.gov/pub/pmc/oa_package/08/03/PMC3767916.tar.gz
For PMC, one of the .csv file lists will be needed to look up the digit
prefixes in the package path (`08/03` in the example above).
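
The URL patterns in the examples could be generated roughly as follows. This is a sketch under stated assumptions, not the actual ingest code: `arxiv_src_url` and `pmc_package_url` are hypothetical helper names, and the CSV column names `File` and `Accession ID` (with `File` holding a relative path such as `oa_package/08/03/PMC3767916.tar.gz`) are assumptions about the file list layout that should be checked against the real .csv before use.

```python
import csv
import io
from typing import Optional

# Base URL taken from the PMC example above.
PMC_BASE_URL = "https://ftp.ncbi.nlm.nih.gov/pub/pmc/"


def arxiv_src_url(arxiv_id: str) -> str:
    """Build the e-print (source) URL for a versioned arXiv identifier."""
    return f"https://arxiv.org/e-print/{arxiv_id}"


def pmc_package_url(pmcid: str, file_list_csv: str) -> Optional[str]:
    """Look up the package path for `pmcid` in a CSV file list.

    Assumes (unverified) columns named "File" and "Accession ID".
    Returns None if the PMCID does not appear in the list.
    """
    reader = csv.DictReader(io.StringIO(file_list_csv))
    for row in reader:
        if row.get("Accession ID") == pmcid:
            return PMC_BASE_URL + row["File"]
    return None
```

A linear scan like this would be slow for a full file list; in practice the rows would more likely be loaded once into a dict keyed by PMCID.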