This report is auto-generated from an SQLite database file, which should be available alongside it.
datetime('now')
2020-07-09 02:41:48
QUERY: SELECT datetime('now');
Note that nearly all of the fatcat release stats are counted per release, not per work, so there may be over-counting. Also, as of July 2019 there were over 1.5 million OA long-tail releases that are not linked to a container (journal).
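To illustrate the over-counting, here is a minimal sketch using an in-memory SQLite table. The `release` table and its `release_id`/`work_id` columns are hypothetical stand-ins, not the chocula schema: counting rows counts every release (for example a preprint and its published version), while counting distinct works collapses versions of the same paper.

```python
import sqlite3

# Hypothetical toy data: r1 and r2 are two versions of the same work w1.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE release (release_id TEXT, work_id TEXT)")
conn.executemany(
    "INSERT INTO release VALUES (?, ?)",
    [("r1", "w1"), ("r2", "w1"), ("r3", "w2")],
)

# Per-release counting: every version is counted separately.
release_count = conn.execute("SELECT COUNT(*) FROM release").fetchone()[0]
# Per-work counting: versions of the same work collapse into one.
work_count = conn.execute("SELECT COUNT(DISTINCT work_id) FROM release").fetchone()[0]

print(release_count, work_count)  # 3 2
```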
seq | name | file
0 | main | /home/bnewbold/code/chocula/chocula.sqlite
QUERY: PRAGMA database_list;
Top publishers by journal count:
publisher | COUNT(*)
(none) | 50321
Elsevier | 4909
Springer | 3180
Taylor & Francis | 3049
John Wiley & Sons, Inc | 2325
SAGE Publications | 1442
J-STAGE | 1406
Peter Lang International Academic Publishers | 1356
SciELO | 1188
Informa UK (Taylor & Francis) | 738
Springer-Verlag | 707
Cambridge University Press | 598
Walter de Gruyter GmbH | 553
Georg Thieme Verlag KG | 515
OMICS Publishing Group | 497
IEEE, Inc | 483
Medknow Publications | 473
JSTOR | 469
Oxford University Press | 461
Hindawi | 456
Bentham Science | 445
De Gruyter Open Sp. z o.o. | 442
Wolters Kluwer Health | 427
CAIRN | 416
Egypts Presidential Specialized Council for Education and Scientific Research | 402
QUERY: SELECT publisher, COUNT(*)
FROM journal
GROUP BY publisher
ORDER BY COUNT(*) DESC
LIMIT 25;
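The largest group in the publisher table has an empty (most likely NULL) publisher: GROUP BY collects all such journals into a single group, which can outrank every real publisher. A minimal sketch on a hypothetical miniature `journal` table shows two options, labelling that group with COALESCE or excluding it with `IS NOT NULL`:

```python
import sqlite3

# Hypothetical miniature of the journal table: only the columns needed here.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE journal (issnl TEXT, publisher TEXT)")
conn.executemany(
    "INSERT INTO journal VALUES (?, ?)",
    [("0000-0001", "Elsevier"), ("0000-0002", "Elsevier"),
     ("0000-0003", None), ("0000-0004", None), ("0000-0005", None)],
)

# All NULL publishers fall into one group; COALESCE gives it a visible label.
rows = conn.execute("""
    SELECT COALESCE(publisher, '(none)') AS publisher_label, COUNT(*)
    FROM journal
    GROUP BY publisher
    ORDER BY COUNT(*) DESC
""").fetchall()
print(rows)  # [('(none)', 3), ('Elsevier', 2)]

# Alternatively, leave out journals with no known publisher.
known = conn.execute("""
    SELECT publisher, COUNT(*)
    FROM journal
    WHERE publisher IS NOT NULL
    GROUP BY publisher
    ORDER BY COUNT(*) DESC
""").fetchall()
print(known)  # [('Elsevier', 2)]
```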
Top countries by number of journals:
country | COUNT(*)
us | 31066
(none) | 12064
id | 10204
de | 8220
in | 7491
gb | 7358
fr | 6947
uk | 5988
nl | 5579
br | 4783
QUERY: SELECT country,
COUNT(*)
FROM journal
GROUP BY country
ORDER BY COUNT(*) DESC
LIMIT 10;
Top countries by number of papers:
country | COUNT(*) | SUM(release_count)
us | 31066 | 32295341
gb | 7358 | 13562766
nl | 5579 | 11027952
de | 8220 | 7555626
jp | 3924 | 5538593
uk | 5988 | 4672277
fr | 6947 | 2211672
ch | 2184 | 1973488
ru | 3322 | 1447208
in | 7491 | 1206194
QUERY: SELECT country,
COUNT(*),
SUM(release_count)
FROM journal
GROUP BY country
ORDER BY SUM(release_count) DESC
LIMIT 10;
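Both `gb` and `uk` appear in the country tables, and presumably both denote the United Kingdom (the ISO 3166-1 alpha-2 code is `gb`). If they should be counted together, one option is to normalize the code before grouping. A sketch on hypothetical toy rows, assuming `uk` is only ever used as an alias for `gb`:

```python
import sqlite3

# Hypothetical miniature journal table mixing 'gb' and 'uk' codes.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE journal (issnl TEXT, country TEXT, release_count INTEGER)")
conn.executemany(
    "INSERT INTO journal VALUES (?, ?, ?)",
    [("a", "gb", 100), ("b", "uk", 50), ("c", "us", 120)],
)

# Fold the non-ISO 'uk' code into 'gb', then group on the normalized value.
rows = conn.execute("""
    SELECT CASE WHEN country = 'uk' THEN 'gb' ELSE country END AS country_norm,
           COUNT(*),
           SUM(release_count)
    FROM journal
    GROUP BY country_norm
    ORDER BY SUM(release_count) DESC
""").fetchall()
print(rows)  # [('gb', 2, 150), ('us', 1, 120)]
```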
Top languages by number of journals:
lang | COUNT(*)
(none) | 119624
en | 33516
fr | 2545
es | 1986
pt | 1278
id | 810
fa | 705
de | 687
ja | 627
ru | 455
QUERY: SELECT lang,
COUNT(*)
FROM journal
GROUP BY lang
ORDER BY COUNT(*) DESC
LIMIT 10;
Top languages by number of papers:
lang | COUNT(*) | SUM(release_count)
en | 33516 | 52300274
(none) | 119624 | 39741739
de | 687 | 1081202
ja | 627 | 701129
fr | 2545 | 460774
es | 1986 | 328252
pt | 1278 | 265893
ru | 455 | 236151
it | 365 | 107943
id | 810 | 64503
QUERY: SELECT lang,
COUNT(*),
SUM(release_count)
FROM journal
GROUP BY lang
ORDER BY SUM(release_count) DESC
LIMIT 10;
Fulltext coverage by publisher type:
publisher_type | AVG(ia_frac) | AVG(preserved_frac) | journal_count | paper_count
big5 | 0.20272243401354267 | 0.7452170162730414 | 14358 | 39085852
society | 0.3794377091020545 | 0.5217607884865395 | 11622 | 17531161
(none) | 0.28508193524461156 | 0.39013178488749245 | 76634 | 17473701
unipress | 0.5255733733184424 | 0.710093694843418 | 8238 | 6014361
commercial | 0.3310478193432686 | 0.6774676406235399 | 5908 | 5794436
longtail | 0.6956529253337449 | 0.7448822575700694 | 42814 | 5612002
repository | 0.12340474682980347 | 0.24038109564241453 | 765 | 1037428
scielo | 0.8187897515991598 | 0.8475283307664541 | 1589 | 935919
other | 0.1772382118883935 | 0.629416843982798 | 956 | 852087
archive | 0.32858477978955963 | 0.9868821710035494 | 543 | 727385
oa | 0.7674000144839934 | 0.8021162061188577 | 1854 | 669492
QUERY: SELECT publisher_type,
AVG(ia_frac),
AVG(preserved_frac),
COUNT(*) AS journal_count,
SUM(release_count) AS paper_count
FROM journal
GROUP BY publisher_type
ORDER BY SUM(release_count) DESC;
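AVG(ia_frac) in the query above is an unweighted mean over journals, so a small journal with full coverage counts as much as a large journal with almost none. A paper-weighted figure can be computed as SUM(ia_frac * release_count) / SUM(release_count). The sketch below uses hypothetical toy rows and assumes `ia_frac` is a per-journal fraction of releases with fulltext and that neither column is NULL:

```python
import sqlite3

# Hypothetical rows: a large journal with low coverage and a small one
# with full coverage, both of the same publisher_type.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE journal (publisher_type TEXT, ia_frac REAL, release_count INTEGER)"
)
conn.executemany(
    "INSERT INTO journal VALUES (?, ?, ?)",
    [("society", 0.1, 900), ("society", 1.0, 100)],
)

row = conn.execute("""
    SELECT AVG(ia_frac) AS per_journal_avg,
           -- weight each journal's fraction by its number of papers
           SUM(ia_frac * release_count) / SUM(release_count) AS per_paper_avg
    FROM journal
    GROUP BY publisher_type
""").fetchone()

print(round(row[0], 2), round(row[1], 2))  # 0.55 0.19
```

With the weighting, the large, poorly covered journal dominates, which is closer to "what fraction of papers are covered".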
Top publishers with very little coverage (journals with ia_frac < 0.05):
publisher | journal_count | AVG(ia_frac)
(none) | 9641 | 0.0017593496094944628
Elsevier | 1891 | 0.017255730382637596
Taylor & Francis | 1041 | 0.026388684277766576
J-STAGE | 1000 | 0.008786544945748286
John Wiley & Sons, Inc | 763 | 0.021697556560790764
Informa UK (Taylor & Francis) | 586 | 0.01002503920645208
SAGE Publications | 568 | 0.018712710467758988
Springer-Verlag | 385 | 0.015267708054728164
Springer | 359 | 0.025752583122217714
JSTOR | 270 | 0.010432032891517822
QUERY: SELECT publisher,
COUNT(*) AS journal_count,
AVG(ia_frac)
FROM journal
WHERE ia_frac < 0.05
GROUP BY publisher
ORDER BY journal_count DESC
LIMIT 10;
Amount of fulltext (SUM of ia_count) by SHERPA/RoMEO journal color:
sherpa_color | SUM(ia_count)
(none) | 8203410
blue | 1071423
green | 10304362
white | 732457
yellow | 2490476
QUERY: SELECT sherpa_color,
SUM(ia_count)
FROM journal
GROUP BY sherpa_color;
Homepage URL counts:
unique_urls | journals_with_homepages
188588 | 118879
QUERY: SELECT COUNT(DISTINCT surt) as unique_urls, COUNT(DISTINCT issnl) as journals_with_homepages FROM homepage;
Journal counts by homepage status:
any_homepage | any_live_homepage | any_gwb_homepage | COUNT(*) | frac
0 | 0 | 0 | 46402 | 0.28
1 | 0 | 0 | 12882 | 0.08
1 | 0 | 1 | 10266 | 0.06
1 | 1 | 0 | 8721 | 0.05
1 | 1 | 1 | 87010 | 0.53
QUERY: SELECT any_homepage, any_live_homepage, any_gwb_homepage, COUNT(*), ROUND(1.0 * COUNT(*) / (SELECT COUNT(*) FROM journal), 2) AS frac FROM journal GROUP BY any_homepage, any_live_homepage, any_gwb_homepage;
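The `1.0 *` in the `frac` expression matters: SQLite divides two integers with integer division, which would truncate every fraction here to 0. A quick check, using the first row's count and the total of 165,281 journals (the sum of the COUNT(*) column above):

```python
import sqlite3

conn = sqlite3.connect(":memory:")

# Integer / integer in SQLite truncates toward zero.
int_div = conn.execute("SELECT 46402 / 165281").fetchone()[0]
# Promoting one operand to REAL first yields a true fraction.
frac = conn.execute("SELECT ROUND(1.0 * 46402 / 165281, 2)").fetchone()[0]

print(int_div, frac)  # 0 0.28
```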
Number of unique journals that have a homepage pointing to wayback or archive.org:
COUNT(DISTINCT issnl)
1453
QUERY: SELECT COUNT(DISTINCT issnl) FROM homepage WHERE domain = 'archive.org';
Journals with the most homepage URLs:
issnl | COUNT(*)
0036-6439 | 21
1487-0614 | 16
2375-0383 | 16
2374-4030 | 15
0097-6326 | 14
0749-405X | 13
1521-9097 | 13
0009-7004 | 12
0030-7076 | 12
0717-554X | 12
QUERY: SELECT issnl,
COUNT(*)
FROM homepage
GROUP BY issnl
ORDER BY COUNT(*) DESC
LIMIT 10;
Top (most redundant) homepage SURTs:
surt | COUNT(*)
com,indianjournals)/ | 80
com,hindawi)/ | 71
au,com,informit,search)/search;res=apaft | 64
com,umi)/pqdauto | 51
org,rsc,pubs)/en/ebooks | 50
com,umi)/proquest | 48
org,ieee,ieeexplore)/xplore/conferences.jsp | 40
org,omicsonline)/ | 37
com,idealibrary)/ | 36
com,wiley,interscience)/ | 31
QUERY: SELECT surt,
COUNT(*)
FROM homepage
GROUP BY surt
ORDER BY COUNT(*) DESC
LIMIT 10;
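The homepage table stores URLs as SURT (Sort-friendly URI Reordering Transform) strings such as `com,hindawi)/`: the host's labels are reversed and comma-joined, `)` marks the end of the host, and the path follows. The helper below is a simplified, hypothetical sketch of that transform. It is not the Internet Archive `surt` library and skips most of its canonicalization (ports, userinfo, query-argument sorting, and more); it lowercases everything and drops a leading `www`, as the SURTs above appear to.

```python
from urllib.parse import urlsplit


def simple_surt(url: str) -> str:
    """Hypothetical, simplified SURT: reversed host labels + ')' + path."""
    parts = urlsplit(url.strip().lower())
    labels = (parts.hostname or "").split(".")
    # Drop a leading 'www' label so www.example.com and example.com match.
    if labels and labels[0] == "www":
        labels = labels[1:]
    # urlsplit keeps ';params' inside the path; default to '/' if empty.
    path = parts.path or "/"
    if parts.query:
        path += "?" + parts.query
    return ",".join(reversed(labels)) + ")" + path


print(simple_surt("http://www.hindawi.com/"))  # com,hindawi)/
print(simple_surt("https://search.informit.com.au/search;res=APAFT"))
```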
What is the deal with all those "benjamins" URLs?
publisher | name
John Benjamins Publishing Company | NOWELE
(none) | Studia Uralo-Altaica
John Benjamins Publishing Company | Language Problems and Language Planning
John Benjamins Publishing Company | Lingvisticæ investigationes
John Benjamins Publishing Company | Linguistics of the Tibeto-Burman Area
John Benjamins Publishing Company | Pragmatics & Cognition
John Benjamins Publishing Company | Terminology
John Benjamins Publishing Company | Written Language & Literacy
(none) | FORUM: Revue internationale d'interprétation et de traduction / International Journal of Interpretation and Translation
John Benjamins Publishing Company | English Text Construction
John Benjamins Publishing Company | Constructions and Frames
John Benjamins Publishing Company | Pragmatics and Society
John Benjamins Publishing Company | Translation and Interpreting Studies
John Benjamins Publishing Company | Language and Dialogue
John Benjamins Publishing Company | Metaphor in Language, Cognition, and Communication
(none) | Hamburg Studies on Linguistic Diversity
John Benjamins Publishing Company | Translation Spaces
(none) | Studies in Arabic Linguistics
John Benjamins Publishing Company | Journal of Immersion and Content-Based Language Education (JICB)
(none) | Children's Literature, Culture, and Cognition
John Benjamins Publishing Company | Journal of Language Aggression and Conflict
(none) | FILLM Studies in Languages and Literatures
(none) | Advances in Historical Sociolinguistics
John Benjamins Publishing Company | Linguistic Landscape
John Benjamins Publishing Company | International Journal of Learner Corpus Research
John Benjamins Publishing Company | Journal of Second Language Pronunciation
(none) | ITL - International Journal of Applied Linguistics
John Benjamins Publishing Company | Cognitive Individual Differences in Second Language Processing and Acquisition
John Benjamins Publishing Company | FORUM
John Benjamins Publishing Company | Studies in Germanic Linguistics
QUERY: SELECT publisher,
name
FROM journal
LEFT JOIN homepage ON journal.issnl = homepage.issnl
WHERE homepage.surt = 'com,benjamins)/';
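This query uses LEFT JOIN, but because the WHERE clause filters on a `homepage` column, journals without a matching homepage (where `homepage.surt` is NULL) are discarded anyway, so it behaves like a plain inner JOIN. A small check on hypothetical toy tables:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE journal (issnl TEXT, publisher TEXT, name TEXT);
    CREATE TABLE homepage (issnl TEXT, surt TEXT);
    INSERT INTO journal VALUES ('1111-1111', 'Pub A', 'Journal One'),
                               ('2222-2222', 'Pub B', 'Journal Two');
    -- Only Journal One has a homepage row.
    INSERT INTO homepage VALUES ('1111-1111', 'com,benjamins)/');
""")

left = conn.execute("""
    SELECT publisher, name FROM journal
    LEFT JOIN homepage ON journal.issnl = homepage.issnl
    WHERE homepage.surt = 'com,benjamins)/'
""").fetchall()

inner = conn.execute("""
    SELECT publisher, name FROM journal
    JOIN homepage ON journal.issnl = homepage.issnl
    WHERE homepage.surt = 'com,benjamins)/'
""").fetchall()

# Journal Two's NULL-extended row fails the WHERE test, so both match.
print(left == inner, left)  # True [('Pub A', 'Journal One')]
```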
Domains that block us (ranked by number of blocked homepage URLs):
domain | journal_homepages | SUM(blocked)
jstor.org | 7674 | 7507
tandfonline.com | 4568 | 4505
wiley.com | 4289 | 721
informahealthcare.com | 221 | 220
brill.nl | 234 | 164
bentham.org | 152 | 149
computer.org | 143 | 64
ucpress.edu | 64 | 59
dekker.com | 48 | 47
uem.br | 49 | 42
maney.co.uk | 41 | 41
ingentaconnect.com | 417 | 31
heldref.org | 25 | 25
amcity.com | 23 | 23
managementjournals.com | 19 | 19
ucpressjournals.com | 19 | 19
ametsoc.org | 32 | 18
mdconsult.com | 27 | 17
ikpress.org | 18 | 16
rodopi.nl | 20 | 16
QUERY: SELECT domain,
COUNT(*) as journal_homepages,
SUM(blocked)
FROM homepage
GROUP BY domain
ORDER BY SUM(blocked) DESC
LIMIT 20;
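Ranking by SUM(blocked) mixes domains that block nearly every homepage with large domains that block only some. Dividing by journal_homepages separates them; in SQL this could be `1.0 * SUM(blocked) / COUNT(*)`. A quick sketch over a few rows copied from the table above:

```python
# (domain, journal_homepages, SUM(blocked)) copied from the table above.
rows = [
    ("jstor.org", 7674, 7507),
    ("tandfonline.com", 4568, 4505),
    ("wiley.com", 4289, 721),
    ("ingentaconnect.com", 417, 31),
]

# Fraction of each domain's homepage URLs that were blocked.
blocked_frac = {domain: round(blocked / total, 2) for domain, total, blocked in rows}

for domain, frac in sorted(blocked_frac.items(), key=lambda kv: -kv[1]):
    print(domain, frac)
```

This shows that jstor.org and tandfonline.com block almost everything, while wiley.com and ingentaconnect.com block a minority of homepages.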
Top duplicated homepage URLs:
QUERY: SELECT url,
COUNT(*)
FROM homepage
GROUP BY url
ORDER BY COUNT(*) DESC
LIMIT 20;
Top publishers that have journals in wayback:
publisher | COUNT(*)
(none) | 653
EDP Sciences | 23
CAIRN | 18
OpenEdition | 18
Elsevier | 6
Springer | 6
PERSEE Program | 5
Peer Community In | 5
Institut de recherche et d'histoire des textes (France) | 4
San Lucas Medical | 4
QUERY: SELECT publisher,
COUNT(*)
FROM journal
LEFT JOIN homepage ON journal.issnl = homepage.issnl
WHERE homepage.domain = 'archive.org'
GROUP BY journal.publisher
ORDER BY COUNT(*) DESC
LIMIT 10;
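The COUNT(*) in this query counts matching homepage rows, so a journal with several archive.org homepage URLs is counted more than once. If the intent is to count journals, COUNT(DISTINCT journal.issnl) is one option. A sketch with hypothetical toy data:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE journal (issnl TEXT, publisher TEXT);
    CREATE TABLE homepage (issnl TEXT, domain TEXT);
    INSERT INTO journal VALUES ('1111-1111', 'Pub A'), ('2222-2222', 'Pub A');
    -- Journal 1111-1111 has two archive.org homepage URLs.
    INSERT INTO homepage VALUES ('1111-1111', 'archive.org'),
                                ('1111-1111', 'archive.org'),
                                ('2222-2222', 'archive.org');
""")

row = conn.execute("""
    SELECT journal.publisher,
           COUNT(*) AS homepage_rows,
           COUNT(DISTINCT journal.issnl) AS journals
    FROM journal
    JOIN homepage ON journal.issnl = homepage.issnl
    WHERE homepage.domain = 'archive.org'
    GROUP BY journal.publisher
""").fetchone()

print(row)  # ('Pub A', 3, 2)
```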
Top publishers by number of journals missing a homepage:
publisher | COUNT(*)
(none) | 21460
Peter Lang International Academic Publishers | 1270
Elsevier | 876
J-STAGE | 864
Egypts Presidential Specialized Council for Education and Scientific Research | 354
Georg Thieme Verlag KG | 288
Al Manhal FZ, LLC | 216
Informa UK (Taylor & Francis) | 202
Springer-Verlag | 156
ELSEVIER LTD | 145
Inderscience | 122
African Journals Online | 121
Diva Enterprises Private Limited | 119
PERSEE Program | 118
Sabinet | 109
SAGE Publications | 103
Brill | 99
Superintendent of Government Documents | 99
Taylor & Francis | 98
Bentham Science | 94
QUERY: SELECT publisher,
COUNT(*)
FROM journal
WHERE any_homepage=0
GROUP BY publisher
ORDER BY COUNT(*) DESC
LIMIT 20;