Fatcat "Chocula" Journal Metadata Summary

This report is auto-generated from a SQLite database file, which should be distributed alongside it.

datetime('now')
2020-07-09 02:41:48
QUERY: SELECT datetime('now');

Note that nearly all of the fatcat release stats are counted per release, not per work, so a work that has several releases is counted more than once. Also, as of July 2019 there were over 1.5 million OA longtail releases that are not linked to any container (journal).

seq name file
0 main /home/bnewbold/code/chocula/chocula.sqlite
QUERY: PRAGMA database_list;
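
Each entry in this report is a header row, the result rows, and then the query that produced them. The generator itself is not included here, so the following is only a minimal sketch of how such an entry could be rendered with Python's standard sqlite3 module. The function name render_entry and the in-memory sample data are assumptions made for illustration. It also shows why some rows in the tables below appear to start with a bare count: a NULL or empty grouping value prints as an empty field.

```python
import sqlite3

def render_entry(conn, query):
    """Format one report entry: header row, data rows, then the query text.

    Hypothetical helper, not the actual report generator.
    """
    cur = conn.execute(query)
    header = " ".join(col[0] for col in cur.description)
    # NULL values are printed as empty fields, as in the tables of this report.
    rows = [" ".join("" if v is None else str(v) for v in row) for row in cur]
    return "\n".join([header, *rows, "QUERY: " + query])

# In-memory stand-in for chocula.sqlite, with made-up rows.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE journal (issnl TEXT, publisher TEXT)")
conn.executemany(
    "INSERT INTO journal VALUES (?, ?)",
    [("0000-0001", "Elsevier"), ("0000-0002", "Elsevier"), ("0000-0003", None)],
)

entry = render_entry(
    conn,
    "SELECT publisher, COUNT(*) FROM journal GROUP BY publisher ORDER BY COUNT(*) DESC;",
)
print(entry)
```

Against the real database, the same helper would be called with a connection opened on the chocula.sqlite file listed above.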

Overview

Top publishers by journal count:

publisher COUNT(*)
50321
Elsevier 4909
Springer 3180
Taylor & Francis 3049
John Wiley & Sons, Inc 2325
SAGE Publications 1442
J-STAGE 1406
Peter Lang International Academic Publishers 1356
SciELO 1188
Informa UK (Taylor & Francis) 738
Springer-Verlag 707
Cambridge University Press 598
Walter de Gruyter GmbH 553
Georg Thieme Verlag KG 515
OMICS Publishing Group 497
IEEE, Inc 483
Medknow Publications 473
JSTOR 469
Oxford University Press 461
Hindawi 456
Bentham Science 445
De Gruyter Open Sp. z o.o. 442
Wolters Kluwer Health 427
CAIRN 416
Egypts Presidential Specialized Council for Education and Scientific Research 402
QUERY: SELECT publisher, COUNT(*)
FROM journal
GROUP BY publisher
ORDER BY COUNT(*) DESC
LIMIT 25;

Top countries by number of journals:

country COUNT(*)
us 31066
12064
id 10204
de 8220
in 7491
gb 7358
fr 6947
uk 5988
nl 5579
br 4783
QUERY: SELECT  country,
COUNT(*)
FROM journal
GROUP BY country
ORDER BY COUNT(*) DESC
LIMIT 10;
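
The table above lists both gb and uk, and one row with no country at all. Assuming that uk and gb both denote the United Kingdom (GB is the ISO 3166-1 alpha-2 code; that the two values mean the same thing in this database is an assumption), the counts could be merged at query time. The following is a sketch against made-up rows, not a change to the report's own query.

```python
import sqlite3

# In-memory stand-in for the journal table, with made-up rows.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE journal (issnl TEXT, country TEXT)")
conn.executemany(
    "INSERT INTO journal VALUES (?, ?)",
    [
        ("0000-0001", "gb"),
        ("0000-0002", "uk"),
        ("0000-0003", "uk"),
        ("0000-0004", "us"),
        ("0000-0005", None),  # journal with no country recorded
    ],
)

# Fold 'uk' into 'gb' and leave out journals without a country.
MERGED_QUERY = """
SELECT CASE country WHEN 'uk' THEN 'gb' ELSE country END AS country_norm,
       COUNT(*)
FROM journal
WHERE country IS NOT NULL AND country != ''
GROUP BY country_norm
ORDER BY COUNT(*) DESC, country_norm
"""

merged_counts = conn.execute(MERGED_QUERY).fetchall()
print(merged_counts)
```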

... by number of papers:

country COUNT(*) SUM(release_count)
us 31066 32295341
gb 7358 13562766
nl 5579 11027952
de 8220 7555626
jp 3924 5538593
uk 5988 4672277
fr 6947 2211672
ch 2184 1973488
ru 3322 1447208
in 7491 1206194
QUERY: SELECT  country,
COUNT(*),
SUM(release_count)
FROM journal
GROUP BY country
ORDER BY SUM(release_count) DESC
LIMIT 10;

Top languages by number of journals:

lang COUNT(*)
119624
en 33516
fr 2545
es 1986
pt 1278
id 810
fa 705
de 687
ja 627
ru 455
QUERY: SELECT  lang,
COUNT(*)
FROM journal
GROUP BY lang
ORDER BY COUNT(*) DESC
LIMIT 10;

... by number of papers:

lang COUNT(*) SUM(release_count)
en 33516 52300274
119624 39741739
de 687 1081202
ja 627 701129
fr 2545 460774
es 1986 328252
pt 1278 265893
ru 455 236151
it 365 107943
id 810 64503
QUERY: SELECT  lang,
COUNT(*),
SUM(release_count)
FROM journal
GROUP BY lang
ORDER BY SUM(release_count) DESC
LIMIT 10;

Fatcat Fulltext Coverage

Fulltext coverage by publisher type:

publisher_type AVG(ia_frac) AVG(preserved_frac) journal_count paper_count
big5 0.20272243401354267 0.7452170162730414 14358 39085852
society 0.3794377091020545 0.5217607884865395 11622 17531161
0.28508193524461156 0.39013178488749245 76634 17473701
unipress 0.5255733733184424 0.710093694843418 8238 6014361
commercial 0.3310478193432686 0.6774676406235399 5908 5794436
longtail 0.6956529253337449 0.7448822575700694 42814 5612002
repository 0.12340474682980347 0.24038109564241453 765 1037428
scielo 0.8187897515991598 0.8475283307664541 1589 935919
other 0.1772382118883935 0.629416843982798 956 852087
archive 0.32858477978955963 0.9868821710035494 543 727385
oa 0.7674000144839934 0.8021162061188577 1854 669492
QUERY: SELECT  publisher_type,
AVG(ia_frac),
AVG(preserved_frac),
COUNT(*) AS journal_count,
SUM(release_count) AS paper_count
FROM journal
GROUP BY publisher_type
ORDER BY SUM(release_count) DESC;
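
AVG(ia_frac) in the query above is an unweighted mean over journals, so a journal with ten papers counts as much as one with a hundred thousand. A paper-weighted figure can be computed from the ia_count and release_count columns instead. The sketch below assumes that ia_frac equals ia_count / release_count for each journal, which this report does not state; the rows are made up.

```python
import sqlite3

# In-memory stand-in for the journal table, with made-up rows.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE journal (issnl TEXT, publisher_type TEXT, "
    "release_count INTEGER, ia_count INTEGER, ia_frac REAL)"
)
conn.executemany(
    "INSERT INTO journal VALUES (?, ?, ?, ?, ?)",
    [
        ("0000-0001", "big5", 1000, 100, 0.1),  # large journal, low coverage
        ("0000-0002", "big5", 10, 9, 0.9),      # small journal, high coverage
    ],
)

COVERAGE_QUERY = """
SELECT publisher_type,
       AVG(ia_frac) AS avg_journal_frac,
       1.0 * SUM(ia_count) / SUM(release_count) AS paper_weighted_frac
FROM journal
GROUP BY publisher_type
"""

coverage = conn.execute(COVERAGE_QUERY).fetchall()
for publisher_type, avg_frac, weighted_frac in coverage:
    print(publisher_type, round(avg_frac, 3), round(weighted_frac, 3))
```

In this toy data the per-journal average is 0.5 while only about 11% of papers are covered, which is the kind of gap to keep in mind when reading the AVG columns.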

Top publishers with very little coverage:

publisher journal_count AVG(ia_frac)
9641 0.0017593496094944628
Elsevier 1891 0.017255730382637596
Taylor & Francis 1041 0.026388684277766576
J-STAGE 1000 0.008786544945748286
John Wiley & Sons, Inc 763 0.021697556560790764
Informa UK (Taylor & Francis) 586 0.01002503920645208
SAGE Publications 568 0.018712710467758988
Springer-Verlag 385 0.015267708054728164
Springer 359 0.025752583122217714
JSTOR 270 0.010432032891517822
QUERY: SELECT  publisher,
COUNT(*) AS journal_count,
AVG(ia_frac)
FROM journal
WHERE ia_frac < 0.05
GROUP BY publisher
ORDER BY journal_count DESC
LIMIT 10;

Amount of fulltext by SHERPA/ROMEO journal color:

sherpa_color SUM(ia_count)
8203410
blue 1071423
green 10304362
white 732457
yellow 2490476
QUERY: SELECT  sherpa_color,
SUM(ia_count)
FROM journal
GROUP BY sherpa_color;

Journal Homepages

Homepage URL counts:

unique_urls journals_with_homepages
188588 118879
QUERY: SELECT COUNT(DISTINCT surt) AS unique_urls, COUNT(DISTINCT issnl) AS journals_with_homepages FROM homepage;

Journal counts by homepage status:

any_homepage any_live_homepage any_gwb_homepage COUNT(*) frac
0 0 0 46402 0.28
1 0 0 12882 0.08
1 0 1 10266 0.06
1 1 0 8721 0.05
1 1 1 87010 0.53
QUERY: SELECT any_homepage, any_live_homepage, any_gwb_homepage, COUNT(*), ROUND(1.0 * COUNT(*) / (SELECT COUNT(*) FROM journal), 2) AS frac FROM journal GROUP BY any_homepage, any_live_homepage, any_gwb_homepage;

Number of unique journals that have a homepage pointing to wayback or archive.org:

COUNT(DISTINCT issnl)
1453
QUERY: SELECT COUNT(DISTINCT issnl) FROM homepage WHERE domain = 'archive.org';

Journals with the most homepage URLs:

issnl COUNT(*)
0036-6439 21
1487-0614 16
2375-0383 16
2374-4030 15
0097-6326 14
0749-405X 13
1521-9097 13
0009-7004 12
0030-7076 12
0717-554X 12
QUERY: SELECT  issnl,
COUNT(*)
FROM homepage
GROUP BY issnl
ORDER BY COUNT(*) DESC
LIMIT 10;

Top/redundant URLs and SURTs:

surt COUNT(*)
com,indianjournals)/ 80
com,hindawi)/ 71
au,com,informit,search)/search;res=apaft 64
com,umi)/pqdauto 51
org,rsc,pubs)/en/ebooks 50
com,umi)/proquest 48
org,ieee,ieeexplore)/xplore/conferences.jsp 40
org,omicsonline)/ 37
com,idealibrary)/ 36
com,wiley,interscience)/ 31
QUERY: SELECT  surt,
COUNT(*)
FROM homepage
GROUP BY surt
ORDER BY COUNT(*) DESC
LIMIT 10;
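
The surt values above are URLs in SURT form: the host name is reversed and comma-separated, followed by ")" and the path, so that different spellings of the same URL group together. The sketch below is a simplified approximation written for illustration; simple_surt is a hypothetical helper, not the canonicalization code that populated this database, and real SURT canonicalization handles more cases (ports, query ordering, session IDs).

```python
from urllib.parse import urlsplit

def simple_surt(url):
    """Approximate SURT form of a URL (simplified, hypothetical helper)."""
    parts = urlsplit(url.lower())
    host = parts.hostname or ""
    # Drop a leading "www." so www and non-www hosts collapse together.
    if host.startswith("www."):
        host = host[len("www."):]
    path = parts.path
    # Strip a trailing slash, except for the root path itself.
    if len(path) > 1 and path.endswith("/"):
        path = path[:-1]
    if not path:
        path = "/"
    surt = ",".join(reversed(host.split("."))) + ")" + path
    if parts.query:
        surt += "?" + parts.query
    return surt

for url in [
    "http://www.hindawi.com/",
    "http://www.umi.com/pqdauto/",
    "http://search.informit.com.au/search;res=APAFT",
    "http://ieeexplore.ieee.org/Xplore/conferences.jsp",
]:
    print(simple_surt(url))
```

Because several URL spellings map to one SURT, grouping by surt can give higher counts than grouping by the raw url.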

Which journals share the "benjamins" homepage URL?

publisher name
John Benjamins Publishing Company NOWELE
Studia Uralo-Altaica
John Benjamins Publishing Company Language Problems and Language Planning
John Benjamins Publishing Company Lingvisticæ investigationes
John Benjamins Publishing Company Linguistics of the Tibeto-Burman Area
John Benjamins Publishing Company Pragmatics & Cognition
John Benjamins Publishing Company Terminology
John Benjamins Publishing Company Written Language & Literacy
FORUM: Revue internationale d'interprétation et de traduction / International Journal of Interpretation and Translation
John Benjamins Publishing Company English Text Construction
John Benjamins Publishing Company Constructions and Frames
John Benjamins Publishing Company Pragmatics and Society
John Benjamins Publishing Company Translation and Interpreting Studies
John Benjamins Publishing Company Language and Dialogue
John Benjamins Publishing Company Metaphor in Language, Cognition, and Communication
Hamburg Studies on Linguistic Diversity
John Benjamins Publishing Company Translation Spaces
Studies in Arabic Linguistics
John Benjamins Publishing Company Journal of Immersion and Content-Based Language Education (JICB)
Children's Literature, Culture, and Cognition
John Benjamins Publishing Company Journal of Language Aggression and Conflict
FILLM Studies in Languages and Literatures
Advances in Historical Sociolinguistics
John Benjamins Publishing Company Linguistic Landscape
John Benjamins Publishing Company International Journal of Learner Corpus Research
John Benjamins Publishing Company Journal of Second Language Pronunciation
ITL - International Journal of Applied Linguistics
John Benjamins Publishing Company Cognitive Individual Differences in Second Language Processing and Acquisition
John Benjamins Publishing Company FORUM
John Benjamins Publishing Company Studies in Germanic Linguistics
QUERY: SELECT  publisher,
name
FROM journal
LEFT JOIN homepage ON journal.issnl = homepage.issnl
WHERE homepage.surt = 'com,benjamins)/';
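
One detail of the query above: the WHERE clause filters on a column of the right-hand table, so journals without any homepage row (whose surt is NULL after the LEFT JOIN) are dropped, and the result is the same as with an inner JOIN. Rows with an empty publisher in the output are journals that have the benjamins homepage but no publisher recorded. The following sketch checks the equivalence on made-up rows.

```python
import sqlite3

# In-memory stand-ins for the journal and homepage tables, with made-up rows.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE journal (issnl TEXT, publisher TEXT, name TEXT)")
conn.execute("CREATE TABLE homepage (issnl TEXT, surt TEXT)")
conn.executemany(
    "INSERT INTO journal VALUES (?, ?, ?)",
    [
        ("0000-0001", "Pub A", "Journal 1"),
        ("0000-0002", None, "Journal 2"),     # publisher missing
        ("0000-0003", "Pub B", "Journal 3"),
        ("0000-0004", "Pub C", "Journal 4"),  # no homepage row at all
    ],
)
conn.executemany(
    "INSERT INTO homepage VALUES (?, ?)",
    [
        ("0000-0001", "com,benjamins)/"),
        ("0000-0002", "com,benjamins)/"),
        ("0000-0003", "com,example)/"),
    ],
)

TEMPLATE = """
SELECT publisher, name
FROM journal
{join} homepage ON journal.issnl = homepage.issnl
WHERE homepage.surt = 'com,benjamins)/'
ORDER BY name
"""

left_rows = conn.execute(TEMPLATE.format(join="LEFT JOIN")).fetchall()
inner_rows = conn.execute(TEMPLATE.format(join="JOIN")).fetchall()
print(left_rows == inner_rows, left_rows)
```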

Domains that block us:

domain journal_homepages SUM(blocked)
jstor.org 7674 7507
tandfonline.com 4568 4505
wiley.com 4289 721
informahealthcare.com 221 220
brill.nl 234 164
bentham.org 152 149
computer.org 143 64
ucpress.edu 64 59
dekker.com 48 47
uem.br 49 42
maney.co.uk 41 41
ingentaconnect.com 417 31
heldref.org 25 25
amcity.com 23 23
managementjournals.com 19 19
ucpressjournals.com 19 19
ametsoc.org 32 18
mdconsult.com 27 17
ikpress.org 18 16
rodopi.nl 20 16
QUERY: SELECT  domain,
COUNT(*) as journal_homepages,
SUM(blocked)
FROM homepage
GROUP BY domain
ORDER BY SUM(blocked) DESC
LIMIT 20;
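
The table above is ordered by absolute blocked count, which mixes domains that block nearly every homepage with large domains where only a minority is blocked. A blocked fraction per domain makes that distinction visible. The sketch below assumes blocked is stored as 0 or 1 (suggested by the use of SUM(blocked), but not stated in this report) and runs on made-up rows.

```python
import sqlite3

# In-memory stand-in for the homepage table, with made-up rows.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE homepage (issnl TEXT, domain TEXT, blocked INTEGER)")
conn.executemany(
    "INSERT INTO homepage VALUES (?, ?, ?)",
    [
        ("0000-0001", "a.example", 1),
        ("0000-0002", "a.example", 1),
        ("0000-0003", "a.example", 1),
        ("0000-0004", "a.example", 0),
        ("0000-0005", "b.example", 1),
        ("0000-0006", "b.example", 0),
        ("0000-0007", "c.example", 0),
    ],
)

BLOCKED_QUERY = """
SELECT domain,
       COUNT(*) AS journal_homepages,
       SUM(blocked) AS blocked_count,
       ROUND(1.0 * SUM(blocked) / COUNT(*), 2) AS blocked_frac
FROM homepage
GROUP BY domain
ORDER BY blocked_count DESC
"""

blocked_stats = conn.execute(BLOCKED_QUERY).fetchall()
for row in blocked_stats:
    print(*row)
```

Applied to the figures above, jstor.org blocks about 98% of its homepages (7507 of 7674), while wiley.com blocks about 17% (721 of 4289).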

Top duplicated URLs:

url COUNT(*)
http://www.indianjournals.com/ 73
http://www.hindawi.com/ 70
http://search.informit.com.au/search;res=APAFT 60
http://www.umi.com/proquest 46
http://www.umi.com/pqdauto/ 45
http://ieeexplore.ieee.org/Xplore/conferences.jsp 40
http://omicsonline.org/ 36
http://www.idealibrary.com/ 36
http://ieeexplore.ieee.org/xpl/conferences.jsp 24
http://www.metapress.com/ 24
http://www.randspublications.org/ 22
http://www.studia.ubbcluj.ro/serii/index_en.html 22
http://find.galegroup.com/ips/publicationSearch.do 21
http://jurnal.unimed.ac.id/ 21
http://www.bioinfo.in/journals.php 20
http://www.interscience.wiley.com/ 20
http://www.commongroundpublishing.com/ 19
http://www.haworthpress.com/ 19
http://www.heinonline.org/ 19
http://www.infosci-journals.com/ 19
QUERY: SELECT  url,
COUNT(*)
FROM homepage
GROUP BY url
ORDER BY COUNT(*) DESC
LIMIT 20;

Top publishers that have journals in wayback:

publisher COUNT(*)
653
EDP Sciences 23
CAIRN 18
OpenEdition 18
Elsevier 6
Springer 6
PERSEE Program 5
Peer Community In 5
Institut de recherche et d'histoire des textes (France) 4
San Lucas Medical 4
QUERY: SELECT  publisher,
COUNT(*)
FROM journal
LEFT JOIN homepage ON journal.issnl = homepage.issnl
WHERE homepage.domain = 'archive.org'
GROUP BY journal.publisher
ORDER BY COUNT(*) DESC
LIMIT 10;

Top publishers by number of journals missing a homepage:

publisher COUNT(*)
21460
Peter Lang International Academic Publishers 1270
Elsevier 876
J-STAGE 864
Egypts Presidential Specialized Council for Education and Scientific Research 354
Georg Thieme Verlag KG 288
Al Manhal FZ, LLC 216
Informa UK (Taylor & Francis) 202
Springer-Verlag 156
ELSEVIER LTD 145
Inderscience 122
African Journals Online 121
Diva Enterprises Private Limited 119
PERSEE Program 118
Sabinet 109
SAGE Publications 103
Brill 99
Superintendent of Government Documents 99
Taylor & Francis 98
Bentham Science 94
QUERY: SELECT  publisher,
COUNT(*)
FROM journal
WHERE any_homepage=0
GROUP BY publisher
ORDER BY COUNT(*) DESC
LIMIT 20;