Chocula Journal Aggregate Stats

datetime('now')
2019-12-26 20:48:12
QUERY: SELECT datetime('now');

seq name file
0 main /home/bnewbold/code/chocula/chocula.2019-12-26.sqlite
QUERY: PRAGMA database_list;

Overview

Top publishers by journal count:

publisher COUNT(*)
(blank) 43367
Elsevier 4060
Informa UK (Taylor & Francis) 3363
Springer-Verlag 2938
SAGE Publications 1437
Peter Lang International Academic Publishers 1357
Wiley (Blackwell Publishing) 1167
Wiley (John Wiley & Sons) 1083
Walter de Gruyter GmbH 629
Springer (Biomed Central Ltd.) 594
Cambridge University Press 580
Georg Thieme Verlag KG 534
Hindawi Limited 533
OMICS Publishing Group 504
JSTOR 482
Oxford University Press 481
Medknow Publications 474
Emerald (MCB UP ) 470
De Gruyter Open Sp. z o.o. 451
Inderscience Enterprises Ltd 448
Bentham Science 435
CAIRN 415
Institute of Electrical and Electronics Engineers 395
Brill 374
OpenEdition 374
QUERY: SELECT publisher, COUNT(*)
FROM journal
GROUP BY publisher
ORDER BY COUNT(*) DESC
LIMIT 25;
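
The largest bucket in several of these tables is the row with no publisher (or country, or language) recorded, which pushes a real entry out of the LIMIT. A minimal sketch of a variant that skips those rows, assuming the missing values are stored as either NULL or an empty string (the report does not say which); the toy table and ISSN-Ls below are made up for illustration:

```python
import sqlite3

# Toy stand-in for the chocula `journal` table (hypothetical rows).
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE journal (issnl TEXT, publisher TEXT)")
conn.executemany(
    "INSERT INTO journal VALUES (?, ?)",
    [
        ("0000-0001", "Elsevier"),
        ("0000-0002", "Elsevier"),
        ("0000-0003", "Brill"),
        ("0000-0004", None),   # publisher missing (NULL)
        ("0000-0005", ""),     # publisher missing (empty string)
        ("0000-0006", None),
    ],
)

# Same query as above, but skipping rows with no publisher recorded.
rows = conn.execute(
    """
    SELECT publisher, COUNT(*)
    FROM journal
    WHERE publisher IS NOT NULL AND publisher != ''
    GROUP BY publisher
    ORDER BY COUNT(*) DESC
    LIMIT 25
    """
).fetchall()
print(rows)
```

The same WHERE clause pattern would apply to the country and lang queries.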

Top countries by number of journals:

country COUNT(*)
(blank) 106970
us 7547
gb 6411
nl 2497
de 2026
id 1645
br 1569
no 1079
es 1069
pl 898
QUERY: SELECT  country,
COUNT(*)
FROM journal
GROUP BY country
ORDER BY COUNT(*) DESC
LIMIT 10;

... by number of papers:

country COUNT(*) SUM(release_count)
(blank) 106970 34471001
us 7547 22313191
gb 6411 13028344
nl 2497 8089964
de 2026 4391511
ch 698 1320180
jp 627 1311086
fr 843 905916
br 1569 683462
ca 618 526316
QUERY: SELECT  country,
COUNT(*),
SUM(release_count)
FROM journal
GROUP BY country
ORDER BY SUM(release_count) DESC
LIMIT 10;

Top languages by number of journals:

lang COUNT(*)
(blank) 115227
en 25571
es 738
id 587
pt 560
de 504
fr 420
ja 330
ru 245
it 201
QUERY: SELECT  lang,
COUNT(*)
FROM journal
GROUP BY lang
ORDER BY COUNT(*) DESC
LIMIT 10;

... by number of papers:

lang COUNT(*) SUM(release_count)
en 25571 48034446
(blank) 115227 41589472
de 504 1061813
ja 330 567233
fr 420 327575
ru 245 166286
es 738 119675
pt 560 116237
it 201 98440
zh 62 45780
QUERY: SELECT  lang,
COUNT(*),
SUM(release_count)
FROM journal
GROUP BY lang
ORDER BY SUM(release_count) DESC
LIMIT 10;

Fatcat Fulltext Coverage

Fulltext coverage by publisher type:

publisher_type AVG(ia_frac) AVG(preserved_frac) journal_count paper_count
big5 0.13790374942184175 0.777494354837538 15643 40212427
society 0.26629892659791143 0.4477543353941486 10588 18162939
(blank) 0.18932165820389008 0.24842923113768445 64146 14187155
unipress 0.33285972356984356 0.5661068841349259 7549 6200069
commercial 0.22752849270250103 0.6925074605186571 6796 5971082
longtail 0.45711865781170435 0.496898741198817 35197 3147895
oa 0.5556916593868192 0.7334655691426084 2581 1343062
repository 0.03318653234900813 0.1597687993504964 751 993044
other 0.1393788759995973 0.6215256674027421 939 862821
archive 0.31004617042048294 0.9824231978239752 597 765813
scielo 0.7876453723874977 0.8017765242492674 405 432115
QUERY: SELECT  publisher_type,
AVG(ia_frac),
AVG(preserved_frac),
COUNT(*) AS journal_count,
SUM(release_count) AS paper_count
FROM journal
GROUP BY publisher_type
ORDER BY SUM(release_count) DESC;
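
Note that AVG(ia_frac) is an unweighted mean over journals, so a tiny journal counts as much as a huge one. A paper-weighted figure can be sketched as SUM(ia_count) / SUM(release_count). This assumes that ia_frac is ia_count divided by release_count for each journal, which the report does not state; the two rows below are invented to make the difference visible:

```python
import sqlite3

# Toy `journal` table with only the columns needed here (hypothetical rows).
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE journal (publisher_type TEXT, release_count INTEGER,"
    " ia_count INTEGER, ia_frac REAL)"
)
conn.executemany(
    "INSERT INTO journal VALUES (?, ?, ?, ?)",
    [
        ("longtail", 1000, 100, 0.1),  # large journal, low coverage
        ("longtail", 10, 9, 0.9),      # small journal, high coverage
    ],
)

row = conn.execute(
    """
    SELECT publisher_type,
           AVG(ia_frac) AS unweighted,
           CAST(SUM(ia_count) AS REAL) / SUM(release_count) AS weighted
    FROM journal
    GROUP BY publisher_type
    """
).fetchone()
publisher_type, unweighted, weighted = row
print(publisher_type, unweighted, weighted)
```

Here the per-journal average is 0.5 while the paper-weighted fraction is 109/1010, roughly 0.11, so the two measures can tell quite different stories.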

Top publishers by number of journals with very little coverage (ia_frac < 0.05):

publisher journal_count AVG(ia_frac)
(blank) 12489 0.0007462422716968573
Informa UK (Taylor & Francis) 2159 0.018535223377689123
Elsevier 2087 0.015386849708409962
Springer-Verlag 854 0.017150697644092168
SAGE Publications 834 0.01744322992299731
Wiley (Blackwell Publishing) 650 0.019927080180958748
Wiley (John Wiley & Sons) 647 0.01600965663955534
CAIRN 363 0.008910075730899723
Medknow Publications 313 0.007785659242349944
JSTOR 285 0.004924726114898372
QUERY: SELECT  publisher,
COUNT(*) AS journal_count,
AVG(ia_frac)
FROM journal
WHERE ia_frac < 0.05
GROUP BY publisher
ORDER BY journal_count DESC
LIMIT 10;

Amount of fulltext by SHERPA/RoMEO journal color:

sherpa_color SUM(ia_count)
(blank) 5133222
blue 804197
green 7940396
white 484933
yellow 1805028
QUERY: SELECT  sherpa_color,
SUM(ia_count)
FROM journal
GROUP BY sherpa_color;
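
The raw sums are easier to compare as shares of the total. A small worked calculation using the five values from the table above (the "(blank)" key stands for the row with no sherpa_color):

```python
# SUM(ia_count) per sherpa_color, copied from the table above.
ia_count_by_color = {
    "(blank)": 5133222,  # journals with no SHERPA/RoMEO color recorded
    "blue": 804197,
    "green": 7940396,
    "white": 484933,
    "yellow": 1805028,
}

total = sum(ia_count_by_color.values())
share = {color: count / total for color, count in ia_count_by_color.items()}

for color, frac in sorted(share.items(), key=lambda kv: kv[1], reverse=True):
    print(f"{color:8s} {frac:.3f}")
```

Green journals account for a little under half of the fulltext counted here, and journals with no color recorded for a little under a third.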

Journal Homepages

Homepage URL counts:

unique_urls journals_with_homepages
115819 77673
QUERY: SELECT COUNT(DISTINCT surt) as unique_urls, COUNT(DISTINCT issnl) as journals_with_homepages FROM homepage;

Journals with the most homepage URLs:

issnl COUNT(*)
0717-554X 9
0185-2574 8
0328-1205 8
0328-3445 8
0379-8682 8
0717-5906 8
1246-7405 8
1415-6555 8
1641-876X 8
1669-2381 8
QUERY: SELECT  issnl,
COUNT(*)
FROM homepage
GROUP BY issnl
ORDER BY COUNT(*) DESC
LIMIT 10;

Top duplicated homepage SURTs:

surt COUNT(*)
org,rsc,pubs)/en/ebooks 47
com,benjamins)/ 29
ro,ubbcluj,studia)/serii/index_en.html 22
pl,czest,ajd,bg,kernel)/wydawnictwo.php 17
id,ac,unimed,jurnal)/ 12
pl,edu,uwm,wydawnictwo)/artykul/14/czytelnia.html 12
pl,krakow,up,pbc)/dlibra/pubindex?dirids=5 11
org,ecorfan)/bolivia/research_journals.php 10
it,minervamedica)/index2.t 9
kr,or,koreascience)/journal/aboutjournal.jsp 8
QUERY: SELECT  surt,
COUNT(*)
FROM homepage
GROUP BY surt
ORDER BY COUNT(*) DESC
LIMIT 10;

What is the deal with all those "benjamins" URLs?

publisher name
John Benjamins Publishing Company NOWELE
(blank) Studia Uralo-Altaica
John Benjamins Publishing Company Language Problems and Language Planning
John Benjamins Publishing Company Lingvisticæ investigationes
John Benjamins Publishing Company Linguistics of the Tibeto-Burman Area
John Benjamins Publishing Company Pragmatics & Cognition
John Benjamins Publishing Company Terminology
John Benjamins Publishing Company Written Language & Literacy
John Benjamins Publishing Company FORUM: Revue internationale d'interprétation et de traduction / International Journal of Interpretation and Translation
John Benjamins Publishing Company English Text Construction
John Benjamins Publishing Company Constructions and Frames
John Benjamins Publishing Company Pragmatics and Society
John Benjamins Publishing Company Translation and Interpreting Studies
John Benjamins Publishing Company Language and Dialogue
John Benjamins Publishing Company Metaphor in Language, Cognition, and Communication
(blank) Hamburg Studies on Linguistic Diversity
John Benjamins Publishing Company Translation Spaces
(blank) Studies in Arabic Linguistics
John Benjamins Publishing Company Journal of Immersion and Content-Based Language Education (JICB)
(blank) Children's Literature, Culture, and Cognition
John Benjamins Publishing Company Journal of Language Aggression and Conflict
(blank) FILLM Studies in Languages and Literatures
(blank) Advances in Historical Sociolinguistics
John Benjamins Publishing Company Linguistic Landscape
John Benjamins Publishing Company International Journal of Learner Corpus Research
John Benjamins Publishing Company Journal of Second Language Pronunciation
(blank) ITL - International Journal of Applied Linguistics
John Benjamins Publishing Company Cognitive Individual Differences in Second Language Processing and Acquisition
John Benjamins Publishing Company Studies in Germanic Linguistics
QUERY: SELECT  publisher,
name
FROM journal
LEFT JOIN homepage ON journal.issnl = homepage.issnl
WHERE homepage.surt = 'com,benjamins)/';
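
One detail of the query above: because the WHERE clause filters on a column of the right-hand table (homepage.surt), journals without any homepage row are dropped, so the LEFT JOIN behaves like an inner join here. A small sketch with made-up tables and rows showing the two forms returning the same result:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript(
    """
    CREATE TABLE journal (issnl TEXT, publisher TEXT, name TEXT);
    CREATE TABLE homepage (issnl TEXT, surt TEXT);
    INSERT INTO journal VALUES
      ('0000-0001', 'Publisher A', 'Journal One'),
      ('0000-0002', NULL, 'Journal Two'),
      ('0000-0003', 'Publisher B', 'Journal Three');  -- has no homepage row
    INSERT INTO homepage VALUES
      ('0000-0001', 'com,example)/'),
      ('0000-0002', 'com,example)/'),
      ('0000-0002', 'org,other)/');
    """
)

query = """
    SELECT publisher, name
    FROM journal
    {join} homepage ON journal.issnl = homepage.issnl
    WHERE homepage.surt = 'com,example)/'
    ORDER BY name
"""
left_rows = conn.execute(query.format(join="LEFT JOIN")).fetchall()
inner_rows = conn.execute(query.format(join="JOIN")).fetchall()
print(left_rows == inner_rows, left_rows)
```

Rows with a blank publisher still appear, which matches the blank-publisher entries in the listing above.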

Domains that block us:

domain journal_homepages SUM(blocked)
jstor.org 3241 3241
wiley.com 2503 229
brill.nl 222 164
bentham.org 149 149
emeraldgrouppublishing.com 76 76
uem.br 39 34
emeraldinsight.com 401 17
rodopi.nl 19 15
esaunggul.ac.id 11 11
ingentaconnect.com 120 9
ucb.br 11 9
univie.ac.at 45 9
erlbaum.com 9 8
iaster.com 7 7
ucla.edu 32 7
uctjournals.com 7 7
elsevier.com 2861 6
inah.gob.mx 7 5
medicaljournals.se 5 5
mohr.de 13 5
QUERY: SELECT  domain,
COUNT(*) as journal_homepages,
SUM(blocked)
FROM homepage
GROUP BY domain
ORDER BY SUM(blocked) DESC
LIMIT 20;
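
The two counts are easier to read side by side as a fraction: jstor.org is blocked on every homepage (3241 of 3241), while wiley.com is blocked on 229 of 2503, about 9%. A sketch of the same query with a blocked fraction added, assuming blocked is stored as 0 or 1; the domains and rows are invented:

```python
import sqlite3

# Toy `homepage` table (hypothetical domains); `blocked` is assumed to be 0/1.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE homepage (domain TEXT, blocked INTEGER)")
conn.executemany(
    "INSERT INTO homepage VALUES (?, ?)",
    [("a.example", 1)] * 4
    + [("b.example", 1)] + [("b.example", 0)] * 3
    + [("c.example", 0)] * 2,
)

rows = conn.execute(
    """
    SELECT domain,
           COUNT(*) AS journal_homepages,
           SUM(blocked) AS blocked_count,
           CAST(SUM(blocked) AS REAL) / COUNT(*) AS blocked_frac
    FROM homepage
    GROUP BY domain
    ORDER BY SUM(blocked) DESC, domain
    LIMIT 20
    """
).fetchall()
for row in rows:
    print(row)
```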

Top duplicated URLs:

url COUNT(*)
http://www.studia.ubbcluj.ro/serii/index_en.html 22
http://jurnal.unimed.ac.id/ 12
http://wydawnictwo.uwm.edu.pl/artykul/14/czytelnia.html 12
https://benjamins.com/ 12
http://pbc.up.krakow.pl/dlibra/pubindex?dirids=5 11
http://www.ecorfan.org/bolivia/research_journals.php 9
http://www.minervamedica.it/index2.t 9
http://www.koreascience.or.kr/journal/AboutJournal.jsp 8
https://www.benjamins.com/ 8
http://dlibra.up.krakow.pl:8080/dlibra/dlibra/collectiondescription?dirids=5 7
http://gesundheitsfoerderung.ch/ueber-uns/downloads.html 6
http://www.ijmra.us/ 6
http://www.inderscience.com/browse/index.php 6
https://www.iospress.nl/ 6
http://kernel.bg.ajd.czest.pl/wydawnictwo.php#FP 5
http://nsd.no/ 5
http://www.duei.de/show.php/de/content/publikationen/giga-focus/giga-focus.html 5
http://www.hottopos.com/revistas.htm 5
http://www.inderscience.com/ 5
http://www.publicaciones.fahce.unlp.edu.ar/ 5
QUERY: SELECT  url,
COUNT(*)
FROM homepage
GROUP BY url
ORDER BY COUNT(*) DESC
LIMIT 20;
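
The SURT counts earlier are higher than the per-URL counts here partly because several URL spellings collapse to one SURT: both https://benjamins.com/ and https://www.benjamins.com/ map to com,benjamins)/. The function below is a simplified sketch of that kind of canonicalization, inferred from the examples in this report. It is not the code Chocula uses, and the name simple_surt is hypothetical:

```python
from urllib.parse import urlsplit


def simple_surt(url: str) -> str:
    """Rough SURT-like key: reversed host, lowercased path, no scheme/fragment.

    Simplified illustration only; real SURT canonicalization has more rules
    (ports, query normalization, and so on).
    """
    parts = urlsplit(url.lower())
    host = parts.hostname or ""
    if host.startswith("www."):
        host = host[len("www."):]
    key = ",".join(reversed(host.split("."))) + ")" + (parts.path or "/")
    if parts.query:
        key += "?" + parts.query
    return key


for url in [
    "https://benjamins.com/",
    "https://www.benjamins.com/",
    "http://www.koreascience.or.kr/journal/AboutJournal.jsp",
    "http://kernel.bg.ajd.czest.pl/wydawnictwo.php#FP",
]:
    print(simple_surt(url))
```

Grouping by such a key merges scheme, "www." and fragment variants that a plain GROUP BY url keeps apart.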

Number of journals with a homepage that points to web.archive.org or archive.org:

COUNT(DISTINCT issnl)
164
QUERY: SELECT COUNT(DISTINCT issnl)
FROM homepage
WHERE domain = 'archive.org';

Top publishers whose journal homepages point to archive.org (wayback):

publisher COUNT(*)
(blank) 39
EDP Sciences 11
PERSEE Program 3
CAIRN 2
Fabula 2
Institut du monde et du développement pour la bonne gouvernance publique 2
ANPAD 1
ANR le Saint-Simonisme 18-21 1
Ad hoc (Rennes) 1
Al Manhal FZ, LLC 1
QUERY: SELECT  publisher,
COUNT(*)
FROM journal
LEFT JOIN homepage ON journal.issnl = homepage.issnl
WHERE homepage.domain = 'archive.org'
GROUP BY journal.publisher
ORDER BY COUNT(*) DESC
LIMIT 10;

Top publishers by number of journals missing a homepage:

publisher COUNT(*)
(blank) 34205
Peter Lang International Academic Publishers 1309
Elsevier 1036
Informa UK (Taylor & Francis) 650
Springer-Verlag 465
OMICS Publishing Group 413
Georg Thieme Verlag KG 357
Wiley (John Wiley & Sons) 330
Egypts Presidential Specialized Council for Education and Scientific Research 266
Science Publishing Group 260
SAGE Publications 256
Al Manhal FZ, LLC 250
Bentham Science 214
Wiley (Blackwell Publishing) 212
Medknow Publications 203
Inderscience Enterprises Ltd 170
African Journals Online 166
Diva Enterprises Private Limited 166
PERSEE Program 142
Scientific Research Publishing, Inc 139
QUERY: SELECT  publisher,
COUNT(*)
FROM journal
WHERE any_homepage=0
GROUP BY publisher
ORDER BY COUNT(*) DESC
LIMIT 20;