{% extends "base.html" %}

{% macro example_search_box(query) -%}
<form class="" id="" action="{{ lang_prefix }}/search" method="get" role="search" aria-label="papers" itemprop="potentialAction" itemscope itemtype="https://schema.org/SearchAction">
    <meta itemprop="target" content="https://{{ settings.SCHOLAR_DOMAIN }}/fulltext/search?q={q}"/>
    <div class="ui form">
    <div class="ui action input large fluid">
        <input type="search" value="{{ query }}" name="q" aria-label="search metadata" required itemprop="query-input" style="border-radius: 0; border: 1px #999 solid;">
        <button class="ui green button" style="border-radius: 0; background-color: grey; font-size: 1.2rem;">{{ _("Try It") }}</button>
    </div>
    </div>
</form>
<br>
{% endmacro %}

{% block main %}
<h1>Scholar Search User Guide</h1>
<p><i>See also: <a href="{{ lang_prefix }}/about">About Scholarly Search</a></i>

<p>In addition to the basic filtering and sorting options, this search
interface accepts Lucene query syntax in the search box. You can restrict a
term to a specific metadata field using colon statements like
<code>journal:Science</code>, set filters like <code>lang:de</code>, and
apply range queries like <code>year:&gt;1989 year:&lt;2000</code>.


<h3>Example Queries</h3>

<p>Search for digitized pages about a topic, limited to years before 2000:

{{ example_search_box('"egyptian pyramid" access_type:ia_sim year:<2000') }}

<p>Search for papers in Chinese matching a term:

{{ example_search_box('lang:zh 临床表现多样') }}

<p>Conference papers with an author name query:

{{ example_search_box('type:paper-conference author:"natasha noy"') }}
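
<p>As a further sketch, a field restriction can be combined with a year range, reusing the colon and range forms shown in the introduction (the journal and years here are only illustrative):

{{ example_search_box('journal:Science year:>1989 year:<2000') }}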

<h3>Details</h3>

<p>A partial list of metadata fields is:

<ul>
  <li>title
  <li>author
  <li>journal
  <li>year
  <li>issue
  <li>volume
  <li>doi
  <li>type (eg, "article-journal", "dataset", "book")
  <li>stage (eg, "published", "submitted", "accepted", "draft")
  <li>lang (value is a 2-character lower-case ISO language code)
  <li>country (value is a 2-character lower-case ISO country code)
  <li>access_type (eg, "wayback", "ia_file", "ia_sim")
  <li>tag
</ul>

<p>You can restrict to records where the field exists with an asterisk like
<code>doi:*</code>, and negate any term like
<code>!type:article-journal</code>.
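
<p>For instance, the two forms can be combined in one query. The example below is an illustrative sketch that asks for records which have a DOI but are not journal articles:

{{ example_search_box('doi:* !type:article-journal') }}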

<p>In-depth documentation of the query syntax is available <a href="https://www.elastic.co/guide/en/elasticsearch/reference/current/query-dsl-query-string-query.html#query-string-query-notes">from the open source project</a>. The complete current search document schema can be fetched (in JSON format) <a href="https://search.fatcat.wiki/qa_scholar_fulltext/_mapping">from the search index itself</a>.


<h3>Known Issues</h3>

<p>This project is currently a <i>prototype</i>, with only a limited amount of
content indexed.

<p>Some known bugs and issues:

<ul>
  <li>web.archive.org PDF links sometimes return "not found" errors. This is impacting up to 1% of recent papers. In almost all cases there is a preserved copy of the file that should be available.
  <li>Poor metadata quality for conference proceedings. Many are labeled "unpublished" and are not associated with 
  <li>Duplicate versions of the same work, e.g., different versions of the same paper or dataset. We are working on basic entity deduplication in the fatcat catalog.
  <li>Mismatching of file content or version with work metadata. For example, sometimes pre-prints or author manuscripts are incorrectly associated with version-of-record metadata, or vice versa.
</ul>
{% endblock %}