Search appliance custom queries, grammar and commands
How to create a custom query
There are two ways to make the search appliance treat a query as a custom query:
- through the API
- through the search box, by starting the query with q>
Custom query special symbols
Symbol | Meaning |
AND | AND operator |
OR | OR operator (can be omitted) |
NOT | NOT operator |
() | brackets; used to group expressions |
"" | quotes; used for exact phrase search |
^boost_value | boost operator to boost a term (boost_value is a positive floating point number) |
[] | square brackets; used for stored value search (see below) |
; | separator for values in multi-value queries (see below) |
.. | separator for range-value queries (see below) |
Defining values and value types
Implemented types: int (also used for dates), text
So that a DateTime value is parsed correctly and the index stays small, it is converted to an int. This integer is the number of days from 1900-01-01 to the date being represented.
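The conversion described above can be sketched in a few lines. This is a minimal illustration, not the appliance's own code: the function names are hypothetical, and it assumes that 1900-01-01 itself maps to 0, which agrees with the value 40282 that this document uses for 16 April 2010.

```python
from datetime import date, timedelta

# Day 0 of the index date scale, as stated in the text.
EPOCH = date(1900, 1, 1)


def date_to_index_int(d: date) -> int:
    """Number of days from 1900-01-01 to d (hypothetical helper)."""
    return (d - EPOCH).days


def index_int_to_date(n: int) -> date:
    """Inverse conversion: index integer back to a calendar date."""
    return EPOCH + timedelta(days=n)


print(date_to_index_int(date(2010, 4, 16)))  # 40282
print(index_int_to_date(40282))              # 2010-04-16
```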
Supported kinds of queries:
value definition | description |
[x] | single value query |
[x;y;z] | multi-value query |
[x..y] | range value query |
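The three value forms are easy to generate from code. The helpers below are a small sketch with hypothetical names; they only format strings in the bracket syntax listed above and do no validation of the values themselves.

```python
def single_value(x) -> str:
    """[x] : single value query."""
    return f"[{x}]"


def multi_value(values) -> str:
    """[x;y;z] : multi-value query, values separated by ';'."""
    return "[" + ";".join(str(v) for v in values) + "]"


def range_value(low, high) -> str:
    """[x..y] : range value query, bounds separated by '..'."""
    return f"[{low}..{high}]"


print(single_value("pdf"))                 # [pdf]
print(multi_value(["pdf", "doc", "txt"]))  # [pdf;doc;txt]
print(range_value(1500, 3000000))          # [1500..3000000]
```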
Index fields
name | type |
url | text |
title | text |
crawler_name | text |
crawl_date | int (date) |
content | text; the document text, base64-encoded |
filetype | text; the file extension. Empty if there is no extension |
filesize | int; size in KB. Anything smaller than 1 KB is recorded as 1 KB |
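To make the field descriptions concrete, here is an illustrative sketch of how values matching them could be derived. It is not the appliance's implementation. The function names are hypothetical, and several details are assumptions the table does not state: 1 KB is taken as 1024 bytes, sizes are rounded up, extensions are lower-cased, and the text is encoded as UTF-8 before base64.

```python
import base64
import math
import os


def filesize_kb(size_bytes: int) -> int:
    """Size in KB; anything below 1 KB is recorded as 1 KB.

    Assumption: 1 KB = 1024 bytes and partial KBs round up.
    """
    return max(1, math.ceil(size_bytes / 1024))


def filetype(path: str) -> str:
    """File extension without the dot; empty string if there is none.

    Assumption: the extension is lower-cased.
    """
    return os.path.splitext(path)[1].lstrip(".").lower()


def encode_content(text: str) -> str:
    """Base64-encode the document text (assumes UTF-8 input encoding)."""
    return base64.b64encode(text.encode("utf-8")).decode("ascii")


print(filesize_kb(10))          # 1
print(filetype("report.PDF"))   # pdf
print(encode_content("hello"))  # aGVsbG8=
```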
Examples for searching in the search textbox
General queries
Example | Description |
term1 term2 | search for term1 OR term2 |
term1 OR term2 | the same as above |
term1 AND term2 | search for term1 AND term2 |
"term1 term2" | search for the exact phrase "term1 term2" |
Custom queries
Example | Description |
q>term1 term2 | search for term1 OR term2 |
q>term1 OR term2 | the same as above |
q>title:term1 | search for term1 in the indexed field named "title" |
q>title:(term1 AND term2) | search the field named "title" for the whole expression in brackets |
q>title:(term1 term2) term1 term2 | when no field name is specified, the default field "content" is assumed. So this query searches for term1 OR term2 in "title", OR term1 OR term2 in "content". Another way to write it is: title:(term1 OR term2) OR content:(term1 OR term2) |
q>(term1)^boost term2 | boost term1 by the factor boost (a positive floating point number); term2 is not boosted |
q>crawl_date:[40282] | select all documents whose crawl date is 16 April 2010 |
q>filesize:[1500..3000000] | select all documents whose file size is between 1500 KB and 3000000 KB |
q>filetype:[pdf] | select all documents that have the .pdf file extension |
All of the above can be mixed to create complex queries, e.g.:
(title:("evolution" OR "kate" OR selene)^10 OR content:(underworld movie characters)) AND crawl_date:[40000..40282] AND filesize:[1000..5000]
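A complex query like the one above can also be assembled from small pieces. The sketch below uses hypothetical helper names and does plain string formatting only; it rebuilds the example query exactly and does not check that the result is valid for the appliance.

```python
def group(*terms: str, op: str = " ") -> str:
    """Wrap terms in brackets; a plain space means an implicit OR."""
    return "(" + op.join(terms) + ")"


def field(name: str, expr: str) -> str:
    """Restrict an expression to one index field."""
    return f"{name}:{expr}"


def boost(expr: str, factor: float) -> str:
    """Append ^factor; the factor must be positive."""
    if factor <= 0:
        raise ValueError("boost factor must be positive")
    return f"{expr}^{factor}"


def value_range(low: int, high: int) -> str:
    """Range value query in the [x..y] form."""
    return f"[{low}..{high}]"


text_part = group(
    boost(field("title", group('"evolution"', '"kate"', "selene", op=" OR ")), 10),
    field("content", group("underworld", "movie", "characters")),
    op=" OR ",
)
query = " AND ".join([
    text_part,
    field("crawl_date", value_range(40000, 40282)),
    field("filesize", value_range(1000, 5000)),
])

# In the search box, a custom query starts with q>
search_box_input = "q>" + query
print(search_box_input)
```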
Note on the NOT operator
NOT can be used only in conjunction with the AND operator, for example:
term1 AND NOT term2 - select all documents that have "term1" in the "content" field, but not "term2"
(term1 OR NOT term2) - this is forbidden, as it has no meaning.
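A client could check this rule before sending a query. The following is a rough sketch under a strict reading of the note: every NOT must come directly after AND. The function name is hypothetical, operators are assumed to be upper-case, and the appliance's real parser may accept or reject other forms (for example, this check rejects term1 AND (NOT term2)).

```python
import re

# Tokens: quoted phrases, single brackets, or runs of other characters.
TOKEN = re.compile(r'"[^"]*"|[()]|[^\s()"]+')


def not_usage_ok(query: str) -> bool:
    """Return True if every NOT operator directly follows an AND operator."""
    tokens = TOKEN.findall(query)
    for i, tok in enumerate(tokens):
        if tok == "NOT" and (i == 0 or tokens[i - 1] != "AND"):
            return False
    return True


print(not_usage_ok("term1 AND NOT term2"))    # True
print(not_usage_ok("(term1 OR NOT term2)"))   # False
```

A NOT inside a quoted phrase is treated as ordinary text, not as an operator.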
Crawler related external commands
robots.txt
TODO
ROBOTS meta tag
TODO
NOINDEX html comment
TODO