Search appliance custom queries and query grammar

Define custom query

There are two ways to make the search appliance treat a query as a custom query:
  • through the API
  • through the search box, when the query starts with q>
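A minimal sketch of the search box route: the helpers below add the q> prefix to an expression and check whether an input would count as a custom query. The function names are hypothetical and are not part of the appliance; the API route is not shown because its interface is not described here.

```python
CUSTOM_PREFIX = "q>"  # prefix that forces a custom query in the search box


def to_custom_query(expression: str) -> str:
    """Return the expression as it would be typed into the search box."""
    if expression.startswith(CUSTOM_PREFIX):
        return expression  # already marked as a custom query
    return CUSTOM_PREFIX + expression


def is_custom_query(text: str) -> bool:
    """True when the search box input starts with the custom query prefix."""
    return text.startswith(CUSTOM_PREFIX)
```

For example, to_custom_query("title:term1") gives q>title:term1.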

Custom query special symbols

Symbol         Meaning
AND            AND operator
OR             OR operator (can be omitted; terms separated by a space are combined with OR)
NOT            NOT operator
()             brackets; used to group expressions
""             quotes; used for exact phrase search
^boost_value   boost operator; boosts a term (boost_value is a positive floating point number)
[]             square brackets; used for stored value search (see below)
;              separator for values in multi-value queries (see below)
..             separator for range-value queries (see below)

Defining values and value types

Implemented types: int (date), text

So that DateTime values are parsed correctly and the index stays small, each date is converted to an "int": the number of days from 1900-01-01 to the date being represented. For example, 16 April 2010 is stored as 40282.
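The conversion can be sketched as below. The function names are hypothetical; the sketch assumes 1900-01-01 itself maps to 0, which agrees with the value 40282 used for 16 April 2010 elsewhere on this page.

```python
from datetime import date, timedelta

EPOCH = date(1900, 1, 1)  # day 0 (assumption consistent with 40282 = 16 April 2010)


def date_to_index_int(d: date) -> int:
    """Number of days from 1900-01-01 to the given date."""
    return (d - EPOCH).days


def index_int_to_date(n: int) -> date:
    """Inverse conversion: index integer back to a calendar date."""
    return EPOCH + timedelta(days=n)


print(date_to_index_int(date(2010, 4, 16)))  # prints 40282
```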

Supported kinds of queries:

Value definition   Description
[x]                single value query
[x;y;z]            multi-value query
[x..y]             range value query
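The three value forms can be produced with small string helpers. This is only a sketch; the function names are hypothetical, and no escaping of special characters inside values is attempted because the grammar above does not describe any.

```python
def single_value(x) -> str:
    """[x] - single value query."""
    return f"[{x}]"


def multi_value(values) -> str:
    """[x;y;z] - multi-value query; values are joined with ';'."""
    return "[" + ";".join(str(v) for v in values) + "]"


def range_value(low, high) -> str:
    """[x..y] - range value query; bounds are joined with '..'."""
    return f"[{low}..{high}]"
```

For example, range_value(1500, 3000000) gives [1500..3000000].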

Index fields

Name           Type         Notes
url            text
title          text
crawler_name   text
crawl_date     int (date)
content        text         the text, stored with base64 encoding
filetype       text         the file extension; empty if there is no extension
filesize       int          size in KB; anything smaller than 1 KB is recorded as 1 KB
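As an illustration of the filetype and filesize rules, the sketch below derives both values from a file name and a size in bytes. It is not the crawler's actual code: the function names are hypothetical, and both the 1024-byte kilobyte and the rounding down of partial kilobytes are assumptions. Only the "empty if no extension" and "at least 1 KB" rules come from the table.

```python
import os


def filetype_of(filename: str) -> str:
    """File extension without the dot; empty string if there is none."""
    return os.path.splitext(filename)[1].lstrip(".")


def filesize_kb(size_bytes: int) -> int:
    """Size in KB, never below 1 (1 KB = 1024 bytes, rounded down: assumptions)."""
    return max(1, size_bytes // 1024)
```

With these assumptions, a 100-byte file named README would be indexed with an empty filetype and a filesize of 1.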

Examples for searching in the search textbox

General queries

Example           Description
term1 term2       search for term1 OR term2
term1 OR term2    the same as above
term1 AND term2   search for both term1 and term2
"term1 term2"     search for the exact phrase "term1 term2"

Custom queries

Example                             Description
q>term1 term2                       search for term1 OR term2
q>term1 OR term2                    the same as above
q>title:term1                       search for term1 in the indexed field named "title"
q>title:(term1 AND term2)           group search: the whole expression in brackets is searched for in the field named "title"
q>title:(term1 term2) term1 term2   when a field name is not specified, the default field "content" is assumed, so this query means:
                                    search for term1 OR term2 in "title", OR term1 OR term2 in "content".
                                    Another way to write it is: title:(term1 OR term2) OR content:(term1 OR term2)
q>(term1)^boost term2               boost operator; boosts a term (boost is a positive floating point number)
q>crawl_date:[40282]                select all documents whose content is current as of 16 April 2010
q>filesize:[1500..3000000]          select all documents whose file size is between 1500 KB and 3000000 KB
q>filetype:[pdf]                    select all documents that have the .pdf file extension

All of the above can be mixed to create complex queries, e.g.:

(title:("evolution" OR "kate" OR selene)^10 OR content:(underworld movie characters)) AND crawl_date:[40000..40282] AND filesize:[1000..5000]
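A sketch of how the complex query above could be assembled in code. The helper names are hypothetical. The snippet only builds the query string; passing it to the appliance (through the API, or with the q> prefix in the search box) is not shown.

```python
from datetime import date

EPOCH = date(1900, 1, 1)  # dates are indexed as days since 1900-01-01


def days(d: date) -> int:
    """Convert a date to the integer form used by crawl_date."""
    return (d - EPOCH).days


def field(name: str, expression: str) -> str:
    """Restrict an expression to one index field, e.g. title:(...)."""
    return f"{name}:{expression}"


def build_example_query() -> str:
    title = field("title", '("evolution" OR "kate" OR selene)') + "^10"
    content = field("content", "(underworld movie characters)")
    crawled = field("crawl_date", f"[40000..{days(date(2010, 4, 16))}]")
    size = field("filesize", "[1000..5000]")
    return f"({title} OR {content}) AND {crawled} AND {size}"


print(build_example_query())
```

The printed string is identical to the example query above.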

Note on the NOT operator

The NOT operator can be used only in conjunction with the AND operator. For example:

term1 AND NOT term2 - select all docs that have "term1" in the "content" field, but not "term2"

(term1 OR NOT term2) - this is forbidden, as it has no meaning.
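The rule can be checked before a query is submitted. The sketch below is a hypothetical client-side check, not the appliance's parser. It assumes operators are written in upper case and that a valid NOT is always immediately preceded by AND.

```python
import re


def not_usage_is_valid(query: str) -> bool:
    """True if every NOT operator directly follows an AND operator."""
    # Drop quoted phrases so a literal "NOT" inside a phrase is ignored.
    unquoted = re.sub(r'"[^"]*"', " ", query)
    # Brackets become separate tokens; everything else splits on whitespace.
    tokens = re.findall(r"[()]|[^\s()]+", unquoted)
    for i, token in enumerate(tokens):
        if token == "NOT" and (i == 0 or tokens[i - 1] != "AND"):
            return False
    return True
```

Under these assumptions the check accepts term1 AND NOT term2 and rejects (term1 OR NOT term2).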

Last edited by nadia on 27 Oct 2010 | Rev. 1 | This page is public