Software Scientific: the competitive edge of intelligence.

Machine Intelligence

Natural Language Queries

Many of Software Scientific's products use natural language queries posed by the user. This page is designed to illustrate what does, and does not, constitute a good 'query'.


General Rules of Thumb

Queries are Triggers

Software Scientific technology does not interpret a query as a question that requires a specific answer. Instead, a query specifies an interest or bias.

No Meta-Content

Queries should not contain 'meta-content'. The query Find me documents on Japanese systems which use Artificial Intelligence would be better phrased simply as Japanese systems which use Artificial Intelligence. What constitutes 'meta-content' can vary: it always includes question 'superstructure' (such as 'please' in Japanese systems which use Artificial Intelligence, please), but it can also include words that would not be 'meta-content' in other contexts. For example, the word 'patent' is meta-content in a patent searching system, but on the World Wide Web it probably is not.
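
As a rough illustration of this rule, the sketch below strips question 'superstructure' from a query before it is submitted. It is not part of any Software Scientific product: the function name strip_meta_content, the pattern list, and the extra_words parameter are all assumptions made for the example, and a real system would need a far richer list.

```python
import re

# Hypothetical 'superstructure' patterns (an assumption for illustration only).
META_PATTERNS = [
    r"^\s*(please\s+)?find\s+(me\s+)?documents\s+(on|about)\s+",  # leading request
    r"[,\s]*\bplease\b[\s.!?]*$",                                  # trailing 'please'
]


def strip_meta_content(query: str, extra_words: tuple[str, ...] = ()) -> str:
    """Remove superstructure, plus any words that are meta-content
    in the current system (for example 'patent' in a patent search)."""
    cleaned = query
    for pattern in META_PATTERNS:
        cleaned = re.sub(pattern, "", cleaned, flags=re.IGNORECASE)
    for word in extra_words:
        cleaned = re.sub(rf"\b{re.escape(word)}\b", "", cleaned, flags=re.IGNORECASE)
    # Collapse any whitespace left behind by the removals.
    return re.sub(r"\s+", " ", cleaned).strip()


print(strip_meta_content(
    "Find me documents on Japanese systems which use Artificial Intelligence"))
print(strip_meta_content(
    "Japanese systems which use Artificial Intelligence, please"))
```

Both calls print the bare query, Japanese systems which use Artificial Intelligence. Passing extra_words=("patent",) models the case where a word is meta-content only in one particular system.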

Lots of Meat

The more useful words there are, the better. Regard the system as a moderately intelligent but poorly educated student: the more information you give, the more the system has to work with. Ideally, a query should be a complete sentence of at least a dozen words.

Varied Vocabulary

If you wish to refer to the same item more than once in a query, try to describe it in a different way each time. For example: What are the Prime Minister's engagements for today? Will Tony Blair be visiting the Royal Children's Hospital this afternoon?

Good English

Generally, the system can get more mileage out of your query if it is well written and punctuated. Case can also be useful, so use mixed case: capitalise only proper nouns and the beginning of a sentence.


System Specifics

Bullets

The Bullets system can use parentheses to indicate minor words. For example, the query John Major, the (former) Prime Minister uses the word 'former' only to decide which of two documents that both discuss John Major should be considered the more relevant. A document about a former Derby champion would not be selected at all, since there the word 'former' is not in the context of John Major.

Word truncation can be forced by placing a # at the relevant end of a word, and a * can be used to force wild-carding. Note that the default truncation and wild-carding depend on the language and the type of word.
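
To make the markers concrete, here is a minimal sketch of how a Bullets-style query could be split into terms. It only records which markers each word carries and does not attempt to reproduce the real Bullets parser or its ranking. The Term class, the parse_bullets_query function, and the decision to ignore which end of the word a # sits on are all assumptions made for illustration.

```python
import re
from dataclasses import dataclass


@dataclass
class Term:
    text: str
    minor: bool = False      # word was written inside parentheses
    truncate: bool = False   # word carried a '#' marker
    wildcard: bool = False   # word carried a '*' marker


# Either a parenthesised group or a run of non-space, non-parenthesis characters.
TOKEN = re.compile(r"\(([^()]*)\)|([^\s()]+)")


def parse_bullets_query(query: str) -> list[Term]:
    """Split a query into terms, noting minor words and '#'/'*' markers."""
    terms = []
    for match in TOKEN.finditer(query):
        minor = match.group(1) is not None
        chunk = match.group(1) if minor else match.group(2)
        for raw in chunk.split():
            word = raw.strip(",.;:!?")   # drop surrounding punctuation
            core = word.strip("#*")      # the word without its markers
            if core:
                terms.append(Term(core, minor, "#" in word, "*" in word))
    return terms


for term in parse_bullets_query("John Major, the (former) Prime Minister"):
    print(term)
```

In this example only the term 'former' is flagged as minor; every other word is an ordinary term. A query such as pian* would produce a single term with the wildcard flag set.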

Concept Engine

Concept Engine can be used either in a simple mode, with a straight query, or a more advanced mode, which we consider here.

Advanced queries have four components:

Query: A free-text natural language query describing exactly what you want. For example, techniques of piano manufacture in the 19th century.
ExQuery: A description of aspects of the query that you are less interested in. For example, Bluthner would make documents describing this make of piano count as less relevant than other piano documents.
Area: The general subject domain in which answers to your exact query might be expected to lie. The system uses this as an aid to reaching 'rich' areas, but it is only of minor relevance in the actual ranking of documents. So if your query were Natural cures to insomnia, your area might be medicine or herbs.
ExArea: Aspects of the area that you happen to know are not likely to be helpful.

It is important to understand the difference between the query (and ex-query) and the area (and ex-area). The former specify what you want (or don't want), and the latter specify where it might be found (or not found). If they contradict each other, the system will not be able to do as good a job.
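
The four components can be pictured as a simple record. The sketch below is an illustration only, not the Concept Engine interface: the AdvancedQuery class, its field names, and the conflicts method are hypothetical. The conflict check is a crude word-overlap heuristic of our own, intended only to flag the most obvious contradictions between what is asked for and what is excluded; common words such as 'the' would trigger false alarms.

```python
from dataclasses import dataclass


@dataclass
class AdvancedQuery:
    query: str           # exactly what you want
    ex_query: str = ""   # aspects of the query you are less interested in
    area: str = ""       # where answers might be expected to lie
    ex_area: str = ""    # parts of the area unlikely to be helpful

    @staticmethod
    def _words(text: str) -> set[str]:
        # Lower-case words with surrounding punctuation removed.
        return {w.strip(",.;:!?()").lower() for w in text.split()} - {""}

    def conflicts(self) -> set[str]:
        """Words used on both the 'wanted' and the 'not wanted' side."""
        wanted = self._words(self.query) | self._words(self.area)
        unwanted = self._words(self.ex_query) | self._words(self.ex_area)
        return wanted & unwanted


pianos = AdvancedQuery(
    query="techniques of piano manufacture in the 19th century",
    ex_query="Bluthner",
)
print(pianos.conflicts())    # no overlap, so an empty set

insomnia = AdvancedQuery(
    query="Natural cures to insomnia",
    area="medicine herbs",
    ex_area="herbs",
)
print(insomnia.conflicts())  # 'herbs' is both included and excluded
```

An empty result does not prove the components are consistent; it only means no word appears on both sides.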

Script Search

Script Search does not use natural language queries. The words you enter into the four query boxes should be keywords (generally the most important nouns), with no 'superstructure'. However, Script Search does use fuzzy matching.