Our new toolkit for summarising HTML summarises web documents or groups of web documents.
| toolkit | ||
| Features | These summaries are aware of the structure of HTML, so a summary has the same look and feel as the original but is shorter. The background is the same, and a few of the most relevant images are included. Tables and lists are also meaningfully summarised, without compromising the relationships between their elements. |
|
| Meaning based | However, this is not a structural summariser. It leads with the meaning and only then follows with the format, so the summary remains meaning based. As ever, a summary can be focused with a natural language query. This works very well in conjunction with a search engine, where you want summaries of documents focused on your original search query. |
|
| Focused summaries | These abstracts can either be written from the point of view of the document itself, or be angled towards a natural language query. The latter works very well alongside a search system, as the abstracts provide a way of 'drilling down' into the document. |
|
| Algorithms | When creating an abstract, the summariser bases its analysis on 'Discussion Flow Analysis', a machine intelligence technique which constrains the resulting abstract to have the same flow of meaning as the original document. Apart from giving much better quality abstracts than statistical or computational-linguistic (rule-based) techniques, discussion flow analysis allows an abstract to be focused on a particular aspect of the document. |
|
| Control | Other features include the ability to control the generated HTML. For example, you can ask for output that is ready for immediate viewing in a web browser, or for output that is suitable for inclusion in a larger web document. The summariser can handle badly formed HTML, but only produces well-formed HTML. |
|
| Fast! | As ever, the summarisation process is fast. In fact, a document can be summarised more quickly than it could be copied on a hard disk, thanks to our parser, which parses natural language and HTML simultaneously. |
|
| Toolkit | This summariser is available as a toolkit for C and Delphi programmers from the developer support pages, and will shortly be available as a CGI script. |
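
To give a feel for what a query-focused summary means in practice, here is a minimal sketch in Python. It is not Discussion Flow Analysis and does not use the toolkit's API: it assumes plain text rather than HTML, and it substitutes a simple word-overlap score for the real analysis. The function names (`split_sentences`, `words`, `focused_summary`) are hypothetical and exist only for this illustration.

```python
import re


def split_sentences(text):
    """Naively split plain text after '.', '!' or '?' followed by whitespace."""
    return [s.strip() for s in re.split(r"(?<=[.!?])\s+", text) if s.strip()]


def words(text):
    """Return the set of lower-cased alphanumeric words in the text."""
    return set(re.findall(r"[a-z0-9]+", text.lower()))


def focused_summary(text, query, max_sentences=2):
    """Keep the sentences sharing the most words with the query.

    Illustrative stand-in only: a word-overlap score, not the
    summariser's own technique.
    """
    sentences = split_sentences(text)
    query_words = words(query)
    # Score each sentence by how many query words it contains.
    scored = [(len(words(s) & query_words), i) for i, s in enumerate(sentences)]
    # Highest score first; earlier sentences win ties.
    best = sorted(scored, key=lambda pair: (-pair[0], pair[1]))[:max_sentences]
    # Emit the chosen sentences in their original order, so the summary
    # reads in the same sequence as the source text.
    keep = sorted(i for _, i in best)
    return " ".join(sentences[i] for i in keep)


if __name__ == "__main__":
    document = (
        "Cats sleep a lot. Dogs chase balls in the park. "
        "The weather was mild. Dogs also like to swim."
    )
    print(focused_summary(document, "dogs"))
```

Emitting the selected sentences in document order is a deliberate choice in this sketch: it keeps the summary reading in the same sequence as the original, which is the property a focused abstract is meant to preserve.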