Corpus Eye

This corpus interface integrates three types of corpora (text, CG-annotated and treebanks) and three search techniques (grep, cqp, tree search) for a growing number of VISL languages (Danish, Portuguese, English, German, Spanish, French and Esperanto).

Text corpora are running-text collections drawn from newspapers, novels, spoken-language transcripts and historical texts. They can be searched both through the standard interface and, for some languages, through the new cqp-interface. Search input can be ordinary words or word sequences. Output is in concordance format: the search items are centered, the sentence is cut at a defined distance on both sides, and the number of hits is shown at the top or bottom. The cqp-interface also allows results to be sorted by their left or right context. An info button provides more context and additional information.
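The concordance (keyword-in-context) layout described above can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not the code behind Corpus Eye: the function names `concordance`, `format_lines`, `sort_by_left` and `sort_by_right` are hypothetical, the context window is measured in characters, and matching is exact, case-sensitive and restricted to whole words.

```python
import re


def concordance(text, query, width=30):
    """Return (left, match, right) triples for every whole-word hit of `query`.

    `width` is the number of characters of context kept on each side.
    """
    text = " ".join(text.split())  # collapse line breaks and repeated spaces
    pattern = r"\b" + re.escape(query) + r"\b"
    hits = []
    for m in re.finditer(pattern, text):
        left = text[max(0, m.start() - width):m.start()]
        right = text[m.end():m.end() + width]
        hits.append((left, m.group(0), right))
    return hits


def format_lines(hits, width=30):
    """Right-align the left context so the hits line up, then add a hit count."""
    lines = [f"{left:>{width}} {match} {right}" for left, match, right in hits]
    lines.append(f"{len(hits)} hits")
    return lines


def sort_by_left(hits):
    """Sort on the left context, starting with the word nearest the hit."""
    return sorted(hits, key=lambda h: [w.lower() for w in reversed(h[0].split())])


def sort_by_right(hits):
    """Sort on the right context, starting with the word after the hit."""
    return sorted(hits, key=lambda h: [w.lower() for w in h[2].split()])


text = "The house is red. A small house stood there. Houses are big."
hits = concordance(text, "house", width=10)
for line in format_lines(sort_by_left(hits), width=10):
    print(line)
```

Because matching is case-sensitive and limited to whole words, "Houses" is not counted here and the listing reports 2 hits; a real interface would typically offer case folding as an option.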

Annotated CG-corpora carry word-based information on form and function (word class, syntactic function and, in some cases, semantic type). To search for such information in the standard interface, you have to enter CG/VISL tags in the search string. Details and links to tag lists, definitions etc. are given on the individual search pages. The cqp-interface allows menu-based selection of search categories without prior knowledge of tag conventions.
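To illustrate what searching by tag means, the following Python sketch stores each token of a CG-annotated sentence as a word, a lemma and a set of tags, and looks for sequences of tokens that carry the requested tags. The one-token-per-line layout `word [lemma] TAG TAG ... @FUNCTION` and the Portuguese sample tags are a simplified assumption for illustration only; the actual corpus formats and tag sets are documented on the individual search pages. `parse_cg_line` and `find_tag_sequence` are hypothetical helpers.

```python
def parse_cg_line(line):
    """Split 'word [lemma] TAG TAG ...' into (word, lemma, set_of_tags)."""
    word, rest = line.split(" ", 1)
    lemma_part, _, tag_part = rest.partition("]")
    lemma = lemma_part.strip().lstrip("[")
    return word, lemma, set(tag_part.split())


def find_tag_sequence(tokens, pattern):
    """Return start indices where token i+k carries every tag in pattern[k]."""
    n = len(pattern)
    return [
        i
        for i in range(len(tokens) - n + 1)
        if all(required <= tokens[i + k][2] for k, required in enumerate(pattern))
    ]


# Hypothetical, simplified CG-style annotation of "o menino come a maçã".
sentence = [
    "o [o] <artd> DET M S @>N",
    "menino [menino] N M S @SUBJ>",
    "come [comer] V PR 3S IND VFIN @FMV",
    "a [o] <artd> DET F S @>N",
    "maçã [maçã] N F S @<ACC",
]
tokens = [parse_cg_line(line) for line in sentence]

# Determiner followed by noun: found at positions 0 and 3.
print(find_tag_sequence(tokens, [{"DET"}, {"N"}]))
# A noun functioning as subject.
print([tokens[i][0] for i in find_tag_sequence(tokens, [{"N", "@SUBJ>"}])])
```

Representing the tags of a token as a set makes "carries all of these tags" a simple subset test, independent of the order in which the tags are written.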

Treebanks are annotated sentences that have been enriched with structural information. In VISL source format, node daughters are indented for depth; in graphical VISL format ("java-trees"), nodes are linked by lines, and trees can be unfolded or collapsed interactively, completely or partially. The treebanks can be searched for both text and tag sequences, and even for node variables. Smaller teaching treebanks with selected and pedagogically ordered sentences are available for 22 languages at the VISL main site.
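A source-format tree can be turned into a nested structure with a simple stack, which also makes partial collapsing easy to imitate. This Python sketch assumes, for illustration, that depth is marked by a run of leading `=` characters and that everything after them is the node label; the labels in the sample are simplified, and `parse_indented_tree` and `render` are hypothetical names. A collapsed node is marked `[+]`.

```python
def parse_indented_tree(lines, marker="="):
    """Build (label, children) nodes; a line's depth is its number of leading markers."""
    root = ("ROOT", [])
    stack = [(-1, root)]  # (depth, node) pairs along the path to the current node
    for line in lines:
        label = line.lstrip(marker)
        depth = len(line) - len(label)
        node = (label.strip(), [])
        # Climb back up until the top of the stack is this node's parent.
        while stack[-1][0] >= depth:
            stack.pop()
        stack[-1][1][1].append(node)
        stack.append((depth, node))
    return root[1]  # the top-level nodes


def render(node, depth=0, max_depth=None):
    """Indent daughters by depth; nodes with daughters at max_depth are shown collapsed."""
    label, children = node
    collapsed = bool(children) and max_depth is not None and depth >= max_depth
    lines = ["  " * depth + label + (" [+]" if collapsed else "")]
    if not collapsed:
        for child in children:
            lines.extend(render(child, depth + 1, max_depth))
    return lines


# Hypothetical, simplified source-format lines for "the boy sleeps".
source = [
    "STA:fcl",
    "=S:np",
    "==DN:art the",
    "==H:n boy",
    "=P:v-fin sleeps",
]
tree = parse_indented_tree(source)[0]
print("\n".join(render(tree)))               # fully unfolded
print("\n".join(render(tree, max_depth=1)))  # S:np collapsed
```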

E-grep (egrep) is a standard Unix tool for fast searches in text files. T-grep is a special variant designed for searches in syntactic treebanks. For more information, see Douglas Rohde's tgrep2 home page.
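tgrep2 queries express structural relations between nodes; for example, `A < B` means that A immediately dominates B, and `A << B` that A dominates B at any depth. The following Python sketch implements only these two relations on a toy tree of `(label, children)` tuples, to show what such a search computes. It is not tgrep2 and supports none of its other operators; the tree, its labels and the function names are hypothetical.

```python
def nodes(tree):
    """Yield every node of a (label, children) tree in pre-order."""
    yield tree
    for child in tree[1]:
        yield from nodes(child)


def immediately_dominates(node, label):
    """tgrep2-style A < B: some daughter of node carries the label."""
    return any(child[0] == label for child in node[1])


def dominates(node, label):
    """tgrep2-style A << B: some node strictly below node carries the label."""
    return any(n[0] == label for child in node[1] for n in nodes(child))


RELATIONS = {"<": immediately_dominates, "<<": dominates}


def search(tree, a_label, relation, b_label):
    """Return all nodes labelled a_label that stand in `relation` to a b_label node."""
    test = RELATIONS[relation]  # KeyError for relations this sketch does not support
    return [n for n in nodes(tree) if n[0] == a_label and test(n, b_label)]


# Toy tree for "the dog sees a cat".
tree = ("S", [
    ("NP", [("DT", []), ("NN", [])]),
    ("VP", [("VBZ", []), ("NP", [("DT", []), ("NN", [])])]),
])

print(len(search(tree, "NP", "<", "NN")))   # 2: both NPs have an NN daughter
print(len(search(tree, "VP", "<", "NN")))   # 0: the NN is not a daughter of VP
print(len(search(tree, "VP", "<<", "NN")))  # 1: but VP dominates it
```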

Regular expressions allow variables for characters and sets of characters, as well as repetition and negation operations on these variables. All interfaces on this site accept regular expressions as an option for experienced users. For examples, see the search-help page of the (new) cqp-interface, the search manual of the (old) standard interfaces or one of the numerous guides on the internet, or call VISL for a folder.
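The three kinds of operation mentioned above (character sets, repetition and negation) can be tried out with Python's `re` module. Python shares these basic constructs with egrep, but regular-expression dialects differ in details, so the examples are illustrative rather than a reference for the search syntax on this site. `grep` is a hypothetical helper that keeps the items containing a match anywhere, as grep does with lines.

```python
import re

words = ["colour", "color", "colours", "cooler", "honour"]


def grep(pattern, items):
    """Return the items that contain a match for `pattern` anywhere."""
    rx = re.compile(pattern)
    return [w for w in items if rx.search(w)]


# ? makes the preceding character optional; ^ and $ anchor the whole word.
print(grep(r"^colou?r$", words))   # ['colour', 'color']
# [aeiou] is a set of characters; {2} repeats it exactly twice.
print(grep(r"[aeiou]{2}", words))  # ['colour', 'colours', 'cooler', 'honour']
# [^u] negates the set (any character except u); * repeats it any number of times.
print(grep(r"^[^u]*$", words))     # ['color', 'cooler']
```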

CQP, the Corpus Query Processor, was developed at the Institut für Maschinelle Sprachverarbeitung, Stuttgart. It is both a search engine and a special query language that allows fast and complex searches once a corpus has been transformed into optimized search structures. Our interface is a graphical, menu-based front end for this tool, but it also allows direct searches in "cqp-speak". For some examples, see the cqp help file.
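The core idea of a CQP query, a sequence of bracketed token constraints such as `[pos="ADJ"] [word="houses?"]`, can be imitated in a short Python sketch. It follows CQP's convention that a regular expression must match the whole attribute value (hence `re.fullmatch`), but it handles only one `attr="regex"` constraint per slot and none of CQP's operators, flags or structural attributes. The tokens, the `pos` tags and the function names are hypothetical.

```python
import re

# One [attr="regex"] slot; this sketch supports a single constraint per slot.
SLOT = re.compile(r'\[\s*(\w+)\s*=\s*"([^"]*)"\s*\]')


def parse_query(text):
    """Turn '[pos="ADJ"] [word="houses?"]' into [{'pos': 'ADJ'}, {'word': 'houses?'}]."""
    return [{attr: rx} for attr, rx in SLOT.findall(text)]


def match_query(tokens, query):
    """Return (start, end) spans where consecutive tokens satisfy each slot in turn.

    As in CQP, a regex must match the whole attribute value, hence re.fullmatch.
    """
    n = len(query)
    spans = []
    for i in range(len(tokens) - n + 1):
        if all(
            re.fullmatch(rx, tokens[i + k].get(attr, "")) is not None
            for k, slot in enumerate(query)
            for attr, rx in slot.items()
        ):
            spans.append((i, i + n))
    return spans


# Hypothetical annotated tokens for "the old house and the new houses".
tokens = [
    {"word": "the", "pos": "DET"},
    {"word": "old", "pos": "ADJ"},
    {"word": "house", "pos": "N"},
    {"word": "and", "pos": "KC"},
    {"word": "the", "pos": "DET"},
    {"word": "new", "pos": "ADJ"},
    {"word": "houses", "pos": "N"},
]

query = parse_query('[pos="ADJ"] [word="houses?"]')
for start, end in match_query(tokens, query):
    print(" ".join(t["word"] for t in tokens[start:end]))
```

Because matching is anchored to the whole value, `[word="hous"]` finds nothing here, whereas an unanchored grep-style search would match both "house" and "houses".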