|Oracle® Text Application Developer's Guide
10g Release 1 (10.1)
Part Number B10729-01
This chapter discuses the following topics:
This chapter describes how to get started with creating an Oracle Text developer and building simple text query and catalog applications. For each type of application, this chapter steps you through the basic SQL statements for loading, indexing and querying your tables.
More complete application examples are given in the Appendices. To learn more about building document classification applications, see Chapter 6.
Create and delete Oracle Text indexing preferences
Use the Oracle Text PL/SQL packages
To create an Oracle Text application developer user, execute the following SQL statements as the system administrator user:
Step 1 Create User
The following SQL command creates a user called
MYUSER with a password of
CREATE USER myuser IDENTIFIED BY myuser_password;
The following SQL command grants the required roles of
GRANT RESOURCE, CONNECT, CTXAPP TO MYUSER;
Step 3 Grant EXECUTE Privileges on CTX PL/SQL Packages
There are ten Oracle Text packages that enable you to perform actions ranging from synchronizing an Oracle Text index to highlighting documents. For example, the
CTX_DDL.SYNC_INDEX package enables you to synchronize your index.
To call any of these procedures from a stored procedure, your application requires execute privileges on the packages.
For example, to grant to
MYUSER execute privileges on all Oracle Text packages, issue the following SQL commands:
GRANT EXECUTE ON CTX_CLS TO myuser; GRANT EXECUTE ON CTX_DDL TO myuser; GRANT EXECUTE ON CTX_DOC TO myuser; GRANT EXECUTE ON CTX_OUTPUT TO myuser; GRANT EXECUTE ON CTX_QUERY TO myuser; GRANT EXECUTE ON CTX_REPORT TO myuser; GRANT EXECUTE ON CTX_THES TO myuser;
In a basic text query application, users enter query words or phrases and expect the application to return a list of documents that best match the query. Such an application involves creating a CONTEXT index and querying it with CONTAINS.
This example steps you through the basic SQL statements you use to load your text table, index your documents, and query your index.
Typically, query applications require a user interface. An example of how to build such a query application using the CONTEXT index type is given in Appendix A.
Step 1 Connect as the New User
Before creating any tables, assume the identity of the user you just created.
Step 2 Create your Text Table
The following example creates a table called
docs with two columns,
text, by using the
TABLE statement. This example makes the
id column the primary key. The
text column is
CREATE TABLE docs (id NUMBER PRIMARY KEY, text VARCHAR2(200));
Step 3 Load Documents into Table
You can use the SQL
INSERT statement to load text to a table.
To populate the
docs table, use the
INSERT statement as follows:
INSERT INTO docs VALUES(1, '<HTML>California is a state in the US.</HTML>'); INSERT INTO docs VALUES(2, '<HTML>Paris is a city in France.</HTML>'); INSERT INTO docs VALUES(3, '<HTML>France is in Europe.</HTML>');
You can also load your table in batch with SQL*Loader.
Index the HTML files by creating a
CONTEXT index on the text column as follows. Since you are indexing HTML, this example uses the
NULL_FILTER preference type for no filtering and uses the
CREATE INDEX idx_docs ON docs(text) INDEXTYPE IS CTXSYS.CONTEXT PARAMETERS ('FILTER CTXSYS.NULL_FILTER SECTION GROUP CTXSYS.HTML_SECTION_GROUP');
Use the NULL_FILTER because you do not need to filter HTML documents during indexing. However, if you index PDF, Microsoft Word, or other formatted documents, use the CTXSYS.INSO_FILTER (the default) as your FILTER preference.
This example also uses the HTML_SECTION_GROUP section group which is recommended for indexing HTML documents. Using HTML_SECTION_GROUP enables you to search within specific HTML tags, and eliminates from the index unwanted markup such as font information.
You query the table with the SELECT statement with CONTAINS to retrieve the document ids that satisfy the query.
Before doing so, set the format of the SELECT statement's output so that it is easily readable. To do so, set the width of the
text column to 40 characters:
COLUMN text FORMAT a40;
Now use SELECT. The following query looks for all documents that contain the word France:
SELECT SCORE(1), id, text FROM docs WHERE CONTAINS(text, 'France', 1) > 0; SCORE(1) ID TEXT ---------- ---------- ---------------------------------------- 4 3 <HTML>France is in Europe.</HTML> 4 2 <HTML>Paris is a city in France.</HTML>
In a real application, you might want to present the selected document to the user with query terms highlighted. Oracle Text enables you to mark up documents with the CTX_DOC package.
We can demonstrate HTML document markup with an anonymous PL/SQL block in SQL*Plus. However, in a real application you might present the document in a browser.
This PL/SQL example uses the in-memory version of CTX_DOC.MARKUP to highlight the word France in document 3. It allocates a temporary CLOB (Character Large Object datatype) to store the markup text and reads it back to the standard output. The CLOB is then de-allocated before exiting:
SET SERVEROUTPUT ON; DECLARE 2 mklob CLOB; 3 amt NUMBER := 40; 4 line VARCHAR2(80); 5 BEGIN 6 CTX_DOC.MARKUP('idx_docs','3','France', mklob); 7 DBMS_LOB.READ(mklob, amt, 1, line); 8 DBMS_OUTPUT.PUT_LINE('FIRST 40 CHARS ARE:'||line); 9 DBMS_LOB.FREETEMPORARY(mklob); 10 END; 11 / FIRST 40 CHARS ARE:<HTML><<<France>>> is in Europe.</HTML> PL/SQL procedure successfully completed.
When you create a
CONTEXT index, you need to explicitly synchronize your index to keep it up to date with any inserts, updates, or deletes to the text table.
Oracle Text enables you to do so with the
Add some rows to the
INSERT INTO docs VALUES(4, '<HTML>Los Angeles is a city in California.</HTML>'); INSERT INTO docs VALUES(5, '<HTML>Mexico City is big.</HTML>');
Since the index is not synchronized, these new rows are not returned with a query on city:
SELECT SCORE(1), id, text FROM docs WHERE CONTAINS(text, 'city', 1) > 0; SCORE(1) ID TEXT ---------- ---------- -------------------------------------------------- 4 2 <HTML>Paris is a city in France.</HTML>
Therefore, synchronize the index with 2Mb of memory, and reexecute the query:
EXEC CTX_DDL.SYNC_INDEX('idx_docs', '2M'); PL/SQL procedure successfully completed. COLUMN text FORMAT a50; SELECT SCORE(1), id, text FROM docs WHERE CONTAINS(text, 'city', 1) > 0; SCORE(1) ID TEXT ---------- ---------- -------------------------------------------------- 4 5 <HTML>Mexico City is big.</HTML> 4 4 <HTML>Los Angeles is a city in California.</HTML> 4 2 <HTML>Paris is a city in France.</HTML>
Oracle Text enables you to build simple Text and Catalog Web applications with the Oracle Text Wizard addin for Oracle JDeveloper. The wizard automatically generates Java Server Pages or PL/SQL server scripts you can use with the Oracle-configured Apache Web server.
Both JDeveloper and the Text Wizard can be downloaded for free from the following Oracle Technology Network (OTN) sites. Note that you need to register with OTN before you can access these pages.
You can obtain the latest JDeveloper software from:
See "Building the JSP Web Application" for an example.
You can obtain the Text, Catalog, and Classification Wizard addins from:
You can find instructions on using the Oracle Text Wizard and setting up your JSP files to run in a Web server environment from:
Follow the "Text Search Wizard for JDeveloper" link.
This example creates a catalog index for an auction site that sells electronic equipment such as cameras and CD players. New inventory is added everyday and item descriptions, bid dates, and prices must be stored together.
The application requires good response time for mixed queries. The key is to determine what columns users frequently search so that we can create a suitable
CTXCAT index. Queries on this type of index are issued with the
Note:Typically, query applications require a user interface. An example of how to build such a query application using the CATSEARCH index type is given in Appendix B.
Step 1 Connect as the Appropriate User
In this case, we connect as the user
myuser, whom we created in section "Create User".
Step 2 Create Your Table
Set up an auction table to store your inventory:
CREATE TABLE auction(
item_id NUMBER, title VARCHAR2(100), category_id NUMBER, price NUMBER, bid_close DATE);
Figure 2-1 illustrates this table.
Step 3 Populate Your Table
Now populate the table with various items, each with an
id, title, price and
INSERT INTO AUCTION VALUES(1, 'NIKON CAMERA', 1, 400, '24-OCT-2002'); INSERT INTO AUCTION VALUES(2, 'OLYMPUS CAMERA', 1, 300, '25-OCT-2002'); INSERT INTO AUCTION VALUES(3, 'PENTAX CAMERA', 1, 200, '26-OCT-2002'); INSERT INTO AUCTION VALUES(4, 'CANON CAMERA', 1, 250, '27-OCT-2002');
You can also load your table in batch with SQL*Loader.
Step 1 Determine your Queries
You need to determine what criteria are likely to be retrieved. In this example, you determine that all queries search the title column for item descriptions, and most queries order by price. When using the
CATSEARCH operator later, we'll specify the terms for the text column and the criteria for the structured clause.
Step 2 Create the Sub-Index to Order by Price
For Oracle Text to serve these queries efficiently, we need a sub-index for the price column, since our queries will order by price.
Therefore, create an index set called
auction_set and add a sub-index for the price column:
EXEC CTX_DDL.CREATE.INDEXT_SET('auction_iset'); EXEC CTX_DDL.ADD_INDEX('auction_iset','price'); /* sub-index A*/
Figure 2-1 shows how the sub-index relates to the columns.
Step 3 Create the CTXCAT Index
Create the combined catalog index on the
AUCTION table with
INDEX as follows:
CREATE INDEX auction_titlex ON AUCTION(title) INDEXTYPE IS CTXSYS.CTXCAT PARAMETERS ('index set auction_iset');
Figure 2-1 shows how the
CTXCAT index and its sub-index relates to the columns.
Figure 2-1 Auction table schema and CTXCAT index
When you have created the
CTXCAT index on the
AUCTION table, you can query this index with the
First set the output format to make the output readable:
COLUMN title FORMAT a40;
Now execute the query:
SELECT title, price FROM auction WHERE CATSEARCH(title, 'CAMERA', 'order by price')> 0; TITLE PRICE --------------- ---------- PENTAX CAMERA 200 CANON CAMERA 250 OLYMPUS CAMERA 300 NIKON CAMERA 400 SELECT title, price FROM auction WHERE CATSEARCH(title, 'CAMERA', 'price <= 300')>0; TITLE PRICE --------------- ---------- PENTAX CAMERA 200 CANON CAMERA 250 OLYMPUS CAMERA 300
You can update your catalog table by adding new rows. When you do so, the CTXCAT index is automatically synchronized to reflect the change.
For example, add the following new rows to our table and then reexecute the query:
INSERT INTO AUCTION VALUES(5, 'FUJI CAMERA', 1, 350, '28-OCT-2002'); INSERT INTO AUCTION VALUES(6, 'SONY CAMERA', 1, 310, '28-OCT-2002'); SELECT title, price FROM auction WHERE CATSEARCH(title, 'CAMERA', 'order by price')> 0; TITLE PRICE ----------------------------------- ---------- PENTAX CAMERA 200 CANON CAMERA 250 OLYMPUS CAMERA 300 SONY CAMERA 310 FUJI CAMERA 350 NIKON CAMERA 400 6 rows selected.
Note how the added rows show up immediately in the query.
The function of a classification application is to perform some action based on document content. These actions can include assigning a category id to a document or sending the document to a user. The result is classification of a document.
Documents are classified according to pre-defined rules. These rules select for a category. For instance, a query rule of 'presidential elections' might select documents for a category about politics.
Oracle Text provides several types of classification. One type is simple, or rule-based classification, discussed here, in which you create both document categories and the rules for categorizing documents. With supervised classification, Oracle Text derives the rules from a set of training documents you provide. With clustering, Oracle Text does all the work for you, deriving both rules and categories. (For more on classification, see Chapter 6, " Document Classification" in this book.)
To achieve simple classification in Oracle Text, you create rules, which are essentially a table of queries. You index these rules in a
CTXRULE index. To classify an incoming stream of text, use the
MATCHES operator in the
WHERE clause of a
SELECT statement. Refer to Figure 2-2 for the general flow of a classification application.
Figure 2-2 Overview of a Document Classification Application
Step 1 Connect As the Appropriate User
In this case, we connect as the user
myuser, which we created in section "Create User".
Step 2 Create the Rule Table
We must create a rule table and populate it with query rules. In this example, we create a table called
queries. Each row defines a category with an id, and a rule which is a query string:
CREATE TABLE queries ( query_id NUMBER, query_string VARCHAR2(80) ); INSERT INTO queries VALUES (1, 'oracle'); INSERT INTO queries VALUES (2, 'larry or ellison'); INSERT INTO queries VALUES (3, 'oracle and text'); INSERT INTO queries VALUES (4, 'market share');
Step 3 Create Your CTXRULE Index
CTXRULE index as follows:
CREATE INDEX queryx ON queries(query_string) INDEXTYPE IS CTXRULE;
MATCHES operator in the
WHERE clause of a
SELECT statement to match documents to queries and hence classify.
COLUMN query_string FORMAT a35; SELECT query_id,query_string FROM queries WHERE MATCHES(query_string, 'Oracle announced that its market share in databases increased over the last year.')>0; QUERY_ID QUERY_STRING ---------- ----------------------------------- 1 oracle 4 market share
As shown, the document string matches categories 1 and 4. With this classification you can perform an action, such as writing the document to a specific table or emailing a user.
See Also:Chapter 6, " Document Classification" for more extended classification examples.