The Ontology Directed Classifier (ODC) is an interactive tool that automatically assigns objects, based on their descriptions, to nodes in a target taxonomy. For example, ODC may classify "Band-Aid® Adhes Band Plst Strips 3/4 in. x 3 in." to the following UNSPSC node, shown with its ancestors:
42 – Medical Equipment and Accessories and Supplies
4231 – Wound care products
423115 – Bandages and dressings and related products
42311505 – Bandages or dressings for general use
Classifying an object description to a taxonomy is the first step in relating one object to another. It also provides the context for attribute extraction, since the class of an object determines which kinds of attributes can be extracted from its description.
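The UNSPSC example above works because each level of an 8-digit UNSPSC code adds two digits to its parent. A minimal Python sketch of recovering the ancestor nodes from a code follows; the function name `unspsc_path` is hypothetical and is not part of ODC.

```python
def unspsc_path(code: str) -> list[str]:
    """Return the codes on the path to an 8-digit UNSPSC code, top level first."""
    if len(code) != 8 or not (code.isascii() and code.isdigit()):
        raise ValueError(f"expected an 8-digit UNSPSC code, got {code!r}")
    # Each level extends its parent by two digits:
    # segment (2), family (4), class (6), commodity (8).
    return [code[:n] for n in (2, 4, 6, 8)]


print(unspsc_path("42311505"))
# prints ['42', '4231', '423115', '42311505']
```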
Classifier Key Features:
- Taxonomy independent:
- Any user taxonomy can be loaded from a text file and used as the basis of classification; NAICS and UNSPSC are commonly used examples
- Statistical: based on "word" frequencies, with extensions that improve on Bayesian classifiers
- Trainable:
- By loading pre-classified examples
- By reusing previously classified items that the user has validated or corrected
- Can be bootstrapped, starting with no training items
- Supports flexible user-definable abbreviation expansions
- Generates an "answer cone" for each classified item:
- Multiple alternatives to the best guess can be generated, with associated confidences
- Size and shape of cone are controlled by user-settable parameters
- Powerful GUI allows the user to invoke automatic classification, verify results, and correct misclassified items using Drag and Drop
- Explanations of why an item was classified as it was are provided on demand
- Validation mode:
- Supports user validation of the percentage of correct automatic classifications, using accepted data quality standards
- Statistics measuring the quality of automatic classifications are provided on demand
- Reads from and writes to field-delimited ASCII text files or databases
- Based on CDF
- Supports taxonomy-to-taxonomy mapping:
- Allows the user to build mappings between taxonomies, with automatic help
- Taxonomy-to-taxonomy mappings can be saved and restored
- If input items are pre-classified to taxonomy T1, and there exists a mapping of T1 to taxonomy T2, then the mapping can be used to improve the automatic classification of the input items to T2
- Scalable:
- Handles taxonomies of tens of thousands of nodes, with up to a hundred thousand training items
- Time-to-classify depends on the size of the taxonomy and the size of the training set; for large taxonomies, 5 items per second is a reasonable expectation
- Supports multiple users
- Supports a batch-processing mode
- Uses WordNet to support simple lexical transformations
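Several of the features above (word-frequency statistics, training from pre-classified examples, abbreviation expansion, and the "answer cone") can be illustrated together. The Python sketch below is a plain multinomial naive Bayes classifier with add-one smoothing. It is not ODC's actual algorithm, whose extensions are not described here. The abbreviation table, the class and method names, and the `max_alternatives` and `min_ratio` parameters are all assumptions chosen for illustration.

```python
import math
from collections import Counter, defaultdict

# Hypothetical abbreviation table; in practice this would be user-defined.
ABBREVIATIONS = {"adhes": "adhesive", "band": "bandage", "plst": "plastic"}


def tokenize(description):
    """Lowercase, split on whitespace, and expand known abbreviations."""
    return [ABBREVIATIONS.get(w, w) for w in description.lower().split()]


class WordFrequencyClassifier:
    """Plain multinomial naive Bayes over word counts (illustrative only)."""

    def __init__(self):
        self.word_counts = defaultdict(Counter)  # node -> word -> count
        self.node_counts = Counter()             # node -> number of training items
        self.vocabulary = set()

    def train(self, description, node):
        """Add one pre-classified (or user-corrected) example."""
        words = tokenize(description)
        self.word_counts[node].update(words)
        self.node_counts[node] += 1
        self.vocabulary.update(words)

    def answer_cone(self, description, max_alternatives=3, min_ratio=0.05):
        """Return (node, confidence) pairs, best guess first.

        max_alternatives caps the size of the cone; min_ratio drops any
        alternative whose confidence is below that fraction of the best one.
        """
        if not self.node_counts:
            return []  # no training items yet, so nothing to rank
        words = tokenize(description)
        total_items = sum(self.node_counts.values())
        log_scores = {}
        for node, n_items in self.node_counts.items():
            counts = self.word_counts[node]
            # Add-one smoothing; the extra +1 reserves mass for unseen words.
            denom = sum(counts.values()) + len(self.vocabulary) + 1
            score = math.log(n_items / total_items)
            for w in words:
                score += math.log((counts[w] + 1) / denom)
            log_scores[node] = score
        # Convert log scores to confidences that sum to 1 over all nodes.
        top = max(log_scores.values())
        weights = {n: math.exp(s - top) for n, s in log_scores.items()}
        norm = sum(weights.values())
        ranked = sorted(((n, w / norm) for n, w in weights.items()),
                        key=lambda pair: pair[1], reverse=True)
        best = ranked[0][1]
        return [(n, c) for n, c in ranked[:max_alternatives]
                if c >= min_ratio * best]


clf = WordFrequencyClassifier()
clf.train("adhesive bandage plastic strips", "42311505")
clf.train("gauze bandage roll sterile", "42311505")
clf.train("ballpoint pen blue ink", "hypothetical-pens-node")
for node, confidence in clf.answer_cone("Adhes Band Plst Strips 3/4 in. x 3 in."):
    print(node, round(confidence, 3))
```

Raising `min_ratio` narrows the cone toward the single best guess, while raising `max_alternatives` allows more candidates to be offered for user review.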
Classifier Platform:
- Based on XJ
- Java WebStart enabled