Text Analytics

Enterprise Content Management – Context

Enterprise Content Management (ECM), also called Intelligent Information Management (IIM), is the development of strategies, methods, and tools used to create, capture, automate, deliver, secure and analyse content and documents related to organizational processes. There are a number of enterprise solution providers in this arena: ABBYY, Documentum, Hewlett Packard, IBM, Laserfiche, Microsoft and Oracle, to name a few, each with its own perspective and collection of tools that it believes make its solution the right one.

Organizational processes revolve around structure and supporting information, including contracts, forms, agreements and the like, which for the most part is managed as structured information (collections of clauses, responses, form-based data…). How that structured data is acquired, entered, processed and consumed is the foundation of the solutions offered by the providers mentioned above and others.

Another basic tenet of Enterprise Content Management is the employment of strategies for managing, categorizing and indexing unstructured content in support of organizational processes. A common approach for giving unstructured content structure is to employ Tags: meta tags, keywords and key phrases. These are concise descriptions added to the profile of the content (Document Tagging), and they improve the contextual accuracy of searching for and retrieving content when required.

When unstructured content, legal precedents for instance, is being logged into an Enterprise Content Management system, and the person tasked with logging the content is not the author and the document does not have author-provided Tags, there are only a few options for giving that unstructured content any semblance of structure. As long as the operator has security clearance to view the content, they can: i) read the document and define the tags that should be used; ii) use the document title, first paragraph or synopsis (if there is one) along with the file name to make a best-effort guess and select a generic category from a pre-defined list of options already set in the system for categorizing content; or iii) use a phrase parsing strategy and referential library (Bayesian / heuristic algorithms) to give the content some structure based on general, pre-defined subject matter terminology. The latter forms of content classification are okay; at least they provide some structure and a better chance that the content being managed can be retrieved with a little more accuracy.

These somewhat automated referential approaches rely upon pre-cast, narrow-focus, subject-specific referential libraries that may or may not relate to the content being managed. For instance, medical malpractice is an entirely different subject from bio-tech patent law. Both deal with legal and medical matters; however, their subject terminologies sit at opposite ends of the spectrum. This is a simple example of where a generic referential process really doesn’t work. Correcting it requires two specific referential libraries, each tailored to its branch of law, which is inefficient and expensive to produce.
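To make the limitation concrete, here is a minimal sketch of a referential-library classifier. It is an illustration only: the two libraries are tiny, hand-made assumptions, the function name `classify` is hypothetical, and simple term overlap stands in for the Bayesian or heuristic scoring a real product might use.

```python
import re

# Assumption: two tiny, hand-made reference libraries for illustration only.
REFERENCE_LIBRARIES = {
    "medical malpractice": {"negligence", "patient", "surgeon", "injury", "duty"},
    "bio-tech patent law": {"patent", "claim", "gene", "invention", "novelty"},
}

def classify(text, libraries):
    """Return (best_subject, scores) by counting library terms found in the text."""
    words = set(re.findall(r"[a-z]+", text.lower()))
    scores = {subject: len(words & terms) for subject, terms in libraries.items()}
    best = max(scores, key=scores.get)
    # No overlap at all means no library applies to this content.
    return (best if scores[best] > 0 else None), scores

doc = "The surgeon breached the duty of care and the patient suffered an injury."
print(classify(doc, REFERENCE_LIBRARIES))

unrelated = "The lease sets the rent, the term and the renewal options."
print(classify(unrelated, REFERENCE_LIBRARIES))
```

The malpractice sentence matches its library, but the lease sentence matches neither, so it gets no category at all. Covering it would mean building yet another library.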

What about a fourth option, where an understanding of the construct of human language is employed, allowing the target content to be parsed in the context of itself to reveal a primary set of key phrases? In essence, this is a process that strips away all of the connecting words, the ifs, ands and buts, to reveal a collection of content-specific key phrases. The process then counts how many times each key term is used throughout the document and ranks each relevant term by its frequency. The highest ranked terms are then used to retrieve the most prominent examples of each term’s use from the target content: a key phrase / keyword extracted summary, if you will. All of this happens automatically, unsupervised and without training (no need for referential libraries), solely in the context of the target content, and with pure subject relevance.
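The steps described above can be sketched in a few lines of Python. This is a simplified illustration, not the Doc-Tags method: it assumes a tiny hand-picked stop-word list, treats single words as terms, and ranks by raw counts. The names `extract_key_terms` and `summarize` are hypothetical.

```python
import re
from collections import Counter

# Assumption: a tiny illustrative stop-word list; a real one would be far larger.
STOP_WORDS = {
    "a", "an", "and", "are", "as", "at", "be", "but", "by", "for", "if",
    "in", "is", "it", "of", "on", "or", "that", "the", "this", "to",
    "was", "were", "with",
}

def tokenize(text):
    """Lowercase the text and return its alphabetic words."""
    return re.findall(r"[a-z]+", text.lower())

def extract_key_terms(text, top_n=3):
    """Rank non-stop-words by how often they occur in the text itself."""
    counts = Counter(w for w in tokenize(text) if w not in STOP_WORDS)
    return [term for term, _ in counts.most_common(top_n)]

def summarize(text, key_terms):
    """Return, for each key term, the first sentence that uses it."""
    sentences = [s.strip() for s in re.split(r"(?<=[.!?])\s+", text) if s.strip()]
    summary = {}
    for term in key_terms:
        for sentence in sentences:
            if term in tokenize(sentence):
                summary[term] = sentence
                break
    return summary

document = (
    "The patent was filed in March. "
    "The patent describes a gene sequence. "
    "Each gene in the patent is listed in the appendix."
)
terms = extract_key_terms(document, top_n=2)
print(terms)
print(summarize(document, terms))
```

For the sample text, the two most frequent terms are "patent" and "gene", and each is paired with the first sentence that uses it. No reference library is consulted; the ranking comes only from the document itself.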

This fourth option is available today: a content-specific key term extraction approach that relies upon patented artificial intelligence and machine learning technologies to derive accurate, contextual, relevant Tags from the target document. This strategy is baked into Doc-Tags™, the only solution available today that provides a document-specific, contextually accurate, unsupervised process for automatically giving a document or collection of documents their own file-specific Tags.

Now, think of employing Doc-Tags™ in an Enterprise Content Management system, where unstructured content can be given its own custom structure based upon relevant, contextual, accurate Tags. ECM content is now stored, secured and analysed with the most accurate search and retrieval possible. Test Drive Context Today – Doc-Tags.com™.

Accurate, Contextual, Relevant – Unsupervised, Automatic Document Tagging – www.Doc-Tags.com™

DOCUMENT TAGs – part I

Document Tagging
Why are Contextually Accurate Document Tags Important?

As you are likely aware, every file on your system has a set of attributes called file properties, which include such things as the name of the author and the date the file was last modified. Tags are another type of file property, designed to be customized by the user. Tags are great for making searching easier because you can use words or even phrases that make sense to you. Think of Tags as keywords.

[Image: Word document properties page showing Tags]

Tags are must-haves for Document Management Systems, making search retrieval incredibly efficient. By adding your own tags to your Word documents and other file types, you make your own search retrieval significantly more accurate, especially when using File Explorer in Windows 7 and later.
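For readers who want to inspect tags programmatically, the sketch below reads them from a .docx file using only the Python standard library. It rests on a few assumptions: the file keeps its core properties at the conventional `docProps/core.xml` location, Word's Tags field is stored there as the keywords property, and tags are separated by semicolons or commas. The function name `read_tags` is hypothetical, and the demo builds a stand-in package containing only that one part rather than a complete Word document.

```python
import io
import re
import zipfile
import xml.etree.ElementTree as ET

# Namespace used by the core-properties part of an Office Open XML package.
CP_NS = "http://schemas.openxmlformats.org/package/2006/metadata/core-properties"

def read_tags(docx_file):
    """Return the tags (the 'keywords' core property) stored in a .docx file."""
    with zipfile.ZipFile(docx_file) as package:
        # Assumption: core properties live at the conventional path.
        core_xml = package.read("docProps/core.xml")
    keywords = ET.fromstring(core_xml).find(f"{{{CP_NS}}}keywords")
    if keywords is None or not keywords.text:
        return []
    # Assumption: tags are separated by semicolons or commas.
    return [t.strip() for t in re.split(r"[;,]", keywords.text) if t.strip()]

# Stand-in package for the demo: only the part that read_tags looks at.
core = (
    f'<cp:coreProperties xmlns:cp="{CP_NS}">'
    "<cp:keywords>contract; lease; renewal</cp:keywords>"
    "</cp:coreProperties>"
)
buffer = io.BytesIO()
with zipfile.ZipFile(buffer, "w") as package:
    package.writestr("docProps/core.xml", core)
buffer.seek(0)

print(read_tags(buffer))
```

With a real document, pass its file path to `read_tags` instead of the in-memory buffer. A file that has never been tagged returns an empty list.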

Tagging a Word document as you create it (when saving the file) is a great habit to get into. But what if you haven’t been in the habit of adding Document Tags? In other words, what do you do if you want to make a collection of existing, untagged documents highly searchable in the context of specific subject matter? Add Contextually Accurate Document Tags to those documents, likely using one of three approaches:

>    If you’re familiar with the content of a particular file, you can right-click the file and edit its properties, adding your own custom tags (see the image above);

>>  The second option is to open the file and read it. Once you’ve got a handle on the content, select Save As, click ‘More Options…’ right underneath the file name field, and add your new custom Author, Tag, Title and Subject keywords;

[Image: adding a Tag while saving a Word document]

>>> The third option: crack open Doc-Tags (www.dbi-tech.com/doctags) and have this clever utility create Contextually Accurate Document Description Tags for you.

[Image: Doc-Tags default launch screen]

Automatically!

[Image: file properties without tags and with tags]