Documents, audio recordings, videos, images – data is growing daily in the business world and most of it is unstructured, which makes it difficult for many organizations to extract insights and actionable information for improved business operations and smarter decision-making. These business issues are now changing with the rise of new AI technologies, machine learning and natural language processing in today’s search and analytics solutions. From e-commerce and customer service, to intranet portals and collaboration, business leaders hungry for greater automation and intelligence are finding opportunities to improve processes and better empower their workforce to drive transformative results.

AI, machine learning and natural language processing can be complicated to integrate into enterprise systems. Give your unstructured data project a kick-start with a commercially proven, industry-tested solution that brings AI, ML and NLP together under one hood with a straightforward API – Doc-Tags / xAIgent!

We’re making Complex Data Driven Decisions a whole lot easier!
Investigate Doc-Tags Today


Enterprise Content Management – Context

Enterprise Content Management (ECM), or Intelligent Information Management (IIM), is the development of strategies, methods and tools used to create, capture, automate, deliver, secure and analyse content and documents related to organizational processes. There are a number of enterprise solution providers in this arena – ABBYY, Documentum, Hewlett Packard, IBM, Laserfiche, Microsoft and Oracle, to name a few – each with its own perspective and collection of tools that make its solution the right one.

Organizational processes revolve around structure and its supporting information, including contracts, forms, agreements and the like, which for the most part are managed as structured information (collections of clauses, responses, form-based data…). How that structured data is acquired, input, processed and consumed is the foundation of the solutions offered by the providers mentioned above and others.

Another basic tenet of Enterprise Content Management is the employment of strategies for managing, categorizing and indexing unstructured content in support of organizational processes. A common approach for giving unstructured content structure is to employ Tags: meta tags, keywords and key phrases, concise descriptions that can be added to the profile of the content. Document tagging of this kind enhances the contextual accuracy of searching and retrieving content when required.

When unstructured content, a legal precedent for instance, is being logged into an Enterprise Content Management system by someone who is not the author, and the document does not have author-provided Tags, there are only a few options for giving that content any semblance of structure. As long as the operator has security clearance to view the content, they can: i) read the document and define the tags that should be used; ii) use the document title, first paragraph or synopsis (if there is one), along with the file name, to make a best-effort guess and select a generic category from a pre-defined list already set in the system; or iii) use a phrase-parsing strategy and referential library (Bayesian / heuristic algorithms) to give the content some structure based on general, pre-defined subject matter terminology. The latter two forms of content classification are okay: at least they provide some structure and a better chance that the content being managed can be retrieved with a little more accuracy.

These somewhat automated referential approaches rely upon pre-cast, narrow-focus, subject-specific referential libraries that may or may not relate to the content being managed. For instance, medical malpractice is an entirely different subject matter from bio-tech patent law. Both deal with legal and medical matters; however, the terminology of each sits at opposite ends of the spectrum. This is a simple example of where a generic referential process really doesn’t work. Correcting it requires two specific referential libraries, each tailored to its branch of law, which is inefficient and expensive to produce.
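
To make the limitation concrete, here is a minimal sketch of a referential-library classifier. It is an illustration only: the category names come from the example above, but the term lists and the `classify` function are made up for this sketch and do not represent any vendor's actual library or algorithm.

```python
# Hypothetical, hand-built reference libraries (assumed terms, for illustration only).
REFERENCE_LIBRARIES = {
    "medical malpractice": {"negligence", "misdiagnosis", "surgeon", "standard of care"},
    "bio-tech patent law": {"patent", "claims", "prior art", "gene sequence"},
}


def classify(text):
    """Return the category whose library terms appear most often, or None."""
    lowered = text.lower()
    # Score each category by how many of its library terms occur in the text.
    scores = {
        category: sum(1 for term in terms if term in lowered)
        for category, terms in REFERENCE_LIBRARIES.items()
    }
    best = max(scores, key=scores.get)
    # Content outside every library gets no category at all.
    return best if scores[best] > 0 else None


print(classify("The surgeon's misdiagnosis fell below the standard of care."))
print(classify("Quarterly shipping costs rose in the logistics division."))
```

The second call returns nothing useful because no library covers that subject, which is the gap described above: every new subject area needs its own hand-built library.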

What about a fourth option, one that employs an understanding of the construct of human language so that the target content can be parsed, in context of itself, to reveal a primary set of key phrases? In essence, a process that strips away all of the connecting words (the ifs, ands, buts…) to reveal a collection of content-specific key phrases. That process then counts how many times each key term is used throughout the document and ranks each relevant term by its frequency. The highest-ranked terms are then used to retrieve the most predominant examples of each term’s use from the target content. A key phrase / keyword extracted summary, if you will. Automatically, without training (no need for referential libraries), unsupervised, solely in context of the target content, accurately, with pure subject relevance.
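
The general idea can be sketched in a few lines. This is a deliberately simplified illustration of the steps just described (drop connecting words, count, rank, pull an example sentence per term), not the patented Extractor algorithm. The stop-word list, the function names and the single-word terms are all assumptions made for the sketch.

```python
import re
from collections import Counter

# Small illustrative stop-word list (an assumption; real lists are far larger).
STOP_WORDS = {
    "a", "an", "and", "are", "as", "at", "be", "but", "by", "can", "for", "if",
    "in", "is", "it", "may", "of", "on", "or", "that", "the", "this", "to",
    "was", "will", "with",
}


def split_sentences(text):
    """Naive sentence splitter: break after ., ! or ? followed by whitespace."""
    return [s.strip() for s in re.split(r"(?<=[.!?])\s+", text) if s.strip()]


def rank_terms(text, top_n=5):
    """Strip connecting words, count what remains, return the top-ranked terms."""
    words = re.findall(r"[a-z]+", text.lower())
    counts = Counter(w for w in words if w not in STOP_WORDS and len(w) > 2)
    return counts.most_common(top_n)


def summarize(text, top_n=3):
    """Map each top-ranked term to the first sentence in which it is used."""
    sentences = split_sentences(text)
    summary = {}
    for term, _count in rank_terms(text, top_n):
        pattern = re.compile(r"\b" + re.escape(term) + r"\b", re.IGNORECASE)
        summary[term] = next((s for s in sentences if pattern.search(s)), "")
    return summary


sample = (
    "The patent covers a new patent process. "
    "If the patent is valid, the court will uphold the patent. "
    "But the court may ask for more evidence."
)
print(rank_terms(sample, top_n=2))
print(summarize(sample, top_n=2))
```

Everything here is derived from the document itself, with no subject-specific library involved. A production system would also need to handle multi-word phrases, word stems and language-specific rules, which this sketch ignores.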

This fourth option is available today as a patent-backed approach based on artificial intelligence and machine learning. It is a content-specific key term extraction approach that relies upon patented AI and machine learning technologies to derive accurate, contextual, relevant Tags from the target document. This strategy is baked into Doc-Tags™, the only solution available today that provides a document-specific, contextually accurate, unsupervised process for automatically giving a document or collection of documents their own file-specific Tags.

Now, think of employing Doc-Tags™ in an Enterprise Content Management system, where unstructured content can be given its own custom structure based upon relevant, contextual, accurate Tags. ECM content is now stored, secured and analysed with the most accurate search and retrieval possible. Test Drive Context Today.

Accurate, Contextual, Relevant – Unsupervised, Automatic Document Tagging

Automatically Extract Contextually Accurate Keyphrases

Using the advanced, linguistically focused Artificial Intelligence and Machine Learning processes found in the patented Extractor technology, the xAIgent RESTful web service provides subscribers with an effortless software service for automatically extracting contextually accurate keyphrases / key words / key terms from any subject matter content.
The xAIgent RESTful service uses the patented Extractor hybrid Artificial Intelligence and machine learning Linguistic Technology to provide subscribers with the most accurate and contextually relevant key terms from any subject domain text, automatically (unsupervised).
In contrast, it’s worthwhile to note that there are other keyphrase extraction systems, most of them based on heuristic and Bayesian key word extraction models. Each inherently requires manual training for every subject domain the developer / user wishes to employ. Training is a process whereby a library (corpus) of pre-defined, domain-specific content and keywords must first be compiled and then incorporated into the comparative structure of that system (a supervised process). Cumbersome at best: tedious, time-intensive and dependent on expert knowledge.
The xAIgent automatic keyphrase extraction RESTful web service is ready for consumption immediately, without further training, supporting English, French, German, Japanese, Korean and Spanish, and provides subscribers with the most accurate, contextually relevant keyphrases of any solution available today.
Where would an automatic, contextually accurate key word / keyphrase extraction RESTful service be useful?
Think of document management and content management systems, where key terms / key words must be assigned to each document before it is included in the repository. If the author has not previously tagged the content, then a subject matter expert must be employed to determine the key terms that describe the document: read and re-read the content, identify the key words, terms and phrases, and then annotate the document with them. Only then can the document / content be included in the document management system.
To help alleviate the read / re-read process, document management systems will often have a generic list of subject terms to select from and assign to the document. That may be all well and good if the documents being consumed are all of similar subject matter, but is that really the best approach? Wouldn’t it be better to have contextually accurate key terms per document, exposing the true value of the documents being included in the management system and allowing them to be effectively accessed, searched, referenced and reported on?
Of course, and the simple answer is to subscribe to the xAIgent RESTful web service and have objective, contextually accurate key terms generated automatically (human-generated key words are by their nature subjective), for all unstructured content. In other words, set your own xAIgent-enhanced system to work through a collection of content folders and have each document automatically associated with its own set of key words / key terms / key phrases / tags. Come back when the process is completed (overnight) and start to fully realize the enhanced value that has now been surfaced for that collection of documents.
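
A batch run of that kind might look something like the sketch below. The xAIgent request format is not described here, so the extractor is passed in as a plain function: `extract_tags` is a hypothetical stand-in for whatever call your subscription provides, and `tag_folder` and the `*.txt` file pattern are likewise assumptions made for illustration.

```python
from pathlib import Path
from typing import Callable, Dict, List


def tag_folder(
    root: str,
    extract_tags: Callable[[str], List[str]],
    pattern: str = "*.txt",
) -> Dict[str, List[str]]:
    """Walk a folder tree and associate each matching document with its own tags.

    extract_tags is a placeholder for the real keyphrase extraction call.
    """
    base = Path(root)
    results: Dict[str, List[str]] = {}
    # rglob searches the folder and all of its sub-folders.
    for path in sorted(base.rglob(pattern)):
        text = path.read_text(encoding="utf-8", errors="replace")
        # Key each result by the document's path relative to the root folder.
        results[path.relative_to(base).as_posix()] = extract_tags(text)
    return results
```

With a real extractor plugged in, the returned mapping of document path to tags could then be written into the document management system's metadata fields.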
There are many other aspects of the xAIgent (Extractor) service to note, and we’ll cover them in subsequent editions, including why xAIgent is so good at retrieving key phrase content from websites and why research shows the automatic xAIgent (née Extractor) key term extraction process carries an accuracy rating of up to 87 percent.