You know the saying: “It’s like finding a needle in a haystack.” Not when you pay attention to descriptive metadata, an extremely important yet sometimes overlooked aspect of enterprise content management (ECM).
What is metadata, you ask? Well, metadata is “information about information,” used to describe and categorize content in a content management system (CMS). The document author, date created, and file type are examples of very basic metadata. On a large website with multiple topics, finding a document that doesn’t contain metadata is difficult.
Narrowing search results and surfacing related content with metadata
Last fall we launched a new website for Mackenzie Investments, powered by OpenText’s Web Site Management Server (previously known as RedDot) and the Google Search Appliance (GSA). One of the core features of the site is a rich faceted search that allows users to quickly find the web page, PowerPoint presentation, or PDF document they are looking for. Another vital feature is the related content presented in context on each page. For example, when viewing a fund page, you have instant access to related fund materials and regulatory documents.
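The faceted search itself ran on the Google Search Appliance, and its internals aren't covered here. As a rough illustration of the idea only, the Python sketch below filters a list of metadata records by selected facet values. A document matches a facet if it has any of that facet's selected values, and it must match every selected facet. The sketch then counts the values that remain for one facet, which is what a facet sidebar displays. The records, their values, and the `filter_by_facets` and `facet_counts` helpers are hypothetical.

```python
from collections import Counter

# Hypothetical sample records; field names mirror the <meta> names in the feed sample.
DOCUMENTS = [
    {"title": "Sample document", "file-type": "PDF", "topic": "Retirement Planning", "audience": "All"},
    {"title": "Fund overview", "file-type": "PowerPoint", "topic": "Income", "audience": "Advisor"},
    {"title": "Retirement guide", "file-type": "PDF", "topic": "Retirement Planning", "audience": "Advisor"},
]


def filter_by_facets(docs: list[dict[str, str]], selected: dict[str, set[str]]) -> list[dict[str, str]]:
    """Keep documents that match ANY selected value within a facet and EVERY selected facet."""
    return [d for d in docs if all(d.get(facet) in values for facet, values in selected.items())]


def facet_counts(docs: list[dict[str, str]], facet: str) -> Counter:
    """Count how many of the given documents carry each value of one facet."""
    return Counter(d[facet] for d in docs if facet in d)


pdfs = filter_by_facets(DOCUMENTS, {"file-type": {"PDF"}})
print([d["title"] for d in pdfs])      # ['Sample document', 'Retirement guide']
print(facet_counts(pdfs, "audience"))  # Counter({'All': 1, 'Advisor': 1})
```

Related content could work the same way. A fund page would call `filter_by_facets` with that page's fund as the only selected value.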
Storing and managing assets
Associating a document with a particular topic, fund or fund manager is all accomplished through metadata. With thousands of business-essential documents, the CMS needs a reliable, sustainable way to manage these associations. More importantly, that method has to be practical for the content authors populating the CMS.
OpenText’s Web Site Management Server features an Asset Manager for storing and managing documents within the CMS. Out of the box, it lets users bulk import files and add basic metadata to files individually.
The solution for Mackenzie Investments needed to validate the metadata and to populate or update it for thousands of documents at a time.
Developing a solution with technical requirements and the content author in mind
As part of a holistic approach, we introduced the concept of an “Asset Wrapper”: a CMS page that “wraps” the document and provides fields for populating and validating its metadata. In SmartEdit and SmartTree (the visual and backend authoring interfaces in Web Site Management Server), content authors can create an Asset Wrapper, add the document, and populate the metadata.
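To make the Asset Wrapper idea concrete, here is a minimal Python sketch of a record that pairs one document with its metadata and checks that metadata before it is published. The real wrapper is a CMS page with its own content class, and its fields are not listed here. The `AssetWrapper` class, the required fields, and the allowed values below are all hypothetical.

```python
from dataclasses import dataclass, field

# Hypothetical validation rules; the actual Asset Wrapper fields are not published.
REQUIRED_FIELDS = ("title", "language", "document-type")
ALLOWED_VALUES = {
    "language": {"en", "fr"},
    "audience": {"All", "Advisor", "Investor"},
}


@dataclass
class AssetWrapper:
    """Illustrative stand-in for a CMS page that wraps one document plus its metadata."""
    document_url: str
    metadata: dict[str, str] = field(default_factory=dict)

    def validate(self) -> list[str]:
        """Return human-readable problems; an empty list means the metadata passed."""
        errors = [f"missing required field '{name}'"
                  for name in REQUIRED_FIELDS if not self.metadata.get(name)]
        # Only check fields that are present; absent optional fields are allowed.
        for name, allowed in ALLOWED_VALUES.items():
            value = self.metadata.get(name)
            if value is not None and value not in allowed:
                errors.append(f"'{value}' is not an allowed value for '{name}'")
        return errors


wrapper = AssetWrapper("sample-document.pdf", {"title": "Sample document", "language": "de"})
print(wrapper.validate())
```

Returning a list of messages instead of raising on the first problem lets an author see every issue with a document at once.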
Data from each Asset Wrapper is published from the CMS into a Google Search Appliance feed: a special XML file, ingested by the appliance, that contains a link to each document along with all of its metadata. Normally, a Google Search Appliance crawls the website, discovering linked documents and parsing them for data. A feed lets us supply metadata that isn’t part of the document content in a single file, so the appliance can update its index and search results much more quickly.
```xml
<gsafeed>
  <header>
    <datasource>mackenzie</datasource>
    <feedtype>metadata-and-url</feedtype>
  </header>
  <group>
    <record url="sample-document.pdf" mimetype="text/html" last-modified="2014-08-13T13:45:19">
      <metadata>
        <meta name="maximum-order-quantity" content="0" />
        <meta name="product-code" content="0" />
        <meta name="language" content="en" />
        <meta name="description" content="This is a sample document." />
        <meta name="file-size" content="161431" />
        <meta name="display-date" content="2014-08-13" />
        <meta name="title" content="Sample document" />
        <meta name="short-title" content="Sample document" />
        <meta name="company-code" content="1" />
        <meta name="file-type" content="PDF" />
        <meta name="audience" content="All" />
        <meta name="business-group" content="Communications" />
        <meta name="category" content="Retirement Planning" />
        <meta name="display-group" content="Marketing Materials" />
        <meta name="document-type" content="Brochures and Guides" />
        <meta name="partner" content="Mackenzie Investments" />
        <meta name="topic" content="Retirement Planning" />
      </metadata>
    </record>
  </group>
</gsafeed>
```
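The feed on the site was published by the CMS itself. The Python sketch below only shows how a feed with the same shape as the sample can be generated from a list of records using the standard library's `xml.etree.ElementTree`. The `build_gsa_feed` function and the record dictionary layout are hypothetical. A production feed may also need an XML declaration and DOCTYPE, so check the GSA Feeds Protocol documentation before relying on this output.

```python
import xml.etree.ElementTree as ET


def build_gsa_feed(datasource: str, records: list[dict]) -> str:
    """Serialize records into a metadata-and-url feed shaped like the sample feed."""
    feed = ET.Element("gsafeed")
    header = ET.SubElement(feed, "header")
    ET.SubElement(header, "datasource").text = datasource
    ET.SubElement(header, "feedtype").text = "metadata-and-url"
    group = ET.SubElement(feed, "group")
    for rec in records:
        # One <record> per document: the link plus its attributes.
        record = ET.SubElement(group, "record", {
            "url": rec["url"],
            "mimetype": rec["mimetype"],
            "last-modified": rec["last-modified"],
        })
        metadata = ET.SubElement(record, "metadata")
        # Each metadata field becomes <meta name="..." content="..." />.
        for name, content in rec["metadata"].items():
            ET.SubElement(metadata, "meta", {"name": name, "content": content})
    return ET.tostring(feed, encoding="unicode")


print(build_gsa_feed("mackenzie", [{
    "url": "sample-document.pdf",
    "mimetype": "application/pdf",
    "last-modified": "2014-08-13T13:45:19",
    "metadata": {"title": "Sample document", "file-type": "PDF"},
}]))
```

ElementTree escapes attribute values such as descriptions containing quotes or ampersands, which is the main reason to build the feed with an XML library instead of string concatenation.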
The website was built from the ground up, so content authors had to populate the metadata for every document. With tight timelines and thousands of documents, we created an application to drastically reduce the manual effort.
A strength of OpenText’s Web Site Management Server is its web service and comprehensive programming interface, RedDot Query Language (RQL). Applications can execute RQL commands against the CMS to do anything a user can do manually, from creating content classes and elements to creating pages and populating content.
Here at Yellow Pencil, we developed a .NET RQL library, an object-oriented wrapper around RQL commands that makes it possible to perform complex operations with a small amount of code. We used the same library, which also powers our Script Runner tool, to create the Bulk Asset Tagger: an application that creates Asset Wrappers in bulk and populates their metadata from a comma-separated values (CSV) file. One advantage of a CSV file is that users unfamiliar with the CMS can create and maintain it, then hand it to content authors for processing.
Content authors would simply install the application, select the CSV file and the location where the Asset Wrappers should be created, and let the Bulk Asset Tagger take care of the rest. The application would validate the CSV file, advising the user of any issues, and then create the Asset Wrappers. Beyond the initial content population, content authors could also update existing Asset Wrappers by running the application again with an updated CSV file.
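The Bulk Asset Tagger was built in .NET on top of RQL, and neither is reproduced here. The Python sketch below only illustrates the same flow. It parses a CSV, reports every problem before touching the CMS, then creates or updates a wrapper per document URL. The CSV layout, the required columns, and the `load_rows` and `apply_rows` helpers are hypothetical. A plain dictionary keyed by document URL stands in for the CMS.

```python
import csv
import io

# Hypothetical CSV layout: a "url" column plus one column per metadata field.
REQUIRED_COLUMNS = ("url", "title", "topic")


def load_rows(csv_text: str) -> tuple[list[dict[str, str]], list[str]]:
    """Parse the CSV and collect every problem before anything is written."""
    reader = csv.DictReader(io.StringIO(csv_text))
    columns = reader.fieldnames or []
    missing = [c for c in REQUIRED_COLUMNS if c not in columns]
    if missing:
        return [], [f"missing column(s): {', '.join(missing)}"]
    rows, errors = [], []
    for row in reader:
        # Short rows yield None for absent columns, so treat None as empty.
        empty = [c for c in REQUIRED_COLUMNS if not (row[c] or "").strip()]
        if empty:
            errors.append(f"line {reader.line_num}: empty {', '.join(empty)}")
        else:
            rows.append(row)
    return rows, errors


def apply_rows(rows: list[dict[str, str]], wrappers: dict[str, dict[str, str]]) -> tuple[int, int]:
    """Create or update wrappers keyed by document URL; returns (created, updated)."""
    created = updated = 0
    for row in rows:
        # Skip the URL key and the None key DictReader uses for surplus fields.
        metadata = {k: v for k, v in row.items() if k is not None and k != "url"}
        if row["url"] in wrappers:
            wrappers[row["url"]].update(metadata)
            updated += 1
        else:
            wrappers[row["url"]] = metadata
            created += 1
    return created, updated


cms: dict[str, dict[str, str]] = {}
rows, errors = load_rows("url,title,topic\na.pdf,Guide A,Retirement Planning\n")
if errors:
    print("\n".join(errors))
else:
    print(apply_rows(rows, cms))  # (1, 0)
```

Keying on the document URL is what makes a second run with an updated CSV an update rather than a duplicate. Whether to skip invalid rows or abort the whole run, as the usage above does, is a policy choice.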
On large websites with tons of content, knowing how to efficiently manage metadata won’t necessarily eliminate the proverbial haystack, but it can make that needle a whole lot larger — and easier to find.
Want to learn more about how to manage metadata in OpenText Web Site Management Server? Just send me an email or leave a comment below.