Content Integration

Content in multiple formats can be integrated by pointing to the sources rather than by converting them.

Sometimes a minimal amount of conversion is necessary, though. We help by minimizing and automating the work needed so that you can add semantics to the pointers.

Ingest sources


Extract Topics from Content

Our ingestion tools help you acquire topics automatically from your content. 

The ingest process can be run at any time, such as whenever the content is updated.

If your content is structured, you can define the kind of topics that will be created depending on various elements from your content.
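As a rough illustration of element-dependent topic types, the sketch below maps XPath patterns to topic types using Python's standard library. The sample document, the element names, the RULES table and the extract_topics function are all hypothetical and are not part of our tools; they only show the general idea.

```python
import xml.etree.ElementTree as ET

# Hypothetical sample content; the element names are illustrative only.
SAMPLE_XML = """
<book>
  <chapter>
    <title>Topic Maps</title>
    <person>Ada Lovelace</person>
    <keyword>indexing</keyword>
  </chapter>
  <chapter>
    <title>Ingestion</title>
    <keyword>XPath</keyword>
  </chapter>
</book>
"""

# Hypothetical rules: XPath pattern -> topic type to assign.
# ElementTree supports only a limited subset of XPath.
RULES = {
    ".//chapter/title": "Chapter",
    ".//person": "Person",
    ".//keyword": "Keyword",
}


def extract_topics(xml_text, rules):
    """Return one topic per matching element, typed by the rule that matched."""
    root = ET.fromstring(xml_text.strip())
    topics = []
    for pattern, topic_type in rules.items():
        for element in root.findall(pattern):
            name = (element.text or "").strip()
            if name:  # skip empty elements
                topics.append({"name": name, "type": topic_type})
    return topics


topics = extract_topics(SAMPLE_XML, RULES)
for topic in topics:
    print(f"{topic['type']}: {topic['name']}")
```

Keeping the rules in a plain table means the mapping can be changed without touching the extraction code, and the same run can be repeated whenever the content changes.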

If your content is not structured, we can help you create specific processes or integrate third-party tools to extract topics from the content.
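One simple process for unstructured content is to match a list of known terms against the text. The sketch below is a minimal example under that assumption; the find_terms function, the sample text and the term list are hypothetical, and real third-party extraction tools go well beyond this.

```python
import re


def find_terms(text, terms):
    """Count whole-word, case-insensitive occurrences of each term in text."""
    found = {}
    for term in terms:
        # re.escape keeps punctuation in a term from being read as regex syntax.
        pattern = r"\b" + re.escape(term) + r"\b"
        count = len(re.findall(pattern, text, flags=re.IGNORECASE))
        if count:
            found[term] = count
    return found


# Hypothetical sample text and term list.
TEXT = "Topic maps link topics to content. A topic map can index XML and PDF content."
TERMS = ["topic map", "XML", "EPUB"]

matches = find_terms(TEXT, TERMS)
for term, count in matches.items():
    print(f"{term}: {count}")
```

Whole-word matching is deliberately strict here: the plural "Topic maps" does not count as a match for "topic map", so a real term list would need to handle such variants.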

Customized Formats

We can help you extract topics from your own formats.

Topics can be acquired from documents in a variety of source formats.

XML: Extract topics from XML using XPath patterns. You can assign a topic type depending on the type of element in which a topic is found.
Excel spreadsheets, CSV: Use data stored in Excel spreadsheets or CSV files. Customize extraction by assigning a topic type depending on the column.
Databases: Extract topics from database fields, with the topic type depending on the field type.
Web pages: Extract metadata to create topics. Customize extraction by using headers.
EPUB: Extract topics from ebooks in EPUB format.
Word, OpenOffice: Extract topics based on metadata, styles, and index terms.
Text: Use a list of terms to extract topics from text. Integrate third-party data mining software for smarter extraction of topics from full text.
PDF: Retrieve content from within a PDF (unless the content is an image).
JSON: Use JSON as an API to extract topics.
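To make the column-dependent typing for spreadsheets and CSV files more concrete, here is a minimal sketch using Python's standard csv module. The sample data, the COLUMN_TYPES mapping and the topics_from_csv function are hypothetical and only illustrate the idea; they do not describe our actual tools.

```python
import csv
import io

# Hypothetical sample data; the column names are illustrative only.
SAMPLE_CSV = """author,subject,place
Ada Lovelace,Computing,London
Alan Turing,Cryptography,Bletchley
"""

# Hypothetical mapping: column name -> topic type to assign.
COLUMN_TYPES = {"author": "Person", "subject": "Subject", "place": "Place"}


def topics_from_csv(csv_text, column_types):
    """Return (topic_type, name) pairs, one per non-empty mapped cell."""
    reader = csv.DictReader(io.StringIO(csv_text))
    topics = []
    for row in reader:
        for column, topic_type in column_types.items():
            value = (row.get(column) or "").strip()
            if value:  # ignore empty or missing cells
                topics.append((topic_type, value))
    return topics


csv_topics = topics_from_csv(SAMPLE_CSV, COLUMN_TYPES)
for topic_type, name in csv_topics:
    print(f"{topic_type}: {name}")
```

Columns that are not listed in the mapping are simply ignored, so the same file can carry data that should not become topics.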