Content in multiple formats can be integrated by pointing to the sources rather than by converting them.
Sometimes a minimal amount of conversion is necessary, though. We help by minimizing and automating the work needed to add semantics to those pointers.
Extract Topics from Content
Our ingestion tools help you acquire topics automatically from your content.
The ingest process can be rerun whenever the content is updated.
If your content is structured, you can define the kind of topics that will be created depending on various elements from your content.
If your content is not structured, we can help you create specific processes or integrate third-party tools to extract topics from the content.
Topics can be acquired from documents in a variety of source formats.
|XML||Extract topics from XML using XPath patterns. You can assign a topic type depending on the element type in which a topic is found.|
|Excel Spreadsheet, CSV||Use data stored in Excel spreadsheets or CSV files. Customize extraction by assigning a topic type depending on the column.|
|Databases||Extract topics from database fields, with topic types that depend on the field type.|
|Web Pages||Extract metadata to create topics. Customize extraction by using headers.|
|EPUB||You can extract topics from ebooks in EPUB format.|
|Word, OpenOffice||Extract topics based on metadata, styles, and index terms.|
|Text||Use a list of terms to extract topics from text. Integrate third-party data mining software to get smart extraction of topics from full text.|
|PDF||Retrieve content from within PDF (unless it is an image).|
|JSON||Use JSON as an API to extract topics.|
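To give a feel for XPath-based extraction from XML, here is a minimal Python sketch using only the standard library. It is not our ingestion tool: the sample document, the `PATTERNS` mapping, and the `extract_topics` function are all hypothetical names chosen for illustration, and `xml.etree.ElementTree` supports only a limited subset of XPath.

```python
import xml.etree.ElementTree as ET

# Hypothetical sample content; not taken from any real data set.
SAMPLE_XML = """
<catalog>
  <book><title>Dune</title><author>Frank Herbert</author></book>
  <book><title>Emma</title><author>Jane Austen</author></book>
</catalog>
"""

# Hypothetical mapping: XPath pattern -> topic type assigned to each match.
PATTERNS = {
    ".//title": "work",
    ".//author": "person",
}

def extract_topics(xml_text, patterns):
    """Return one topic per non-empty element matched by each pattern."""
    root = ET.fromstring(xml_text)
    topics = []
    for xpath, topic_type in patterns.items():
        for element in root.findall(xpath):
            name = (element.text or "").strip()
            if name:
                topics.append({"name": name, "type": topic_type})
    return topics

topics = extract_topics(SAMPLE_XML, PATTERNS)
for topic in topics:
    print(topic["type"], "->", topic["name"])
```

In this sketch the topic type depends only on which pattern matched, which mirrors the idea of typing a topic by the element in which it is found.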
We can help you extract topics from your own formats.
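Column-dependent topic types for spreadsheets and CSV files can be pictured along the following lines. This is a rough sketch under stated assumptions, not the actual ingestion process: the sample data, the `COLUMN_TYPES` mapping, and the `extract_topics_from_csv` function are hypothetical, and skipping repeated values is a design choice made here for illustration.

```python
import csv
import io

# Hypothetical sample data with a header row.
SAMPLE_CSV = "city,country\nParis,France\nLyon,France\n"

# Hypothetical mapping: column name -> topic type for values in that column.
COLUMN_TYPES = {"city": "city", "country": "country"}

def extract_topics_from_csv(csv_text, column_types):
    """Create one topic per distinct (value, type) pair found in mapped columns."""
    seen = set()
    topics = []
    for row in csv.DictReader(io.StringIO(csv_text)):
        for column, topic_type in column_types.items():
            name = (row.get(column) or "").strip()
            key = (name, topic_type)
            if name and key not in seen:
                seen.add(key)
                topics.append({"name": name, "type": topic_type})
    return topics

csv_topics = extract_topics_from_csv(SAMPLE_CSV, COLUMN_TYPES)
for topic in csv_topics:
    print(topic["type"], "->", topic["name"])
```

Because the function only reads text, rerunning it on an updated export simply produces the topic list for the new content.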