Friday, March 30, 2012

Is it possible to perform Term Lookup on unstructured files?

Hi,
I need to categorize a lot of HTML or text files according to a list of terms, and I wonder if Term Lookup is adequate for this. The problem is that Term Lookup can only take an OLE DB source as input. My files can be up to 80 KB and aren't structured in columns.

Should I import my files into a table? But if so, how can I import a column with more than 8000 characters?

Thank you in advance.

I think you may have this the wrong way around. The list of terms must be stored in a table reached through an OLE DB connection, but the input is the data you want to examine, and that can come from any upstream component. You will still need to get your file contents into the pipeline, but that does not have to go through an OLE DB source. Maybe the Import Column transformation, which loads the contents of files named in an input column, could help?
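To make the direction of the data flow concrete, here is a rough Python sketch of the idea: the term list is the fixed reference, and each document is the input being examined. It is only a conceptual illustration, not how SSIS implements Term Lookup; the names `count_terms`, `strip_tags`, `reference_terms` and `document` are made up for this example, and the whole-word, case-insensitive matching is an assumption chosen for simplicity.

```python
import re
from collections import Counter


def strip_tags(html):
    """Crudely remove HTML tags so only the visible text is examined."""
    return re.sub(r"<[^>]+>", " ", html)


def count_terms(text, terms):
    """Return a Counter of how often each reference term occurs in text."""
    counts = Counter()
    for term in terms:
        # Whole-word, case-insensitive match (a simplifying assumption).
        pattern = r"\b" + re.escape(term) + r"\b"
        hits = len(re.findall(pattern, text, flags=re.IGNORECASE))
        if hits:
            counts[term] = hits
    return counts


# Hypothetical reference list (the "lookup table") and one input document.
reference_terms = ["invoice", "purchase order"]
document = "<p>Invoice 42 refers to a purchase order. The invoice is due.</p>"

result = count_terms(strip_tags(document), reference_terms)
for term, frequency in result.items():
    print(term, frequency)
```

Each document would be passed through `count_terms` in turn, which mirrors one input row per file flowing past a fixed reference list.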

You mention an 8000-character limit, which is the limit for non-Unicode strings in the varchar (T-SQL) or DT_STR (SSIS) data types. The Term transformations only support Unicode data types, so ordinary strings (DT_WSTR) are limited to 4000 characters, but they also support the DT_NTEXT type, equivalent to the T-SQL ntext type, which allows up to 2 GB of data.
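As a quick way to see which files would need the large-text type, a small Python check along these lines could be used. It is a sketch only: `suggest_ssis_type` is a hypothetical helper, and it assumes the 4000-character limit is counted in UTF-16 code units.

```python
DT_WSTR_MAX_CHARS = 4000  # Unicode string limit mentioned above


def suggest_ssis_type(text):
    """Suggest DT_WSTR for short text and DT_NTEXT for anything longer."""
    # Count UTF-16 code units (assumption: this is how the limit is measured).
    utf16_units = len(text.encode("utf-16-le")) // 2
    return "DT_WSTR" if utf16_units <= DT_WSTR_MAX_CHARS else "DT_NTEXT"


# A file of about 80 KB of plain text is far beyond the DT_WSTR limit.
print(suggest_ssis_type("x" * (80 * 1024)))
```

For files of the size described in the question, this always points to DT_NTEXT.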

Thank you very much for your quick reply. My mistake, you're right, I'm a new user of SSIS and I misunderstood the explanations on the lookup. I'm digging into this. Thanks again for your help.
