Hi,
I have to take a hierarchical XML file and store it into a table.
I'm looking at the XML Source and it shows me all of the various XML tags as separate tables. I know that a lot (if not all) of the data can be denormalized so that I have flattened records in one table instead of multiple tables.
I was hoping the XML source would allow me to designate how to denormalize the data but it doesn't seem like I can without using a bunch of sorts and merges in the data flow to denormalize it which I'm trying to avoid.
Does anyone know if SSIS has any easy way to flatten XML data? I know about XSL transforms but I was hoping that there might be an easier method.
Thanks,
-dhideal
The pivot transform wil denormalise data so perhaps that could help you in some way. Fundamentally though you can't stop the XML source from producing multiple outputs because that is what it does - converts input into something that SSIS understands.
-Jamie
|||Couple of ways to flatten XML in SSIS. First off, flattening it using the stock SSIS source adapter is quite arduous, because of all the sort/merges. The more hierarchical, the worse off the the morass becomes. It rapidly becomes unmaintainable.
4 ways to do this, with various degree of difficulty (a fifth non SSIS solution thrown in for good measure)
1. XSLT, as you mentioned. Not too bad, dependency on your profienciency (and/or antipathy) for XSL.
2. Build a custom source adapter and use .NET 2.0 XPathExpressions to load the pipeline.
3. Build a custom source adapter based on an XmlSerializer:
To do so, get the XSD, convert the XSD to an XmlSerializer calss with xsd.exe as follows
xsd sample_schema.xsd /classes /language:vb
4. Build a custom source adapter based on a dataset
xsd sample_schema.xsd /dataset /language:vb
5. Load it into SQL server and use XQuery (not an SSIS solution)|||
jaegd,
Thanks for the pointers.
It seems though that I'm not understanding something because I looked up the items you mentioned and it doesn't seem like they accomplish what I want. The following is my understanding of things (which is most likely incorrect):
1) XSLT requires the use of XPath to identify the various tags to be processed and the structure of the XSL document will take care of the flattening.
2) Based on some information I found, XPathExpressions are just compiled XPath statements. I don't understand how this would flatten the data. It seems I would have to have some custom code in the source adapter to take the results of the XPathExpressions searches and join them together to flatten the data.
3) Based on what I read, it seems like I would have to deserialize the XML data in order to be able to access it. The deserialization would then restore the original hierarchichal structure so I'm not quite sure I understand how this would flatten the data. It seems like if I could use the serialized version fo the data it might get me what I need.
4) Based on what I read, the dataset would be comprised of datatables and their relationships. It seems like I would then have to identify the relationships in the custom code to identify how the tables should be linked together and then denormalize the data by joining them together in the source adapter. I'm not quite sure how this would be different from using suggestion 5 except for the benefit of not having to load the data into SQL Server first.
5) We're trying to avoid this particular solution for now if we can.
Once again, thanks for the response and the pointers. I'm sure I have misunderstood some of the items as I have not really worked much with XML data prior to this.
-dhideal
|||It seems you understand it quite well. To flatten, though must join and/or aggregate.Flattening XML is custom, and there is no "push to flatten" task or component in SSIS, for lack of better term.
Now, I have tested all five of these approaches, and used two of them
(XPath,XmlSerializer, leaving out XQuery for the moment). They all
require you to essentially do XML "joins" via custom code, whether that
code is a set of XPath expressions in a custom adapter, a set of
property "gets" in in a custom adapter.
Had Microsoft shipped a client side XQuery implementation, I would have
recommended that, since XQuery allows you to do XML joins, which is
basically what you're looking for.sql
No comments:
Post a Comment