Wednesday, November 14, 2012

Anatomy of an SSIS Package InfoPath Import

Notes: This is the draft of a sample project and code article being developed to document a solution for collecting disparate values via a Microsoft Office InfoPath form deployed in SharePoint Server.  The data model may remind you of structures referred to as 'key-value pairs' or 'entity-attribute-value', although the focus here was on meeting a business need rather than strict adherence to a pattern.

This has been core to my work, but I believe the generic pattern you see here may be of use to others.  I could take this topic in many directions:
·         upstream, to the mechanics of deploying this form in a SharePoint server;
·         downstream, to the database structures for working with this data, all the way to incorporating it into performance metrics and a data warehouse; or
·         across, to the management of multiple InfoPath forms using this model.
Since I haven't yet decided where to start the sample project, this is an anatomy of the 'middle': the place where the data crosses from the user-interface-oriented structures of InfoPath to raw data that can be used in further database processes.

Preparation Step
Creates the Working folder and Archive folder if they don't already exist
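In the package this is done with File System Tasks. As a rough equivalent outside SSIS (the folder names 'Working' and 'Archive' are assumptions for illustration), a Python sketch:

```python
import os

def prepare_folders(base_dir):
    """Create the Working and Archive folders if they don't already exist,
    mirroring the package's preparation step.  Folder names are assumed."""
    for name in ("Working", "Archive"):
        os.makedirs(os.path.join(base_dir, name), exist_ok=True)
```

Like the File System Task configured to not fail on an existing folder, this is idempotent and safe to run on every package execution.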

For Each Form
The 'Prepare XML File' container prepares the XML file generated by submitting an InfoPath form for access as a tabular data set, suitable for importing into a relational database.  The XML file is progressively transformed, and the output of each step is stored in the working folder.  This enables instrumentation or troubleshooting to occur.

The steps include:
"Copy File Into Working Space" [File System Task]: Copying the current file being processed as is into the working directory
"Strip Namespaces" [XML Task]: Removes the custom namespace, called 'my', that is generated by the InfoPath form submittal.  This is accomplished through a short XSLT script.
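The package does this with XSLT inside the XML Task; to make the transformation concrete, here is a sketch of the same namespace-stripping logic in Python's standard `xml.etree` (the sample element and namespace URI are assumptions, not the actual InfoPath-generated names):

```python
import xml.etree.ElementTree as ET

def strip_namespaces(xml_text):
    """Remove namespace qualifiers (such as InfoPath's generated 'my'
    namespace) from every element and attribute name in the document."""
    root = ET.fromstring(xml_text)
    for elem in root.iter():
        # ElementTree stores namespaced tags as '{uri}localname'
        if '}' in elem.tag:
            elem.tag = elem.tag.split('}', 1)[1]
        for key in list(elem.attrib):
            if '}' in key:
                elem.attrib[key.split('}', 1)[1]] = elem.attrib.pop(key)
    return ET.tostring(root, encoding='unicode')
```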
"Wrap in a table element" [XML Task]: Encapsulates the default 'myFields' node in a node called 'myTable'.   This allows the later data flow task to see the file as a dataset with each InfoPath field as a column.  Without this, each column appears as a separate data set. This is accomplished through a short XSLT script.
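Again the package uses a short XSLT script; the equivalent wrapping step, sketched in Python for illustration:

```python
import xml.etree.ElementTree as ET

def wrap_in_table(xml_text, table_name='myTable'):
    """Encapsulate the root 'myFields' node in a 'myTable' node so a
    downstream reader sees one dataset with a column per InfoPath field."""
    root = ET.fromstring(xml_text)
    table = ET.Element(table_name)
    table.append(root)
    return ET.tostring(table, encoding='unicode')
```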
"Pivot to Key-Values" [XML Task]: In this solution, the exact names and quantity of the fields vary over time or are unknown at design time.  This step converts every InfoPath field into a 'field' record in the file, making the true InfoPath field name into an attribute.  It then can be imported as data rather than being part of the metadata.
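A sketch of this pivot (element names 'myTable', 'myFields', and the generic 'field' record are taken from the step descriptions above; the sample field names are assumptions):

```python
import xml.etree.ElementTree as ET

def pivot_to_key_values(xml_text):
    """Convert every child of 'myFields' into a generic 'field' record,
    making the true InfoPath field name an attribute.  The field name
    then arrives as importable data rather than as part of the metadata."""
    root = ET.fromstring(xml_text)  # expects <myTable><myFields>...</myFields></myTable>
    out = ET.Element('myTable')
    for fields in root.iter('myFields'):
        for child in fields:
            rec = ET.SubElement(out, 'field')
            rec.set('name', child.tag)
            rec.text = child.text
    return ET.tostring(out, encoding='unicode')
```

This is what makes the package tolerant of fields whose names and quantity are unknown at design time: no matter which fields a form contains, the output shape is always the same two-column 'field' record.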

The 'Import Raw Values to Output' container actually imports from the file into a data destination such as a relational database in the 'Import Raw Values' [Data Flow Task].   This is how the values from multiple InfoPath forms get to a single location against which further queries can be written.  Since the form was earlier pivoted to key values, the InfoPath field name is one of the columns that is available to import.
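The tabular shape the Data Flow Task consumes can be sketched as simple (name, value) rows; a minimal Python illustration of reading the pivoted file that way:

```python
import xml.etree.ElementTree as ET

def read_rows(xml_text):
    """Read the pivoted file as (field name, value) rows -- the shape the
    data flow task imports into a single destination table."""
    root = ET.fromstring(xml_text)
    return [(f.get('name'), f.text) for f in root.iter('field')]
```

Because every form pivots to this same shape, rows from many different InfoPath forms can land in one table, with the field name available as an ordinary column to query against.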

The last step, 'Archive the File' [File System Task], completes the processing of that form.  The form is moved to a subdirectory of the directory being processed, renamed with a timestamp so that the contents of the file at that point in time can be examined later.  That subdirectory is compressed or archived in a separate process.
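A sketch of the archive step outside SSIS (the timestamp format and '_' separator are assumptions; the package's File System Task may name files differently):

```python
import os
import shutil
from datetime import datetime

def archive_file(path, archive_dir):
    """Move a processed form into the archive subdirectory, renamed with a
    timestamp so its point-in-time contents can be examined later."""
    os.makedirs(archive_dir, exist_ok=True)
    stamp = datetime.now().strftime('%Y%m%d%H%M%S')
    base, ext = os.path.splitext(os.path.basename(path))
    dest = os.path.join(archive_dir, f'{base}_{stamp}{ext}')
    shutil.move(path, dest)
    return dest
```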

Connection Managers

The connection managers for the package support the InfoPath XML translation.  The output of each step, and then the final transformed copy, is written to the file system to ease troubleshooting.  Each of the separate XSLT scripts that performs a transformation on the file is a file connection, which allows the XSLT to be housed in a shared source directory.

For the sample code, the 'output' and 'outputtext' connection managers exist to demonstrate exporting multiple InfoPath forms to flat files.  In production, this would be a relational database such as SQL Server.
