How do you change the content of a content control in Word 2007 with OpenXml SDK 2.0?


Solution 1

I found a better way to do the above using http://wiki.threewill.com/display/enterprise/SharePoint+and+Open+XML#SharePointandOpenXML-UsingWord2007ContentControls as a reference. Your results may vary but I think this will get you off to a good start:

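using (WordprocessingDocument wordprocessingDocument = WordprocessingDocument.Open(filePath, true)) {
    MainDocumentPart mainDocumentPart = wordprocessingDocument.MainDocumentPart;

    // ToList() collects the matching controls before the loop edits them;
    // controls without a Tag are skipped instead of throwing
    var sdtRuns = mainDocumentPart.Document.Descendants<SdtRun>()
        .Where(run => run.SdtProperties?.GetFirstChild<Tag>()?.Val?.Value == contentControlTagValue)
        .ToList();

    foreach (SdtRun sdtRun in sdtRuns) {
        // only the first Text element is replaced
        Text text = sdtRun.Descendants<Text>().FirstOrDefault();
        if (text != null) {
            text.Text = replacementText;
        }
    }

    mainDocumentPart.Document.Save();
}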

I think the above will only work for Plain Text content controls. Unfortunately, it doesn't get rid of the content control in the final document. If I get that working I'll post it.
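One possible way to also get rid of the control, sketched here rather than taken from this answer, is to set the text and then move the runs inside the SdtContentRun out in front of the SdtRun before removing it, so their RunProperties (and therefore the formatting) come along. This assumes an inline (SdtRun) control and the DocumentFormat.OpenXml package; `ContentControlUnwrapper` and `ReplaceAndUnwrap` are hypothetical names. It does not touch placeholder styling, so a control that is still showing its gray placeholder text will keep that look.

```csharp
using System.Linq;
using DocumentFormat.OpenXml;
using DocumentFormat.OpenXml.Wordprocessing;

public static class ContentControlUnwrapper
{
    // Replaces the text of an inline content control, then removes the control itself,
    // leaving its runs (with their RunProperties) directly in the parent paragraph.
    public static void ReplaceAndUnwrap(SdtRun sdtRun, string replacementText)
    {
        SdtContentRun content = sdtRun.SdtContentRun;
        if (content == null)
            return;

        // Put the new text in the first Text element and remove the others, so text
        // that Word split across several runs is not left behind.
        var texts = content.Descendants<Text>().ToList();
        if (texts.Count == 0)
        {
            content.Append(new Run(new Text()));
            texts = content.Descendants<Text>().ToList();
        }
        texts[0].Text = replacementText;
        texts[0].Space = SpaceProcessingModeValues.Preserve;
        foreach (Text extra in texts.Skip(1))
            extra.Remove();

        // Move every child of the content block in front of the SdtRun, then drop the SdtRun.
        foreach (OpenXmlElement child in content.ChildElements.ToList())
        {
            child.Remove();
            sdtRun.InsertBeforeSelf(child);
        }
        sdtRun.Remove();
    }
}
```

Because the existing runs are moved rather than recreated, any bold, italic or style settings Word stored on them survive.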

http://msdn.microsoft.com/en-us/library/cc197932.aspx is also a good reference if you want to find a rich text content control; it shows how to add rows to a table placed inside a rich text content control.

Solution 2

One EXCELLENT way to work out how to achieve the desired result is to use the document reflector tool that comes with the Open XML SDK 2.0.

For example, you could:

  1. In the Properties dialog for each of the content controls in your document, check the "Remove content control when the contents are edited".
  2. Fill them in and save it as a new doc.
  3. Use the reflector to compare the original and the saved version.
  4. Hit the show/hide code button and it will show you the code required to turn the original into the filled-in version.

It's not perfect, but it's amazingly useful. You can also just look directly at the markup of either document and see the changes that filling in the controls caused.
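If the reflector is not at hand, a rough substitute for looking at the markup is to dump each content control's XML from the original and the filled-in copy and diff the two. This is a minimal sketch assuming the DocumentFormat.OpenXml package; `ContentControlDump`, `Describe` and `Print` are hypothetical names.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;
using DocumentFormat.OpenXml;
using DocumentFormat.OpenXml.Packaging;
using DocumentFormat.OpenXml.Wordprocessing;

public static class ContentControlDump
{
    // Returns one "tag => markup" line per content control (block-level or inline)
    // found under root, in document order.
    public static List<string> Describe(OpenXmlElement root)
    {
        return root.Descendants<SdtElement>()
            .Select(sdt =>
            {
                string tag = sdt.SdtProperties?.GetFirstChild<Tag>()?.Val?.Value ?? "(no tag)";
                return tag + " => " + sdt.OuterXml;
            })
            .ToList();
    }

    // Convenience wrapper for a .docx on disk, opened read-only.
    public static void Print(string filePath)
    {
        using (WordprocessingDocument doc = WordprocessingDocument.Open(filePath, false))
        {
            foreach (string line in Describe(doc.MainDocumentPart.Document))
                Console.WriteLine(line);
        }
    }
}
```

Running `Print` on both documents and comparing the output shows which elements Word added, changed or removed inside each control.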

This is a somewhat brittle way to do it because WordprocessingML can be complicated; it's easy to mess it up. For simple text controls, I just use this method:

private void FillSimpleTextCC(SdtRun simpleTextCC, string replacementText)
{
    // remove the showing placeholder element
    SdtProperties ccProperties = simpleTextCC.SdtProperties;
    ccProperties?.RemoveAllChildren<ShowingPlaceholder>();

    // fetch content block Run element
    SdtContentRun contentRun = simpleTextCC.SdtContentRun;
    var ccRun = contentRun.GetFirstChild<Run>();

    // if there was no placeholder text in the content control, then the SdtContentRun
    // block will be empty -> ccRun will be null, so create a new instance
    if (ccRun == null)
    {
        ccRun = new Run(
            new RunProperties(),
            new Text());
        contentRun.Append(ccRun);
    }

    // remove revision identifier & replace text (adding a Text element if the run has none)
    ccRun.RsidRunProperties = null;
    Text ccText = ccRun.GetFirstChild<Text>() ?? ccRun.AppendChild(new Text());
    ccText.Text = replacementText;

    // set the run style to that stored in the SdtProperties block, if there was
    // one. Otherwise the existing style will be used.
    RunStyle runStyle = ccProperties?.GetFirstChild<RunProperties>()?.RunStyle;
    if (runStyle != null)
    {
        // set the run style to the same as content block property style;
        // the run may not have RunProperties yet, so create them if needed
        RunProperties runProps = ccRun.RunProperties ?? (ccRun.RunProperties = new RunProperties());
        runProps.RunStyle = (RunStyle)runStyle.CloneNode(true);
        runProps.RunFonts = null;
    }
}

Hope that helps in some way. :D

Solution 3

Your first approach, removing the SdtRun and adding a new Run, will obviously lose the formatting, because you are only adding a Run with no RunProperties (and therefore no RunStyle). To preserve the formatting you should create run elements like

new Run(new RunProperties(new RunStyle() { Val = "MyStyle" }),
        new Text("Replacement Text"));
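Instead of hard-coding a style ID such as "MyStyle", another option is to copy the RunProperties of the first run that is already inside the control. This is a sketch under the assumption that the control is an inline SdtRun whose first run carries the formatting you want; `FormattedReplacement` and `ReplaceKeepingFormat` are hypothetical names, and the DocumentFormat.OpenXml package is assumed. If the control is still showing its placeholder, that run may carry the placeholder style, which would be copied too.

```csharp
using System.Linq;
using DocumentFormat.OpenXml;
using DocumentFormat.OpenXml.Wordprocessing;

public static class FormattedReplacement
{
    // Replaces an inline content control with a single run whose RunProperties are
    // cloned from the first run inside the control, so bold/italic/style survive.
    public static Run ReplaceKeepingFormat(SdtRun sdtRun, string replacementText)
    {
        Run firstRun = sdtRun.SdtContentRun?.Elements<Run>().FirstOrDefault();

        var newRun = new Run();
        if (firstRun?.RunProperties != null)
            newRun.Append(firstRun.RunProperties.CloneNode(true));
        newRun.Append(new Text(replacementText) { Space = SpaceProcessingModeValues.Preserve });

        // same insert-then-remove pattern as the question, but with formatting copied over
        sdtRun.InsertAfterSelf(newRun);
        sdtRun.Remove();
        return newRun;
    }
}
```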

Your second approach, replacing the Descendants<Text>, will only work for a Plain Text Content Control, because a Rich Text Content Control that wraps whole paragraphs is not an SdtRun: it is an SdtBlock with an SdtContentBlock. A rich text content control can have multiple paragraphs, multiple Runs and multiple Texts, so your code, sdtRun.Descendants<Text>().First().Text = replacementText, will be flawed for a Rich Text Content Control. There is no one-line way to replace the entire text of a rich text content control and still preserve all the formatting.
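As a compromise for a block-level (rich text) control, one can keep only the formatting of the first paragraph and first run and rebuild the content from that. This is a sketch, not a general solution: mixed formatting, tables and other content inside the control are discarded. `RichTextFill` and `ReplaceBlockText` are hypothetical names; the DocumentFormat.OpenXml package is assumed.

```csharp
using System.Linq;
using DocumentFormat.OpenXml;
using DocumentFormat.OpenXml.Wordprocessing;

public static class RichTextFill
{
    // Replaces everything inside a block-level content control with one paragraph per
    // line. Only the FIRST paragraph's and FIRST run's properties are reused.
    public static void ReplaceBlockText(SdtBlock sdtBlock, params string[] lines)
    {
        SdtContentBlock content = sdtBlock.SdtContentBlock;
        if (content == null)
            return;
        if (lines == null || lines.Length == 0)
            lines = new[] { string.Empty };   // keep at least one paragraph in the control

        Paragraph firstPara = content.Descendants<Paragraph>().FirstOrDefault();
        ParagraphProperties paraProps = firstPara?.ParagraphProperties;
        RunProperties runProps = firstPara?.Descendants<Run>().FirstOrDefault()?.RunProperties;

        content.RemoveAllChildren();

        foreach (string line in lines)
        {
            var para = new Paragraph();
            if (paraProps != null)
                para.Append(paraProps.CloneNode(true));   // pPr must come first

            var run = new Run();
            if (runProps != null)
                run.Append(runProps.CloneNode(true));      // rPr must come first
            run.Append(new Text(line) { Space = SpaceProcessingModeValues.Preserve });

            para.Append(run);
            content.Append(para);
        }
    }
}
```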

I did not understand what you mean by "it doesn't get rid of the content control in the final document". I thought your requirement was to change only the text (content) while preserving the content control and its formatting.

Solution 4

Another solution is to do a plain string replacement on the control's XML:

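        // p is the Paragraph that contains the content control
        SdtRun rOld = p.Elements<SdtRun>().First();

        // plain string replacement on the serialized XML: the search string must not be
        // split across runs, and any match in the tag value or other attributes is replaced too
        string OldNodeXML = rOld.OuterXml;
        string NewNodeXML = OldNodeXML.Replace("SearchString", "ReplacementString");

        // rebuild an SdtRun from the edited XML and swap it in for the old one
        SdtRun rNew = new SdtRun(NewNodeXML);

        p.ReplaceChild<SdtRun>(rNew, rOld);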
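A narrower variant of the same idea, offered as an alternative rather than as this answer's code, replaces the string only inside w:t (Text) elements, so tag values, aliases and attributes are never rewritten. It still cannot match text that Word has split across several runs. `TextOnlyReplace` is a hypothetical name; the DocumentFormat.OpenXml package is assumed.

```csharp
using DocumentFormat.OpenXml;
using DocumentFormat.OpenXml.Wordprocessing;

public static class TextOnlyReplace
{
    // Replaces searchString with replacement inside each Text element under root and
    // returns how many Text elements changed. Markup outside w:t is left untouched.
    public static int Replace(OpenXmlElement root, string searchString, string replacement)
    {
        if (string.IsNullOrEmpty(searchString))
            return 0;   // string.Replace rejects an empty search string

        int changed = 0;
        foreach (Text text in root.Descendants<Text>())
        {
            if (text.Text?.Contains(searchString) == true)
            {
                // editing a leaf's text does not change the tree structure being enumerated
                text.Text = text.Text.Replace(searchString, replacement);
                changed++;
            }
        }
        return changed;
    }
}
```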

Solution 5

CONTENT-CONTROL TYPES

Depending on the insertion point in the Word document, there are two types of content-controls that are created:

  • Top-level (at the same level as paragraphs)

  • Nested (typically within an existing paragraph)

Confusingly, in the XML, both types are tagged as <sdt>...</sdt> but the underlying Open XML SDK classes are different. For top-level, the root is SdtBlock and the content is SdtContentBlock. For nested, it is SdtRun & SdtContentRun.

To get both types, i.e. all content-controls, it is better to iterate via the common base class, SdtElement, and then check the type:

List<SdtElement> sdtList = document.Descendants<SdtElement>().ToList();

foreach( SdtElement sdt in sdtList )
{
   if( sdt is SdtRun )
   {
      ; // process nested sdts
   }

   if( sdt is SdtBlock )
   {
      ; // process top-level sdts
   }
}

For a document template, all content-controls should be processed: it is common for more than one content-control to have the same tag-name, e.g. customer-name, all of which typically need to be replaced with the actual customer name.

CONTENT-CONTROL TAG NAME

The content-control tag-name will never be split, because it is stored as a single attribute value in the sdtPr block rather than as run text.

In the XML, this is:

<w:sdt>
...
<w:sdtPr>
...
<w:tag w:val="customer-name"/>

Because the tag-name is never split, it can always be found with a direct match:

List<SdtElement> sdtList = document.Descendants<SdtElement>().ToList();

foreach( SdtElement sdt in sdtList )
{
    if( sdt is SdtRun )
    {
        // GetFirstChild<Tag>() is null for a control without a tag, so guard it
        String tagName = sdt.SdtProperties?.GetFirstChild<Tag>()?.Val?.Value;

        if( tagName == "customer-name" )
        {
            ; // get & replace placeholder with actual value
        }
    }
}
Obviously, in the above code, there would need to be a more elegant mechanism to retrieve the actual value corresponding to each different tag-name.
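One such mechanism, sketched here as an assumption rather than the answerer's code, is a dictionary from tag name to value, applied to every tagged inline control (so several "customer-name" controls are all filled). `TagFill` and `Fill` are hypothetical names; the DocumentFormat.OpenXml package is assumed. It writes the value into the first Text element and clears the rest, which also copes with a placeholder that was split across runs.

```csharp
using System.Collections.Generic;
using System.Linq;
using DocumentFormat.OpenXml;
using DocumentFormat.OpenXml.Wordprocessing;

public static class TagFill
{
    // Fills every inline content control whose tag is a key in values.
    // Returns the number of controls that were filled.
    public static int Fill(OpenXmlElement root, IDictionary<string, string> values)
    {
        int filled = 0;
        foreach (SdtRun sdt in root.Descendants<SdtRun>().ToList())
        {
            string tagName = sdt.SdtProperties?.GetFirstChild<Tag>()?.Val?.Value;
            if (tagName == null || !values.TryGetValue(tagName, out string value))
                continue;   // untagged control, or no value supplied for this tag

            var texts = sdt.Descendants<Text>().ToList();
            if (texts.Count == 0)
                continue;

            texts[0].Text = value;
            foreach (Text extra in texts.Skip(1))
                extra.Text = string.Empty;   // clear the rest of a split placeholder
            filled++;
        }
        return filled;
    }
}
```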

CONTENT-CONTROL TEXT

Within a content-control, it is very common for the rendered text to be split into multiple runs (despite each run having the same properties).

Among other things, this is caused by the spelling/grammar checker and the number of editing attempts. Text splitting is more common when delimiters are used, e.g. [customer-name].

This matters because, without checking the XML, you cannot guarantee that the placeholder text has not been split; if it has been, a search for it will not find it, so it cannot be replaced.
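A small illustration of the problem: searching each Text element on its own misses a split placeholder, while searching the concatenated text of all runs finds it. This is a sketch assuming the DocumentFormat.OpenXml package; `SplitTextCheck` and its methods are hypothetical names, and the concatenation ignores tabs and breaks.

```csharp
using System.Linq;
using DocumentFormat.OpenXml;
using DocumentFormat.OpenXml.Wordprocessing;

public static class SplitTextCheck
{
    // True if the placeholder appears inside a single w:t element
    // (what a naive per-Text search or replace can see).
    public static bool FoundInOneText(OpenXmlElement root, string placeholder) =>
        root.Descendants<Text>().Any(t => (t.Text ?? "").Contains(placeholder));

    // True if the placeholder appears in the joined text of all w:t elements,
    // regardless of how Word has split the runs.
    public static bool FoundInRenderedText(OpenXmlElement root, string placeholder) =>
        string.Concat(root.Descendants<Text>().Select(t => t.Text)).Contains(placeholder);
}
```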

ONE SUGGESTED APPROACH

One suggested approach is to use only plain-text content-controls, top-level and/or nested, then:

  • Find the content-control by tag-name

  • Insert a formatted paragraph or run after the content-control

  • Delete the content-control

List<SdtElement> sdtList = document.Descendants<SdtElement>().ToList();

foreach( SdtElement sdt in sdtList )
{
   if( sdt is SdtRun )
   {
      String tagName = sdt.SdtProperties?.GetFirstChild<Tag>()?.Val?.Value;

      String newText = "new text"; // eg GetTextByTag( tagName );

      // should use a style or common run props

      RunProperties runProps = new RunProperties();

      runProps.Color    = new Color   () { Val   = "000000" };
      runProps.FontSize = new FontSize() { Val   = "23" };   // half-points, i.e. 11.5 pt
      runProps.RunFonts = new RunFonts() { Ascii = "Calibri" };

      Run run = new Run();

      run.Append( runProps );
      run.Append( new Text( newText ) );

      // put the formatted run where the content control was, then drop the control
      sdt.InsertAfterSelf( run );

      sdt.Remove();
   }

   if( sdt is SdtBlock )
   {
      ; // add paragraph
   }
}
    

For top-level types, a paragraph would need to be inserted.
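For the SdtBlock branch of the loop above, one option is to insert a new Paragraph after the control and then remove the control. This is a sketch: `BlockReplace` and `ReplaceWithParagraph` are hypothetical names, the caller is assumed to supply the common run properties (or null), and the DocumentFormat.OpenXml package is assumed.

```csharp
using DocumentFormat.OpenXml;
using DocumentFormat.OpenXml.Wordprocessing;

public static class BlockReplace
{
    // Replaces a top-level (SdtBlock) content control with a plain paragraph holding
    // newText. runProps is cloned, so one instance can be reused for every control.
    public static Paragraph ReplaceWithParagraph(SdtBlock sdt, string newText, RunProperties runProps)
    {
        var run = new Run();
        if (runProps != null)
            run.Append(runProps.CloneNode(true));
        run.Append(new Text(newText) { Space = SpaceProcessingModeValues.Preserve });

        var paragraph = new Paragraph(run);
        sdt.InsertAfterSelf(paragraph);
        sdt.Remove();
        return paragraph;
    }
}
```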

In this approach, content-controls are used only as placeholders that can be guaranteed to be found (by tag-name) and then entirely replaced with the appropriate text (which is consistently formatted).

Also, this removes the need to format the content-control text, which is one of the things that can split it so that it can no longer be found.

Using a suitable naming convention for the tag-names, e.g. XPath expressions, enables further possibilities such as using other XML documents to populate templates.
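A minimal sketch of that idea, assuming every tag is an XPath expression that selects an element in a data document (for example "/order/customer/name"): it resolves each tag to a value, skipping tags that match nothing or are not valid XPath. `XPathFill` and `Resolve` are hypothetical names; it uses System.Xml.Linq and System.Xml.XPath plus the DocumentFormat.OpenXml package. The resulting tag-to-value dictionary can then be fed to whatever routine fills the controls.

```csharp
using System.Collections.Generic;
using System.Xml.Linq;
using System.Xml.XPath;
using DocumentFormat.OpenXml;
using DocumentFormat.OpenXml.Wordprocessing;

public static class XPathFill
{
    // Evaluates each content-control tag as an XPath expression against data and
    // returns tag -> element value for every tag that selects an element.
    public static Dictionary<string, string> Resolve(OpenXmlElement root, XDocument data)
    {
        var result = new Dictionary<string, string>();
        foreach (SdtElement sdt in root.Descendants<SdtElement>())
        {
            string tagName = sdt.SdtProperties?.GetFirstChild<Tag>()?.Val?.Value;
            if (string.IsNullOrEmpty(tagName) || result.ContainsKey(tagName))
                continue;   // untagged, or already resolved for an earlier control

            XElement match;
            try { match = data.XPathSelectElement(tagName); }
            catch (XPathException) { continue; }   // tag is not a valid XPath expression

            if (match != null)
                result[tagName] = match.Value;
        }
        return result;
    }
}
```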

Author: Jason


Updated on July 09, 2022

Comments

  • Jason

    About to go mad with this problem. I'm sure it's so simple I'm just missing it, but I cannot for the life of me find out how to change the content of a content control in Word 2007 with the OpenXml SDK v2.0 in C#.

    I have created a Word document with a plain text content control. The tag for this control is "FirstName". In code, I'd like to open up the Word document, find this content control, and change the content without losing the formatting.

    The solution I finally got to work involved finding the content control, inserting a run after it, then removing the content control as such:

    using (WordprocessingDocument wordProcessingDocument = WordprocessingDocument.Open(filePath, true)) {
     MainDocumentPart mainDocumentPart = wordProcessingDocument.MainDocumentPart;
     // SingleOrDefault returns null when no control has the tag, so the null check below is reachable
     SdtRun sdtRun = mainDocumentPart.Document.Descendants<SdtRun>()
      .SingleOrDefault(run => run.SdtProperties?.GetFirstChild<Tag>()?.Val?.Value == "FirstName");

     if (sdtRun != null) {
      sdtRun.Parent.InsertAfter(new Run(new Text("John")), sdtRun);
      sdtRun.Remove();
     }

     mainDocumentPart.Document.Save();
    }
    

    This does change the text, but I lose all formatting. Does anyone know how I can do this?