Wednesday 3 December 2014

Document Conversion Services in SharePoint 2013

In this article I'm going to walk you through how to set up Document Conversion Services in SharePoint 2013. This service enables Microsoft Word (.docx) documents to be converted to SharePoint web pages.
Before you get carried away and implement this capability, I urge you to read 'The stuff they don't tell you' section at the end of this article. You may be surprised to discover, as I was, that this service may sadly fail to meet your expectations.
There are a few steps to get things working, and most of them have not changed since SharePoint 2010.
Setting up Document Conversion Services in SharePoint 2013 is a fairly straightforward affair. Just start and configure the Document Conversions Load Balancer Service and the Document Conversions Launcher Service, then configure and enable document conversions on the web applications where the service is to be used.
Be aware that the capability is only available in SharePoint Server, so if your platform is SharePoint Foundation you are out of luck.
Also note that you need to enable the Publishing Infrastructure feature on the target publishing site.
You can then publish source Word documents as web pages via a 'Convert Document' item on the Edit Control Block (ECB) menu.

Start the services

To enable document conversions, the Document Conversions Load Balancer Service and the Document Conversions Launcher Service need to be started.
To do that, log in to SharePoint Central Administration (CA) and click the 'Manage services on server' link in the System Settings group.

First, start the Document Conversions Load Balancer Service.

Now start the Document Conversions Launcher Service; you'll be taken to the Launcher Service Settings page.

On this page, select the Load Balancer server and the port number for the Launcher Service. It should now be obvious why the Document Conversions Load Balancer Service must be started first: you cannot specify a Load Balancer on the Launcher Service Settings page until a load balancer service is available.
Both services should now be running.

Configure the Document Conversion Settings

Next, we need to configure the services for the web applications that require the capability.
Go to the General Application Settings page in SharePoint CA.

Click on the 'Configure document conversions' link in the External Service Connections group to access the Configure Document Conversions page.

On this page, select the target web application and choose to enable (or later disable) document conversions. Select the load balancer server and configure the conversion schedule; by default, a conversion job runs every minute.
In the Converter Settings section you will notice four links that let you customise how the service converts InfoPath forms, Word documents (with and without macros) and XML to web pages.
The screenshot below shows the customisation page for converting Word documents (.docx) to web pages. All four links point to the same DocTranCustomizeAdmin.aspx page, distinguished only by a different TID URL parameter, so the configuration options are the same for each converter.

Essentially, you can enable or disable the converter for each task. For example, you could allow conversion of Word documents (.docx) but disable conversion of macro-enabled Word documents (.docm).
The other settings for time-out, retries and maximum file size are fairly self-evident. Leave the 'Maximum file size (in KB)' box empty if you do not want to set a file size limit.
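To make the interplay of these settings concrete, here is a minimal Python sketch that models one converter's customisation page. It is purely illustrative: the class, its field names and the example values are hypothetical rather than SharePoint's own, and it only decides whether a file would be accepted (the time-out and retry values are simply recorded). It also assumes that a file exactly at the size limit is still accepted.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class ConverterSettings:
    """Hypothetical model of one converter's customisation page.

    Field names are illustrative, not SharePoint's internal property names.
    """
    enabled: bool = True
    timeout_seconds: int = 120              # illustrative value only, not enforced here
    max_retries: int = 2                    # illustrative value only, not enforced here
    max_file_size_kb: Optional[int] = None  # None = box left empty = no size limit

    def accepts(self, file_size_kb: int) -> bool:
        """Would a file of this size be handed to the converter at all?"""
        if not self.enabled:
            return False
        if self.max_file_size_kb is None:
            return True
        # Assumption: a file exactly at the limit is still accepted.
        return file_size_kb <= self.max_file_size_kb


# Allow .docx (capped at 10 MB) but switch off macro-enabled .docm.
converters = {
    ".docx": ConverterSettings(max_file_size_kb=10_240),
    ".docm": ConverterSettings(enabled=False),
}
```

With these settings, converters[".docx"].accepts(500) returns True, while any .docm file is rejected because that converter is switched off.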
When you have finished configuring any custom settings, click the Apply button on the Configure Document Conversions page. Strangely, SharePoint gives no indication that the settings have been applied, but clicking OK and then reopening the page confirms that they have indeed been saved.

Prepare a test site

If you don't already have a site configured for the service, create a new site containing a document library to host the source Word documents.
Open the site and upload a sample Word document.
Now enable the SharePoint Server Publishing Infrastructure feature. This is a site-collection-scoped feature, so it must be activated from the Site Settings page of the root site.

Finally, enable the SharePoint Server Publishing feature. This is scoped at the web (site) level, so it can be activated from the 'Manage site features' link in the Site Actions group of the Site Settings page.
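That is quite a lot of moving parts spread across Central Admin and the site. As a rough aid, here is a hypothetical Python checklist that gathers the prerequisites covered so far, in the order you set them up, and reports what is still missing. The keys are made-up labels for your own notes; nothing here queries SharePoint.

```python
from typing import Dict, List

# Each prerequisite covered in this article, in setup order, paired with a hint.
# The keys are hypothetical labels for your own checklist, not SharePoint names.
PREREQUISITES = [
    ("server_edition", "Use SharePoint Server; Foundation lacks the service"),
    ("load_balancer_started", "Start the Document Conversions Load Balancer Service"),
    ("launcher_started", "Start the Document Conversions Launcher Service"),
    ("conversions_enabled", "Enable document conversions for the web application in CA"),
    ("publishing_infrastructure", "Activate Publishing Infrastructure (site collection)"),
    ("publishing_feature", "Activate SharePoint Server Publishing (site)"),
]


def missing_prerequisites(state: Dict[str, bool]) -> List[str]:
    """Return the hint for every prerequisite not yet ticked off, in setup order."""
    return [hint for key, hint in PREREQUISITES if not state.get(key, False)]
```

If the 'Convert Document' menu item fails to appear, working down the returned list from the top is a sensible order, since the launcher cannot be configured until the load balancer is running.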

Perform a test run

If everything has been successfully configured you should now see a new 'Convert Document > From Word Document to Web Page' item in the Edit Control Block (ECB) menu for a source Word document.

When you select this menu item you will be directed to the CreatePage.aspx page as shown below.

From here you can use the Browse button to specify the target site location, provide a page title and description, and specify the page URL (which defaults to the source document's file name with the extension changed to .aspx).
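The default URL rule is simple enough to sketch. This hypothetical Python helper just swaps the file extension for .aspx; it does not attempt any other clean-up of the name (spaces, special characters) that SharePoint may or may not apply.

```python
from pathlib import PurePosixPath


def default_page_url(source_document: str) -> str:
    """Source file name with its extension swapped for .aspx.

    Assumption: no other normalisation of the name is modelled.
    """
    return PurePosixPath(source_document).with_suffix(".aspx").name
```

For example, default_page_url("Shared Documents/Project Plan.docx") returns "Project Plan.aspx".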
You can also choose whether the page is created synchronously (while you wait) or asynchronously via the timer job.
If you have an email server configured, you can also choose to notify specific users when the page has been created.
If you choose to create the page synchronously, you start a long-running process that leaves you in a holding pattern watching a spinner.

When processing is complete, the browser is redirected to the newly provisioned page, initially as a draft, from where it can be published for general consumption.

How cool is that?

Page updating

Cool as that may be, we now have two versions of the information, and unless we take care they could easily diverge if the Word document is updated but not republished, or if the web page is updated in isolation.
This is not good, because we really do not want two versions of the truth. So we now have a choice to make about our modus operandi. If the intent is simply to import content from a source Word document and from then on treat the web page as the definitive information source, then all we need to do is delete the source Word document and make any future changes to the web page directly.
On the other hand, our concept of operations might be to retain the Word document as the definitive information source and periodically publish an updated rendition as a web page.
What we must not do is make changes to both the Word source document and the web page, for that way madness lies! There is no way to push changes made to the web page back to the source Word document, so any such changes will simply be overwritten when the source Word document is republished.
This essentially comes down to good governance, but it is fundamentally important to have a plan for which modus operandi you are using.
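One way to keep yourself honest is to compare modification times against the last conversion. The Python sketch below is a hypothetical helper, not a SharePoint API: it assumes you record the time of each conversion yourself (after the converted page has been published, so that publishing it does not register as a direct edit) and classifies a document/page pair accordingly.

```python
from datetime import datetime


def sync_state(source_modified: datetime, page_modified: datetime,
               last_converted: datetime) -> str:
    """Classify a Word document / web page pair relative to the last conversion."""
    source_changed = source_modified > last_converted
    page_changed = page_modified > last_converted
    if source_changed and page_changed:
        return "diverged"              # republishing would overwrite the page edits
    if source_changed:
        return "page stale"            # republish from the Word document
    if page_changed:
        return "page edited directly"  # fine only if the page is now the master
    return "in sync"
```

A "diverged" result is exactly the situation described above, where one set of edits is about to be lost.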
To use the Word document as the definitive information source we can simply edit the Word document and republish it as before.

This time SharePoint detects that a page already exists at the default target URL and simply asks you if you want to update the existing page.
Clicking the Create button reconstructs the page with the new content.

In my case I added the line "Now with some changes made" to the source Word document.
If you want to create a new page from the updated source document instead, simply uncheck the 'Update existing page' checkbox, which lets you specify new publishing settings.

Of course, the one thing you must change is the target URL, or else you will just end up with the message highlighted above.
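The behaviour described above can be summarised as a small decision rule. This Python sketch is my reading of it, not SharePoint's actual code; the function name and return values are hypothetical, and existing_pages simply stands in for the pages already in the target site.

```python
from typing import Set


def plan_conversion(target_url: str, existing_pages: Set[str],
                    update_existing: bool = True) -> str:
    """Decide what republishing does for a given target URL.

    An existing page is updated unless 'Update existing page' is unchecked,
    in which case the target URL must not already be taken.
    """
    if target_url not in existing_pages:
        return "create"
    if update_existing:
        return "update"
    raise ValueError(f"A page already exists at {target_url}; choose another URL")
```

In other words, unchecking the box only helps if you also pick a URL that is not already in use.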

The stuff they don't tell you

You may now be thinking this is great, and indeed the concept is, I believe, a brilliant one, because in my experience users like to author content in Word and not in a web browser. Please see my article entitled SharePoint Wiki Woes if you are interested in my more detailed views on this matter.
No, the problem is not in the concept but in the execution. You see, the Microsoft service has one major flaw: it fails to process embedded content, i.e. images!

See what happens if I update the source Word document so that it contains an embedded image and then republish it as before.

The page politely informs me that the web page has been updated but that it was unable to process my embedded image.
And this is the resulting web page.

Not only does the HTML show a missing image placeholder, but it buggers up the rest of the formatting as well. This is not what I had in mind at all!
Sadly, this makes the capability next to useless unless your source documents are devoid of any graphical content, and in my experience that is unlikely to be the case.
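If you do decide to use the service, a pre-flight check for images can at least warn authors before they convert. A .docx file is a zip package, and Word normally stores embedded pictures as parts under word/media/. The Python sketch below (a hypothetical helper using only the standard library) lists those parts. It is a heuristic only: linked rather than embedded images, and some drawing objects, may not show up there.

```python
import io
import zipfile
from typing import List


def embedded_images(docx_bytes: bytes) -> List[str]:
    """List the package parts under word/media/, where Word normally stores
    embedded pictures. An empty list suggests the document has none."""
    with zipfile.ZipFile(io.BytesIO(docx_bytes)) as package:
        return [name for name in package.namelist()
                if name.startswith("word/media/")]
```

For example, pass it the bytes from open(path, "rb").read() and warn the author if the returned list is not empty.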
So what is this good for? Sadly, not a lot, other than lifting the source text (only) from a Word document and dumping it into a web page. I might as well have done the whole thing with cut and paste, and at least that technique would work in Foundation and not be restricted to a publishing site.
I did wonder why all the demos and screenshots I have seen for this service subtly avoid Word content with images, and now I know.
For me this is a huge disappointment, as it seems that Microsoft have taken something potentially really useful and failed at the final hurdle.
Other features I would like to see are:
  • The option to publish Word documents to wiki libraries and not just publishing pages
  • The ability to automatically synchronise content such that when the source Word document is updated the web page rendition is automatically updated. This could be set up to happen either on every change or on major version publication of the source Word document.
  • The ability to publish content outside of the current site collection i.e. the source Word document and the target web page rendition could be in separate site collections.
  • Extending the previous point, it would be really useful if we could publish the web page renditions to a different farm entirely, or even up to Office 365 (O365).
  • Making the capability available in Foundation (pushing it a bit, maybe).
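The automatic synchronisation wish comes down to a trigger rule. Here is a minimal Python sketch of that rule alone, assuming major/minor versioning is enabled on the source library so that major versions are labelled 1.0, 2.0 and so on. The policy names are hypothetical, and hooking the rule up to an event receiver or workflow is left out.

```python
def should_republish(version_label: str, policy: str) -> bool:
    """Decide whether a new version of the source document should trigger republishing.

    policy: "every_change" or "major_only" (hypothetical option names).
    version_label: a SharePoint-style label such as "1.0" (major) or "1.3" (minor).
    """
    if policy == "every_change":
        return True
    if policy == "major_only":
        _major, minor = version_label.split(".")
        return int(minor) == 0
    raise ValueError(f"unknown policy: {policy}")
```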


Ref: http://www.kaboodlekonnect.com/colin/Lists/Posts/Post.aspx?ID=19
