Wednesday, October 27, 2010

Important Wcf performance issue + workaround

@YaronNaveh

I have written about Wcf performance issues before, but this one seems to be the biggest. Valery published an interesting performance issue on the Wcf forum. In short, a WCF client tries to consume a non-WCF service where the contract looks something like this:

class Foo
{
    byte[] picture;
}


In soap, byte arrays are encoded as base64 strings so it can look like this:

<picture>/9j/4AAQSkZJReV6R8MLi7nW6UUUViWf/Z.....</picture>

or with a line break after every 76 characters, as in MIME (the lines are shortened here for readability):

<picture>/9j/4AAQSkZJReV6R8MLi7nW61+58zBz5Q+7Xpdj
/PK/4AAQSkPOIeV6R8MLi7nW61+58zBz5Q+7Xpdj
/9R/4AAQSkZJReV6R8MLi7nW6VZ788zBz5Q+7Xpdj
4U4wVoqwUUUViWf/Z</picture>

Both options are valid according to the base64 RFC:

Implementations MUST NOT add line feeds to base-encoded data unless
the specification referring to this document explicitly directs base
encoders to add line feeds after a specific number of characters.

OK, so it does not really advocate this... But it is a fact that many soap stacks still use this MIME-originated format, and Wcf supports it too.

So what is the problem?
It seems that when Wcf gets a message which contains base64 with CRLF, processing is slower by a few seconds(!). Drilling down shows that the problem is in the DataContract serializer. Take a look at this program:

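using System;
using System.Diagnostics;
using System.IO;
using System.Runtime.Serialization;

[DataContract]
public class Foo
{
    [DataMember]
    public byte[] picture;
}

class Program
{
    static void Main(string[] args)
    {
        var t1 = getTime(@"C:\temp\base64_with_line_breaks.txt");
        var t2 = getTime(@"C:\temp\base64_without_line_breaks.txt");

        Console.WriteLine("Time with breaks: " + t1 + " seconds");
        Console.WriteLine("Time with no breaks: " + t2 + " seconds");

        Console.ReadKey();
    }

    // Deserializes the same file 40 times and returns the elapsed time in seconds.
    static double getTime(string path)
    {
        var ser = new DataContractSerializer(typeof(Foo));

        // Dispose the stream once the measurement is done.
        using (var stream = new FileStream(path, FileMode.Open, FileAccess.Read))
        {
            // Stopwatch gives a more precise interval than DateTime.Now.
            var watch = Stopwatch.StartNew();

            for (int i = 0; i < 40; i++)
            {
                ser.ReadObject(stream);
                stream.Position = 0; // rewind so the next iteration reads the same message
            }

            return watch.Elapsed.TotalSeconds;
        }
    }
}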

For those of you who are interested to test this, the files are here and here.

The output is:

Time with breaks: 10.8998196 seconds
Time with no breaks: 0.0029994 seconds

This clearly reveals a performance problem.

Why does this happen?

While debugging the .Net source code, I have found this in the XmlBaseReader class (code comments were in the source - they are not mine):


int ReadBytes(...)
{
    try
    {
        ...
    }
    catch (FormatException exception)
    {
        // Something was wrong with the format, see if we can strip the spaces
        int i = 0;
        int j = 0;
        while (true)
        {
            while (j < charCount && XmlConverter.IsWhitespace(chars[j]))
                j++;
            if (j == charCount)
                break;
            chars[i++] = chars[j++];
        }
        ...
    }
}

So the data contract serializer tries to read the base64 string, but for some reason succeeds only if the string contains no white space (we could debug further to see how that happens, but that is too exhausting for one post :). When the first attempt fails with a FormatException, the serializer removes all the white space (which requires copying the buffer contents again) and tries again. This is definitely a performance issue.

Notes:

  • This happens with both .Net framework 3.5 and 4.0.

  • This is a DataContract-specific issue: it does not happen when you use other .Net mechanisms such as Convert.FromBase64String.

    I have reported this on Microsoft Connect; you are welcome to vote the issue up.

    Workarounds

    There are a few workarounds. The trade-off is generally the convenience of the API (or "where you prefer to put the 'dirty' stuff").

    1. As Valery noticed, you can change the contract to use String instead of byte[]. Then Convert.FromBase64String will give you the byte array.

    2. Change your contracts to use the XmlSerializer instead of the DataContract serializer. The former does not experience this issue. The XmlSerializer is generally slower (when no base64 is involved, that is), so this is what you lose. You get a better API here, as clients do not need to manipulate the base64 string.

    3. The best of course is to change the service implementation to return base64 without line breaks. Also if large binaries are returned anyway it may be a better idea to employ MTOM.

    4. A Wcf custom encoder can strip the spaces from the message before it is deserialized. However, this also involves copying strings, so it is beneficial only in rare cases.
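Workaround 1 could be sketched roughly as below. The names `FooAsString`, `PictureBase64` and `GetPicture` are hypothetical and only for illustration; the sketch assumes the element is still called `picture` on the wire. It relies on the fact that Convert.FromBase64String ignores white space, so line breaks in the payload do no harm.

```csharp
using System;
using System.Runtime.Serialization;

// Hypothetical client-side contract: the binary member is declared as a string,
// so the DataContract serializer never decodes the base64 text itself.
[DataContract(Name = "Foo")]
public class FooAsString
{
    [DataMember(Name = "picture")]
    public string PictureBase64;

    // Decodes on demand; white space (including CRLF) in the text is ignored.
    public byte[] GetPicture()
    {
        return Convert.FromBase64String(PictureBase64);
    }
}
```

The cost is the one mentioned above: callers see a string in the contract and have to ask for the bytes explicitly.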

    What's next? get this blog rss updates or register for mail updates!

    Friday, October 22, 2010

    All your prefix are belong to us now


    UPDATE: seems like I'm late to the party by a few years...

    This one really surprised me. I have built a utility which saves WSDL files locally. Now, a WSDL can reside in exotic locations such as "http://server/service/?q=wsdl" or "http://service/...[900 chars]...", which do not suggest a natural candidate for the local file name (the latter would also violate the 260-character path limit). But I wouldn't want the utility to use opaque names such as "1.wsdl" either. So I use this code snippet to generate a meaningful name from a url, which is cool. Usually. The other day a user complained that the utility does not work with his wsdl, which resides here:

    http://server.com/con.service?wsdl

    Seems pretty straightforward. Until you do this:

    1. open notepad.exe
    2. try to save as "con.txt"

    This yields the following:

    con.txt
    This file name is reserved for use by Windows.
    Choose another name and try again.

    And if you try to rename a file directly from explorer you face:

    The specified device name is invalid.

    But it all makes sense when you read the rules:

    Do not use the following reserved device names for the name of a file:

    CON, PRN, AUX, NUL, COM1, COM2, COM3, COM4, COM5, COM6, COM7, COM8, COM9, LPT1, LPT2, LPT3, LPT4, LPT5, LPT6, LPT7, LPT8, and LPT9

    Applications that automatically save files from URLs must take this into account.
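A minimal sketch of such a check follows. `SafeFileName.Make` is a hypothetical helper, not the snippet the utility actually uses, and prefixing an underscore is just one arbitrary way to escape a reserved name. The set of invalid characters comes from Path.GetInvalidFileNameChars, which depends on the platform the code runs on.

```csharp
using System;
using System.Collections.Generic;
using System.IO;
using System.Linq;

public static class SafeFileName
{
    // The reserved device names listed above; Windows compares them case-insensitively.
    static readonly HashSet<string> Reserved = new HashSet<string>(
        new[]
        {
            "CON", "PRN", "AUX", "NUL",
            "COM1", "COM2", "COM3", "COM4", "COM5", "COM6", "COM7", "COM8", "COM9",
            "LPT1", "LPT2", "LPT3", "LPT4", "LPT5", "LPT6", "LPT7", "LPT8", "LPT9"
        },
        StringComparer.OrdinalIgnoreCase);

    public static string Make(string candidate)
    {
        // Replace characters the current platform does not allow in file names.
        char[] invalid = Path.GetInvalidFileNameChars();
        string cleaned = new string(
            candidate.Select(c => invalid.Contains(c) ? '_' : c).ToArray());

        // A reserved name followed by an extension is still reserved,
        // so only the part before the first dot is checked.
        string stem = cleaned.Split('.')[0].Trim();
        if (Reserved.Contains(stem))
            cleaned = "_" + cleaned;

        return cleaned;
    }
}
```

With this helper, a candidate name such as "con.service.wsdl" comes back with a leading underscore, while "console.wsdl" is returned unchanged.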


    Saturday, October 9, 2010

    And the winners are...


    Two weeks ago I announced the MSDN Ultimate giveaway contest. A lot of people posted great comments, which made it a really hard decision for me. I finally had to decide, and I based my decision on the following criteria: relevance to web services interoperability, amount of implementation details, and presence of rationale behind the details.

    It was a tough decision and a very close call. I learned a lot from your responses and thank you all for participating.

    The winners are:

  • Ladislav Mrnka
  • tatman
  • Ken Egozi

    Please contact me to get your award.

    Friday, October 8, 2010

    Wcf self hosting - also in VS 2010


    A while ago I published a Wcf self hosting template project for Visual Studio 2008.
    The same template also works for VS 2010: just put it under

    "%My Documents%\Visual Studio 2010\Templates\ProjectTemplates\Visual C#\WCF"

    and the new template appears under the C# / Wcf node.



    Enjoy :)


    Tuesday, October 5, 2010

    Invalid SKI in X.509 certificate


    This one is rare, but it may happen when you get a malformed X.509 certificate.
    Upon sending a request your WCF proxy / server will throw this:

    Error: The length of this argument must be greater than 0.
    Parameter name: identificationData

    The reason is that the SubjectKeyIdentifier extension in the X.509 certificate is invalid. It is not empty (which is ok) and does not contain a legal value either. It is just malformed.

    The only solution here is to use a different valid certificate.


    Sunday, October 3, 2010

    "Reverse engineering" X.509 certificates


    Well, the title makes it sound bigger than it really is.
    Sometimes a soap request or response contains an X.509 certificate encoded as a base64 string:

    <o:BinarySecurityToken wsu:Id="uuid-a687c39f-f848-481b-8552-35de5b5a4d51-2">
    MQ+PASL89QWEQW2367ASDDASjn7812ASDDAS781mFSDJK78…
    </o:BinarySecurityToken>

    It may be useful to create the actual certificate that this encoded string represents, usually for the purpose of debugging.
    This code snippet will do the trick:

    byte[] b = Convert.FromBase64String(@"MQ+PASL89QWEQW2367ASDDASjn7812ASDDAS781mFSDJK78...");
    File.WriteAllBytes(@"c:\server.cer", b);

    Now the certificate is ready in the designated path.
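If you only want to look at the certificate rather than save it, the decoded bytes can be handed directly to X509Certificate2. The sketch below is one way to do that; `CertInspector.Describe` is a hypothetical helper name, and it assumes the token text is a complete, valid DER-encoded certificate (the truncated sample above would not load).

```csharp
using System;
using System.Security.Cryptography.X509Certificates;

public static class CertInspector
{
    // Decodes the base64 text of a BinarySecurityToken and summarizes the certificate.
    // Throws FormatException for invalid base64 and CryptographicException
    // if the decoded bytes are not a certificate.
    public static string Describe(string base64Token)
    {
        byte[] raw = Convert.FromBase64String(base64Token);
        var cert = new X509Certificate2(raw);

        return string.Format(
            "Subject={0}; Issuer={1}; Thumbprint={2}; NotAfter={3:u}",
            cert.Subject, cert.Issuer, cert.Thumbprint, cert.NotAfter);
    }
}
```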


    Friday, October 1, 2010

    Wcf: Cannot find a token authenticator


    The following is a common error with Wcf clients in security interoperability scenarios:

    Cannot find a token authenticator for the 'System.IdentityModel.Tokens.X509SecurityToken' token type. Tokens of that type cannot be accepted according to current security settings.

    What does it mean?

    When a signed response comes back from the server, there are two ways in which it can reference the signing certificate.

    Option A (key identifier):


    <o:BinarySecurityToken wsu:Id="uuid-a687c39f-f848-481b-8552-35de5b5a4d51-2">
    MQ+PASL89QWEQW2367ASDDASjn7812ASDDAS781mFSDJK78…
    </o:BinarySecurityToken>

    <Signature xmlns="http://www.w3.org/2000/09/xmldsig#">
      <SignedInfo>
        <CanonicalizationMethod Algorithm="http://www.w3.org/2001/10/xml-exc-c14n#"></CanonicalizationMethod>
        <SignatureMethod Algorithm="...rsa-sha1"></SignatureMethod>
        …
      </SignedInfo>
      <SignatureValue>tWTQzQhKg3zJb75P4sUfMPa3...</SignatureValue>
      <KeyInfo>
        <o:SecurityTokenReference>
          <o:KeyIdentifier ValueType="...#X509SubjectKeyIdentifier" EncodingType="...#Base64Binary">gBfL0123lM6cUV5YA4=</o:KeyIdentifier>
        </o:SecurityTokenReference>
      </KeyInfo>
    </Signature>

    Option B (direct reference):

    <o:BinarySecurityToken wsu:Id="uuid-a687c39f-f848-481b-8552-35de5b5a4d51-2">
    MQ+PASL89QWEQW2367ASDDASjn7812ASDDAS781mFSDJK78…
    </o:BinarySecurityToken>

    <Signature xmlns="http://www.w3.org/2000/09/xmldsig#">
      <SignedInfo>
        <CanonicalizationMethod Algorithm="http://www.w3.org/2001/10/xml-exc-c14n#"></CanonicalizationMethod>
        <SignatureMethod Algorithm="...rsa-sha1"></SignatureMethod>
        …
      </SignedInfo>
      <SignatureValue>tWTQzQhKg3zJb75P4sUfMPa3...</SignatureValue>
      <KeyInfo>
        <o:SecurityTokenReference>
          <o:Reference ValueType="http://docs.oasis-open.org/wss/2004/01/oasis-200401-wss-x509-token-profile-1.0#X509v3" URI="#uuid-a687c39f-f848-481b-8552-35de5b5a4d51-2"></o:Reference>
        </o:SecurityTokenReference>
      </KeyInfo>
    </Signature>

    The above error means that the response has key identifier but the client is configured to require a direct reference.

    How to fix it?

    On your client, configure allowSerializedSigningTokenOnReply to true:

    <customBinding>
      <binding>
        ...
        <security allowSerializedSigningTokenOnReply="true" />
        ...
      </binding>
    </customBinding>

    An alternative is to build a custom message encoder which changes the response from option B to A. This is possible since we know which certificate is meant (using the reference), so we can create the binary token. Of course this alternative is much harder, and in the general case the former should be preferred.
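If the binding is built in code rather than in config, the same flag can be set on the security binding element. The sketch below assumes the custom binding uses asymmetric (certificate-based) message security, since AllowSerializedSigningTokenOnReply is a property of AsymmetricSecurityBindingElement; `ReplyTokenSettings.Allow` is a hypothetical helper name.

```csharp
using System.ServiceModel.Channels;

public static class ReplyTokenSettings
{
    // Sets AllowSerializedSigningTokenOnReply on the binding's asymmetric security element.
    // Returns false if the binding has no such element, in which case nothing is changed.
    public static bool Allow(CustomBinding binding)
    {
        var security = binding.Elements.Find<AsymmetricSecurityBindingElement>();
        if (security == null)
            return false;

        security.AllowSerializedSigningTokenOnReply = true;
        return true;
    }
}
```

Call it on the client's binding before the channel factory or proxy is opened, so the setting is in place when the channel stack is built.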
