Visual Studio 2005 and later versions have a built-in code coverage feature, though it is only available in the Team edition. However, a standalone profiler is available for download from the Microsoft site. Code coverage does a lot for a programmer’s confidence in the code. It is not possible to test every execution path; however, if the tests execute 90% of the code lines, the chance of bugs becomes quite small.

Our project uses a different process, which cannot be managed entirely through the GUI. To begin with, we manage our unit tests with boost.test, and each test file is built into its own executable. A large project has hundreds of test files, and managing them through a GUI is a formidable task.

Another issue is the frustration of displaying metrics. The interface provided is quite clunky, and you have to navigate thousands of functions and class methods to find the coverage for the method you are focusing on. The color display in the editor leaves a lot of room for improvement.

I recently investigated ways to collect and analyze code coverage data programmatically, and it was a success. I noticed that the documentation on this topic is quite weak, so I’d like to share some points with readers.

Collecting Data

For code coverage to be collected, the executable must be instrumented. Some web pages I found state that the link flag -PROFILE is required, while others do not mention this requirement. In our build system, this link flag triggers the instrumentation. The build system we use is boost.build; we changed the file msvc.jam slightly to launch the instrumentation process whenever <linkflags>-PROFILE is passed.

The Visual Studio documentation calls these tools “Performance Tools”, and they are located in the C:\Program Files\Microsoft Visual Studio [Version]\Team Tools\Performance Tools directory. I recommend adding this path to the PATH variable in vcvars.bat so that the commands are available as soon as you open a command prompt.

To instrument the executable, run the command:

vsinstr.exe <YOUR_EXE> /COVERAGE

As I stated, this step is performed automatically by our build system: if <linkflags>-PROFILE is found, the executable is instrumented.

The second step is to collect data. To do so, you first start the vsperfmon process. Because you can run multiple executables in one session, this step is carried out manually:

start VSPerfMon /COVERAGE /OUTPUT:<REPORT_FILE_NAME>

Here we use the start command to open a separate console, because vsperfmon would otherwise block the current one.

Now run the tests. Code coverage data will be collected.

After testing is done, shut down the vsperfmon process:

vsperfcmd -shutdown

After this command completes, the coverage data is stored in the file we specified. The coverage file is in a proprietary format with no documentation about its structure. Fortunately, Microsoft allows us to export the data, either from Visual Studio or by calling the coverage analysis API.

Converting to XML

You can drop the newly created coverage file into Visual Studio. You cannot do much with its interface, as you have to navigate many functions to locate the ones you want to view. Furthermore, it does not report the code coverage ratio per source file, only per method.

You can export the XML file from Visual Studio, or use the code below to export it programmatically. Oddly, the two give different results: the file exported from Visual Studio is encoded in UTF-16 and has no line endings. I spent quite a bit of time converting it into UTF-8 with line endings so that my favorite editor could open it. It looks to me as if Visual Studio uses a different method to export, possibly one written in native C++.

The programmatic way that Microsoft wants us to use goes through a managed assembly. There are several MSDN blogs on this topic; unfortunately, the one I found contains two errors: it fails to point out that the symbol path must be set, and its WriteXml call is wrong. My code is posted below:

using System;
using System.Collections.Generic;
using System.Text;
using Microsoft.VisualStudio.CodeCoverage;

namespace coverdump
{
    class Program
    {
        static void Main(string[] args)
        {
            if (args.Length != 3)
            {
                Console.WriteLine("Usage: coverdump coverage xml symbolpath");
                return;
            }

            // The third argument serves as both symbol path and executable path
            String symbolpath = args[2];
            CoverageInfoManager.SymPath = symbolpath;
            CoverageInfoManager.ExePath = symbolpath;

            // Create a coverage info object from the file
            CoverageInfo ci = CoverageInfoManager.CreateInfoFromFile(args[0]);

            // Ask for the DataSet.  The parameter must be null
            CoverageDS data = ci.BuildDataSet(null);

            // Write to XML
            data.WriteXml(args[1]);
        }
    }
}

I had to overcome some issues not mentioned elsewhere. The first issue is references. You must add references to two files: Microsoft.VisualStudio.Coverage.Analysis.dll and Microsoft.DbgHelp.manifest. Both are located under C:\Program Files\Microsoft Visual Studio [Version]\Common7\IDE\PrivateAssemblies. When I first ran the code, it complained that the assembly did not match the one requested, and I found that the file Microsoft.DbgHelp.manifest did not match dbghelp.dll. I do not know whether they never matched or whether a subsequent update caused the mismatch. Either way, I had to update the version number and remove the publicKey attribute for the program to run properly.

The third argument is used as both the symbol path and the executable path. You can specify multiple paths separated by semicolons. If you work with Visual Studio 2010, the interface is slightly different: the properties SymPath and ExePath are now string arrays.

It is a little odd to be asked for the symbol path, considering that Visual Studio does not ask for it when loading a coverage file. In other words, the coverage file must contain the path to the EXE (and for a debug build, the PDB file name can be found in the EXE).

Analyzing Data

PHP is my favorite scripting language, and I use it to analyze the data. As a first step, my goal is to display the code coverage ratio for each source file specified on the command line, and also to produce a code diff that our programmers can view afterwards. This is much better than what we did previously, when programmers had to navigate thousands of methods before they could locate useful information. From a project management perspective, we are now able to view the code coverage ratio per source file.

The resulting XML file can be very big, even huge. The first one I got was 200 MB, produced from a mere 15 MB coverage file. If you load the whole file into memory, the process can take quite a while. For this reason I chose the XMLReader class, which reads an XML file as a stream. I read all <Method> elements as well as all <SourceFiles> elements.

So that programmers can view the difference, my script creates another file based on the line coverage data. If a line of code is not covered, the script writes a blank line into the file in its place. After all lines are written, the script calls diff -U 100 to produce a diff between the original and the new source file. The option -U 100 tells diff to display 100 lines of context, which basically produces a diff containing all of the original source lines.
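The masking step can be sketched as follows, here in C++ rather than PHP. This is only an illustration under assumptions: the function name blankUncovered and the idea of passing the uncovered line numbers as a set are made up for the example, and extracting those numbers from the coverage XML is not shown.

```cpp
#include <cstddef>
#include <set>
#include <string>
#include <vector>

// Returns a copy of `source` in which every line whose 1-based number appears
// in `uncovered` is replaced by an empty line. All other lines are copied
// unchanged, so a diff against the original shows only the uncovered lines.
std::vector<std::string> blankUncovered(const std::vector<std::string>& source,
                                        const std::set<std::size_t>& uncovered)
{
    std::vector<std::string> masked;
    masked.reserve(source.size());
    for (std::size_t i = 0; i < source.size(); ++i) {
        bool isUncovered = uncovered.count(i + 1) != 0;
        masked.push_back(isUncovered ? std::string() : source[i]);
    }
    return masked;
}
```

Writing the returned lines to a second file and diffing it against the original with a large context then gives the kind of report described above.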

In the future we might expand the script to do more reporting. XSLT is an interesting option here.

One annoying aspect of being a programmer is having to know the subtleties of the language. Many features look great on paper; however, you run into trouble immediately if you do not understand the issues associated with them.

Many C++ programmers are now familiar with the keyword const. They want to apply the keyword wherever they see fit. They argue that const adds constraints that help the compiler discover bugs, and that it also helps the compiler optimize the code for better performance.

Let’s start with a simple example:
class IRule
{
public:
    virtual const char* getName() const = 0;
};

The above is an abstract class, which declares the methods that all derived classes must implement. The method returns a pointer to const, which means that the caller cannot change the content that the pointer points to. The const at the end indicates that the implementation must not change the object state; in other words, this points to a const object inside the method.

All went well and you wrote a dozen derived classes that inherit from IRule. A couple of months later you have to implement another one, and this time the situation is a little different. You have to obtain the name from a remote server, and for performance reasons you can only do so the first time the method is called. You write this:
#include <string>

class myRemoteRule : public IRule
{
public:
    myRemoteRule(const char* serverA);
    virtual const char* getName() const;
private:
    std::string strName;
    std::string serverAddr;
};

const char* myRemoteRule::getName() const
{
    if (strName.empty()) {
        // error: strName cannot be assigned inside a const method
        strName = getNameFromRemote(serverAddr);
    }
    return strName.c_str();
}

Cool. However, the above code does not compile. The compiler complains that the data member strName cannot be modified inside a method declared const. Not good. Either you find a workaround, or you go and change every declaration of this method, which can take a while. Luckily, C++ allows us to cast the constness away using const_cast, so the following code compiles:
const char* myRemoteRule::getName() const
{
    if (strName.empty()) {
        // cast away the constness of this to update the cached name
        const_cast<myRemoteRule*>(this)->strName = getNameFromRemote(serverAddr);
    }
    return strName.c_str();
}

Let’s look at the real case, as the example above is far too trivial. In reality, a class may have several dozen methods and data members. It is difficult to predict whether we will run into such a situation when we write the interface prototype. We may well end up writing a lot of const_cast calls all over the place, further cluttering the code.

The moral of the story is that real-world programming is not easy. Simply applying const to a seemingly const construct may over-constrain the design and hurt the code.
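For a cached value like this one, another possible workaround is the mutable keyword, which marks a data member as writable even inside const methods. The following is a minimal sketch under assumptions: getNameFromRemote is a hypothetical stand-in that returns a fixed string and counts its calls, and the class is named CachedRemoteRule to keep it apart from the code above. It does not remove the underlying design question; it only confines the exception to one member.

```cpp
#include <string>

// Hypothetical stand-in for the remote lookup; a real one would contact the server.
static int remoteCalls = 0;
std::string getNameFromRemote(const std::string& serverAddr)
{
    ++remoteCalls;
    return "rule@" + serverAddr;
}

class IRule
{
public:
    virtual ~IRule() {}
    virtual const char* getName() const = 0;
};

class CachedRemoteRule : public IRule
{
public:
    explicit CachedRemoteRule(const char* serverA) : serverAddr(serverA) {}

    // Still const for callers; only the cache member is allowed to change.
    virtual const char* getName() const
    {
        if (strName.empty()) {
            strName = getNameFromRemote(serverAddr);
        }
        return strName.c_str();
    }

private:
    mutable std::string strName;   // lazily filled cache, writable in const methods
    std::string serverAddr;
};
```

Calling getName() twice on the same object, even a const one, triggers the stand-in lookup only once, because the second call finds the cached string.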

I recently bought a Nook. Originally I did not expect much, and planned to use it only for reading ebooks in .epub format. To my surprise, it also reads PDF. It displays PDF in a different way from what people might imagine: it extracts the text and graphics, and reflows the paragraphs. I am impressed. It has some flaws, especially in dealing with tables and program code, but it is good enough for me to read text-heavy PDFs.

After I loaded several PDF files into the Nook, I found that some books are shown with “Unknown Author” and strange titles in the “My Documents” directory, which lists all ebooks. The title and author information are taken from the PDF metadata, and obviously the software some publishers used did not set them properly.

Nook Unknown Author

Nook Document List

For example, the book “The Facebook Era”, written by Clara Shih, shows 013715318X.pdf as its title and Unknown Author as its author.

With the help of PDFLeo, released by rockpdf.com, I was able to determine the cause: incorrect metadata. The following is the output:

J:\My Documents>pdfleo -i facebook.pdf
Morovia (R) pdfleo Server 32-bit Version 1.0 (build 4162)
File: facebook.pdf
Title: 013715318X.pdf
Author:
Subject:
Keywords:
Created: 03/24/2009 03:49:24 PM
Modified: 02/02/2010 08:20:45 PM
Application: Acrobat: pictwpstops filter 1.0
PDF Producer: PDFKit.NET 2.0.28.0
PDF Version: 1.6 (Acrobat 7.x)
Number of Pages: 254
Tagged PDF: No
Linearized: No
Page Size: 6.01x9.25 in
....

The software used the book’s ISBN as the title, and set the Author field to an empty string (if the field does not exist in the file, PDFLeo displays N/A). It is more obvious when you look inside the file:

Incorrect MetaData Fields

Metadata fields can also be viewed in Adobe Reader:

After pinpointing the problem, it is time to fix it. We need to set the metadata in the PDF file so that the title and author display correctly on the Nook (or Kindle, SONY Reader etc.). Fortunately, PDFLeo is capable of modifying metadata with the --info-dict switch.


J:\My Documents>pdfleo --info-dict="Title=The Facebook Era;Author=Clara Shih" facebook.pdf facebook.new.pdf

Now the new file has the metadata corrected, which can be verified by loading the file in Adobe Reader.

After overwriting the existing file, Nook shows the title and author correctly.

Although I can fix the files one by one, I certainly hope that publishers get things right in the first place. Here is my call: publishers, check your metadata before shipping the PDF version! You can use the PDFLeo tool for that purpose.

Morovia launches PDF products

View from Morovia Office

Our first PDF product, PDFLeo, was launched today. See http://rockpdf.com for more information. PDFLeo supports:

  • Encryption. Encrypt a PDF document with either password security or public key security. Remove PDF encryption if you are able to open the file. Retain PDF encryption and permission settings in the new PDF while other parts of the document are modified.
  • Linearization. Optimize PDF documents to be viewed over a slow connection from a capable web server.
  • Size Optimization. Reduce the file size by removing redundant contents, compressing streams and moving objects to streams.
  • Query document information, such as metadata, security, document permission and font information.
  • Insert and modify predefined or custom document information entries.
  • Insert, view and modify XMP metadata.

bjam parallel build

On a high-end computer with plenty of RAM, you can use the -j option to run several shell commands at the same time. For example,

bjam -j4 release

will run four shell commands simultaneously. CPU usage can easily hit 100% but the overall time can be greatly reduced.

On my computer (3 GB RAM), -j 6 cut the overall compile time in half on a large project.

ssize_t type problem

Type size_t is well known to C++ programmers. It is an alias for whichever unsigned integer type is capable of representing the size of the largest possible object in the target environment. Depending on the target, size_t might be unsigned int, unsigned long, or unsigned long long.

There is another type frequently used by C programmers: ssize_t. As its name suggests, it represents a signed size_t. Unfortunately, it is not part of the C standard. The C standard library provides another type, ptrdiff_t, for this purpose, although at first glance the two look quite different.

ssize_t is not defined in VC++ (at least in VC 8.0). It is available in GCC. If a library is written using this type, it will compile fine under GCC, but not under VC++. For this reason, many libraries have the following code (or similar):

#if !defined(ssize_t)
#define ssize_t long
#endif

Now comes the problem: if you have two such libraries defining ssize_t, you may run into trouble when you include header files from both. Worse, some libraries use typedef to define this type, and two differing typedefs (or a typedef and a #define) conflict with each other.

My suggestion is not to use ssize_t, as it is non-standard. Use ptrdiff_t instead.
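As a minimal sketch of this suggestion, a function that would traditionally return ssize_t (an index or count, or -1 on failure) can return ptrdiff_t from <cstddef> instead. The function name findByte is made up for illustration.

```cpp
#include <cstddef>

// Returns the index of the first occurrence of `value` in buf[0..len),
// or -1 if it is not present. ptrdiff_t is a standard signed type, so no
// library-specific ssize_t definition is needed under either GCC or VC++.
std::ptrdiff_t findByte(const unsigned char* buf, std::size_t len, unsigned char value)
{
    for (std::size_t i = 0; i < len; ++i) {
        if (buf[i] == value) {
            return static_cast<std::ptrdiff_t>(i);
        }
    }
    return -1;
}
```

Because the return type comes from the standard library, two libraries written this way can be included together without any macro or typedef clash.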

Adding Barcodes to Open Office Writer

Open Office is an open source package that competes directly with Microsoft Office. Yes, it is not as powerful as MS Office, but on many occasions it is enough. And do not forget that it is free.

Adding Barcodes to Open Office is easy with Monterey Barcode Creator. Just drag and drop the barcode from Barcode Creator to the Writer.

Barcode in OO Writer

OO Writer seems to rasterize the object at screen resolution. As a result, it does not look as sharp as the one in Microsoft Word. But this is just a cosmetic difference. When you print the document, or export it to PDF, the barcode quality is as good as what you get from Word.

Attachment: PDF document exported from OO Writer

New Knowledge Base at morovia.com

See link:

http://mdn.morovia.com/kb/

More KB articles are coming…

Morovia.com has introduced a new backend for site search. The search results are more relevant and more up to date.

Try the new search page here:

http://mdn.morovia.com/search/search.php

Generating UPC Check Digits in Bulk

Morovia offers a check digit calculator online at http://www.morovia.com/education/utility/upc-ean.asp. With this utility, you can calculate a check digit by entering your number, whether it is a UPC, EAN, SSCC, GTIN or BLN. It is a wonderful utility and has been running for many years, since Morovia started.

Now Morovia has added another online utility that lets you calculate check digits in bulk. This is quite useful when you have a range of numbers. The typical case is the UPC-A numbers you own. For example, you could have the GS1 prefix 123456789, which leaves you 2 digits for your item number, and you want a list of the final numbers with check digits. Repeating the calculation 100 times on a web page is tedious work. No more.

Go to http://www.morovia.com/bulk-check-digit-calculation/index.php and enter 123456789??. Note that the two question marks represent any digits in the last two positions. Click on Submit and the web page returns the answer:

123456789005
123456789012
123456789029
123456789036
123456789043
123456789050
123456789067
123456789074
123456789081
123456789098
123456789104
123456789111
123456789128
123456789135
123456789142
123456789159
123456789166
123456789173
123456789180
123456789197
123456789203
123456789210
123456789227
123456789234
123456789241
123456789258
123456789265
123456789272
123456789289
123456789296
123456789302
123456789319
123456789326
123456789333
123456789340
123456789357
123456789364
123456789371
123456789388
123456789395
123456789401
123456789418
123456789425
123456789432
123456789449
123456789456
123456789463
123456789470
123456789487
123456789494
123456789500
123456789517
123456789524
123456789531
123456789548
123456789555
123456789562
123456789579
123456789586
123456789593
123456789609
123456789616
123456789623
123456789630
123456789647
123456789654
123456789661
123456789678
123456789685
123456789692
123456789708
123456789715
123456789722
123456789739
123456789746
123456789753
123456789760
123456789777
123456789784
123456789791
123456789807
123456789814
123456789821
123456789838
123456789845
123456789852
123456789869
123456789876
123456789883
123456789890
123456789906
123456789913
123456789920
123456789937
123456789944
123456789951
123456789968
123456789975
123456789982
123456789999

All 100 numbers are listed in order. Quite neat.
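For readers curious how such a list can be computed, here is a small sketch using the standard UPC-A check digit rule: digits in odd positions are weighted 3, the rest 1, and the check digit brings the total to a multiple of 10. This is not the code behind the web utility; the function names upcCheckDigit and expandPrefix are made up for illustration.

```cpp
#include <cassert>
#include <cstddef>
#include <string>
#include <vector>

// Computes the UPC-A check digit for an 11-digit body.
// Digits in odd positions (1st, 3rd, ...) are weighted 3, the others 1.
char upcCheckDigit(const std::string& body)
{
    assert(body.size() == 11);
    int sum = 0;
    for (std::size_t i = 0; i < body.size(); ++i) {
        int digit = body[i] - '0';
        sum += (i % 2 == 0) ? digit * 3 : digit;
    }
    return static_cast<char>('0' + (10 - sum % 10) % 10);
}

// Expands a 9-digit prefix with all 100 two-digit item numbers (00 to 99)
// and appends the check digit to each, yielding full 12-digit UPC-A numbers.
std::vector<std::string> expandPrefix(const std::string& prefix)
{
    assert(prefix.size() == 9);
    std::vector<std::string> result;
    for (int item = 0; item < 100; ++item) {
        std::string body = prefix;
        body += static_cast<char>('0' + item / 10);
        body += static_cast<char>('0' + item % 10);
        result.push_back(body + upcCheckDigit(body));
    }
    return result;
}
```

Calling expandPrefix("123456789") returns the 100 numbers in order, starting with 123456789005 and ending with 123456789999, matching the list above.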