Archive for 'Uncategorized'

Morovia.com has switched to https site wide

Today we switched the whole domain www.morovia.com to https. In recent years more and more web sites have been switching to https (the secure http protocol). You probably won’t notice the difference, but it is a notable change for us. Because our web site has a modular design, few changes to the code were needed. Best of all, we can switch back to plain http at any time.

We got a big server

We just spent a few days assembling a server for our company to use over the next 5 years. The old one is a 1U Supermicro chassis. The tag on it reads “2005”. Yes, it has a floppy drive.

existing 1U server

We were not happy that the old server lacks storage capacity, so we bought a Norco 4224 case that can hold 24 hard drives. If we put a 2 TB drive into each bay, we will have a total storage of 48 TB.

Norco 4224 24-bay case

Plenty of storage for us. We are going to migrate many services to this new server in the near future.

Resumable Download at morovia.com

Recently we implemented a resumable download feature on our web site. With this feature a user can pause a download and resume it at any time. This feature also supports download accelerators.

Is this a big deal? Certainly not. We used to serve trial downloads from a publicly accessible directory, and the web server provides this feature automatically. However, we want our users to get *versioned* filenames, so that they can easily tell which version they downloaded. Implementing resumable downloads in a scripting language is not a trivial task.

Resumable download in Firefox

Resumable Download in Internet Explorer

Morovia KB 10148 talks about how to use Barcode ActiveX to generate barcode images in PHP 5. Barcode ActiveX 3 has a long history at Morovia: it was conceived in 2003 and was our first ActiveX control product.

In that article, the barcode is first generated in a front-end script and the image is saved to a disk file. In the back-end script the image is loaded and sent to the browser. Sometimes an in-memory solution is preferred, as programmers will not have to deal with file permissions, cleanup and possible privacy issues.

The classic ASP example, which is included in the installer package, is an in-memory solution. It exports the barcode image to an ADODB.Stream object, then calls Response.BinaryWrite to write all the bytes out. While this looks simple, in PHP the ADODB.Stream object’s Read method does not give back a byte string. What it returns is actually a COM SafeArray packaged in a VARIANT, as the following attempt shows:

try {
$objBarcode = new COM("Morovia.BarcodeActiveX");
$objBarcode->RasterImageResolution = 96; //Set the resolution to 96 dpi (screen)
$objBarcode->NarrowBarWidth = 15; //Default NarrowBarWidth 15 mils
$objBarcode->BorderStyle = 0; //No border
$objBarcode->ShowComment = false; //No comment
$objBarcode->ShowHRText = false; //No human readable text
// By default the Barcode ActiveX creates margins around the symbol. The margins are around 100 mils
// You can increase or decrease the margins by setting 4 SymbolMargin properties like
$objBarcode->SymbolMarginTop = 100;

// Retrieve input parameters through the URL query string
$objBarcode->Rotation = $_GET["rotation"];
$objBarcode->ShowHRText = (strlen($_GET["showhrtext"])==0 || $_GET['showhrtext']==0) ? 0 : 1;
$objBarcode->Symbology = $_GET["symbology"];
$objBarcode->NarrowBarWidth = $_GET['narrowbarwidth'];

$objBarcode->BarHeight = $_GET['barheight'];
$objBarcode->Message = $_GET['message'];

$objBarcode->Font->Name = "MRV OCRB I";
$objBarcode->Font->Bold = false;
$objBarcode->ShowComment = false;

// The Stream object is available in MDAC 2.5 and above versions. You can download the most
// recent MDAC at http://www.microsoft.com/data/mdac/
$objStream = new COM("ADODB.Stream");
$objStream->Open();
$objStream->Type = adTypeBinary;

// Export the Image to Stream object.
// After transfer completes, the Position must set to 0 before Calling Response.BinaryWrite
// otherwise an “unknown type” is reported.
$objBarcode->ExportImage($objStream, imgTypePNG);

// BinaryWrite the image data to the client browser, in PHP use echo
$objStream->Position = 0;

// until we figured out how to convert a byte array in COM to PHP,
// we have to save it to a file first, then read it back to PHP.
header('Content-Type: image/png');
$buffer = $objStream->Read();

// Note: this does not work!
echo($buffer);

$objStream = null;
$objBarcode = null;
} catch(Exception $ex) {
header('Content-Type: text/plain');
echo($ex->getMessage());
}

It turns out that the solution is pretty simple. Although PHP does not provide a direct way to convert a COM safe array to a string, it does provide a way to iterate over its contents. Thus, we can write:


foreach ($buffer as $byte) echo chr($byte);

in place of the echo($buffer) statement. It worked.

Today I was testing the IDispatch interface of a COM object. The method has the following signature:

[id(6)] HRESULT DataMatrixEncodeSet(
        [in] BSTR strDateToEncode,
        [in] LONG sizeID,
        [out,retval] LONG* chunks);

I was quite surprised to get a “type mismatch” error from the Invoke function. My code is below:

args[0].vt = VT_BSTR;
args[0].bstrVal = SysAllocString(L"this is data to encodeencodeencodeencodeencode");
args[1].vt = VT_I4;
args[1].lVal = 0;

dp.cArgs  = 2;   
dp.rgvarg = args;   
dp.cNamedArgs = 0;   
dp.rgdispidNamedArgs = NULL;

VARIANT vaRet;
VariantInit(&vaRet);  // initialize; VariantClear must not be called on an uninitialized VARIANT
EXCEPINFO  exInfo;
unsigned int uErr;
hr = disp->Invoke(dispID[0], IID_NULL, LCID_NEUTRAL,
        DISPATCH_METHOD, &dp, &vaRet, &exInfo, &uErr);
BOOST_CHECK(SUCCEEDED(hr));

After some searching, I finally found the cause: the arguments packed in DISPPARAMS must be in reverse order. In other words, the args[] array must be reversed, so that args[0] holds the last argument (sizeID) and args[1] holds the first (strDateToEncode). After I changed that, the error disappeared.
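The reversal rule can be wrapped in a small helper, so that call sites keep writing arguments in declaration order. The sketch below is portable C++ and uses a hypothetical Arg struct as a stand-in for VARIANT, so that it compiles without the Windows headers; packReversed is likewise an illustrative name, not part of any COM API. With the real types, the same reversal would be applied to the VARIANT array before it is assigned to dp.rgvarg.

```cpp
#include <algorithm>
#include <string>
#include <vector>

// Hypothetical stand-in for a COM VARIANT; it holds only what the sketch needs.
struct Arg {
    bool isText;        // true: text is meaningful; false: number is meaningful
    std::wstring text;  // plays the role of bstrVal
    long number;        // plays the role of lVal
};

// Accepts the arguments in the order the method declares them and returns
// them in the right-to-left order expected in DISPPARAMS::rgvarg.
std::vector<Arg> packReversed(std::vector<Arg> declared) {
    std::reverse(declared.begin(), declared.end());
    return declared;
}
```

For the method above, passing the string first and the size ID second yields an array whose element 0 is the size ID and whose element 1 is the string.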

Visual Studio 2005 and later versions have a built-in code coverage feature, available only in the Team edition. However, a standalone profiler is available for download from the Microsoft site. Code coverage is a great help for a programmer’s confidence in the code. It is not possible to test every execution path; however, if the tests execute 90% of the code lines, the chance of bugs becomes quite small.

Our project uses a different process, which can’t be managed entirely through the GUI. To begin with, we manage our unit tests with boost.test, with each test file built into its own executable. In large projects we have hundreds of test files, and managing them through a GUI is quite a formidable task.

Another issue is the frustration with how the metrics are displayed. The interface provided is quite clunky, and you have to navigate thousands of functions and class methods in order to find the coverage for the method you are focusing on. The color display in the editor leaves a lot of room for improvement.

I recently investigated ways to collect and analyze code coverage data programmatically. It was a success. I noticed that the documentation on this aspect is quite weak, so I’d like to share some points with readers.

Collecting Data

In order for code coverage to be collected, the executable must be instrumented. Some web pages I found state that the link flag -PROFILE is required, while others do not mention this requirement. In our build system, this link flag triggers the instrumentation. The build system we use is boost.build; we changed the file msvc.jam slightly to launch the instrumentation process once <linkflags>-PROFILE is passed.

The Visual Studio documentation calls these tools “Performance Tools”, and they are located in the C:\Program Files\Microsoft Visual Studio [Version]\Team Tools\Performance Tools directory. It is recommended to add this path to the PATH variable in vcvars.bat so that the commands are available as soon as you open the command prompt.

To instrument the executable, run the command:

vsinstr.exe <YOUR_EXE> /COVERAGE

As I stated, this step is carried out automatically by our build system: if <linkflags>-PROFILE is found, the executable is instrumented.

The second step is to collect data. To collect data you first start the vsperfmon process. Because you can run multiple executables in one session, we currently carry out this step manually:

start VSPerfMon /COVERAGE /OUTPUT:<REPORT_FILE_NAME>

Here we use the start command to open a separate console, because VSPerfMon would otherwise block the current console.

Now run the tests. Code coverage data will be collected.

After the tests have finished, shut down the vsperfmon process:

vsperfcmd -shutdown

After this command completes, the coverage data is stored in the file we specified. The coverage file is in a proprietary format with no documentation of its structure. Fortunately, Microsoft allows us to export it, either from Visual Studio or by calling the code coverage analysis API.

Converting to XML

You can drop the newly created coverage file into Visual Studio. You cannot do much with its interface, though, as you have to navigate many functions to locate the ones you want to view. Furthermore, it does not show the code coverage ratio on a per-source-file basis, only on a per-method basis.

You can export the XML file from Visual Studio, or use the code below to export it programmatically. Oddly, the two give different results: the one exported from Visual Studio is encoded in UTF-16 with no line endings. I spent quite a bit of time converting it to UTF-8 with line endings so that my favorite editor could open it. It looks to me as if Visual Studio uses a different method to export, and it might be written in native C++.

The programmatic way that Microsoft wants us to use is through a managed assembly. There are several MSDN blogs on this topic; unfortunately the one I found contains two errors: it fails to point out that the symbol path must be set, and its WriteXml call is wrong. My code is posted below:

using System;
using System.Collections.Generic;
using System.Text;
using Microsoft.VisualStudio.CodeCoverage;

namespace coverdump
{
    class Program
    {
        static void Main(string[] args)
        {
            if (args.Length != 3)
            {
                Console.WriteLine("Usage: coverdump coverage xml symbolpath");
                return;
            }

            String coveragepath = args[2];
            CoverageInfoManager.SymPath = coveragepath;
            CoverageInfoManager.ExePath = coveragepath;

            // Create a coverage info object from the file
            CoverageInfo ci = CoverageInfoManager.CreateInfoFromFile(args[0]);

            // Ask for the DataSet.  The parameter must be null
            CoverageDS data = ci.BuildDataSet(null);

            // Write to XML
            data.WriteXml(args[1]);
        }
    }
}

I had to overcome some issues not mentioned elsewhere. The first issue is references. You must add references to two files: Microsoft.VisualStudio.Coverage.Analysis.dll and Microsoft.DbgHelp.manifest. Both are located under C:\Program Files\Microsoft Visual Studio [Version]\Common7\IDE\PrivateAssemblies. When I first ran the code, it complained that the assembly did not match the one requested, and I found that the file Microsoft.DbgHelp.manifest did not match dbghelp.dll. I do not know whether they did not match initially, or whether this was caused by a subsequent update. In any case, I had to update the version number and remove the publicKey attribute in order for the program to run properly.

The third argument is used as both the symbol path and the executable path. You can specify multiple paths separated by semicolons. If you work with Visual Studio 2010, the interface has changed a little: the properties SymPath and ExePath are now string arrays.

It is a little odd to be asked for a symbol path, considering that Visual Studio does not ask for one when loading a coverage file. In other words, the coverage file must already contain the path to the EXE (and the PDB file name can be found in the EXE for a debug build).

Analyzing Data

PHP is my favorite scripting language, and I use it to analyze the data. As a first step, my goal is to display the code coverage ratio for each source file specified on the command line, and also to produce a code diff that our programmers can view afterwards. This is much better than what we did previously, when programmers had to navigate thousands of methods before they could locate useful information. From a project management perspective, we are now able to view the code coverage ratio on a per-source-file basis.

The resulting XML file can be very big, or even huge. The first one I got was 200 MB, from a coverage file of a mere 15 MB. If you load the whole file into memory the process can take quite a while. In light of this, I chose the XMLReader class to read the file, since XMLReader reads an XML file as a stream. I read all <Method> elements as well as all <SourceFiles> elements.

In order for programmers to view the difference, my script creates another file based on the line coverage data. If a line of code is not covered, it writes a blank line into the file instead. After all lines are written, the script calls diff -U 100 to produce a diff between the original and the new source file. The option -U 100 tells diff to display 100 lines of context, which basically produces a diff file containing all the original source lines.
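The blank-line trick can be sketched in a few lines. The script described here is written in PHP and is not shown, so the C++ below is only an illustration of the same idea. The names blankUncovered and coverageRatio are hypothetical, and the sketch assumes that the 1-based numbers of the uncovered lines have already been extracted from the XML.

```cpp
#include <cstddef>
#include <set>
#include <string>
#include <vector>

// Returns a copy of the source in which every line listed in `uncovered`
// (1-based line numbers) is replaced by an empty line. All other lines,
// including comments and braces that carry no coverage data, are copied
// unchanged. Diffing the copy against the original shows the uncovered lines.
std::vector<std::string> blankUncovered(const std::vector<std::string>& source,
                                        const std::set<std::size_t>& uncovered) {
    std::vector<std::string> out;
    out.reserve(source.size());
    for (std::size_t i = 0; i < source.size(); ++i) {
        bool isUncovered = uncovered.count(i + 1) != 0;
        out.push_back(isUncovered ? std::string() : source[i]);
    }
    return out;
}

// Coverage ratio for one source file; returns 0.0 when the file has no
// instrumented lines at all (an arbitrary choice made for this sketch).
double coverageRatio(std::size_t coveredLines, std::size_t uncoveredLines) {
    std::size_t total = coveredLines + uncoveredLines;
    return total == 0 ? 0.0 : static_cast<double>(coveredLines) / total;
}
```

Writing the returned lines to a temporary file and running diff against the original source gives the kind of report described above.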

In the future we might expand the script to do more reporting. XSLT is an interesting option here.

One annoying aspect of being a programmer is having to know the subtleties of the language. Many features are great on paper; however, you run into trouble immediately if you do not understand the issues associated with them.

Many C++ programmers are now familiar with the keyword const. They want to apply the keyword wherever they see fit. They argue that const adds constraints that help the compiler discover bugs, and that it also helps the compiler optimize the code for better performance.

Let’s start with a simple example:
class IRule
{
public:
virtual const char* Name() const = 0;
};

The above is an abstract class, which defines the methods that all derived classes must implement. The method returns a pointer to const, which means that the caller can’t change the content that the pointer points to. The const at the end indicates that the implementation should not change the object state, i.e. the this pointer is const-qualified.

All went well and you wrote a dozen derived classes that inherit from IRule. A couple of months later you have to implement another one, and this time the situation is a little different. You have to obtain the name from a remote server, and for performance reasons you can only do so the first time the method is called. You write this:
class myRemoteRule : public IRule
{
public:
myRemoteRule(const char* serverA);
virtual const char* Name(void) const;
private:
std::string strName;
std::string serverAddr;
};


const char* myRemoteRule::Name(void) const
{
if (strName.empty()) {
strName = getNameFromRemote(serverAddr);
}
return strName.c_str();
}

Cool. However, the above code does not compile. The compiler complains that the method can’t be declared const, because the data member strName is modified in it. Not good. Either you have to find a workaround, or you have to go and change every declaration of this method, which can take a while. Luckily, C++ allows us to cast the const away using const_cast, so the following code compiles:
const char* myRemoteRule::Name(void) const
{
if (strName.empty()) {
const_cast<myRemoteRule*>(this)->strName = getNameFromRemote(serverAddr);
}
return strName.c_str();
}

Let’s look at the real case, as the example above is just too trivial. In reality, a class may have several dozen methods and data members. It is difficult to predict, when we write the interface prototype, whether we will run into such a situation. We may well end up writing a lot of const_cast all over the place, further cluttering the code.
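One alternative worth knowing is the mutable keyword, which marks a data member as modifiable even inside const member functions and so avoids the cast for cached values. It also sidesteps a hazard of const_cast: modifying an object that was itself declared const through a cast is undefined behavior. The following is a minimal sketch, assuming the interface method is called Name as in the derived class. getNameFromRemote is replaced here by a hypothetical stub that counts its calls, since the real function is not shown.

```cpp
#include <string>

class IRule {
public:
    virtual ~IRule() {}
    virtual const char* Name() const = 0;
};

// Hypothetical stub standing in for the remote lookup; it counts its calls
// so that the caching behavior can be observed.
int g_remoteCalls = 0;
std::string getNameFromRemote(const std::string& serverAddr) {
    ++g_remoteCalls;
    return "rule@" + serverAddr;
}

class myRemoteRule : public IRule {
public:
    explicit myRemoteRule(const char* serverA) : serverAddr(serverA) {}

    virtual const char* Name() const {
        if (strName.empty()) {
            // Allowed in a const method because strName is declared mutable.
            strName = getNameFromRemote(serverAddr);
        }
        return strName.c_str();
    }

private:
    mutable std::string strName;  // cached value, not part of the logical state
    std::string serverAddr;
};
```

Calling Name() twice on the same object, even a const one, reaches the stub only once. The trade-off is that mutable has to be decided per data member, so it suits caches and similar bookkeeping rather than state that callers can observe.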

The moral of the story is that real-world programming is not easy. Simply applying const to a seemingly const construct may over-constrain the design and hurt the coding.

Morovia launches PDF products

View from Morovia Office

Our first PDF product, PDFLeo, was launched today. See http://rockpdf.com for more information. PDFLeo supports:

  • Encryption. Encrypt a PDF document with either password security or public key security. Remove PDF encryption if you are able to open the file. Retain PDF encryption and permission settings in the new PDF when other parts of the document are modified.
  • Linearization. Optimize PDF documents to be viewed over a slow connection from a capable web server.
  • Size Optimization. Reduce the file size by removing redundant contents, compressing streams and moving objects to streams.
  • Query document information, such as metadata, security, document permission and font information.
  • Insert and modify predefined or custom document information entries.
  • Insert, view and modify XMP metadata.

bjam parallel build

On a high-end computer with plenty of RAM, you can use the -j option to run several shell commands at the same time. For example,

bjam -j4 release

will run four shell commands simultaneously. CPU usage can easily hit 100% but the overall time can be greatly reduced.

On my computer (3 GB RAM), -j 6 cut the overall compile time in half on a large project.

ssize_t type problem

Type size_t is well known to C++ programmers. It is an alias for an unsigned integer type capable of representing the size of the largest possible object in the target environment. Depending on the target, size_t might be unsigned, unsigned long, or unsigned long long.

There is another type frequently used by C programmers: ssize_t. As its name suggests, it represents a signed size_t. Unfortunately, it is not part of the C standard (it comes from POSIX). The C standard library provides another type, ptrdiff_t, that can serve this purpose, although at first glance the two look quite different.

ssize_t is not defined in VC++ (at least in VC 8.0). It is available in GCC. If a library is written using this type, it will compile OK under GCC, but not under MSVC. For this reason, many libraries have the following code (or similar):

#if !defined(ssize_t)
#define ssize_t long
#endif

Now comes the problem: if you have two such libraries defining ssize_t, you may run into trouble when you include header files from both. Worse, some libraries use typedef to define this type, and two typedefs (or a typedef and a define) can conflict with each other.

The suggestion here is not to use ssize_t, as it is non-standard. Use ptrdiff_t instead.
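As a small illustration of this suggestion, the sketch below returns ptrdiff_t from a search function that uses -1 to signal "not found", which is the kind of place where ssize_t often appears. findByte is a hypothetical name chosen for this example.

```cpp
#include <cstddef>

// Returns the index of the first occurrence of `value` in buf[0..len),
// or -1 if it is absent. std::ptrdiff_t is the standard signed type used
// here in place of the non-standard ssize_t.
std::ptrdiff_t findByte(const unsigned char* buf, std::size_t len,
                        unsigned char value) {
    for (std::size_t i = 0; i < len; ++i) {
        if (buf[i] == value) {
            return static_cast<std::ptrdiff_t>(i);
        }
    }
    return -1;
}
```

Because ptrdiff_t comes from the standard header, the function compiles unchanged under both GCC and VC++ without any conditional define.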