
Today, we launched 31 Apps in 31 days, where the engineers at Atalasoft are going to deliver one application made with DotImage each day in May (including weekends).

Steve is first up with a Motivational Poster generator. At Atalasoft, if you break the build, you own the build chicken until you fix it. The transfer of the chicken is announced with a blood-curdling chicken scream.


 

I feel motivated already (mmmmm..... Dropped caps).

I previously showed how easy it is to generate Code 39 barcodes using DotImage; now I want to show you how to do it in pure JavaScript.

I wanted to make it easy to deploy so I don't use images. Luckily, 1D barcodes are easy to make with just colored table cells, so I just generate a one row table with a column for each bar.  I set the width of the column to be the width of the bar and then color it either black or white.
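As a rough sketch of the table-cell idea (this is not the code in Barcode.js), the function below turns a bar pattern into a one-row table. The name barsToTable and the pattern letters are assumptions made for illustration: 'B' and 'b' stand for wide and narrow black bars, 'W' and 'w' for wide and narrow white bars.

```javascript
// Hypothetical helper, not taken from Barcode.js.
// pattern: string of 'B'/'b' (wide/narrow black) and 'W'/'w' (wide/narrow white).
function barsToTable(pattern, height, narrowWidth, wideWidth) {
    var cells = [];
    for (var i = 0; i < pattern.length; i++) {
        var ch = pattern.charAt(i);
        var isWide = (ch === 'B' || ch === 'W');
        var isBlack = (ch === 'B' || ch === 'b');
        var width = isWide ? wideWidth : narrowWidth;
        // One cell per bar: the cell width is the bar width, the color is the bar color.
        cells.push('<td style="padding:0;width:' + width + 'px;height:' + height +
            'px;background-color:' + (isBlack ? 'black' : 'white') + '"></td>');
    }
    return '<table cellspacing="0" cellpadding="0" style="border-collapse:collapse"><tr>' +
        cells.join('') + '</tr></table>';
}
```

The returned string can be assigned to an element's innerHTML, so no image files are needed.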

Here's a page where you generate barcodes with just JavaScript. The code is attached to this entry.

To use it, just add Barcode.js to your site, include it in a script tag, then use the AtalasoftBarcode39 object to draw the barcode (there is a file called BarcodeTest.html in the zip file to show an example). Here's some sample code:

<div id="bcArea"></div>

<script src="Barcode.js" type="text/javascript"></script>
<script type="text/javascript">
// Draw the barcode for s into the bcArea element.
function writeBC(s)
{
    var bc = new AtalasoftBarcode39(s);
    document.getElementById('bcArea').innerHTML =
        bc.getBarcode(50, 2, 8);
}

// The bcArea div must already exist when this runs,
// so it is placed before the script.
writeBC("ABC123");
</script>

The three arguments to getBarcode are the height of the barcode, the width of the narrow bars, and the width of the wide bars.

 


Attachment(s): Barcode.zip

I just noticed this blog entry from picturetom pointing to an interesting study on trends in digital photography available for purchase from InfoTrends.

One of our customers, David Cardinal from ProShooters, fits this bill:

‘Today's photographers are technologically savvy, as a significant majority is using image editing software and many are using RAW conversion and color management software’, commented Ed Lee, Director at InfoTrends.

‘It's particularly interesting to note that 83% of professional photographers are using the Web as part of their business and about 30% use an online photo service provider. This suggests that a variety of Web services providers could see future growth opportunities.’

David is a photographer who can code and about a year ago wrote a review on choosing an imaging toolkit (ours) for Dr. Dobb's.

At Pro Shooters (www.proshooters.com), our primary product is DigitalPro for Windows, an image-management system for digital photographers. We sell to serious photographers who have large image libraries and often need to process hundreds or even thousands of images at a time. As a result, we serve a niche market and can only afford a small programming team. That means we need to be more productive than the competition to stay ahead. So, instead of trying to do everything ourselves, we rely on tools such as third-party libraries in areas where we can find excellent tools that don't compromise on quality and features.

David also pushes us to continually expand our RAW support (mostly supporting more and more cameras) -- DotImage 6.0 and the upcoming 6.0b both have support for more cameras than previous releases.


OCR Engines are very good at reading text from clean machine print documents. If you have older scans or if the documents are not meant to be easily read by a machine, there are still some things you can do to improve your accuracy.

This report from the GPO on Optimizing OCR Accuracy has some interesting findings, but also some errors, which I will try to explain.

The report cites that thresholding the documents didn't improve accuracy and shows this result:

The issue is not that thresholding wouldn't help. In fact, since most OCR engines can only work on thresholded documents, they will do it for you if you do not. They are right to point out that the scans should be done at full color -- but that's because you then get a chance to apply the thresholding yourself (instead of letting the scanner do it). If you use a good thresholding algorithm, you can do quite a bit better.
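For reference, the simplest form of thresholding is a single fixed cutoff applied to every pixel, sketched below. This is not DynamicThresholdCommand; the name globalThreshold and the flat array of gray values are assumptions for illustration. Adaptive algorithms generally vary the cutoff across the image instead of using one value everywhere.

```javascript
// Naive global threshold (illustrative only).
// grayPixels: array of gray values 0-255, one per pixel.
// Values at or above the cutoff become white (255), the rest black (0).
function globalThreshold(grayPixels, cutoff) {
    var out = new Uint8Array(grayPixels.length);
    for (var i = 0; i < grayPixels.length; i++) {
        out[i] = grayPixels[i] >= cutoff ? 255 : 0;
    }
    return out;
}
```

A fixed cutoff like this tends to struggle on scans with uneven backgrounds, which is why the choice of algorithm matters.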

Using DynamicThresholdCommand from DotImage Document Imaging and SpeckRemovalCommand from Advanced Document Cleanup with default parameters, I got this result:

I don't have the original, so I cannot check OCR accuracy, but I bet I will get a better result than they found using the default threshold in Photoshop. In any case, a threshold must be done before OCR, so either you do it under your control or the OCR engine will do it for you.

Another problem they found is with downsampling. They had scans at high DPI, but the OCR vendor recommendation was for 300 DPI, so they downsampled. I am sure that the OCR vendor meant at least 300 DPI, so they did not have to do this. Downsampling is sure to reduce OCR accuracy, because it discards information. If you do apply it, make sure to choose a good algorithm -- there are benefits to downsampling (increased speed), but if accuracy is the main concern, then you should not do it.

The use of image processing before OCR can increase accuracy, but you must use the proper algorithm. A limiting factor of their tests is that they applied their preprocessing steps manually with Photoshop and therefore could not try a lot of different options. By using an image processing toolkit, you can easily run a lot of tests in batch. You are essentially solving an optimization problem, so applying a hill-climbing or genetic algorithm would help decide the best processing choice for your collection of documents.
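To make the optimization idea concrete, here is a minimal hill-climbing sketch over a single integer setting (for example, a threshold level). Everything here is an assumption for illustration: the name hillClimb, the single-parameter search, and the score function, which in practice would preprocess a sample of your documents with that setting, run OCR, and return the measured accuracy.

```javascript
// Minimal hill climbing over one integer parameter (illustrative sketch).
// score(value) is supplied by the caller and returns a number to maximize.
function hillClimb(score, start, min, max, step) {
    var best = start;
    var bestScore = score(best);
    var improved = true;
    while (improved) {
        improved = false;
        // Try one step down and one step up from the current best.
        var neighbors = [best - step, best + step];
        for (var i = 0; i < neighbors.length; i++) {
            var candidate = neighbors[i];
            if (candidate < min || candidate > max) continue;
            var s = score(candidate);
            if (s > bestScore) {
                best = candidate;
                bestScore = s;
                improved = true;
            }
        }
    }
    return { value: best, score: bestScore };
}

// Toy usage: a made-up score that peaks at 140 stands in for OCR accuracy.
var result = hillClimb(function (v) { return -(v - 140) * (v - 140); }, 100, 0, 255, 10);
```

Hill climbing stops at the first setting whose neighbors are no better, so it can settle on a local optimum; running it from several starting points is a cheap way to reduce that risk.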

(Full Disclosure: I was a tech reviewer for this book and received a free copy)

ADO.NET 3.5 Cookbook

I read the ADO.NET 3.5 Cookbook last November, coincidentally while I was writing automatic image fetching from SQL Server and OleDB databases into DotImage 6.0 (DbImageSource).

I've been using the various incarnations of Microsoft data access technologies for quite some time and have been using ADO.NET for a few years, so I wondered whether I was going to learn anything new from this book. It covers all of the territory to get started (connection strings, basic usage of ADO.NET classes, etc.), but what I really appreciated was that it also covers topics that advanced ADO.NET users would find useful, and I certainly learned a few new tricks.

The topic on writing provider and database independent code (Section 10.22) which covers how to do it right if you are targeting .NET 1.1 (which we do) was particularly useful to me. Chapter 10 (Optimizing .NET Data Access) is just generally a good chapter no matter what your level and covers asynchronous SQL calls (executing and cancelling), ASP.NET data caching, paging queries, SQL Server stored procedure debugging and more.

The other thing I liked was the general best practices advice that was sprinkled into the recipes in appropriate places. If you are new to writing DB code, read the "Storing Connection Strings" section (1.1) carefully so that you do it correctly. Bill not only explains how, but why. And since this is the first recipe, it sets the tone for the rest of the book as being practical ("here's the code"), but also gives you the background to understand it.

Since my job was to actually run every code snippet, I can vouch for their quality. Most are built off the AdventureWorks sample database that comes with SQL Server Express, so they are ready to run. The rest come with full DDL to create what you need (databases, stored procedures, etc), and the code and SQL is available online so you don't have to type it in.

And, of course, since it's updated for ADO.NET 3.5, it includes information on LINQ and SQL Server 2008. 

If you are in the Springfield, Massachusetts area, you might want to check out the RTC. Today, I attended an event that they put together, called Breaking Through: IT-Enabled Business Opportunities. The next one builds on some of the ideas in it, and is on Social Networks (May 2nd).

Paul Gillin gave the keynote about the new influencers (bloggers, online communities and social networks). He offered some examples as inspiration for trying to use these phenomena as an alternative to mass media and traditional advertising-based campaigns (Will it Blend). He also gave some examples of what the new influencers can do if you get things wrong (AOL Hell), and what you can do if it happens to you (respond quickly).

The panel brought together some IT executives to talk about key IT trends. On the ECM Imaging front, many of the participants mentioned Document Imaging as a key initiative. Drivers for that included increasing productivity, cutting costs (and paper), and collaboration; a key issue was searchability of the resulting repository.

Next month, Atalasoft is going to deliver 31 Apps in 31 Days (one for each day in May). We are inviting anyone outside of Atalasoft that wants to participate to submit applications. We're looking for small, useful desktop applications that use DotImage and that you are willing to distribute for free (you can have an upgrade to a pay version).

Applications submitted by non-Atalasoft employees are eligible for a prize. If you want to join in, go to the 31 Apps sign-up page and we'll tell you how to submit your entry.

So, Rick, Jacob, Adam and Elaine have all given their impressions. I guess I should chime in.

The thing that struck me about Code Camp is that with a couple of people organizing, some sponsors and volunteer speakers, you can put on a technical conference that just blows away what's out there. Sure, the food could be better (how would you feed 500+ people on a budget? Pizza!), and everyone sure would want access to wifi, but here's what I think Code Camp gets right.

1. Free. For someone who has to manage a budget, this makes the decision easy. We put up five developers in a hotel in Boston and took them out to dinner, and it still cost less than sending just one developer to a standard conference.

2. Hardly any vendor presentations (if any). The Code Camp manifesto makes it very hard (all code must be free). I gave a presentation on writing cmdlets in Powershell. To do that, I needed some interesting .NET objects to wrap, so I chose DotImage. I don't think anyone would think I was hawking DotImage, but just to make sure, we gave away free copies of DotImage Photo (fully functioning copies, not evals) so that everyone who came to the talk could run my code. Ok, I guess presentations showing MS tools and APIs are vendor presentations, but most of that is available for free too.

Vendors need to be very careful about just giving demos at conferences -- look at what Bruce Eckel had to say about the last Pycon:

I believe that this year's Pycon organizers suffered from inexperience and naivete, because they didn't know that some vendors will ask for anything just to see how far they can push it. And that it's a negotiation, that you must push back rather than give in just because the conference might get some money for it. More importantly, that the imperative to grow Pycon does not mean "at all costs." I've already spoken to more than one vendor who was dismayed by the state of things, so we are not talking about all vendors here by any means.

At first the morning plenary sessions -- where the entire conference audience was in a single room -- just seemed a bit commercial. But then I slowly figured out that the so-called "diamond keynotes" were actually sold to vendors. It must have sounded great to some vendors: you get to pitch to everyone and nothing else is going on so the audience is trapped.

From what I saw at AjaxWorld last year and heard from this year's -- they are also leaning more towards vendor presentations. Code Camp presentations (even by vendors or authors selling books) were full of useful content -- I never felt like I was being sold to.

3. Very developer centric -- all attendees were invited to give presentations -- this is the perfect place to get some practice giving talks. Most presentations showed code and built something in the time allotted. I saw a game built in XNA, an IronPython extension added to a .NET application, a bunch of useful HTTP Handlers and modules built from scratch, and a lot more.

4. Access to experts -- I have pages of notes for ideas for how to improve our product and process at Atalasoft. Michael Cummings told me about InternalsVisibleTo, which alone was worth spending the weekend in Waltham. I got a chance to talk to Edwin Ames after seeing his presentation on TDD, Behavior Driven Development and mocking frameworks, and he told me more about NMock2 and Rhino. I got some info on FILESTREAM fields in SQL Server 2008 from Matthew Roche.

Ok, here's how you create a CmdLet that uses DotImage:

1. Download the attached zip and open the solution

2. Build it -- my dll is named AtalasoftSnapin.dll

Here is the code for GetImage.cs, the implementation of the get-image cmdlet

using System;
using System.ComponentModel;
using System.Management.Automation;

using Atalasoft.Imaging;

namespace Atalasoft.Commands
{
    // the name of the cmdlet is get-image
    [Cmdlet("Get", "Image")]
    public class GetImageCommand : Cmdlet
    {
        private string _filename;

        // it has one parameter and it is mandatory
        // and can be piped in
        [Parameter(Mandatory=true, Position=0,
          ValueFromPipeline=true)]
        public string Filename
        {
            get {
                return _filename;
            }

            set {
                _filename = value;
            }
        }

        protected override void BeginProcessing()
        {
        }

        // take the string they pass, treat it as
        // a filename, and load it into an image
        protected override void ProcessRecord()
        {
            WriteObject(
             new AtalaImage(_filename));
        }

        protected override void EndProcessing()
        {
        }
    }
}

3. Install it with these lines in Powershell

C:\WINDOWS\Microsoft.NET\Framework\v2.0.50727\installutil AtalasoftSnapin.dll

add-pssnapin AtalasoftSnapin

4. Now in Powershell, you can type

> get-image "full path to image"
> dir *.jpg | get-image

After adding in the cmdlets for gray-image and combinechannels-image, you can now create the red-blue 3D images I showed.

1. Take two pictures, one with your left eye, and one with your right
2. Make them both gray:

> get-image "path to image" | gray-image | save-image "output image"

3. And now put the left image in the red channel, and the right image in the blue and green.

> combinechannels-image -r (get-image "left image") -g (get-image "right image") -b (get-image "right image") | save-image "3d image"

Replace "left image" and "right image" with the names of your gray images, and "3d image" with the name of your output image.
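The channel combination itself is simple enough to sketch outside of DotImage. The function below is an illustration only, not the combinechannels-image cmdlet: the name makeAnaglyph is made up, and it assumes both inputs are same-sized RGBA byte arrays (the layout used by a canvas ImageData.data) that have already been converted to gray.

```javascript
// Illustrative red/cyan anaglyph from two gray RGBA pixel arrays.
// Red comes from the left-eye image; green and blue come from the right-eye image.
function makeAnaglyph(leftPixels, rightPixels) {
    if (leftPixels.length !== rightPixels.length) {
        throw new Error("Images must have the same dimensions");
    }
    var out = new Uint8ClampedArray(leftPixels.length);
    for (var i = 0; i < leftPixels.length; i += 4) {
        out[i] = leftPixels[i];          // red   <- left image
        out[i + 1] = rightPixels[i + 1]; // green <- right image
        out[i + 2] = rightPixels[i + 2]; // blue  <- right image
        out[i + 3] = 255;                // fully opaque
    }
    return out;
}
```

Viewed through red-blue glasses, each eye then sees only the picture taken for it.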

You might think that the whole point of the presentation was to get this picture (click for larger version):


But, it's not -- the point of the presentation is to get this picture:

 

A solution with my code is attached below. 

Ok, you're going to want to install Powershell before reading the rest of this. Once you've done that, try typing these commands at the prompt (type in what appears after the >, the text in italics is just commentary):

Here's the hello world program
> "hello world"
Remember, the result of your expression is just printed by the shell. "hello world" is a .NET System.String

Here's some other primitives and simple expressions:
> 3
> 3+4
> 1GB
> "hello " + "world"
> "hello world".substring(6)
> 1..10

Remember that Powershell's directory primitives operate on a provider interface.  You can call dir and cd on anything that implements it (like the registry)
> cd HKLM:\
> dir
> cd SOFTWARE\Microsoft\Windows
> dir
> c:
> dir

There is extensive built-in help:
> get-help get-process
> get-command *process*
> "hello" | get-member
> get-history
> get-command *history*

You can call .NET assemblies (variables start with $)
> $feed = [xml](new-object System.Net.WebClient).DownloadString(
"http://www.atalasoft.com/cs/blogs/loufranco/rss.aspx"
)
> $feed.rss.channel.item | select title
> $feed.rss.channel.item | ? { $_.title -match "ower" }

And you can call any third-party assemblies (like DotImage).  Once you have installed DotImage, copy Atalasoft.Shared.dll, Atalasoft.DotImage.dll and Atalasoft.DotImage.Lib.dll to C:\WINDOWS\system32\windowspowershell\v1.0 -- this is one of the places Powershell looks for assemblies. You load them like this:

> [System.Reflection.Assembly]::Load("atalasoft.dotimage")
> [System.Reflection.Assembly]::Load("atalasoft.dotimage.lib")
> [System.Reflection.Assembly]::Load("atalasoft.shared")
 

In all of the next examples, you'll want to give full paths to filenames -- Powershell has an odd notion of the current directory which is confusing to explain right now.

Here's how you load an image
> $img = new-object Atalasoft.Imaging.AtalaImage("full path to image")

And now run the oilpaint command
> $cmd = new-object Atalasoft.Imaging.ImageProcessing.Effects.OilPaintCommand
> $cmd.BrushWidth = 15
> $img2 = $cmd.Apply($img).Image

And save it ($null is a built-in Powershell variable, so you don't need to define it)
> $img2.Save("full path to image", (new-object Atalasoft.Imaging.Codec.JpegEncoder(85)), $null)

That turns this

 

into this


Tomorrow I'll post the code that makes it easier to do this (using Cmdlets) 

Update: Here is Image Processing with Powershell, Part II

This is a placeholder blog entry where I'll post my slides and code for my Extending Powershell talk at Code Camp on Saturday.

The code I am showing uses DotImage, and we want everyone to be able to play with it, so we'll have free DotImage Photo licenses for everyone who comes to the talk (and some schwag to raffle off).

UPDATE: If you are coming here from the link we gave you at the show -- here is Part I from the Powershell presentation.


If you want to process Office documents in .NET (for instance, if you use the DotImage AJAX Image Viewer and want to view a Word document), the easiest way is to use Office 2007 to do the heavy lifting. Here are instructions for doing that, but please read this KB about using Office on the server side.

The easiest way to make an Office document readable by DotImage is to convert it into a PDF and then use our PDF Reader to read it. If you use Office 2007, download and install the Office Save as PDF Add-in to add this capability to Office, so that we can call it from .NET.

Once you do that, here's the code for converting a Word Doc to a PDF:

    private void Word2Pdf(
         string wordDocFile, string pdfDocFile)
    {
        object falseObj = false;
        object trueObj = true;
        object emptyStr = "";
        object missing = System.Reflection.Missing.Value;

        Microsoft.Office.Interop.Word.Application
              wordApp = null;
        try
        {
            wordApp = new
              Microsoft.Office.Interop.Word.Application();
            wordApp.Visible = false;
            object wordDocFileObj = wordDocFile;
            Microsoft.Office.Interop.Word.Document
              wordDoc = null;
            try
            {
                wordDoc = wordApp.Documents.Open(
                 ref wordDocFileObj,
                 ref falseObj,
                 ref trueObj,
                 ref missing,
                 ref missing,
                 ref missing,
                 ref missing,
                 ref missing,
                 ref missing,
                 ref missing,
                 ref missing,
                 ref missing,
                 ref missing,
                 ref missing,
                 ref missing,
                 ref missing);

                wordDoc.ExportAsFixedFormat(
                  pdfDocFile,
                  WdExportFormat.wdExportFormatPDF,
                  false,
                  WdExportOptimizeFor.
                     wdExportOptimizeForOnScreen,
                  WdExportRange.wdExportAllDocument,
                  -1, -1,
                  WdExportItem.wdExportDocumentWithMarkup,
                  true, true,
                  WdExportCreateBookmarks.
                     wdExportCreateNoBookmarks,
                  false, true, true, ref missing);
            }
            finally
            {
                if (wordDoc != null)
                    wordDoc.Close(ref falseObj,
                      ref missing, ref missing);
                wordDoc = null;
            }
        }
        finally
        {
            if (wordApp != null)
                wordApp.Quit(ref falseObj,
                  ref missing, ref missing);
            wordApp = null;
        }
    }

Most of the work for doing this is to provide the right kind of automation objects to the Office API. 

You need to add a reference to Microsoft.Office.Interop.Word, and add the following using statement:

using Microsoft.Office.Interop.Word;

 

The attached solution will show you how to read Excel and PowerPoint documents into our web image viewer as well.


Attachment(s): WordWebViewer.zip
A bunch of us are going to Code Camp in Waltham, MA on April 5th and 6th. I'll be giving a presentation on how to extend Powershell and showing how I created cmdlets for image processing. The presentation is on April 5th, at 9:10 in the Rhode Island room.

I read this kind of funny story about a kid who got himself into a little trouble reading and writing barcodes by hand. After reading it, I figured if he's going through the trouble of getting graph paper and markers to write Code 39, I can certainly provide the code for doing it with DotImage.

Code 39 is a simple barcode format -- it supports only capital letters, numbers, and seven other special characters (so 43 characters total). Each letter is encoded with nine alternating black and white bars, always starting and ending with black. Three of the nine bars will be wide, and the rest narrow (thus the name -- sometimes also called 3 of 9). Between each letter is a narrow white bar, and there is a start and stop code so you know that you are looking at a Code 39.
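Because the character set is so small, it is easy to check data before trying to encode it. The sketch below is only an illustration (the name isValidCode39 is made up): it accepts digits, capital letters, and the seven special characters of standard Code 39 (dash, period, space, dollar sign, slash, plus, and percent). The asterisk is rejected because it is reserved for the start and stop code.

```javascript
// Illustrative check that a string uses only the 43 Code 39 data characters.
// '*' is deliberately excluded: it is the start/stop code, not data.
function isValidCode39(data) {
    return /^[0-9A-Z\-. $\/+%]*$/.test(data);
}
```

For example, isValidCode39("ABC123") is true, while lowercase input such as "abc" is rejected and would need to be upper-cased first.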

One nice thing about Code 39 is that you can calculate the size in pixels before encoding, since each letter is the same size (3 wide and 6 narrow).

Here's some code:

First, I grabbed the table of encodings from the Wikipedia page on Code 39 and put them into a hashtable:

private static Hashtable _encoding = new Hashtable();

static BarcodeCode39()
{
    _encoding['*'] = "bWbwBwBwb";
    _encoding['-'] = "bWbwbwBwB";

    // rest of encodings removed 

Here is how you calculate the width of the image (there are two extra characters added to the barcode, and I have a margin as well).

    public int ImageWidth
    {
        get
        {
            return (_data.Length + 2) * ((3 * _wideWidth) + (6 * _narrowWidth)) +
                (_data.Length+1) * _narrowWidth +
                2 * _margin;
         }
    }
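As a worked example of the formula above, here is the same arithmetic in JavaScript. The function name and parameter names are chosen for illustration; the calculation mirrors the ImageWidth property term by term.

```javascript
// Same calculation as the ImageWidth property, written out step by step.
function code39ImageWidth(dataLength, narrowWidth, wideWidth, margin) {
    var chars = dataLength + 2;                      // data plus start and stop characters
    var charWidth = 3 * wideWidth + 6 * narrowWidth; // every character: 3 wide + 6 narrow bars
    var gaps = (dataLength + 1) * narrowWidth;       // narrow white gap between characters
    return chars * charWidth + gaps + 2 * margin;    // margin on the left and right
}

// "ABC123" (6 characters), narrow = 2, wide = 8, margin = 1:
// 8 * 36 + 7 * 2 + 2 * 1 = 288 + 14 + 2
console.log(code39ImageWidth(6, 2, 8, 1)); // 304
```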
 

To make the barcode, I create the image:

    int imageWidth = ImageWidth;
 
    AtalaImage barcodeImg = new AtalaImage(imageWidth, _imageHeight, pf, Color.White);
    Canvas barcodeCanvas = new Canvas(barcodeImg);
    Fill fill = new SolidFill(Color.Black);
    AtalaPen pen = new AtalaPen(Color.Black);

 

Then here is my main loop through the data to encode:

    // add start/stop
    string dataToEncode = "*" + _data + "*";
    int x = _margin;
    foreach (char c in dataToEncode)
    {
        string bars = (string)_encoding[c];
        if (bars == null)
        {
            throw new NotSupportedException("Invalid character in data: '"+c+"'");
        }

        foreach (char b in bars)
        {
            switch (b)
            {
                case 'B': // Wide Black
                    barcodeCanvas.DrawRectangle(new Rectangle(x, _margin, _wideWidth, _imageHeight-2*_margin), pen, fill);
                    x += _wideWidth;
                    break;
                case 'W': // Wide White
                    x += _wideWidth;
                    break;
                case 'b': // Narrow Black
                    barcodeCanvas.DrawRectangle(new Rectangle(x, _margin, _narrowWidth, _imageHeight-2*_margin), pen, fill);
                    x += _narrowWidth;
                    break;
                case 'w': // Narrow White
                    x += _narrowWidth;
                    break;
            }
        }
        x += _narrowWidth; // between characters
    }
 

I attached Assemblies built off of DotImage 6.0 -- if you have a maintenance release (you are reading this after 6.0a has been released), you can find a version of this in the Demo directories.

Here's how you use it:

    BarcodeCode39 bc39 = new BarcodeCode39(data);
    bc39.Margin = 1;
    AtalaImage img = bc39.CreateBarcodeImage(PixelFormat.Pixel24bppBgr);

Then, you can use the image to overlay onto another with OverlayCommand. There are properties to adjust the outcome (NarrowWidth, WideWidth, ImageHeight, Margin). 

Here's how you recognize that barcode with the Atalasoft Barcode Reader:

      BarReader br = new BarReader(img);
      ReadOpts opts = new ReadOpts();
      opts.Symbology = Symbologies.Code39;
      BarCode[] bc = br.ReadBars(opts);

 
