In the Windows world, file extensions are a common means of identifying file
content type.
If you see a file named foo.txt, you would usually be safe in assuming that
the file contains plain ASCII text.
Likewise, if you have a file that is foo.png, you would usually be safe in
assuming the file is a Portable Network Graphics (PNG) image file.
However, the reality is that the extension only tells Windows "what program
should try to open this". There's nothing to stop someone from renaming the
file foo.png to foo.txt, thus causing the image file to be opened in Notepad
(which is an ugly thing).
Atalasoft support has even run into cases where a customer has taken a file
(foo.png, for instance), renamed its extension (for example, renaming foo.png
to foo.pdf), and then wondered why the file won't open in Acrobat after they
"converted to PDF".
So, obviously, changing the extension of a file does not actually change the
content type of the file. It merely changes a part of the file name that
Windows and other applications use as a clue about what type of content the
file has and which program to use to open it.
Now, on one level, DotImage somewhat hides this from you. Suppose you took
the aforementioned foo.png file, renamed it to, say, foo.jpg, and opened it in
one of our viewers or even just an AtalaImage:
AtalaImage img = new AtalaImage("foo.jpg");
It would open the file just fine (assuming the path was correct and the file
was actually a valid PNG image).
Basically, we ignore the file extension.
Once we have the file as an AtalaImage object, we no longer care what type it
was on disk. You can effectively save it out to any supported type, so
img.Save("out.png", new PngEncoder(), null);
would save it as a proper PNG image,
img.Save("out.bmp", new BmpEncoder(), null);
would save it as a bitmap, and so on.
So, how does DotImage determine what type of image a file is?
Welcome to the RegisteredDecoders class and its Decoders collection.
When you use DotImage to read an image in one of our viewers, or directly use
classes such as FileSystemImageSource or AtalaImage, you pass it a filename or a
stream containing the data you want to open. DotImage then iterates through
every decoder it finds in
RegisteredDecoders.Decoders
and checks whether the image is a type the current decoder knows how to
handle. Once it finds a decoder that indicates it can handle the data, it uses
that decoder to render the image.
A decoder knows it can handle an image type because it knows how to read its
supported type's header, data information, and structure well enough to say
"ahh, this is a type I know how to handle".
Internally, in a given image file, toward the beginning of the file there
will be a byte sequence that identifies the content. For a (little-endian) TIFF
file that data looks like
II* (in hex, bytes 0x49 0x49 0x2A at the very beginning of the file)
For PDF it might be
%PDF-1.6 (in hex, bytes 0x25 0x50 0x44 0x46 0x2D 0x31 0x2E 0x36 or similar at
the beginning of the file)
So, we're actually reading the image data header from the file and
determining which of our decoders recognizes it.
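The idea of walking a list of decoders and letting each one inspect the header can be sketched in plain .NET. This is only an illustration, not DotImage's implementation: SignatureSniffer, HeaderSniffing, and Identify are hypothetical names, and the real decoders inspect far more than a few leading bytes.

```csharp
using System;
using System.Collections.Generic;
using System.IO;

// Hypothetical stand-in for a decoder: a name plus a header check.
// These are NOT DotImage types; they only illustrate the idea.
public sealed class SignatureSniffer
{
    public string Name { get; }
    private readonly byte[] _signature;

    public SignatureSniffer(string name, byte[] signature)
    {
        Name = name;
        _signature = signature;
    }

    // True when the first 'length' bytes of 'header' start with the signature.
    public bool Matches(byte[] header, int length)
    {
        if (length < _signature.Length) return false;
        for (int i = 0; i < _signature.Length; i++)
        {
            if (header[i] != _signature[i]) return false;
        }
        return true;
    }
}

public static class HeaderSniffing
{
    // Illustrative "registered" list, checked in order.
    private static readonly List<SignatureSniffer> Sniffers = new List<SignatureSniffer>
    {
        new SignatureSniffer("TIFF", new byte[] { 0x49, 0x49, 0x2A }),            // "II*"
        new SignatureSniffer("PDF", new byte[] { 0x25, 0x50, 0x44, 0x46, 0x2D }), // "%PDF-"
        new SignatureSniffer("PNG", new byte[] { 0x89, 0x50, 0x4E, 0x47 }),       // 0x89 "PNG"
    };

    // Reads up to 8 bytes from the stream's current position and returns the
    // name of the first sniffer that matches, or null when none does.
    // A single Read call is enough for FileStream and MemoryStream; other
    // stream types may return fewer bytes per call.
    public static string Identify(Stream stream)
    {
        byte[] header = new byte[8];
        int read = stream.Read(header, 0, header.Length);
        foreach (SignatureSniffer sniffer in Sniffers)
        {
            if (sniffer.Matches(header, read)) return sniffer.Name;
        }
        return null;
    }
}
```

Renaming a file changes none of these bytes, which is why a PNG renamed to foo.jpg is still identified as a PNG.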
This becomes relevant because it answers the question "when I open this
file in DotImage, why do I get an 'Unrecognized file type' error?"
Unrecognized Image Type Exception
The answer is "because the file you opened did not contain data that any
ImageDecoder in the RegisteredDecoders.Decoders collection knew how to
handle".
Now, this could be for a number of reasons:
1) You were opening a stream object, but the stream was not at the 0 /
start position. To fix this, always ensure you reset the stream before trying
to read it:
myStream.Seek(0, SeekOrigin.Begin);
AtalaImage img = new AtalaImage(myStream);
2) Something in your code has removed the needed decoder. You should take
care never to remove any decoders from RegisteredDecoders.Decoders and should
NEVER call
RegisteredDecoders.Decoders.Clear();
3) The decoder your image needs is one that we do not include in the base
implementation, either because it's uncommon or because it's an add-on that
requires additional licensing, such as PdfDecoder, DwgDecoder, DicomDecoder,
Jb2Decoder, etc.
To add support for a given extra image type, you must ensure you have added
its decoder to the RegisteredDecoders.Decoders collection. You should do this
in a static constructor for your class/app so that it is done once, and only
once, to avoid double-adding:
static Form1()
{
    RegisteredDecoders.Decoders.Add(new PdfDecoder() { Resolution = 200 });
}
In VB this is done in a Shared Sub New:
Shared Sub New()
    RegisteredDecoders.Decoders.Add(New PdfDecoder() With { .Resolution = 200 })
End Sub
4) Your file is corrupt/damaged or is simply not of a type that we
support.
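Reason 1 is easy to reproduce without DotImage at all. The sketch below uses only standard .NET classes; LooksLikePdf is a hypothetical header check standing in for a real decoder, and it reads from wherever the stream currently is, just as a decoder would.

```csharp
using System;
using System.IO;
using System.Text;

public static class StreamPositionDemo
{
    // Hypothetical header check: true when the bytes at the stream's
    // CURRENT position spell "%PDF-".
    public static bool LooksLikePdf(Stream stream)
    {
        byte[] expected = Encoding.ASCII.GetBytes("%PDF-");
        byte[] actual = new byte[expected.Length];
        int read = stream.Read(actual, 0, actual.Length);
        if (read < expected.Length) return false;
        for (int i = 0; i < expected.Length; i++)
        {
            if (actual[i] != expected[i]) return false;
        }
        return true;
    }

    // Writes a fake PDF header into a MemoryStream, then checks it twice:
    // once with the position left at the end, once after rewinding.
    public static bool[] Run()
    {
        using (var stream = new MemoryStream())
        {
            byte[] data = Encoding.ASCII.GetBytes("%PDF-1.6\n");
            stream.Write(data, 0, data.Length);

            // After writing, Position is at the end, so nothing is left to read.
            bool beforeRewind = LooksLikePdf(stream);

            // Rewind to the start and check again.
            stream.Seek(0, SeekOrigin.Begin);
            bool afterRewind = LooksLikePdf(stream);

            return new[] { beforeRewind, afterRewind };
        }
    }
}
```

The first check fails and the second succeeds even though the stream's contents never changed; only its position did. This is the usual cause when a stream you just wrote to, or already read from, is reported as unrecognized.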
Determine Image Type Programmatically
There is a practical question that comes up quite often: "I need to
programmatically know what type of image the file was, maybe because I want to
apply special processing that is only valid for certain types, such as using a
PdfAnnotationDataImporter (which will error out if I try to use it on a file
that is not a valid PDF)."
The RegisteredDecoders class again comes to your rescue, this time with the
RegisteredDecoders.GetDecoder(...) method.
NOTE: Do NOT be tempted to use the ImageType property of ImageInfo. ImageType
is not aware of any additional decoders, and it is deprecated:
ImageType typ = RegisteredDecoders.GetImageInfo(...).ImageType;
Don't use the ImageType enumeration at all... please.
You can get the decoder that the collection has found for your file and
use reflection to get its type and decide what actual type the image is. Here's
a simple example just to say "is this a PDF?":
private bool IsFilePdf(string fileName)
{
    using (FileStream fs = new FileStream(fileName,
        FileMode.Open, FileAccess.Read, FileShare.Read))
    {
        // Ask the registered decoders which one recognizes this stream.
        object rawDec = RegisteredDecoders.GetDecoder(fs);
        // Treat the file as a PDF only if the matching decoder is a PdfDecoder.
        if (rawDec != null
            && rawDec.GetType() == typeof(PdfDecoder))
        {
            return true;
        }
    }
    return false;
}
Keep in mind that some decoders, such as RawDecoder and OfficeDecoder,
support many different file types. OfficeDecoder supports MS Word, Excel, and
PowerPoint 97 and later as well as RTF, and RawDecoder supports raw formats
from many different vendors, all of which are classified as "raw image
formats" but are technically entirely different from each other internally.
So, if you give DotImage a Word or Excel file, it will come back with
OfficeDecoder, but you won't know which type of file it was (and you will likely
need to fall back on the file's original extension to give you the hint you
need).
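A minimal sketch of that extension fallback, using only standard .NET: OfficeTypeHint and FromFileName are hypothetical names, and the extension list is an assumption for illustration rather than a statement of exactly which formats OfficeDecoder accepts.

```csharp
using System;
using System.Collections.Generic;
using System.IO;

public static class OfficeTypeHint
{
    // Illustrative extension map (an assumption, not taken from DotImage);
    // adjust it to the formats you actually accept.
    private static readonly Dictionary<string, string> Hints =
        new Dictionary<string, string>(StringComparer.OrdinalIgnoreCase)
        {
            { ".doc", "Word" }, { ".docx", "Word" },
            { ".xls", "Excel" }, { ".xlsx", "Excel" },
            { ".ppt", "PowerPoint" }, { ".pptx", "PowerPoint" },
            { ".rtf", "RTF" },
        };

    // Returns a best-guess document kind from the file name, or "Unknown"
    // when the name has no extension or the extension is not in the map.
    public static string FromFileName(string fileName)
    {
        string extension = Path.GetExtension(fileName);
        if (string.IsNullOrEmpty(extension)) return "Unknown";

        string hint;
        return Hints.TryGetValue(extension, out hint) ? hint : "Unknown";
    }
}
```

Use this only after GetDecoder has already reported OfficeDecoder. As the start of this article shows, the extension is a hint and not proof, so the decoder remains the authority on whether the file can be opened at all.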
Original Article:
Q10445 - HOWTO: Determine an Image File Type / Format with DotImage