Understanding UTF and ANSI encoding

I am pretty sure that I am not the only one who has had trouble with encoding when dealing with XML, especially when moving files from one system to another. I found a good post that I will paste here, mostly for my own reference, for the next time someone asks why I cannot read his ANSI XML even though he puts UTF-8 in the XML declaration.

–clip–

The Basics
Letters are represented in a computer by numeric codes. Pretty much everybody agrees that, when the computer sees a code of 100 (decimal), it represents a lowercase “d”. We don’t all agree on what 250 represents, and therein lies the rub.
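A quick way to see the disagreement is to decode the same byte values under different 8-bit code pages. The sketch below is in Python, used here only for illustration. The three code pages are arbitrary examples, and the names `cp1252`, `cp1251` and `latin-1` are Python's codec names.

```python
# Decode the same single byte under several 8-bit code pages.
# cp1252 = Windows Western European ("ANSI" on English Windows),
# cp1251 = Windows Cyrillic, latin-1 = ISO-8859-1. Chosen for illustration.
CODE_PAGES = ["cp1252", "cp1251", "latin-1"]

def decode_everywhere(value: int) -> dict:
    """Map each code page name to the character that byte `value` stands for there."""
    raw = bytes([value])
    return {cp: raw.decode(cp) for cp in CODE_PAGES}

for value in (100, 250):
    print(value, decode_everywhere(value))
```

Byte 100 is "d" in all three, while byte 250 is "ú" in cp1252 and latin-1 but "ъ" in cp1251.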

ASCII vs ANSI
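We commonly refer to a letter’s numeric code as its “ASCII value,” when for codes above 127 we really mean its “ANSI value” (on Windows, the value in an 8-bit code page such as Windows-1252). A lot of the time that’s sufficient, but ASCII on its own covers only 128 characters and today survives mainly as the first 128 code points of Unicode.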

ASCII (American Standard Code for Information Interchange) is a 7-bit standard that has been around since the early 1960s (it was first published in 1963; its current incarnation dates from 1968). It defines 128 different characters, which is more than enough for English: upper- and lowercase letters, punctuation, digits, control codes (remember control-c?), and nonprinting codes such as tab, return, and backspace.

ASCII and ANSI are pretty good as long as you write English or a Western European language (strictly, ASCII alone covers only English). These mappings are extremely limited: an 8-bit code page can code (i.e. assign a number to) at most 256 characters, and ASCII only 128, so there is no room to include glyphs from other languages.
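The limit shows up as soon as you try to store a character that has no slot in the chosen table. A minimal Python sketch (the helper name `try_encode` is made up for this example):

```python
def try_encode(text: str, codec: str):
    """Return the encoded bytes, or None if the codec has no slot for a character."""
    try:
        return text.encode(codec)
    except UnicodeEncodeError:
        return None

print(try_encode("é", "ascii"))    # None: ASCII stops at 127
print(try_encode("é", "cp1252"))   # b'\xe9'
print(try_encode("Я", "cp1252"))   # None: no Cyrillic in the Western code page
print(try_encode("Я", "cp1251"))   # b'\xdf'
```

Each code page can hold only one alphabet's extra letters, so a document mixing French and Russian cannot be stored in any single one of them.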

Unicode
Unicode fixes the limitations of ASCII and ANSI by providing enough space for over a million different symbols. Like the above two systems, each character is given a number, so that the Russian Я is U+042F and the Korean won symbol ₩ is U+20A9. (Unicode numbers are conventionally written in hexadecimal, meaning that one counts by 16s, not 10s; this is not a problem, as users really don’t need to know the mapping numbers anyway.) So, although not yet totally comprehensive, Unicode covers most of the world’s writing systems. Most importantly, the mapping is consistent, so that any user anywhere on any computer gets the same number for the same character, no matter what font is being used.
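The code points quoted above can be checked with Python's standard `unicodedata` module. The `describe` helper is a name chosen for this sketch:

```python
import sys
import unicodedata

def describe(ch: str) -> str:
    """Return the conventional U+XXXX label and the official Unicode name."""
    return f"U+{ord(ch):04X} {unicodedata.name(ch)}"

print(describe("Я"))   # U+042F CYRILLIC CAPITAL LETTER YA
print(describe("₩"))   # U+20A9 WON SIGN
# The code space runs from U+0000 to U+10FFFF, i.e. 1,114,112 code points.
print(sys.maxunicode + 1)  # 1114112
```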

So Unicode is a map, a chart of (what will one day be) all of the characters, letters, symbols, punctuation marks, etc. necessary for writing all of the world’s languages past and present.

What is the difference between UTF-8 and UTF-16?
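UTF-8 stores each Unicode code point as a variable-length sequence of bytes. The length depends on the code range: 1 byte for ASCII (U+0000 to U+007F) and up to 4 bytes for the highest code points (the original design allowed up to 6 bytes, but RFC 3629 limits UTF-8 to 4). Because its basic unit is 8 bits (1 byte), it is called “UTF-8”. ASCII text stays 1 byte per character and byte-for-byte identical to ASCII, so UTF-8 is compact for mostly-Latin text and well suited to the Internet, networks and applications that run over slow connections.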
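The byte-length rules can be made concrete with a small hand-written encoder that follows the RFC 3629 bit layout and checks itself against Python's built-in codec. This is an illustrative sketch (the name `utf8_encode` is hypothetical), not a replacement for `str.encode`:

```python
def utf8_encode(cp: int) -> bytes:
    """Encode one code point using the RFC 3629 bit layout (1 to 4 bytes)."""
    if cp < 0x80:
        return bytes([cp])                          # 0xxxxxxx (plain ASCII)
    if cp < 0x800:
        return bytes([0xC0 | (cp >> 6),             # 110xxxxx
                      0x80 | (cp & 0x3F)])          # 10xxxxxx
    if cp < 0x10000:
        if 0xD800 <= cp <= 0xDFFF:
            raise ValueError("surrogate code points cannot be encoded")
        return bytes([0xE0 | (cp >> 12),            # 1110xxxx
                      0x80 | ((cp >> 6) & 0x3F),
                      0x80 | (cp & 0x3F)])
    if cp <= 0x10FFFF:
        return bytes([0xF0 | (cp >> 18),            # 11110xxx
                      0x80 | ((cp >> 12) & 0x3F),
                      0x80 | ((cp >> 6) & 0x3F),
                      0x80 | (cp & 0x3F)])
    raise ValueError("code point outside the Unicode range")

for ch in ["d", "é", "Я", "₩", "\U0001F600"]:
    encoded = utf8_encode(ord(ch))
    assert encoded == ch.encode("utf-8")            # agrees with the real codec
    print(f"U+{ord(ch):04X} -> {len(encoded)} byte(s): {encoded.hex(' ')}")
```

Note how “d” stays a single byte equal to its ASCII code, which is what makes UTF-8 backward compatible with ASCII.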

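UTF-16 (Unicode, or UCS, Transformation Format, 16-bit encoding form) serializes each Unicode scalar value (code point) as one 16-bit code unit (two bytes) or, for code points above U+FFFF, as a surrogate pair of two code units (four bytes), in either big-endian or little-endian byte order. Because it works in 16-bit (2-byte) units, it is called “UTF-16”. It is the native string format of Windows, Java and .NET, while UTF-8 dominates on the web and is the default for XML documents that carry no byte order mark or encoding declaration.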
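The surrogate-pair rule is short enough to write out by hand. The sketch below (function names are made up for illustration) splits code points into UTF-16 code units and serializes them in either byte order, then compares the result with Python's `utf-16-be` and `utf-16-le` codecs, which, unlike plain `utf-16`, do not prepend a byte order mark:

```python
def utf16_code_units(cp: int) -> list:
    """Split a code point into UTF-16 code units (a surrogate pair above U+FFFF)."""
    if cp < 0x10000:
        return [cp]
    v = cp - 0x10000                       # 20 bits remain
    return [0xD800 | (v >> 10),            # high surrogate: top 10 bits
            0xDC00 | (v & 0x3FF)]          # low surrogate: bottom 10 bits

def utf16_bytes(text: str, byteorder: str) -> bytes:
    """Serialize text as UTF-16 without a BOM, in 'big' or 'little' byte order."""
    return b"".join(unit.to_bytes(2, byteorder)
                    for ch in text for unit in utf16_code_units(ord(ch)))

sample = "d\u20a9\U0001F600"   # "d", WON SIGN, and an emoji outside the BMP
assert utf16_bytes(sample, "big") == sample.encode("utf-16-be")
assert utf16_bytes(sample, "little") == sample.encode("utf-16-le")
print([hex(u) for u in utf16_code_units(0x1F600)])   # ['0xd83d', '0xde00']
```

Plain ASCII text doubles in size under UTF-16 ("d" becomes 00 64 or 64 00), which is one reason UTF-8 is preferred for files and network traffic.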

Source: http://forum.iopus.com/viewtopic.php?t=2783

–clip–
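Back to the XML problem from the introduction: the sketch below reproduces it with Python's standard `xml.etree.ElementTree` (chosen only for illustration; other parsers word the error differently). The bytes are Windows-1252 (“ANSI”) but the declaration claims UTF-8, so the parser rejects the é. Transcoding the bytes to real UTF-8, so that the content matches the declaration, fixes it.

```python
import xml.etree.ElementTree as ET

# Hypothetical file content: saved by an "ANSI" (Windows-1252) editor,
# while the declaration claims UTF-8. Byte 0xE9 is "é" in Windows-1252.
ansi_bytes = b'<?xml version="1.0" encoding="UTF-8"?><name>Jos\xe9</name>'

def parse_or_error(data: bytes) -> str:
    """Return the <name> text, or the parser's error message."""
    try:
        return ET.fromstring(data).text
    except ET.ParseError as err:
        return f"ParseError: {err}"

print(parse_or_error(ansi_bytes))   # prints a ParseError message

# Fix: transcode so the bytes really are UTF-8, matching the declaration.
utf8_bytes = ansi_bytes.decode("cp1252").encode("utf-8")
print(parse_or_error(utf8_bytes))   # José
```

The alternative is to leave the bytes alone and change the declaration to the encoding actually used (for example `encoding="windows-1252"`), provided the receiving parser supports that encoding.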

Moving to vectorial

For the last 1.5 years I have been drifting towards the Linux world, since Microsoft has offered pretty much nothing interesting for web developers. I can see that Microsoft is working hard on new technologies like XAML, which (IMHO) is quite useless from a web developer’s point of view. Hopefully someone at Microsoft will soon wake up to SVG.

I have been running the latest Ubuntu on my laptop, and it works like a charm. I still keep Windows in a dual boot because I sometimes need to work with Photoshop, Flash and Visual Studio, but I have slowly started to move my development environment to Linux.

I think SVG is going to be big in the following years. SVG has more potential than XAML simply because SVG was designed for the web and is part of the web standards. It will revolutionize interfaces on the Internet as well as on operating systems. Therefore I have decided to start moving into the SVG world with Mono, Cairo and Gtk+. Gtk+ is a library used to build GUI applications on UNIX and is the heart of the GNOME desktop. Gtk+ is built on top of the 2D graphics engine Cairo, which released version 1.0 in August: every widget is now drawn using Cairo operations and, most importantly, developers can now draw their own widgets using the PDF-like rendering model offered by Cairo.

Cairo also brings the end user nice touches such as anti-aliased rendering for a more pleasant experience, and Gtk+ builds on this new functionality to bring vector-based themes to the desktop as well.

AJAX framework

With Backbase software you can quickly create user-friendly and highly interactive Rich Internet Applications. These applications are based on Web Standards and can be viewed in almost any modern web browser without installing any plug-ins.
http://www.backbase.com/

Replace line breaks with HTML <br>

<!-- ********************************************************************** -->
<!-- ******** Replace every line break with <br/> ************************* -->
<!-- ******** XML parsers normalize CRLF and lone CR to LF, so the ******** -->
<!-- ******** template looks for LF (&#xA;), not CR (&#x0D;). ************* -->
<!-- ******** Source located at: ****************************************** -->
<!-- ******** http://www.dpawson.co.uk/xsl/sect2/N8321.html#d10224e153 **** -->
<!-- ********************************************************************** -->
<xsl:template name="br-replace">
  <xsl:param name="word"/>
  <xsl:choose>
    <!-- Text contains a line break: output the part before it, a <br/>, then recurse on the rest -->
    <xsl:when test="contains($word, '&#xA;')">
      <xsl:value-of select="substring-before($word, '&#xA;')"/>
      <br/>
      <xsl:call-template name="br-replace">
        <xsl:with-param name="word" select="substring-after($word, '&#xA;')"/>
      </xsl:call-template>
    </xsl:when>
    <!-- No more line breaks: output the remaining text as-is -->
    <xsl:otherwise>
      <xsl:value-of select="$word"/>
    </xsl:otherwise>
  </xsl:choose>
</xsl:template>

Here’s the call:
<xsl:call-template name="br-replace">
  <xsl:with-param name="word" select="."/>
</xsl:call-template>
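For readers without an XSLT processor at hand, the same recursion can be sketched in Python with the standard library (the function names `br_replace` and `_append_text` are my own, not part of the original template). It also shows why the template should look for LF rather than CR: the XML parser has already turned the CRLF into a single LF before any stylesheet or script sees the text.

```python
import xml.etree.ElementTree as ET

def _append_text(parent: ET.Element, text: str) -> None:
    """Add text after the last child (as its tail) or, if there is none, as parent.text."""
    if len(parent):
        last = parent[-1]
        last.tail = (last.tail or "") + text
    else:
        parent.text = (parent.text or "") + text

def br_replace(parent: ET.Element, word: str) -> None:
    """Hypothetical Python mirror of the XSLT br-replace template."""
    if "\n" in word:
        before, after = word.split("\n", 1)   # substring-before / substring-after
        _append_text(parent, before)
        ET.SubElement(parent, "br")            # the literal <br/>
        br_replace(parent, after)              # the recursive call-template
    else:
        _append_text(parent, word)             # xsl:otherwise

# The parser normalizes CRLF to LF, so source.text contains "\n", not "\r".
source = ET.fromstring(b"<note>first line\r\nsecond line</note>")
out = ET.Element("p")
br_replace(out, source.text)
print(ET.tostring(out, encoding="unicode"))   # <p>first line<br />second line</p>
```

The recursion mirrors XSLT 1.0, which has no way to loop over a string; in Python, very long texts would be better handled with a loop to stay clear of the recursion limit.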

Firefox got a neat feature

Live Bookmarks is a new technology in Firefox that lets you view RSS news and blog headlines in the bookmarks toolbar or bookmarks menu. With one glance, quickly see the latest headlines from your favorite sites. Go directly to the articles that interest you — saving you time.

See the picture below:
[Picture: Subscribe to RSS on Firefox]

As the picture shows, a little RSS button appears on the page when an RSS feed is available for the page’s contents. When you click it, Firefox creates a new folder in your bookmarks containing the feed’s links and titles.
Niceee…
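Roughly speaking, that folder holds the title and link of each item in the feed. A minimal Python sketch of pulling those pairs out of an RSS 2.0 document with the standard library (the feed content and the `feed_headlines` name are made up for illustration; this is not how Firefox itself is implemented):

```python
import xml.etree.ElementTree as ET

FEED = """<rss version="2.0">
  <channel>
    <title>Example blog</title>
    <item><title>First post</title><link>http://example.com/1</link></item>
    <item><title>Second post</title><link>http://example.com/2</link></item>
  </channel>
</rss>"""

def feed_headlines(xml_text: str) -> list:
    """Return (title, link) pairs for every <item> in an RSS 2.0 channel."""
    channel = ET.fromstring(xml_text).find("channel")
    return [(item.findtext("title"), item.findtext("link"))
            for item in channel.findall("item")]

for title, link in feed_headlines(FEED):
    print(title, "->", link)
```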

More information at Mozilla.org

Surfing XML sites

Seems like Kurt Cagle is back to blogging at BlogSpot with his new Metaphorical Web website. I also noticed that Kurt Cagle is speaking at the XML Developers Conference about XSLT 2.0 on .NET. Too bad the conference is in the U.S.; going to another continent is a bit too far for me.

Michael Kay’s legendary XSLT books are now in print. Here is a related discussion on xsl-list about the “XSLT 2.0” and “XPath 2.0” books.