Keith Devens .com

Tuesday, October 7, 2008 Flag waving

Tuesday, May 13, 2003

Writing Unicode-friendly markup languages and programs

For reasons I can't reveal just yet (cue sinister music), I'm being forced to educate myself on what it takes to write a markup language or a program that supports Unicode. So, here's the start of my research:

As an aside: Base64 uses A-Z, a-z, 0-9, plus '+' and '/' as its 64-character alphabet. Part of the motivation for choosing those characters, IIRC, was portability across both ASCII and EBCDIC. I've always wondered why they used plus and slash rather than plus and minus...
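To make that alphabet concrete, here's a small Python sketch using only the standard library. It's an illustration, not anything from the spec: the name `B64_ALPHABET` is made up for this example. It checks that whatever the encoder emits comes from those 64 characters (plus the '=' padding character) and that the minus never shows up.

```python
import base64
import string

# The 64-character alphabet described above: A-Z, a-z, 0-9, '+', '/'.
# B64_ALPHABET is a name invented for this sketch.
B64_ALPHABET = string.ascii_uppercase + string.ascii_lowercase + string.digits + "+/"

# Encode every possible byte value to get a varied sample of output.
encoded = base64.b64encode(bytes(range(256))).decode("ascii")

# Characters the encoder actually used, ignoring '=' padding.
used = set(encoded) - {"="}

print(len(B64_ALPHABET))           # 64
print(used <= set(B64_ALPHABET))   # True
print("-" in used)                 # False
```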

From the definition of base64:

NOTE: This subset has the important property that it is
represented identically in all versions of ISO 646, including US
ASCII, and all characters in the subset are also represented
identically in all versions of EBCDIC. Other popular encodings,
such as the encoding used by the uuencode utility and the base85
encoding specified as part of Level 2 PostScript, do not share
these properties, and thus do not fulfill the portability
requirements a binary transport encoding for mail must meet.

Hmm...

These characters, identified in Table 1, below, are selected so as to be universally representable, and the set excludes characters with particular significance to SMTP (e.g., ".", CR, LF) and to the encapsulation boundaries defined in this document (e.g., "-").

So, it turns out that because base64 is part of MIME, and the minus (hyphen) has special meaning within MIME, they didn't use it in the base64 alphabet.

Since the hyphen character ("-") is represented as itself in the
Quoted-Printable encoding, care must be taken, when encapsulating a
quoted-printable encoded body in a multipart entity, to ensure that
the encapsulation boundary does not appear anywhere in the encoded
body.
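A quick way to see the point of that passage, sketched with Python's standard `quopri` module. The boundary-like line here is hypothetical, made up for the example. Quoted-printable passes the hyphen through untouched, so a line that looks like a boundary survives encoding as-is.

```python
import quopri

# A made-up line that looks like a MIME boundary delimiter (hypothetical value).
line = b"--hypothetical-boundary--"

# Quoted-printable only escapes bytes outside the "safe" printable range
# (and '=' itself); the hyphen is represented as itself.
encoded = quopri.encodestring(line)

print(encoded == line)  # True
```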

I still don't quite understand why something that applies to quoted-printable encoding affected what was chosen for base64... since if base64 text were transferred as quoted-printable, the hyphen could just be encoded again, right? Maybe they just wanted to avoid that.

Ok, I get it... it's not because of quoted-printable. A boundary line between different parts of a MIME document could be something like this: "--gc0p4Jq0M2Yt08jU534c0p--", which, if the hyphen were part of the base64 alphabet, could also be valid base64-encoded text. So, since you wouldn't want to dump some base64-encoded text into a document and find that you accidentally ended your current "body part", they left the hyphen out of the alphabet. And hyphens were used for boundaries in the first place because of the desire for conformance with previous specs. Interesting.
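Here's a rough Python check of that reasoning. It's only a sketch: `STANDARD` and `PLUS_MINUS` are names invented for the example, and `PLUS_MINUS` is a hypothetical alphabet that swaps '/' for '-' (the "plus and minus" idea from earlier), not anything a spec defines.

```python
import string

# Every character standard base64 output can contain, including '=' padding.
STANDARD = set(string.ascii_letters + string.digits + "+/=")

# The example boundary line quoted above.
delimiter = "--gc0p4Jq0M2Yt08jU534c0p--"

# Characters in the delimiter that standard base64 can never produce.
print(sorted(set(delimiter) - STANDARD))   # ['-']

# Hypothetical "plus and minus" alphabet: replace '/' with '-'.
PLUS_MINUS = (STANDARD - {"/"}) | {"-"}

# With that alphabet, every character of the delimiter line would be legal
# encoded output, so a body line could be mistaken for a boundary.
print(set(delimiter) <= PLUS_MINUS)        # True
```

So with the real alphabet the hyphens alone are enough to tell a boundary line apart from encoded data.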
