Url Encoding


7 replies to this topic

#1 -Insane-

-Insane-

    GMC Member

  • New Member
  • 717 posts

Posted 13 September 2007 - 04:39 PM

Hey!

I searched the forum for a bit, but it doesn't seem as though anybody has done this before. So I decided to release my custom-made URL encoding and decoding scripts.
For those of you who don't know what URL encoding is and why it's used, here's the general idea:

Some characters have special meanings in HTTP requests, such as marking the end of a string or separating parameters, so they can confuse a server if they appear literally in a URL. The workaround is to substitute each such character with a "%" followed by its two-digit hexadecimal character code; for instance, "@" becomes "%40". Of course the "%" character itself is also URL-encoded (as "%25"), so nothing gets mixed up. Basically the "40" is the hexadecimal character code of "@", and the URL decoder does the same thing in reverse.
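
As a minimal sketch of the substitution described above (the variable names are only illustrative and are not taken from the scripts in this topic), this is how a single character code can be turned into a "%XX" escape:

```gml
// Percent-encode one character: "@" has character code 64, which is 40 in hexadecimal.
var c, hex, enc;
c = ord("@");                                        // 64
hex = "0123456789ABCDEF";
enc = "%" + string_char_at(hex, (c div 16) + 1)      // high digit: 64 div 16 = 4
          + string_char_at(hex, (c mod 16) + 1);     // low digit:  64 mod 16 = 0
// enc now holds "%40"
```

The "+ 1" is there because string positions in GML start at 1, not 0.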

So here are the scripts. Both take one string as argument and return one too.

URLencode:
//URL-Encodes a string according to RFC 1738
{
    var orig,new,char,tmp,ans;
    orig = argument0;
    new = "";
    char = 0;
    tmp = 0;
    ans = 0;
    for (ps=1; ps<=string_length(orig); ps+=1) {
        char = string_char_at(orig,ps);
        char = ord(char);
        if (char < 32) || (char > 126) || (char == 36) || (char == 38) || (char == 43) || (char == 44)
        || (char == 47) || (char == 58) || (char == 59) || (char == 61) || (char == 63) || (char == 64)
        || (char == 32) || (char == 34) || (char == 60) || (char == 62) || (char == 35) || (char == 37)
        || (char == 123) || (char == 125) || (char == 124) || (char == 92) || (char == 94) || (char == 126)
        || (char == 91) || (char == 93) || (char == 96) {
            tmp = floor(char/16);
            ans = char-tmp*16;

            tmp = string(tmp);
            if (tmp = "10") tmp = "A";
            if (tmp = "11") tmp = "B";
            if (tmp = "12") tmp = "C";
            if (tmp = "13") tmp = "D";
            if (tmp = "14") tmp = "E";
            if (tmp = "15") tmp = "F";

            ans = string(ans);
            if (ans = "10") ans = "A";
            if (ans = "11") ans = "B";
            if (ans = "12") ans = "C";
            if (ans = "13") ans = "D";
            if (ans = "14") ans = "E";
            if (ans = "15") ans = "F";

            new = new+"%"+tmp+ans;
        } else {
            new = new+chr(char);
        }
    }
    return new;
}

URLdecode:
//URL-Decodes a string according to RFC 1738
{
    var orig,new,char,tmp;
    orig = argument0;
    new = "";
    char = 0;
    tmp = 0;
    for (ps=1; ps<=string_length(orig); ps+=1) {
        char = string_char_at(orig,ps);
        if (char == "%") {
            ps += 1;
            char = string_upper(string_char_at(orig,ps));
            if (char = "A") char = "10";
            if (char = "B") char = "11";
            if (char = "C") char = "12";
            if (char = "D") char = "13";
            if (char = "E") char = "14";
            if (char = "F") char = "15";

            char = real(char);

            ps += 1;
            tmp = string_upper(string_char_at(orig,ps));
            if (tmp = "A") tmp = "10";
            if (tmp = "B") tmp = "11";
            if (tmp = "C") tmp = "12";
            if (tmp = "D") tmp = "13";
            if (tmp = "E") tmp = "14";
            if (tmp = "F") tmp = "15";

            tmp = real(tmp);

            new = new+chr(char*16+tmp);
        } else {
            new = new+char;
        }
    }
    return new;
}

-Insane

Ps: I'm doing Base64 encoding right now, if you care to know, and I will probably post it too.

Edited by -Insane-, 14 September 2007 - 10:00 PM.

  • 0

#2 -Insane-

-Insane-

    GMC Member

  • New Member
  • 717 posts

Posted 14 September 2007 - 08:42 PM

Ok, here's Base64 encoding/decoding:

base64encode:
//Encode a string into Base64.
{
    var str,bits,new,chrr,len,b,base64str,stradd,orlen;
    str = argument0;
    bits = "";
    new = "";
    b[8] = 0;
    len = 0;
    stradd = "";
    base64str = "ABCDEFGHIJKLMNOPQRSTUVWXYZabcdefghijklmnopqrstuvwxyz0123456789+/";
    //Turn every character into 8 bits
    for (bc=1; bc<=string_length(str); bc+=1) {
        chrr = string_char_at(str,bc);
        chrr = ord(chrr);

        b[1] = sign(chrr & 128);
        b[2] = sign(chrr & 64);
        b[3] = sign(chrr & 32);
        b[4] = sign(chrr & 16);
        b[5] = sign(chrr & 8);
        b[6] = sign(chrr & 4);
        b[7] = sign(chrr & 2);
        b[8] = sign(chrr & 1);

        for (bb=0; bb<8; bb+=1) {
            bits = bits + string(b[bb+1]);
            len += 1;
        }
    }
    //Pad with zero bits up to a multiple of 24 and work out the "=" padding
    orlen = len;
    while (len mod 6 != 0) || (len mod 8 != 0) {
        bits = bits + "0";
        len += 1;
        if (len-orlen) > 8
            stradd = "==";
        else
            stradd = "=";
    }
    //Turn every 6 bits into one Base64 character
    for (bc=1; bc<=len; bc+=6) {
        b[1] = real(string_char_at(bits,bc));
        b[2] = real(string_char_at(bits,bc+1));
        b[3] = real(string_char_at(bits,bc+2));
        b[4] = real(string_char_at(bits,bc+3));
        b[5] = real(string_char_at(bits,bc+4));
        b[6] = real(string_char_at(bits,bc+5));

        chrr = b[1]*32 + b[2]*16 + b[3]*8 + b[4]*4 + b[5]*2 + b[6]*1;
        new = new + string_char_at(base64str,chrr+1);
    }
    //The "=" signs replace the trailing characters that came only from padding
    //(without this, "A" would give "QQAA==" instead of the standard "QQ==")
    new = string_copy(new,1,string_length(new)-string_length(stradd));
    return new + stradd;
}

base64decode:
//Decode a Base64 string into ASCII.
{
    var str,bits,new,chrr,len,b,base64str,numpad;
    str = argument0;
    numpad = 0;
    while (string_char_at(str,string_length(str)) == "=") {
        str = string_copy(str,1,string_length(str)-1);
        numpad += 1;
    }
    bits = "";
    new = "";
    b[8] = 0;
    len = 0;
    base64str = "ABCDEFGHIJKLMNOPQRSTUVWXYZabcdefghijklmnopqrstuvwxyz0123456789+/";
    for (bc=1; bc<=string_length(str); bc+=1) {
        chrr = string_char_at(str,bc);

        chrr = string_pos(chrr,base64str)-1;

        b[1] = sign(chrr & 32);
        b[2] = sign(chrr & 16);
        b[3] = sign(chrr & 8);
        b[4] = sign(chrr & 4);
        b[5] = sign(chrr & 2);
        b[6] = sign(chrr & 1);

        for (bb=0; bb<6; bb+=1) {
            bits = bits + string(b[bb+1]);
            len += 1;
        }
    }
    while (len mod 8 != 0) || (len mod 6 != 0) {
        bits = bits + "0";
        len += 1;
    }
    for (bc=1; bc<=len-numpad*8; bc+=8) {
        b[1] = real(string_char_at(bits,bc+0));
        b[2] = real(string_char_at(bits,bc+1));
        b[3] = real(string_char_at(bits,bc+2));
        b[4] = real(string_char_at(bits,bc+3));
        b[5] = real(string_char_at(bits,bc+4));
        b[6] = real(string_char_at(bits,bc+5));
        b[7] = real(string_char_at(bits,bc+6));
        b[8] = real(string_char_at(bits,bc+7));

        chrr = b[1]*128 + b[2]*64 + b[3]*32 + b[4]*16 + b[5]*8 + b[6]*4 + b[7]*2 + b[8]*1;
        new = new + chr(chrr);
    }
    return new;
}

-Insane
  • 0

#3 IceMetalPunk

IceMetalPunk

    InfiniteIMPerfection

  • GMC Elder
  • 9603 posts
  • Version:GM:Studio

Posted 14 September 2007 - 08:59 PM

For the URL encode/decode scripts, wouldn't it be shorter to use a decimal-to-hex and vice-versa script on all non-alphanumeric, non +_-=?. characters?

-IMP :P
  • 0


#4 Yourself

Yourself

    The Ultimate Pronoun

  • GMC Elder
  • 7352 posts
  • Version:Unknown

Posted 14 September 2007 - 09:17 PM

Basically the "20" is the hexadecimal version of "@"


No it isn't. 20 is the hexadecimal version of a space (" "). 40 is the @ symbol. Your encoding script can also be shortened by a lot:

var str, res, i, len, c, n, hex;
str = argument0;
len = string_length(str);
res = "";
hex = "0123456789abcdef";
for (i = 1; i <= len; i += 1) {
     c = string_char_at(str, i);
     if (string_pos(c, ' {}[]|\^~`"#%<>;/@$=:?&') > 0 || ord(c) > 126) {
         n = ord(c);
         c = '%' + string_char_at(hex, n div 16 + 1) + string_char_at(hex, n mod 16 + 1);
     }
     res += c;
}
return res;

Your decoding script could also be shortened up and optimized even more:

var str, res, i, hex, h, s;
str = argument0;
res = "";
hex = "0123456789abcdef";
i = string_pos('%', str);
while (i > 0) {
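     res += string_copy(str, 1, i - 1);   // everything before the '%'
     s = string_char_at(str, i+1);
     h = 16*(string_pos(string_lower(s), hex) - 1);
     s = string_char_at(str, i+2);
     h += string_pos(string_lower(s), hex) - 1;
     res += chr(h);                        // the decoded character
     str = string_delete(str, 1, i + 2);   // drop the prefix, the '%' and both hex digits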
     i = string_pos('%', str);
}
return res + str;

I can't remember what characters exactly should be escaped out of the string, but that's easy enough to change in my script.
  • 0

#5 -Insane-

-Insane-

    GMC Member

  • New Member
  • 717 posts

Posted 14 September 2007 - 10:10 PM

My bad, you're right. "@" is "%40" instead of "%20".
Also, my original script contains the complete list of characters to be URL-encoded.
Thanks for the shorter scripts. I didn't really think it over that well since I was in a hurry while writing it, but now that I read yours I see the point.
Is there by any chance a way of shortening my Base64 script too?
I used a pretty inefficient (in my opinion) 256 -> 2 -> 64 base conversion system, because I didn't find a way to directly convert 256 to 64, and I don't think there is.

Also, I have done hexadecimal decoding too, since my sha1 dll gives me hex. I might write the encoder sometime too and post it here.

-Insane
  • 0

#6 Yourself

Yourself

    The Ultimate Pronoun

  • GMC Elder
  • 7352 posts
  • Version:Unknown

Posted 15 September 2007 - 12:08 AM

I used a pretty inefficient (in my opinion) 256 -> 2 -> 64 base conversion system, because I didn't find a way to directly convert 256 to 64, and I don't think there is.


There is a direct conversion (there's always a "direct" conversion between any two bases). Every 3 characters in base 256 correspond to 4 base 64 characters. So you'd only have to convert 3 characters at a time.
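
A minimal sketch of that direct conversion, assuming ASCII input (character codes 0 to 255) and a hypothetical script name base64encode_direct: each group of 3 characters is combined into one 24-bit number, which is then split into four 6-bit values.

```gml
// base64encode_direct (hypothetical name): encodes a string 3 characters at a time.
var str, len, res, tbl, i, b1, b2, b3, n;
str = argument0;
len = string_length(str);
res = "";
tbl = "ABCDEFGHIJKLMNOPQRSTUVWXYZabcdefghijklmnopqrstuvwxyz0123456789+/";
for (i = 1; i <= len; i += 3) {
    // Characters missing from the last group count as 0
    b1 = ord(string_char_at(str, i));
    b2 = 0;
    b3 = 0;
    if (i + 1 <= len) b2 = ord(string_char_at(str, i + 1));
    if (i + 2 <= len) b3 = ord(string_char_at(str, i + 2));
    n = b1 * 65536 + b2 * 256 + b3;                        // one 24-bit number
    res += string_char_at(tbl, (n div 262144) + 1);        // bits 23..18
    res += string_char_at(tbl, ((n div 4096) mod 64) + 1); // bits 17..12
    // The last two characters become "=" when the group was short
    if (i + 1 <= len) res += string_char_at(tbl, ((n div 64) mod 64) + 1);
    else res += "=";
    if (i + 2 <= len) res += string_char_at(tbl, (n mod 64) + 1);
    else res += "=";
}
return res;
```

With this, "Man" encodes to "TWFu" and a lone "A" encodes to "QQ==", without ever building a string of bits.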
  • 0

#7 ObsidianNovels

ObsidianNovels

    GMC Member

  • GMC Member
  • 717 posts
  • Version:GM:HTML5

Posted 10 May 2014 - 10:53 PM

Since the original author of this great script can no longer be reached, I was wondering if someone here could help me modify this little piece:

 

Special characters like the Swedish å, ä, ö are not written out correctly.

Any ideas of how to solve this?


  • 0

#8 GameGeisha

GameGeisha

    GameGeisha

  • GMC Member
  • 6565 posts
  • Version:GM:Studio

Posted 11 May 2014 - 12:16 AM

This script only works on ASCII strings. It was good enough in its heyday, when GM strings were ASCII, but not today, when GM strings are UTF-8. If you use any characters outside the ASCII range, you need to use this instead.
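
To illustrate what changes with UTF-8, here is a rough sketch (not the script referred to above). It assumes that ord() gives you the character's Unicode code point, and the name utf8_percent is made up for the example. The code point is split into its UTF-8 bytes, and each byte gets its own "%XX" escape.

```gml
// utf8_percent (hypothetical name): returns the "%XX" escapes for one code point.
var cp, hex, out, n, i, b;
cp = argument0;
hex = "0123456789ABCDEF";
out = "";
// Split the code point into 1 to 4 UTF-8 bytes
if (cp < 128) {
    b[0] = cp;
    n = 1;
} else if (cp < 2048) {
    b[0] = 192 + (cp div 64);
    b[1] = 128 + (cp mod 64);
    n = 2;
} else if (cp < 65536) {
    b[0] = 224 + (cp div 4096);
    b[1] = 128 + ((cp div 64) mod 64);
    b[2] = 128 + (cp mod 64);
    n = 3;
} else {
    b[0] = 240 + (cp div 262144);
    b[1] = 128 + ((cp div 4096) mod 64);
    b[2] = 128 + ((cp div 64) mod 64);
    b[3] = 128 + (cp mod 64);
    n = 4;
}
// Escape every byte separately
for (i = 0; i < n; i += 1) {
    out += "%" + string_char_at(hex, (b[i] div 16) + 1) + string_char_at(hex, (b[i] mod 16) + 1);
}
return out;
```

For example, å is code point 229, which comes out as "%C3%A5" rather than the single "%E5" an ASCII-only script would produce.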

 

GameGeisha

 

PS: Is it really necessary to necro a 7-year-old topic?


  • 0