
ConvertCharset Class v. 1.1 2007-10-30 23:11

ConvertCharset

You should know about...

To understand this class well you should read all of this first :) but if you are in a hurry, just run demo.php and see what's inside.
  1. My English is not good at 03:45 :) so forgive me any mistakes.
  2. This class is version 1.1 because the changes were small...
  3. Feel free to contact me with questions, bug reports, and mistakes in the PHP and in this documentation (email below).

In a few words...

Why ConvertCharset class?

I wrote this class because I had a lot of problems with different charsets. First, because people at Microsoft wanted to have their own encoding; second, because people at Macromedia didn't think about other languages; third, because sometimes I need to use text written on a Mac, and of course it has its own encoding :)

Notice & remember:

  • When I say 1 byte string, I mean 1 byte per char.
  • When I say multibyte string, I mean more than one byte per char.

So, these are the main FEATURES of this class:

  • conversion between 1 byte charsets
  • conversion from 1 byte to multi byte charset (utf-8)
  • conversion from multibyte charset (utf-8) to 1 byte charset
  • every conversion output can be saved with numeric entities (independent of the browser charset, although that is not the whole truth)

This is the list of charsets you can work with. The basic rule is that a char has to exist in both charsets, otherwise you'll get an error.

  • WINDOWS
  • windows-1250 - Central Europe
  • windows-1251 - Cyrillic
  • windows-1252 - Latin I
  • windows-1253 - Greek
  • windows-1254 - Turkish
  • windows-1255 - Hebrew
  • windows-1256 - Arabic
  • windows-1257 - Baltic
  • windows-1258 - Viet Nam
  • cp874 - Thai - this file is also for DOS

  • DOS
  • cp437 - Latin US
  • cp737 - Greek
  • cp775 - BaltRim
  • cp850 - Latin1
  • cp852 - Latin2
  • cp855 - Cyrillic
  • cp857 - Turkish
  • cp860 - Portuguese
  • cp861 - Iceland
  • cp862 - Hebrew
  • cp863 - Canada
  • cp864 - Arabic
  • cp865 - Nordic
  • cp866 - Cyrillic Russian (this is the one used in IE as "Cyrillic (DOS)")
  • cp869 - Greek2

  • MAC (Apple)
  • x-mac-cyrillic
  • x-mac-greek
  • x-mac-icelandic
  • x-mac-ce
  • x-mac-roman

  • ISO (Unix/Linux)
  • iso-8859-1
  • iso-8859-2
  • iso-8859-3
  • iso-8859-4
  • iso-8859-5
  • iso-8859-6
  • iso-8859-7
  • iso-8859-8
  • iso-8859-9
  • iso-8859-10
  • iso-8859-11
  • iso-8859-12
  • iso-8859-13
  • iso-8859-14
  • iso-8859-15
  • iso-8859-16

  • MISCELLANEOUS
  • gsm0338 (ETSI GSM 03.38)
  • cp037
  • cp424
  • cp500
  • cp856
  • cp875
  • cp1006
  • cp1026
  • koi8-r (Cyrillic)
  • koi8-u (Cyrillic Ukrainian)
  • nextstep
  • us-ascii
  • us-ascii-quotes

  • DSP implementation for NeXT
  • stdenc
  • symbol
  • zdingbat

  • And especially for old Polish programs
  • mazovia

Now, to the point...

Here are the main variables.

DEBUG_MODE

You can set this value to:

  • -1 - No errors or comments
  • 0 - Only error messages, no comments
  • 1 - Error messages and comments

The default value is 1, and during your first steps with the class it should be left as is.

CONVERT_TABLES_DIR

This is the place where you store all the charset encoding files. File names should match the encoding names. My advice is to keep the existing names, because they were taken from unicode.org ( www.unicode.org ), and after an update to Unicode 3.0 or 4.0 the file names will be the same. So if you want to save time, leave the names as they are for future updates.

By default the directory with the encoding files should be in the directory where the class is located, but of course you can change it if you like.

@package All about charset...

Related Topics
Unicode Homepage
Author Mikolaj Jedrzejak
Copyright Mikolaj Jedrzejak (c) 2003-2007
Properties implemented by ConvertCharset
public property CharsetTable

(array) This property holds the conversion table.

public property Entities

(boolean) Whether the output should use numeric entities.

public property FromCharset

(string) The source (from) encoding.

public property RecognizedEncoding

(boolean) Whether the string contains multibyte chars.

public property ToCharset

(string) The destination (to) encoding.

Methods implemented by ConvertCharset
public method Convert

PHP: string Convert(string $StringToChange) v. 1.1 2007-10-30 01:09

ConvertCharset::Convert()

This is the basic function you will use. I hope you can figure out its syntax :-)

Arguments

Return

Converted string in brand new encoding :)

public method ConvertArray

PHP: ConvertArray(mixed $array)

ConvertCharset::ConvertArray()

This method converts all values of a multi-dimensional array according to the *From* and *To* charsets. It is available since v. 1.1.
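To give an idea of what recursive conversion over a nested array looks like, here is a minimal sketch. It does not use the class itself: `convertArrayValues` and its `$convert` callback are hypothetical names, and `strtoupper` is only a stand-in for a real charset conversion. The real ConvertArray() may work differently.

```php
<?php
// Hypothetical helper, not part of ConvertCharset: applies $convert to every
// string value of a (possibly nested) array and leaves other values untouched.
function convertArrayValues(array $data, callable $convert): array
{
    foreach ($data as $key => $value) {
        if (is_array($value)) {
            // Recurse into sub-arrays so every nesting level is covered.
            $data[$key] = convertArrayValues($value, $convert);
        } elseif (is_string($value)) {
            $data[$key] = $convert($value);
        }
    }
    return $data;
}

// strtoupper is only a placeholder for a real conversion callback.
$input  = ['name' => 'abc', 'tags' => ['x', ['y', 42]]];
$output = convertArrayValues($input, 'strtoupper');
```

Passing the conversion as a callback keeps the traversal separate from the conversion, so the same walk can be reused with any converter.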

Arguments

constructor ConvertCharset

PHP: ConvertCharset(string $FromCharset, string $ToCharset, boolean $TurnOnEntities)

ConvertCharset::ConvertCharset()

Arguments

private method DebugOutput

PHP: string DebugOutput(integer $Group, integer $Number, mix $Value)

ConvertCharset::DebugOutput()

This function is not really necessary; the debug output could stay inside the source code, but this way it's easier to manage and translate. Besides, I couldn't find a good comment/debug class :-) Maybe I'll write one someday...

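All messages depend on the DEBUG_MODE level. As I wrote before, you can set this value to:

  • -1 - No errors or comments
  • 0 - Only error messages, no comments
  • 1 - Error messages and comments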

Arguments

Return

String with a proper message.

private method HexToUtf

PHP: string HexToUtf(string $UtfCharInHex)

ConvertCharset::HexToUtf()

This simple function takes a Unicode char of up to 4 bytes and returns it as a regular char. It is very similar to the UnicodeEntity function (link below). The one difference is the returned format: this time it's regular char(s), in most cases one or two chars.

See also
ConvertCharset::UnicodeEntity()

Arguments

Return

Encoded hexadecimal value as a regular char.
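The idea of turning a hexadecimal code point into UTF-8 bytes can be sketched as follows. This is an illustration only, not the class's own code: `hexToUtf8` is a hypothetical name, and the assumption that the input is a code point written in hex (with or without a `0x` prefix) comes from the mapping file format, not from the method's source.

```php
<?php
// Hypothetical re-implementation for illustration; the real HexToUtf may differ.
// Takes a Unicode code point in hex ("00E9" or "0x00E9") and returns UTF-8 bytes.
function hexToUtf8(string $hex): string
{
    if (stripos($hex, '0x') === 0) {
        $hex = substr($hex, 2);           // drop an optional "0x" prefix
    }
    $cp = (int) hexdec($hex);

    if ($cp < 0x80) {                     // 1 byte:  0bbbbbbb
        return chr($cp);
    }
    if ($cp < 0x800) {                    // 2 bytes: 110bbbbb 10bbbbbb
        return chr(0xC0 | ($cp >> 6))
             . chr(0x80 | ($cp & 0x3F));
    }
    if ($cp < 0x10000) {                  // 3 bytes: 1110bbbb 10bbbbbb 10bbbbbb
        return chr(0xE0 | ($cp >> 12))
             . chr(0x80 | (($cp >> 6) & 0x3F))
             . chr(0x80 | ($cp & 0x3F));
    }
    // 4 bytes: 11110bbb 10bbbbbb 10bbbbbb 10bbbbbb
    return chr(0xF0 | ($cp >> 18))
         . chr(0x80 | (($cp >> 12) & 0x3F))
         . chr(0x80 | (($cp >> 6) & 0x3F))
         . chr(0x80 | ($cp & 0x3F));
}

$euro = hexToUtf8('0x20AC');              // the three bytes E2 82 AC
```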

private method MakeConvertTable

PHP: array MakeConvertTable(string $FromCharset, string $ToCharset)

ConvertCharset::MakeConvertTable()

This function creates a table from two SBCS (Single Byte Character Set) encodings. Every conversion goes through this table.

If you have to build a "Format A" encoding file yourself, it should follow these rules:

#
#	The entries are in ANSI X3.4 order.
#
0x00	0x0000	#	NULL end extra comment, if needed
0x01	0x0001	#	START OF HEADING
# Oh, one more thing, you can put comments between rows if you like.
0x02	0x0002	#	START OF TEXT
0x03	0x0003	#	END OF TEXT
next line, and so on...

You can get full tables with encodings from http://www.unicode.org
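Reading a file in the format shown above could look roughly like this. It is a sketch under assumptions, not the class's own loader: `parseFormatA` is a hypothetical name, the input is passed as a string instead of being read from CONVERT_TABLES_DIR, and rows that do not contain two `0x` values are simply skipped.

```php
<?php
// Hypothetical parser for illustration; the class's own loader may differ.
// Returns an array of (byte value => Unicode code point).
function parseFormatA(string $contents): array
{
    $table = [];
    foreach (preg_split('/\R/', $contents) as $line) {
        // Cut off everything from the first '#': full-line and trailing comments.
        $line = trim(explode('#', $line, 2)[0]);
        // A data row is two hex columns: the byte, then the Unicode code point.
        if (preg_match('/^0x([0-9A-Fa-f]+)\s+0x([0-9A-Fa-f]+)$/', $line, $m)) {
            $table[(int) hexdec($m[1])] = (int) hexdec($m[2]);
        }
    }
    return $table;
}

$sample = "#\n#\tThe entries are in ANSI X3.4 order.\n#\n"
        . "0x00\t0x0000\t#\tNULL\n"
        . "0x01\t0x0001\t#\tSTART OF HEADING\n"
        . "# a comment between rows\n"
        . "0x41\t0x0041\t#\tLATIN CAPITAL LETTER A\n";

$table = parseFormatA($sample);
```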

Arguments

Return

Table necessary to change one encoding to another.
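The idea of a table that changes one single-byte encoding into another, with Unicode code points as the meeting point, can be sketched like this. The function names are hypothetical and the two tables are tiny hand-made excerpts of the windows-1250 and iso-8859-2 mappings. Throwing an exception for an unmapped char is a choice made for this sketch; the class reports such errors in its own way.

```php
<?php
// Excerpts only (byte => Unicode code point): a, EURO SIGN, s-acute, a-ogonek, c-acute.
$win1250 = [0x61 => 0x0061, 0x80 => 0x20AC, 0x9C => 0x015B, 0xB9 => 0x0105, 0xE6 => 0x0107];
// iso-8859-2 has no euro sign, so 0x20AC is absent here.
$iso2    = [0x61 => 0x0061, 0xB1 => 0x0105, 0xB6 => 0x015B, 0xE6 => 0x0107];

// Hypothetical helper: builds (source byte => target byte) for every char
// that exists in both charsets.
function buildConvertTable(array $from, array $to): array
{
    $byCodePoint = array_flip($to);       // code point => target byte
    $table = [];
    foreach ($from as $byte => $codePoint) {
        if (isset($byCodePoint[$codePoint])) {
            $table[$byte] = $byCodePoint[$codePoint];
        }
    }
    return $table;
}

// Hypothetical helper: converts a 1 byte string through the table.
function convertBytes(string $text, array $table): string
{
    $out = '';
    for ($i = 0, $n = strlen($text); $i < $n; $i++) {
        $byte = ord($text[$i]);
        if (!isset($table[$byte])) {
            // The char is missing from one of the two charsets.
            throw new RuntimeException(sprintf('No mapping for byte 0x%02X', $byte));
        }
        $out .= chr($table[$byte]);
    }
    return $out;
}

$table     = buildConvertTable($win1250, $iso2);
$converted = convertBytes("\xB9a\x9C\xE6", $table);
```

The euro sign (0x80 in windows-1250) never makes it into `$table`, which is the "char has to be in both charsets" rule in practice.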

private method UnicodeEntity

PHP: string UnicodeEntity(string $UnicodeString)

ConvertCharset::UnicodeEntity()

Unicode encoding bytes, bits representation. Each b represents a bit that can be used to store character data.

This function is written the "long" way, for everyone who would like to analyze the process of Unicode encoding and understand it. All other functions, like HexToUtf, are written in the "shortest" way I can write them :) which doesn't mean they are short, of course. You can check it in HexToUtf() (link below), a very similar function.

IMPORTANT: Remember that the $UnicodeString input CANNOT contain single-byte upper-half extended ASCII codes. Why? Because there is a possibility that this function will eat the following char, thinking it's a multibyte Unicode char.
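For readers who want to follow the process, here is a minimal sketch of turning a UTF-8 string into numeric entities. It is not the class's own code: `utf8ToNumericEntities` is a hypothetical name, decimal `&#N;` output is an assumption, and this sketch rejects a stray upper-half byte with an exception instead of eating the following char.

```php
<?php
// UTF-8 bit layout (each b is a bit that stores character data):
//   U+0000..U+007F      0bbbbbbb
//   U+0080..U+07FF      110bbbbb 10bbbbbb
//   U+0800..U+FFFF      1110bbbb 10bbbbbb 10bbbbbb
//   U+10000..U+10FFFF   11110bbb 10bbbbbb 10bbbbbb 10bbbbbb
function utf8ToNumericEntities(string $utf8): string
{
    $out = '';
    $len = strlen($utf8);
    for ($i = 0; $i < $len; $i++) {
        $byte = ord($utf8[$i]);
        if ($byte < 0x80) {               // plain ASCII, copy as is
            $out .= $utf8[$i];
            continue;
        }
        // The lead byte tells how many continuation bytes follow.
        if (($byte & 0xE0) === 0xC0) {
            $extra = 1; $cp = $byte & 0x1F;
        } elseif (($byte & 0xF0) === 0xE0) {
            $extra = 2; $cp = $byte & 0x0F;
        } elseif (($byte & 0xF8) === 0xF0) {
            $extra = 3; $cp = $byte & 0x07;
        } else {
            throw new InvalidArgumentException('Invalid UTF-8 lead byte');
        }
        for ($j = 0; $j < $extra; $j++) {
            $i++;
            // Every continuation byte must look like 10bbbbbb.
            if ($i >= $len || (ord($utf8[$i]) & 0xC0) !== 0x80) {
                throw new InvalidArgumentException('Invalid UTF-8 continuation byte');
            }
            $cp = ($cp << 6) | (ord($utf8[$i]) & 0x3F);
        }
        $out .= '&#' . $cp . ';';
    }
    return $out;
}

// "z", o-acute (U+00F3), l-stroke (U+0142), "w"
$entities = utf8ToNumericEntities("z\xC3\xB3\xC5\x82w");
```

A single byte such as 0xE9 (an iso-8859-1 char) looks like the start of a 3-byte sequence, which is exactly the situation the warning above describes.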

See also
ConvertCharset::HexToUtf()

Arguments

Return

The input string, with its Unicode chars saved as entities.

Generated by PHPEdit - Copyright © 1999-2004 - Sébastien Hordeaux - WaterProof SARL