检测发送文本所需的SMS数量的最佳方法

时间:2011-12-01 23:25:38

标签: php encoding sms ascii ucs2

我正在寻找一个php中的代码/ lib,我会调用它并将文本传递给它,它会告诉我:

  1. 我需要使用什么编码才能将此文本作为短信发送(7,8,16位)
  2. 我将用于发送此文本的短信数量(必须明智地计算“{3}}中的”分段信息“
  3. 您是否知道存在哪些代码/ lib会为我做这个?

    我再也不是要发送短信或转换短信,只是为了给我提供有关短信的信息

    更新:

    好的,我做了以下代码,似乎工作正常,如果你有更好/优化的代码/解决方案/ lib,请告诉我

    $text = '\@£$¥èéùìòÇØøÅåΔ_ΦΓΛΩΠΨΣΘΞÆæßÉ -./0123456789:;<=>?¡ABCDEFGHIJKLMNOPQRSTUVWXYZÄÖÑܧ¿abcdefghijklmnopqrstuvwxyzäöñüà^{}[~]|€' ; //"\\". //'"';//' ';
    
    print $text . "\n";
    print isGsm7bit($text). "\n";
    print getNumberOfSMSsegments($text). "\n";
    
    
    
    
    function getNumberOfSMSsegments($text,$MaxSegments=6){
    /*
    http://en.wikipedia.org/wiki/SMS
    
    Larger content (concatenated SMS, multipart or segmented SMS, or "long SMS") can be sent using multiple messages, 
    in which case each message will start with a user data header (UDH) containing segmentation information. 
    Since UDH is part of the payload, the number of available characters per segment is lower: 
    153 for 7-bit encoding, 
    134 for 8-bit encoding and 
    67 for 16-bit encoding. 
    The receiving handset is then responsible for reassembling the message and presenting it to the user as one long message. 
    While the standard theoretically permits up to 255 segments,[35] 6 to 8 segment messages are the practical maximum, 
    and long messages are often billed as equivalent to multiple SMS messages. See concatenated SMS for more information. 
    Some providers have offered length-oriented pricing schemes for messages, however, the phenomenon is disappearing.
    */
    $TotalSegment=0;
    $textlen = mb_strlen($text);
    if($textlen==0) return false; //I can see most mobile devices will not allow you to send empty sms, with this check we make sure we don't allow empty SMS
    
    if(isGsm7bit($text)){ //7-bit
        $SingleMax=160;
        $ConcatMax=153;
    }else{ //UCS-2 Encoding (16-bit)
        $SingleMax=70;
        $ConcatMax=67;
    }
    
    if($textlen<=$SingleMax){
        $TotalSegment = 1;
    }else{
        $TotalSegment = ceil($textlen/$ConcatMax);
    }
    
    if($TotalSegment>$MaxSegments) return false; //SMS is very big.
    return $TotalSegment;
    }
    
    function isGsm7bit($text){
    $gsm7bitChars = "\\\@£\$¥èéùìòÇ\nØø\rÅåΔ_ΦΓΛΩΠΨΣΘΞÆæßÉ !\"#¤%&'()*+,-./0123456789:;<=>?¡ABCDEFGHIJKLMNOPQRSTUVWXYZÄÖÑܧ¿abcdefghijklmnopqrstuvwxyzäöñüà^{}[~]|€";
    $textlen = mb_strlen($text);
    for ($i = 0;$i < $textlen; $i++){
        if ((strpos($gsm7bitChars, $text[$i])==false) && ($text[$i]!="\\")){return false;} //strpos not able to detect \ in string
    }
    return true;
    }
    

3 个答案:

答案 0 :(得分:4)

我在这里添加了一些额外的信息,因为之前的答案并不完全正确。

以下是问题:

  • 需要指定当前字符串编码为mb_string,否则可能会错误地收集
  • 在7位GSM编码中,基本字符集扩展字符(^ {} \ [〜] |€)每个需要14位进行编码,因此它们分别计为两个字符。
  • 在UCS-2编码中,您必须警惕16位BMP之外的表情符号和其他字符,因为......
  • 使用UCS-2的GSM计算16位字符,因此如果您有一个字符(U + 1F4A9),并且您的运营商和电话偷偷地支持UTF-16而不仅仅是UCS-2,它将被编码为代理UTF-16中的一对16位字符,因此被计为字符串长度的两个16位字符。 SELECT * FROM table where rowid = $id 只会将此视为单个字符。

如何计算7位字符:

到目前为止,我提出了以下几点来计算7位字符:

mb_strlen

如何计算16位字符:

  • 将字符串转换为UTF-16表示形式(以使用// Internal encoding must be set to UTF-8, // and the input string must be UTF-8 encoded for this to work correctly protected function count_gsm_string($str) { // Basic GSM character set (one 7-bit encoded char each) $gsm_7bit_basic = "@£$¥èéùìòÇ\nØø\rÅåΔ_ΦΓΛΩΠΨΣΘΞÆæßÉ !\"#¤%&'()*+,-./0123456789:;<=>?¡ABCDEFGHIJKLMNOPQRSTUVWXYZÄÖÑܧ¿abcdefghijklmnopqrstuvwxyzäöñüà"; // Extended set (requires escape code before character thus 2x7-bit encodings per) $gsm_7bit_extended = "^{}\\[~]|€"; $len = 0; for($i = 0; $i < mb_strlen($str); $i++) { if(mb_strpos($gsm_7bit_basic, $str[$i]) !== FALSE) { $len++; } else if(mb_strpos($gsm_7bit_extended, $str[$i]) !== FALSE) { $len += 2; } else { return -1; // cannot be encoded as GSM, immediately return -1 } } return $len; } 保留表情符号字符。
    • 转换为UCS-2,因为mb_convert_encoding($str, 'UTF-16', 'UTF-8')这是有损的。)
  • 使用mb_convert_encoding计算字节并除以2以获得计入GSM多部分长度的UCS-2 16位字符数

*告诫emptor,关于计算字节的一个词:

  • 不要使用count(unpack('C*', $utf16str))来计算字节数。虽然它可能有效,但strlen经常在具有多字节版本的PHP安装中过载,并且在将来也是API更改的候选者
  • 避免使用strlen 。虽然它当前工作,并且正确地返回2,因为它看起来像一堆poo字符(因为它看起来像两个16位UCS-2字符),它的稳定性mb_strlen($str, 'UCS-2')是有损的从&gt; 16位转换为UCS-2时。谁能说mb_strlen将来不会有损?
  • 避免使用mb_convert_encoding 。它也当前有效,并建议在PHP文档注释中作为计算字节的方法。但IMO遇到的问题与上述UCS-2技术相同。
  • 将最安全的当前方式(IMO)保留为mb_strlen($str, '8bit') / 2字节数组,并计算出来。

那么,这看起来像什么?

unpack

全部放在一起:

// Internal encoding must be set to UTF-8,
// and the input string must be UTF-8 encoded for this to work correctly
protected function count_ucs2_string($str)
{
    $utf16str = mb_convert_encoding($str, 'UTF-16', 'UTF-8');
    // C* option gives an unsigned 16-bit integer representation of each byte
    // which option you choose doesn't actually matter as long as you get one value per byte
    $byteArray = unpack('C*', $utf16str);
    return count($byteArray) / 2;
}

将其转变为图书馆......

https://bitbucket.org/solvam/smstools

答案 1 :(得分:3)

我到目前为止的最佳解决方案:

$text = '\@£$¥èéùìòÇØøÅåΔ_ΦΓΛΩΠΨΣΘΞÆæßÉ -./0123456789:;<=>?¡ABCDEFGHIJKLMNOPQRSTUVWXYZÄÖÑܧ¿abcdefghijklmnopqrstuvwxyzäöñüà^{}[~]|€' ; //"\\". //'"';//' ';

print $text . "\n";
print isGsm7bit($text). "\n";
print getNumberOfSMSsegments($text). "\n";

function getNumberOfSMSsegments($text,$MaxSegments=6){
/*
http://en.wikipedia.org/wiki/SMS

Larger content (concatenated SMS, multipart or segmented SMS, or "long SMS") can be sent using multiple messages, 
in which case each message will start with a user data header (UDH) containing segmentation information. 
Since UDH is part of the payload, the number of available characters per segment is lower: 
153 for 7-bit encoding, 
134 for 8-bit encoding and 
67 for 16-bit encoding. 
The receiving handset is then responsible for reassembling the message and presenting it to the user as one long message. 
While the standard theoretically permits up to 255 segments,[35] 6 to 8 segment messages are the practical maximum, 
and long messages are often billed as equivalent to multiple SMS messages. See concatenated SMS for more information. 
Some providers have offered length-oriented pricing schemes for messages, however, the phenomenon is disappearing.
*/
$TotalSegment=0;
$textlen = mb_strlen($text);
if($textlen==0) return false; //I can see most mobile devices will not allow you to send empty sms, with this check we make sure we don't allow empty SMS

if(isGsm7bit($text)){ //7-bit
    $SingleMax=160;
    $ConcatMax=153;
}else{ //UCS-2 Encoding (16-bit)
    $SingleMax=70;
    $ConcatMax=67;
}

if($textlen<=$SingleMax){
    $TotalSegment = 1;
}else{
    $TotalSegment = ceil($textlen/$ConcatMax);
}

if($TotalSegment>$MaxSegments) return false; //SMS is very big.
return $TotalSegment;
}

function isGsm7bit($text){
$gsm7bitChars = "\\\@£\$¥èéùìòÇ\nØø\rÅåΔ_ΦΓΛΩΠΨΣΘΞÆæßÉ !\"#¤%&'()*+,-./0123456789:;<=>?¡ABCDEFGHIJKLMNOPQRSTUVWXYZÄÖÑܧ¿abcdefghijklmnopqrstuvwxyzäöñüà^{}[~]|€";
$textlen = mb_strlen($text);
for ($i = 0;$i < $textlen; $i++){
    if ((strpos($gsm7bitChars, $text[$i])==false) && ($text[$i]!="\\")){return false;} //strpos not     able to detect \ in string
}
return true;
}

答案 2 :(得分:0)

  • 第1页:160字节
  • 第2页:146个字节
  • 第3页:153个字节
  • 第4页:153个字节
  • 第5页:153个字节,....

因此,无论使用哪种语言:

// strlen($text) show bytes 

           $count = 0;
           $len = strlen($text);
                if ($len > 306) {
                    $len = $len - 306;
                    $count = floor($len / 153) + 3;
                } else if($len>160){
                    $count = 2;
                }else{
                    $count = 1;
                }