在T-SQL中反序列化PHP序列化数据

时间:2012-08-14 15:28:16

标签: php tsql magento serialization

我正在尝试从Magento订单中提取礼品卡代码。其他一些代码使用Magento API从Magento检索订单信息为XML,并将XML插入到MS SQL Server记录中。使用T-SQL我可以使用XML函数来解析从Magento API检索到的XML,并获得我所需要的几乎所有内容,但实际存储卡代码存储的唯一位置是gift_cards字段,恰好是php序列化字符串。

示例:

a:1:{i:0;a:5:{s:1:"i";s:1:"1";s:1:"c";s:12:"00XCY8S3ZXCU";s:1:"a";d:119;s:2:"ba";d:119;s:10:"authorized";d:119;}}
a:3:{i:0;a:5:{s:1:"i";s:2:"10";s:1:"c";s:12:"045EMJJWRCF1";s:1:"a";d:100;s:2:"ba";d:100;s:10:"authorized";d:100;}i:1;a:5:{s:1:"i";s:2:"11";s:1:"c";s:12:"06DUQ7Z5GVT7";s:1:"a";d:101;s:2:"ba";d:101;s:10:"authorized";d:101;}i:2;a:5:{s:1:"i";s:2:"12";s:1:"c";s:12:"07A6MRYW511J";s:1:"a";d:102;s:2:"ba";d:102;s:10:"authorized";d:102;}}

礼品卡代码是数组中带有“c”键的值,例如: 00XCY8S3ZXCU 045EMJJWRCF1 06DUQ7Z5GVT7 07A6MRYW511J

我目前正试图通过使用T-SQL函数解析值来解决这个问题,就像试图用螺丝刀驱动钉子一样。显然在here之前已经问过这个问题,唯一的建议是在T-SQL中从头开始构建一个解析器,但使用PHP来反序列化它是更好的选择。

如果Magento没有将PHP序列化数据存储在他们的数据库中,然后将其服务仍然在他们的Web服务中序列化,那将是很好的,但这就是我必须使用的。我会考虑使用C#将其转换并将其存储为数据库中的单独字段,但是能够在T-SQL中解析数据会更方便。如果我使用C#来解析和反序列化PHP对象,我可能会将它作为XML存储在数据库中,因为这是一种更好的格式来交换数据。

2 个答案:

答案 0 :(得分:2)

这是我能够想出来的。我受到了post about parsing JSON的鼓舞,并决定将其用于序列化的php对象。我采取了完全不同的方法。

更新的代码示例为now published here on github

序列化的php字符串:

a:3:{
  i:0;
  a:5:{
    s:1:"i";
    s:2:"10";

    s:1:"c";
    s:12:"045EMJJWRCF1";

    s:1:"a";
    d:100;

    s:2:"ba";
    d:100;

    s:10:"authorized";
    d:100;
  }

  i:1;
  a:5:{
    s:1:"i";
    s:2:"11";

    s:1:"c";
    s:12:"06DUQ7Z5GVT7";

    s:1:"a";
    d:101;

    s:2:"ba";
    d:101;

    s:10:"authorized";
    d:101;
  }

  i:2;
  a:5:{
    s:1:"i";
    s:2:"12";

    s:1:"c";
    s:12:"07A6MRYW511J";

    s:1:"a";
    d:102;

    s:2:"ba";
    d:102;

    s:10:"authorized";
    d:102;
  }
}

我的查询以获得结果:

select *
from parsePhpSerializedString('a:3:{i:0;a:5:{s:1:"i";s:2:"10";s:1:"c";s:12:"045EMJJWRCF1";s:1:"a";d:100;s:2:"ba";d:100;s:10:"authorized";d:100;}i:1;a:5:{s:1:"i";s:2:"11";s:1:"c";s:12:"06DUQ7Z5GVT7";s:1:"a";d:101;s:2:"ba";d:101;s:10:"authorized";d:101;}i:2;a:5:{s:1:"i";s:2:"12";s:1:"c";s:12:"07A6MRYW511J";s:1:"a";d:102;s:2:"ba";d:102;s:10:"authorized";d:102;}}')

查询结果:

element_id  parent_id   var_name                                           var_type                                           var_length  value_int   value_string                                                                                                                                                                                                                                                     value_decimal
----------- ----------- -------------------------------------------------- -------------------------------------------------- ----------- ----------- ---------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------- ---------------------------------------
1           0           NULL                                               a                                                  3           NULL        NULL                                                                                                                                                                                                                                                             NULL
2           1           0                                                  a                                                  5           NULL        NULL                                                                                                                                                                                                                                                             NULL
3           1           1                                                  a                                                  5           NULL        NULL                                                                                                                                                                                                                                                             NULL
4           1           2                                                  a                                                  5           NULL        NULL                                                                                                                                                                                                                                                             NULL
5           2           i                                                  s                                                  2           NULL        10                                                                                                                                                                                                                                                               NULL
6           2           c                                                  s                                                  12          NULL        045EMJJWRCF1                                                                                                                                                                                                                                                     NULL
7           2           a                                                  d                                                  NULL        NULL        NULL                                                                                                                                                                                                                                                             100
8           2           ba                                                 d                                                  NULL        NULL        NULL                                                                                                                                                                                                                                                             100
9           2           authorized                                         d                                                  NULL        NULL        NULL                                                                                                                                                                                                                                                             100
10          3           i                                                  s                                                  2           NULL        11                                                                                                                                                                                                                                                               NULL
11          3           c                                                  s                                                  12          NULL        06DUQ7Z5GVT7                                                                                                                                                                                                                                                     NULL
12          3           a                                                  d                                                  NULL        NULL        NULL                                                                                                                                                                                                                                                             101
13          3           ba                                                 d                                                  NULL        NULL        NULL                                                                                                                                                                                                                                                             101
14          3           authorized                                         d                                                  NULL        NULL        NULL                                                                                                                                                                                                                                                             101
15          4           i                                                  s                                                  2           NULL        12                                                                                                                                                                                                                                                               NULL
16          4           c                                                  s                                                  12          NULL        07A6MRYW511J                                                                                                                                                                                                                                                     NULL
17          4           a                                                  d                                                  NULL        NULL        NULL                                                                                                                                                                                                                                                             102
18          4           ba                                                 d                                                  NULL        NULL        NULL                                                                                                                                                                                                                                                             102
19          4           authorized                                         d                                                  NULL        NULL        NULL                                                                                                                                                                                                                                                             102

如果我只是想要礼品卡代码,我可以写一个这样的查询:

select value_string
from parsePhpSerializedString('a:3:{i:0;a:5:{s:1:"i";s:2:"10";s:1:"c";s:12:"045EMJJWRCF1";s:1:"a";d:100;s:2:"ba";d:100;s:10:"authorized";d:100;}i:1;a:5:{s:1:"i";s:2:"11";s:1:"c";s:12:"06DUQ7Z5GVT7";s:1:"a";d:101;s:2:"ba";d:101;s:10:"authorized";d:101;}i:2;a:5:{s:1:"i";s:2:"12";s:1:"c";s:12:"07A6MRYW511J";s:1:"a";d:102;s:2:"ba";d:102;s:10:"authorized";d:102;}}')
where   parent_id != 0 and
        var_name = 'c'

结果:

value_string
-------------
045EMJJWRCF1
06DUQ7Z5GVT7
07A6MRYW511J

以下是用于解析序列化PHP字符串的T-SQL函数:

IF OBJECT_ID (N'dbo.parsePhpSerializedString') IS NOT NULL
   DROP FUNCTION dbo.parsePhpSerializedString
GO
CREATE FUNCTION dbo.parsePhpSerializedString( @phpSerialized VARCHAR(MAX))
RETURNS @results table 
    (
        element_id int identity(1,1) not null, /* internal surrogate primary key gives the order of parsing and the list order */
        parent_id int, /* if the element has a parent then it is in this column. */
        var_name varchar(50), /* the name or key of the element in a key/value array list */
        var_type varchar(50),
        var_length int,
        value_int int,
        value_string varchar(max),
        value_decimal numeric
    )
AS
BEGIN

    /*
    Built by Matt Johnson (matt@evdat.com) 2012-08-14
    */

    -- we use this table later for collecting auto generated
    -- identity values when inserting records into @results
    declare @insertedIds table (
        element_id int
    )

    -- define variables
    declare @element_start int
    declare @var_type_end int
    declare @var_type varchar(50)
    declare @element_end int
    declare @chunk varchar(max)
    declare @var_length_start int
    declare @var_length_end int
    declare @var_length_string varchar(max)
    declare @var_length int
    declare @value_start int
    declare @value_end int
    declare @value_string varchar(max)
    declare @value_int int
    declare @value_decimal numeric
    declare @array_level int
    declare @value_string_position int
    declare @next_open int
    declare @next_close int
    declare @parent_id int
    declare @element_id int
    declare @key_element_id int
    declare @inserted_element_id int
    declare @var_name varchar(50)

    --initialize variables
    set @parent_id = 0


    --loop through the supplied @phpSerialized string until it's empty
    while 1=1 begin
        set @element_start = null
        set @var_type_end = null
        set @var_type = null
        set @element_end = null
        set @chunk = null
        set @var_length_start = null
        set @var_length_end = null
        set @var_length_string = null
        set @var_length = null
        set @value_start = null
        set @value_end = null
        set @value_string = null
        set @value_int = null
        set @value_decimal = null
        set @array_level = null
        set @value_string_position = null
        set @next_open = null
        set @next_close = null
        set @var_name = null

        --confirm that there is an element to parse and define its starting point
        --patindex will return a value of 1 if the pattern is found and this pattern
        --will only match if the element starting point is the first character in the
        --supplied string. If it is encapsulated in quotes or anything else it will not match
        set @element_start = patindex('[asid]:%[;}]', @phpSerialized)

        if @element_start <= 0 begin
            --if the supplied string is now empty check the existing results table
            --for any nested elements in any array elements

            --reset the value of @element_id to be safe
            set @element_id = null

            --only retrieve the first element found containing sub elements to parse
            select  top 1 
                    @phpSerialized = value_string,
                    @element_id = element_id    
            from @results 
            where   var_type = 'a' and 
                    value_string is not null

            --set the parent_id to the array's element_id
            set @parent_id = @element_id

            --if there were no results found then that means there either
            --were no arrays to parse, or all arrays have already been parsed
            --so break the continuous loop because we are completely done now
            if @element_id is null break

            --set the @element_start again now that we 
            --have a new string to parse for elements
            set @element_start = patindex('[asid]:%[;}]', @phpSerialized)
        end

        --find the end of the type of the element then extract the variable type from the string
        set @var_type_end = patindex('%:%', @phpSerialized)
        set @var_type = substring(@phpSerialized, @element_start, @var_type_end-@element_start)

        --generate an error if a variable type is supplied that hasn't been coded for.
        if @var_type not like '[asid]' begin
            /*
            print @var_type
            RAISERROR (N'Error parsing php serialized string. Variable type found that has not been defined to parse for.', -- Message text.
                       16, -- Severity,
                       1 -- State
                       )
            */

            --apparently errors can't be raised within a function so skip the element
            break
        end

        --array elements contain sub elements so we use different methods for parsing
        --sub elements than we do for parsing individual elements.
        if @var_type != 'a' begin
            --element has no sub elements

            --determine the end of this individual element and then extract 
            --only this individual element from the string
            set @element_end = patindex('%;%', @phpSerialized)+1
            set @chunk = substring(@phpSerialized, @element_start, @element_end-@element_start)

            --strings are serialized differently than numeric elements
            if @var_type = 's' begin
                --element has var length

                --find the starting and ending positions for the var_length and then extract the length
                set @var_length_start = @var_type_end+1
                set @var_length_end = patindex('%:%', substring(@chunk, @var_length_start, len(@chunk))) + @var_length_start - 1
                set @var_length_string = substring(@chunk, @var_length_start, @var_length_end-@var_length_start)
                if @var_length_string not like '[^0-9]' begin
                    --its nice to verify this is actually a number before casting it as such
                    set @var_length = cast(@var_length_string as int)
                end

                --find the starting and ending positions for the value and then extract the value
                set @value_start = @var_length_end+1
                set @value_end = patindex('%;%', @chunk)
                --a string value is quoted so remove quotes in start and end of substring for value
                --we set the substring starting position +1 just past the start of the quote and then
                --set the length of the extracted string -2 to account for both the starting quote and 
                --ending quote to be removed from the extracted string.
                set @value_string = substring(@chunk, @value_start+1, @value_end-@value_start-2)

            end else begin
                --element does not have a var length

                --find the starting and ending positions for the value and then extract the value as a string
                set @value_start = @var_type_end+1
                set @value_end = patindex('%;%', @chunk)
                set @value_string = substring(@chunk, @value_start, @value_end-@value_start)

                --determine what value type the string should be converted to
                if @var_type = 'i' begin
                    if @value_string not like '[^0-9.]' begin
                        set @value_int = cast(@value_string as int)
                        --clear the value_string because the element's value has been converted to its appropriate type
                        set @value_string = null
                    end
                end else if @var_type = 'd' begin
                    if @value_string not like '[^0-9.]' begin
                        set @value_decimal = cast(@value_string as numeric)
                        --clear the value_string because the element's value has been converted to its appropriate type
                        set @value_string = null
                    end
                end

            end


        end else begin
            --element is array and has sub elements

            --we are going to chop up the string to try and determine its end so we'll
            --first set the string to a variable we can destroy in this process
            set @chunk = @phpSerialized

            --find the starting and ending positions for the var_length and then extract the length
            --arrays use this to state how may elements this array contains
            set @var_length_start = @var_type_end+1
            set @var_length_end = patindex('%:%', substring(@chunk, @var_length_start, len(@chunk))) + @var_length_start - 1
            set @var_length_string = substring(@chunk, @var_length_start, @var_length_end-@var_length_start)
            if @var_length_string not like '[^0-9]' begin
                set @var_length = cast(@var_length_string as int)
            end

            --find the value starting position
            --later we will find the true end of the value
            set @value_start = @var_length_end+1

            -- to determine the ending position we have to dig through the sub elements and track the
            -- nested level to identify the ending brace for this level
            set @array_level = 0
            --we start the string position at 1 for the begining of the serialized string
            set @value_string_position = 1

            -- loop through the value chopping up the chunk while trying to find the ending brace for this array
            while 1=1 begin

                --find the next open and close braces in the chunk
                set @next_open = patindex('%{%', @chunk)
                set @next_close = patindex('%}%', @chunk)

                --check to see which brace is the next in the chunk
                if @next_open > 0 and @next_open < @next_close begin
                    --found an opening brace

                    --since this is an opening brace we need to increment the level and strip off
                    --everything from the chunk before the brace so that we can search for additional braces
                    --we also note the position in the string for use in finding the end of the value later
                    --we track the previous position and add to it because we keep chopping off the beginning of
                    --the chunk as we parse through the string, and later we will need to reference the position
                    --relative to the entire serialized string.
                    set @value_string_position = @value_string_position + patindex('%{%', @chunk)-1
                    set @chunk = substring(@chunk, patindex('%{%', @chunk)+1, len(@chunk))
                    set @array_level = @array_level + 1

                end else if @next_close > 0 begin
                    --found a closing brace
                    --print 'found close at level: ' + cast(@array_level as varchar(10)) + '(' + cast(patindex('%}%', @chunk) as varchar(10)) + ')'

                    --since this is a closing brace we need to decrement the level and strip off
                    --everything from the chunk before the brace so that we can search for additional braces
                    --we also note the position in the string for use in finding the end of the value later
                    --we track the previous position and add to it because we keep chopping off the beginning of
                    --the chunk as we parse through the string, and later we will need to reference the position
                    --relative to the entire serialized string.
                    set @value_string_position = @value_string_position + patindex('%}%', @chunk)+1
                    set @chunk = substring(@chunk, patindex('%}%', @chunk)+1, len(@chunk))
                    set @array_level = @array_level - 1

                end else break

                --once we get back to level 0 we know we've found the end of this array element
                --so break the continuous loop now that we have the ending position
                if @array_level <= 0 break
            end

            --set the ending position of the element and the value since the value is the last part of the element
            set @element_end = @value_string_position
            set @value_end = @element_end
            --an array value is surrounded by braces so remove the braces in start and end of the substring value
            --we set the substring starting position +1 just past the start of the opening brace and then
            --set the length of the extracted string -2 to account for both the opening brace and 
            --closing brace to be removed from the extracted string.
            set @value_string = substring(@phpSerialized, @value_start+1, @value_end-@value_start-2)
            set @chunk = substring(@phpSerialized, @element_start, @element_end-@element_start)

            -- if the array is empty just set it to null so that
            -- we don't try and parse the contents of the array value later.
            if @value_string = '' set @value_string = null
        end

        --we populate the results table differently depending on the element that is being parsed. 
        --Any element contained in an array has a key element and a value element. Though we parse
        --all key elements in an array all we do with them in the results table is set their value
        --as the var_name (key) for the element, and store the key_element_id for setting the value
        --in the next pass since key/value pairs are listed sequentially in a serialized string.
        if @parent_id > 0 and @key_element_id > 0 begin
            --parent_id > 0 indicates this is a sub element inside an array
            --and the array contents currently being parsed contain key elements and value
            --elements sequentially in the serialized string. Because key_element_id is not 0
            --we know this must be the value part of the element contained in the array as the
            --key was just previously defined.

            --update the existing keyed element with it's type, length and value
            update @results
            set var_type = @var_type, 
                var_length = @var_length, 
                value_string = @value_string, 
                value_int = @value_int, 
                value_decimal = @value_decimal
            where element_id = @key_element_id      

            set @key_element_id = null
        end else if @parent_id > 0 begin
            --this element is a part of the array which contains key/value paris and since
            --the @key_element_id is 0 or not defined we can asume this is the key of the pair

            --determine what the key element type is and cast it as a string to the var_name
            if @var_type = 'i' set @var_name = cast(@value_int as varchar(50))
            if @var_type = 'd' set @var_name = cast(@value_decimal as varchar(50))
            if @var_type = 's' set @var_name = @value_string

            --here we are assuming that all keys should be of type s, d, or i... so we if get anything
            --other than that throw an error since this code is not setup to handle that.
            if @var_type not like '[sid]' begin
                /*
                print @var_type
                RAISERROR (N'Error parsing php serialized string. Variable type found that should be akey but is not of type s, d, or i.', -- Message text.
                           16, -- Severity,
                           1 -- State
                           )
                */

                --apparently errors can't be raised within a function so skip the element
                break
            end

            --insert a new record into the results table defining the parent_id and var_name
            insert @results 
            (
                parent_id,
                var_name
            ) 
            output inserted.element_id into @insertedIds 
            values (
                @parent_id,
                @var_name
            )

            --since we stored the identity value in the output set that value to @inserted_element_id
            select top 1 @inserted_element_id = element_id from @insertedIds
            delete from @insertedIds

            --set the key_element_id so that the next pass catches the value and assigns it to this result record
            set @key_element_id = @inserted_element_id

        end else begin
            --this will be executed for parsed strings that are not part of an array
            --in which case the elements do not contain value_name keys

            --insert the entire element details into the results table
            insert into @results 
            (
                parent_id, 
                var_type, 
                var_length, 
                value_string, 
                value_int, 
                value_decimal
            ) 
            output inserted.element_id into @insertedIds 
            values (
                @parent_id,
                @var_type,
                @var_length,
                @value_string,
                @value_int,
                @value_decimal
            )

            --here we capture the identiy value for the inserted record
            select top 1 @inserted_element_id = element_id from @insertedIds
            delete from @insertedIds
        end

        -- if the current php serilized string was an array then it would have
        -- been parsed and elements added to the results, so we should remove
        -- the value_string from the array element in the results table so that
        -- it doesn't get parsed again in the loop. 
        if @element_id is not null begin
            --the only strings that get parsed where the @element_id has
            --a value is from an array element
            update @results
            set value_string = null
            where element_id = @element_id
        end

        --since we have parsed this element from the serialized string chop off this element
        --from the string and run the rest of it through the loop again to ensure all
        --elements have been parsed from the supplied serialized string.
        set @phpSerialized = substring(@phpSerialized, @element_end, len(@phpSerialized))

    end

    return
end

尽管它没有考虑所有序列化类型等,但这对我的目的来说已经足够了,所包含的评论应该有助于指导任何需要根据自己的需要进行调整或扩展它的人#&# 39;功能。

答案 1 :(得分:0)

这是另一个MSSQL函数,您可以使用它从PHP序列化字符串中提取令牌的第一个实例的值,这些函数通常位于Magento的数据库中。该函数接受要提取的值的属性名称,以及要从中解析的序列化值:

CREATE FUNCTION extractValueFromSerializedPhpString(
    @attribute_name nvarchar(100), @serialized_value nvarchar(max)
) RETURNS nvarchar(1024)
BEGIN
    DECLARE @attribute_value_length int, @attribute_value_length_string nvarchar(5)
    SELECT @attribute_value_length_string = REPLACE(REPLACE(SUBSTRING(@serialized_value,CHARINDEX(@attribute_name,@serialized_value)+LEN(@attribute_name)+4,3),':',''),'"','')
    IF ISNUMERIC(@attribute_value_length_string) = 1 BEGIN
        SELECT @attribute_value_length = CAST(@attribute_value_length_string AS int)
        RETURN SUBSTRING(@serialized_value,CHARINDEX(@attribute_name,@serialized_value)+LEN(@attribute_name)+CASE WHEN @attribute_value_length > 99 THEN 9 WHEN @attribute_value_length > 9 THEN 8 ELSE 7 END,@attribute_value_length)
    END
    RETURN NULL
END

请注意,这仅适用于长度为1到999的字符串值,并且可能还有各种其他恼人的边缘情况,但它适用于我的(简单)目的。