Interpreting binary as 2-bit pairs

Posted: 2014-03-12 13:34:57

Tags: vb.net binary

I have code that converts text (from Result.Text) to binary, one 8-bit value per line:

Dim Resultconvert As String = String.Empty
For Each C As Char In Result.Text
    ' 8-digit binary form of the character code, e.g. "5" -> "00110101"
    Dim s As String = System.Convert.ToString(AscW(C), 2).PadLeft(8, "0"c)
    Debug.Print(s)
    Resultconvert &= s
Next

Output:

00110101
01000111
01010100
00111111
00101111
01111010
01100100
00111011
00101010

But now I need it to interpret the string as 2-bit pairs:

00=A, 01=T, 10=G, 11=C

so the output above would be interpreted and printed to the console as:

ACTT
TATC
TTTA
ACCC
AGCC
TCGG
TGTA
ACGC
AGGG

So basically, I want to convert the 0s and 1s into a quaternary (base-4) system written with the letters A, G, C and T.
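For illustration only (this is not the asker's code): because 00, 01, 10 and 11 are the numbers 0 through 3, the requested mapping amounts to "use each pair's value as an index into the string ATGC". The sketch below assumes the 8-digit strings produced by the loop above; the module and function names (`PairLookupDemo`, `BinaryToBases`) are hypothetical.

```vbnet
Imports System
Imports Microsoft.VisualBasic

Module PairLookupDemo
    ' Hypothetical helper: maps an 8-digit binary string such as "00110101"
    ' to four bases. Convert.ToInt32(pair, 2) parses a 2-digit binary pair
    ' into 0..3, which then indexes into "ATGC" (00=A, 01=T, 10=G, 11=C).
    Function BinaryToBases(bits As String) As String
        Dim result As String = ""
        ' Step through the string two digits at a time; an odd trailing digit is ignored.
        For i As Integer = 0 To bits.Length - 2 Step 2
            Dim pairValue As Integer = Convert.ToInt32(bits.Substring(i, 2), 2)
            result &= "ATGC".Chars(pairValue)
        Next
        Return result
    End Function

    Sub Main()
        Dim s As String = Convert.ToString(AscW("5"c), 2).PadLeft(8, "0"c)
        Console.WriteLine(s & " -> " & BinaryToBases(s))  ' prints 00110101 -> ACTT
    End Sub
End Module
```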

If you have any ideas, please let me know. Any help is appreciated. Thanks in advance.

2 Answers:

Answer 0 (score: 1)

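Ah, DNA. Cool. One thing about real DNA is that the strands are huge, so with real data, performance definitely matters. With that in mind, I think your best bet is to build your own decoder as a switch/state machine that reads from a System.IO.StringReader. Avoid parsing with ReadLine() or Split(), because those end up reading the same data twice. I'm thinking of something like this (warning: untested, typed directly into the reply box):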

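Imports System.IO
Imports System.Text

' Assumes every input line holds exactly 8 binary digits followed by a line ending.
Function ConvertToIUPAC(ByVal data As String) As String
    ' Input line: 8 digits + 2-char line ending (10 chars).
    ' Output line: 4 bases + 2-char line ending (6 chars).
    Dim result As New StringBuilder((data.Length \ 10) * 6)

    Dim pair(1) As Char
    Using rdr As New StringReader(data)
        ' Checking Peek() first means empty input produces no output.
        Do While rdr.Peek() <> -1
            For i As Integer = 0 To 3
                ' Read the next two binary digits; stop if the input ends early.
                If rdr.ReadBlock(pair, 0, 2) < 2 Then Exit For
                If pair(0) = "0"c Then
                    If pair(1) = "0"c Then
                        result.Append("A"c)  ' 00
                    Else
                        result.Append("T"c)  ' 01
                    End If
                Else
                    If pair(1) = "0"c Then
                        result.Append("G"c)  ' 10
                    Else
                        result.Append("C"c)  ' 11
                    End If
                End If
            Next
            result.Append(vbCrLf)
            ' Consume whatever is left on this line (its line ending).
            rdr.ReadLine()
        Loop
    End Using
    Return result.ToString()
End Function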
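A minimal sketch of the single-pass state machine described above, assuming the input is text made of 0/1 digits with arbitrary line endings. Instead of relying on a fixed 2-character line ending, it reads one character at a time, skips anything that is not a 0 or 1, and keeps at most one pending bit as its only state. `StreamingDecoder`, `DecodeBinaryText` and the `basesPerLine` parameter are hypothetical names chosen for this example.

```vbnet
Imports System
Imports System.IO
Imports System.Text
Imports Microsoft.VisualBasic

Module StreamingDecoder
    ' Hypothetical single-pass decoder: reads one character at a time,
    ' ignores anything that is not "0" or "1" (such as line endings), and
    ' emits a base after every second digit. A new output line is started
    ' after every basesPerLine bases.
    Function DecodeBinaryText(data As String, Optional basesPerLine As Integer = 4) As String
        Const Lookup As String = "ATGC"   ' index = pair value: 00=A, 01=T, 10=G, 11=C
        Dim result As New StringBuilder(data.Length \ 2)
        Dim pending As Integer = -1       ' -1 means no first bit is buffered yet
        Dim emitted As Integer = 0
        Using rdr As New StringReader(data)
            Dim ch As Integer = rdr.Read()
            While ch <> -1
                If ch = AscW("0"c) OrElse ch = AscW("1"c) Then
                    Dim bit As Integer = ch - AscW("0"c)
                    If pending = -1 Then
                        pending = bit                      ' first bit of the pair
                    Else
                        result.Append(Lookup.Chars(pending * 2 + bit))
                        pending = -1
                        emitted += 1
                        If emitted Mod basesPerLine = 0 Then result.AppendLine()
                    End If
                End If
                ch = rdr.Read()
            End While
        End Using
        Return result.ToString()
    End Function

    Sub Main()
        ' Mixed CRLF and LF line endings are both skipped.
        Dim input As String = "00110101" & vbCrLf & "01000111" & vbLf & "01010100"
        Console.Write(DecodeBinaryText(input))  ' prints ACTT, TATC, TTTA on separate lines
    End Sub
End Module
```

Because it never re-reads a line and does not depend on the exact length of the line ending, it handles LF, CRLF, or no line breaks at all.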

Answer 1 (score: 1)

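First, you need to split the string containing the binary number into fixed-length (2-character) chunks. Unfortunately, there is no built-in function that splits a string into fixed-width columns. However, it is fairly easy to write your own method, for example: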

Public Function SplitStringByLength(value As String, length As Integer) As String()
    Dim result((value.Length \ length) - 1) As String
    For i As Integer = 0 To result.Length - 1
        result(i) = value.Substring((i * length), length)
    Next
    Return result
End Function
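If you are on VB 2012 or later (which added `Iterator` functions and `Yield`), a lazy variant of `SplitStringByLength` is also possible. This is an assumed alternative, not the answerer's code, and `SplitDemo`/`SplitByLength` are hypothetical names. It yields each chunk on demand instead of allocating an array up front and, like the array version, drops a trailing remainder shorter than `length`.

```vbnet
Imports System
Imports System.Collections.Generic

Module SplitDemo
    ' Hypothetical lazy variant of SplitStringByLength.
    ' length must be positive; a shorter trailing remainder is dropped.
    Iterator Function SplitByLength(value As String, length As Integer) As IEnumerable(Of String)
        If length <= 0 Then Throw New ArgumentOutOfRangeException("length")
        For i As Integer = 0 To value.Length - length Step length
            Yield value.Substring(i, length)
        Next
    End Function

    Sub Main()
        For Each pair As String In SplitByLength("00110101", 2)
            Console.Write(pair & " ")   ' prints 00 11 01 01
        Next
        Console.WriteLine()
    End Sub
End Module
```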

Then you can call that method and convert all the binary pairs like this:

Dim Resultconvert As String = String.Empty
For Each C As Char In Result.Text
    Dim s As String = System.Convert.ToString(AscW(C), 2).PadLeft(8, "0"c)
    Dim quaternary As String = ""
    For Each pair As String In SplitStringByLength(s, 2)
        Select Case pair
            Case "00": quaternary &= "A"
            Case "01": quaternary &= "T"
            Case "10": quaternary &= "G"
            Case "11": quaternary &= "C"
        End Select
    Next
    ' Print and collect the bases, not the binary string
    Debug.Print(quaternary)
    Resultconvert &= quaternary
Next

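However, converting the number into a string of binary digits and then parsing that string is fairly inefficient and unnecessary. The number is already stored in memory in binary, so with a little bitwise manipulation you can do the same thing without converting to a string at all. For example, if you have a method like this: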

Public Function ToQuaternary(value As Integer) As String
    Select Case value
        Case 0 : Return "A"  ' binary 00
        Case 1 : Return "T"  ' binary 01
        Case 2 : Return "G"  ' binary 10
        Case 3 : Return "C"  ' binary 11
        Case Else : Return Nothing
    End Select
End Function

Then you can do something like this:

Dim builder As New System.Text.StringBuilder()
For Each c As Char In Result.Text
    Dim charValue As Integer = AscW(c)
    ' Extract the four 2-bit groups, most significant first
    builder.Append(ToQuaternary((charValue >> 6) And 3))
    builder.Append(ToQuaternary((charValue >> 4) And 3))
    builder.Append(ToQuaternary((charValue >> 2) And 3))
    builder.Append(ToQuaternary(charValue And 3))
    builder.AppendLine()
Next
Debug.Print(builder.ToString())
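Putting the bitwise idea together, here is a self-contained sketch (hypothetical names `BitwiseDemo`, `ToNucleotides`) that replaces the `Select Case` with a lookup into the string "ATGC" and loops over the shift amounts instead of writing four separate calls. The `bitsPerChar` parameter is an assumption added here: the default of 8 matches the question's output, while 16 covers any UTF-16 code unit returned by `AscW`.

```vbnet
Imports System
Imports System.Text
Imports Microsoft.VisualBasic

Module BitwiseDemo
    ' The index of each base equals its 2-bit value: 00=A, 01=T, 10=G, 11=C.
    Const Bases As String = "ATGC"

    ' With bitsPerChar = 8, any bits above the lowest 8 (characters with
    ' AscW >= 256) are silently ignored.
    Function ToNucleotides(text As String, Optional bitsPerChar As Integer = 8) As String
        Dim builder As New StringBuilder(text.Length * (bitsPerChar \ 2))
        For Each c As Char In text
            Dim charValue As Integer = AscW(c)
            ' Walk the 2-bit groups from most to least significant.
            For shift As Integer = bitsPerChar - 2 To 0 Step -2
                builder.Append(Bases.Chars((charValue >> shift) And 3))
            Next
        Next
        Return builder.ToString()
    End Function

    Sub Main()
        Console.WriteLine(ToNucleotides("5G"))   ' prints ACTTTATC
    End Sub
End Module
```

This version concatenates the bases; call it once per character (or insert a line break after every `bitsPerChar \ 2` bases) if you want the one-line-per-character layout from the question.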