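Convert binary data to decimal with a batch script

Time: 2018-02-07 15:06:13

Tags: cmd binary decimal data-conversion

Does anyone know of a way - for example a cmd script - to convert 1 GB and 10 GB of binary data to decimal data?

Here are the first 100 characters: 1000110101100100111101011000010100110001011110001101101110000011001000011010110111100110100100111111 ...

1 - What I need first is a way to take the data in 4-character strings and delete every string that would convert to 0 or to a number greater than 9.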

1000 -> (8)
1101 -> (13) = delete
0110 -> (6)
0100 -> (4)
1111 -> (15) = delete
0101 -> (5)
1000 -> (8)
0101 -> (5)
0011 -> (3)
0001 -> (1)
0111 -> (7)
1000 -> (8)
1101 -> (13) = delete
1011 -> (11) = delete
1000 -> (8)
0011 -> (3)
0010 -> (2)
0001 -> (1)
1010 -> (10) = delete
1101 -> (13) = delete
1110 -> (14) = delete
0110 -> (6)
1001 -> (9)
0011 -> (3)
1111 -> (15) = delete
...

After this process, the file original_binary_data.txt must no longer contain any 4-character string that converts to 0 or to a number greater than 9:

10000110010001011000010100110001011110001000001100100001011010010011 ...

2 - After that step, I want to convert the binary data to decimal data. Taking the example above, the result would be:

1000 -> 8
0110 -> 6
0100 -> 4
0101 -> 5
1000 -> 8
0101 -> 5
0011 -> 3
0001 -> 1
0111 -> 7
1000 -> 8
1000 -> 8
0011 -> 3
0010 -> 2
0001 -> 1
0110 -> 6
1001 -> 9
0011 -> 3
...

This should result in a file converted_decimal_data.txt containing the following:

86458531788321693 ...
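For reference, the two steps can be checked against the 100-character sample with a short Python sketch. The function name `filter_and_convert` is an assumption made for illustration and is not part of the question; the sketch only works on an in-memory string, not on a multi-gigabyte file.

```python
# The first 100 characters quoted in the question.
SAMPLE = (
    "1000110101100100111101011000010100110001011110001101"
    "101110000011001000011010110111100110100100111111"
)

def filter_and_convert(bits):
    """Return (kept_bits, digits) for a string of '0'/'1' characters."""
    kept, digits = [], []
    # Walk the input in 4-character groups; a trailing partial group is ignored.
    for i in range(0, len(bits) - len(bits) % 4, 4):
        group = bits[i:i + 4]
        value = int(group, 2)
        # Step 1: drop groups that convert to 0 or to more than 9.
        if 1 <= value <= 9:
            kept.append(group)
            # Step 2: the surviving group becomes one decimal digit.
            digits.append(str(value))
    return "".join(kept), "".join(digits)

kept_bits, digits = filter_and_convert(SAMPLE)
print(kept_bits)
print(digits)
```

On the sample, the second line printed is the digit string shown above.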

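Note: the binary data file contains no characters other than "0" and "1".

The reason I need this is that I need a large amount of random data in the range 1-9 for an important experiment.

3 answers:

Answer 0 (score: 2)

A tested and working solution for PowerShell (about 200 times faster than the batch solution):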

#################################################################################################################
#
# Converts 4 digit binary strings to decimal while sorting out all strings which equal 0 or are greater than 9.
#
# Adjust the source directory and input and output file names (files don't have to be .txt files).
#
$source = "C:\adjust\path"
$input_file = "file_name.extension"
$dec_output = "dec_output.txt"
$bin_output = "bin_output.txt"
#
#
# Using Get-Content on an input file with a size of 1GB or more will cause System.OutOfMemoryExceptions,
# therefore a large file gets temporarily split up.
#
$split_size = 100MB
$echo_ss = $split_size/1MB
#
#
# This adds carriage returns to the temporary file after every 16'384th character. Although the sweet spot is
# somewhere around 18'000 characters, the line length needs to be divisible by 4 and ideally fit exactly n times
# into the temporary file; using 16'384 characters (that is exactly 16 KB) ensures that.
#
$line_length = 16384
#
#
# Thanks @BenN (https://superuser.com/a/1292916/868077)
# Thanks @Bob (https://superuser.com/a/1295082/868077)
#################################################################################################################

Set-Location $source

if (Test-Path $bin_output) {

    $name = (Get-Item $bin_output).Basename
    $ext = (Get-Item $bin_output).Extension
    $num = 1

    while ($num -le 9999) {

        $test = $name+"_"+$num+$ext

        if (Test-Path $test) {

            $num += 1

        } else {

            break

        }

    }

    Rename-Item $bin_output $test
    $a = "`n`n Renamed 'bin_output'!"

}

if (Test-Path $dec_output) {

    $name = (Get-Item $dec_output).Basename
    $ext = (Get-Item $dec_output).Extension
    $num = 1

    while ($num -le 9999) {

        $test = $name+"_"+$num+$ext

        if (Test-Path $test) {

            $num += 1

        } else {

            break

        }

    }

    Rename-Item $dec_output $test
    $b = "`n`n Renamed 'dec_output'!"

}

if (Test-Path ".\_temp") {

    "`n"

    while ($overwrite -ne "true" -and $overwrite -ne "false") {

        $overwrite = Read-Host ' Splitted files already/still exists! Delete and recreate?'

        if ($overwrite -match "y") {

            $overwrite = "true"
            Remove-Item .\_temp -force -recurse
            $c = " Deleted existing splitted files and creating new ones!"

        } elseif ($overwrite -match "n") {

            $overwrite = "false"

        } elseif ($overwrite -match "c") {

            exit

        } else {

            "`n"
            Write-Host " Error: Invalid input!" "`n" " Type 'y' for 'yes'." " Type 'n' for 'no'." " Type 'c' for 'cancel'."
            "`n"
            "`n"

        }

    }

}

Clear-Host

"`n"

while ($delete -ne "true" -and $delete -ne "false") {

    $delete = Read-Host ' Delete splitted files afterwards?'

    if ($delete -match "y") {

        $delete = "true"
        $d = "`n`n Splitted files will be deleted afterwards!"

    } elseif ($delete -match "n") {

        $delete = "false"
        $d = "`n`n Splitted files will not be deleted afterwards!"

    } elseif ($delete -match "c") {

        exit

    } else {

            "`n"
            Write-Host " Error: Invalid input!" "`n" " Type 'y' for 'yes'." " Type 'n' for 'no'." " Type 'c' for 'cancel'."
            "`n"
            "`n"

    }

}

Clear-Host

"`n"; "`n"; "`n"; "`n"; "`n"; "`n"

$a
$b
$d

$start_o = (Get-Date)

if ($overwrite -ne "false") {

    $c
    "`n"
    $start = Get-Date
    New-Item -ItemType directory -Path ".\_temp" >$null 2>&1
    [Environment]::CurrentDirectory = Get-Location
    $bytes = New-Object byte[] 4096
    $in_file = [System.IO.File]::OpenRead($input_file)
    $file_count = 0
    $finished = $false

    if ((Get-Item $input_file).length -gt $split_size) {

        Write-Host " Input file larger than $echo_ss MB!"
        Write-Host "     Splitting input file and inserting carriage returns..."
        $v=([MATH]::Floor([decimal]((Get-Item $input_file).Length/100MB)))
        $sec_rem = -1

        while (!$finished) {

            $perc = [MATH]::Round($file_count/$v*100)
            $file_count++
            $bytes_to_read = $split_size
            $out_file = New-Object System.IO.FileStream ".\_temp\_temp_$file_count.tmp",CreateNew,Write,None

            while ($bytes_to_read) {

                $bytes_read = $in_file.Read($bytes, 0, [Math]::Min($bytes.Length, $bytes_to_read))

                if (!$bytes_read) {

                    $finished = $true
                    break

                }

                $bytes_to_read -= $bytes_read
                $out_file.Write($bytes, 0, $bytes_read)

            }

            if (($i = $file_count-1) -gt 0) {

                (Get-Content ".\_temp\_temp_$i.tmp") -Replace ".{$line_length}", "$&`r`n" | Set-Content ".\_temp\_temp_$i.tmp"

            }

            $out_file.Dispose()
            $sec_elaps = (Get-Date) - $start
            $sec_rem = ($sec_elaps.TotalSeconds/$file_count) * ($v-$file_count+1)
            Write-Progress -Id 1 -Activity "Splitting input file and inserting carriage returns..." -Status "Progress ($perc%):" -PercentComplete ($perc) -SecondsRemaining $sec_rem

        }

        $in_file.Dispose()
        (Get-Content ".\_temp\_temp_$file_count.tmp") -Replace ".{$line_length}", "$&`r`n" | Set-Content ".\_temp\_temp_$file_count.tmp"
        Write-Progress -Id 1 -Activity null -Completed

    } else {

        if ((Get-Item $input_file).length -lt $split_size) {

            " Input file smaller than $echo_ss MB!"

        } else {

            " Input file exactly $echo_ss MB!"

        }

        Write-Host "  Inserting carriage returns..."
        (Get-Content $input_file) -Replace ".{$line_length}", "$&`r`n" | Set-Content ".\_temp\_temp_1.tmp"; $file_count = 1

    }

    $dur = (Get-Date) - $start
    Write-Host "`n     Done! Duration:"$dur.ToString("hh\:mm\:ss\.fff")
    "`n"

} else {

    "`n"
    Write-Host " Continuing with existing files..."
    "`n"
    Get-ChildItem ".\_temp\*" -File -Include *.tmp | ForEach-Object -Process {$file_count++}

}

Write-Host " Converting binary into decimal..."
$sec_rem = -1
$start = Get-Date

Get-ChildItem ".\_temp\*" -File -Include *.tmp | ForEach-object -Process {

    $cur_file++
    $line_count = (Get-Content ".\_temp\_temp_$cur_file.tmp").count

    ForEach ($line in Get-Content ".\_temp\_temp_$cur_file.tmp") {

        $cur_line++
        $perc = [MATH]::Round(($cur_file-1+($cur_line/$line_count))/$file_count*100)
        $n = 0

        if ($line.length -ge 4) {

            while ($n -lt $line.length) {

                $dec = 0
                $bin = $line.substring($n,4)
                $dec = ([Convert]::ToInt32($bin,2))

                if ($dec -gt 0 -and $dec -le 9) {

                    $temp_dec = "$temp_dec$dec"
                    $temp_bin = "$temp_bin$bin"

                }

                $n += 4

            }

        $temp_dec | Add-Content $dec_output -Encoding ascii -NoNewline
        $temp_bin | Add-Content $bin_output -Encoding ascii -NoNewline
        Clear-Variable -Name "temp_dec", "temp_bin"

        }

        $sec_elaps = (Get-Date) - $start
        $sec_rem = ($sec_elaps.TotalSeconds/($cur_file-1+($cur_line/$line_count))) * ($file_count-($cur_file-1+($cur_line/$line_count)))
        Write-Progress -ID 2 -Activity "Converting binary into decimal..." -Status "Progress ($perc%):" -PercentComplete ($perc) -SecondsRemaining $sec_rem -CurrentOperation "Current file: '_temp_$cur_file.tmp'"

    }

    Clear-Variable -Name "cur_line"

}

Write-Progress -Activity null -Completed
$dur = (Get-Date) - $start
Write-Host "`n     Done! Duration:"$dur.ToString("hh\:mm\:ss\.fff")
"`n"

if ($delete -eq "true") {

    Remove-Item ".\_temp" -Force -Recurse

}

"`n"
"`n"
Write-Host " Script finished!" 
Write-Host "     Start time:   "$start_o.ToString("dd\.MM\.yyyy\ hh\:mm\:ss\.fff")
Write-Host "     End time:     "(Get-Date).ToString("dd\.MM\.yyyy\ hh\:mm\:ss\.fff")
$dur = (Get-Date) - $start_o
Write-Host "`n     Duration:     "$dur.ToString("hh\:mm\:ss\.fff")
"`n`n`n"

Pause
Exit

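This script takes about 1 minute per 10 MB, or roughly 100 minutes per 1 GB (about 200 times faster than the batch solution).

---------------------------------------------------------------------------------------------------------

Tested, working and impractical, but about 3 times faster than the previous batch version: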

@ECHO OFF
SETLOCAL EnableDelayedExpansion
TITLE Converting binary to decimal...
COLOR 0B

REM *********************************************

REM Set source directory!
SET "source=C:\adjust\path"

REM Set source file
SET "file_name=adjust_name.extension"

REM *********************************************


CD %source%

IF EXIST binary_output.txt SET "bin_exist=binary_output.txt " && SET "exist_and=and "
IF EXIST decimal_output.txt SET "dec_exist=decimal_output.txt" && SET "dec_exist_i=%exist_and%decimal_output.txt "

IF NOT "%bin_exist%"=="" (CALL :choice) ELSE (IF NOT "%dec_exist%"=="" CALL :choice)
CLS

powershell -Command "& {$B=$Env:file_name; (gc $B) -replace '.{4}' , """"$&`r`n"""" | sc temp.txt}"


SET time_short=%TIME:~0,2%:%TIME:~3,2%:%TIME:~6,2%
ECHO.
ECHO  %time_short%:
ECHO  Converting binary to decimal...
SET "startTime=%time: =0%"

FOR /F "tokens=*" %%G IN (temp.txt) DO (
    SET "line=%%G"
    CALL :check_line
)

CALL :log_dec
CALL :duration
DEL temp.txt >nul
ECHO.
ECHO  Done^^! & ECHO. & ECHO  Duration: %hh:~1%%time:~2,1%%mm:~1%%time:~2,1%%ss:~1% & ECHO.
PAUSE
EXIT


:check_line
IF "!line!"=="" EXIT /B

SET "char1=!line:~0,1!
SET "char2=!line:~1,1!
SET "char3=!line:~2,1!
SET "char4=!line:~3,1!

SET "decimal=0"
IF %char4%==1 SET /A "decimal=1"
IF %char3%==1 SET /A "decimal=%decimal%+2"
IF %char2%==1 SET /A "decimal=%decimal%+4"
IF %char1%==1 SET /A "decimal=%decimal%+8"

IF %decimal% EQU 0 EXIT /B
IF %decimal% GTR 9 EXIT /B

SET "binary_output=!binary_output!%line%"
SET "decimal_output=!decimal_output!%decimal%"

SET /A "line_number=%line_number%+1"
IF !line_number!==2043 CALL :log_bin
IF !line_number!==4086 CALL :log_bin
IF !line_number!==6129 CALL :log_bin
IF !line_number!==8172 CALL :log_dec
EXIT /B

:log_bin
SET /P "=!binary_output!" <nul >> "%source%\binary_output.txt"
SET "binary_output="
EXIT /B

:log_dec
SET /P "=!binary_output!" <nul >> "%source%\binary_output.txt"
SET /P "=!decimal_output!" <nul >> "%source%\decimal_output.txt"
SET "binary_output=" & SET "decimal_output=" & SET "line_number=0"
EXIT /B


:duration
SET "endTime=%time: =0%"
SET "end=!endTime:%time:~8,1%=%%100)*100+1!"  &  SET "start=!startTime:%time:~8,1%=%%100)*100+1!"
SET /A "elap=((((10!end:%time:~2,1%=%%100)*60+1!%%100)-((((10!start:%time:~2,1%=%%100)*60+1!%%100)"
SET /A "cc=elap%%100+100,elap/=100,ss=elap%%60+100,elap/=60,mm=elap%%60+100,hh=elap/60+100"
EXIT /B


:choice
ECHO. & ECHO  %bin_exist%%dec_exist_i%already exists^^! & ECHO.
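CHOICE /C RDC /N /M "[R]ename / [D]elete / [C]ancel"
REM ERRORLEVEL tests mean "greater than or equal", so check the highest value first
REM and leave after deleting; otherwise the rename branch would run as well.
IF ERRORLEVEL 3 EXIT
IF ERRORLEVEL 2 (DEL %bin_exist%%dec_exist% & EXIT /B)
IF ERRORLEVEL 1 CALL :rename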
EXIT /B

:rename
IF NOT "%bin_exist%"=="" (
    IF EXIST binary_output_*.txt (
        FOR /F %%A IN ('DIR binary_output_*.txt /B /O:N') DO (
            SET "file_name=%%~nA"
            SET "file_num_1=!file_name:binary_output_=!
            SET /A "file_num_1=!file_num_1!+1"
        )
        REN binary_output.txt binary_output_!file_num_1!.txt
    ) ELSE (REN binary_output.txt binary_output_1.txt)
)

IF "%dec_exist%"=="" EXIT /B
IF EXIST decimal_output_*.txt (
    FOR /F %%B IN ('DIR decimal_output_*.txt /B /O:N') DO (
        SET "file_name=%%~nB"
        SET "file_num_2=!file_name:decimal_output_=!
        SET /A "file_num_2=!file_num_2!+1"
    )
    REN decimal_output.txt decimal_output_!file_num_2!.txt
) ELSE (REN decimal_output.txt decimal_output_1.txt)
EXIT /B

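This creates a temporary file to read from (temp.txt), a binary output file (binary_output.txt) and a decimal output file (decimal_output.txt).

temp.txt is deleted when the script finishes - or should I say "if the script finishes":

I mean... for a simple .txt file of 80 KB, this script takes just under 1.5 minutes; a 1 GB file would therefore take about 315 h - or 13 days!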
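The extrapolation can be reproduced with a few lines of Python. This is only a sketch: it assumes the run time grows linearly with file size, takes 1 GB as 1024 * 1024 KB, and uses 1.5 minutes as an upper bound for the 80 KB run; `extrapolate_hours` is a hypothetical helper name.

```python
def extrapolate_hours(sample_kb, sample_minutes, target_kb):
    """Scale a measured run time linearly to a larger file size."""
    return sample_minutes * (target_kb / sample_kb) / 60

one_gb_kb = 1024 * 1024                       # 1 GB expressed in KB
hours = extrapolate_hours(80, 1.5, one_gb_kb)
print(round(hours), "hours, about", round(hours / 24, 1), "days")
```

With the 1.5-minute upper bound this gives roughly 328 hours; a measured time slightly below 1.5 minutes lands near the 315 h quoted above.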

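This may not be the perfect batch solution, but if you have to convert a file of 10 GB, even the perfect batch solution would need days (if not weeks or even months) to process nearly 11 billion 0s and 1s (10 GB is exactly 10'737'418'240 bytes).

I don't know what you need this for; maybe you have a machine running 24/7/365 that could even convert a 10 GB file, but if you need results within this decade or so, you should look for a non-batch solution...

However, if time is not a factor, this is a perfectly valid solution! :)

---------------------------------------------------------------------------------------------------------

All timings were measured on the same CPU while it was otherwise idle. On a different system, the timings may vary considerably!

CPU used: i7-4820K @ 3.70GHz (quad-core)

Also: thanks to @BenN (https://superuser.com/a/1292916/868077) and @Bob (https://superuser.com/a/1295082/868077) for helping me.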

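---------------------------------------------------------------------------------------------------------

Edit (08/02/18):

Added a faster solution and a small interface.

Instead of appending each 4-digit binary string directly to the output file, the strings are collected in a variable until that variable reaches 8172 characters, and only then is the variable appended to the output file. This makes the process almost 3 times faster (the durations above have already been adjusted)!

Why 8172? Because 8174 is the limit for a variable (set in batch on Windows 10; not sure about other Windows versions), but it is not divisible by 4, so the last binary string would not be added to the output file. Obviously, the binary output variable would exceed its limit 4 times before line 8172 is reached (actually 4 times minus 2 lines), so that variable is appended every 2043 lines.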
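The buffering idea described in this edit is not specific to batch. Below is a minimal Python sketch of it, assuming an in-memory `io.StringIO` as the output and a hypothetical `BufferedAppender` class; the 8172 threshold is taken from the text above, while everything else is illustrative.

```python
import io

class BufferedAppender:
    """Collects short strings and writes them out in larger batches."""

    def __init__(self, sink, limit=8172):
        self.sink = sink          # any object with a write() method
        self.limit = limit        # flush once this many characters are pending
        self.parts = []
        self.size = 0
        self.flushes = 0

    def add(self, text):
        self.parts.append(text)
        self.size += len(text)
        if self.size >= self.limit:
            self.flush()

    def flush(self):
        # One write per batch instead of one write per 4-character string.
        if self.parts:
            self.sink.write("".join(self.parts))
            self.flushes += 1
        self.parts, self.size = [], 0

sink = io.StringIO()
out = BufferedAppender(sink)
for _ in range(5000):
    out.add("0110")
out.flush()                       # write whatever is still pending
print(len(sink.getvalue()), out.flushes)
```

Here 5000 four-character strings reach the output in three writes rather than 5000, which is the effect the edit relies on.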

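Edit (12/02/18):

Added a solution for PowerShell 5.0.

I also thought it would be wise to state which CPU was used, as both scripts (obviously) depend heavily on CPU speed and/or cores.

Edit (12/02/18):

Added a solution for PowerShell 4.0, as the -NoNewline option of Out-File was introduced in PowerShell 5.0.

Edit (16/02/18):

Added support for files of 1 GB or more, after I ran into a System.OutOfMemoryException when trying to use Get-Content on a file of 1 GB (or more).

Added a small interface to the PowerShell solution.

Changed Out-File to Add-Content, as it comes with the -NoNewline option since PowerShell 4.0, and removed the one extra PowerShell solution.

Edit (19/02/18):

Fixed an issue where files were deleted before they were processed.

Edit (20/02/18):

Improved the interface. Added a progress bar with the estimated time remaining, and the possibility to convert existing files.

Changed the variable $input to $input_file.

Edit (20/02/18):

Fixed a bug that prevented a single file from being processed.

Edit (03/03/18):

Added the option to keep the split files.

Improved overall formatting.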

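Answer 1 (score: 1)

You have not posted any code of your own. This seems like an interesting problem, but the premise seems questionable. Doing this in a cmd.exe script would be problematic. Most likely, PowerShell, Python, Perl or another language would be more suitable.

This may not be the fastest implementation, but this PowerShell script appears to work.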

[CmdletBinding()]
param (
    # The path to the file you want to read.
    [Parameter(Mandatory = $true, Position = 0)]
    [ValidateNotNullOrEmpty()]
    [string] $InFile

    ,[Parameter(Mandatory = $true, Position = 1)]
    [ValidateNotNullOrEmpty()]
    [string] $OutFile
)

$digits = "0123456789"

if (Test-Path -Path $InFile) {
    try {
        $resolvedPath = Resolve-Path -Path $InFile

        $fileStream = New-Object -TypeName System.IO.FileStream -ArgumentList ($resolvedPath, [System.IO.FileMode]::Open, [System.IO.FileAccess]::Read)
        $fileReader = New-Object -TypeName System.IO.BinaryReader -ArgumentList $fileStream
        $stream = [System.IO.StreamWriter] $OutFile

        [byte[]]$abytes = $fileReader.ReadBytes(1)
        while ($abytes.length -ne 0) {
            [byte]$abyte = $abytes[0]
            [System.Byte[]]$outbytes = @()

            Write-Verbose "Got byte ===$abyte==="
            $high = $abyte -shr 4
            Write-Verbose "High is ===$high==="
            if ($high -lt 10) { $outbytes += $digits[$high] }

            $low = $abyte -band 0x0F
            Write-Verbose "Low is ===$low==="
            if ($low -lt 10) { $outbytes += $digits[$low] }

            $stream.Write([char[]]$outbytes)

            [byte[]]$abytes = $fileReader.ReadBytes(1)
        }

    }
    catch {
        Write-Warning $_.Exception.Message
    }

    finally {
        $stream.close()
        $fileReader.Dispose()
        $fileStream.Dispose()
    }
} else {
    Write-Warning "$InFile not found!"
}

Save this code in a file with a .ps1 extension, for example myconvert.ps1. Then run it from PowerShell.

.\myconvert.ps1 .\infile.txt .\outfile.txt

If it has to be run from a cmd.exe shell:

powershell -NoProfile -Command ".\myconvert.ps1 .\infile.txt .\outfile.txt"

Edit: here is an example of it in use.

C:>type readnibbles.txt
abcdefghijklmnopqrstuvwxyz

C:>powershell -NoProfile -Command "Format-Hex -Path .\readnibbles.txt"

           Path: C:\src\t\readnibbles.txt

           00 01 02 03 04 05 06 07 08 09 0A 0B 0C 0D 0E 0F

00000000   61 62 63 64 65 66 67 68 69 6A 6B 6C 6D 6E 6F 70  abcdefghijklmnop
00000010   71 72 73 74 75 76 77 78 79 7A 0D 0A              qrstuvwxyz..


C:>powershell -NoProfile -Command ".\readnibbles.ps1 .\readnibbles.txt .\readnibbles-out.txt"

C:>type readnibbles-out.txt
61626364656667686966666670717273747576777879700
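The example output can be traced by hand with a small Python sketch of the same nibble logic. Note what it assumes: like the script above, it works on the raw bytes of the file (high and low 4 bits of each byte) and keeps the digit 0, so it is not the same as reading a text file of "0" and "1" characters and keeping only 1-9. The name `nibbles_to_digits` is hypothetical.

```python
DIGITS = "0123456789"

def nibbles_to_digits(data):
    """Emit one digit per 4-bit half of each byte, skipping values above 9."""
    out = []
    for byte in data:
        high = byte >> 4        # upper 4 bits
        low = byte & 0x0F       # lower 4 bits
        if high < 10:
            out.append(DIGITS[high])
        if low < 10:
            out.append(DIGITS[low])
    return "".join(out)

# The content of readnibbles.txt as shown in the hex dump (ends with CR LF).
sample = b"abcdefghijklmnopqrstuvwxyz\r\n"
result = nibbles_to_digits(sample)
print(result)
```

For instance, the byte 0x6A contributes only "6" because its low nibble is 10, and the trailing 0x0D 0x0A contribute "0" and "0", which explains the run of 6s and the two zeros at the end of the output.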

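Answer 2 (score: 0)

Download and install Python. Python is a lightweight interpreted language with an easy-to-use syntax (once you get past the tabs vs. spaces thing), and because it runs as efficient bytecode it is still quite fast. Its interpreted nature makes it well suited to trying things out in its built-in console, and you can use any plain-text editor to write and edit larger programs.

After creating a 10 MB "random" file of 0s and 1s, I interactively explored what your task needs. This 7-liner is what I ended up with. It opens one file for input and one for output (with this syntax they do not need to be closed explicitly) and reads batches of 4 characters from the input until it hits end of file. The 4 binary characters are converted to an integer, and if it lies within the range you want, it is written to the output file.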

[The seven-line Python listing was not preserved in this copy.]
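A minimal sketch of what the paragraph above describes might look like the following. It is not necessarily the author's original listing: the function name `convert`, the demo file names and the temporary directory are assumptions added so the example can run on its own.

```python
import os
import tempfile

def convert(in_path, out_path):
    # Both files are closed automatically when the with-block ends.
    with open(in_path) as infile, open(out_path, "w") as outfile:
        while True:
            chunk = infile.read(4)          # next 4 characters
            if len(chunk) < 4:              # end of file (or a partial group)
                break
            value = int(chunk, 2)           # e.g. "0110" -> 6
            if 1 <= value <= 9:             # keep only the wanted range
                outfile.write(str(value))

# Small self-contained demonstration on a 16-character input.
with tempfile.TemporaryDirectory() as tmp:
    src = os.path.join(tmp, "original_binary_data.txt")
    dst = os.path.join(tmp, "converted_decimal_data.txt")
    with open(src, "w") as f:
        f.write("1000110101100100")         # 8, 13 (dropped), 6, 4
    convert(src, dst)
    with open(dst) as f:
        result = f.read()
print(result)
```

For real data, `convert` would be called directly with the paths of the input and output files.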

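It takes about 5 seconds of user time to work through my 10 MB sample. Extrapolating to 1 GB of data¹, that would be 100 * 5 seconds, or ~8 minutes.

This time could probably be cut considerably; reading and writing larger buffers might be worth a try. However, I suspect you would spend more than 8 minutes getting that right. If you are not looking forward to processing the 10 GB file (which would take about 1.5 hours), you could spend some time experimenting with it.

Or just run it as is over a lunch break.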
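The "larger buffers" idea mentioned above could be tried along these lines. This is an untimed sketch, so whether it is actually faster would need measuring. It assumes, as the question states, that the input holds only "0" and "1"; the names `TABLE` and `convert_stream` and the default chunk size are illustrative choices.

```python
import io

# Map every 4-character group to its digit, or to "" when it must be dropped.
TABLE = {format(v, "04b"): (str(v) if 1 <= v <= 9 else "") for v in range(16)}

def convert_stream(infile, outfile, chunk_chars=1 << 20):
    """Convert using large reads; chunk_chars must be a multiple of 4."""
    if chunk_chars % 4:
        raise ValueError("chunk_chars must be a multiple of 4")
    while True:
        chunk = infile.read(chunk_chars)
        usable = len(chunk) - len(chunk) % 4    # ignore a trailing partial group
        if usable == 0:
            break
        # One dictionary lookup per group and one write per chunk.
        outfile.write("".join(TABLE[chunk[i:i + 4]] for i in range(0, usable, 4)))

# Demonstration with in-memory streams and a deliberately tiny chunk size.
src = io.StringIO("1000110101100100")
dst = io.StringIO()
convert_stream(src, dst, chunk_chars=8)
result = dst.getvalue()
print(result)
```

With real files, the two streams would be opened with `open(path)` and `open(path, "w")`, and the default chunk size of about one million characters would apply.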

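¹ Well... assuming that reading and writing scale as O(n), which may not hold on your (or any) system. I am not going to create a dummy file that large just to see whether it does.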