What is .dtype?

Question

I'm new to Python and don't understand what .dtype does.
For example:

>>> aa
array([1, 2, 3, 4, 5, 6, 7, 8])
>>> aa.dtype = "float64"
>>> aa
array([  4.24399158e-314,   8.48798317e-314,   1.27319747e-313,
     1.69759663e-313])

I thought dtype was an attribute of aa that should be int, and that if I assigned aa.dtype = "float64", aa would become array([1.0, 2.0, 3.0, 4.0, 5.0, 6.0, 7.0, 8.0]).

Why does it change the values and the size?
What does this mean?

I actually picked this up from a piece of code, which I'll paste here:

def to_1d(array):
    """prepares an array into a 1d real vector"""
    a = array.copy()  # copy the array, to avoid changing global
    orig_dtype = a.dtype
    a.dtype = "float64"  # this doubles the size of array
    orig_shape = a.shape
    return a.ravel(), (orig_dtype, orig_shape)  # flatten and return

I thought it shouldn't change the values of the input array, only its size. I'm confused about how the function works.

Answer 1

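First off, the code you're learning from is flawed. Judging by the comments in the code, it almost certainly doesn't do what the original author thought it did.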

What the author probably meant was:

import numpy as np

def to_1d(array):
    """prepares an array into a 1d real vector"""
    return array.astype(np.float64).ravel()

However, if array is always a complex array, the original code does make some sense.

Viewing the array (a.dtype = 'float64' is equivalent to doing a = a.view('float64')) only doubles its size if it is a complex array (numpy.complex128) or a 128-bit float array. For any other dtype, it doesn't make much sense.
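One way to check this claim is to compare an array's length before and after viewing it as float64. This is a minimal sketch: the helper name float64_view_ratio is hypothetical, it assumes a contiguous 1-D array, and it leaves out 128-bit floats because their availability depends on the platform. The ratio is simply the item size divided by 8 bytes.

```python
import numpy as np

def float64_view_ratio(dtype):
    """Hypothetical helper: length of a float64 view divided by the original length."""
    a = np.zeros(4, dtype=dtype)  # 4 contiguous elements of the given dtype
    return len(a.view(np.float64)) / len(a)

for dt in (np.int32, np.int64, np.float32, np.complex128):
    print(np.dtype(dt).name, float64_view_ratio(dt))
```

Only complex128 doubles. A 4-byte integer array is halved, which matches the question's output (eight values became four).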

For the specific case of a complex array, the original code would turn something like np.array([0.5+1j, 9.0+1.33j]) into np.array([0.5, 1.0, 9.0, 1.33]).

A clearer way to write that would be:

def complex_to_interleaved_real(array):
    """prepares a complex array into an "interleaved" 1d real vector"""
    return array.copy().view('float64').ravel()

(I'm ignoring the part about returning the original dtype and shape for now.)
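If you do need the original dtype and shape back, the same view trick works in reverse. This is a sketch under the assumption that the input is complex128. The names to_interleaved and from_interleaved are hypothetical and not part of the original code.

```python
import numpy as np

def to_interleaved(array):
    """Hypothetical helper: flatten a complex array into [re0, im0, re1, im1, ...]."""
    # np.array(...) always copies here, so the caller's array is never aliased.
    a = np.array(array, dtype=np.complex128, order='C')
    return a.view(np.float64).ravel(), (a.dtype, a.shape)

def from_interleaved(flat, meta):
    """Hypothetical inverse: reinterpret float pairs as complex, then restore the shape."""
    orig_dtype, orig_shape = meta
    return flat.view(orig_dtype).reshape(orig_shape)

z = np.array([[0.5 + 1j, 9.0 + 1.33j]])
flat, meta = to_interleaved(z)
print(flat)
print(from_interleaved(flat, meta))
```

Neither direction converts any values. Both functions only change how the same bytes are read.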

Background on numpy arrays

To explain what's happening here, you need to understand a bit about what numpy arrays are.

A numpy array consists of a "raw" memory buffer that is interpreted as an array through a "view". You can think of every numpy array as a view.

In the numpy sense, a view is just a different way of slicing and dicing the same memory buffer without making a copy.

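A view has a shape, a data type (dtype), an offset, and strides. Where possible, indexing and reshaping operations on a numpy array just return a view of the original memory buffer.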

This means that things like y = x.T or y = x[::2] don't use any extra memory and don't make copies of x.
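You can verify this with np.shares_memory and the .base attribute. The sketch below uses int64 so that each item is 8 bytes. Note that the views differ from x only in their shape and strides, meaning the number of bytes to step along each axis.

```python
import numpy as np

x = np.arange(10, dtype=np.int64)  # 10 items, 8 bytes each

y = x[::2]                 # every other element
t = x.reshape(2, 5).T      # reshaped, then transposed

# Both point into x's buffer instead of owning new memory.
print(np.shares_memory(x, y), np.shares_memory(x, t))
print(y.base is x)

# Only the strides differ: x steps 8 bytes, y steps 16, t steps (8, 40).
print(x.strides, y.strides, t.strides)
```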

So, if we have an array like this:

import numpy as np
x = np.array([1,2,3,4,5,6,7,8,9,10])

we can reshape it in either of these two ways:

x = x.reshape((2, 5))

or

x.shape = (2, 5)

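The first option is better for readability, but the two are (almost) exactly the same. Neither one makes a copy that would use more memory. (The first creates a new Python object, but that's beside the point for now.)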
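The practical difference is that reshape returns a new array object over the same buffer, while assigning to .shape changes x itself. A short sketch:

```python
import numpy as np

x = np.array([1, 2, 3, 4, 5, 6, 7, 8, 9, 10])

y = x.reshape((2, 5))          # new array object, same buffer as x
print(y.shape, x.shape)        # x itself is still 1-D here
print(np.shares_memory(x, y))

x.shape = (2, 5)               # modifies x in place; no copy is made
print(x.shape)

y[0, 0] = 100                  # writing through y is visible through x
print(x[0, 0])
```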

Dtypes and views

The same applies to dtype. We can view an array as a different dtype by either setting x.dtype or calling x.view(...).

So we can do things like:

import numpy as np
# np.int was removed in NumPy 1.24, so spell out the width explicitly.
x = np.array([1, 2, 3], dtype=np.int64)

print('The original array')
print(x)

print('\n...Viewed as unsigned 8-bit integers (notice the length change!)')
y = x.view(np.uint8)
print(y)

print('\n...Doing the same thing by setting the dtype')
x.dtype = np.uint8
print(x)

print('\n...And we can set the dtype again and go back to the original.')
x.dtype = np.int64
print(x)

Which yields (on a little-endian machine):

The original array
[1 2 3]

...Viewed as unsigned 8-bit integers (notice the length change!)
[1 0 0 0 0 0 0 0 2 0 0 0 0 0 0 0 3 0 0 0 0 0 0 0]

...Doing the same thing by setting the dtype
[1 0 0 0 0 0 0 0 2 0 0 0 0 0 0 0 3 0 0 0 0 0 0 0]

...And we can set the dtype again and go back to the original.
[1 2 3]
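Because a view shares the buffer, writing through it also changes the original array. This sketch fixes the byte order explicitly ('<i8', little-endian 64-bit) so that the byte positions are the same on every machine.

```python
import numpy as np

x = np.array([1, 2, 3], dtype='<i8')  # little-endian int64: 24 bytes in total
b = x.view(np.uint8)                  # the same 24 bytes, one per element

print(len(b))
b[1] = 1           # set the second byte of the first integer
print(x.tolist())  # the first integer is now 1 + 1 * 256
```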

But keep in mind that this gives you very low-level control over how the memory buffer is interpreted.

For example:

import numpy as np
x = np.arange(10, dtype=np.int64)  # np.int no longer exists in current NumPy

print('An integer array:', x)
print('But if we view it as a float:', x.view(np.float64))
print("...It's probably not what we expected...")

This produces:

An integer array: [0 1 2 3 4 5 6 7 8 9]
But if we view it as a float: [  0.00000000e+000   4.94065646e-324   
   9.88131292e-324   1.48219694e-323   1.97626258e-323   
   2.47032823e-323   2.96439388e-323   3.45845952e-323
   3.95252517e-323   4.44659081e-323]
...It's probably not what we expected...

So in this case, we're interpreting the underlying bits of the raw memory buffer as floats.

If we wanted a new array with the integer values actually converted to floats, we'd use x.astype(np.float64).
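Here is a side-by-side sketch of the two operations on the same int64 array. The view reads the integer 1 as the bit pattern of the smallest subnormal double, 2**-1074, which is the 4.94065646e-324 printed above.

```python
import numpy as np

x = np.arange(10, dtype=np.int64)

copied = x.astype(np.float64)  # new buffer; each value converted to 0.0, 1.0, ...
viewed = x.view(np.float64)    # same buffer; bits reinterpreted, nothing converted

print(copied[:3].tolist())
print(viewed[1] == 2.0 ** -1074)
print(np.shares_memory(x, copied), np.shares_memory(x, viewed))
```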

Complex numbers

Complex numbers are stored (in C, Python, and numpy) as two floats. The first is the real part and the second is the imaginary part.

So, if we do:

import numpy as np
x = np.array([0.5+1j, 1.0+2j, 3.0+0j])

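we can see the real (x.real) and imaginary (x.imag) parts. If we cast it to float, we'll get a warning about discarding the imaginary part, along with an array containing only the real parts.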

print(x.real)
print(x.astype(float))

astype makes a copy and converts the values to the new type.

However, if we view this array as float, we get the sequence item1.real, item1.imag, item2.real, item2.imag, ....

print(x)
print(x.view(float))

Which yields:

[ 0.5+1.j  1.0+2.j  3.0+0.j]
[ 0.5  1.   1.   2.   3.   0. ]

Each complex number is essentially two floats. So if we change how numpy interprets the underlying memory buffer, we get an array twice as long.
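Reshaping that float view to two columns makes the layout explicit, with one row per complex number and the columns holding (real, imag). The comparison with np.column_stack is included only as a cross-check. Unlike the view, column_stack builds a new array.

```python
import numpy as np

x = np.array([0.5 + 1j, 1.0 + 2j, 3.0 + 0j])  # complex128 by default

pairs = x.view(np.float64).reshape(-1, 2)  # still a view of x's buffer
print(pairs[:, 0].tolist())  # real parts
print(pairs[:, 1].tolist())  # imaginary parts

# Same numbers built from x.real and x.imag (this one copies).
print(np.array_equal(pairs, np.column_stack((x.real, x.imag))))
```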

Hopefully that helps clarify things a bit...

Answer 2

By changing the dtype this way, you are changing how a fixed block of memory is interpreted.

Example:

>>> import numpy as np
>>> a=np.array([1,0,0,0,0,0,0,0],dtype='int8')
>>> a
array([1, 0, 0, 0, 0, 0, 0, 0], dtype=int8)
>>> a.dtype='int64'
>>> a
array([1])

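Notice how changing from int8 to int64 turned an 8-element array of 8-bit integers into a 1-element array of 64-bit integers. It is still the same 8-byte block, though. On an i7 machine with native endianness, that byte pattern is the same as 1 in int64 format.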

Changing the position of the 1:

>>> a=np.array([0,0,0,1,0,0,0,0],dtype='int8')
>>> a.dtype='int64'
>>> a
array([16777216])

Another example:

>>> a=np.array([0,0,0,0,0,0,1,0],dtype='int32')
>>> a.dtype='int64'
>>> a
array([0, 0, 0, 1])

Changing the position of the 1 in the 32-byte array of 32-bit integers:

>>> a=np.array([0,0,0,1,0,0,0,0],dtype='int32')
>>> a.dtype='int64'
>>> a
array([         0, 4294967296,          0,          0])

It is the same block of bits, just reinterpreted.
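These results depend on the machine being little-endian. This sketch makes the byte order explicit in the dtype string ('<' for little-endian, '>' for big-endian) and cross-checks both readings against Python's int.from_bytes. It uses uint8 instead of int8 for the raw bytes, but the eight bytes are identical.

```python
import numpy as np

raw = np.array([0, 0, 0, 1, 0, 0, 0, 0], dtype=np.uint8)  # the same 8 bytes

little = raw.view('<i8')[0]  # least significant byte first: 1 << 24
big = raw.view('>i8')[0]     # most significant byte first:  1 << 32

print(little, int.from_bytes(raw.tobytes(), 'little'))
print(big, int.from_bytes(raw.tobytes(), 'big'))
```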

Answer 3

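After messing around with it, I think manually assigning the dtype does a reinterpret cast, which isn't what you want. In other words, I think it interprets the data directly as floats instead of converting it to floats. Maybe you could try aa = numpy.array(list(map(float, aa))).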

To expand on that: dtype is the data type of the data. Quoting the documentation verbatim:

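A data type object (an instance of numpy.dtype class) describes how the bytes in the fixed-size block of memory corresponding to an array item should be interpreted.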

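Integers and floats don't have the same bit patterns, so you can't just take the memory of an int, view it as a float, and expect the same number. By setting dtype to float64, you're only telling the computer to read that memory as float64. It does not actually convert the integers to floats.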
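To see the differing bit patterns, you can view both arrays as unsigned 64-bit integers. This compares the raw bits without converting anything and works regardless of endianness. The float values follow the IEEE 754 double encoding, so 1.0 is 0x3FF0000000000000 (sign 0, exponent 1023, mantissa 0).

```python
import numpy as np

ints = np.array([1, 2, 3], dtype=np.int64)
floats = ints.astype(np.float64)  # same numbers, properly converted

# Raw 64-bit patterns of each array, read as unsigned integers.
print(ints.view(np.uint64).tolist())
print([hex(v) for v in floats.view(np.uint64).tolist()])
```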

Answer 4

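The documentation for the dtype ndarray attribute isn't very helpful at all. Judging by your output, it looks like the buffer of eight 4-byte integers is being reinterpreted as four 8-byte floats.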

What you want instead is to specify the dtype when you create the array:

array([1, 2, 3, 4, 5, 6, 7, 8], dtype="float64")
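If the integer array already exists, astype gives the conversion the question expected. It returns a new float64 array and leaves the original unchanged. A short sketch comparing both routes:

```python
import numpy as np

# Option 1: request float64 when the array is created.
aa = np.array([1, 2, 3, 4, 5, 6, 7, 8], dtype="float64")

# Option 2: convert an existing integer array (returns a new array).
ints = np.array([1, 2, 3, 4, 5, 6, 7, 8])
bb = ints.astype("float64")

print(bb.tolist())
print(aa.tolist() == bb.tolist())
```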
