Question

I'm very new to Algorithmia, but I've been using scikit-learn for a while. I know how to persist my machine learning model with joblib after training it:

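# On scikit-learn 0.23 and later, use `import joblib` instead
from sklearn.externals import joblib
from sklearn.ensemble import RandomForestRegressor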

model = RandomForestRegressor()
# Train the model, etc
joblib.dump(model, "prediction/model/model.pkl")

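Now I want to host my ML model and call it as a service using Algorithmia, but I can't figure out how to read the model back. I created a collection in Algorithmia named "testcollection" containing a file named "model.pkl", which is the result of the joblib.dump call. According to the documentation, this means my file should be located at

data://(username)/testcollection/model.pkl

I want to read that model from the file using joblib.load. This is my current algorithm in Algorithmia: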

import Algorithmia
from sklearn.externals import joblib

def apply(input):
    client = Algorithmia.client()
    f = client.file("data://(username)/testcollection/model.pkl")
    print(f.path)
    print(f.url)
    print(f.getName())
    model = joblib.load(f.url)  # or f.path; neither works
    return "empty"

This is the output:

(username)/testcollection/model.pkl
/v1/data/(username)/testcollection/model.pkl
model.pkl

It fails on the joblib.load line with "No such file or directory", no matter which path I pass in.

These are all the paths/URLs I tried when calling joblib.load:

/v1/data/(username)/testcollection/model.pkl
data://(username)/testcollection/model.pkl
(username)/testcollection/model.pkl
https://algorithmia.com/v1/data/(username)/testcollection/model.pkl

How can I load the model from the file using joblib? Am I going about this the wrong way?

Answer 1

There are a few ways to access data on the Data API.

Here are 4 different ways to access a file through the Python client:

import Algorithmia

client = Algorithmia.client("<YOUR_API_KEY>")

dataFile = client.file("data://<USER_NAME>/<COLLECTION_NAME>/<FILE_NAME>").getFile()

dataText = client.file("data://<USER_NAME>/<COLLECTION_NAME>/<FILE_NAME>").getString()

dataJSON = client.file("data://<USER_NAME>/<COLLECTION_NAME>/<FILE_NAME>").getJson()

dataBytes = client.file("data://<USER_NAME>/<COLLECTION_NAME>/<FILE_NAME>").getBytes()

Since sklearn expects a path to the model file, the easiest way to get one is through the file object (dataFile above).
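To see why the strings tried in the question fail: when joblib.load is given a string, it treats it as a path on the local filesystem, and a data:// URI or an API URL is not one. A minimal sketch using only the standard library (the helper name existing_local_paths is hypothetical, and "(username)" is the question's placeholder):

```python
import os

# The strings tried in the question; "(username)" is a placeholder, not a real account.
candidates = [
    "/v1/data/(username)/testcollection/model.pkl",
    "data://(username)/testcollection/model.pkl",
    "(username)/testcollection/model.pkl",
    "https://algorithmia.com/v1/data/(username)/testcollection/model.pkl",
]


def existing_local_paths(paths):
    """Return only the entries that exist on the local filesystem."""
    return [p for p in paths if os.path.exists(p)]


# None of the candidates is expected to exist locally, which is why
# a loader that opens local paths reports "No such file or directory".
print(existing_local_paths(candidates))
```

The file has to be downloaded to local storage (or into memory) first, which is what the client methods above do.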

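According to the official Python 2.7 documentation, if a file object was created by something other than the open() function, its name attribute usually corresponds to the path of the file.

In this case, you would write something like this: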
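As an illustration of the name attribute, here is a minimal sketch that assumes the downloaded file behaves like a local temporary copy reopened for reading. That is an assumption about how the client works, not something confirmed here, and write_temp_copy is a hypothetical helper:

```python
import os
import tempfile


def write_temp_copy(data):
    """Write bytes to a named temporary file and return it reopened for reading."""
    # delete=False keeps the file on disk after the with-block closes it.
    with tempfile.NamedTemporaryFile(delete=False, suffix=".pkl") as tmp:
        tmp.write(data)
    return open(tmp.name, "rb")


# Placeholder bytes stand in for a downloaded model file.
model_file = write_temp_copy(b"placeholder bytes, not a real model")
path = model_file.name  # a real local path that path-based loaders can open

print(os.path.isfile(path))

# Clean up the temporary copy once it is no longer needed.
model_file.close()
os.remove(path)
```

A path obtained this way is the kind of argument joblib.load expects when it is given a string.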

import Algorithmia
from sklearn.externals import joblib

def apply(input):

    # You don't need to write your API key if you're editing in the web editor
    client = Algorithmia.client()

    modelFile = client.file("data://(username)/testcollection/model.pkl").getFile()

    modelFilePath = modelFile.name

    model = joblib.load(modelFilePath)

    return "empty"

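However, according to the official sklearn model persistence documentation, you should also be able to pass a file-like object instead of a filename.

So we can skip the step of getting the file name and pass the modelFile object directly: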

import Algorithmia
from sklearn.externals import joblib

def apply(input):

    # You don't need to write your API key if you're editing in the web editor
    client = Algorithmia.client()

    modelFile = client.file("data://(username)/testcollection/model.pkl").getFile()

    model = joblib.load(modelFile)

    return "empty"
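The file-object approach can be tried locally without Algorithmia or scikit-learn. In the sketch below, pickle.load stands in for joblib.load so the snippet runs with only the standard library, a plain dict stands in for a trained model, and load_from_file_object is a hypothetical helper:

```python
import io
import pickle


def load_from_file_object(file_obj, loader=pickle.load):
    """Deserialize from an open binary file-like object instead of a path."""
    return loader(file_obj)


# A plain dict stands in for a trained model (assumption for illustration only).
fake_model = {"n_estimators": 10}
raw_bytes = pickle.dumps(fake_model)

# io.BytesIO wraps raw bytes, such as those returned by getBytes(),
# in a file-like object that a loader can read from.
restored = load_from_file_object(io.BytesIO(raw_bytes))
print(restored == fake_model)
```

With joblib installed, passing loader=joblib.load should behave the same way for files written by joblib.dump, provided the file object is opened in binary mode.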

Edit: There is also an article in the official Algorithmia Developer Center about model persistence in scikit-learn.

Full disclosure: I work at Algorithmia as an algorithm engineer.
