Using Django to put my recommender system into production

Posted: 2018-06-20 08:08:03

Tags: django

I have written a content-based recommender system in Python 3 using data from a MySQL database. Now I have to put it into production with Django, so that I do not have to feed input manually every time a new article is added to the database. How do I convert this Python code for Django production? I will use Django's database connection to connect to the database. I am really confused about how to write this code in Django.

my_recommender_system

import pandas as pd
import re
from nltk.tokenize import word_tokenize, sent_tokenize
from nltk.corpus import stopwords
stop = set(stopwords.words('english'))
from string import punctuation
import functools
from matplotlib import pyplot as plt
from nltk.stem.snowball import SnowballStemmer
stemmer = SnowballStemmer("english", ignore_stopwords=True)
from tqdm import tqdm_notebook
tqdm_notebook.pandas()  # register .progress_map / .progress_apply on pandas objects
import numpy as np
import math
from sklearn.metrics.pairwise import linear_kernel
#import text

from collections import Counter
df = pd.read_csv('target.csv')
df = df.loc[:, ['id', 'combined_text']].astype(str)
# Drop repeated words within each document, then lowercase
df["combined_text"] = df["combined_text"].apply(lambda x: ' '.join(pd.unique(x.split())))
df.combined_text = df.combined_text.apply(lambda x: x.lower())
df.combined_text = df.combined_text.str.replace(r'[^\w\s]', ' ', regex=True)  # strip punctuation
df['combined_text'] = df['combined_text'].str.replace(r'\d+', ' ', regex=True)  # strip digits
df.combined_text = df.combined_text.str.replace(r'nbsp', ' ', regex=True)  # leftover HTML entities
#df.combined_text = df.combined_text.str.replace(r'nan', ' ', regex=True)
df.combined_text = df.combined_text.str.replace(r'value', ' ', regex=True)
df = df.dropna(subset=['combined_text'])
df.combined_text = df.combined_text.str.replace(r'\s+', ' ', regex=True)  # collapse whitespace
#df.combined_text.map(len).hist(figsize=(15, 5), bins=100)
df = df[df.combined_text.map(len) > 600]  # keep only reasonably long documents
df.reset_index(inplace=True, drop=True)

#df1 = df[(df.combined_text.map(len) > 7500)]
stop_words = []

with open('stopwords.txt', 'r') as f:
    for line in f:
        stop_words.append(line.rstrip('\n'))

additional_stop_words = ['t','aah','aap','don','doesn','isn','ve','ll','add','ndash','will','nan','q','article','lsquo','rsquo','ldquo','rdquo','personalised','please','read','download','app','here','more','experience','based','explore','bull','fact','myth','middot','lifestage','entire','collection','articles','reading','website','android','phone','a','zero']
stop_words += additional_stop_words
stop_words = list(filter(None, stop_words))
#print(len(stop_words))

def _removeNonAscii(s):
    # Drop any non-ASCII characters
    return "".join(i for i in s if ord(i) < 128)

def clean_text(text):
    # Lowercase, expand common English contractions, then strip everything but letters
    text = text.lower()
    text = re.sub(r"what's", "what is ", text)
    text = text.replace('(ap)', '')
    text = re.sub(r"\'s", " is ", text)
    text = re.sub(r"\'ve", " have ", text)
    text = re.sub(r"can't", "cannot ", text)
    text = re.sub(r"n't", " not ", text)
    text = re.sub(r"i'm", "i am ", text)
    text = re.sub(r"\'re", " are ", text)
    text = re.sub(r"\'d", " would ", text)
    text = re.sub(r"\'ll", " will ", text)
    text = re.sub(r'\W+', ' ', text)
    text = re.sub(r'\s+', ' ', text)
    text = re.sub(r"\\", "", text)
    text = re.sub(r"\'", "", text)    
    text = re.sub(r"\"", "", text)
    text = re.sub('[^a-zA-Z ?!]+', '', text)
    text = _removeNonAscii(text)
    text = text.strip()
    return text

def tokenizer(text):
    # Clean, sentence- and word-tokenize, then drop stop words and punctuation
    text = clean_text(text)
    tokens = [word_tokenize(sent) for sent in sent_tokenize(text)]
    tokens = list(functools.reduce(lambda x, y: x + y, tokens, []))  # flatten; [] guards empty input
    tokens = list(filter(lambda token: token not in (stop_words + list(punctuation)), tokens))
    return tokens

#df['combined_text'] = df['combined_text'].map(lambda d: str.encode(d.decode('utf-8')))

df['tokens'] = df['combined_text'].progress_map(tokenizer)
df['text_stemmed'] = df['tokens'].apply(lambda x: [stemmer.stem(y) for y in x])
df['text_stemmed_sentence'] = df['text_stemmed'].apply(lambda x: " ".join(x))
df['stemmed_tokens'] = df['text_stemmed_sentence'].progress_map(tokenizer)
df = df[['id', 'text_stemmed_sentence', 'stemmed_tokens']]
# =============================================================================
# for descripition, tokens in zip(df['combined_text'].head(5), df['tokens'].head(5)):
#     print('description:', descripition)
#     print('tokens:', tokens)
#     print()
#     
# =============================================================================
from sklearn.feature_extraction.text import TfidfVectorizer

vectorizer = TfidfVectorizer(min_df=5, analyzer='word', ngram_range=(1, 2), stop_words='english')
vz = vectorizer.fit_transform(list(df['stemmed_tokens'].map(lambda tokens: ' '.join(tokens))))
# Pairwise cosine similarity (linear_kernel equals cosine on L2-normalized TF-IDF vectors)
cosine_similarities = linear_kernel(vz, vz)
articlesRecommend = pd.DataFrame(cosine_similarities, columns=df.id, index=df.id)
# For each article, take the ids of its 10 most similar articles
# (the article itself appears first with similarity 1.0)
y = np.array([articlesRecommend[c].nlargest(10).index.values for c in articlesRecommend])
articles_df = pd.DataFrame(data=y, index=articlesRecommend.columns)

1 Answer:

Answer 0 (score: 0)

A complete answer to this question would be lengthy, but I can wrap it up briefly as follows:

  1. First, create a virtualenv and install Django. You will also need to install all the Python packages you used in your program, such as pandas and the rest.
  2. Run the simple command django-admin startproject <project_name>. Next, run django-admin startapp <app_name>; this creates an app inside the Django project, since a Django project can hold many apps.
  3. Open source/source/settings.py and add your app's name to the INSTALLED_APPS list.
  4. You will need to port the same code into /views.py, but there should be at least one function with a request parameter that performs the same task.
  5. Something like this:
     import pandas  # and import other libs
     def some_func(request):
         ## your code

  6. Next, you will have to map this function to a URL in urls.py, which is covered here: mapping the urls to functions in views.py (a minimal sketch of both files follows this list).
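
For illustration, here is a minimal sketch of steps 4-6 under stated assumptions: the project is called source, the app recommender, and the view recommend_articles; none of these names come from the original question.

# recommender/views.py -- skeleton only; the app and function names are assumed
from django.http import JsonResponse

def recommend_articles(request):
    # The recommendation pipeline from the question would run here
    # (or, better, be imported from a plain module so views.py stays thin)
    return JsonResponse({'status': 'ok'})

# source/urls.py -- hypothetical mapping of the view above to a URL
from django.urls import path
from recommender import views

urlpatterns = [
    path('recommend/', views.recommend_articles),
]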

Of course, you will have to run the server with python manage.py runserver to see your project at 127.0.0.1:8000.

Honestly, if you understand Django's basic architecture, this is a very easy task. This documentation can be of help to you.

The crux of the problem: as you have explained, you will suggest the most relevant articles based on the articles you already have. First, the data source in your Laravel project should stream its data in JSON format, which you can read inside a function in views.py. Once the data is read and the code you already have has run, you should be able to send the most relevant article information, such as ids or whatever else, through some URL. To do that, you can either use the Django REST framework or simply return a JsonResponse from your function.
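
As a hedged sketch of that last point, assuming a hypothetical Article model backed by the articles table and a build_recommendations() helper that wraps the TF-IDF code from the question (both names are illustrative, not part of this answer):

# views.py -- illustrative only; Article, its field names, and
# build_recommendations() are assumptions, not part of the original answer
import pandas as pd
from django.http import JsonResponse
from .models import Article  # hypothetical model over the articles table

def related_articles(request, article_id):
    # Read the articles through the Django ORM instead of target.csv
    rows = Article.objects.values_list('id', 'combined_text')
    df = pd.DataFrame(list(rows), columns=['id', 'combined_text']).astype(str)

    # Run the cleaning / TF-IDF / linear_kernel pipeline from the question to
    # produce articles_df (one row per article id, ten most similar ids per row)
    articles_df = build_recommendations(df)  # hypothetical wrapper

    top_ids = articles_df.loc[str(article_id)].tolist()
    return JsonResponse({'article_id': article_id, 'recommended_ids': top_ids})

Recomputing TF-IDF on every request would be wasteful; in practice you would cache articles_df, or rebuild it only when a new article is saved (for example from a post_save signal or a periodic task).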