Document clustering in Matlab

Date: 2014-02-09 13:14:54

Tags: matlab cluster-analysis

I am working on document clustering code in Matlab. My document is:

'The first step in analyzing the requirements is to construct an object model. 
It describes real world object classes and their relationships to each other. 
Information for the object model comes from the problem statement, expert knowledge of the application domain, and general knowledge of the real world. 


Britvic plc is one of the leading soft drinks manufacturers of soft drinks in the Beverages Sector functioning in Europe with its distribution branches in Great Britain, Ireland and France. '

As you can see, these paragraphs contain data from different categories. Here is my main program:

global n;
n=1;
%read the contents of doc1.txt into a string
%(fileread opens and closes the file itself, so no fopen is needed)
text=fileread('doc1.txt');
i=0;

%now text has the content of doc1 as a string. Next, split it into
%sentences and words; the clustering function does both.

[C1,C2]=clustering(text)

Here is the code for 'clustering':

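function [C1,C2]=clustering(text)
% C1 and C2 are output arguments, so they must not also be declared global.
% Split the document into sentences at each period.
text1=strsplit(text,'.');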


% text1 is a 1-by-ct1 cell array; the piece after the final period is skipped.
[rt1,ct1]=size(text1);

% Stop words to drop before stemming.
stopWords={'this','you','is','an','with','as','well','like','and','to', ...
    'it','on','off','of','in','mine','your','yours','these','will', ...
    'would','shall','should','or','a','about','all','also','am','are', ...
    'but','for','by','my','did','do','her','his','the','him','she','he', ...
    'they','that','when','we','us','not','them','if','just','may'};

for i=1:(ct1-1)
    sentence=text1{i};

    % Split the sentence into words and remove the stop words.
    vv=strsplit(sentence,' ');
    text2=setdiff(vv,stopWords,'stable');

    % Stem each remaining word and store it in row i of mapr.
    [rt2,ct2]=size(text2);
    for r=1:ct2
        tmar=porterStemmer(text2{r});
        mapr{i,r}=tmar;
    end
end
[mr,mc]=size(mapr);


mapr
A=zeros(mr,mr);


% Compare every pair of sentences (i,m); two sentences are adjacent
% when they share at least one stemmed word.
for i=1:mr
    for j=1:mc
        for m=i+1:mr
            for k=1:mc
                if ~isempty(mapr{i,j})
                    if strcmp(mapr{i,j},mapr{m,k})
                        p=mapr{i,j};
                        fprintf('Sentences %d and %d match\n',i,m);
                        fprintf('And the word is: %s\n',p);
                        A(i,m)=1;
                        A(m,i)=1;
                    end
                end
            end
        end
    end
end
disp('Adjacency matrix is:')
A

% B holds the degree of each sentence (row sums of A); D is the
% diagonal degree matrix built from B.
disp('The corresponding diagonal matrix is:')
[ar,ac]=size(A);
B=sum(A,2)';
[br,bc]=size(B);
D=diag(B);
D

% C = D - A is the graph Laplacian of the sentence graph.
disp('The similarity matrix is:')
C=D-A
% From here on D holds the eigenvalues of C, not the degree matrix.
[V,D]=eig(C,'nobalance')
F=inv(V);
V*D*F



% mvar = number of edges / total degree of the vertices
no_of_edges=0;
for i=1:ar
    for j=1:ac
        if(i<=j)
            no_of_edges=no_of_edges+A(i,j);
        end
    end
end
tdv=0;
for i=1:bc
    tdv=tdv+B(i);
end

mvar=no_of_edges/tdv

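% Find the eigenvalue on the diagonal of D that is closest to mvar.
% q is its index; it starts at 1 so that it is defined even when the
% first eigenvalue is already the closest one.
[dr,dc]=size(D);
temp=abs(D(1,1)-mvar);
x=D(1,1);
q=1;
for i=2:dc
    temp2=abs(D(i,i)-mvar);
    if temp>temp2
        temp=temp2;
        x=D(i,i);
        q=i
    end
end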
x
% Track is the eigenvector (column q of V) for the selected eigenvalue.
[vr,vc]=size(V);
for i=1:vr
    Track(i)=V(i,q);
end
disp('Eigenvector corresponding to the closest value:')
Track
% Assign each sentence to C1 or C2 by the sign of its eigenvector entry.
C1=' ';
C2=' ';
for i=1:vr
    if(Track(i)<0)
        C1=strcat(C1,text1{1,i},'.');
    else
        C2=strcat(C2,text1{1,i},'.');
    end
end
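
For reference, the adjacency step on its own can be sketched more compactly. This is only a minimal sketch under some assumptions: the sample text and the shortened stop-word list are made up for illustration, words are lowercased, and stemming is left out because porterStemmer is not a built-in function (so 'class' and 'classes' would not match here).

```matlab
% Hypothetical sample input: three short sentences (not the real document).
text = 'The object model describes classes. Classes have relationships. Soft drinks are sold in Europe.';
stopWords = {'the','have','are','in','is','of','and','a'};  % shortened list (assumption)

% Split into sentences and drop the empty piece left by the final period.
sentences = strtrim(strsplit(text, '.'));
sentences = sentences(~cellfun(@isempty, sentences));
n = numel(sentences);

% Tokenize each sentence: lowercase, split on spaces, remove stop words.
tokens = cell(1, n);
for i = 1:n
    words = strsplit(lower(sentences{i}), ' ');
    words = words(~cellfun(@isempty, words));
    tokens{i} = setdiff(words, stopWords);
end

% Two sentences are adjacent when they share at least one remaining word.
A = zeros(n);
for i = 1:n
    for m = i+1:n
        if ~isempty(intersect(tokens{i}, tokens{m}))
            A(i, m) = 1;
            A(m, i) = 1;
        end
    end
end
disp(A)
```

Using intersect on the token sets replaces the two inner loops over word positions; the resulting A plays the same role as the adjacency matrix in the function above.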

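I can generate the initial two clusters from the document. However, I want the clustering process to continue on the generated clusters, splitting each one into more and more sub-clusters until the resulting set of clusters no longer changes. Can someone help me implement this, so that I can not only generate the clusters but also keep track of them for further processing? Thanks in advance.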
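
One way the repeated splitting could be organised is with a work queue of clusters, each stored as a vector of sentence indices. The following is a rough sketch, not a tested solution for the code above. It makes several assumptions: the toy adjacency matrix A (two triangles joined by one edge) is made up; the split uses the sign of the eigenvector for the second-smallest Laplacian eigenvalue (the Fiedler vector), which is a different choice from the "eigenvalue closest to mvar" rule in my function; and minSize is a hypothetical stopping threshold.

```matlab
% Toy adjacency matrix (assumption): sentences 1-3 and 4-6 form two
% triangles that are joined by the single edge 3-4.
A = [0 1 1 0 0 0;
     1 0 1 0 0 0;
     1 1 0 1 0 0;
     0 0 1 0 1 1;
     0 0 0 1 0 1;
     0 0 0 1 1 0];
minSize = 3;                % hypothetical threshold: do not split clusters this small

queue   = {1:size(A, 1)};   % clusters still waiting to be examined
leaves  = {};               % final clusters, as vectors of sentence indices
history = {};               % one record per split, for tracking

while ~isempty(queue)
    idx = queue{1};
    queue(1) = [];

    if numel(idx) <= minSize
        leaves{end+1} = idx;
        continue;
    end

    % Laplacian of the subgraph induced by the current cluster.
    S = A(idx, idx);
    L = diag(sum(S, 2)) - S;

    % Eigenvector of the second-smallest eigenvalue (Fiedler vector).
    [V, E] = eig(L);
    [~, order] = sort(diag(E));
    fiedler = V(:, order(2));

    % Split the cluster by the sign of the eigenvector entries.
    neg = idx(fiedler.' < 0);
    pos = idx(fiedler.' >= 0);

    if isempty(neg) || isempty(pos)
        leaves{end+1} = idx;            % the split changes nothing: stop here
    else
        history{end+1} = struct('parent', idx, 'left', neg, 'right', pos);
        queue{end+1} = neg;
        queue{end+1} = pos;
    end
end

for k = 1:numel(leaves)
    fprintf('Cluster %d: %s\n', k, mat2str(leaves{k}));
end
```

On this toy matrix the two triangles end up as separate clusters; which one is listed first depends on the sign that eig happens to give the eigenvector. Because every cluster is a vector of indices, the sentences of cluster k could be recovered with text1(leaves{k}), and history records which parent cluster each pair of sub-clusters came from.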

0 answers:

No answers