Hey, I have a cell array whose second column holds the number of times each 'XX->XX' transition occurs, for example:
'AA->AA' [21] [4.2084]
'AA->AC' [15] [3.0060]
'AA->AG' [ 9] [1.8036]
'AA->AT' [12] [2.4048]
'AC->CA' [14] [2.8056]
'AC->CC' [16] [3.2064]
'AC->CG' [ 5] [1.0020]
'AC->CT' [ 3] [0.6012]
'AG->GA' [11] [2.2044]
'AG->GC' [ 5] [1.0020]
'AG->GG' [ 8] [1.6032]
'AG->GT' [13] [2.6052]
'AT->TA' [10] [2.0040]
'AT->TC' [ 8] [1.6032]
'AT->TG' [ 2] [0.4008]
'AT->TT' [11] [2.2044]
'CA->AA' [17] [3.4068]
'CA->AC' [ 7] [1.4028]
'CA->AG' [ 9] [1.8036]
'CA->AT' [11] [2.2044]
'CC->CA' [15] [3.0060]
'CC->CC' [ 5] [1.0020]
'CC->CG' [ 4] [0.8016]
'CC->CT' [17] [3.4068]
'CG->GA' [ 1] [0.2004]
'CG->GC' [ 2] [0.4008]
'CG->GG' [ 9] [1.8036]
'CG->GT' [ 3] [0.6012]
'CT->TA' [ 7] [1.4028]
'CT->TC' [ 9] [1.8036]
'CT->TG' [ 9] [1.8036]
'CT->TT' [ 2] [0.4008]
'GA->AA' [10] [2.0040]
'GA->AC' [ 4] [0.8016]
'GA->AG' [10] [2.0040]
'GA->AT' [ 2] [0.4008]
'GC->CA' [ 2] [0.4008]
'GC->CC' [ 7] [1.4028]
'GC->CG' [ 6] [1.2024]
'GC->CT' [ 3] [0.6012]
'GG->GA' [ 6] [1.2024]
'GG->GC' [ 6] [1.2024]
'GG->GG' [ 4] [0.8016]
'GG->GT' [ 8] [1.6032]
'GT->TA' [ 6] [1.2024]
'GT->TC' [11] [2.2044]
'GT->TG' [ 8] [1.6032]
'GT->TT' [ 5] [1.0020]
'TA->AA' [ 8] [1.6032]
'TA->AC' [13] [2.6052]
'TA->AG' [ 9] [1.8036]
'TA->AT' [ 6] [1.2024]
'TC->CA' [13] [2.6052]
'TC->CC' [13] [2.6052]
'TC->CT' [ 4] [0.8016]
'TG->GA' [ 8] [1.6032]
'TG->GC' [ 5] [1.0020]
'TG->GG' [ 3] [0.6012]
'TG->GT' [ 6] [1.2024]
'TT->TA' [13] [2.6052]
'TT->TC' [ 2] [0.4008]
'TT->TG' [ 3] [0.6012]
'TT->TT' [ 5] [1.0020]
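Now I am trying to compute the probabilities: P('AA->AA') = TIMES('AA->AA') / SUM('AA->AA', 'AA->AC', 'AA->AG', 'AA->AT'); in other words, P('AA->AA') = TIMES('AA->AA') / SUM('AA->any'), and likewise for all the others. I would like to do this with a loop, but there is an edge case in the data:
'TC->CA' [13] [2.6052]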
'TC->CC' [13] [2.6052]
'TC->CT' [ 4] [0.8016]
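Here the count for 'TC->CG' is clearly 0, and that needs to be taken into account as well, even though we already know its probability should be 0. Of course, this edge case can occur for any other transition: sometimes 'TT->TT' may be missing, and sometimes 'TC->CT'.
Does anyone know how to do this?
Thanks.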
Answer 0 (score: 1)
Try this:
%%// Get the cell data into data1
data1 = INPUT_DATA;
%%// Get the data from columns separately
col1 = data1(:,1);
tag_data = vertcat(col1{:});
col2 = data1(:,2);
times_data = vertcat(col2{:});
col3 = data1(:,3);
col3_data = vertcat(col3{:});
%%// Get full data for tag, times and column3
char_array = ['A' 'C' 'G' 'T'];
full_tag_data = char_array(combinator(4,3,'p','r'));
full_tag_data = [full_tag_data(:,1:2) repmat('->',[size(full_tag_data,1) 1]) full_tag_data(:,2:3)];
present_rows = ismember(full_tag_data,tag_data,'rows');
full_times_data = double(present_rows);
full_times_data(present_rows) = times_data;
full_col3_data = double(present_rows);
full_col3_data(present_rows) = col3_data;
%%// Get the sum of the times for each source pair (each block of 4 rows shares the same 'XY->')
full_times_data_summed = sum(reshape(full_times_data,4,[]),1);
full_times_data_summed = reshape(repmat(full_times_data_summed,[4 1]),[],1);
%%// Store the required values into a cell array out_cell1
out_cell1 = cell(size(present_rows,1),4);
out_cell1(:,1) = cellstr(full_tag_data);
out_cell1(:,2) = num2cell(full_times_data);
out_cell1(:,3) = num2cell(full_col3_data);
%%// The probabilities are added into the cell array as the fourth column:
%%// times divided by the summed times of the same source pair, as asked in the question
%%// (a source pair with no rows at all gives 0/0 = NaN)
out_cell1(:,4) = num2cell(full_times_data./full_times_data_summed);
Note: the code above uses the function combinator, which is not built into MATLAB and is available here.
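If combinator is not at hand, the same idea can probably be sketched with built-in functions only. The snippet below is a minimal sketch, not the answer's original method: it assumes the 64 tags can be enumerated with ndgrid, it matches rows by tag name rather than by position, and it uses a small hypothetical sample (`data`) in place of the full cell array. The names `data`, `all_tags`, `counts` and `probs` are made up for illustration.

```matlab
% Hypothetical sample in the question's layout: tag, times, third column.
% 'TC->CG' is deliberately missing, as in the edge case described above.
data = {'AA->AA', 21, 4.2084; 'AA->AC', 15, 3.0060; ...
        'AA->AG',  9, 1.8036; 'AA->AT', 12, 2.4048; ...
        'TC->CA', 13, 2.6052; 'TC->CC', 13, 2.6052; ...
        'TC->CT',  4, 0.8016};

bases = 'ACGT';

% Enumerate every (first, shared, last) base triple; i3 varies fastest,
% so each consecutive block of 4 rows shares the same source pair 'XY'.
[i3, i2, i1] = ndgrid(1:4, 1:4, 1:4);
n = numel(i1);
col = @(x) reshape(x, [], 1);          % force a column vector
all_tags = cellstr([col(bases(i1)), col(bases(i2)), repmat('->', n, 1), ...
                    col(bases(i2)), col(bases(i3))]);

% Look up each tag by name; tags absent from the input keep a count of 0.
[found, loc] = ismember(all_tags, data(:, 1));
counts = zeros(n, 1);
counts(found) = cell2mat(data(loc(found), 2));

% Sum the counts within each block of 4 and expand back to one value per row.
group_sums = sum(reshape(counts, 4, []), 1);
denom = reshape(repmat(group_sums, 4, 1), [], 1);

% Divide only where the source pair was observed, to avoid 0/0 = NaN.
probs = zeros(n, 1);
seen = denom > 0;
probs(seen) = counts(seen) ./ denom(seen);

% Tag, times and probability side by side.
result = [all_tags, num2cell(counts), num2cell(probs)];
```

Matching by name means the input rows do not have to be sorted or complete, and source pairs that never appear simply get probability 0 for all four targets instead of NaN.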