Reading data from a text file into a cell array

Time: 2020-06-21 06:04:41

Tags: matlab

I have multiple text files containing data in this format:

File1.txt

subID    imageCondition  trial   textItem    imageFile   response    RT
Participant003   images  7   Is there a refrigerator?    07_targetPresent-refrigerator.jpg   z   1.436971
Participant003   images  6   Is there an oven mitt?  06_targetPresent-ovenmitt.jpg   z   0.519301
Participant003   images  1   Is there a toaster?     01_targetAbsent-toaster.jpg     m   1.110664
Participant003   images  3   Is there a wine bottle?     03_targetAbsent-winebottle.jpg  m   1.278945
Participant003   images  2   Is there a kettle?  02_targetAbsent-kettle.jpg  z   2.672123
Participant003   images  5   Is there a blender?     05_targetPresent-blender.jpg    m   2.633802
Participant003   images  8   Is there a bucket?  08_targetPresent-bucket.jpg     m   2.596154
Participant003   images  4   Is there a surf board?  04_targetAbsent-surfboard.jpg   m   1.072850

File2.txt

subID    imageCondition  trial   textItem    imageFile   response    RT
Participant005   images  1   Is there a toaster?     01_targetAbsent-toaster.jpg         0.000000
Participant005   images  2   Is there a kettle?  02_targetAbsent-kettle.jpg  m   8.213927
Participant005   images  6   Is there an oven mitt?  06_targetPresent-ovenmitt.jpg   z   3.569293
Participant005   images  4   Is there a surf board?  04_targetAbsent-surfboard.jpg       0.000000
Participant005   images  3   Is there a wine bottle?     03_targetAbsent-winebottle.jpg  m   8.538699
Participant005   images  7   Is there a refrigerator?    07_targetPresent-refrigerator.jpg   z   0.857319
Participant005   images  5   Is there a blender?     05_targetPresent-blender.jpg        0.000000
Participant005   images  8   Is there a bucket?  08_targetPresent-bucket.jpg     z   1.967220

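I want to read this data into a cell array so that I can access the individual values.

I have the following code for reading the data, but it does not help, because it does not store the data in a way that lets me access individual values. For example, I want all the values in the "trial" or "response" column.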

function content = load_data(fileName)
fid = fopen(fileName,'r')
if fid > 0
   line_no =1;
   oneline{line_no} = fgetl(fid);
   while ischar(oneline{line_no})
      line_no = line_no +1;
      oneline{line_no} = fgetl(fid);
   endwhile
   fclose(fid)
   content = oneline;
endif
endfunction


for i= 1:size(txtFiles,2)
   data{i} = load_data(txtFiles{1,i});
end

for i=1:1:length(data)
   dataMat = cell2mat(data(i));
   for j=1:1:length(dataMat)
      line = dataMat{1,j};
      % Here I can only fetch each line as a single string whose fields are separated by more than one space character, which makes it difficult to access the required data
   endfor            
endfor

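I am looking for a way to read the data from the text files into a cell array or matrix so that I can easily access the values I need, but I can only use the traditional methods of importing data from text files. Alternatively, help with parsing the lines I already read, so that I can access what I need, would also do.

Note: there are multiple such text files. It would be great if you could show how to access the values in an individual column (e.g. the "response" column).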

1 Answer:

Answer 0 (score: 1)

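This would be easy to do with, for example, strsplit, splitting the data on spaces, except that the textItem field itself contains spaces. So I suggest using a regular expression. When you are looking for several separate pieces at once, named tokens are a convenient way to organize the results. I realize that regular expressions are hard to jump into if you are not familiar with them. Take a look at regex101.com for information and for a very useful online tool for testing your regular expressions. See this specific example on regex101. That said, here is my answer, which works for rows that have a response, such as those in File1.txt: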

text = fileread(filename);
data = regexp(text,'^(?<subID>\w+)\s+(?<imageCondition>\w+)\s+(?<trial>\d+)\s+(?<textItem>.*?\?)\s+(?<imageFile>[-\.\w]+)\s+(?<response>\w)\s+(?<RT>[\d\.]+)','names','lineanchors')
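
The expression above requires exactly one word character for response, so rows with a blank response, such as three of the rows in File2.txt, do not match and are left out of the result. Below is a minimal sketch of one possible variant. It assumes that a response is at most a single letter and that a blank one should come back as an empty field. The names sampleText, pattern, and rows are illustrative only, and the sample rows are pasted inline instead of being read from a file. It is written against MATLAB's regexp, whose 'names' option returns a struct array.

```matlab
% Two sample rows; the first has a blank response, as in File2.txt.
% Fields are separated by spaces here; the real files may use tabs.
sampleText = sprintf('%s\n%s\n', ...
    'Participant005   images  1   Is there a toaster?     01_targetAbsent-toaster.jpg         0.000000', ...
    'Participant005   images  2   Is there a kettle?  02_targetAbsent-kettle.jpg  m   8.213927');

% Same expression as above, except that response is optional
% ([a-zA-Z]? instead of \w) and the whitespace after it may be empty (\s*).
pattern = ['^(?<subID>\w+)\s+(?<imageCondition>\w+)\s+(?<trial>\d+)\s+' ...
           '(?<textItem>.*?\?)\s+(?<imageFile>[-\.\w]+)\s+' ...
           '(?<response>[a-zA-Z]?)\s*(?<RT>[\d\.]+)'];

rows = regexp(sampleText, pattern, 'names', 'lineanchors');

% Rows without a response are kept, with an empty response field.
hasResponse = ~cellfun(@isempty, {rows.response});
```

Here hasResponse is a logical vector that can be used to filter the rows, for example rows(hasResponse), if trials without a response should be excluded later.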

Alternatively, you can turn the struct array data into a table:

dataTable = struct2table(data)

The result looks like this:

      subID           imageCondition    trial              textItem                            imageFile                  response         RT     
__________________    ______________    _____    ____________________________    _____________________________________    ________    ____________

{'Participant003'}      {'images'}      {'7'}    {'Is there a refrigerator?'}    {'07_targetPresent-refrigerator.jpg'}     {'z'}      {'1.436971'}
{'Participant003'}      {'images'}      {'6'}    {'Is there an oven mitt?'  }    {'06_targetPresent-ovenmitt.jpg'    }     {'z'}      {'0.519301'}
{'Participant003'}      {'images'}      {'1'}    {'Is there a toaster?'     }    {'01_targetAbsent-toaster.jpg'      }     {'m'}      {'1.110664'}
{'Participant003'}      {'images'}      {'3'}    {'Is there a wine bottle?' }    {'03_targetAbsent-winebottle.jpg'   }     {'m'}      {'1.278945'}
{'Participant003'}      {'images'}      {'2'}    {'Is there a kettle?'      }    {'02_targetAbsent-kettle.jpg'       }     {'z'}      {'2.672123'}
{'Participant003'}      {'images'}      {'5'}    {'Is there a blender?'     }    {'05_targetPresent-blender.jpg'     }     {'m'}      {'2.633802'}
{'Participant003'}      {'images'}      {'8'}    {'Is there a bucket?'      }    {'08_targetPresent-bucket.jpg'      }     {'m'}      {'2.596154'}
{'Participant003'}      {'images'}      {'4'}    {'Is there a surf board?'  }    {'04_targetAbsent-surfboard.jpg'    }     {'m'}      {'1.072850'}

If you want to convert the numeric fields to numbers:

dataTable.trial = str2double(dataTable.trial);
dataTable.RT = str2double(dataTable.RT);

which then gives:

      subID           imageCondition    trial              textItem                            imageFile                  response      RT  
__________________    ______________    _____    ____________________________    _____________________________________    ________    ______

{'Participant003'}      {'images'}        7      {'Is there a refrigerator?'}    {'07_targetPresent-refrigerator.jpg'}     {'z'}       1.437
{'Participant003'}      {'images'}        6      {'Is there an oven mitt?'  }    {'06_targetPresent-ovenmitt.jpg'    }     {'z'}      0.5193
{'Participant003'}      {'images'}        1      {'Is there a toaster?'     }    {'01_targetAbsent-toaster.jpg'      }     {'m'}      1.1107
{'Participant003'}      {'images'}        3      {'Is there a wine bottle?' }    {'03_targetAbsent-winebottle.jpg'   }     {'m'}      1.2789
{'Participant003'}      {'images'}        2      {'Is there a kettle?'      }    {'02_targetAbsent-kettle.jpg'       }     {'z'}      2.6721
{'Participant003'}      {'images'}        5      {'Is there a blender?'     }    {'05_targetPresent-blender.jpg'     }     {'m'}      2.6338
{'Participant003'}      {'images'}        8      {'Is there a bucket?'      }    {'08_targetPresent-bucket.jpg'      }     {'m'}      2.5962
{'Participant003'}      {'images'}        4      {'Is there a surf board?'  }    {'04_targetAbsent-surfboard.jpg'    }     {'m'}      1.0729

You also asked how to access it. To get the third "response" from the table:

dataTable.response{3}

Or from the struct:

data(3).response
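
The question also asks for whole columns and mentions several files. Below is a rough sketch of how that could look with the struct-array approach. It assumes MATLAB's regexp semantics, and the in-memory strings text1 and text2 are hypothetical stand-ins for fileread(txtFiles{k}). The names header, texts, parts, and allData are likewise illustrative.

```matlab
% Stand-ins for the contents of two files: one header line, two data rows each.
header = 'subID    imageCondition  trial   textItem    imageFile   response    RT';
text1 = sprintf('%s\n%s\n%s\n', header, ...
    'Participant003   images  7   Is there a refrigerator?    07_targetPresent-refrigerator.jpg   z   1.436971', ...
    'Participant003   images  1   Is there a toaster?     01_targetAbsent-toaster.jpg     m   1.110664');
text2 = sprintf('%s\n%s\n%s\n', header, ...
    'Participant005   images  2   Is there a kettle?  02_targetAbsent-kettle.jpg  m   8.213927', ...
    'Participant005   images  6   Is there an oven mitt?  06_targetPresent-ovenmitt.jpg   z   3.569293');
texts = {text1, text2};

% The expression from the answer; the header line does not match it
% because "trial" is not a number, so it is skipped automatically.
pattern = ['^(?<subID>\w+)\s+(?<imageCondition>\w+)\s+(?<trial>\d+)\s+' ...
           '(?<textItem>.*?\?)\s+(?<imageFile>[-\.\w]+)\s+' ...
           '(?<response>\w)\s+(?<RT>[\d\.]+)'];

% Parse each text separately, then join the struct arrays.
parts = cell(1, numel(texts));
for k = 1:numel(texts)
    parts{k} = regexp(texts{k}, pattern, 'names', 'lineanchors');
end
allData = [parts{:}];

% Whole columns: a comma-separated list wrapped in braces gives a cell array.
responses = {allData.response};           % cell array of char vectors
trials    = str2double({allData.trial});  % numeric row vector
RTs       = str2double({allData.RT});     % numeric row vector

% Example filter: responses of one participant only.
isP5 = strcmp({allData.subID}, 'Participant005');
p5Responses = responses(isP5);
```

With a table, the corresponding column access would simply be dataTable.response or dataTable.trial.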