Question

Suppose there are some strings containing names in different formats (each line is a possible user input):

'Guilcher, G.M., Harvey, M. & Hand, J.P.'
'Ri Liesner, Peter Tom Collins, Michael Richards'
'Manco-Johnson M, Santagostino E, Ljung R.'

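I need to convert these names to the format Lastname ABC. So each first name should be reduced to its initial, with the initials appended after the last name.

The example should result in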

Guilcher GM, Harvey M, Hand JP
Liesner R, Collins PT, Richards M
Manco-Johnson M, Santagostino E, Ljung R

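The problem is the different (possible) input formats. I think my attempts are not very smart, so I am asking for

some hints to optimize the transformation code
how do I put all of this in a single function? I guess I first have to test which format the string has...?

Let me explain how far I got trying to solve this:

First example string

In the first example, the initials are followed by a dot. The dots should be removed, and so should the comma between a name and its initials.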

firstString
  .replace(/\./g, '')   // a plain string pattern would only replace the first dot
  .replace(/ &/g, ',')

I think I do need a regex to get at the comma that sits after a name and before its initials.
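One way such a regex could look, as a minimal sketch rather than a definitive solution: it assumes that initials are a run of capital letters ending at a word boundary, so an all-caps surname would be misread as initials. The variable name cleaned is only illustrative.

```javascript
const firstString = 'Guilcher, G.M., Harvey, M. & Hand, J.P.';

const cleaned = firstString
  .replace(/\./g, '')          // drop every dot
  .replace(/\s*&\s*/g, ', ')   // treat "&" like a comma separator
  // A comma followed by capitals that end at a word boundary sits between
  // a name and its initials, so that comma is dropped. ", Harvey" does not
  // match because "H" is followed by a lowercase letter.
  .replace(/,\s+([A-Z]+)\b/g, ' $1');

console.log(cleaned);
```

The commas that separate two people are kept because the word after them starts with a capital but continues in lowercase.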

Second example string

In the second example, each name should be split on whitespace, with the last element treated as the last name:

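// secondString is assumed to hold a single name, e.g. 'Peter Tom Collins'
const elm = secondString.trim().split(/\s+/)
const lastname = elm[elm.length - 1]
// Initials: uppercased first letter of every word except the last
const initials = elm.slice(0, -1).map(n => n[0].toUpperCase())

return lastname + ' ' + initials.join('')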

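...not very elegant

Third example string

The third example already has the correct format; only the dot at the end has to be removed. So this input needs no other work.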
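As for testing the format first, a check along these lines might be enough to spot input that is already done. This is only a sketch: looksFormatted is a hypothetical helper, and it assumes a finished chunk is a single surname word followed by one to four capital letters.

```javascript
// Hypothetical helper, not part of the original attempts.
// True when every comma-separated chunk already looks like "Surname ABC".
const looksFormatted = str =>
  str
    .replace(/\.\s*$/, '')   // ignore a single trailing dot
    .split(',')
    .every(chunk => /^[A-Z][\w-]+ [A-Z]{1,4}$/.test(chunk.trim()));

console.log(looksFormatted('Manco-Johnson M, Santagostino E, Ljung R.'));
console.log(looksFormatted('Ri Liesner, Peter Tom Collins, Michael Richards'));
```

Under these assumptions only the third example string passes the check; the other two still need converting.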

Answer 1

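Based on your example data, I would try to guess from the name part count (= 2), because it is hard to rely on any single separator (comma, & or newline), which means treating them all the same way.

Try this against your data and tell me about any failing use cases, because I am quite sure this script will fail at some point with more data :)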

Answer 2

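It is not possible without calling multiple replace() methods. The steps in the provided solution are as follows:

Remove all dots in abbreviated names
Swap first names and last names
Replace full first names with their first letter
Remove unwanted characters

Demo: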

var s = `Guilcher, G.M., Harvey, M. & Hand, J.P.
Ri Liesner, Peter Tom Collins, Michael Richards
Manco-Johnson M, Santagostino E, Ljung R.`

// Remove all dots in abbreviated names
var b = s.replace(/\b([A-Z])\./g, '$1')
// Substitute first names and lastnames
.replace(/([A-Z][\w-]+(?: +[A-Z][\w-]+)*) +([A-Z][\w-]+)\b/g, ($0, $1, $2) => {
    // Replace full first names with their first letter
    return $2 + " " + $1.replace(/\b([A-Z])\w+ */g, '$1');
})
// Remove unwanted preceding / following commas and ampersands
.replace(/(,) +([A-Z]+)\b *[,&]?/g, ' $2$1')
// Drop the comma this leaves at the end of a line
.replace(/,$/gm, '');

console.log(b);

Answer 3

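Here is my approach. I tried to keep it short, but the complexity turned out surprisingly high in order to cover the edge cases.

First, I format the input, replacing & with a comma and removing the dots.
Then I split the input by \n, then by comma, and finally by space.
Next I process the chunks. On each new segment (delimited by a comma), I process the previous segment. I do this because I need to make sure the current segment is not an initial. If it is, I skip that initial-only segment and process the previous one. The previous one will then have the correct initial and surname, since I have all the information I need.
If the segment contains an initial, I pick it up. It will be used at the start of the next segment to process the current one.
After finishing each line, I process the last segment once more, since it would otherwise never be handled.

I understand that the complexity is high without using a regexp, and that it would probably be better to use a state machine to parse the input.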

const isInitial = s => [...s].every(c => c === c.toUpperCase());
const generateInitial = arr => arr.reduce((a, c, i) => a + (i < arr.length - 1 ? c[0].toUpperCase() : ''), '');
const formatSegment = (words, initial) => {
  if (!initial) {
    initial = generateInitial(words);
  }
  const surname = words[words.length - 1];
  return {initial, surname};
}

const doDisplay = x => x.map(x => x.surname + ' ' + x.initial).join(', ');

const doProcess = _ => {
  const formatted = input.value.replace(/\./g, '').replace(/&/g, ',');
  const chunks = formatted.split('\n').map(x => x.split(',').map(x => x.trim().split(' ')));
  const peoples = [];
  chunks.forEach(line => {
    let lastSegment = null;
    let lastInitial = null;
    let lastInitialOnly = false;
    line.forEach(segment => {
      if (lastSegment) {
        // if segment only contains an initial, it's the initial corresponding
        // to the previous segment
        const initialOnly = segment.length === 1 && isInitial(segment[0]);
        if (initialOnly) {
          lastInitial = segment[0];
        }
        // avoid processing last segments that were only initials
        // this prevents adding a segment twice
        if (!lastInitialOnly) {
          // if segment isn't an initial, we need to generate an initial
          // for the previous segment, if it doesn't already have one
          const people = formatSegment(lastSegment, lastInitial);
          peoples.push(people);
        }
        lastInitialOnly = initialOnly;
        
        // Skip initial only segments
        if (initialOnly) {
          return;
        }
      }
      lastInitial = null;
      
      // Remove the initial from the words
      // to avoid getting the initial calculated for the initial
      segment = segment.filter(word => {
        if (isInitial(word)) {
          lastInitial = word;
          return false;
        }
        return true;
      });
      lastSegment = segment;
    });
    
    // Process last segment
    if (!lastInitialOnly) {
      const people = formatSegment(lastSegment, lastInitial);
      peoples.push(people);
    }
  });
  return peoples;
}
process.addEventListener('click', _ => {
  const peoples = doProcess();
  const display = doDisplay(peoples);
  output.value = display;
});

.row {
  display: flex;
}

.row > * {
  flex: 1 0;
}
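For comparison, the chunk-merging idea described in this answer can also be sketched as a pure function without the DOM wiring. This is a rough sketch, not the answerer's code: formatNames and isInitials are hypothetical names, and it assumes initials are one to four capital letters.

```javascript
// Hypothetical helpers; names and the 1-4 capital letter rule are assumptions.
const isInitials = w => /^[A-Z]{1,4}$/.test(w);

function formatNames(line) {
  const parts = line
    .replace(/\./g, '')     // remove dots
    .replace(/&/g, ',')     // treat "&" as a comma
    .split(',')
    .map(p => p.trim())
    .filter(Boolean);       // drop empty chunks

  const people = [];
  for (const part of parts) {
    const words = part.split(/\s+/);
    const last = words[words.length - 1];
    if (words.length === 1 && isInitials(last) && people.length) {
      // "Guilcher, GM": an initials-only chunk belongs to the previous surname
      people[people.length - 1].initials = last;
    } else if (words.length > 1 && isInitials(last)) {
      // "Ljung R": already surname followed by initials
      people.push({ surname: words.slice(0, -1).join(' '), initials: last });
    } else {
      // "Peter Tom Collins": last word is the surname, the rest become initials
      const initials = words.slice(0, -1).map(w => w[0].toUpperCase()).join('');
      people.push({ surname: last, initials });
    }
  }
  return people.map(p => (p.surname + ' ' + p.initials).trim()).join(', ');
}

console.log(formatNames('Guilcher, G.M., Harvey, M. & Hand, J.P.'));
console.log(formatNames('Ri Liesner, Peter Tom Collins, Michael Richards'));
console.log(formatNames('Manco-Johnson M, Santagostino E, Ljung R.'));
```

Each input line is handled on its own, so multi-line input can be split on \n and mapped through formatNames.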

Get initials and full last name from a string containing names

3 answers: