How do I parse MediaWiki's Special:UnusedFiles

Date: 2019-04-23 02:35:14

Tags: bash

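My MediaWiki site lists thousands of orphaned files on its Special:UnusedFiles page. There is no function to bulk delete them. Going through them one at a time would take forever, so the idea is to dump the list of files into a text file and then run a cron job that calls deleteBatch.php.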

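I can run wget against the page (the command is shown below) and get a fairly good list of files, but a few items at the end of the list do not belong.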

Those trailing items are what I want to drop: I only need the File:file_name.jpg entries, written out to a text file. The command I run is:

    wget -q -O - http://www.example.com/wiki/Special:UnusedFiles | replace '/' $'\n' '\"' $'\n' | grep 'File' - | sort -u -

1 answer:

Answer 0 (score: 0)

You can add a further grep to narrow the command's output and redirect the result to a file:

    wget -q -O - http://www.example.com/wiki/Special:UnusedFiles | replace '/' $'\n' '\"' $'\n' | grep 'File' - | sort -u - | grep ^File > files_to_delete.txt

The ^ in the grep command means that File must appear at the beginning of the line, and > redirects the command's output to a file.
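To see the effect of the anchor without touching a live wiki, here is a minimal sketch that runs both filters over a few made-up lines. The sample titles, the variable names, and the temporary directory are assumptions for illustration only; real output from Special:UnusedFiles will look different.

```shell
#!/usr/bin/env bash
# Hypothetical sample of what the pipeline might emit before the final filter.
# These lines are invented for illustration and do not come from a real wiki.
sample='File:Example_a.jpg
File:Example_b.png
index.php?title=File:Example_a.jpg&action=edit
Special:UnusedFiles'

# Unanchored: matches "File" anywhere in the line, so the edit link and the
# special page name slip through along with the real titles.
unanchored=$(printf '%s\n' "$sample" | grep 'File' | sort -u)

# Anchored: only lines that begin with "File" survive.
anchored=$(printf '%s\n' "$sample" | grep '^File' | sort -u)

# Write the filtered list to a file, as the answer does with ">".
# A temporary directory is used here so the sketch does not clobber anything.
workdir=$(mktemp -d)
printf '%s\n' "$anchored" > "$workdir/files_to_delete.txt"

cat "$workdir/files_to_delete.txt"
```

Quoting the pattern as '^File' is a defensive choice: it behaves the same as the unquoted form in bash, but avoids surprises in shells that treat ^ specially. The resulting file holds one title per line, which is the shape of list the question plans to hand to deleteBatch.php.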