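Troubleshooting mongodb service crashing

Time: 2019-01-16 07:35:16

Tags: python database mongodb raspberry-pi pymongo

I am running Python programs on a Raspberry Pi (Linux) that log data into mongodb using the pymongo module. I can't work out when the mongodb service stops running, or why it stops.

For now, I have set up my functions so that if they cannot reach mongodb (they get a pymongo connection exception), they try to restart the service, wait ten seconds, and then retry the operation. The functions are recursive, like this: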

def get_database_collection():
    try:
        # code to get document
        return document
    except Exception:
        # code to log exception in my log files
        start_mongo_service()
        get_database_collection()

This is what the start_mongo_service() function looks like:

def start_mongo_service():
    try:
        subprocess.call(["sudo", "service", "mongodb", "start"])
        time.sleep(10)
        return True
    except Exception:
        # code to log exception in my log files (Could not start_mongo_service)
        database_logger = logging.getLogger('database_thread')
        database_logger.exception("Could not start_mongo_service")
        time.sleep(10)
        return False

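Now, I know that catching all exceptions is bad practice, but I do it because I don't want my code to crash, and I log every exception that occurs so that I can review how the code behaved.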
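A side note on start_mongo_service() above: subprocess.call() returns the command's exit status instead of raising when the command fails, so the except branch only runs if the command cannot be launched at all. Below is a minimal sketch of checking the status explicitly. The name run_service_command and its command parameter are hypothetical, used only for illustration, and the stand-in commands are there so the example runs without mongodb installed.

```python
import logging
import subprocess
import sys


def run_service_command(command):
    """Run a command and report whether it exited with status 0."""
    logger = logging.getLogger('database_thread')
    try:
        status = subprocess.call(command)
    except OSError:
        # Raised only when the executable itself cannot be launched.
        logger.exception("Could not launch %s", command)
        return False
    if status != 0:
        logger.error("%s exited with status %d", command, status)
        return False
    return True


# Stand-in commands so the sketch runs anywhere; the real call would be
# run_service_command(["sudo", "service", "mongodb", "start"]).
ok = run_service_command([sys.executable, "-c", "raise SystemExit(0)"])
failed = run_service_command([sys.executable, "-c", "raise SystemExit(3)"])
```

With a check like this, a failed restart shows up in the log as a non-zero exit status rather than being treated as a success.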

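Yesterday my program crashed, and the error on the console was: MaximumRecursionDepth exceeded, which I think means it looped about 1000 times and still couldn't escape its exceptions. The program's log looked like this: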

2019-01-15 18:12:50,000 - ERROR - database_thread - Could not start_mongo_service
Traceback (most recent call last):
  File "gateway-embedded-code/database.py", line 89, in update_status_collection
AttributeError: 'NoneType' object has no attribute 'update'

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
  File "pymongo/mongo_client.py", line 266, in __init__
  File "pymongo/mongo_client.py", line 641, in __find_node
pymongo.errors.AutoReconnect: could not connect to localhost:27017: [Errno 110] Connection timed out

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
  File "gateway-embedded-code/database.py", line 35, in get_database_collection
  File "pymongo/mongo_client.py", line 269, in __init__
pymongo.errors.ConnectionFailure: could not connect to localhost:27017: [Errno 110] Connection timed out

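The pymongo exceptions above occurred many times; I have shortened them for posting here. I then looked at mongodb's own log in /var/log/mongodb, and the last entry on January 15 was at 17:51:54! After that there is nothing in the log for that day... My guess is that the service stopped and my program could not restart it, so it crashed at 18:12:50...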

Tue Jan 15 17:51:54.391 [conn11125] end connection 127.0.0.1:53052 (1 connection now open)
Tue Jan 15 17:51:54.393 [initandlisten] connection accepted from 127.0.0.1:53054 #11126 (2 connections now open)
Tue Jan 15 17:51:54.408 [conn11126] end connection 127.0.0.1:53054 (1 connection now open)
Tue Jan 15 17:51:54.410 [initandlisten] connection accepted from 127.0.0.1:53056 #11127 (2 connections now open)
Wed Jan 16 04:33:04.994 [signalProcessingThread] got signal 15 (Terminated), will terminate after current cmd ends
Wed Jan 16 04:33:04.994 [signalProcessingThread] now exiting
Wed Jan 16 04:33:04.994 dbexit:
Wed Jan 16 04:33:04.994 [signalProcessingThread] shutdown: going to close listening sockets...
Wed Jan 16 04:33:04.994 [signalProcessingThread] closing listening socket: 9
Wed Jan 16 04:33:04.994 [signalProcessingThread] closing listening socket: 10
Wed Jan 16 04:33:04.994 [signalProcessingThread] closing listening socket: 11
Wed Jan 16 04:33:04.994 [signalProcessingThread] removing socket file: /tmp/mongodb-27017.sock
Wed Jan 16 04:33:04.994 [signalProcessingThread] shutdown: going to flush diaglog...
Wed Jan 16 04:33:04.994 [signalProcessingThread] shutdown: going to close sockets...
Wed Jan 16 04:33:04.994 [signalProcessingThread] shutdown: waiting for fs preallocator...
Wed Jan 16 04:33:04.994 [signalProcessingThread] shutdown: lock for final commit...
Wed Jan 16 04:33:04.994 [signalProcessingThread] shutdown: final commit...
Wed Jan 16 04:33:04.995 [signalProcessingThread] shutdown: closing all files...
Wed Jan 16 04:33:04.995 [conn11127] end connection 127.0.0.1:53056 (1 connection now open)
Wed Jan 16 04:33:04.997 [signalProcessingThread] closeAllFiles() finished
Wed Jan 16 04:33:04.997 [signalProcessingThread] journalCleanup...
Wed Jan 16 04:33:04.997 [signalProcessingThread] removeJournalFiles
Wed Jan 16 04:33:05.053 [conn4] end connection 127.0.0.1:56470 (0 connections now open)
Wed Jan 16 04:33:05.223 [signalProcessingThread] shutdown: removing fs lock...
Wed Jan 16 04:33:05.224 dbexit: really exiting now


***** SERVER RESTARTED *****
      # Everything works fine from this point onwards

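Today is January 16, and all the January 16 messages were logged when I rebooted the Raspberry Pi. The program works fine now... but when I leave it running and check the next day, the problem is back.

My question is: why does this happen? When does the mongo service stop? Why can't my function restart it? Can anyone explain what can be inferred from the logs? Could an unexpected power disconnection prevent the mongodb service from running at startup? Please help me understand what might be happening and how to handle it; I don't want my program to crash!

Sorry, I can provide any further details you need.

Thanks for reading.

EDIT: Just to clarify where the AttributeError on NoneType comes from. Keep in mind that by "collection" I just mean a document in the database. I have a function called update_status_collection():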

def update_status_collection(the_update):
    try:
        document = get_database_collection(collection_name='status_collection')
        document.update(the_update)
    except Exception:
        database_logger = logging.getLogger('database_thread')
        database_logger.exception('Could not update_status_collection')
        start_mongo_service()
        update_status_collection(the_update)

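For some reason, get_database_collection() returned None into the document variable, and that is what raised the AttributeError, because you can't call update on None. I am curious, though, how get_database_collection() can return None at all when it is recursive too... It must return None after hitting MaximumRecursionDepth, right? This is something I haven't investigated yet.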
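A likely explanation, assuming the real function matches the snippet shown earlier: the except branch calls get_database_collection() again but never returns its result, so once any attempt fails, the outermost call falls off the end of the function and returns None even if a later attempt succeeds. The sketch below reproduces this with a hypothetical flaky_fetch() stand-in for the database access, and shows a loop-based retry (get_with_retries, also a hypothetical name) that avoids both the None result and the recursion limit.

```python
import itertools

_calls = itertools.count(1)


def flaky_fetch():
    """Hypothetical stand-in for the pymongo access: fails on the first call."""
    if next(_calls) == 1:
        raise ConnectionError("could not connect")
    return {"status": "ok"}


def get_recursive():
    try:
        return flaky_fetch()
    except Exception:
        get_recursive()  # result is discarded, so this branch returns None


def get_with_retries(fetch, attempts=3):
    """Retry in a loop; re-raise the last error instead of recursing forever."""
    last_error = None
    for _ in range(attempts):
        try:
            return fetch()
        except Exception as exc:
            last_error = exc
    raise last_error


# The second attempt succeeds, but its value is thrown away by the caller.
recursive_result = get_recursive()
```

In a real program the loop would probably also sleep between attempts and try to restart the service, as the original functions do.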

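UPDATE: OK, I went through the system logs looking for anything suspicious, and I think I found the point where Linux stops mongo. Below is the system log (/var/log/syslog) for January 15: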

Jan 15 12:12:07 raspberrypi systemd[1]: Stopping An object/document-oriented database...
Jan 15 12:12:08 raspberrypi systemd[1]: Stopped An object/document-oriented database.
Jan 15 12:12:08 raspberrypi rc.local[463]: [967] Failed to execute script __main__
Jan 15 12:12:11 raspberrypi systemd[1]: Started An object/document-oriented database.
Jan 15 12:12:11 raspberrypi mongod[2336]: all output going to: /var/log/mongodb/mongodb.log
Jan 15 12:14:22 raspberrypi systemd[1]: Stopped target Timers.
Jan 15 12:14:22 raspberrypi systemd[1]: Stopped Daily apt upgrade and clean activities.
Jan 15 12:14:22 raspberrypi systemd[1]: Stopped target Bluetooth.
Jan 15 12:14:22 raspberrypi systemd[1]: Stopped Daily apt download activities.
Jan 15 12:14:22 raspberrypi systemd[1]: Stopping User Manager for UID 1000...
Jan 15 12:14:22 raspberrypi systemd[1]: Stopped target System Time Synchronized.
Jan 15 12:14:22 raspberrypi vncserver-x11-serviced[453]: XIO:  fatal IO error 11 (Resource temporarily unavailable) on X server ":0"
Jan 15 12:14:22 raspberrypi vncserver-x11-serviced[453]:       after 14426 requests (14426 known processed) with 0 events remaining.
Jan 15 12:14:22 raspberrypi bluetoothd[524]: Terminating
Jan 15 12:14:22 raspberrypi systemd[1]: Stopping Bluetooth service...
Jan 15 12:14:22 raspberrypi watchdog[562]: stopping daemon (5.15)
Jan 15 12:14:22 raspberrypi systemd[1]: Stopping Disk Manager...
Jan 15 12:14:22 raspberrypi udisksd[883]: udisks daemon version 2.1.8 exiting
Jan 15 12:14:22 raspberrypi systemd[1]: Stopped target Sound Card.
Jan 15 12:14:22 raspberrypi bluetoothd[524]: Stopping SDP server
Jan 15 12:14:22 raspberrypi systemd[1]: Closed Load/Save RF Kill Switch Status /dev/rfkill Watch.
Jan 15 12:14:22 raspberrypi bluetoothd[524]: Exit
Jan 15 12:14:22 raspberrypi systemd[1]: Stopping watchdog daemon...
Jan 15 12:14:22 raspberrypi systemd[1]: Stopping Save/Restore Sound Card State...
Jan 15 12:14:22 raspberrypi systemd[1]: Unmounting RPC Pipe File System...
Jan 15 12:14:22 raspberrypi systemd[1]: Stopping Authorization Manager...
Jan 15 12:14:22 raspberrypi systemd[1]: Stopped Daily Cleanup of Temporary Directories.
Jan 15 12:14:22 raspberrypi systemd[1]: Stopping Session c1 of user pi.
Jan 15 12:14:22 raspberrypi systemd[1]: Stopped Getty on tty1.
Jan 15 12:14:22 raspberrypi vncserver-x11-serviced[453]: XIO:  fatal IO error 11 (Resource temporarily unavailable) on X server ":0"
Jan 15 12:14:28 raspberrypi systemd-modules-load[111]: Inserted module 'i2c_dev'
Jan 15 12:14:28 raspberrypi systemd[1]: Started Apply Kernel Variables.
Jan 15 12:14:28 raspberrypi fake-hwclock[112]: Tue 15 Jan 12:14:24 UTC 2019
Jan 15 12:14:28 raspberrypi systemd[1]: Time has been changed
Jan 15 12:14:28 raspberrypi systemd[1]: Started Restore / save the current clock.
Jan 15 12:14:28 raspberrypi systemd-fsck[113]: e2fsck 1.43.4 (31-Jan-2017)
Jan 15 12:14:28 raspberrypi systemd[1]: Started Create Static Device Nodes in /dev.
Jan 15 12:14:28 raspberrypi systemd[1]: Starting udev Kernel Device Manager...
Jan 15 12:14:28 raspberrypi systemd-fsck[113]: /dev/mmcblk0p2: clean, 137720/939744 files, 1384356/3809792 blocks
Jan 15 12:14:28 raspberrypi systemd[1]: Started File System Check on Root Device.
Jan 15 12:14:28 raspberrypi systemd[1]: Starting Remount Root and Kernel File Systems...
Jan 15 12:14:28 raspberrypi systemd[1]: Started Remount Root and Kernel File Systems.
Jan 15 12:14:28 raspberrypi systemd[1]: Starting Load/Save Random Seed...
Jan 15 12:14:28 raspberrypi systemd[1]: Starting udev Coldplug all Devices...
Jan 15 12:14:28 raspberrypi systemd[1]: Starting Flush Journal to Persistent Storage...
Jan 15 12:14:28 raspberrypi systemd[1]: Started Load/Save Random Seed.
Jan 15 12:14:28 raspberrypi systemd[1]: Started Flush Journal to Persistent Storage.
Jan 15 12:14:28 raspberrypi systemd[1]: Started Set the console keyboard layout.
Jan 15 12:14:28 raspberrypi systemd[1]: Reached target Local File Systems (Pre).
Jan 15 12:14:28 raspberrypi systemd[1]: Started udev Kernel Device Manager.
Jan 15 12:14:28 raspberrypi systemd[1]: Started udev Coldplug all Devices.
Jan 15 12:14:28 raspberrypi systemd[1]: Starting Show Plymouth Boot Screen...
Jan 15 12:14:28 raspberrypi systemd[1]: Found device /dev/serial1.
Jan 15 12:14:28 raspberrypi systemd[1]: Started Show Plymouth Boot Screen.
Jan 15 12:14:28 raspberrypi systemd[1]: Reached target Encrypted Volumes.
Jan 15 12:14:28 raspberrypi systemd[1]: Reached target Paths.
Jan 15 12:14:28 raspberrypi systemd[1]: Started Forward Password Requests to Plymouth Directory Watch.
Jan 15 12:14:28 raspberrypi mtp-probe: checking bus 1, device 3: "/sys/devices/platform/soc/3f980000.usb/usb1/1-1/1-1.1"
Jan 15 12:14:28 raspberrypi mtp-probe: checking bus 1, device 4: "/sys/devices/platform/soc/3f980000.usb/usb1/1-1/1-1.2"
Jan 15 12:14:28 raspberrypi mtp-probe: checking bus 1, device 5: "/sys/devices/platform/soc/3f980000.usb/usb1/1-1/1-1.4"
Jan 15 12:14:28 raspberrypi mtp-probe: bus: 1, device: 4 was not an MTP device
Jan 15 12:14:28 raspberrypi mtp-probe: bus: 1, device: 5 was not an MTP device
Jan 15 12:14:28 raspberrypi mtp-probe: bus: 1, device: 3 was not an MTP device
Jan 15 12:14:28 raspberrypi systemd[1]: Listening on Load/Save RF Kill Switch Status /dev/rfkill Watch.
Jan 15 12:14:28 raspberrypi systemd[1]: Reached target Sound Card.
Jan 15 12:14:28 raspberrypi systemd[1]: Found device /dev/disk/by-partuuid/f143b93d-01.
Jan 15 12:14:28 raspberrypi systemd[1]: Starting File System Check on /dev/disk/by-partuuid/f143b93d-01...

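The first two lines stop the database, and the third line, "Failed to execute script __main__", comes from my program! But right after that the database is started again... Can anyone make sense of what is going on? It looks like a lot of services are being stopped and then restarted...

1 answer:

Answer 0 (score: 0)

This issue is now resolved. When reading my temperature sensor over SPI on the Raspberry Pi, spi.close() was not being called properly, so every spi.open() opened a new spidev file. There is a limit on how many files a process can have open, and in my case that limit was reached after about 6-7 hours, after which the script crashed.

It is also important to note that proper logging was not implemented in this area of the code and exceptions were not caught properly, so the exception "OSError: Too many open files" was never caught or logged, which made this a very mysterious problem.

Once the exception was caught, the problem was easy to fix by properly adding spi.close() wherever spi was opened.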
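One way to make the close() call hard to forget is to wrap each read in try/finally. The sketch below is an illustration only: read_temperature, the bus and device numbers, the bytes sent, and the FakeSpi class are all hypothetical, with FakeSpi standing in for spidev.SpiDev so the example runs without SPI hardware.

```python
class FakeSpi:
    """Stand-in with the same open/xfer2/close shape as spidev.SpiDev."""
    open_handles = 0

    def open(self, bus, device):
        FakeSpi.open_handles += 1

    def xfer2(self, data):
        return [0x01, 0x90]  # made-up raw bytes, not real sensor output

    def close(self):
        FakeSpi.open_handles -= 1


def read_temperature(spi_factory, bus=0, device=0):
    """Open the device, read once, and always close it again."""
    spi = spi_factory()
    spi.open(bus, device)
    try:
        return spi.xfer2([0x00, 0x00])
    finally:
        spi.close()  # runs even if xfer2 raises


# Many reads in a row leave no handles behind.
for _ in range(1000):
    raw = read_temperature(FakeSpi)
```

On the Pi, passing spidev.SpiDev as spi_factory would follow the same pattern, assuming the sensor is read with xfer2.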

You can check how many files a process has open with lsof (the command was not installed by default on my Linux distribution):

lsof -p <pid_of_your_process>
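On Linux the same information is also available from within Python by listing /proc/<pid>/fd. This is an added sketch, not part of the original program; count_open_fds is a hypothetical helper name, and the leaked file is opened deliberately to show the count going up.

```python
import os
import resource


def count_open_fds(pid="self"):
    """Count the entries in /proc/<pid>/fd (Linux only)."""
    return len(os.listdir("/proc/{}/fd".format(pid)))


# Per-process limit on open file descriptors (soft limit is the one enforced).
soft_limit, hard_limit = resource.getrlimit(resource.RLIMIT_NOFILE)

before = count_open_fds()
leaked = open(os.devnull)   # simulate one handle that is never closed
after = count_open_fds()
leaked.close()

print("open fds: {} -> {} (soft limit {})".format(before, after, soft_limit))
```

Logging this count periodically would make a slow leak visible long before the limit is reached.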

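Now, the reason this is related to my pymongo connection exceptions is that, to establish a connection, the pymongo module calls open() on various things such as sockets, each of which opens a file. My guess is that after six hours my program could not open these files because it had already reached the maximum number of files it could open, so the pymongo exceptions were raised.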
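The effect can be reproduced in isolation. The following sketch (Linux/Unix only, and not taken from the original program) temporarily lowers the process's file descriptor limit, leaks descriptors until the limit is hit, and then shows that creating a plain socket fails with the same "Too many open files" error. The function name and the limit of 64 are arbitrary choices for the demonstration.

```python
import errno
import os
import resource
import socket


def socket_errno_when_fds_exhausted(limit=64):
    """Return the errno raised by socket.socket() once no fds are left."""
    soft, hard = resource.getrlimit(resource.RLIMIT_NOFILE)
    new_soft = limit if soft == resource.RLIM_INFINITY else min(limit, soft)
    resource.setrlimit(resource.RLIMIT_NOFILE, (new_soft, hard))
    leaked = []
    try:
        try:
            while True:  # leak descriptors until the limit is reached
                leaked.append(os.open(os.devnull, os.O_RDONLY))
        except OSError as exc:
            if exc.errno != errno.EMFILE:
                raise
        try:
            socket.socket().close()
            return None  # unexpectedly succeeded
        except OSError as exc:
            return exc.errno
    finally:
        # Undo the leak and restore the original limit.
        for fd in leaked:
            os.close(fd)
        resource.setrlimit(resource.RLIMIT_NOFILE, (soft, hard))


result = socket_errno_when_fds_exhausted()
print(result == errno.EMFILE)
```

A network client in that state cannot create the socket it needs, which is consistent with the connection failures described above.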

Hope this helps someone!