Remus Avram Posted February 3, 2017
Hi, we have a python script which runs in a loop and creates a new ftrack session every few minutes. After a few hours the script crashes with the error: IOError: [Errno 24] Too many open files: '/tmp/ftrack_api_schema_cache.json'. Is it because the session is not closed properly? If we don't create a new session and the ftrack server is restarted, will the session reconnect to the server? Can we keep a session open for days/weeks/months?
Lucas Correia Posted February 3, 2017
Hi, That is odd. I haven't seen anything like this before. When the API session is started we load cached schemas from disk. The file should be closed after reading its contents. We also catch IOError if there are any issues with reading the file, so I am not sure what is causing this. Do you have any more logging statements or a stack trace from when the error occurs? You can send them to support@ftrack.com if you don't want to paste them here. It should in general be possible to keep a session alive for a long period of time. You might experience issues with cached values in long-running sessions, depending on how the session is used. Regards, Lucas
Remus Avram Posted February 7, 2017
Hi Lucas, thanks for the answer. The stack trace when the error occurs (log lines from several threads are interleaved):

Exception in thread Thread-1005:
Traceback (most recent call last):
  File ".../python/2.7/centos-7_x86-64/lib/python2.7/threading.py", line 801, in __bootstrap_inner
    self.run()
  File ".../python/2.7/centos-7_x86-64/lib/python2.7/threading.py", line 754, in run
    self.__target(*self.__args, **self.__kwargs)
  File ".../python-packages/ftrack-python-api/1.0.3/centos-7_x86-64/lib/python2.7/site-packages/ftrack_python_api-1.0.3-py2.7.egg/ftrack_api/event/hub.py", line 178, in connect
    .format(self.get_server_url())
EventHubConnectionError: Failed to connect to event server at http://ftrack

ftrack_api.session.Session> ERROR | 2017/02/01 11:41:15.738 (MainThread)|Schema cache could not be loaded from '/tmp/ftrack_api_schema_cache.json'
  File ".../python/2.7/centos-7_x86-64/lib/python2.7/logging/__init__.py", line 1192, in exception
  File ".../python-packages/ftrack-python-api/1.0.3/centos-7_x86-64/lib/python2.7/site-packages/ftrack_python_api-1.0.3-py2.7.egg/ftrack_api/session.py", line 1316, in _load_schemas
  File ".../python-packages/ftrack-python-api/1.0.3/centos-7_x86-64/lib/python2.7/site-packages/ftrack_python_api-1.0.3-py2.7.egg/ftrack_api/session.py", line 254, in __init__
  File ".../event.py", line 192, in run
Traceback (most recent call last):
  File ".../python-packages/ftrack-python-api/1.0.3/centos-7_x86-64/lib/python2.7/site-packages/ftrack_python_api-1.0.3-py2.7.egg/ftrack_api/session.py", line 1309, in _load_schemas
  File ".../python-packages/ftrack-python-api/1.0.3/centos-7_x86-64/lib/python2.7/site-packages/ftrack_python_api-1.0.3-py2.7.egg/ftrack_api/session.py", line 1270, in _read_schemas_from_cache
IOError: [Errno 24] Too many open files: '/tmp/ftrack_api_schema_cache.json'
ftrack_api.session.Session> ERROR | 2017/02/01 11:41:15.915 (MainThread)|Failed to update schema cache '/tmp/ftrack_api_schema_cache.json'.
At the moment we set the session's schema_cache_path keyword argument to False.
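For what it's worth, the [Errno 24] failure mode itself is just the process hitting its open-file-descriptor limit, independent of ftrack. A minimal sketch that reproduces it by leaking descriptors (the function name is illustrative; it lowers the soft limit first so it fails quickly, Unix-only):

```python
import errno
import os
import resource


def leak_until_emfile(limit=64):
    """Lower the soft fd limit, then open files without ever closing
    them until the OS refuses with EMFILE (errno 24)."""
    soft, hard = resource.getrlimit(resource.RLIMIT_NOFILE)
    resource.setrlimit(resource.RLIMIT_NOFILE, (limit, hard))
    leaked = []
    try:
        while True:
            leaked.append(open(os.devnull))
    except OSError as error:
        return error.errno
    finally:
        # Clean up so the rest of the process is unaffected.
        for handle in leaked:
            handle.close()
        resource.setrlimit(resource.RLIMIT_NOFILE, (soft, hard))
```

Any code path that repeatedly opens '/tmp/ftrack_api_schema_cache.json' (or sockets, or anything fd-backed) without releasing the handles will eventually hit this, which is why the crash only shows up after hours of looping.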
Remus Avram Posted February 13, 2017
Hi Lucas, I found the real problem in the end. Every time a new ftrack session is created, a new sub-process is created. Even if we delete the session object with Python's del statement, the sub-process still exists until the running script terminates. If the script is used as a daemon, at some point there are hundreds of sub-processes and it crashes with "too many open files". Even if we don't use the cache file, at some point we run into other problems.

#> pstree -p | grep python
307: | |-bash(27567)---python(30190)-+-{python}(30217)
308: | | |-{python}(30226)
309: | | |-{python}(30233)
310: | | |-{python}(30609)
311: | | |-{python}(30611)
312: | | |-{python}(30613)
313: | | |-{python}(30616)
314: | | |-{python}(30620)
315: | | |-{python}(30773)
316: | | |-{python}(30775)
317: | | |-{python}(30778)
318: | | |-{python}(30784)
319: | | |-{python}(30908)
320: | | |-{python}(31190)
321: | | |-{python}(31271)
322: | | |-{python}(31275)
323: | | |-{python}(31277)
324: | | |-{python}(31324)
325: | | |-{python}(31365)
326: | | |-{python}(31368)
327: | | |-{python}(31370)
328: | | |-{python}(31372)
329: | | |-{python}(31374)
330: | | |-{python}(31376)
331: | | |-{python}(31381)
332: | | |-{python}(31446)
333: | | |-{python}(31448)
334: | | |-{python}(31450)
335: | | |-{python}(31453)
336: | | |-{python}(31455)
337: | | |-{python}(31457)
338: | | |-{python}(31568)
339: | | |-{python}(31570)
340: | | |-{python}(31572)
341: | | |-{python}(31575)
342: | | |-{python}(31577)
343: | | |-{python}(31581)
344: | | |-{python}(31950)
345: | | |-{python}(31953)
346: | | |-{python}(31956)
347: | | |-{python}(31958)
348: | | |-{python}(31961)
349: | | |-{python}(31968)
350: | | |-{python}(32051)
351: | | |-{python}(32053)
352: | | `-{python}(32083)
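A side note on that pstree output: entries in curly braces like {python} are threads of the single python(30190) process, not separate OS processes, so what is accumulating here is most likely one event-hub thread per session that is never joined. A small sketch of how a daemon could watch for this kind of leak from inside its own loop (reading /proc/self/fd is Linux-specific):

```python
import os
import threading


def resource_snapshot():
    """Return (live_thread_count, open_fd_count) for this process.

    Logging this once per loop iteration makes a leak obvious: if
    either number grows monotonically, something created per
    iteration is never being torn down."""
    try:
        open_fds = len(os.listdir('/proc/self/fd'))
    except OSError:
        open_fds = -1  # procfs unavailable (non-Linux)
    return threading.active_count(), open_fds
```

Comparing snapshots before and after creating and deleting a session would show directly whether del actually releases the session's threads and descriptors.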
Remus Avram Posted February 14, 2017
Hi Lucas, now we have a different problem. If we don't recreate the session, but instead clear the cache with session.cache.clear() and query the object again, we get the following error message randomly: int() argument must be a string or a number, not 'Symbol'. Do you know why this could happen? Unfortunately, we cannot check everywhere we fetch data whether the return value is a Symbol. Best, Remus
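Until the underlying cause is found, one defensive option is to refuse the sentinel explicitly at the few coercion points instead of checking every fetch. The real sentinel is ftrack_api.symbol.NOT_SET, an instance of ftrack_api.symbol.Symbol; the sketch below models it with a local stand-in class so it stays self-contained:

```python
class Symbol(object):
    """Local stand-in for ftrack_api.symbol.Symbol."""

    def __init__(self, name):
        self.name = name

    def __repr__(self):
        return self.name


# Stand-in for ftrack_api.symbol.NOT_SET.
NOT_SET = Symbol('NOT_SET')


def as_int(value, default=None):
    """Coerce a fetched attribute value to int, returning *default*
    instead of letting int() raise TypeError on the sentinel."""
    if isinstance(value, Symbol):
        return default
    return int(value)
```

With the real API you would replace the stand-in with an isinstance check against ftrack_api.symbol.Symbol; this contains the damage to a fallback value rather than a random crash, but it does not explain why the attribute is unloaded in the first place.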
Mattias Lagergren Posted February 15, 2017
On 14/02/2017 at 3:45 PM, Remus Avram said: now we have a different problem. If we don't recreate the session, but instead clear the cache with session.cache.clear() and query the object again, we get the following error message randomly: int() argument must be a string or a number, not 'Symbol'. Do you know why this could happen? Unfortunately, we cannot check everywhere we fetch data whether the return value is a Symbol.
Could you come up with a small script that reproduces this issue, or is it completely random? It always helps when debugging things.
Remus Avram Posted February 15, 2017
Unfortunately it's completely random. It's hard to debug from our side as well. I will describe a bit the context in which we are getting the issue. Our Event System is up and running. We have a listener which spools all the events into an internal DB, and a dispatcher which gets the events one by one and processes them. The processing happens almost instantly. Workflow example: create a shot in Ftrack -> Ftrack event generated -> listener spools event to custom DB -> dispatcher gets event -> if conditions pass, run action -> ftrack_session.query("Shot where id is <ID>").one()
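That dispatcher step might be sketched roughly as below. All names here (spool_db, handlers, conditions_pass, run) are hypothetical stand-ins for the internal setup described above; only the final session.query(...).one() call mirrors the real ftrack API usage:

```python
def process_next_event(spool_db, session, handlers):
    """Pop one spooled event; if a handler is registered for its
    topic and its conditions pass, run the action against a freshly
    queried entity. Returns the handler's result, or None if the
    event was skipped or the spool is empty."""
    event = spool_db.pop()
    if event is None:
        return None
    handler = handlers.get(event['topic'])
    if handler is None or not handler.conditions_pass(event):
        return None
    shot = session.query(
        'Shot where id is {0}'.format(event['shot_id'])
    ).one()
    return handler.run(shot)
```

The relevant detail for the Symbol problem is that the same long-lived session object serves every iteration of this loop, so stale cache entries accumulate across events.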
Lucas Correia Posted February 16, 2017
Thanks for raising the issues. Are you using components or locations anywhere in your code? They need to be rediscovered if the cache is cleared. This is done automatically by Session.reset().
Mattias Lagergren Posted February 16, 2017
@Remus Avram are you doing any threading where you are sharing the same Session between threads?
Remus Avram Posted February 16, 2017
32 minutes ago, Lucas Correia said: Are you using components or locations anywhere in your code? They need to be rediscovered if the cache is cleared. This is done automatically by Session.reset().
We don't use components or locations in the script where the error is raised, but I will reset the session after clearing the cache as well. We clear the cache every time before processing an event.
39 minutes ago, Mattias Lagergren said: @Remus Avram are you doing any threading where you are sharing the same Session between threads?
No threading at the moment. The session is not shared. Is it possible that the event is generated before the changes in the UI are fully saved?
Mattias Lagergren Posted February 17, 2017
23 hours ago, Remus Avram said: Is it possible that the event is generated before the changes in the UI are fully saved?
No, those events are emitted after the database transaction has completed, so it should not be an issue unless you have some master/slave replication setup where the slave may have old data.
Quote: Unfortunately it's completely random. It's hard to debug from our side as well.
Sorry if I've already asked this, but are there any particular attributes or relations where this happens? E.g. custom attributes have recently been mentioned by another client as problematic.
Remus Avram Posted February 17, 2017
25 minutes ago, Mattias Lagergren said: Sorry if I've already asked this, but are there any particular attributes or relations where this happens? E.g. custom attributes have recently been mentioned by another client as problematic.
Yes, this happens with the custom attributes.
Mattias Lagergren Posted February 20, 2017
On 17/02/2017 at 11:13 AM, Remus Avram said: Yes, this happens with the custom attributes.
We're just about to release a major update to the backend for custom attributes that, among other things, will allow you to filter on custom attributes and should also improve their performance. As part of this we will change some of the implementation on the python client side. I'm not 100% sure, but I suspect that these changes will solve this as well, as we change from the "special" ftrack_api.attribute.CustomAttributeCollectionAttribute to the one we use for metadata (ftrack_api.attribute.KeyValueMappedCollectionAttribute).
Remus Avram Posted February 20, 2017
Thanks Mattias for the updates! We are waiting for the release. @Lucas Correia unfortunately session.reset() doesn't fix the problem.
Remus Avram Posted April 7, 2017
Hi Mattias! Any updates regarding this issue? int() argument must be a string or a number, not 'Symbol'
Mattias Lagergren Posted April 8, 2017
I will check with the team and get back to you shortly.
Remus Avram Posted April 21, 2017
We did more investigation to find out what exactly the problem is with Symbol(NOT_SET). When a ftrack_api.symbol.Symbol is returned instead of the actual value of an entity's custom attribute, we tried:
- clearing the cache: session.cache.clear()
- resetting the session: session.reset()
- getting the ftrack entity again: ftrack_entity = ftrack_session.get('TypedContext', ftrack_entity['id'])
- querying the DB again: ftrack_entity = ftrack_session.query('{0} where id is {1}'.format(ftrack_entity.entity_type, ftrack_entity['id']))
None of the above fixes the problem. The only solution is to create a new session. But if we create a new session, the old session is not closed until the daemon (python script) finishes running, and at some point the script crashes because of the hundreds of python sub-processes still running. Is there any workaround for this?
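For reference, the recovery sequence listed above, collected into one sketch (these are the real ftrack Session methods named in the post, applied in the same order; per the post, they did not resolve the issue):

```python
def refetch(session, entity):
    """Apply the attempted recovery steps in order: clear the
    session cache, reset the session, then re-query the entity by
    its type and id and return the fresh instance."""
    session.cache.clear()
    session.reset()
    return session.query(
        '{0} where id is {1}'.format(entity.entity_type, entity['id'])
    ).one()
```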
Mattias Lagergren Posted April 21, 2017
I see, there are some substantial changes to custom attributes in 3.5.0 - hopefully the issue will be resolved then. I've contacted you in support-case #30782 about this.
31 minutes ago, Remus Avram said: But if we create a new session, the old session is not closed until the daemon (python script) finishes running, and at some point the script crashes because of the hundreds of python sub-processes still running.
This would be interesting to solve, though. We keep the requests session as _request on the ftrack session object: session._request. Perhaps calling close on it before destroying the session helps? We also have the event hub; it may be worth trying to disconnect it: session.event_hub.disconnect(). If those help and you can confirm it, we could introduce something like session.close() to disconnect everything.
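Combining both suggestions into one teardown helper might look like the sketch below. Note that _request is a private attribute, so this could break between API versions, and disconnect can raise if the hub never actually connected (as in the EventHubConnectionError traces earlier in the thread), hence the guard:

```python
def close_session(session):
    """Best-effort teardown for an ftrack session: disconnect the
    event hub, then close the underlying requests session kept on
    the private _request attribute."""
    try:
        session.event_hub.disconnect()
    except Exception:
        pass  # the hub may never have been connected
    session._request.close()
```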
Remus Avram Posted April 21, 2017
16 minutes ago, Mattias Lagergren said: Perhaps calling close on it before destroying the session helps? We also have the event hub; it may be worth trying to disconnect it: session.event_hub.disconnect().
This helps! Thanks! Still, it would be awesome to have session.close() to disconnect everything; it's more conventional.
Martin Pengelly-Phillips Posted April 24, 2017
For reference, in our local wrapper of the API we have introduced a `close` method that disconnects the event hub and does some cleanup. We have also made the `Session` a context manager so you can do: with session: ... Seems to work well. If desired, we can put together a pull request.
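A minimal version of such a wrapper might look like this. It is a sketch of the idea, not the actual local wrapper or the ftrack implementation, and it assumes the wrapped session exposes a close method like the one described:

```python
class ManagedSession(object):
    """Context-manager wrapper around a session object: hand the
    session out on entry, guarantee close() on exit even if the
    body raises."""

    def __init__(self, session):
        self._session = session

    def __enter__(self):
        return self._session

    def __exit__(self, exc_type, exc_value, traceback):
        self._session.close()
        return False  # do not swallow exceptions from the body
```

Usage would then be `with ManagedSession(make_session()) as session: ...`, which guarantees cleanup for short-lived scripts; a globally held session would still need an explicit close at daemon shutdown.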
Remus Avram Posted April 24, 2017
1 hour ago, Martin Pengelly-Phillips said: For reference, in our local wrapper of the API we have introduced a `close` method that disconnects the event hub and does some cleanup.
Nice! Thanks!
1 hour ago, Martin Pengelly-Phillips said: We have also made the `Session` a context manager so you can do: with session: ... Seems to work well.
I think this can be used for small scripts. At the moment we are using a session globally.
Remus Avram Posted April 25, 2017
I just rechecked session.event_hub.disconnect() in production and it seems that it doesn't fix the problem of closing the session properly. Every time I recreate the session (disconnect and create a new one) it starts a python subprocess. At the moment it's OK, as we receive a Symbol type error around three times per day and we restart the dispatcher once per week. But it would be nice for this to be fixed in the future.
Mattias Lagergren Posted April 26, 2017
Hi Remus, could you provide a small reproducible script plus the session-closing code that you are using? Also, how do you grep/see the non-exited sub-processes that are spawned?
Mattias Lagergren Posted July 17, 2017
@Remus Avram, thanks to a pull-request by @Martin Pengelly-Phillips there is now a close method on the Session and it is also a context manager. This is released in 1.3.0: http://ftrack-python-api.rtd.ftrack.com/en/stable/release/release_notes.html
Remus Avram Posted July 17, 2017
Thanks @Mattias Lagergren for the info! We will test it and let you know if there are any issues.
Archived
This topic is now archived and is closed to further replies.