Questions (with an example) about usage of QNetworkAccessManager

sairun

I'm developing a small application that will use Qt6 Widgets. The goal is to build a GUI utility that fetches records from GenBank (a database of DNA sequences from living organisms) and analyzes them afterwards. I'm more of a "C with Classes" developer, but this project obviously involves OOP, hence my doubts about the whole thing. The code is hosted on GitHub for those of you who want to take a look. Right now it is just a proof of concept: it runs from the command line, takes an organism's scientific name and a gene or genetic marker label as parameters, and fetches all matching DNA sequences from GenBank using NCBI's API (a free REST API).

The code is complex because fetching sequences from NCBI (GenBank) is a two-step process. Two HTTP requests are made in sequence to the REST API, with results returned as XML: 1) the first request gets the IDs of the matching records, and 2) the second request uses those IDs to get the records themselves. To avoid overloading the REST server, requests are paged, which adds complexity. The whole thing is explained in the GitHub code (the code is heavily commented and documented).

Since HTTP requests made through QNetworkAccessManager are asynchronous, responses are not guaranteed to arrive in the order the requests were made. Because the program is a command-line executable, I had to implement a convoluted way to terminate it once all expected records have been retrieved. In addition, a delay had to be added between requests so as not to exceed the maximum number of requests per second when no API key is provided (more on that below).

My concerns are mostly related to memory management given the use of QNetworkAccessManager. What happens if a request does not complete (e.g., the network is interrupted during a request)? Does the QNetworkReply finish with an error? Are the deleteLater() calls correctly placed? Valgrind results seem OK to me:

==4726==
==4726== HEAP SUMMARY:
==4726==     in use at exit: 24,770 bytes in 38 blocks
==4726==   total heap usage: 33,294 allocs, 33,256 frees, 69,383,160 bytes allocated
==4726==
==4726== LEAK SUMMARY:
==4726==    definitely lost: 0 bytes in 0 blocks
==4726==    indirectly lost: 0 bytes in 0 blocks
==4726==      possibly lost: 0 bytes in 0 blocks
==4726==    still reachable: 24,770 bytes in 38 blocks
==4726==         suppressed: 0 bytes in 0 blocks
==4726== Rerun with --leak-check=full to see details of leaked memory
==4726==
==4726== For lists of detected and suppressed errors, rerun with: -s
==4726== ERROR SUMMARY: 0 errors from 0 contexts (suppressed: 0 from 0)

My other major concern is the delay that had to be added to the code in order to limit the rate of requests to NCBI. Without an API key you can make no more than 3 requests per second; with a key you can make up to 10 requests per second. Take the example of the common bee (the scientific name must be given in quotes since it's binomial):

 ./ncbiquery "Apis mellifera" CO1

which returns 115 records. With a default page size of 20 records per request (implemented in the software), the above command should fetch the 115 records in 6 requests! It works fine for a user without an NCBI API key but hangs on the 6th request for someone who provides an API key! I'm still investigating that...
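
To make the paging arithmetic concrete, here is a minimal sketch in plain C++ (no Qt). The names Page and makePages are made up for this post and are not taken from the repository; it only shows how a record count splits into pages of a given size.

```cpp
#include <algorithm>
#include <cassert>
#include <vector>

// One page of a paged query: zero-based offset of the first record and
// the number of records to ask for. (Hypothetical type, for illustration.)
struct Page {
    int start;
    int count;
};

// Split `total` records into pages of at most `pageSize` records each.
std::vector<Page> makePages(int total, int pageSize)
{
    assert(pageSize > 0);
    std::vector<Page> pages;
    for (int start = 0; start < total; start += pageSize) {
        // The last page may be shorter than pageSize.
        pages.push_back({start, std::min(pageSize, total - start)});
    }
    return pages;
}
```

Calling makePages(115, 20) gives six pages; the last one starts at offset 100 and holds the remaining 15 records.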

Thanks in advance to anyone who cares to comment or try the software.

SGaist

Hi,

There's an error signal (QNetworkReply::errorOccurred in Qt 6) that you can connect to and from there trigger the reply deletion.

As for the number of requests sent per second, rather than adding delays, I would look into implementing something like a "rate limiter". QNAM handles at most 6 HTTP requests in parallel per host and queues the rest.
You could implement your own queue that takes into account the time elapsed since the last request was sent and the number of requests sent. Each time a request finishes, send the next one.
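
To illustrate the idea only, here is a minimal sketch of the bookkeeping part of such a limiter in plain C++. It is not wired into QNAM, and the class name RateLimiter and its member functions are hypothetical. It assumes a sliding window (at most N sends in any window of W milliseconds) and leaves the clock to the caller.

```cpp
#include <cassert>
#include <cstdint>
#include <deque>

// Hypothetical sliding-window rate limiter: allows at most `maxRequests`
// sends within any window of `windowMs` milliseconds. The current time is
// passed in by the caller (from any monotonic clock), so there is no Qt
// dependency here.
class RateLimiter {
public:
    RateLimiter(int maxRequests, std::int64_t windowMs)
        : m_maxRequests(maxRequests), m_windowMs(windowMs)
    {
        assert(maxRequests > 0 && windowMs > 0);
    }

    // Milliseconds to wait at time `nowMs` before the next send is allowed;
    // 0 means a request may be sent right away.
    std::int64_t delayUntilNextSlot(std::int64_t nowMs)
    {
        dropExpired(nowMs);
        if (static_cast<int>(m_sent.size()) < m_maxRequests)
            return 0;
        // The oldest send in the window frees a slot once it is windowMs old.
        return m_sent.front() + m_windowMs - nowMs;
    }

    // Record a send at `nowMs` if a slot is free. Returns false (and records
    // nothing) when the caller has to wait.
    bool trySend(std::int64_t nowMs)
    {
        if (delayUntilNextSlot(nowMs) > 0)
            return false;
        m_sent.push_back(nowMs);
        return true;
    }

private:
    // Forget sends that have left the window ending at nowMs.
    void dropExpired(std::int64_t nowMs)
    {
        while (!m_sent.empty() && nowMs - m_sent.front() >= m_windowMs)
            m_sent.pop_front();
    }

    int m_maxRequests;
    std::int64_t m_windowMs;
    std::deque<std::int64_t> m_sent; // send timestamps, oldest first
};
```

With RateLimiter limiter(3, 1000), three sends go through immediately and the fourth is refused until the first one is a full second old. In a Qt application the pending requests would sit in your own queue, and the value returned by delayUntilNextSlot() could be used to arm a single-shot QTimer that sends the next one.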

sairun

First, thanks for the reply.

When you say that "there's an error signal that you can connect to and from there trigger the reply deletion", do you mean that the deleteLater() calls are not well positioned in the code (they are inside the slots connected to QNetworkReply::finished)?

I'm not sure I fully understand the "rate limiter" thing. Can you provide an example of that?

Regarding the hang for registered users, I now understand why the software fails for them but still works for non-registered users. The rate limit for the latter is 3 requests per second, and I've implemented a delay of one second between requests, which is well within that limit (I could reduce it to roughly 0.333 seconds). For a registered user the limit is 10 requests per second, so requests are sent faster. In the example that I provided (bees) it finds 115 records, so the program has to do 6 "NCBI requests" (chunks of 20 records), which actually translate into 12 HTTP requests (it's a two-step process). The 6th one fails due to the rate limit! Apparently, in this case NCBI doesn't return an XML result but something else, which is why the program hangs: it expects to parse XML.
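
A minimal sketch of the kind of guard I have in mind before handing a reply body to the XML parser, in plain C++. The names ReplyKind and classifyReply are made up. It assumes the server signals rate limiting with HTTP status 429 (the standard "Too Many Requests" code), which I have not confirmed for NCBI, and it treats any body that doesn't start with '<' as not XML.

```cpp
#include <cctype>
#include <cstddef>
#include <string_view>

// Hypothetical classification of a reply before any XML parsing is attempted.
enum class ReplyKind { Xml, RateLimited, Unexpected };

// Decide what to do with a reply from its HTTP status code and body.
// Assumption: rate limiting is reported as HTTP 429.
ReplyKind classifyReply(int httpStatus, std::string_view body)
{
    if (httpStatus == 429)
        return ReplyKind::RateLimited;

    // Skip leading whitespace, then check for the start of an XML document.
    std::size_t i = 0;
    while (i < body.size() && std::isspace(static_cast<unsigned char>(body[i])))
        ++i;
    if (httpStatus == 200 && i < body.size() && body[i] == '<')
        return ReplyKind::Xml;

    // Anything else (error page, JSON, empty body) must not reach the parser.
    return ReplyKind::Unexpected;
}
```

In the finished slot, the status code can be read with reply->attribute(QNetworkRequest::HttpStatusCodeAttribute).toInt(). A RateLimited result could put the request back in the queue to be retried later, and an Unexpected one could be reported as an error so the program can still terminate.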