add Space character every 3 letters
-
@kshegunov
Your code is small and effective.
Is my variant probably about as fast as yours?
-
@Taz742 said in add Space character every 3 letters:
Is my variant probably about as fast as yours?
I'd even speculate mine may be faster, even though it uses a regular expression. The problem with your piece of code is that at each insert of a new space you're copying the data after that position - the data has to be shifted, which might be rather heavy. The regular expression code (assuming it can optimize the expression well internally) can do it with a single memory allocation. In fact your code can be modified so it uses one allocation, by just using a resulting byte array and copying the data in chunks of 3 bytes, then setting a space, and then repeating.
Edit: My view hadn't updated, basically what @VRonin wrote is what I was talking about.
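For what it's worth, here is a minimal sketch of the single-allocation idea described above. It assumes plain std::string as a stand-in for QByteArray, and spaceEvery is a hypothetical helper name, not something from Qt or from the posts in this thread. The result buffer is reserved once, then the input is copied over in chunks of 3 with a space between chunks:

```cpp
#include <cstddef>
#include <string>

// Hypothetical helper (not from the thread): returns a copy of `in` with a
// space between consecutive chunks of `chunk` characters.
std::string spaceEvery(const std::string &in, std::size_t chunk = 3)
{
    if (chunk == 0 || in.empty())
        return in;

    // One space between each pair of chunks, none after the last chunk.
    const std::size_t spaces = (in.size() - 1) / chunk;

    std::string out;
    out.reserve(in.size() + spaces);   // reserve the final size up front

    for (std::size_t pos = 0; pos < in.size(); pos += chunk) {
        if (pos != 0)
            out.push_back(' ');
        out.append(in, pos, chunk);    // copies at most `chunk` characters
    }
    return out;
}
```

Nothing is shifted after it has been written, so each input byte is copied exactly once. With this sketch, spaceEvery("ACGTATAGTACGTACG") gives "ACG TAT AGT ACG TAC G".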
-
Hi
Fast test. Might have logical issues. Just for fun.

    #include <chrono>
    #include <QByteArray>
    #include <QDebug>
    #include <QRegularExpression>
    #include <QString>

    using namespace std::chrono;

    // Variant 1: insert a space in place after every third character.
    void MainWindow::on_pushButton_clicked()
    {
        high_resolution_clock::time_point t1 = high_resolution_clock::now();
        for (int var = 0; var < 10000; ++var) {
            int cnt = 0; // number of spaces inserted so far
            QByteArray seq = "ACGTATAGTACGTACG";
            // Note: seq.size() grows with every insert; this bound happens
            // to work for the 16-character input used here.
            for (int i = 3; i < seq.size() - 3; i++) {
                if (i % 3 == 0) {
                    seq.insert(i + cnt++, ' ');
                }
            }
        }
        high_resolution_clock::time_point t2 = high_resolution_clock::now();
        auto duration = duration_cast<microseconds>(t2 - t1).count();
        qDebug() << "time: " << duration;
    }

    // Variant 2: regular expression replace; the expression is rebuilt on every iteration.
    void MainWindow::on_pushButton_2_clicked()
    {
        high_resolution_clock::time_point t1 = high_resolution_clock::now();
        for (int var = 0; var < 10000; ++var) {
            QString split = QString("ACGTATAGTACGTACG").replace(QRegularExpression("(.{3})"), "\\1 ");
        }
        high_resolution_clock::time_point t2 = high_resolution_clock::now();
        auto duration = duration_cast<microseconds>(t2 - t1).count();
        qDebug() << "time QRegularExpression: " << duration;
    }

Result:
time: 9001
time: 8002
time: 8001
time: 8004
time: 8001
time: 8001
time: 7995
time: 8001
time: 8001
time: 8001
time QRegularExpression: 161033
time QRegularExpression: 162033
time QRegularExpression: 161032
time QRegularExpression: 161032
time QRegularExpression: 162032
time QRegularExpression: 162032
time QRegularExpression: 162033
-
@mrjj I think it's fair to optimize a bit:
    QRegularExpression re{"(.{3})"};
    high_resolution_clock::time_point t1 = high_resolution_clock::now();
    ...
    ... QString("ACGTATAGTACGTACG").replace(re, "\\1 ");
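To illustrate the same point outside of Qt, here is a rough sketch that assumes std::regex as a stand-in for QRegularExpression (the two engines are different, so timings will not carry over). spaceAll is a hypothetical helper name. The pattern is compiled once and then reused for every input, so the compilation cost is not paid again on each replace:

```cpp
#include <regex>
#include <string>
#include <vector>

// Hypothetical helper: applies one precompiled pattern to every input string.
std::vector<std::string> spaceAll(const std::vector<std::string> &inputs)
{
    // Compiled once, on first call, instead of once per replacement.
    static const std::regex re("(.{3})");

    std::vector<std::string> out;
    out.reserve(inputs.size());
    for (const std::string &s : inputs) {
        // "$1 " re-emits each 3-character group followed by a space.
        out.push_back(std::regex_replace(s, re, "$1 "));
    }
    return out;
}
```

Like the "\\1 " replacement in the thread, this appends a space after every complete group of three, so an input whose length is a multiple of 3 ends with a trailing space.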
-
Designing benchmarking tests isn't exactly trivial, but I'd suggest something too (probably the raw insert will outperform the rx, but still, for the sake of argument):
Don't use the same fixed-size input string; use input that ranges from very short to very long. And do the benchmarking in batches, e.g. run the same benchmark at least 30-40 times and record the time for each run. Then you'd get data that can be put into a histogram and analyzed statistically.
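A minimal sketch of such a harness, under a few assumptions: plain C++ with std::string instead of the Qt types, std::chrono::steady_clock for timing, and hypothetical names (Sample, makeInput, benchmark) invented for this illustration. It records one timing per run and per input length, so the samples can be binned into a histogram afterwards:

```cpp
#include <chrono>
#include <cstddef>
#include <string>
#include <vector>

// One measurement: input length and elapsed time of a single run.
struct Sample {
    std::size_t length;
    long long microseconds;
};

// Hypothetical input generator: repeats "ACGT" up to `length` characters.
std::string makeInput(std::size_t length)
{
    static const char bases[] = "ACGT";
    std::string s;
    s.reserve(length);
    for (std::size_t i = 0; i < length; ++i)
        s.push_back(bases[i % 4]);
    return s;
}

// Runs `fn` on inputs of every requested length, `runs` times per length.
template <typename Fn>
std::vector<Sample> benchmark(Fn fn, const std::vector<std::size_t> &lengths,
                              std::size_t runs)
{
    using clock = std::chrono::steady_clock;
    volatile std::size_t sink = 0;  // keeps the result observable to the optimizer

    std::vector<Sample> samples;
    samples.reserve(lengths.size() * runs);
    for (std::size_t len : lengths) {
        const std::string input = makeInput(len);
        for (std::size_t r = 0; r < runs; ++r) {
            const auto t1 = clock::now();
            const std::string result = fn(input);
            const auto t2 = clock::now();
            sink = result.size();
            const auto us =
                std::chrono::duration_cast<std::chrono::microseconds>(t2 - t1);
            samples.push_back({len, static_cast<long long>(us.count())});
        }
    }
    return samples;
}
```

A call such as benchmark(fn, {16, 1000, 100000}, 40) would then yield 120 samples. For very short inputs a single call may take less than a microsecond, so in practice one would probably time a batch of calls per sample or switch to nanoseconds.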
-
@kshegunov
Yep, varying input lengths might alter the result significantly, so I will try that too.
-
Oh, I was not notified by email of all your answers! Thanks a lot! I will try it.
By the way, you can join the team for this small project!
https://github.com/labsquare/cuteFasta
Preview on Twitter: https://twitter.com/labsquare/status/884146483406266368
-
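Try this: just modify your for loop so it is bounded by the original length rather than by seq.size(). seq.size() grows with every insert, and QByteArray::insert() pads the array when the index is past the end, so a loop bounded by i < seq.size() keeps inserting and never finishes.
That's it, enjoy.

    QByteArray seq = "ACGTATAGTACGTACG";
    const int len = seq.size(); // original length, fixed before any insert
    int cnt = 0;                // number of spaces inserted so far
    for (int i = 3; i < len; i++) {
        if (i % 3 == 0) {
            seq.insert(i + cnt++, ' ');
        }
    }
    qDebug() << seq; // "ACG TAT AGT ACG TAC G"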