add Space character every 3 letters
-
Hi,
I am making a DNA viewer which displays a sequence like a hexadecimal viewer.
So, I would like to add a space every 3 letters in my QByteArray.

    QByteArray seq = "ACGTATAGTACGTACG"
    seq = transform(seq,3)
    seq = "ACG TAT AGT ACG TAC"

What is the most efficient way to do that? QString / QByteArray have many methods.
-
I don't know if it's efficient enough, but it's certainly a one-liner:
QString split = QString("ACGTATAGTACGTACG").replace(QRegularExpression("(.{3})"), "\\1 ");
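One thing to watch with the pattern above: when the length is an exact multiple of 3, every triplet matches, so the result ends with a trailing space. A minimal sketch of one way around that, using a lookahead so the last complete triplet is left alone. It uses std::regex and std::string as stand-ins for QRegularExpression and QString (an assumption made so the snippet builds without Qt), and `groupWithRegex` is a hypothetical name:

```cpp
#include <regex>
#include <string>

// Hypothetical helper, not from the thread: std::regex stands in for
// QRegularExpression and std::string for QString.
std::string groupWithRegex(const std::string& seq)
{
    // (?=.) requires at least one more character after the triplet,
    // so no space is appended after the final complete group.
    // The regex is built once and reused across calls.
    static const std::regex triplet("(.{3})(?=.)");
    return std::regex_replace(seq, triplet, "$1 ");
}
```

Calling `groupWithRegex("ACGTATAGTACGTACG")` should give the groups separated by single spaces, with the leftover "G" as the last group.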
-
@kshegunov
Your code is small and effective.
Are my variant and yours probably about the same speed?
-
Since you are using QByteArray (i.e. 1 character is 1 byte), you can probably optimise it using std::memcpy on the data() pointer.

    #include <QByteArray>
    #include <cstring>
    #include <iterator>

    QByteArray transform(const QByteArray& seq, int span)
    {
        if (seq.isEmpty() || span <= 0)
            return QByteArray();
        const int oldArrSize = seq.size();
        // One space per full group, minus one if the last group is complete.
        QByteArray result(oldArrSize + (oldArrSize / span) - (oldArrSize % span == 0), ' ');
        auto sourceIter = seq.cbegin();
        auto destIter = result.data();
        const auto srcEnd = seq.cend();
        for (int dstnc = std::distance(sourceIter, srcEnd); dstnc > 0; dstnc -= span) {
            // Copy at most span bytes; the last group may be shorter.
            std::memcpy(destIter, sourceIter, qMin(dstnc, span));
            destIter += span + 1; // skip the pre-filled space
            sourceIter += span;
        }
        return result;
    }

EDIT:
The code I had before broke memory if seq.size()%span!=0
-
@Taz742 said in add Space character every 3 letters:
Are my variant and yours probably about the same speed?
I'd even speculate mine may be faster, even though it uses a regular expression. The problem with your piece of code is that at each insert of a new space you're copying the data after that position - the data has to be shifted, which might be rather heavy. The regular expression code (assuming it can optimize the expression well internally) can do it with a single memory allocation. In fact your code can be modified so it uses one allocation, by just using a resulting byte array and copying the data in chunks of 3 bytes, then setting a space, and then repeating.
Edit: My view hadn't updated, basically what @VRonin wrote is what I was talking about.
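The single-allocation idea described above can be sketched roughly like this. It is only an illustration under assumptions: std::string stands in for QByteArray so the snippet builds without Qt, and `groupWithSpaces` is a hypothetical name, not something from the thread.

```cpp
#include <cstddef>
#include <string>

// Hypothetical helper: std::string stands in for QByteArray.
std::string groupWithSpaces(const std::string& seq, std::size_t span)
{
    if (seq.empty() || span == 0)
        return std::string();

    // Number of groups, counting a shorter final group if there is one.
    const std::size_t groups = (seq.size() + span - 1) / span;

    std::string result;
    // Reserve the final size once: all characters plus one space between groups.
    result.reserve(seq.size() + groups - 1);

    for (std::size_t pos = 0; pos < seq.size(); pos += span) {
        if (pos != 0)
            result.push_back(' ');
        // append() copies at most span characters, fewer for the last group.
        result.append(seq, pos, span);
    }
    return result;
}
```

Because the output is built left to right into a buffer that is already large enough, nothing has to be shifted after each space, which is the cost that repeated insert() calls pay.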
-
Hi
Fast test. Might have logical issues. Just for fun.

    using namespace std::chrono;

    void MainWindow::on_pushButton_clicked()
    {
        high_resolution_clock::time_point t1 = high_resolution_clock::now();
        for (int var = 0; var < 10000; ++var) {
            int cnt = 0;
            QByteArray seq = "ACGTATAGTACGTACG";
            for (int i = 3; i < seq.size() - 3; i++) {
                if (i % 3 == 0) {
                    seq.insert(i + cnt++, ' ');
                }
            }
        }
        high_resolution_clock::time_point t2 = high_resolution_clock::now();
        auto duration = std::chrono::duration_cast<std::chrono::microseconds>( t2 - t1 ).count();
        qDebug() << "time: " << duration ;
    }

    void MainWindow::on_pushButton_2_clicked()
    {
        high_resolution_clock::time_point t1 = high_resolution_clock::now();
        for (int var = 0; var < 10000; ++var) {
            QString split = QString("ACGTATAGTACGTACG").replace(QRegularExpression("(.{3})"), "\\1 ");
        }
        high_resolution_clock::time_point t2 = high_resolution_clock::now();
        auto duration = std::chrono::duration_cast<std::chrono::microseconds>( t2 - t1 ).count();
        qDebug() << "time QRegularExpression: " << duration ;
    }
Result (microseconds per 10000 iterations):
time: 9001
time: 8002
time: 8001
time: 8004
time: 8001
time: 8001
time: 7995
time: 8001
time: 8001
time: 8001
time QRegularExpression: 161033
time QRegularExpression: 162033
time QRegularExpression: 161032
time QRegularExpression: 161032
time QRegularExpression: 162032
time QRegularExpression: 162032
time QRegularExpression: 162033
-
@mrjj I think it's fair to optimize a bit:

    QRegularExpression re{"(.{3})"};
    high_resolution_clock::time_point t1 = high_resolution_clock::now();
    ...
    ... QString("ACGTATAGTACGTACG").replace(re, "\\1 ");
-
Designing benchmarking tests isn't exactly trivial, but I'd suggest something too (probably the raw insert will outperform the rx, but still, for the sake of argument): Don't use the same fixed-size input string; use input that ranges from very short to very long. And do the benchmarking in batches, e.g. run the same benchmark at least 30-40 times and record the time for each run; then you'd get data that can be put into a histogram and analyzed statistically.
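A rough sketch of what such a batched benchmark could look like, assuming plain std::string input instead of QByteArray so it builds without Qt. `Sample`, `makeSequence` and `benchmark` are hypothetical names introduced only for this illustration; each run is recorded separately so the samples can later be binned into a histogram per input length.

```cpp
#include <chrono>
#include <cstddef>
#include <string>
#include <vector>

// One timing measurement for one input length (hypothetical type).
struct Sample {
    std::size_t length;
    long long microseconds;
};

// Builds a repeating "ACGT..." test sequence of the requested length.
std::string makeSequence(std::size_t length)
{
    static const char bases[] = {'A', 'C', 'G', 'T'};
    std::string seq(length, 'A');
    for (std::size_t i = 0; i < length; ++i)
        seq[i] = bases[i % 4];
    return seq;
}

// Times transform(input) `runs` times for every length and keeps each run.
template <typename Fn>
std::vector<Sample> benchmark(Fn transform,
                              const std::vector<std::size_t>& lengths,
                              int runs)
{
    using Clock = std::chrono::steady_clock;
    std::vector<Sample> samples;
    for (std::size_t length : lengths) {
        const std::string input = makeSequence(length);
        for (int run = 0; run < runs; ++run) {
            const auto t1 = Clock::now();
            const std::string output = transform(input);
            const auto t2 = Clock::now();
            (void)output; // the result itself is not inspected here
            samples.push_back({length,
                std::chrono::duration_cast<std::chrono::microseconds>(t2 - t1).count()});
        }
    }
    return samples;
}
```

For example, `benchmark(someTransform, {16, 1024, 1048576}, 40)` would yield 120 samples, 40 per length. For very short inputs a single call may be below the clock resolution, so repeating the call inside each timed run is probably needed there.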
-
@kshegunov
Yep, varying input lengths might alter the result significantly, so I will try that too.
-
Oh, I was not notified by email of all your answers! Thanks a lot! I will try it.
By the way, you can join the team for this small project!
https://github.com/labsquare/cuteFasta
Preview on Twitter: https://twitter.com/labsquare/status/884146483406266368
-
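Try this: just modify your for loop so it runs up to the original length of the sequence. Store that length before the loop, because seq.size() grows with every insert and the loop would otherwise keep inserting past the end:
i < len
That's it, enjoy :-)

    QByteArray seq = "ACGTATAGTACGTACG";
    const int len = seq.size(); // original length, taken before any insert
    int cnt = 0;
    for (int i = 3; i < len; i++) {
        if (i % 3 == 0) {
            seq.insert(i + cnt++, ' ');
        }
    }
    qDebug() << seq; // "ACG TAT AGT ACG TAC G"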
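A note on the loop bound: the array grows while the loop runs, so the condition is safest when it compares against the length taken before the first insert. A minimal sketch of the same in-place approach, assuming std::string in place of QByteArray so it builds without Qt; `insertSpacesInPlace` is a hypothetical name:

```cpp
#include <cstddef>
#include <string>

// Hypothetical helper: std::string stands in for QByteArray.
void insertSpacesInPlace(std::string& seq, std::size_t span)
{
    if (span == 0)
        return;

    // Fixed before any insert, so the bound does not move as seq grows.
    const std::size_t originalSize = seq.size();
    std::size_t inserted = 0;

    for (std::size_t i = span; i < originalSize; i += span) {
        // Earlier inserts shifted the remaining characters right by `inserted`.
        seq.insert(i + inserted, 1, ' ');
        ++inserted;
    }
}
```

This still shifts the tail of the string on every insert, so it is the simple variant rather than the fast one.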