Download and regex-parse a URL's source code

JonB

@Mr-Gisa
Then you should make that myhtml in your post a link to wherever it is, to help others. Thanks.

Mr Gisa

@JonB I was going to do that, but due to the heavy amount of things going on I forgot. Thanks for pointing it out.

    QString html = "<html><head></head><body><div><span>HTML</span><a href=\"http://www.google.com\">a</a><a href=\"ohyeah.com\">b</a></div></body></html>";
    // keep the UTF-8 bytes in a QByteArray; no extra copy through constData() is needed
    QByteArray chtml = html.toUtf8();

    // basic init
    myhtml_t* myhtml = myhtml_create();
    myhtml_init(myhtml, MyHTML_OPTIONS_DEFAULT, 1, 0);

    // first tree init
    myhtml_tree_t* tree = myhtml_tree_create();
    myhtml_tree_init(tree, myhtml);

    // parse html, passing the buffer and its size explicitly instead of calling strlen
    myhtml_parse(tree, MyENCODING_UTF_8, chtml.constData(), chtml.size());

    // get the A collection
    myhtml_collection_t *collection = myhtml_get_nodes_by_tag_id(tree, NULL, MyHTML_TAG_A, NULL);

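    // collection may be NULL on failure, so check it before iterating
    for(size_t i = 0; collection && i < collection->length; i++) {
        // look up the href attribute (key length 4) on each <a> node
        myhtml_tree_attr_t *gets_attr = myhtml_attribute_by_key(collection->list[i], "href", 4);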

        if (gets_attr) {
            const char *attr_char = myhtml_attribute_value(gets_attr, NULL);
            qDebug() << attr_char;
        }
    }

    // release resources
    myhtml_collection_destroy(collection);
    myhtml_tree_destroy(tree);
    myhtml_destroy(myhtml);

JonB

@Mr-Gisa
No, you misunderstand! I want to know: what is this "myhtml" thing? Is it a package? Source code? I want the hyperlink to wherever it is, so that I can look at/download it like you have done!

Mr Gisa

MyHTML is a fast HTML Parser using Threads implemented as a pure C99 library with no outside dependencies. https://github.com/lexborisov/myhtml

Gojir4

@JonB OK, I see, sorry for my misunderstanding. I agree with that. My point was that if you know in advance which format you will have to parse (as with Doxygen), regex and XQuery can become a solution. Anyway, the problem has been solved :).

Mr Gisa

That is okay, you helped a lot.