Blog coding and discussion of coding about JavaScript, PHP, CGI, general web building etc.

Thursday, January 28, 2016

PHP Jquery:Convert HTML to JSON from given url and create a tree view of html elements

PHP code:

    <?php
        // ... (the opening lines of this script, including the code that loads
        // the page into $html, are missing from this copy; the html_to_obj()
        // header below is reconstructed from the calls that survive)

        function html_to_obj($html) {
            $dom = new DOMDocument();
            $dom->loadHTML($html);
            return element_to_obj($dom->documentElement);
        }

        function element_to_obj($element) {
            //print_r($element);
            $obj = array();
            $attr = array();
            $arr = array();
            $name = $element->tagName;
            foreach ($element->attributes as $attribute) {
                $attr[$attribute->name] = $attribute->value;
                // Append the id to the element name, e.g. "div#up"
                if ($attribute->name == 'id') {
                    $name .= '#'.$attribute->value;
                }
            }
            if (!empty($attr)) {
                $arr["attributes"] = $attr;
            }
            if ($element->nodeValue != '') {
                $arr["value"] = $element->nodeValue;
            }

            foreach ($element->childNodes as $subElement) {
                if ($subElement->nodeType == XML_TEXT_NODE) {
                    // Text nodes are skipped
                }
                elseif ($subElement->nodeType == XML_CDATA_SECTION_NODE) {
                    // CDATA sections are skipped
                }
                else {
                    $arr["child_nodes"][] = element_to_obj($subElement);
                }
            }
            $obj[$name] = $arr;
            return $obj;
        }

        $json = json_encode(html_to_obj($html));
        $fp = fopen('results.json', 'w');
        fwrite($fp, $json);
        fclose($fp);
        echo $html;
        exit();
    ?>

[Image: JSON tree output]

JSON Result:

    {
        "html": {
            "attributes": {
                "lang": "en"
            },
            "value": "Test Development Test\r\n            *{\r\n                box-sizing:border-box;\r\n            }\r\n            body {\r\n                margin:0;\r\n                font-family: sans-serif;\r\n                color: #999;\r\n            }\r\n            a, a:visited {\r\n                text-decoration:none;\r\n            }\r\n            .movie-list .movie{\r\n                width:250px;\r\n                float:left;\r\n                margin-right:25px;\r\n            }\r\n            .movie-list .movie img{\r\n                width:100%;\r\n            }\r\n            .movie-list .movie a.title{\r\n                text-decoration:none;\r\n                color:#999;\r\n                font-weight:bold;\r\n                font-size:18px;\r\n                line-height:25px;\r\n            }\r\n            .movie-list .movie .synopsis{\r\n                font-size:14px;\r\n                line-height:20px;\r\n            }\r\n",
            "child_nodes": {
                "head": {
                    "child_nodes": {
                        "meta": {
                            "attributes": {
                                "name": "description",
                                "content": "A ast of animated movies"
                            }
                        },
                        "title": {
                            "value": "Test Development Test"
                        },
                        "style": {
                            "attributes": {
                                "type": "text/css"
                            },
                            "value": "\r\n            *{\r\n                box-sizing:border-box;\r\n            }\r\n            body {\r\n                margin:0;\r\n                font-family: sans-serif;\r\n                color: #999;\r\n            }\r\n            a, a:visited {\r\n                text-decoration:none;\r\n            }\r\n            .movie-list .movie{\r\n                width:250px;\r\n                float:left;\r\n                margin-right:25px;\r\n            }\r\n            .movie-list .movie img{\r\n                width:100%;\r\n            }\r\n            .movie-list .movie a.title{\r\n                text-decoration:none;\r\n                color:#999;\r\n                font-weight:bold;\r\n                font-size:18px;\r\n                line-height:25px;\r\n            }\r\n            .movie-list .movie .synopsis{\r\n                font-size:14px;\r\n                line-height:20px;\r\n            }\r\n"
                        }
                    }
                },
                "body": {
                    "child_nodes": {
                        "h1": {
                            "value": "List of animated movies"
                        },
                        "div": {
                            "attributes": {
                                "class": "movie-list"
                            },
                            "child_nodes": {
                                "div#bh_6": {
                                    "attributes": {
                                        "class": "movie",
                                        "id": "bh_6",
                                        "data-year": "2014"
                                    },
                                    "child_nodes": {
                                        "img": {
                                            "attributes": {
                                                "src": "http://ia.media-imdb.com/images/M/MV5BMjI4MTIzODU2NV5BMl5BanBnXkFtZTgwMjE0NDAwMjE@._V1_SY317_CR0,0,214,317_AL_.jpg"
                                            }
                                        },
                                        "a": {
                                            "attributes": {
                                                "class": "title",
                                                "href": "http://www.imdb.com/title/tt2245084/"
                                            },
                                            "value": "Big Hero 6"
                                        },
                                        "div": {
                                            "attributes": {
                                                "class": "synopsis"
                                            },
                                            "value": "The special bond that develops between plus-sized inflatable robot Baymax, and prodigy Hiro Hamada, who team up with a group of friends to form a band of high-tech heroes."
                                        }
                                    }
                                },
                                "div#tlm": {
                                    "attributes": {
                                        "class": "movie",
                                        "id": "tlm",
                                        "data-year": "2014"
                                    },
                                    "child_nodes": {
                                        "img": {
                                            "attributes": {
                                                "src": "http://ia.media-imdb.com/images/M/MV5BMTg4MDk1ODExN15BMl5BanBnXkFtZTgwNzIyNjg3MDE@._V1_SX214_AL_.jpg"
                                            }
                                        },
                                        "a": {
                                            "attributes": {
                                                "class": "title",
                                                "href": "http://www.imdb.com/title/tt1490017/"
                                            },
                                            "value": "The Lego Movie"
                                        },
                                        "div": {
                                            "attributes": {
                                                "class": "synopsis"
                                            },
                                            "value": "An ordinary Lego construction worker, thought to be the prophesied 'Special', is recruited to join a quest to stop an evil tyrant from gluing the Lego universe into eternal stasis."
                                        }
                                    }
                                },
                                "div#httyd": {
                                    "attributes": {
                                        "class": "movie",
                                        "id": "httyd",
                                        "data-year": "2010"
                                    },
                                    "child_nodes": {
                                        "img": {
                                            "attributes": {
                                                "src": "http://ia.media-imdb.com/images/M/MV5BMjA5NDQyMjc2NF5BMl5BanBnXkFtZTcwMjg5ODcyMw@@._V1_SX214_AL_.jpg"
                                            }
                                        },
                                        "a": {
                                            "attributes": {
                                                "class": "title",
                                                "href": "http://www.imdb.com/title/tt0892769/"
                                            },
                                            "value": "How to Train Your Dragon"
                                        },
                                        "div": {
                                            "attributes": {
                                                "class": "synopsis"
                                            },
                                            "value": "A hapless young Viking who aspires to hunt dragons becomes the unlikely friend of a young dragon himself, and learns there may be more to the creatures than he assumed."
                                        }
                                    }
                                },
                                "div#up": {
                                    "attributes": {
                                        "class": "movie",
                                        "id": "up",
                                        "data-year": "2009"
                                    },
                                    "child_nodes": {
                                        "img": {
                                            "attributes": {
                                                "src": "http://ia.media-imdb.com/images/M/MV5BMTk3NDE2NzI4NF5BMl5BanBnXkFtZTgwNzE1MzEyMTE@._V1_SX214_AL_.jpg"
                                            }
                                        },
                                        "a": {
                                            "attributes": {
                                                "class": "title",
                                                "href": "http://www.imdb.com/title/tt1049413/"
                                            },
                                            "value": "Up"
                                        },
                                        "div": {
                                            "attributes": {
                                                "class": "synopsis"
                                            },
                                            "value": "By tying thousands of balloons to his home, 78-year-old Carl sets out to fulfill his lifelong dream to see the wilds of South America. Russell, a wilderness explorer 70 years younger, inadvertently becomes a stowaway."
                                        }
                                    }
                                },
                                "div#mi": {
                                    "attributes": {
                                        "class": "movie",
                                        "id": "mi",
                                        "data-year": "2001"
                                    },
                                    "child_nodes": {
                                        "img": {
                                            "attributes": {
                                                "src": "http://ia.media-imdb.com/images/M/MV5BMTY1NTI0ODUyOF5BMl5BanBnXkFtZTgwNTEyNjQ0MDE@._V1_SX214_AL_.jpg"
                                            }
                                        },
                                        "a": {
                                            "attributes": {
                                                "class": "title",
                                                "href": "http://www.imdb.com/title/tt0198781/"
                                            },
                                            "value": "Monsters, Inc."
                                        },
                                        "div": {
                                            "attributes": {
                                                "class": "synopsis"
                                            },
                                            "value": "Monsters generate their city's power by scaring children, but they are terribly afraid themselves of being contaminated by children, so when one enters Monstropolis, top scarer Sulley finds his world disrupted."
                                        }
                                    }
                                }
                            }
                        }
                    }
                }
            }
        }
    }

Answer by fijas for PHP Jquery:Convert HTML to JSON from given url and create a tree view of html elements


As per your question, the problematic part is where you traverse the returned JSON object and create the tree. In your code, the recursive function that traverses the JSON data had a few minor issues in the code that generates the ul. The structure of the returned object also made this a bit challenging.

I was able to modify your HTML/JavaScript code a bit (without changing it too much) to print out the tree. The relevant code is below:

CSS:

    div#tree-sec ul ul {
        margin-left: 25px;
    }
    div#tree-sec ul li {
        color: #666;
    }
    div#tree-sec ul a {
        color: #111;
        text-decoration: underline;
        cursor: pointer;
    }
    div#tree-sec ul a:hover {
        text-decoration: none;
    }
    div#tree-sec ul.collapsible {
        /* Custom parent styles here... */
        /* Such as a folder icon or 'plus' sign */
    }

HTML & JS:

...  ...  
    .... .... .... ....

    This should provide a properly nested ul-based tree. If creating an image of the tree is a hard requirement, your best bet is to style the generated ul fragment properly, create an HTML page with it on the server, and then use a server-side tool such as wkhtmltoimage from the wkhtmltopdf package to render the HTML document into an image.

    Also, one other thing I would like to mention is that instead of loading the retrieved html into a div, I would recommend that you use an iframe as then, the retrieved html would not interfere with your current page. In my example above, I have added an iframe in the preview div. In such a case, you can use php to only output the json data and setting the iframe to preview the url would be as simple as assigning the url as the src attribute of the iframe. Like this: $("#preview").prop("src", $("#url").val()).

    Edit: Updated code with a fix. Also added a new JS function, makeCollapsible(), to retroactively convert the ul into a clickable, collapsible tree structure as per the OP's comment, along with CSS styles for the tree structure. The tree now looks like the picture below for me:

    [Image: Collapsible, Clickable HTML Tree!]

    Answer by Touqeer Shafi for PHP Jquery:Convert HTML to JSON from given url and create a tree view of html elements


    Check out this library, which was written by Jack.

    https://github.com/Jxck/html2json

    Hope it helps you.

    Answer by gfullam for PHP Jquery:Convert HTML to JSON from given url and create a tree view of html elements


    Addendum: This is a long answer, but it addresses specific problems and solutions for the code snippets you provided. I hope you and others will find it worth the time to compare. :)


    First, modify your PHP to make cleaner JSON

    When parsing the DOM, I recommend setting the element names from the returned object as the associative keys in $arr['child_nodes'] using array_merge() instead of pushing them onto the array as indexed items. To do this, $arr['child_nodes'] must be defined as an array first. Later, if no items get merged into it, you simply unset it before $arr gets added to the main object.

    This makes the final JSON result simpler to parse by precluding the need to use a nested loop in your javascript when building the tree.
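To make the difference between the two shapes concrete, here is a minimal JavaScript sketch (it is not part of the original answer). `mergeChildNodes` is a hypothetical helper that mimics what `array_merge()` does with string keys, and the sample data is trimmed from the JSON shown earlier. It also makes one caveat visible: because the keys are tag names, siblings that share a tag and have no id overwrite each other, just as duplicate string keys do in `array_merge()`.

```javascript
// Indexed shape: what pushing onto $arr["child_nodes"][] produces.
// Each child is a single-key object wrapped in a list.
const indexedChildren = [
  { h1: { value: 'List of animated movies' } },
  { 'div#up': { attributes: { class: 'movie', id: 'up' } } },
];

// Hypothetical helper: merge the single-key objects into one keyed object,
// the same way array_merge() combines arrays that use string keys.
function mergeChildNodes(children) {
  return children.reduce((merged, child) => Object.assign(merged, child), {});
}

const keyedChildren = mergeChildNodes(indexedChildren);

// Indexed: two loops are needed, one over the list and one over each object's key.
const namesFromIndexed = [];
indexedChildren.forEach((child) => {
  Object.keys(child).forEach((name) => namesFromIndexed.push(name));
});

// Keyed: a single loop over the keys is enough.
const namesFromKeyed = Object.keys(keyedChildren);

console.log(namesFromIndexed);
console.log(namesFromKeyed);

// Caveat: duplicate keys collapse, so only the last "li" entry survives here.
const collapsed = mergeChildNodes([{ li: { value: 'a' } }, { li: { value: 'b' } }]);
console.log(collapsed);
```

If the pages you parse contain repeated sibling tags without ids, the indexed shape (or a key that includes a position) may be the safer choice.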

    I also recommend inserting conditional checks for ->length before doing foreach loops. Your existing code was throwing "Warning" messages when zero-length elements entered into the loop.

    Lastly, you may choose to simplify your logic for handling node types by replacing your current if, else if, else statement with a single if checking for $subElement->nodeType === XML_ELEMENT_NODE, which I think is what you're trying to accomplish.

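        <?php
            // ... (the opening lines of this script, including the code that loads
            // the page into $html, are missing from this copy; the html_to_obj()
            // header below is reconstructed from the calls that survive)

            function html_to_obj($html) {
                $dom = new DOMDocument();
                $dom->loadHTML($html);
                return element_to_obj($dom->documentElement);
            }

            function element_to_obj($element) {
                $obj = $attr = $arr = array();
                $name = $element->tagName;
                // Only loop when the element actually has attributes
                if ($element->attributes->length) {
                    foreach ($element->attributes as $attribute) {
                        $attr[$attribute->name] = $attribute->value;
                        if ($attribute->name == 'id') {
                            $name .= '#'.$attribute->value;
                        }
                    }
                }
                if (!empty($attr)) {
                    $arr["attributes"] = $attr;
                }
                if ($element->nodeValue != '') {
                    $arr["value"] = $element->nodeValue;
                }
                if ($element->childNodes->length) {
                    $arr["child_nodes"] = array();
                    foreach ($element->childNodes as $subElement) {
                        // Keep element nodes only; text and CDATA nodes are skipped
                        if ($subElement->nodeType === XML_ELEMENT_NODE) {
                            // Merge by element name instead of pushing an indexed item
                            $arr["child_nodes"] = array_merge($arr["child_nodes"], element_to_obj($subElement));
                        }
                    }
                    // Drop the key again if no child elements were merged in
                    if (!count($arr["child_nodes"])) {
                        unset($arr["child_nodes"]);
                    }
                }
                $obj[$name] = $arr;
                return $obj;
            }

            $json = json_encode(html_to_obj($html));
            $fp   = fopen('results.json', 'w');
            fwrite($fp, $json);
            fclose($fp);
        ?>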

    Use an iframe

    Insert an empty iframe into which you will load your target site. Inserting the markup from another site into yours can (and will likely) cause conflicts with your own code.

    Simplify the traverse function, reorder async calls

    The traverse function was suffering from three flaws:

    1. The use of a counter to create ids on the fly and then use jQuery to find previously made elements with those ids on which to append list items was a performance drain and confusing to debug.

    2. The use of .find() caused jQuery to re-enter the recursive call redundantly and append duplicate child nodes to the tree.

    3. Because it is the callback of a separate asynchronous call, it could execute before the first asynchronous call to getHTML.php had finished.

    Move the async call to get the JSON into the callback function on the first async call to prevent it from fetching incomplete or old JSON from the server.
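The ordering can be sketched without jQuery or a server. In the sketch below, `generateJson` and `fetchJson` are hypothetical stand-ins for the two `$.get` calls (to `getHTML.php` and `results.json`); they invoke their callbacks immediately so that the example stays self-contained. The only point it is meant to show is that the second request is issued from inside the first request's callback.

```javascript
const calls = []; // records the order in which the steps run

// Hypothetical stand-in for $.get('getHTML.php', { url: url }, callback).
function generateJson(url, done) {
  calls.push('generate ' + url);
  done();
}

// Hypothetical stand-in for $.get('results.json', callback, 'json').
function fetchJson(done) {
  calls.push('fetch');
  done({ html: {} });
}

// Issue the second request only from inside the first request's callback,
// so the JSON is never read before the PHP script has rewritten it.
function loadTree(url, render) {
  generateJson(url, function () {
    fetchJson(function (json) {
      calls.push('render');
      render(json);
    });
  });
}

let rendered = null;
loadTree('http://example.com/', function (json) {
  rendered = json;
});

console.log(calls.join(' -> ')); // generate http://example.com/ -> fetch -> render
```

With two independent calls instead of nested ones, nothing would guarantee that `fetch` runs after `generate` has completed.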

    You should also use this first callback to set the iframe src and empty the #tree-sec container, so that subsequent actions don't append more than one tree. You could accomplish the same thing by using .html() instead of .empty() followed by .append().

    To build the tree, I recommend the following simpler approach, which recursively builds the list as a string so that the .append method is only called once. For larger trees, this will dramatically improve performance.

    You may introduce a counter and dynamically assigned ids to this function if you want to, but I left it out to demonstrate more clearly that they are not needed to build the tree.

    I also recommend checking for the existence of child nodes before entering a recursive call. Doing this check allows you to pass in only the child nodes object, which (because of the new JSON resulting from the changes made to the PHP script) now contains tag names as the keys instead of indexed keys with the elements as children. If we hadn't simplified the JSON, a second loop would have been required at this point to retrieve each element.

    You'll also notice the inclusion of aria- attributes and role attributes. This sets you up to be fully accessible if you choose.

    See: Using the WAI-ARIA aria-expanded state to mark expandable and collapsible regions (w3.org)

    It also provides you with a convenient and semantic way to control CSS and toggle state, which you can see demonstrated in the additional click handler added at the bottom of the script and the CSS example at the bottom of this answer.

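        $(document).ready(function () {

            function traverse(data, firstTime) {
                if (typeof data === 'object') {
                    // The markup inside these string literals was stripped from this copy.
                    // The ul/li tags and the role and aria- attributes below are inferred
                    // from the click handler and the CSS in this answer.
                    var ul = firstTime ? '<ul role="tree">' : '<ul role="group" aria-hidden="false">';
                    $.each(data, function (key, val) {
                        if (key !== 'attributes' && key !== 'value') {
                            if (val['child_nodes']) {
                                // Parent item: recurse into the child nodes only
                                ul += '<li role="treeitem" aria-expanded="true">';
                                ul += key;
                                ul += traverse(val['child_nodes']);
                                ul += '</li>';
                            } else {
                                ul += '<li role="treeitem">';
                                ul += key;
                                ul += '</li>';
                            }
                        }
                    });
                    ul += '</ul>';
                    return ul;
                }
            }

            $('.btn-search').on('click', function () {
                var url = $('#url').val();
                if (url) {
                    // First call: let the PHP script fetch the page and write results.json
                    $.get(
                        'getHTML.php',
                        { url: url },
                        function () {
                            $('#preview-sec iframe').attr('src', url);
                            $('#tree-sec').empty();
                            // Second call: read the JSON only after the first call has finished
                            $.get(
                                'results.json',
                                function (json) {
                                    $('#tree-sec').append(traverse(json, true));
                                },
                                'json'
                            );
                        },
                        'html'
                    );
                }
            });

            // Toggle a branch when one of its expandable items is clicked
            $('#tree-sec').on('click', 'li[aria-expanded]', function (e) {
                e.stopPropagation();
                $(this)
                    .attr('aria-expanded', function (i, attr) {
                        return !(attr === 'true');
                    })
                    .children('ul')
                    .attr('aria-hidden', function (i, attr) {
                        return !(attr === 'true');
                    })
                    .toggle();
            });
        });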
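The string-building idea can also be tried outside the browser. The following is a minimal, DOM-free sketch and not the answer's exact code: `buildTree` is a hypothetical name, the markup is reduced to bare `ul` and `li` tags, and the sample object is trimmed from the keyed JSON that the modified PHP script is meant to produce.

```javascript
// Trimmed sample of the keyed JSON shape (tag names as keys).
const sample = {
  html: {
    attributes: { lang: 'en' },
    child_nodes: {
      head: { child_nodes: { title: { value: 'Test Development Test' } } },
      body: { child_nodes: { h1: { value: 'List of animated movies' } } },
    },
  },
};

// Recursively build the nested list as one string, so the caller
// only has to insert it into the page once.
function buildTree(nodes) {
  let ul = '<ul>';
  Object.keys(nodes).forEach(function (name) {
    const node = nodes[name];
    ul += '<li>' + name;
    // Recurse only when the element actually has child elements.
    if (node && node.child_nodes) {
      ul += buildTree(node.child_nodes);
    }
    ul += '</li>';
  });
  ul += '</ul>';
  return ul;
}

const markup = buildTree(sample);
console.log(markup);
```

In the page itself, the resulting string would be passed to a single `.append()` call, as in the click handler above.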

    Bonus: Use attribute selectors in the CSS

    Finally, as mentioned above, the existence of the aria- and role attributes provides a semantic and convenient way to control the styles.

        ul[role='tree'] {
            margin-left: 1em;
            padding-left: 0;
        }
        ul[role='tree'] li {
            cursor: default;
            margin: 0;
            padding: 0 0 0 20px;
            font: normal 1em sans-serif;
            color: #333;
        }
        ul[role='tree'] li[aria-expanded] {
            cursor: pointer;
            font-weight: bold;
            color: #111;
            background: transparent 0 0 no-repeat url('images/arrow-sprite.png');
        }
        ul[role='tree'] li[aria-expanded="true"] {
            background-position: 0 0;
        }
        ul[role='tree'] li[aria-expanded="false"] {
            background-position: 0 20px;
        }

    Answer by user1587368 for PHP Jquery:Convert HTML to JSON from given url and create a tree view of html elements


    Look at XSLT processing. It works fine with much less code.

                

    title

    title 2

    title 3

    • t1
    • t2
    • t3
