Channel: VBForums - Visual Basic .NET

VS 2010 Is there a better way to extract HTML data using DOM?

HTML Code:

<table>
 <tr><td>T1R1C1</td><td>T1R1C2</td></tr>
 <tr><td>T1R2C1</td><td>T1R2C2</td></tr>
</table>
<table>
 <tr><td>T2R1C1</td><td>T2R1C2</td></tr>
 <tr><td>T2R2C1</td><td>T2R2C2</td></tr>
</table>

vb.net Code:
Dim strDesiredCellContents As String = ""

Private Sub Extract()
  ' Collection indexes are zero-based, so index 1 selects the second item.
  Dim hecTables As HtmlElementCollection = Me.wbMain.Document.Body.GetElementsByTagName("table")
  ' Rows of the second table
  Dim hecRows As HtmlElementCollection = hecTables(1).GetElementsByTagName("tr")
  ' Cells of the second row
  Dim hecCells As HtmlElementCollection = hecRows(1).GetElementsByTagName("td")
  ' Text of the second cell
  strDesiredCellContents = hecCells(1).InnerText
End Sub
I often have to extract data from a specific location in an HTML table. I have been using RegEx with success, but when a large table has few unique tags it gets cumbersome. As I wrote in another post, I'd like to use the structure of the DOM to extract the text of specific cells. In this simplified example, my goal is to extract the text from the 2nd cell of the 2nd row of the 2nd table. In the real world I would be retrieving many more fields from larger tables, so this approach appeals to me as elegant. But is this the best way? Can anyone recommend a better tack?
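One possible way to scale this to many fields is to wrap the table/row/cell lookup in a single function with bounds checks. The sketch below is only an illustration: the module name HtmlTableHelper and the function GetCellText are hypothetical names, not part of the original code, and it assumes the WinForms WebBrowser has finished loading the page (for example, it is called from the DocumentCompleted handler).

```vbnet
Imports System.Windows.Forms

Public Module HtmlTableHelper
    ' Hypothetical helper (not from the original post).
    ' Returns the text of one cell addressed by zero-based table, row,
    ' and cell indexes, or Nothing if any index is out of range.
    Public Function GetCellText(doc As HtmlDocument,
                                tableIndex As Integer,
                                rowIndex As Integer,
                                cellIndex As Integer) As String
        ' Assumption: the document may not be loaded yet, so guard against Nothing.
        If doc Is Nothing OrElse doc.Body Is Nothing Then Return Nothing

        Dim tables As HtmlElementCollection = doc.Body.GetElementsByTagName("table")
        If tableIndex < 0 OrElse tableIndex >= tables.Count Then Return Nothing

        ' GetElementsByTagName returns all descendants, so rows of nested
        ' tables (if any) are included in this collection as well.
        Dim rows As HtmlElementCollection = tables(tableIndex).GetElementsByTagName("tr")
        If rowIndex < 0 OrElse rowIndex >= rows.Count Then Return Nothing

        ' Only <td> cells are counted; <th> header cells are skipped.
        Dim cells As HtmlElementCollection = rows(rowIndex).GetElementsByTagName("td")
        If cellIndex < 0 OrElse cellIndex >= cells.Count Then Return Nothing

        Return cells(cellIndex).InnerText
    End Function
End Module
```

With the sample HTML, a call such as GetCellText(Me.wbMain.Document, 1, 1, 1) would address the 2nd cell of the 2nd row of the 2nd table. Returning Nothing on a bad index, rather than letting the indexer throw, is a design choice that may or may not suit pages whose layout changes.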
