C# - How do I find specific matches using regex and put them in a string array?
I have an HTML file that I'm trying to extract data from. The regex I'm using is:
"<tr.+?>.+?<td class=\"table_row_col2\"><b>(.+?)&.+?</b>.+?<td class=\"table_row_col5\">(.+?)</td>.+?<td class=\"table_row_col6\">(.+?)</td>.+?</tr>"
It works in Python but not in C#. Here's some sample data:
<tr class="table_row" style="background-color: #d3d3d3;"> <td class="table_row_col1">271</td> <td class="table_row_col2"><b>16/09/2015 05:28 pm</b></font></small></sup></td> <td class="table_row_col3"><span style="color:#e30613">14.3</span></td> <td class="table_row_col4">-</td> <td class="table_row_col5">8</td> <td class="table_row_col6">-</td> <td class="table_row_col7">-</td> <td class="table_row_col8">before dinner</td> <td class="table_row_col9">-</td> <td class="table_row_col10">-</td> <td class="table_row_col11">-</td> </tr> <tr class="table_row" style="background-color: #ffffff;"> <td class="table_row_col1">272</td> <td class="table_row_col2"><b>16/09/2015 02:54 pm</b></font></small></sup></td> <td class="table_row_col3"><span style="color:#e30613">17.6</span></td> <td class="table_row_col4">-</td> <td class="table_row_col5">20</td> <td class="table_row_col6">32</td> <td class="table_row_col7">-</td> <td class="table_row_col8">other</td> <td class="table_row_col9">-</td> <td class="table_row_col10">-</td> <td class="table_row_col11">-</td> </tr> <tr class="table_row" style="background-color: #d3d3d3;"> <td class="table_row_col1">273</td> <td class="table_row_col2"><b>15/09/2015 11:09 pm</b></font></small></sup></td> <td class="table_row_col3">-</td> <td class="table_row_col4">-</td> <td class="table_row_col5">-</td> <td class="table_row_col6">34</td> <td class="table_row_col7">-</td> <td class="table_row_col8">before bed</td> <td class="table_row_col9">-</td> <td class="table_row_col10">-</td> <td class="table_row_col11">-</td> </tr>
I'm trying to extract the date from table_row_col2 and the numbers from table_row_col5 and table_row_col6.
If you know the HTML never changes, you can do it by adding a class Split:
List<string> rows = Split.Extract(htmlString, "class=\"table_row\"", "</tr>");
foreach (string row in rows)
{
    string col2 = Split.Extract(row, "class=\"table_row_col2\"><b>", "</b>")[0];
    string col5 = Split.Extract(row, "class=\"table_row_col5\">", "</td>")[0];
    string col6 = Split.Extract(row, "class=\"table_row_col6\">", "</td>")[0];
    Console.WriteLine(col2 + ", " + col5 + ", " + col6);
}
Additional class Split:
using System;
using System.Collections.Generic;

public class Split
{
    // Returns every substring of source that sits between splitStart and the next splitEnd.
    public static List<string> Extract(string source, string splitStart, string splitEnd)
    {
        try
        {
            var results = new List<string>();
            string[] start = new string[] { splitStart };
            string[] end = new string[] { splitEnd };
            string[] temp = source.Split(start, StringSplitOptions.None);
            // temp[0] is the text before the first splitStart, so start at index 1.
            for (int i = 1; i < temp.Length; i++)
            {
                results.Add(temp[i].Split(end, StringSplitOptions.None)[0]);
            }
            return results;
        }
        catch (Exception e)
        {
            throw new Exception(e.Message);
        }
    }
}
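If you would rather stay with a regex, as the question title asks, here is a minimal sketch. It rests on two assumptions that the question does not confirm. First, the real HTML may span several lines, in which case `.` needs RegexOptions.Singleline to match across newlines in .NET. Second, the `&.+?` part of the original pattern is dropped because the posted sample rows contain no `&` inside the `<b>` tag. The class name RowRegexSketch and the method name ExtractRows are made up for illustration.

```csharp
using System;
using System.Collections.Generic;
using System.Text.RegularExpressions;

public static class RowRegexSketch
{
    // Simplified variant of the pattern from the question (assumption: no '&' after the date).
    private static readonly Regex RowPattern = new Regex(
        "<tr.+?>.+?<td class=\"table_row_col2\"><b>(.+?)</b>.+?" +
        "<td class=\"table_row_col5\">(.+?)</td>.+?" +
        "<td class=\"table_row_col6\">(.+?)</td>.+?</tr>",
        RegexOptions.Singleline); // Singleline lets '.' match newline characters too

    // Returns one "date, col5, col6" string per matched table row.
    public static string[] ExtractRows(string html)
    {
        var results = new List<string>();
        foreach (Match m in RowPattern.Matches(html))
        {
            results.Add(m.Groups[1].Value + ", " + m.Groups[2].Value + ", " + m.Groups[3].Value);
        }
        return results.ToArray();
    }
}
```

Calling RowRegexSketch.ExtractRows(htmlString) gives a string array with one entry per row; for the first sample row that entry is "16/09/2015 05:28 pm, 8, -". Because the quantifiers are lazy but unanchored, a row that lacks one of the three columns can make a match run into the next row, so the Split approach is the more predictable one when the markup is irregular.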