The AngleSharp Project

AngleSharp v0.10.0 released

Florian Rappl • Jan 6, 2019 • news, release, major

After more than 2 years AngleSharp v0.10 has finally become a reality. Jointly with AngleSharp.Io and AngleSharp.Css we now enter the v0.10 release cycle - an important milestone before hitting v1. The next step is an intermediate v0.11 (potentially of all of these libraries) to include feedback, suggestions, fixes, and other features before becoming stable.

AngleSharp started out as an HTML5 parser with a fully implemented (W3C-compatible) DOM in mind. Now AngleSharp includes hooks for anything from custom requesters, cookie management, URL patterns, web extensions, styling engines, and script engines to media handlers and document handlers. If you want to handle some web resource in .NET (e.g., an HTML document, a CSS stylesheet, …) AngleSharp should come to mind.

I want to take the chance to say thank you to all contributors, users, and fans of AngleSharp. It is truly an outstanding community effort and hopefully the beginning of an ongoing effort to build the best possible ecosystem for web technology analysis / usage in .NET. Thanks for all the patience and continuous feedback!

We have already started on a renewed documentation, currently available via the GitHub repository. This will soon also be available on the website and distributed to all repositories.

Especially interesting is the available migration guide. It is currently not 100% complete, but we will try to extend it soon. While some namespaces and names may have changed (slightly), the biggest change is the removal of the old .NET 4 and Silverlight targets. Yes, this will certainly hurt some people, but keeping these targets alive has become more and more difficult and time-consuming. It was one of the major points that prevented AngleSharp v0.10 from being released earlier. Now the time is right to fully adhere to .NET Standard.

Besides some interesting features in the CSS value model we now want to focus on the all-new AngleSharp.Js. For this we need to jointly improve (and extend) Jint, as Jint v3 seems to be not only the future of Jint, but also a great improvement (e.g., including ES6) over good old Jint v2. Any help is appreciated (either directly on AngleSharp, a related project such as AngleSharp.Js, or Jint). Let’s build this together!

Taking AngleSharp to the Next Level

Florian Rappl • Mar 9, 2018 • news, organization

In the last two months some advancements have been made to ensure that AngleSharp will also see active development in the future. Today I am happy to announce that plans for joining the .NET Foundation are becoming more solid.

The current progress of this endeavour can be tracked in the GitHub issue of the AngleSharp.Core repository. Many steps are missing, but it’s a start.

Here is also an update on our current timetable. The roadmap currently looks as follows:

  • A second hotfix for AngleSharp v0.9.9 will be released soon, dealing with the ConditionalWeakTable (thanks for the PR!)
  • The big new v0.10 release should be out end of March / beginning of April (keep our fingers crossed) - together with it the first release of AngleSharp.Css is expected
  • AngleSharp.Scripting will see much dedication towards the middle of the year; expect to run Angular.js, React, Angular, any jQuery version etc. with it - we will try to make this all possible
  • Potentially towards the end of Q3 2018 AngleSharp v0.11 will happen
  • Hopefully towards the end of the year we finally have v1

Just some remarks on v0.11: Yes, the plan was to release v1 directly after v0.10, however, that plan is not realistic. First of all, there may be some issues that go beyond a hotfix (API changes). v0.10 has (unfortunately way too many) API changes, which need to be carefully assessed first. However, that alone does not justify a v0.11.

The most important aspect for v0.11 will be the drop of all legacy systems. Right now AngleSharp is delivered with multiple libraries inside the same NuGet package (one lib targeting .NET 4, another one for SL, PCL, …). This will change - we will only target .NET Standard (either 1.0 or 1.3). The decision for the specific .NET Standard version has not yet been made, but there are many arguments in favor of 1.3.

Ultimately, our goal is to make development of AngleSharp simpler and to provide a more robust and streamlined version to our users. Going for .NET Standard (1.3) is not only done for cosmetics, but involves an internal rewrite with Span<T> and Memory<T>. All in all we expect some serious performance improvements in parsing scenarios.

AngleSharp v0.9.9.1 released

Florian Rappl • Jan 3, 2018 • news, release, hotfix

For over a year the development of AngleSharp was stale. But the project is not dead! Today I am proud to release a hotfix to the very successful version v0.9.9, called v0.9.9.1. This hotfix contains some critical fixes and improvements. It also paves the way for the v0.10 release, which is stuck with an observer problem (namely how to attach dynamic listeners to attribute changes for - from the perspective of AngleSharp.Core - unknown attributes), which is solved in this hotfix. In the long term this also enables users to disable the dynamic DOM, resulting in even better performance for static analysis when a fully dynamic DOM is not needed.

One of the things that kept AngleSharp stale was the issue of the broken tool chain. Essentially, .NET Core broke earlier behavior and as there was no global.json available the latest SDK was used by AppVeyor. Thus our build was not working (on AppVeyor, or many newer platforms that missed the required SDK) anymore. Now we specify the right SDK and also fully support Linux builds (at least WSL compatible), which had some minor issues previously.

AngleSharp is already widely used and popular these days. The NuGet statistics for downloads per day confirm this.

AngleSharp downloads per day

More importantly, the overall number of downloads is pretty much constantly increasing. Thus we do not see any decline in AngleSharp’s usage. However, increasing the number of downloads per day (i.e., stepping back in the arena with HtmlAgilityPack) will be an important goal for 2018.

AngleSharp downloads in total

To achieve this goal we will try to finally publish v0.10. If this can be done in the first half of 2018 then the way for v1 is definitely free. For v1 we will try to improve / realize

  • a better JS engine integration (e.g., capable of Angular, React, …),
  • finished AngleSharp.Css (at least API-wise),
  • more complete AngleSharp.Io package (including file and directory upload options etc.),
  • a very simple renderer, and
  • a very complete NuGet package that hosts everything combined (AngleSharp.Browser).

Let’s see what can be done! Happy new year everyone and thanks for using AngleSharp.

AngleSharp Scripting v0.5.0 released

Florian Rappl • Oct 3, 2016 • news, release, scripting

Just now the latest version of the JavaScript scripting extension AngleSharp.Scripting.JavaScript was released. The new release, version 0.5.0, includes a couple of bug fixes and improvements. All in all this is certainly leading the way forward.

The biggest internal change is the placement of the correct prototype chain. This was on the list for quite some time. The change allows calling, e.g., Object.getPrototypeOf(document) with the result being the HTMLDocument.prototype element. Also objects such as HTMLDocument are now available on the window context. These objects are the constructor functions of their matching DOM interfaces. In most cases the constructors are just dummies, which cannot be used directly.

A quite nice addition is the first API extension from the library. The ExecuteScript extension method works on document instances. It allows direct evaluation of a JavaScript snippet. The result is a .NET object, e.g., document.ExecuteScript("2+3") would result in a boxed System.Double instance (5). Similarly, a call to document.ExecuteScript("document.querySelector('div')") would return the instance of the first div element in the document.

Finally, a couple of improvements to the type casting abilities have been integrated. Functions are now also accepted in form of strings (e.g., for setTimeout or setInterval). Furthermore, for primitive types (strings, booleans, …) the corresponding JavaScript conversion is applied. This is all done as it should be.

For the way forward we will improve the extensibility and standard compliance of AngleSharp.Scripting.JavaScript. With the v0.6 release a set of nasty tests needs to be passed, which all work in standard browsers. The ultimate goal is to reach the v1 milestone shortly after the AngleSharp.Core library reaches it.

AngleSharp v0.9.8 released

Florian Rappl • Sep 4, 2016 • news, release

Yesterday version 0.9.8 of AngleSharp was released. There are not too many API changes - and barely any breaking changes, so one should not notice anything.

The biggest changes are the improvements in the parser, the cookie handling, and the requester. Here we managed to come up with a lot of useful improvements. Furthermore, more of the API has been opened. With the v0.9.9 release we will introduce some breaking changes that are necessary to prepare the v1.0 release. It is not yet clear whether there will be a v0.9.10. Right now, the tendency is to come up with the first stable and production-ready release soon.

In order to be production ready AngleSharp will separate most of the CSS into its own project (with its own versioning and maintenance). This will help in many aspects. However, we fully understand that AngleSharp is that much loved due to the fact of being complete, i.e., integrated so tightly with CSS. Therefore, a tiny CSS core will remain, which is responsible for parsing and building the CSSOM. Furthermore, the selectors will also remain in the core. This should be the best of both worlds.

So what will transition to its own CSS base? Essentially, all properties. Additionally, some special rules and declarations. Finally, new stuff such as the rendering device or anything related to specific properties or media queries will be placed there. Overall this should simplify the structure in AngleSharp and reduce its size.

The CSS parser will remain in AngleSharp. However, it will only return a generalized version of the CSSOM, which should be much easier to handle. Also in most cases it should be the best one to work against.

The exact changes are all listed for v0.9.8 in the changelog.

AngleSharp v0.9.7 released

Florian Rappl • Jul 16, 2016 • news, release

Too much time has passed since the last update. The current release v0.9.7 marks the beginning of a dramatic change in the AngleSharp philosophy. For all related projects the target will change with the upcoming version. AngleSharp will be fully targeting .NET Core. This change is an investment in the future of .NET Core. Furthermore, it should simplify the development process in the long run - as the target platform is clearly defined.

Another upcoming change is related to the DOM interfacing. Right now AngleSharp tries to portray the IDL from the W3C in form of .NET interfaces. To make some things (e.g., different naming conventions) possible, attributes have been inserted. However, the major problem with these interfaces is the coupling they introduce. It’s not the most plugin-friendly mechanism, as the overlap always has to be defined specifically.

Therefore a new approach will be added, which will be applied to partial interfaces (usually marked as DomNoInterfaceObject by AngleSharp): We will use static classes with extension methods for such interfaces. These classes can then be defined in external dependencies. In C# they are easily resolved via the compiler (once we have referenced the extension library and included the namespace we can use the methods). In, e.g., JavaScript (AngleSharp.Scripting.JavaScript) they can be resolved easily via an extension cache that is prepared with the configuration that collects all known extensions. In the long run this approach should provide a more agile basis.
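The extension-method idea can be sketched in plain C#. The following is only a minimal illustration of the language mechanism, not AngleSharp code: IElementLike, SimpleElement, ElementLikeExtensions, and IsNamed are hypothetical names invented for this example.

```csharp
using System;

// Hypothetical interface; it stands in for a DOM interface from the core library.
public interface IElementLike
{
    string LocalName { get; }
}

// Hypothetical extension library: adds a member without changing the interface.
// It could live in a separate assembly that only references the core.
public static class ElementLikeExtensions
{
    public static bool IsNamed(this IElementLike element, string name)
    {
        return string.Equals(element.LocalName, name, StringComparison.OrdinalIgnoreCase);
    }
}

// Hypothetical implementation used only to exercise the extension method.
public sealed class SimpleElement : IElementLike
{
    public SimpleElement(string localName) { LocalName = localName; }
    public string LocalName { get; }
}

public static class ExtensionDemo
{
    public static void Main()
    {
        IElementLike element = new SimpleElement("div");
        // Resolved by the compiler once the namespace of the extension class is in scope.
        Console.WriteLine(element.IsNamed("DIV")); // prints "True"
    }
}
```

The interface itself stays untouched, which is what keeps the coupling low: consumers that do not reference the extension library simply do not see IsNamed.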

Now it’s easy to guess what is about to come: AngleSharp.Core will be split into two libraries. This is a good compromise between the previously suggested three libraries and the current state of a single library. In the end we will obtain AngleSharp.Core and AngleSharp.Css. The query selector capability will still be part of AngleSharp.Core; however, everything else related to the CSSOM will be outsourced.

With the upcoming changes in mind the changelog of v0.9.7 is not really sensational:

  • Fixed some bugs (#343, #325, #341, #347, #355, #358)
  • Improved cookie handling (#280, #274, #365)
  • Added a document factory (#331)
  • EventNames, AttributeNames and others are available (#330)
  • Allow setting the active document (#281)
  • Improved Xamarin.iOS build (#85)
  • Changed service API slightly (#157)
  • Enhanced CoreCLR support (#270, #362)

Especially, #157 took a longer time. Here many evaluations have been done and the previously mentioned planned changes took shape. In the future the goal is to open AngleSharp even more and allow hooking into the whole pipeline. Making something like CSS work without being integrated into the core will certainly help to find the right level of customization.

Thanks to every contributor for the input and hard work - much appreciated as always!

AngleSharp v0.9.6 released

Florian Rappl • May 6, 2016 • news, release

More than two months after the release of AngleSharp v0.9.5 the new version v0.9.6 has been published. Besides some bug fixes the version focused on starting some changes regarding the final API touches and modifications.

The version may contain breaking changes for anyone using AngleSharp. Two of the things that will potentially break builds are API renaming (e.g., Submit becomes SubmitAsync) and the removal of the IEventAggregator. While the former can be solved easily, the latter may actually require a little bit more time. Dropping the IEventAggregator was necessary to provide a uniform API that can also be exposed in JavaScript.

Additionally, the usage of new Configuration() is highly discouraged. If you create plain new configurations, no factories will be available to the parser, the requesters, … - a lot of components. This will result in ill-behavior. I advise everyone to use Configuration.Default as the basis for any configuration adjustments.
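As a minimal sketch of that advice, assuming a recent AngleSharp package is referenced: the configuration starts from Configuration.Default and is handed to a browsing context. The ConfigurationDemo class and the ReadFirstParagraphAsync helper are hypothetical names for this example only.

```csharp
using System;
using System.Threading.Tasks;
using AngleSharp;

public static class ConfigurationDemo
{
    // Hypothetical helper: parses the given markup and returns the text of the first <p>.
    public static async Task<string> ReadFirstParagraphAsync(string html)
    {
        // Start from the defaults so that the standard factories are registered;
        // further adjustments would be layered on top of this instance.
        var config = Configuration.Default;
        var context = BrowsingContext.New(config);

        // Open a document from an in-memory response instead of a network request.
        var document = await context.OpenAsync(req => req.Content(html));
        return document.QuerySelector("p")?.TextContent;
    }

    public static async Task Main()
    {
        Console.WriteLine(await ReadFirstParagraphAsync("<p>Hello</p>"));
    }
}
```

Any customization would then be chained onto Configuration.Default rather than onto an empty new Configuration().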

Finally, the scripting library (AngleSharp.Scripting.JavaScript) has been updated - supporting the latest version of the core AngleSharp library and bringing some fixes and improvements. Together with some changes in the core library the experience should be much more complete right now, but there is still some way to go.

The samples, demo projects, and the AngleSharp.Io extension library have been updated accordingly. They all work seamlessly together and will be unified with the release of v1.0.0, planned later this year.

Thanks to everyone for their contribution!

AngleSharp v0.9.5 released

Florian Rappl • Mar 17, 2016 • news, release

It has been quite a while since the last release, but better late than never: AngleSharp v0.9.5 is now available. There have been some bug fixes and improvements. Most importantly (and thanks to Jeremy Meng from Microsoft after discussions with the core Nancy developers) we now support the CoreCLR (dotnet) target via NuGet.

The focus of v0.9.5 was to fix critical bugs, improve parts of the API, and rewrite the internal CSSOM representation. The internal CSSOM changes are not complete yet, but sufficient for v0.9.5 to be released. The critical bugs involved mostly encoding related issues.

Previously, a virtual file system for AngleSharp has been announced. This feature was essentially abandoned as it would be more appropriate in AngleSharp.Io. AngleSharp provides everything to keep track of requests / responses. The internal system has been adjusted and rewritten to cover all cases.

AngleSharp v0.9.5 comes with a set of new helpers. Some of these helpers are CSSOM related, others are there to distinguish between the different types of stylesheets. There are also helpers that are similar to special jQuery filters (some of which did not have any LINQ-counterpart already).

The development focus in the next couple of weeks will be of course on v0.9.6, however, the extension libraries will be adjusted to v0.9.5 first. Especially AngleSharp.Scripting.JavaScript did not get the (dev) attention it requires and deserves. This will be addressed first.

Thanks to everyone for their contribution!

AngleSharp v0.9.4 released

Florian Rappl • Dec 31, 2015 • news, release

Finally, after weeks of delays and many discussions, AngleSharp v0.9.4 is available. There have been some bug fixes and improvements. Most importantly, these are encoding and insertion pointer fixes.

However, AngleSharp v0.9.4 is in fact more than just a minor release. It could be considered a bridge release, as many internal things have been changed for the better. The upcoming version(s) will continue to walk the path enabled by v0.9.4.

Features, such as the behavior of OpenAsync (delayed until embedded resources finished loading) or the virtual file system are already partially available. The CSSOM will also see more updates and will be enhanced with further helpers to modify the OM with objects instead of raw strings.

On the API side a lot of new things will come up. Many internal concepts will be made public and the parsing can be expected to become even more flexible.

Last but not least, the performance will be improved. While HTML is already quite decent (but could be improved in some scenarios), CSS has plenty of room for improvement.

AngleSharp v0.9.3 released

Florian Rappl • Oct 12, 2015 • news, release

AngleSharp v0.9.3 is another round of minor updates. Besides a few bug fixes the CompareDocumentPosition method has been improved. It now passes all tests and works reliably.

The most interesting new feature is the ability to define custom handling of entities. This can be done via the IEntityService. Its GetSymbol method is usually called with an entity name like gt for XML or gt; for HTML. The difference between XML and HTML lies in the way HTML handles entity errors: HTML allows entities that are not terminated by a semicolon.

The simplest way would be to use, e.g., XmlEntityService.Resolver, in a custom implementation. That way the common entities would be resolved by the already available service.

The next release will most probably be a feature release again. Hopefully features such as the CSSOM improvements, factory extensions, or a virtual file system, will be integrated.

AngleSharp IO

Florian Rappl • Oct 4, 2015 • news, project

In the last few days one of the remaining projects was officially launched: AngleSharp.Io. This library will provide many essential IO classes, helper methods, and DOM interfaces. Most importantly it will bring new / improved requesters, such as a much better HTTP/HTTPS requester built on top of the HttpClient class. As a consequence this library will unfortunately not be released as a PCL. In the long run more requesters will be integrated.

AngleSharp.Io is also the library that will finally offer a WebSocket implementation. Also the Storage interface will be made available, which can then be instantiated as localStorage or sessionStorage. Under the hood the library should also be able to handle caching of resources.

Right now the work has just begun. The first work items only focus on the requester side, with a few interesting items, e.g., WebSocket, being on the way. As with the (JavaScript) scripting library the exact roadmap is unclear at this point in time, however, we will try to come up with one in the next couple of months.

AngleSharp v0.9.2 released

Florian Rappl • Sep 24, 2015 • news, release

This week’s minor update was only a small patch that fixed a bug in the tokenizer and improved the XML parser’s performance. It also features the brand-new application/json encoding type for form submission. The form submission process internals have been redesigned to be much easier to extend and use. The FormDataSet and FormDataSetEntry classes are now public. This forms the basis for sending forms without requiring a webpage or valid <form> element at all.

The renewed CSSOM is not part of v0.9.2. Also the previously mentioned update to the service model and the event aggregator change could not make it. They will (at least partially, hopefully) find their way into v0.9.3 or some update afterwards.

Up to this point more and more updates have been made via feature branches. From now on this will be the only model. Every new feature or update has to be merged to devel from a specific feature branch, which is associated with an issue. The idea is to make the development as transparent and open as possible. Also future contributions, discussions, and general user engagement should be boosted.

AngleSharp v0.9.1 released

Florian Rappl • Sep 16, 2015 • news, release

A week ago the first patch for AngleSharp v0.9.1 was released. Besides fixing some issues the event loop model has been reworked. This will not be the last update to this mechanism. The next update v0.9.2 will focus on closing some existing issues, such as the proposed CssNode.

One of the most important additions to AngleSharp v0.9.1 is the ability to filter (http, data, …) requests. The standard requester service has been extended to provide the ability for this integration. It is therefore possible to stop unwanted requests directly without having to provide a custom IRequester implementation.
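How such a filter is wired up has changed between releases. The sketch below does not show the v0.9.1 API; it assumes the LoaderOptions type and its Filter predicate as found in AngleSharp v0.10 and later. The RequestFilterDemo class and the HTTPS-only policy are hypothetical and serve only as an illustration.

```csharp
using System;
using AngleSharp;
using AngleSharp.Io;

public static class RequestFilterDemo
{
    // Hypothetical policy for illustration: only HTTPS requests are allowed.
    public static bool AllowOnlyHttps(Request request)
    {
        return string.Equals(request.Address.Scheme, "https", StringComparison.OrdinalIgnoreCase);
    }

    public static IConfiguration CreateConfiguration()
    {
        // Assumption: AngleSharp v0.10 or later, where the default loader takes LoaderOptions.
        var options = new LoaderOptions
        {
            IsResourceLoadingEnabled = true,
            Filter = AllowOnlyHttps
        };

        // Requests rejected by the filter are not sent; no custom requester is needed.
        return Configuration.Default.WithDefaultLoader(options);
    }
}
```

Keeping the policy in a plain method makes it easy to check in isolation before it is attached to a configuration.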

Finally a remark for the upcoming releases: The service model will definitely change until v1. Right now there are too many interfaces and layers required to extend AngleSharp. Also the whole approach won’t scale well once loosely coupled abilities, such as drawing or performance capturing, arrive. I believe that methods may be happier if they can just send a message, which may or may not find a listener. These messages will be very loosely coupled.

At the moment I don’t know if such a potential change will also affect the event aggregator, but only using DOM events may actually be a good thing. In the end the BrowsingContext itself may end up as an EventTarget, which would bring consistency across the entire library.

AngleSharp Documentation

Florian Rappl • Aug 29, 2015 • information, documentation

An important aspect that is still missing (on the new homepage) is documentation. There should be two directly linked pages: “Get Started” and “Documentation”. Right now the documentation can only be found in the Wiki, with a little code being displayed in the AngleSharp/AngleSharp repository. My plan is to keep both, but to update / sync them from a larger (more dedicated) source.

I am actively working on a documentation system, which allows deployment in several formats. Here we have HTML, Markdown and LaTeX. However, it is not that the markup system converts contents into other markup systems, which are directly or indirectly bound to output devices, but rather that this system allows very structured and programmatic access. It has LaTeX like features (more lightweight of course) in a RST-like syntax. The syntax has not yet stabilized, which is why I am using XML (parsed with AngleSharp of course) for tests. Finally the idea is to have one AngleSharp/Documentation repository containing the documentation code, which is then produced and placed (probably automatically) in the Wiki and on the page. Also a PDF (kind of like a book) will be produced from the documentation.

One of the most important features of this documentation system will be the ability to embed code from external files. These files may be given by URLs. The idea is to place the snippets shown in the documentation in unit tests. The unit tests are run in their respective repository (tests of core functionality in AngleSharp.Core, tests of scripting functionality in AngleSharp.Scripting). That ensures that the code shown in the documentation is always up to date and compilable. Of course one can specify certain lines, ranges or delimiters to limit the code shown in the documentation.

The only remaining question is: When will this documentation system be available? Well, I can’t give an exact date, but it has to be prior to v1.0. I hope to finish the system’s software in the end of October and to finish the (full!) documentation itself in the end of November.

AngleSharp v0.9 released

Florian Rappl • Aug 27, 2015 • news, release

Yesterday the latest version of AngleSharp was released. This release marks the v0.9 milestone. Besides providing skeleton implementations for, e.g., the recent shadow DOM API draft, the picture element, etc., this version fixes some bugs that may appear in conjunction with using scripts.

Scripts are one of the success stories for AngleSharp. They make this library so useful. Therefore the AngleSharp.Scripting project is moving forward as well. The AngleSharp.Scripting.JavaScript library was released yesterday with version 0.3. Here we will now try to align with the versioning of AngleSharp.Core.

AngleSharp.Core will definitely be split up. The library is already too huge and contains too many features to be considered lightweight. Let’s have a look at the SLOC (taken two months ago) of the DOM part alone:

AngleSharp Core DOM SLOC Distribution

From this picture alone we can already estimate that splitting the library could be beneficial. Roughly one third of the SLOC each is spent on general, HTML, and CSS functionality. The splitting could therefore result in three or four parts:

  • AngleSharp.Core.Common, containing the basic infrastructure and definitions [no dependency]
  • AngleSharp.Core.Html, containing the HTML parser and DOM implementation [depending on Common]
  • AngleSharp.Core.Css, containing the CSS parser and CSSOM implementation [depending on Common]
  • AngleSharp.Core.Complete, aggregating the Core and providing further helpers [depending on the former three]

Experiments with a proper dissection will begin soon. Also the renderer part will then be discussed. Plans have already been made and it seems likely that a renderer will be published within this year (experimental stage). Here a new project, AngleSharp.Renderer will be opened. The renderer itself will contain many libraries, specifically to make the renderer common infrastructure a PCL again with specific platform libraries that contain the actual drawing code.