The AngleSharp Project


AngleSharp Scripting v0.5.0 released

Florian Rappl • Oct 3, 2016 • news, release, scripting

The latest version of the JavaScript scripting extension, AngleSharp.Scripting.JavaScript, has just been released. Version 0.5.0 contains a couple of bug fixes and improvements. All in all, this release is a clear step forward.

The biggest internal change is that the prototype chain is now set up correctly. This had been on the list for quite some time. The change allows calling, e.g., Object.getPrototypeOf(document), with the result being the HTMLDocument.prototype object. Also, objects such as HTMLDocument are now available on the window context. These objects are the constructor functions of their matching DOM interfaces. In most cases the constructors are just dummies, which cannot be used directly.
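To make the prototype chain idea more tangible, here is a minimal plain JavaScript sketch. It does not use AngleSharp at all: the constructors, the createHostObject helper, and documentStub are stand-ins invented for illustration. It only shows what a correct chain plus "dummy" constructors could look like from the script side.

```javascript
// Illustrative stand-ins only; these are not AngleSharp's real bindings.
let constructing = false; // true only while the "host" creates an object

function Node() {
  // Dummy constructor: scripts cannot call it directly.
  if (!constructing) throw new TypeError('Illegal constructor');
}
function Document() { Node.call(this); }
function HTMLDocument() { Document.call(this); }

// Wire up the chain: HTMLDocument -> Document -> Node -> Object
Object.setPrototypeOf(Document.prototype, Node.prototype);
Object.setPrototypeOf(HTMLDocument.prototype, Document.prototype);

// Hypothetical host-side factory that is allowed to construct objects.
function createHostObject(Ctor) {
  constructing = true;
  try {
    return new Ctor();
  } finally {
    constructing = false;
  }
}

const documentStub = createHostObject(HTMLDocument);
```

With this in place, Object.getPrototypeOf(documentStub) is HTMLDocument.prototype, documentStub instanceof Node holds, and calling new HTMLDocument() from a script throws a TypeError.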

A quite nice addition is the library’s first API extension. The ExecuteScript extension method works on document instances and allows direct evaluation of a JavaScript snippet. The result is a .NET object, e.g., document.ExecuteScript("2+3") would result in a boxed System.Double instance (5). Similarly, a call to document.ExecuteScript("document.querySelector('div')") would return the instance of the first div element in the document.

Finally, a couple of improvements to the type casting abilities have been integrated. Functions are now also accepted in the form of strings (e.g., for setTimeout or setInterval). Furthermore, for primitive types (strings, booleans, …) the corresponding JavaScript conversion is applied. All of this now works as it should.
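As a rough illustration of these two conversions, consider the following plain JavaScript sketch. The helper names toCallback and convertTo are hypothetical and say nothing about how the library implements the casts internally; they only show the behavior a script author would expect.

```javascript
// Hypothetical helper: turn a setTimeout-style handler into a function.
function toCallback(handler) {
  if (typeof handler === 'function') return handler;
  // A string handler is compiled as a script body (global scope).
  return new Function(String(handler));
}

// Hypothetical helper: apply the standard JavaScript conversion for a
// primitive target type.
function convertTo(value, type) {
  switch (type) {
    case 'string': return String(value);
    case 'boolean': return Boolean(value);
    case 'number': return Number(value);
    default: throw new TypeError('Unsupported target type: ' + type);
  }
}
```

For instance, convertTo(42, 'string') yields the string '42', and convertTo('', 'boolean') yields false, just like the implicit conversions in a browser.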

Going forward we will improve the extensibility and standard compliance of AngleSharp.Scripting.JavaScript. For the v0.6 release a set of nasty tests, all of which work in standard browsers, needs to pass. The ultimate goal is to reach the v1 milestone shortly after the AngleSharp.Core library reaches it.

AngleSharp v0.9.8 released

Florian Rappl • Sep 4, 2016 • news, release

Yesterday version 0.9.8 of AngleSharp was released. There are not too many API changes and barely any breaking changes, so one should not notice anything.

The biggest changes are the improvements in the parser, the cookie handling, and the requester. Here we managed to come up with a lot of useful improvements. Furthermore, more of the API has been opened. With the v0.9.9 release we will introduce some breaking changes that are necessary to prepare the v1.0 release. It is not yet clear whether there will be a v0.9.10. Right now, the tendency is to come up with the first stable and production-ready release soon.

In order to be production ready, AngleSharp will separate most of the CSS into its own project (with its own versioning and maintenance). This will help in many respects. However, we fully understand that AngleSharp is so well loved because it is complete, i.e., integrated so tightly with CSS. Therefore, a tiny CSS core will remain, which is responsible for parsing and building the CSSOM. Furthermore, the selectors will also remain in the core. This should be the best of both worlds.

So what will transition to its own CSS base? Essentially, all properties. Additionally, some special rules and declarations. Finally, new stuff such as the rendering device or anything related to specific properties or media queries will be placed there. Overall this should simplify the structure in AngleSharp and reduce its size.

The CSS parser will remain in AngleSharp. However, it will only return a generalized version of the CSSOM, which should be much easier to handle. Also in most cases it should be the best one to work against.

The exact changes are all listed for v0.9.8 in the changelog.

AngleSharp v0.9.7 released

Florian Rappl • Jul 16, 2016 • news, release

Too much time has passed since the last update. The current release v0.9.7 marks the beginning of a dramatic change in the AngleSharp philosophy. For all related projects the target will change with the upcoming version. AngleSharp will be fully targeting .NET Core. This change is an investment in the future of .NET Core. Furthermore, it should simplify the development process in the long run, as the target platform is clearly defined.

Another upcoming change is related to the DOM interfacing. Right now AngleSharp tries to portray the IDL from the W3C in the form of .NET interfaces. To make some things (e.g., different naming conventions) possible, attributes have been inserted. However, the major problem with these interfaces is the coupling they introduce. It’s not the most plugin-friendly mechanism, as the overlap always has to be defined specifically.

Therefore a new approach will be added, which will be applied to partial interfaces (usually marked as DomNoInterfaceObject by AngleSharp): We will use static classes with extension methods for such interfaces. These classes can then be defined in external dependencies. In C# they are easily resolved via the compiler (once we have referenced the extension library and included the namespace we can use the methods). In, e.g., JavaScript (AngleSharp.Scripting.JavaScript) they can be resolved easily via an extension cache that is prepared with the configuration that collects all known extensions. In the long run this approach should provide a more agile basis.
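How such an extension cache might behave on the scripting side can be sketched in a few lines of plain JavaScript. Everything here is an assumption made for illustration (the ExtensionCache class, its register and resolve methods, and the example member names); the actual mechanism in AngleSharp.Scripting.JavaScript may look quite different.

```javascript
// Hypothetical extension cache: maps a DOM interface name to extra members
// that were collected from all known extension libraries.
class ExtensionCache {
  constructor() {
    this.byInterface = new Map();
  }

  // Register a group of extension methods for one interface.
  register(interfaceName, methods) {
    const known = this.byInterface.get(interfaceName) || Object.create(null);
    this.byInterface.set(interfaceName, Object.assign(known, methods));
  }

  // Look up a member. Extension methods receive the target as their first
  // argument, mirroring how a C# extension method takes `this`.
  resolve(interfaceName, memberName, target) {
    const methods = this.byInterface.get(interfaceName);
    const method = methods && methods[memberName];
    return method ? (...args) => method(target, ...args) : undefined;
  }
}
```

A library could then call cache.register('Element', { describe: (el, prefix) => prefix + el.tagName }) once during configuration, and the engine would ask cache.resolve('Element', 'describe', element) whenever a script touches an unknown member.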

Now it’s easy to guess what is about to come: AngleSharp.Core will be split into two libraries. This is a good compromise between the previously suggested three libraries and the current state of a single library. In the end we will obtain AngleSharp.Core and AngleSharp.Css. The query selector capability will still be part of AngleSharp.Core; however, everything else related to the CSSOM will be moved out.

With the upcoming changes in mind the changelog of v0.9.7 is not really sensational:

  • Fixed some bugs (#343, #325, #341, #347, #355, #358)
  • Improved cookie handling (#280, #274, #365)
  • Added a document factory (#331)
  • EventNames, AttributeNames and others are available (#330)
  • Allow setting the active document (#281)
  • Improved Xamarin.iOS build (#85)
  • Changed service API slightly (#157)
  • Enhanced CoreCLR support (#270, #362)

#157 in particular took a long time. Here many evaluations have been done and the previously mentioned planned changes took shape. In the future the goal is to open AngleSharp even more and allow hooking into the whole pipeline. Making something like CSS work without being integrated into the core will certainly help to find the right level of customization.

Thanks to every contributor for the input and hard work - much appreciated as always!

AngleSharp v0.9.6 released

Florian Rappl • May 6, 2016 • news, release

More than two months after the release of AngleSharp v0.9.5 the new version v0.9.6 has been published. Besides some bug fixes, the version focused on starting the final touches and modifications to the API.

The version may contain breaking changes for anyone using AngleSharp. Two of the things that will potentially break builds are API renaming (e.g., Submit becomes SubmitAsync) and the removal of the IEventAggregator. While the former can be solved easily, the latter may actually require a little bit more time. Dropping the IEventAggregator was necessary to provide a uniform API that can also be exposed in JavaScript.

Additionally, the usage of new Configuration() is highly discouraged. If you create plain new configurations, no factories will be available to the parser, the requesters, … (a lot of components). This will result in misbehavior. I advise everyone to use Configuration.Default as the basis for any configuration adjustments.

Finally, the scripting library (AngleSharp.Scripting.JavaScript) has been updated - supporting the latest version of the core AngleSharp library and bringing some fixes and improvements. Together with some changes in the core library the experience should be much more complete right now, but there is still some way to go.

The samples, demo projects, and the AngleSharp.Io extension library have been updated accordingly. They all work seamlessly together and will be unified with the release of v1.0.0, planned later this year.

Thanks to everyone for their contribution!

AngleSharp v0.9.5 released

Florian Rappl • Mar 17, 2016 • news, release

It has been quite a while since the last release, but better late than never: AngleSharp v0.9.5 is now available. There have been some bug fixes and improvements. Most importantly (and thanks to Jeremy Meng from Microsoft after discussions with the core Nancy developers) we now support the CoreCLR (dotnet) target via NuGet.

The focus of v0.9.5 was to fix critical bugs, improve parts of the API, and rewrite the internal CSSOM representation. The internal CSSOM changes are not complete yet, but sufficient for v0.9.5 to be released. The critical bugs involved mostly encoding related issues.

Previously, a virtual file system for AngleSharp was announced. This feature was essentially abandoned, as it would be more appropriate in AngleSharp.Io. AngleSharp provides everything to keep track of requests / responses. The internal system has been adjusted and rewritten to cover all cases.

AngleSharp v0.9.5 comes with a set of new helpers. Some of these helpers are CSSOM related, others are there to distinguish between the different types of stylesheets. There are also helpers that are similar to special jQuery filters (some of which had no LINQ counterpart so far).

The development focus in the next couple of weeks will of course be on v0.9.6. However, the extension libraries will be adjusted to v0.9.5 first. AngleSharp.Scripting.JavaScript especially did not get the (dev) attention it requires and deserves. This will be addressed first.

Thanks to everyone for their contribution!

AngleSharp v0.9.4 released

Florian Rappl • Dec 31, 2015 • news, release

Finally, after weeks of delays and many discussions, AngleSharp v0.9.4 is available. There have been some bug fixes and improvements. Most importantly, these are encoding and insertion pointer fixes.

However, AngleSharp v0.9.4 is in fact more than just a minor release. It could be considered a bridge release, as many internal things have been changed for the better. The upcoming version(s) will continue to walk the path enabled by v0.9.4.

Features such as the behavior of OpenAsync (delayed until embedded resources finished loading) or the virtual file system are already partially available. The CSSOM will also see more updates and will be enhanced with further helpers to modify the OM with objects instead of raw strings.

On the API side a lot of new things will come up. Many internal concepts will be made public and the parsing can be expected to become even more flexible.

Last but not least, the performance will be improved. While HTML performance is already quite decent (but could be improved in some scenarios), CSS has plenty of room for improvement.

AngleSharp v0.9.3 released

Florian Rappl • Oct 12, 2015 • news, release

AngleSharp v0.9.3 is another round of minor updates. Besides a few bug fixes the CompareDocumentPosition method has been improved. It now passes all tests and works reliably.

The most interesting new feature is the ability to define custom handling of entities. This can be done via the IEntityService. The GetSymbol method is usually called with an entity like gt for XML or gt; for HTML. The difference between XML and HTML lies in the way HTML handles entity errors: HTML has the possibility to use non-semicolon-terminated entities.

The simplest way would be to use, e.g., XmlEntityService.Resolver, in a custom implementation. That way the common entities would be resolved by the already available service.
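The difference between the two lookups, and the idea of a custom resolver that falls back to an existing one, can be sketched in plain JavaScript. This is not the IEntityService API: the function names, the tiny entity table, and the short legacy list are assumptions chosen purely for illustration.

```javascript
// Tiny illustrative table; real entity tables are far larger.
const xmlEntities = new Map([
  ['amp', '&'], ['lt', '<'], ['gt', '>'], ['quot', '"'], ['apos', "'"],
]);

// XML-style lookup: the name arrives without the trailing semicolon.
function getXmlSymbol(name) {
  return xmlEntities.has(name) ? xmlEntities.get(name) : null;
}

// HTML-style lookup: the name usually arrives as "gt;", but a few legacy
// entities are also accepted without the semicolon.
const legacyWithoutSemicolon = new Set(['amp', 'lt', 'gt', 'quot']);

function getHtmlSymbol(name) {
  if (name.endsWith(';')) return getXmlSymbol(name.slice(0, -1));
  return legacyWithoutSemicolon.has(name) ? getXmlSymbol(name) : null;
}

// Custom resolver: check its own entities first, then use the fallback.
function withCustomEntities(custom, fallback) {
  return (name) => (custom.has(name) ? custom.get(name) : fallback(name));
}
```

A custom resolver built with withCustomEntities(new Map([['smile', '\u263A']]), getXmlSymbol) would then resolve smile itself and hand everything else, such as gt, to the existing lookup.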

The next release will most probably be a feature release again. Hopefully features such as the CSSOM improvements, factory extensions, or a virtual file system will be integrated.

AngleSharp IO

Florian Rappl • Oct 4, 2015 • news, project

In the last few days one of the remaining projects was officially launched: AngleSharp.Io. This library will provide many essential IO classes, helper methods, and DOM interfaces. Most importantly it will bring new / improved requesters, such as a much better HTTP/HTTPS requester built on top of the HttpClient class. As a consequence this library will unfortunately not be released as a PCL. In the long run more requesters will be integrated.

AngleSharp.Io is also the library that will finally offer a WebSocket implementation. Also the Storage interface will be made available, which can then be instantiated as localStorage or sessionStorage. Under the hood the library should also be able to handle caching of resources.

Right now the work has just begun. The first work items only focus on the requester side, with a few interesting items, e.g., WebSocket, being on the way. As with the (JavaScript) scripting library, the exact roadmap is unclear at this point in time. However, we will try to come up with one in the next couple of months.

AngleSharp v0.9.2 released

Florian Rappl • Sep 24, 2015 • news, release

This week’s minor update was only a small patch that fixed a bug in the tokenizer and improved the XML parser’s performance. It also features the brand-new application/json encoding type for form submission. The form submission process internals have been redesigned to be much easier to extend and use. The FormDataSet and FormDataSetEntry classes are now public. This forms the basis for sending forms without requiring a webpage or valid <form> element at all.
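To give a feeling for what an application/json form body could look like, here is a deliberately simplified JavaScript sketch. The toJsonBody function and the entry shape ({ name, value }) are assumptions for illustration; the sketch ignores nested names, files, and the other rules a full implementation has to follow.

```javascript
// Simplified sketch: serialize a list of form entries as a JSON object.
// Repeated names are collected into an array.
function toJsonBody(entries) {
  const result = Object.create(null); // no inherited keys to collide with
  for (const { name, value } of entries) {
    if (!(name in result)) {
      result[name] = value;
    } else if (Array.isArray(result[name])) {
      result[name].push(value);
    } else {
      result[name] = [result[name], value];
    }
  }
  return JSON.stringify(result);
}

const body = toJsonBody([
  { name: 'user', value: 'anna' },
  { name: 'tag', value: 'a' },
  { name: 'tag', value: 'b' },
]);
// body is '{"user":"anna","tag":["a","b"]}'
```

The point is that the entries can be assembled freely in code, so no webpage or <form> element is needed to produce the body.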

The renewed CSSOM is not part of v0.9.2. Also the previously mentioned update to the service model and the event aggregator change could not make it. They will (at least partially, hopefully) find their way into v0.9.3 or some update afterwards.

Up to this point more and more updates have been done via feature branches. From now on this will be the only model. Every new feature or update has to be merged to devel from a specific feature branch, which is associated with an issue. The idea is to make the development as transparent and open as possible. This should also boost future contributions, discussions, and general user engagement.

AngleSharp v0.9.1 released

Florian Rappl • Sep 16, 2015 • news, release

A week ago the first patch, AngleSharp v0.9.1, was released. Besides fixing some issues, the event loop model has been reworked. This will not be the last update to this mechanism. The next update v0.9.2 will focus on closing some existing issues, such as the proposed CssNode.

One of the most important additions to AngleSharp v0.9.1 is the ability to filter (http, data, …) requests. The standard requester service has been extended to provide the ability for this integration. It is therefore possible to stop unwanted requests directly without having to provide a custom IRequester implementation.
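Conceptually, such a filter is just a predicate that the requester consults before doing any work. The following JavaScript sketch shows the idea; createSchemeFilter, send, and the request shape ({ address }) are hypothetical names and do not reflect the actual requester API.

```javascript
// Hypothetical filter factory: the returned predicate answers true when a
// request may pass, based on the scheme of its address.
function createSchemeFilter(allowedSchemes) {
  const allowed = new Set(allowedSchemes.map((s) => s.toLowerCase()));
  return (request) => {
    const scheme = new URL(request.address).protocol.replace(/:$/, '');
    return allowed.has(scheme);
  };
}

// Hypothetical send step: blocked requests never reach the requester.
function send(request, filter, requester) {
  return filter(request) ? requester(request) : null;
}
```

With const onlyHttp = createSchemeFilter(['http', 'https']), a request for a data: URL is stopped before any requester sees it, which is the benefit described above: no custom requester implementation is needed just to block something.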

Finally, a remark on the upcoming releases: the service model will definitely change before v1. Right now there are too many interfaces and layers required to extend AngleSharp. Also the whole approach won’t scale well once loosely coupled abilities, such as drawing or performance capturing, arrive. I believe that methods may be happier if they can just send a message, which may or may not find a listener. These messages will be very loosely coupled.
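The "send a message that may or may not find a listener" idea can be sketched with a very small message hub. This JavaScript snippet is only a thought experiment; the MessageHub class and its methods are made up and do not describe a planned AngleSharp API.

```javascript
// Hypothetical message hub: senders do not know who listens, if anyone.
class MessageHub {
  constructor() {
    this.listeners = new Map();
  }

  // Register a listener; the returned function removes it again.
  subscribe(type, listener) {
    if (!this.listeners.has(type)) this.listeners.set(type, new Set());
    this.listeners.get(type).add(listener);
    return () => this.listeners.get(type).delete(listener);
  }

  // Deliver a message; returns how many listeners received it (maybe 0).
  publish(type, payload) {
    const set = this.listeners.get(type);
    if (!set) return 0;
    for (const listener of set) listener(payload);
    return set.size;
  }
}
```

A parser could publish a message such as 'parse-error' without caring whether a logger, a performance tracker, or nothing at all is subscribed.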

At the moment I don’t know if such a potential change will also affect the event aggregator, but only using DOM events may actually be a good thing. In the end the BrowsingContext itself may end up as an EventTarget, which would bring consistency across the entire library.

AngleSharp Documentation

Florian Rappl • Aug 29, 2015 • information, documentation

An important aspect that is still missing (on the new homepage) is documentation. There should be two directly linked pages: “Get Started” and “Documentation”. Right now the documentation can only be found in the Wiki, with a little code being displayed in the AngleSharp/AngleSharp repository. My plan is to keep both, but to update / sync them from a larger (more dedicated) source.

I am actively working on a documentation system that allows deployment in several formats: HTML, Markdown, and LaTeX. The system does not simply convert content from one markup language into others that are directly or indirectly bound to output devices; rather, it allows very structured and programmatic access to the content. It has LaTeX-like features (more lightweight, of course) in an RST-like syntax. The syntax has not yet stabilized, which is why I am using XML (parsed with AngleSharp, of course) for tests. Ultimately, the idea is to have one AngleSharp/Documentation repository containing the documentation code, which is then built and placed (probably automatically) in the Wiki and on the page. A PDF (kind of like a book) will also be produced from the documentation.

One of the most important features of this documentation system will be the ability to embed code from external files. These files may be given by URLs. The idea is to place the snippets shown in the documentation in unit tests. The unit tests are run in their respective repository (tests of core functionality in AngleSharp.Core, tests of scripting functionality in AngleSharp.Scripting). That ensures that the code shown in the documentation is always up to date and compilable. Of course one can specify certain lines, ranges or delimiters to limit the code shown in the documentation.
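Selecting part of an external file by line range or by delimiters is simple to sketch. The JavaScript below is only an illustration of the idea, not the documentation system itself; the function names and the marker strings in the usage note are assumptions.

```javascript
// Select an inclusive, 1-based range of lines from a source text.
function selectLines(source, first, last) {
  return source.split('\n').slice(first - 1, last).join('\n');
}

// Select the lines strictly between two marker lines, or null if either
// marker is missing.
function selectBetween(source, startMarker, endMarker) {
  const lines = source.split('\n');
  const start = lines.findIndex((line) => line.includes(startMarker));
  const end = lines.findIndex(
    (line, index) => index > start && line.includes(endMarker));
  if (start === -1 || end === -1) return null;
  return lines.slice(start + 1, end).join('\n');
}
```

A unit test file could then wrap the interesting part in comments such as // snippet:start and // snippet:end, and the documentation would embed only the lines in between.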

The only remaining question is: When will this documentation system be available? Well, I can’t give an exact date, but it has to be prior to v1.0. I hope to finish the system’s software at the end of October and to finish the (full!) documentation itself at the end of November.

AngleSharp v0.9 released

Florian Rappl • Aug 27, 2015 • news, release

Yesterday the latest version of AngleSharp was released. This release marks the v0.9 milestone. Besides providing skeleton implementations for, e.g., the recent shadow DOM API draft, the picture element, etc., this version fixes some bugs that may appear in conjunction with using scripts.

Scripts are one of the success stories for AngleSharp. They make this library so useful. Therefore the AngleSharp.Scripting project is moving forward as well. The AngleSharp.Scripting.JavaScript library was released yesterday as version 0.3. Here we will now try to align with the versioning of AngleSharp.Core.

AngleSharp.Core will definitely be split up. The library is already too huge and contains too many features to be considered lightweight. Let’s have a look at the SLOC (taken two months ago) of the DOM part alone:

[Figure: AngleSharp Core DOM SLOC Distribution]

From this picture alone we can already estimate that splitting the library could be beneficial. Roughly one third of the SLOC each is spent on general, HTML, and CSS functionality. The splitting could therefore result in three or four parts:

  • AngleSharp.Core.Common, containing the basic infrastructure and definitions [no dependency]
  • AngleSharp.Core.Html, containing the HTML parser and DOM implementation [depending on Common]
  • AngleSharp.Core.Css, containing the CSS parser and CSSOM implementation [depending on Common]
  • AngleSharp.Core.Complete, aggregating the Core and providing further helpers [depending on the former three]

Experiments with a proper dissection will begin soon. Also the renderer part will then be discussed. Plans have already been made and it seems likely that a renderer will be published within this year (experimental stage). Here a new project, AngleSharp.Renderer, will be opened. The renderer itself will consist of many libraries, specifically to make the common renderer infrastructure a PCL again, with specific platform libraries that contain the actual drawing code.