Dublin Core in HTML pages

Dublin Core is sometimes inserted into the HTML header for search engine optimization purposes. I am very curious to know which search engines are being optimized for by including DC metadata in the HTML header. Google clearly states that it no longer uses the keywords meta tag. Some argue that Dublin Core tags are different from keywords and that Google might therefore still be using them. As far as I know, the specifics are a trade secret that Google hasn't made public. If anyone knows more about this, please let me know in the comments.
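For readers who have not seen it, here is roughly what Dublin Core in an HTML header looks like, along with a small Python sketch of how a crawler might pull it out. The sample page, the `DublinCoreExtractor` class, and the `extract_dublin_core` helper are all made up for illustration. This is not how any particular search engine does it, just a minimal sketch using the standard library.

```python
from html.parser import HTMLParser

# Hypothetical sample page: DC elements as <meta> tags, with a
# "keywords" tag included to show that it is not picked up.
SAMPLE_PAGE = """<!DOCTYPE html>
<html>
<head>
<title>Example page</title>
<link rel="schema.DC" href="http://purl.org/dc/elements/1.1/">
<meta name="DC.title" content="Example page">
<meta name="DC.creator" content="Jane Example">
<meta name="DC.date" content="2020-01-31">
<meta name="keywords" content="not, dublin, core">
</head>
<body><p>Hello</p></body>
</html>"""


class DublinCoreExtractor(HTMLParser):
    """Collects (name, content) pairs from DC-prefixed <meta> tags."""

    def __init__(self):
        super().__init__()
        self.records = []

    def handle_starttag(self, tag, attrs):
        if tag != "meta":
            return
        attributes = dict(attrs)
        name = attributes.get("name") or ""
        # Assumption: tags are prefixed "DC." or "DCTERMS." (any case).
        if name.lower().startswith(("dc.", "dcterms.")):
            self.records.append((name, attributes.get("content") or ""))


def extract_dublin_core(html_text):
    """Return the Dublin Core meta tags found in html_text, in order."""
    parser = DublinCoreExtractor()
    parser.feed(html_text)
    parser.close()
    return parser.records


if __name__ == "__main__":
    for name, content in extract_dublin_core(SAMPLE_PAGE):
        print(name, "=", content)
```

Running the sketch against your own pages is a quick way to check what a metadata-aware bot could actually see in the header.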

I do know that Google Scholar (scholar.google.com) runs via a different bot and crawl process and does use some DC tags for identification. They have a sub-dialect of tags and have added some non-standard (not true Dublin Core) tags to what they expect. How rude and presumptuous of Google... But Google Scholar is the only search engine I know of that looks for Dublin Core metadata in HTML. If anyone knows of another one, I'm very keen to hear about it.

Bing sunset their academic/scholar service. My understanding is that when it was running, a single bot crawled the data, and they then filtered that single crawl to create the academic materials product. This is a different approach from the one Google is taking.

Here are some interesting links on Dublin Core in the headers:


Is HTML5 a subset of SGML?

One of the lectures in my courses had me asking the following question: Is HTML5 a subset of SGML?

I did a little googling. The short answer is that HTML5 became an abstract language with two concrete syntaxes. One, the XML syntax (XHTML), is well-formed XML and hence SGML-conformant; the other, the HTML syntax, is not valid SGML. Maybe someday I will write up my own opinion on the topic. Here are the links I found most relevant:

  • https://www.w3.org/2008/Talks/04-24-smith/index.html
  • https://mathiasbynens.be/notes/xhtml5
  • https://html.spec.whatwg.org/#html-vs-xhtml
  • https://stackoverflow.com/questions/5558502/is-html5-valid-xml/39560454#39560454
  • https://stackoverflow.com/questions/8460993/p-end-tag-p-is-not-needed-in-html
  • https://stackoverflow.com/questions/1946426/html-5-is-it-br-br-or-br
  • https://stackoverflow.com/questions/5558502/is-html5-valid-xml
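
To make the "two syntaxes" point concrete, here is a small Python sketch. It only tests XML well-formedness with the standard library parser. It says nothing about SGML validity or HTML conformance, and the two snippets and the `is_well_formed_xml` helper are my own illustrations.

```python
import xml.etree.ElementTree as ET

# The same paragraph written in the two HTML5 syntaxes.
HTML_SYNTAX = "<p>First line<br>Second line</p>"   # void <br>, no closing slash
XML_SYNTAX = "<p>First line<br/>Second line</p>"   # self-closed, XHTML style


def is_well_formed_xml(markup):
    """Return True if an XML parser accepts the markup, False otherwise."""
    try:
        ET.fromstring(markup)
    except ET.ParseError:
        return False
    return True


if __name__ == "__main__":
    print("HTML syntax parses as XML:", is_well_formed_xml(HTML_SYNTAX))
    print("XML syntax parses as XML:", is_well_formed_xml(XML_SYNTAX))
```

An HTML parser accepts both snippets, while the XML parser rejects the first one because `<br>` is never closed.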

Write it once Share it twice

For some time I have been challenged by learning wiki markup. I learned HTML 4.0, then I took on XHTML 1.1, and the market keeps evolving. I help to maintain a few wiki pages on the digital archiving of language-based materials on my employer's intranet. It is way cool that we have a wiki, but I haven't written much there because I like to compose in WordPress (XHTML) and use the full-screen mode to block out distractions. Most of what I write comes from various internet sources. I feel a certain obligation to acknowledge those sources publicly if I am going to use their content privately too. Therefore, I prefer to share those things externally as well as internally. The result is that I usually post what I write to my personal blog before I post it to the company intranet. In the past I have had to rework the markup syntax whenever I moved things from WordPress to the wiki.

However, I recently found an HTML-to-wiki syntax converter: http://labs.seapine.com/htmltowiki.cgi. This tool allows me to compose in WordPress, convert to wiki syntax, and then repost to the corporate wiki.
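
To give a feel for what such a conversion involves, here is a minimal Python sketch. It is not the tool linked above, and I am assuming MediaWiki-style markup on the wiki side (`'''bold'''`, `''italic''`, `== heading ==`, `* item`, `[url text]`). The `HtmlToWiki` class and `html_to_wiki` function are hypothetical names, and only a handful of tags are handled.

```python
from html.parser import HTMLParser


class HtmlToWiki(HTMLParser):
    """Converts a small subset of HTML to MediaWiki-style markup (assumed)."""

    INLINE = {"b": "'''", "strong": "'''", "i": "''", "em": "''"}
    HEADINGS = ("h1", "h2", "h3")

    def __init__(self):
        super().__init__()
        self.parts = []

    def handle_starttag(self, tag, attrs):
        if tag in self.INLINE:
            self.parts.append(self.INLINE[tag])
        elif tag in self.HEADINGS:
            # <h2> becomes "== ", <h3> becomes "=== ", and so on.
            self.parts.append("=" * int(tag[1]) + " ")
        elif tag == "li":
            self.parts.append("* ")
        elif tag == "a":
            href = dict(attrs).get("href") or ""
            self.parts.append("[" + href + " ")

    def handle_endtag(self, tag):
        if tag in self.INLINE:
            self.parts.append(self.INLINE[tag])
        elif tag in self.HEADINGS:
            self.parts.append(" " + "=" * int(tag[1]) + "\n")
        elif tag == "li":
            self.parts.append("\n")
        elif tag == "p":
            self.parts.append("\n\n")
        elif tag == "a":
            self.parts.append("]")

    def handle_data(self, data):
        self.parts.append(data)


def html_to_wiki(html_text):
    """Return html_text rewritten in the assumed wiki markup."""
    converter = HtmlToWiki()
    converter.feed(html_text)
    converter.close()
    return "".join(converter.parts).strip()


if __name__ == "__main__":
    post = '<h2>Notes</h2><p>See <a href="http://example.org">this <b>page</b></a>.</p>'
    print(html_to_wiki(post))
```

A real converter has to deal with tables, nested lists, images, and whatever dialect the target wiki speaks, which is why I am happy to use an existing tool.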