[Hexo Series] How Hexo Renders Posts

1. Introduction

When using Hexo as a blog system, we write posts as Markdown files and extend them with Hexo's tag syntax. This article looks at how that is implemented.

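2. How rendering works

In Hexo, rendering the content area of a post involves the following steps:

1. The user writes a post in Markdown and saves it under the _posts directory.
2. Hexo reads the Markdown files under _posts and converts them to HTML. By default, the hexo-renderer-marked plugin performs the Markdown-to-HTML conversion.
3. During conversion, Markdown syntax is turned into HTML tags. The result is then placed into the theme's layout templates. The theme also supplies the CSS that styles the blog's pages.
4. When the HTML page is rendered, the post content is typically placed inside an article element. article is a semantic element introduced in HTML5 that marks up self-contained content such as the body of a post.
5. Hexo writes the generated HTML files to the public directory. These files are the final static pages.
6. When a visitor opens the blog, a web server sends the generated HTML to the browser (hexo server plays this role during local preview). On a deployed site Hexo itself is not involved at request time. The browser parses the HTML, applies the CSS, and displays the page.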

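In short, Hexo renders the content area of a post by converting Markdown to HTML, combining it with the theme's layout and styles, and producing static pages. Hexo's theme mechanism is very flexible, so users can customize styles and layouts to fit their needs.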
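To make the last two steps more concrete, here is a minimal sketch of wrapping already-rendered post HTML in a layout and working out where the page would land under public. The names renderLayout and outputPath and the markup are hypothetical, invented for this illustration. A real Hexo theme uses template files rather than a string template.

```javascript
const path = require('node:path');

// Hypothetical layout function: a real theme would use a template file instead.
function renderLayout(post) {
  return [
    '<!DOCTYPE html>',
    '<html>',
    `<head><title>${post.title}</title><link rel="stylesheet" href="/css/style.css"></head>`,
    '<body>',
    `<article>${post.content}</article>`,
    '</body>',
    '</html>'
  ].join('\n');
}

// Hypothetical helper: map a post's URL path to a file under the public directory.
function outputPath(publicDir, urlPath) {
  return path.posix.join(publicDir, urlPath, 'index.html');
}

const post = {
  title: 'Hello',
  path: '2023/03/28/hello/',
  content: '<h1 id="hello">hello</h1>' // HTML already produced by the Markdown renderer
};

const page = renderLayout(post);
const target = outputPath('public', post.path);
console.log(target);
console.log(page);
```

The point is only the division of labor: the Markdown renderer produces the inner HTML, and the layout decides what surrounds it.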

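3. How hexo-renderer-marked works

hexo-renderer-marked is a Hexo plugin that converts posts written in Markdown into HTML. It works as follows:

1. When a Markdown post is saved in the blog's source directory, Hexo (in watch mode) detects the new or changed file.
2. Hexo reads the Markdown file into memory and hands it to hexo-renderer-marked, which uses the marked library to convert Markdown to HTML.
3. marked first tokenizes the Markdown text into a tree of tokens, which plays the role of an abstract syntax tree (AST). Each token represents one syntactic structure, such as a paragraph, heading, list, or code block.
4. The tokens are then walked and turned into HTML by a renderer, which hexo-renderer-marked customizes. Hexo tag syntax such as {% asset_img %} and {% blockquote %} is not Markdown and is not handled by marked. Hexo processes it separately through its Nunjucks-based tag system, covered in the section on tags below.
5. Finally, hexo-renderer-marked returns the generated HTML string to Hexo. Hexo combines it with the theme layout and writes the resulting HTML file to the public directory.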

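In short, hexo-renderer-marked turns Markdown into HTML. It relies on marked to tokenize the Markdown and on a customized renderer (plus a customized tokenizer) to turn those tokens into HTML. This is the step that converts the Markdown we write into the HTML that ends up in the static pages.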
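The tokenize-then-render flow described above can be sketched with a toy implementation. This is not marked's code: lex, parse, ToyRenderer, and AnchorRenderer are hypothetical names, and only headings and paragraphs are recognized. It only shows why a renderer with one method per token type makes the output easy to customize.

```javascript
// Toy lexer: turns Markdown source into a flat list of block tokens.
// Only ATX headings ("# title") and plain paragraphs are recognized.
function lex(src) {
  const tokens = [];
  for (const block of src.split(/\n{2,}/)) {
    const text = block.trim();
    if (!text) continue;
    const heading = /^(#{1,6})\s+(.*)$/.exec(text);
    if (heading) {
      tokens.push({ type: 'heading', depth: heading[1].length, text: heading[2] });
    } else {
      tokens.push({ type: 'paragraph', text });
    }
  }
  return tokens;
}

// Toy renderer: one method per token type, so subclasses can override output.
class ToyRenderer {
  heading(text, level) { return `<h${level}>${text}</h${level}>\n`; }
  paragraph(text) { return `<p>${text}</p>\n`; }
}

// Toy parser: walks the tokens and asks the renderer for HTML.
function parse(tokens, renderer) {
  return tokens.map(token => {
    if (token.type === 'heading') return renderer.heading(token.text, token.depth);
    return renderer.paragraph(token.text);
  }).join('');
}

// Overriding a single method changes how every heading is emitted.
class AnchorRenderer extends ToyRenderer {
  heading(text, level) {
    const id = text.toLowerCase().replace(/[^\w]+/g, '-');
    return `<h${level} id="${id}">${text}</h${level}>\n`;
  }
}

const tokens = lex('# Hello World\n\nsome text');
const html = parse(tokens, new AnchorRenderer());
console.log(tokens);
console.log(html);
```

Swapping new AnchorRenderer() for new ToyRenderer() gives headings without ids, while the lexer and parser stay untouched.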

3.1 hexo-renderer-marked code

Code path: /lib/renderer.js. Simplified code:

const { marked } = require('marked');

marked.setOptions({
  langPrefix: ''
});
const extensions = [];
marked.use({extensions});
marked(text, Object.assign({
  renderer,
  tokenizer
}, markedCfg, options, { postPath }))

3.2 Running a marked example

Start with the simplest usage:

const {marked} = require('marked')

let res = marked('# hello');

console.log(res);

Running it prints:

<h1 id="hello">hello</h1>

3.3 renderer

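We also saw that a renderer is passed in. According to the documentation (https://marked.js.org/using_pro#renderer), a renderer customizes how specific Markdown elements are turned into HTML.

Demo: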

// Create reference instance
import { marked } from 'marked';

// Override function
const renderer = {
  heading(text, level) {
    const escapedText = text.toLowerCase().replace(/[^\w]+/g, '-');

    return `
            <h${level}>
              <a name="${escapedText}" class="anchor" href="#${escapedText}">
                <span class="header-link"></span>
              </a>
              ${text}
            </h${level}>`;
  }
};

marked.use({ renderer });

// Run marked
console.log(marked.parse('# heading+'));

Result:

<h1>
  <a name="heading-" class="anchor" href="#heading-">
    <span class="header-link"></span>
  </a>
  heading+
</h1>

hexo-renderer-marked uses this mechanism:

const MarkedRenderer = marked.Renderer;

class Renderer extends MarkedRenderer {
    constructor(hexo) {...}
    heading(text, level) {...}
    link(href, title, text){...}
    paragraph(text) {...}
    image(href, title, text) {...}
}

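3.4 tokenizer

marked documentation: https://marked.js.org/using_pro#tokenizer

The tokenizer decides how Markdown text is split into tokens. hexo-renderer-marked overrides two of its methods:

const MarkedTokenizer = marked.Tokenizer;

class Tokenizer extends MarkedTokenizer {
    url(src, mangle) {...}
    inlineText(src) {...}
}

3.5 dompurify

hexo-renderer-marked also contains this code: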

if (dompurify) {
  if (createDOMPurify === undefined && JSDOM === undefined) {
    createDOMPurify = require('dompurify');
    JSDOM = require('jsdom').JSDOM;
  }
  const window = new JSDOM('').window;
  const DOMPurify = createDOMPurify(window);
  let param = {};
  if (dompurify !== true) {
    param = dompurify;
  }
  sanitizer = function(html) { return DOMPurify.sanitize(html, param); };
}
return sanitizer(marked(text, Object.assign({
  renderer,
  tokenizer
}, markedCfg, options, { postPath })));

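By default, the dompurify option in the hexo-renderer-marked configuration is false.

DOMPurify is a JavaScript library that sanitizes HTML to prevent cross-site scripting (XSS) attacks. It uses an allowlist: only tags and attributes known to be safe are kept, and everything else is removed, so the resulting HTML cannot carry malicious code or scripts. It is typically used wherever user-submitted HTML has to be rendered, such as blog comments or online editors. Because DOMPurify needs a DOM to work with, the code above pairs it with jsdom when running in Node.js.

npm page: https://www.npmjs.com/package/dompurify

Since we write the blog posts ourselves, we do not need it. If the content were submitted by users, this kind of sanitization would be necessary.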

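4. Post content

As covered earlier in this series, Hexo watches files for changes. It watches two things, the source directory and the theme. The code is in lib/hexo/index.js: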

watch(callback){
    ...
    return loadDatabase(this).then(() => {
      this.log.info('Start processing');

      return Promise.all([
        this.source.watch(),
        this.theme.watch()
      ]);
    }).then(() => {
    });
}

When a file changes, the HTML is regenerated. The watch code for source is in lib/box/index.js:

watch(callback) {
    if (this.isWatching()) {
        return Promise.reject(new Error('Watcher has already started.')).asCallback(callback);
    }

    const { base } = this;

    function getPath(path) {
        return escapeBackslash(path.substring(base.length));
    }

    return this.process().then(() => watch(base, this.options)).then(watcher => {
        this.watcher = watcher;

        watcher.on('add', path => {
            this._processFile(File.TYPE_CREATE, getPath(path));
        });

        watcher.on('change', path => {
            this._processFile(File.TYPE_UPDATE, getPath(path));
        });

        watcher.on('unlink', path => {
            this._processFile(File.TYPE_DELETE, getPath(path));
        });

        watcher.on('addDir', path => {
            let prefix = getPath(path);
            if (prefix) prefix += '/';

            this._readDir(path, prefix);
        });
    }).asCallback(callback);
}

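As we can see, when a file is created, the following is called:

this._processFile(File.TYPE_CREATE, getPath(path));

The processed data ends up in the database file, and a processAfter event is emitted.

Hexo itself listens for this processAfter event, so the _generate method gets called. The code is as follows: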

watch(callback) {
    ....
    this._watchBox = debounce(() => this._generate({ cache: useCache }), 100);
    this.source.on('processAfter', this._watchBox);
}
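The effect of wrapping _generate in debounce can be shown with a small stand-alone sketch. The debounce helper below is a minimal hand-written one, not the implementation Hexo imports, and source and generate are stand-ins for hexo.source and hexo._generate.

```javascript
const { EventEmitter } = require('node:events');

// Minimal debounce: run fn once, `wait` ms after the most recent call.
function debounce(fn, wait) {
  let timer = null;
  return (...args) => {
    clearTimeout(timer);
    timer = setTimeout(() => fn(...args), wait);
  };
}

const source = new EventEmitter(); // stand-in for hexo.source
let generateCalls = 0;

// Stand-in for hexo._generate().
function generate() {
  generateCalls += 1;
  console.log(`generate() run #${generateCalls}`);
}

source.on('processAfter', debounce(generate, 100));

// Three files are processed in quick succession...
source.emit('processAfter');
source.emit('processAfter');
source.emit('processAfter');
// ...but generate() runs only once, about 100 ms after the last event.
```

Saving several files at once, or an editor that writes a file in multiple steps, therefore triggers one regeneration instead of one per event.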

_generate then runs:

_generate(options = {}) {
    ...
    return this.execFilter('before_generate', this.locals.get('data'), { context: this })
      .then(() => this._routerReflesh(this._runGenerators(), useCache)).then(() => {
        this.emit('generateAfter');

        // Run after_generate filters
        return this.execFilter('after_generate', null, { context: this });
      }).finally(() => {
        this._isGenerating = false;
      });
}

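Inside _runGenerators, the registered generator functions are called.

The post data that these generators work with was produced earlier by a processor, which is invoked from _processFile. The processor for posts is lib/plugins/processor/post.js: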

return Promise.all([
  file.stat(),
  file.read()
]).spread((stats, content) => {
  const data = yfm(content);
  const info = parseFilename(config.new_post_name, path);
  ....
});

A post consists of two parts: the front matter (metadata) at the top and the post content below it. Splitting and parsing the two is done in hexo-front-matter/lib/front_matter.js:

const rPrefixSep = /^(-{3,}|;{3,})/;
const rFrontMatter = /^(-{3,}|;{3,})\n([\s\S]+?)\n\1\n?([\s\S]*)/;
const rFrontMatterNew = /^([\s\S]+?)\n(-{3,}|;{3,})\n?([\s\S]*)/;

function split(str) {
  if (typeof str !== 'string') throw new TypeError('str is required!');

  const matchOld = str.match(rFrontMatter);
  if (matchOld) {
    return {
      data: matchOld[2],
      content: matchOld[3] || '',
      separator: matchOld[1],
      prefixSeparator: true
    };
  }

  if (rPrefixSep.test(str)) return { content: str };

  const matchNew = str.match(rFrontMatterNew);

  if (matchNew) {
    return {
      data: matchNew[1],
      content: matchNew[3] || '',
      separator: matchNew[2],
      prefixSeparator: false
    };
  }

  return { content: str };
}

function parse(str, options) {
  if (typeof str !== 'string') throw new TypeError('str is required!');

  const splitData = split(str);
  const raw = splitData.data;

  if (!raw) return { _content: str };

  let data;

  if (splitData.separator.startsWith(';')) {
    data = parseJSON(raw);
  } else {
    data = parseYAML(raw, options);
  }

  if (!data) return { _content: str };

  // Convert timezone
  Object.keys(data).forEach(key => {
    const item = data[key];

    if (item instanceof Date) {
      data[key] = new Date(item.getTime() + (item.getTimezoneOffset() * 60 * 1000));
    }
  });

  data._content = splitData.content;
  return data;
}
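As a quick check of how the first regular expression carves up a post, the sketch below applies the same rFrontMatter pattern to a sample file. The sample post is made up, and parseFlatYaml is a hypothetical toy that only understands flat "key: value" lines. The real library hands the data block to a YAML or JSON parser.

```javascript
// Same pattern as rFrontMatter above:
// opening separator, data block, matching closing separator, then the content.
const rFrontMatter = /^(-{3,}|;{3,})\n([\s\S]+?)\n\1\n?([\s\S]*)/;

// Toy stand-in for a YAML parser: handles only flat "key: value" lines.
function parseFlatYaml(raw) {
  const data = {};
  for (const line of raw.split('\n')) {
    const idx = line.indexOf(':');
    if (idx === -1) continue;
    data[line.slice(0, idx).trim()] = line.slice(idx + 1).trim();
  }
  return data;
}

const source = [
  '---',
  'title: Hello Hexo',
  'tags: demo',
  '---',
  '# hello',
  ''
].join('\n');

const match = source.match(rFrontMatter);
const post = parseFlatYaml(match[2]); // match[2] is the metadata block
post._content = match[3];             // match[3] is everything after the separator
console.log(match[1]);
console.log(post);
```

The captured separator (match[1]) is what parse later inspects: a separator starting with ";" selects JSON, anything else selects YAML.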

The _routerReflesh method:

_routerReflesh(runningGenerators, useCache) {
  ...
  return this.execFilter('template_locals', new Locals(path, data), { context: this })
      .then(locals => { route.set(path, createLoadThemeRoute(generatorResult, locals, this)); })
      .thenReturn(path);
  ...
}

The corresponding createLoadThemeRoute is shown below:

const createLoadThemeRoute = function(generatorResult, locals, ctx) {
  const { log, theme } = ctx;
  const { path, cache: useCache } = locals;

  const layout = [...new Set(castArray(generatorResult.layout))];
  const layoutLength = layout.length;

  // always use cache in fragment_cache
  locals.cache = true;
  return () => {
    if (useCache && routeCache.has(generatorResult)) return routeCache.get(generatorResult);

    for (let i = 0; i < layoutLength; i++) {
      const name = layout[i];
      const view = theme.getView(name);

      if (view) {
        log.debug(`Rendering HTML ${name}: ${magenta(path)}`);
        return view.render(locals)
          .then(result => ctx.extend.injector.exec(result, locals))
          .then(result => ctx.execFilter('_after_html_render', result, {
            context: ctx,
            args: [locals]
          }))
          .tap(result => {
            if (useCache) {
              routeCache.set(generatorResult, result);
            }
          }).tapCatch(err => {
            log.error({ err }, `Render HTML failed: ${magenta(path)}`);
          });
      }
    }

    log.warn(`No layout: ${magenta(path)}`);
  };
};
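Two ideas in createLoadThemeRoute are easy to miss: the route is a function that renders lazily, and the first layout that actually exists in the theme wins. The simplified, synchronous sketch below isolates them. The views map, createRoute, and renderCount are hypothetical, the views are plain functions instead of templates, and the promise chain, injector, and filters of the real code are left out.

```javascript
// Hypothetical stand-ins: `views` plays the role of theme.getView(),
// and each view is a plain function instead of a template.
const views = new Map([
  ['post', locals => `<article>${locals.content}</article>`]
]);
const routeCache = new WeakMap();
let renderCount = 0;

function createRoute(generatorResult, locals) {
  // Accept a single layout name or a list, and drop duplicates.
  const layouts = [...new Set([].concat(generatorResult.layout))];

  // The returned function is lazy: nothing is rendered until it is called.
  return () => {
    if (routeCache.has(generatorResult)) return routeCache.get(generatorResult);

    for (const name of layouts) {
      const view = views.get(name);
      if (!view) continue; // this layout is missing, try the next candidate

      renderCount += 1;
      const html = view(locals);
      routeCache.set(generatorResult, html);
      return html;
    }
    return undefined; // no layout matched
  };
}

const result = { path: 'hello/', layout: ['custom', 'post', 'index'] };
const route = createRoute(result, { content: '<p>hi</p>' });

console.log(route()); // "custom" does not exist, so "post" is used
console.log(route()); // second call is answered from the cache
console.log(renderCount);
```

Because the cache is keyed by the generator result object, a fresh result from the next generate run renders again, while repeated requests for the same result reuse the HTML.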

After route.set() is called, an update event is emitted.

When generation runs in watch mode (npm run start in this project), the following code in lib/plugins/console/generate.js is executed:

execWatch() {
    const { route, log } = this.context;
    return this.context.watch().then(() => this.firstGenerate()).then(() => {
        log.info('Hexo is watching for file changes. Press Ctrl+C to exit.');

        // Watch changes of the route
        route.on('update', path => {
            const modified = route.isModified(path);
            if (!modified) return;

            this.generateFile(path);
        }).on('remove', path => {
            this.deleteFile(path);
        });
    });
}

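generateFile is then called here. It effectively invokes the route function created earlier, which calls view.render, and then writes the result to a file.

For a plain build, that is hexo generate, firstGenerate does the work: it generates all routes once, checking the cache internally to decide whether each file actually needs to be written.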
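The update-then-write flow can be sketched with Node's EventEmitter. ToyRouter is a hypothetical stand-in for Hexo's router, and the written array stands in for files on disk. In particular, deciding "modified" by comparing content is an assumption made for this sketch, not a description of how Hexo tracks it.

```javascript
const { EventEmitter } = require('node:events');

// Toy router: remembers the content per path and emits "update" on set().
class ToyRouter extends EventEmitter {
  constructor() {
    super();
    this.routes = new Map();
  }

  set(path, content) {
    const previous = this.routes.get(path);
    // Assumption for this sketch: a route is "modified" when its content changed.
    const modified = previous === undefined || previous.content !== content;
    this.routes.set(path, { content, modified });
    this.emit('update', path);
  }

  isModified(path) {
    const entry = this.routes.get(path);
    return entry ? entry.modified : false;
  }

  get(path) {
    return this.routes.get(path).content;
  }
}

const route = new ToyRouter();
const written = []; // stand-in for files written to public/

route.on('update', path => {
  if (!route.isModified(path)) return; // unchanged output: skip the write
  written.push(`${path} => ${route.get(path)}`);
});

route.set('index.html', '<h1>v1</h1>'); // new route: written
route.set('index.html', '<h1>v1</h1>'); // same content: skipped
route.set('index.html', '<h1>v2</h1>'); // changed content: written again
console.log(written);
```

The listener never needs to know who changed the route or why. It only reacts to the event, which is the same decoupling the watch code relies on.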

5. How tags take effect

In our code we register custom tags:

hexo.extend.tag.register('tabs', require('./lib/tabs')(hexo), true)
hexo.extend.tag.register('ablock', require('./lib/ablock')(hexo), true)
hexo.extend.tag.register('about', require('./lib/about')(hexo), true)
hexo.extend.tag.register('folding', require('./lib/folding')(hexo), true)
hexo.extend.tag.register('folders', require('./lib/folders')(hexo), true)
hexo.extend.tag.register('grid', require('./lib/grid')(hexo), true)
hexo.extend.tag.register('swiper', require('./lib/swiper')(hexo), true)

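The third argument, true, marks each of these as a block tag that needs a closing end tag. So how do these tags take effect?

They are implemented on top of Nunjucks. The code is in hexo/lib/extend/tag.js: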

const { Environment } = require('nunjucks');

this.env = new Environment(null, {
    autoescape: false
});

this.env.addExtension(name, tag);

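I still need to dig into where exactly this gets hooked in. So far I have only seen that the output of view.render(locals) already has the tags processed, so there must be another layer somewhere that runs the Nunjucks step.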
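Conceptually, a tag system is a registry of named functions plus a pass over the text that replaces each tag with that function's output. The sketch below shows the idea with a regular expression. It is not how Hexo does it (Hexo delegates parsing to Nunjucks extensions, as shown above). ToyTagRegistry is a hypothetical name, and block tags with an end tag are not handled.

```javascript
// Toy registry: maps a tag name to a function (args) => HTML.
class ToyTagRegistry {
  constructor() {
    this.tags = new Map();
  }

  register(name, fn) {
    this.tags.set(name, fn);
  }

  // Expand inline tags of the form {% name arg1 arg2 %}.
  // Unknown tags are left untouched.
  render(text) {
    return text.replace(/\{%\s*(\w+)([^%]*)%\}/g, (whole, name, rawArgs) => {
      const fn = this.tags.get(name);
      if (!fn) return whole;
      const args = rawArgs.trim().split(/\s+/).filter(Boolean);
      return fn(args);
    });
  }
}

const tag = new ToyTagRegistry();
tag.register('asset_img', ([src, ...title]) => `<img src="${src}" title="${title.join(' ')}">`);

const out = tag.render('<p>{% asset_img cat.png My cat %}</p>');
console.log(out);
```

A real template engine adds what this toy lacks: nested and block tags, proper argument parsing, and useful errors for malformed input.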

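6. Summary

This is essentially the core of Hexo: it works by watching for file changes and passing events from one component to the next through EventEmitter.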

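7. Hexo series

The Hexo posts now form a small series. Readers are welcome to keep reading: Hexo series

Original article by 金炳. If you repost it, please credit the source.

Link to this article: https://blog.fedfans.com/2023/03/28/hexo-markdown-render/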