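In the previous post we debugged FcConfigSubstitute and FcConfigReference and finally tracked things down to FcFileScanFontConfig. After some hacking we found that <match target="scan"> was being applied, yet it had no effect on the charset of "Noto Sans CJK SC" in the final FontSet.
In this post we keep debugging this call inside FcFileScanFontConfig:
FcConfigSubstitute (config, font, FcMatchScan)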
The Pointer That Leaves No Trace
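We know that font above is a pointer. For the output to differ before and after the call, the data it points to must have been modified. In other words, we only need to look inside FcConfigSubstitute for the parts that modify the pointed-to data, which means focusing on the case FcOpAssign branch under case FcRuleEdit, where the rule set is applied.
The if (value[object]) check is where test and edit connect: the test is used to locate the part that will be edited: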
/* different 'kind' won't be the target of edit */
if (!value[object] && kind == r->u.test->kind)
    value[object] = vl;
But everything else is FcConfigAdd and FcConfigDel, which modify the config and do not touch the font directly.
Just as I was at a loss, I happened to notice the following:
case FcOpAssignReplace:
    /*
     * Delete all of the values and insert
     * the new set
     */
    FcConfigPatternDel (p, r->u.edit->object, table);
    FcConfigPatternAdd (p, r->u.edit->object, l, FcTrue, table);
    /*
     * Adjust a pointer into the value list as they no
     * longer point to anything valid
     */
    value[object] = NULL;
    break;
This one modifies the pattern!
The Vague Documentation
You have surely seen this part of the documentation:
Mode                    With Match              Without Match
---------------------------------------------------------------------
"assign"                Replace matching value  Replace all values
"assign_replace"        Replace all values      Replace all values
"prepend"               Insert before matching  Insert at head of list
"prepend_first"         Insert at head of list  Insert at head of list
"append"                Append after matching   Append at end of list
"append_last"           Append at end of list   Append at end of list
"delete"                Delete matching value   Delete all values
"delete_all"            Delete all values       Delete all values
From this, assign and assign_replace look the same.
In practice they are very different!
I changed my config to:
<match target="scan">
  <test name="family">
    <string>Noto Sans CJK SC</string>
  </test>
  <edit name="charset" mode="assign_replace">
    <minus>
      <name>charset</name>
      <charset>
        <int>0x203c</int>
      </charset>
      <charset>
        <int>0x2122</int>
      </charset>
    </minus>
  </edit>
</match>
And ran it again:
Noto Sans CJK SC
before match scan: Yes
charset count: 44810
after match scan: Yes
charset count: 44809
The ‼ character was removed successfully!
So: for rules such as <match target="scan"> that need to modify the pattern, use assign_replace; for rules where modifying the config is enough, such as <match target="pattern|font">, use assign.
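For reference, the two "assign" rows of the documentation table can be modelled on a plain list of values. This is only a toy sketch of what the table says, not fontconfig's implementation: ToyValues, toy_assign and toy_assign_replace are hypothetical names, and "no match" is represented here by passing an index equal to the list length.

```c
#include <assert.h>
#include <stddef.h>

/* Toy value list: a small array of ints standing in for the values of one
 * pattern element. This is NOT fontconfig's FcValueList. */
#define TOY_MAX 8
typedef struct {
    int vals[TOY_MAX];
    size_t n;
} ToyValues;

/* "assign": with a match (matched < n) replace only the matched value;
 * without a match (matched == n) replace the whole list. */
static void toy_assign(ToyValues *v, size_t matched, int new_val)
{
    assert(matched <= v->n);
    if (matched < v->n) {
        v->vals[matched] = new_val;
    } else {
        v->vals[0] = new_val;
        v->n = 1;
    }
}

/* "assign_replace": always drop every value and insert the new one. */
static void toy_assign_replace(ToyValues *v, int new_val)
{
    v->vals[0] = new_val;
    v->n = 1;
}
```

Starting from the list {1, 2, 3}, toy_assign with a match at index 1 leaves three values with only the middle one changed, while toy_assign_replace always leaves a single value.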
Also, in fontconfig everything is a pattern. Do not let <match target="pattern"> mislead you; the pattern in its code is actually:
struct _FcPattern {
    int         num;
    int         size;
    intptr_t    elts_offset;
    FcRef       ref;
};
It is simply a struct of pointers (offsets). The comment on FcPatternElt says as much:
/*
* Pattern elts are stuck in a structure connected to the pattern,
* so they get moved around when the pattern is resized. Hence, the
* values field must be a pointer/offset instead of just an offset
*/
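That is a golden rule you only learn by reading the source! In other words, the claim that Firefox/Chromium do not support adjusting the charset is wrong. We simply did not know how to use fontconfig, so stop blaming Chromium!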
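To see why a field like elts_offset is convenient when the data "gets moved around", here is a minimal sketch of an offset-addressed header. It assumes the elements sit in the same allocation as the header, which is a simplification; ToyPattern, toy_elts, toy_create and toy_relocate are hypothetical names, not fontconfig API.

```c
#include <assert.h>
#include <stdint.h>
#include <stdlib.h>
#include <string.h>

/* Toy header shaped like the struct above. The elements live in the same
 * allocation and are found through a byte offset from the header. */
typedef struct {
    int num;
    int size;
    intptr_t elts_offset;
} ToyPattern;

/* Resolve the offset against wherever the header currently lives. */
static int *toy_elts(ToyPattern *p)
{
    return (int *)((char *)p + p->elts_offset);
}

/* Allocate the header plus `size` int slots in one block. */
static ToyPattern *toy_create(int size)
{
    assert(size > 0);
    ToyPattern *p = malloc(sizeof(ToyPattern) + (size_t)size * sizeof(int));
    if (!p)
        return NULL;
    p->num = 0;
    p->size = size;
    p->elts_offset = (intptr_t)sizeof(ToyPattern);
    return p;
}

/* Byte-copy the whole block to a new address; no pointer fix-ups needed. */
static ToyPattern *toy_relocate(const ToyPattern *p)
{
    size_t bytes = sizeof(ToyPattern) + (size_t)p->size * sizeof(int);
    ToyPattern *q = malloc(bytes);
    if (q)
        memcpy(q, p, bytes);
    return q;
}
```

After toy_relocate the original block can be freed and toy_elts on the copy still finds the same values, which an absolute pointer stored in the header would not survive.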
Why Only One Character Was Removed
Because my config was written incorrectly. The correct form is:
<match target="scan">
  <test name="family">
    <string>Noto Sans CJK SC</string>
  </test>
  <edit name="charset" mode="assign_replace">
    <minus>
      <name>charset</name>
      <charset>
        <int>0x203c</int>
        <int>0x2122</int>
        <range>
          <int>0x1f250</int>
          <int>0x1f251</int>
        </range>
      </charset>
    </minus>
  </edit>
</match>
There must be only a single <charset></charset> block. Any extra ones are dropped.
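What the <minus> rule asks for is plain set subtraction. The sketch below models it with a toy bitmap, one bit per code point up to U+1FFFF, purely to illustrate the arithmetic. ToyCharSet and the toy_* helpers are hypothetical and unrelated to how fontconfig stores charsets internally.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/* Toy charset: one bit per code point below U+20000. */
#define TOY_LIMIT 0x20000u
typedef struct {
    uint8_t bits[TOY_LIMIT / 8];
} ToyCharSet;

static void toy_clear(ToyCharSet *cs)
{
    memset(cs->bits, 0, sizeof cs->bits);
}

/* Add every code point in [first, last], like a <range> element. */
static void toy_add_range(ToyCharSet *cs, uint32_t first, uint32_t last)
{
    assert(first <= last && last < TOY_LIMIT);
    for (uint32_t c = first; c <= last; c++)
        cs->bits[c / 8] |= (uint8_t)(1u << (c % 8));
}

/* Add a single code point, like a lone <int> element. */
static void toy_add(ToyCharSet *cs, uint32_t c)
{
    toy_add_range(cs, c, c);
}

/* a := a minus b */
static void toy_minus(ToyCharSet *a, const ToyCharSet *b)
{
    for (size_t i = 0; i < sizeof a->bits; i++)
        a->bits[i] &= (uint8_t)~b->bits[i];
}

/* Number of code points in the set. */
static unsigned toy_count(const ToyCharSet *cs)
{
    unsigned n = 0;
    for (size_t i = 0; i < sizeof cs->bits; i++)
        for (uint8_t byte = cs->bits[i]; byte; byte &= (uint8_t)(byte - 1))
            n++;
    return n;
}

/* Build the single <charset> from the rule above and subtract it. */
static void toy_apply_rule(ToyCharSet *font)
{
    static ToyCharSet drop; /* static: 16 KiB, keep it off the stack */
    toy_clear(&drop);
    toy_add(&drop, 0x203c);
    toy_add(&drop, 0x2122);
    toy_add_range(&drop, 0x1f250, 0x1f251);
    toy_minus(font, &drop);
}
```

If the font's set contains all four code points, toy_count drops by exactly four after toy_apply_rule, which is the kind of before/after difference the charset count check is looking for.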
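My problem is solved. The next step is to go fix my own code. Along the way, here is a C file that helps strip emoji characters from the fonts installed on your system. fontconfig's public FcCharSet* functions do not include FcCharSetIterStart and FcCharSetIterNext, so there is no way to loop over the leaves of a charset and print them. FcNameUnparseCharSet is not public either, so a charset cannot be printed directly. I wrote this by hacking fontconfig.h and src/fcint.h, so you will need to rebuild fontconfig.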
#include <stdio.h>
#include <fontconfig/fontconfig.h>

/* Needs the patched fontconfig.h and src/fcint.h mentioned above, which
 * expose FcStrBuf and FcNameUnparseCharSet. */
int main(int argc, char **argv){
    FcFontSet* fs;
    FcPattern* pat;
    FcObjectSet* os;
    int i;

    if (argc == 1) {
        printf("you have to specify a font family to subtract!\n");
        return 1;
    }

    /* cs: union of the charsets of every emoji font (lang=und-zsye) */
    FcChar8* strpat = (FcChar8*)":lang=und-zsye";
    pat = FcNameParse(strpat);
    os = FcObjectSetBuild(FC_FAMILY, FC_CHARSET, FC_FILE, (char*)0);
    fs = FcFontList(0, pat, os);
    FcPatternDestroy(pat);
    FcObjectSetDestroy(os);
    FcCharSet* cs = FcCharSetCreate();
    for(i=0; i<fs->nfont; i++){
        FcCharSet* tmp;
        if (FcPatternGetCharSet(fs->fonts[i], FC_CHARSET, 0, &tmp) != FcResultMatch) {
            goto nofont;
        }
        /* FcCharSetUnion returns a new set, so release the old one */
        FcCharSet* merged = FcCharSetUnion(cs, tmp);
        FcCharSetDestroy(cs);
        cs = merged;
        printf("%u\n", (unsigned)FcCharSetCount(cs));
    }
    FcFontSetDestroy(fs);

    /* cs1: union of the charsets of every face of the requested family */
    FcPattern* pat1;
    FcFontSet* fs1;
    FcObjectSet* os1;
    FcChar8* strpat1 = (FcChar8*)argv[1];
    pat1 = FcNameParse(strpat1);
    os1 = FcObjectSetBuild(FC_FAMILY, FC_CHARSET, FC_FILE, (char*)0);
    fs1 = FcFontList(0, pat1, os1);
    FcPatternDestroy(pat1);
    FcObjectSetDestroy(os1);
    FcCharSet* cs1 = FcCharSetCreate();
    for(i=0; i<fs1->nfont; i++){
        FcCharSet* tmp1;
        if (FcPatternGetCharSet(fs1->fonts[i], FC_CHARSET, 0, &tmp1) != FcResultMatch) {
            goto nofont;
        }
        /* merge into our own set: the pattern's charset goes away with fs1 */
        FcCharSet* merged1 = FcCharSetUnion(cs1, tmp1);
        FcCharSetDestroy(cs1);
        cs1 = merged1;
    }
    FcFontSetDestroy(fs1);

    /* cs2: emoji code points that the requested family also covers */
    FcCharSet* cs2 = FcCharSetIntersect(cs1, cs);

    FcStrBuf buf;
    FcChar8 init_buf[1024];
    FcStrBufInit(&buf, init_buf, sizeof(init_buf));
    if (FcNameUnparseCharSet(&buf, cs2) && FcStrBufChar(&buf, '\0')) {
        printf("%s\n", buf.buf);
    } else {
        printf("charset (alloc error).\n");
    }
    FcStrBufDestroy(&buf);
    FcCharSetDestroy(cs);
    FcCharSetDestroy(cs1);
    FcCharSetDestroy(cs2);
    return 0;
nofont:
    return 1;
}
Usage: download fc-emoji-subtract.patch, then:
git clone https://github.com/freedesktop/fontconfig
cd fontconfig
patch -p1 < ../fc-emoji-subtract.patch
Then build as usual. Afterwards there will be an fc-emoji-subtract binary under fontconfig/fc-emoji-subtract. Usage:
$ fc-emoji-subtract "Noto Sans CJK SC"
20 23 2a 30-39 a9 ae 203c 2049 2122 2194-2199 24c2 25aa-25ab 25b6 25c0 2600-2603 260e 261d 262f 2640 2642 2660 2663 2665-2666 2668 267b 26a0 26bd-26be 2702 27a1 2934-2935 2b05-2b07 3030 303d 3297 3299 1f170-1f171 1f17e-1f17f 1f18e 1f191-1f19a 1f201-1f202 1f21a 1f22f 1f232-1f23a 1f250-1f251
You can then write your own rules against this charset.
Answers to the Other Questions I Raised in Part One
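First, where does FcConfigSubstitute delete charset leaves? If you use it explicitly, it happens at the FcListPatternMatchAny stage, because there the config honors both assign and assign_replace. If it is used implicitly, it happens in FcFileScanFontConfig, but then you must use assign_replace.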
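Second, FcConfigSubstitute only applies <match target="pattern">; if you pass 0 as its first argument (the config), it applies <match target="scan">. I have verified this. <match target="font">, on the other hand, is applied in exactly one place: FcFontRenderPrepare (search for FcMatchFont to find it). Also, Chromium's GetFontRenderParamsFromFcPattern does not call FcFontRenderPrepare; it only reads the font's default aa, autohint, embedded_bitmap, hinting and rgba. So the criticism of it is fair.
Third, the problem 双猫 criticized, that Chromium returns only a single fallback font, does exist. It is roughly in ui/gfx/render_text_harfbuzz.cc, in RenderTextHarfBuzz::ShapeRuns: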
const Font& primary_font = font_list().GetPrimaryFont();
// Find fallback fonts for the remaining runs using a worklist algorithm. Try
// to shape the first run by using GetFallbackFont(...) and then try shaping
// other runs with the same font. If the first font can't be shaped, remove it
// and continue with the remaining runs until the worklist is empty. The
// fallback font returned by GetFallbackFont(...) depends on the text of the
// run and the results may differ between runs.
std::vector<internal::TextRunHarfBuzz*> remaining_unshaped_runs;
while (!runs.empty()) {
Font fallback_font(primary_font);
bool fallback_found;
internal::TextRunHarfBuzz* current_run = *runs.begin();
{
SCOPED_UMA_HISTOGRAM_LONG_TIMER("RenderTextHarfBuzz.GetFallbackFontTime");
TRACE_EVENT1("ui", "RenderTextHarfBuzz::GetFallbackFont", "script",
TRACE_STR_COPY(uscript_getShortName(font_params.script)));
const base::StringPiece16 run_text(&text[current_run->range.start()],
current_run->range.length());
fallback_found =
GetFallbackFont(primary_font, locale_, run_text, &fallback_font);
}
if (fallback_found) {
internal::TextRunHarfBuzz::FontParams test_font_params = font_params;
if (test_font_params.SetRenderParamsOverrideSkiaFaceFromFont(
fallback_font, fallback_font.GetFontRenderParams()) &&
!FontWasAlreadyTried(test_font_params.skia_face,
&fallback_fonts_already_tried)) {
ShapeRunsWithFont(text, test_font_params, &runs);
MarkFontAsTried(test_font_params.skia_face,
&fallback_fonts_already_tried);
}
}
// Remove the first run if not fully shaped with its associated fallback
// font.
if (!runs.empty() && runs[0] == current_run) {
remaining_unshaped_runs.push_back(current_run);
runs.erase(runs.begin());
}
}
runs.swap(remaining_unshaped_runs);
if (runs.empty()) {
RecordShapeRunsFallback(ShapeRunFallback::FALLBACK);
return;
}
std::vector<Font> fallback_font_list;
{
SCOPED_UMA_HISTOGRAM_LONG_TIMER("RenderTextHarfBuzz.GetFallbackFontsTime");
TRACE_EVENT1("ui", "RenderTextHarfBuzz::GetFallbackFonts", "script",
TRACE_STR_COPY(uscript_getShortName(font_params.script)));
fallback_font_list = GetFallbackFonts(primary_font);
// Use a set to track the fallback fonts and avoid duplicate entries.
SCOPED_UMA_HISTOGRAM_LONG_TIMER(
"RenderTextHarfBuzz.ShapeRunsWithFallbackFontsTime");
TRACE_EVENT1("ui", "RenderTextHarfBuzz::ShapeRunsWithFallbackFonts",
"fonts_count", fallback_font_list.size());
// Try shaping with the fallback fonts.
for (const auto& font : fallback_font_list) {
std::string font_name = font.GetFontName();
FontRenderParamsQuery query;
query.families.push_back(font_name);
query.pixel_size = font_params.font_size;
query.style = font_params.italic ? Font::ITALIC : 0;
FontRenderParams fallback_render_params = GetFontRenderParams(query, NULL);
internal::TextRunHarfBuzz::FontParams test_font_params = font_params;
if (test_font_params.SetRenderParamsOverrideSkiaFaceFromFont(
font, fallback_render_params) &&
!FontWasAlreadyTried(test_font_params.skia_face,
&fallback_fonts_already_tried)) {
ShapeRunsWithFont(text, test_font_params, &runs);
MarkFontAsTried(test_font_params.skia_face,
&fallback_fonts_already_tried);
}
if (runs.empty()) {
TRACE_EVENT_INSTANT2("ui", "RenderTextHarfBuzz::FallbackFont",
TRACE_EVENT_SCOPE_THREAD, "font_name",
TRACE_STR_COPY(font_name.c_str()),
"primary_font_name", primary_font.GetFontName());
RecordShapeRunsFallback(ShapeRunFallback::FALLBACKS);
return;
}
}
for (internal::TextRunHarfBuzz*& run : runs) {
if (run->shape.missing_glyph_count == std::numeric_limits<size_t>::max()) {
run->shape.glyph_count = 0;
run->shape.width = 0.0f;
}
}
RecordShapeRunsFallback(ShapeRunFallback::FAILED);
}
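If I read it correctly, it first calls GetFallbackFont to get a single font, and only when that does not work does it call GetFallbackFonts to fetch all of them and try them one by one. Here is the rest of GetFallbackFont, which I did not paste before: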
// Try each font in the cache to find the one with the highest coverage.
size_t fewest_missing_glyphs = text.length() + 1;
const FallbackFontEntry* prefered_entry = nullptr;
for (const auto& entry : cache_entry->second) {
// Validate that every character has a known glyph in the font.
size_t missing_glyphs = 0;
size_t matching_glyphs = 0;
size_t i = 0;
while (i < text.length()) {
UChar32 c = 0;
U16_NEXT(text.data(), i, text.length(), c);
if (entry.HasGlyphForCharacter(c)) {
++matching_glyphs;
} else {
++missing_glyphs;
}
}
if (matching_glyphs > 0 && missing_glyphs < fewest_missing_glyphs) {
fewest_missing_glyphs = missing_glyphs;
prefered_entry = &entry;
}
// The font has coverage for the given text and is a valid fallback font.
if (missing_glyphs == 0)
break;
}
// No fonts can be used as font fallback.
if (!prefered_entry)
return false;
sk_sp<SkTypeface> typeface = GetSkTypefaceFromPathAndIndex(
prefered_entry->font_path(), prefered_entry->ttc_index());
// The file can't be parsed (e.g. corrupt). This font can't be used as a
// fallback font.
if (!typeface)
return false;
Font fallback_font(PlatformFont::CreateFromSkTypeface(
typeface, font.GetFontSize(), prefered_entry->font_params()));
*result = fallback_font;
return true;
}
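For a given run of text it looks for the font with the highest charset coverage instead of looking for a font per glyph, so the result depends on the text itself. This is probably a trade-off Google made for speed.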
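The text dependence is easy to reproduce with a stripped-down model of that loop. The sketch below is only an illustration under simplifying assumptions: each toy font covers one contiguous code point range, and ToyFont, toy_has_glyph and toy_pick_fallback are hypothetical names rather than Chromium code.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Toy font: covers one contiguous code point range. */
typedef struct {
    const char *name;
    uint32_t first;
    uint32_t last;
} ToyFont;

static int toy_has_glyph(const ToyFont *f, uint32_t c)
{
    return c >= f->first && c <= f->last;
}

/* Return the index of the font with the fewest missing glyphs for `text`
 * (ties go to the earlier font), or -1 if no font covers any character. */
static int toy_pick_fallback(const ToyFont *fonts, size_t nfonts,
                             const uint32_t *text, size_t len)
{
    assert(fonts != NULL && text != NULL);
    size_t fewest_missing = len + 1;
    int best = -1;
    for (size_t f = 0; f < nfonts; f++) {
        size_t missing = 0, matching = 0;
        for (size_t i = 0; i < len; i++) {
            if (toy_has_glyph(&fonts[f], text[i]))
                matching++;
            else
                missing++;
        }
        if (matching > 0 && missing < fewest_missing) {
            fewest_missing = missing;
            best = (int)f;
        }
        if (missing == 0)
            break; /* full coverage: stop searching */
    }
    return best;
}
```

With a toy CJK font covering U+4E00..U+9FFF and a toy symbol font covering U+2000..U+2BFF, the run {U+4E2D, U+6587, U+203C} picks the CJK font while {U+203C, U+2122, U+4E2D} picks the symbol font, so the same ‼ is handed to a different fallback depending on its neighbours.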
And that wraps up this series. Maybe the next post will talk about FcFontSort?