2021-01-25

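Morphological analysis in C# with MeCab.DotNet

Installation

Install "MeCab.DotNet" from NuGet. MeCab.DotNet is a package that ports "MeCab" and "NMeCab" to .NET Core.

The following is excerpted from the package site.

"MeCab" is a Japanese morphological analysis engine project.

"NMeCab" is a reimplementation of MeCab as a managed library for .NET Framework 2.0. However, it no longer seems to be updated...

"MeCab.DotNet" (this project) is a port of NMeCab that works with the latest .NET Core 1/2/3 and .NET Framework, bundled as a NuGet package for ease of use.

Note that the license is GPL2 or LGPL2.1, so take care when using it.

Sample implementation

A sample implementation in WPF.

XAML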

<Window x:Class="Samples.MainWindow"
        xmlns="http://schemas.microsoft.com/winfx/2006/xaml/presentation"
        xmlns:x="http://schemas.microsoft.com/winfx/2006/xaml"
        xmlns:d="http://schemas.microsoft.com/expression/blend/2008"
        xmlns:mc="http://schemas.openxmlformats.org/markup-compatibility/2006"
        xmlns:local="clr-namespace:MeCab"
        mc:Ignorable="d"
        Title="MainWindow" Height="450" Width="800">
    <Grid>
        <TextBox x:Name="textbox" VerticalAlignment="Top" HorizontalAlignment="Left" Margin="10,10,150,0" TextWrapping="Wrap" Width="600" AcceptsReturn="True"/>
        <Button Content="解析" VerticalAlignment="Top" HorizontalAlignment="Left" Margin="620,10,0,0" Width="75" Click="Button_Click"/>
        <TextBlock x:Name="result" VerticalAlignment="Top" HorizontalAlignment="Left" Margin="10,40,10,0" />
    </Grid>
</Window>

Code-behind

using MeCab;
using System.Windows;

namespace Samples
{
    public partial class MainWindow : Window
    {
        public MainWindow()
        {
            InitializeComponent();
        }

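        private void Button_Click(object sender, RoutedEventArgs e)
        {
            // Create a tagger with the default settings.
            var tagger = MeCabTagger.Create();

            result.Text = "";
            // Walk the nodes (morphemes) produced from the input text.
            foreach(var node in tagger.ParseToNodes(textbox.Text))
            {
                // Only output nodes whose CharType is greater than 0.
                if (0 < node.CharType)
                {
                    // Surface form, a tab, then the comma-separated feature string.
                    result.Text += $"{node.Surface}\t{node.Feature}\r\n";
                }
            }
        }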
    }
}

The execution result is as follows. [Screenshot: analysis result window]
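Each node's Feature is a single comma-separated string. As a minimal sketch, assuming the dictionary in use follows the IPA dictionary layout (part of speech in the 1st field, base form in the 7th, reading in the 8th), it could be split as below. FeatureParser is a hypothetical helper, not part of MeCab.DotNet, and the sample feature string is made up for illustration. A plain Split does not handle quoted fields, so this is only a starting point.

```csharp
using System;

// Hypothetical helper for illustration; not part of MeCab.DotNet.
public static class FeatureParser
{
    // Field positions are an assumption based on the IPA dictionary layout.
    public static (string PartOfSpeech, string BaseForm, string Reading) Parse(string feature)
    {
        var fields = feature.Split(',');
        // "*" marks an empty field; missing fields are treated the same way.
        string At(int i) => i < fields.Length && fields[i] != "*" ? fields[i] : "";
        return (At(0), At(6), At(7));
    }
}

public static class Program
{
    public static void Main()
    {
        var (pos, baseForm, reading) =
            FeatureParser.Parse("名詞,固有名詞,人名,姓,*,*,山田,ヤマダ,ヤマダ");
        Console.WriteLine($"{pos} / {baseForm} / {reading}");
        // prints: 名詞 / 山田 / ヤマダ
    }
}
```

In the sample above, this would be called with node.Feature inside the foreach loop.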

2021-01-25

Trying out Azure Cognitive Services - Text Analytics

Azure

TL;DR

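Named Entity Recognition can extract people, places, organizations, and so on.
For Japanese, only Person, Location, and Organization can be analyzed, and the accuracy is mediocre compared with English.
Personally identifiable information detection can detect personal and health information, but currently only for English and Spanish.
Azure Cognitive Search is a service for advanced search over cloud storage.
It performs morphological analysis with an Analyzer when building an index, and an API is provided for testing analyzers.

Named Entity Recognition (NER)

API : /text/analytics/v3.1-preview.3/entities/recognition/general

NER v3 supports only English and Spanish. For Japanese, v2 acts as a fallback, and only "Person", "Location", and "Organization" are returned.

The full list of categories is in the documentation on the categories supported by Named Entity Recognition.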

En

Nice to meet you. Yamada is my family name and Taro is my given name. I'm from Osaka, Japan. I was born on April 14th in 1973.

Request

{ documents: [
        { 
            id: "1", 
            language: "en", 
            text: "Nice to meet you. Yamada is my family name and Taro is my given name. I'm from Osaka, Japan. I was born on April 14th in 1973."
        }
    ]
}

Response

    "documents": [
        {
            "id": "1",
            "entities": [
                {
                    "text": "Yamada",
                    "category": "Person",
                    "offset": 18,
                    "length": 6,
                    "confidenceScore": 0.52
                },
                {
                    "text": "Taro",
                    "category": "Person",
                    "offset": 47,
                    "length": 4,
                    "confidenceScore": 0.76
                },
                {
                    "text": "Osaka",
                    "category": "Location",
                    "subcategory": "GPE",
                    "offset": 79,
                    "length": 5,
                    "confidenceScore": 0.8
                },
                {
                    "text": "Japan",
                    "category": "Location",
                    "subcategory": "GPE",
                    "offset": 86,
                    "length": 5,
                    "confidenceScore": 0.55
                },
                {
                    "text": "April 14th",
                    "category": "DateTime",
                    "subcategory": "Date",
                    "offset": 107,
                    "length": 10,
                    "confidenceScore": 0.8
                },
                {
                    "text": "1973",
                    "category": "DateTime",
                    "subcategory": "DateRange",
                    "offset": 121,
                    "length": 4,
                    "confidenceScore": 0.8
                }
            ],
            "warnings": []
        }

Ja

はじめまして。山田が苗字で、太郎が名前です。日本の大阪出身です。私は、1973年4月14日生まれです。

Request

{ documents: [
        { 
            id: "1", 
            language: "ja", 
            text: "はじめまして。 山田が苗字で、太郎が名前です。 日本の大阪出身です。 私は、1973年4月14日生まれです。"
        }
    ]
}

Response

    "documents": [
        {
            "id": "1",
            "entities": [
                {
                    "text": "山田",
                    "category": "Person",
                    "offset": 8,
                    "length": 2,
                    "confidenceScore": 0.66
                },
                {
                    "text": "日本",
                    "category": "Location",
                    "subcategory": "GPE",
                    "offset": 24,
                    "length": 2,
                    "confidenceScore": 0.98
                },
                {
                    "text": "大阪",
                    "category": "Location",
                    "subcategory": "GPE",
                    "offset": 27,
                    "length": 2,
                    "confidenceScore": 0.98
                }
            ],
            "warnings": []
        }
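The offset and length in each entity point back into the request text. As a rough sketch of how a response could be read with System.Text.Json, the code below slices the original Japanese text by those values. EntityReader is a hypothetical helper, and the JSON is a trimmed copy of the response above with the outer braces added. The offsets line up with C# string indices here because every character in the sample is a single UTF-16 code unit; text with surrogate pairs or combining characters may need different handling.

```csharp
using System;
using System.Collections.Generic;
using System.Text.Json;

// Hypothetical helper for illustration.
public static class EntityReader
{
    // Returns (category, slice of the source text) for each entity in the response.
    public static List<(string Category, string Span)> Read(string sourceText, string responseJson)
    {
        var result = new List<(string Category, string Span)>();
        using var doc = JsonDocument.Parse(responseJson);
        foreach (var document in doc.RootElement.GetProperty("documents").EnumerateArray())
        {
            foreach (var entity in document.GetProperty("entities").EnumerateArray())
            {
                int offset = entity.GetProperty("offset").GetInt32();
                int length = entity.GetProperty("length").GetInt32();
                string category = entity.GetProperty("category").GetString() ?? "";
                result.Add((category, sourceText.Substring(offset, length)));
            }
        }
        return result;
    }
}

public static class Program
{
    public static void Main()
    {
        const string text = "はじめまして。 山田が苗字で、太郎が名前です。 日本の大阪出身です。 私は、1973年4月14日生まれです。";
        const string json = @"{ ""documents"": [ { ""id"": ""1"", ""entities"": [
            { ""text"": ""山田"", ""category"": ""Person"", ""offset"": 8, ""length"": 2 },
            { ""text"": ""日本"", ""category"": ""Location"", ""offset"": 24, ""length"": 2 },
            { ""text"": ""大阪"", ""category"": ""Location"", ""offset"": 27, ""length"": 2 } ] } ] }";

        foreach (var (category, span) in EntityReader.Read(text, json))
        {
            Console.WriteLine($"{category}: {span}");
        }
        // prints:
        // Person: 山田
        // Location: 日本
        // Location: 大阪
    }
}
```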

Detecting personally identifiable information (PII)

API : /text/analytics/v3.1-preview.3/entities/recognition/pii

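The NER v3.1 preview can detect personal information (PII) and health information (PHI). The API is currently available only for English and Spanish.

The extracted categories are listed in the documentation on the categories supported by Named Entity Recognition.

Japan

Japanese bank account number
Japanese driver's license number
Japanese My Number (personal)
Japanese My Number (corporate)
Japanese resident registration code
Japanese residence card number
Japanese social insurance number (SIN)
Japanese passport number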

En

Nice to meet you. Yamada is my family name and Taro is my given name. I'm from Osaka, Japan. I was born on April 14th in 1973. My number is 345-6890. My bank account number is 1234567.

Request

{ documents: [
        { 
            id: "1", 
            language: "en", 
            text: "Nice to meet you. Yamada is my family name and Taro is my given name. I'm from Osaka, Japan. I was born on April 14th in 1973."
        },
        { 
            id: "2", 
            language: "en", 
            text: "My number is 345-6890. My bank account number is 1234567."
        }
    ]
}

Response

    "documents": [
        {
            "redactedText": "Nice to meet you. ****** is my family name and **** is my given name. I'm from Osaka, Japan. I was born on ********** in 1973.",
            "id": "1",
            "entities": [
                {
                    "text": "Yamada",
                    "category": "Person",
                    "offset": 18,
                    "length": 6,
                    "confidenceScore": 0.52
                },
                {
                    "text": "Taro",
                    "category": "Person",
                    "offset": 47,
                    "length": 4,
                    "confidenceScore": 0.76
                },
                {
                    "text": "April 14th",
                    "category": "DateTime",
                    "subcategory": "Date",
                    "offset": 107,
                    "length": 10,
                    "confidenceScore": 0.8
                }
            ],
            "warnings": []
        },
        {
            "redactedText": "My number is ********. My bank account number is ********",
            "id": "2",
            "entities": [
                {
                    "text": "345-6890",
                    "category": "Phone Number",
                    "offset": 13,
                    "length": 8,
                    "confidenceScore": 0.8
                },
                {
                    "text": "1234567.",
                    "category": "Japan Bank Account Number",
                    "offset": 49,
                    "length": 8,
                    "confidenceScore": 0.85
                }
            ],
            "warnings": []
        }
    ],
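The redactedText in the response corresponds to replacing each entity span with asterisks of the same length. The following is a minimal sketch of that masking done locally from offset and length; Redactor is a hypothetical helper, and the service itself may apply other rules.

```csharp
using System;
using System.Text;

// Hypothetical helper for illustration.
public static class Redactor
{
    // Replaces every character inside each (offset, length) span with '*'.
    public static string Mask(string text, params (int Offset, int Length)[] spans)
    {
        var sb = new StringBuilder(text);
        foreach (var (offset, length) in spans)
        {
            for (int i = offset; i < offset + length && i < sb.Length; i++)
            {
                sb[i] = '*';
            }
        }
        return sb.ToString();
    }
}

public static class Program
{
    public static void Main()
    {
        const string text = "Nice to meet you. Yamada is my family name and Taro is my given name. I'm from Osaka, Japan. I was born on April 14th in 1973.";
        // Offsets and lengths taken from the entities of document "1" above.
        Console.WriteLine(Redactor.Mask(text, (18, 6), (47, 4), (107, 10)));
        // prints: Nice to meet you. ****** is my family name and **** is my given name. I'm from Osaka, Japan. I was born on ********** in 1973.
    }
}
```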

Ja

Not supported.

Key phrase extraction

API : /text/analytics/v3.1-preview.3/keyPhrases

En

Request

{ documents: [
        { 
            id: "1", 
            language: "en", 
            text: "Nice to meet you. Yamada is my family name and Taro is my given name. I'm from Osaka, Japan. I was born on April 14th in 1973."
        }
    ]
}

Response

    "documents": [
        {
            "id": "1",
            "keyPhrases": [
                "family",
                "Taro",
                "Yamada",
                "Osaka",
                "Japan"
            ],
            "warnings": []
        }
    ],

Ja

Request

{ documents: [
        { 
            id: "1", 
            language: "ja", 
            text: "はじめまして。 山田が苗字で、太郎が名前です。 日本の大阪出身です。 私は、1973年4月14日生まれです。"
        }
    ]
}

Response

    "documents": [
        {
            "id": "1",
            "keyPhrases": [
                "苗字",
                "太郎",
                "日本",
                "大阪出身",
                "山田",
                "名前",
                "まし",
                "生まれ"
            ],
            "warnings": []
        }
    ],

Azure Cognitive Search

Azure Cognitive Search itself is a service for advanced search over cloud storage.
Morphological analysis is performed by the Analyzer used when building an index, and an Analyzer API is provided for testing analyzers.

docs.microsoft.com

blog.johtani.info
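The analyzer test API can be called over REST. The sketch below only builds the request. It assumes the Analyze Text endpoint shape (/indexes/{index name}/analyze), the api-key header, API version 2020-06-30, and the ja.lucene analyzer name; the service name, index name, and key are placeholders, and AnalyzeRequest is a hypothetical helper.

```csharp
using System;
using System.Net.Http;
using System.Text;
using System.Text.Json;

// Hypothetical helper for illustration.
public static class AnalyzeRequest
{
    // The api-version and the default analyzer name are assumptions; adjust them to your service.
    public static HttpRequestMessage Build(
        string serviceName, string indexName, string apiKey, string text, string analyzer = "ja.lucene")
    {
        var uri = $"https://{serviceName}.search.windows.net/indexes/{indexName}/analyze?api-version=2020-06-30";
        // Body shape: { "text": "...", "analyzer": "..." }
        var body = JsonSerializer.Serialize(new { text, analyzer });
        var request = new HttpRequestMessage(HttpMethod.Post, uri)
        {
            Content = new StringContent(body, Encoding.UTF8, "application/json")
        };
        request.Headers.Add("api-key", apiKey);
        return request;
    }
}

public static class Program
{
    public static void Main()
    {
        // Placeholder values; nothing is sent in this sketch.
        var request = AnalyzeRequest.Build("my-service", "my-index", "<admin-key>", "日本の大阪出身です。");
        Console.WriteLine($"{request.Method} {request.RequestUri}");
    }
}
```

To actually send it, the request could be passed to HttpClient.SendAsync; the response is expected to list the tokens the analyzer produced.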

2021-01-24

Unity - Frequently used packages

Unity HoloLens2

UPM

Add the entries to dependencies in manifest.json.
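For orientation, here is a minimal sketch of where such entries go. The file is Packages/manifest.json in the Unity project. The two entries are the UniRx and UniTask lines from this section; a real manifest also contains Unity's own packages, which are omitted here. Note that the last entry in the object has no trailing comma.

```json
{
  "dependencies": {
    "com.neuecc.unirx": "https://github.com/neuecc/UniRx.git?path=Assets/Plugins/UniRx/Scripts",
    "com.cysharp.unitask": "https://github.com/Cysharp/UniTask.git?path=src/UniTask/Assets/Plugins/UniTask"
  }
}
```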

UniRx

"com.neuecc.unirx": "https://github.com/neuecc/UniRx.git?path=Assets/Plugins/UniRx/Scripts",

UniTask

"com.cysharp.unitask": "https://github.com/Cysharp/UniTask.git?path=src/UniTask/Assets/Plugins/UniTask",

Extenject

"com.svermeulen.extenject": "https://github.com/svermeulen/Extenject.git?path=UnityProject/Assets/Plugins/Zenject",

VContainer

"jp.hadashikick.vcontainer": "https://github.com/hadashiA/VContainer.git?path=VContainer/Assets/VContainer#1.4.3",

Unity Package

Download the package and import it into the project.

MessagePack for C#
Utf8Json
ZString
ZLogger
MasterMemory
Ulid

Asset Store

Import from the Asset Store.

LINQ to GameObject

LINQ to GameObject | Integration | Unity Asset Store

2021-01-22

Unity - Displaying Japanese with TextMeshPro

HoloLens2 HoloLens Unity

Importing fonts

Prepare the Japanese font you want to use and import it into Unity.

Windows fonts

It seems best not to use these commercially. *1

Meiryo UI

The system font used in Windows 8; it is included in C:\Windows\Fonts.

Yu Gothic UI

The system font used in Windows 10; it is included in C:\Windows\Fonts.

Google Fonts

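Fonts provided by Google. Kosugi / Kosugi Maru are under Apache License 2.0; the others are under the SIL Open Font License.
These are free licenses that allow personal and commercial use; a license notice is required.

fonts.google.com

Pick a font on Google Fonts and download it. Both ttf and otf files exist, and either can be used.

There are 11 Japanese typefaces in 34 styles.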

Yusei Magic
Potta One
Hachi Maru Pop
Noto Sans JP
Noto Serif JP
M PLUS 1p
M PLUS Rounded 1c
Sawarabi Gothic
Sawarabi Mincho
Kosugi
Kosugi Maru


qiita.com

Fontworks

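Fonts that Fontworks contributed to Google Fonts, which drew attention recently.

SIL Open Font License.
A free license that allows personal and commercial use; a license notice is required.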

github.com

They are apparently not yet available on Google Fonts, so download the ttf files or the entire repository from Fontworks.

There are 7 Japanese typefaces in 8 styles.

Klee One Regular
Klee One SemiBold
Train One Regular
RocknRoll One Regular
Stick Regular
Rampart One Regular
Reggae One Regular
DotGothic16 Regular

coliss.com

M+

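The M+ fonts published on Google Fonts are "M PLUS 1p", which combines the Japanese Type-1 with the Latin P Type-1, and "M PLUS Rounded 1c", which combines the rounded-gothic Japanese Rounded M+ with the Latin C Type-1.

The following site offers 11 fonts built from combinations of 2 Japanese designs and 4 Latin designs. "M PLUS Rounded 1c" is not included.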

mplus-fonts.osdn.jp

License

Sites that may be useful regarding licenses and license notices.

Creating a Font Asset

Open the Font Asset Creator.

Select the font imported into Unity, set Character List to Adobe-Japan1-0, then generate and save.

blog.kyubuns.dev

Using the font

Set the font in the Font Asset field of TextMeshPro.

Display examples

Windows fonts

MEIRYO
YUGOTHR

Google Fonts

YuseiMagic-Regular
PottaOne-Regular
HachiMaruPop-Regular
NotoSansJP-Regular
NotoSerifJP-Regular
MPLUS1p-Regular
MPLUSRounded1c-Regular
SawarabiGothic-Regular
SawarabiMincho-Regular

Note: only 2,250 kanji are included, so "融" is not displayed.

Kosugi-Regular
KosugiMaru-Regular

Fontworks

KleeOne-Regular
TrainOne-Regular
RocknRollOne-Regular
Stick-Regular
RampartOne-Regular
ReggaeOne-Regular
DotGothic16-Regular

References

本当に使える！TextMeshProでの「日本語」「多言語」対応方法 - きゅぶろぐ

yotiky.hatenablog.com

*1:Windowsの標準フォントってどこまで“タダ”なの？～最新の状況をまとめたブログ記事が人気 - やじうまの杜 - 窓の杜

2021-01-21

Modules to add to the HoloLens 2 development environment

HoloLens2

Visual Studio 2019
- Desktop development with C++
- Universal Windows Platform development
  - USB Device Connectivity
- Game development with Unity
Unity
- Universal Windows Platform Build Support

2020-11-30

Main new features in C# 8.0

.NET .NET Core

Release timing

Visual Studio 2019 (16.3)
.NET Core 3.0
.NET Standard 2.1

Nullable reference types

Features

    // Up to C# 7.3, no warning is reported
    string s = null;
    Debug.WriteLine(s.Length);

    // C# 8.0 with nullable reference types enabled
    // Assigning null to a non-nullable reference type produces a warning (how a declaration is treated is governed by the annotations context)
    // CS8600  Converting null literal or possible null value to non-nullable type.
    string s1 = null;
    string s2 = "";
    string? s3 = null;

    // Using a non-nullable reference type that holds null produces a warning (warnings at the point of use are governed by the warnings context)
    // CS8602  Dereference of a possibly null reference.
    Debug.WriteLine(s1.Length);
    
    // No warning when a non-nullable type has not been assigned null
    Debug.WriteLine(s2.Length);
    
    // Using a nullable type without a null check produces a warning
    if (s3 != null)
    {
        // Known to be non-null here, so no warning
        Debug.WriteLine(s3.Length);
    }
    // May be null here
    // CS8602  Dereference of a possibly null reference.
    Debug.WriteLine(s3.Length);
    // If the previous line succeeded, s3 is known to be non-null, so no warning
    Debug.WriteLine(s3.Length);

Enabling / disabling

Specify the option in the csproj file

Applies to the entire project.
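As a minimal sketch of the project-wide option, the Nullable property goes in a PropertyGroup of the csproj. The TargetFramework and LangVersion values below are assumptions for a .NET Core 3.x project; Nullable also accepts disable, warnings, and annotations.

```xml
<Project Sdk="Microsoft.NET.Sdk">
  <PropertyGroup>
    <!-- Assumed target; use whatever the project already targets. -->
    <TargetFramework>netcoreapp3.1</TargetFramework>
    <LangVersion>8.0</LangVersion>
    <!-- Enables both the annotations and warnings contexts for the whole project. -->
    <Nullable>enable</Nullable>
  </PropertyGroup>
</Project>
```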

#nullable enable|disable|restore [warnings|annotations]

Applies only to the code where it is enabled.

#nullable enable
    // Without [warnings|annotations], both contexts are affected
    // Enabled, so warnings are reported
    string s1 = null;
    Debug.WriteLine(s1.Length);

#nullable disable
    // Disabled, so no warnings are reported
    string s2 = null;
    Debug.WriteLine(s2.Length);

#nullable enable
#nullable restore
    // restore reverts to the project setting (disabled)
    string s3 = null;
    Debug.WriteLine(s3.Length);


    // When nullable reference types are disabled, using the ? annotation produces a warning
    // CS8632  The annotation for nullable reference types should only be used in code within a '#nullable' annotations context.
    string? s4 = null;
    Debug.WriteLine(s4.Length);

#nullable enable warnings
    // With warnings enabled, dereferencing a variable that may be null produces a warning
    // annotations are still disabled, so declaring a nullable variable (string?) produces a warning
    string s5 = null;
    string? s6 = null;
    Debug.WriteLine(s5.Length);
    Debug.WriteLine(s6.Length);
#nullable disable warnings

#nullable enable annotations
    // With annotations enabled, declaring a nullable variable (string?) produces no warning
    // warnings are still disabled, so dereferencing a variable produces no warning
    string s7 = null;
    string? s8 = null;
    Debug.WriteLine(s7.Length);
    Debug.WriteLine(s8.Length);
#nullable disable annotations

! (null-forgiving) operator

    // Assigning null to a non-nullable reference type produces a warning
    string s1 = null;
    // Applying the ! operator lets null be assigned to a non-nullable reference type without a warning
    string s2 = null!;
    string? s3 = null;

    Debug.WriteLine(s1.Length);
    Debug.WriteLine(s2.Length);
    // Applying the ! operator to a nullable reference type ignores its nullability, so no warning
    Debug.WriteLine(s3!.Length);

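Using it with fields and properties

    // A non-nullable reference type used as a field or property must be initialized at its declaration or in the constructor, otherwise a warning is reported
    public string X { get; set; } = "";
    // The ! operator can be used to temporarily allow null in a non-nullable reference type
    public string Y = null!;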
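The snippet above initializes at the declaration. As a small sketch of the other option, initializing in the constructor, the hypothetical Person class below sets a non-nullable property from a constructor argument and leaves a property that may legitimately be empty as string?.

```csharp
#nullable enable
using System;

// Hypothetical class for illustration.
public class Person
{
    // Initialized at the declaration.
    public string Nickname { get; set; } = "";

    // Initialized in the constructor, so no warning is reported.
    public string Name { get; }

    // Declared nullable, so it may stay null without a warning.
    public string? Note { get; set; }

    public Person(string name)
    {
        // Guard against callers that ignore nullable warnings.
        Name = name ?? throw new ArgumentNullException(nameof(name));
    }
}

public static class Program
{
    public static void Main()
    {
        var person = new Person("Taro");
        Console.WriteLine(person.Name.Length);      // prints: 4
        Console.WriteLine(person.Note is null);     // prints: True
    }
}
```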

Differences from nullable value types

    private void NullableValueType(DateTime? x)
    {
        // Even after a null check, x is still Nullable<T>, so member access is an error
        if (x is null) { return; }
        // CS1061  'DateTime?' does not contain a definition for 'Minute' and no accessible extension method 'Minute' accepting a first argument of type 'DateTime?' could be found (are you missing a using directive or an assembly reference?)
        //Debug.WriteLine(x.Minute);
    }
    private void NullableReferenceType(string? x)
    {
        // After a null check, no warning is reported
        if (x is null) { return; }
        Debug.WriteLine(x.Length);
    
        // string? and string are the same type internally, so typeof cannot be used
        // CS8639  The typeof operator cannot be used on a nullable reference type
        //var t = typeof(string?);
    }
    
    private void NullableValueType(DateTime x) { }
    // Nullable value types are distinct types and can be overloaded; nullable reference types are the same type and cannot
    //private void NullableReferenceType(string x) { } // error
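Because DateTime? is Nullable<DateTime>, the member has to be reached through the wrapped value. A short sketch of two common ways to do that follows; NullableValueDemo and the -1 fallback are made up for illustration.

```csharp
using System;

// Hypothetical helper for illustration.
public static class NullableValueDemo
{
    // After the null check, .Value unwraps the underlying DateTime.
    public static int MinuteViaValue(DateTime? x)
    {
        if (x is null) { return -1; }
        return x.Value.Minute;
    }

    // A type pattern checks for null and unwraps in one step.
    public static int MinuteViaPattern(DateTime? x)
        => x is DateTime value ? value.Minute : -1;
}

public static class Program
{
    public static void Main()
    {
        DateTime? at = new DateTime(2020, 11, 30, 1, 48, 44);
        Console.WriteLine(NullableValueDemo.MinuteViaValue(at));     // prints: 48
        Console.WriteLine(NullableValueDemo.MinuteViaPattern(null)); // prints: -1
    }
}
```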

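Under construction

Recursive patterns
switch expressions
Ranges and indices
Default interface implementations
Async streams
Improvements to the using statement
- using declarations
- Pattern-based using
Other
- Null-coalescing assignment (??=)
- Static local functions
- @$
- Unmanaged generic structs
- readonly function members
- stackalloc in expressions
- is null on generic types
- Obsolete attribute on property accessors

Code repository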

github.com

2020-11-24

New features in C#

.NET .NET Core Unity

Unity compatibility table

Unity	Runtime version (Equivalent)	C# version	Compiler
2017	.NET 3.5	C# 4.0	mcs (Mono)
	.NET 4.6	C# 6.0	mcs (Mono)
2018.1 / 2018.2	.NET 3.5	C# 4.0	mcs (Mono)
	.NET 4.6	C# 6.0	mcs (Mono)
2018.3 / 2018.4	.NET 3.5	C# 4.0	mcs (Mono)
	.NET 4.6	C# 7.3	Roslyn
2019.1	.NET 3.5	C# 4.0	mcs (Mono)
	.NET 4.6	C# 7.3	Roslyn
2019.2 / 2019.3 / 2019.4	.NET 4.6	C# 7.3	Roslyn
2020.1	.NET 4.6	C# 7.3	Roslyn
2020.2	.NET 4.6	C# 8.0	Roslyn
