Analyzing Sentiment in Classical Chinese Poetry

Yufang Hou Anette Frank

Institute for Computational Linguistics, Heidelberg University, Germany

Abstract

Although sentiment analysis in Chinese social media has attracted a lot of interest in recent years, it has been less explored in traditional Chinese literature (e.g., classical Chinese poetry) due to the lack of sentiment lexicon resources. In this paper, we propose a weakly supervised approach based on Weighted Personalized PageRank (WPPR) to create a sentiment lexicon for classical Chinese poetry. We evaluate our lexicon intrinsically and extrinsically. We show that our graph-based approach outperforms a previous well-known PMI-based approach (Turney and Littman, 2003) in both evaluation settings. On the basis of our sentiment lexicon, we analyze sentiment in the Complete Anthology of Tang Poetry. We extract topics associated with positive (negative) sentiment using a position-aware sentiment-topic model. We further compare sentiment among different poets in the Tang Dynasty (AD 618 – 907).

1 Introduction

Classical Chinese poetry is a precious cultural heritage. In its more than 3,000 years of history, the Tang Dynasty (AD 618 – 907) is widely viewed as the zenith of the art of classical Chinese poetry. The Complete Anthology of Tang Poetry, edited during the Qing Dynasty (1644 – 1911), contains over 42,860 poems in 900 volumes by more than 2,500 poets. The collection provides a magnificent insight into all aspects of social life of that period.

Research on sentiment/emotion and imagery analysis of Tang poetry is an active subfield in Chinese philology, with a vast literature (Watson, 1971; Kao and Mei, 1971; Kao and Mei, 1978). In this paper, we seek to analyze the sentiment (i.e., positive or negative) of textual elements in Tang poetry from a computational perspective. Specifically, we propose a novel graph-based method to create a sentiment lexicon for classical Chinese poetry. Such a lexicon is a valuable resource for other computational research on classical Chinese poetry, such as semantic analysis (Lee and Tak-sum, 2012) or poetry generation (He et al., 2012; Zhang and Lapata, 2014).

Turney and Littman (2003) propose a PMI-based algorithm to estimate the semantic orientation or polarity of a word. The semantic orientation of a given word is calculated by comparing its similarity to positive reference words (e.g., excellent or beautiful) with its similarity to negative reference words (e.g., poor or bad). Instead of calculating the similarity between a given word and each of the positive (negative) reference words separately, we apply Weighted Personalized PageRank (WPPR) to measure the similarity between the given word and all positive (negative) reference words simultaneously in a lexical network that we build from a poetry corpus. Our graph-based method is able to find a globally optimal solution because the lexical network is analyzed as a whole (Section 3).
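The PMI-based baseline can be illustrated with a minimal sketch. It assumes that co-occurrence is counted within a poem line and that a small additive constant (`eps`) smooths zero counts; both choices, the toy lines, the seed characters, and all function names are illustrative assumptions rather than details taken from this paper.

```python
import math
from collections import Counter
from itertools import combinations


def count_cooccurrences(lines):
    """Count in how many lines each character, and each unordered pair, occurs."""
    item_counts, pair_counts = Counter(), Counter()
    for line in lines:
        items = sorted(set(line))  # unique characters of the line, in a fixed order
        item_counts.update(items)
        pair_counts.update(combinations(items, 2))
    return item_counts, pair_counts


def pmi(w1, w2, item_counts, pair_counts, n_lines, eps=0.01):
    """Smoothed pointwise mutual information (eps is an assumed smoothing constant)."""
    pair = tuple(sorted((w1, w2)))
    p_joint = (pair_counts[pair] + eps) / n_lines
    p1 = (item_counts[w1] + eps) / n_lines
    p2 = (item_counts[w2] + eps) / n_lines
    return math.log2(p_joint / (p1 * p2))


def so_pmi(word, pos_seeds, neg_seeds, item_counts, pair_counts, n_lines):
    """Semantic orientation: summed PMI with positive seeds minus negative seeds."""
    pos = sum(pmi(word, s, item_counts, pair_counts, n_lines) for s in pos_seeds)
    neg = sum(pmi(word, s, item_counts, pair_counts, n_lines) for s in neg_seeds)
    return pos - neg


# Made-up toy "lines" (not real poems) and hypothetical seed characters.
toy_lines = ["春花喜", "花喜月", "猿愁夜", "猿愁秋", "月夜"]
items, pairs = count_cooccurrences(toy_lines)
so_lotus = so_pmi("花", ["喜"], ["愁"], items, pairs, len(toy_lines))
so_ape = so_pmi("猿", ["喜"], ["愁"], items, pairs, len(toy_lines))
print(so_lotus > 0, so_ape < 0)
```

Each seed contributes its own PMI term, so every seed is compared with the candidate word in isolation. This pairwise treatment is what the graph-based method replaces.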

We evaluate our poetry sentiment lexicon intrinsically and extrinsically. For the intrinsic evaluation, we compile two test datasets. The first dataset contains 933 words (532 positive and 401 negative) taken from three Chinese sentiment lexicons¹. The second dataset contains 55 words taken from literature of imagery analysis for Tang poetry. These words reflect the common imageries in classical Chinese poetry and have certain fixed emotional connotations. For instance, the character “猿” (ape) often relates to sadness, anxiety and distress, while the character “荷” (lotus) is the symbol of beauty, love and rectitude. We show that our method outperforms the very competitive PMI-based approach when evaluating on both datasets (Section 4.1). Our method also outperforms the baseline on an extrinsic evaluation task of predicting sentiment orientation of classical Chinese poetry (Section 4.2).

¹ Although these lexicons are for contemporary Chinese, some words keep the same meaning and polarity as in classical Chinese poetry.

Proceedings of the 9th SIGHUM Workshop on Language Technology for Cultural Heritage, Social Sciences, and Humanities, pages 15–24, Beijing, China, July 30, 2015. © 2015 Association for Computational Linguistics and The Asian Federation of Natural Language Processing

On the basis of our sentiment lexicon, we analyze sentiment in the Complete Anthology of Tang Poetry. We first analyze topic distributions under positive/negative sentiment in Tang poetry using a position-aware sentiment-topic model (Section 5.1). We then compare sentiment among different poets in the Tang Dynasty (Section 5.2).

The main contributions of our work are:

• We propose a graph-based method to build a sentiment lexicon for classical Chinese poetry. Our method is weakly supervised and does not rely on existing lexical resources (e.g., WordNet). It can be easily ported to other domains/languages.

• We evaluate our sentiment lexicon systematically and demonstrate that it can be utilized to analyze sentiment orientation of classical Chinese poetry.

• We analyze sentiment in Tang poetry on the basis of our sentiment lexicon. We apply a position-aware sentiment-topic model to extract themes which are tightly associated with positive/negative sentiment. Our model builds in specific assumptions that characterize sentiment expression in classical Chinese poetry. It assumes that lexical items from the same region are generated from a single sentiment-topic pair. We compare sentiment among different famous poets and show that our results are in accordance with studies in Chinese philology.

The poetry sentimen




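2 Related Work

Sentiment lexicons. In recent years, much work has addressed the creation of large polarity (positive, negative) lexicons, including various corpus-based approaches (Turney and Littman, 2003; Kanayama and Nasukawa, 2006; Kaji and Kitsuregawa, 2007; Kiritchenko et al., 2014) and dictionary-based approaches (Kamps et al., 2004; Esuli and Sebastiani, 2005; Mohammad et al., 2009; Baccianella et al., 2010). Previous graph-based approaches (Takamura et al., 2005; Rao and Ravichandran, 2009; Hassan et al., 2011) create sentiment lexicons on the basis of existing lexical resources (e.g., WordNet or a thesaurus). No such lexical resources exist for classical Chinese poetry. We therefore choose a corpus-based approach.

Although our method for building a sentiment lexicon is domain independent, in this paper we apply it to classical Chinese poetry. This is not a trivial task. Various reliable resources are available for sentiment analysis in English, but only a few sentiment lexicons are available for Chinese, and these target contemporary Chinese. Because they were developed for contemporary Chinese, they cover classical Chinese poetry only partially, and their entries may diverge from classical usage after thousands of years of language change. To improve sentiment analysis for Chinese, one line of work aims to leverage the rich English sentiment resources through machine translation (Wan, 2008; Wan, 2009; He et al., 2010). These approaches depend on the quality of machine translation, and translating classical Chinese poetry into English is difficult even for professional translators. Our work is similar to Zagibalov and Carroll (2008) in that both approaches are weakly supervised. They start from a small set of seed items and several lexical patterns (negated adverbial constructions) that can indicate sentiment-bearing words, and build a sentiment lexicon iteratively. However, such lexical patterns (e.g., 不 (not) 很 (quite) 满意 (satisfied), where 满意 is the target word) do not apply to classical Chinese poetry.

Computational analysis of classical Chinese poetry. Previous work has mainly focused on the generation of classical Chinese poetry (Zhou et al., 2010; He et al., 2012; Zhang and Lapata, 2014). Lee and Kong (2012) developed a treebank for the Complete Anthology of Tang Poetry. On the basis of this corpus, Lee and Tak-sum (2012) quantitatively analyzed semantic content and word usage in the Complete Anthology of Tang Poetry. Voigt and Jurafsky (2013) examined the classical character of poetry by comparing poetry with modern prose.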

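3 Building a Sentiment Lexicon for Classical Chinese Poetry

In this section, we briefly introduce Weighted Personalized PageRank (WPPR). We then describe in detail how we construct a lexical network and how we apply WPPR on this network to build a sentiment lexicon for classical Chinese poetry.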
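As a rough illustration of the idea, the sketch below runs personalized PageRank by power iteration on a small weighted, undirected graph, restarting the random walk at a set of seed nodes. The edge weights, the seed characters, the damping factor of 0.85, and the use of the difference between the positive-seed and negative-seed ranks as a polarity score are all assumptions made for this example. The construction of the actual lexical network and the exact scoring are not reproduced in this excerpt.

```python
def build_graph(weighted_edges):
    """Build a symmetric adjacency map: node -> {neighbor: weight}."""
    graph = {}
    for a, b, w in weighted_edges:
        graph.setdefault(a, {})[b] = w
        graph.setdefault(b, {})[a] = w
    return graph


def weighted_ppr(graph, seeds, damping=0.85, iterations=100):
    """Personalized PageRank with weighted transitions and restart at the seeds."""
    seeds = [s for s in seeds if s in graph]
    teleport = {node: 0.0 for node in graph}
    for s in seeds:
        teleport[s] = 1.0 / len(seeds)
    rank = dict(teleport)
    # Total edge weight per node; positive because every node has an edge.
    strength = {node: sum(nbrs.values()) for node, nbrs in graph.items()}
    for _ in range(iterations):
        new_rank = {node: (1.0 - damping) * teleport[node] for node in graph}
        for node, nbrs in graph.items():
            for nbr, w in nbrs.items():
                # Move along each edge in proportion to its weight.
                new_rank[nbr] += damping * rank[node] * w / strength[node]
        rank = new_rank
    return rank


def polarity_scores(graph, pos_seeds, neg_seeds):
    """Assumed scoring: rank under positive seeds minus rank under negative seeds."""
    pos = weighted_ppr(graph, pos_seeds)
    neg = weighted_ppr(graph, neg_seeds)
    return {node: pos[node] - neg[node] for node in graph}


# Hypothetical toy network; the weights are made up for illustration.
toy_graph = build_graph([
    ("喜", "花", 3.0), ("花", "春", 2.0),
    ("愁", "猿", 3.0), ("猿", "秋", 2.0),
    ("春", "秋", 1.0),
])
scores = polarity_scores(toy_graph, pos_seeds=["喜"], neg_seeds=["愁"])
print(scores["花"] > 0, scores["猿"] < 0)
```

Because the walk restarts at the whole seed set, each node's rank reflects its proximity to all seeds at once rather than to one seed at a time.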

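4.1 Intrinsic Evaluation

Test datasets. To evaluate our method, we compile two test datasets. The first dataset (SentiLexicon) contains 933 sentiment words taken from three Chinese sentiment lexicons. Although these lexicons are for contemporary Chinese, some words keep the same meaning and polarity as in classical Chinese poetry. We merge the three lexicons by removing duplicate or contradictory entries. This yields a large sentiment lexicon containing 12,945 positive words and 17,114 negative words. We then create the dataset by choosing the single-character and two-character words that do not appear in the set of sentiment seeds (Table 1) and that occur at least 50 times in the Complete Anthology of Tang Poetry. This results in a dataset containing 532 positive and 401 negative lexical items.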
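The selection steps above can be sketched as follows. This is a minimal illustration under stated assumptions: each lexicon is a mapping from word to `"pos"` or `"neg"`, a word with conflicting labels across lexicons is dropped entirely, and the words, seed set, and corpus frequencies are made-up placeholders rather than the paper's data.

```python
def merge_lexicons(lexicons):
    """Merge lexicons, collapsing duplicates and dropping contradictory entries."""
    merged, conflicting = {}, set()
    for lexicon in lexicons:
        for word, polarity in lexicon.items():
            if word in merged and merged[word] != polarity:
                conflicting.add(word)
            merged.setdefault(word, polarity)
    return {w: p for w, p in merged.items() if w not in conflicting}


def select_test_items(merged, seed_items, corpus_freq, min_freq=50):
    """Keep one- and two-character words that are not seeds and are frequent enough."""
    return {
        word: polarity
        for word, polarity in merged.items()
        if len(word) in (1, 2)
        and word not in seed_items
        and corpus_freq.get(word, 0) >= min_freq
    }


# Hypothetical inputs for illustration only.
lexicon_a = {"喜": "pos", "悲": "neg", "快乐": "pos", "冷": "neg"}
lexicon_b = {"喜": "pos", "冷": "pos", "不满意": "neg", "寂寞": "neg"}
toy_freq = {"喜": 900, "悲": 1200, "快乐": 12, "寂寞": 310, "不满意": 0}
merged = merge_lexicons([lexicon_a, lexicon_b])
test_items = select_test_items(merged, seed_items={"喜"}, corpus_freq=toy_freq)
print(sorted(test_items.items()))
```

In this toy run, "冷" is dropped as contradictory, "喜" as a seed, "快乐" as too rare, and "不满意" as too long.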

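However, SentiLexicon does not reflect an important aspect of classical Chinese poetry, namely that sentiment is often expressed through metaphor. Skilled poets frequently use concrete imagery to evoke emotions and feelings, and certain images carry fixed emotional connotations. For example, fallen autumn leaves (“落叶”) often refer to the decline of an individual or a dynasty. We call such words imagery words. We collected 55 typical imagery words (ImageryLexicon) from the literature on imagery analysis of Tang poetry. None of the words in ImageryLexicon appears in SentiLexicon. Table 2 shows some examples from ImageryLexicon.

Results on the test datasets. Table 3 shows the results of the method described in Section 3 (WPPR) and of the baseline (PMI) on the two test datasets. In both cases, our graph-based method outperforms the baseline. Our method is more robust than the baseline because it measures the similarity between a candidate lexical item and the whole set of positive (negative) sentiment seeds.

Evaluation on sampled data. Our test datasets (SentiLexicon and ImageryLexicon) cover only about 11.5% of the lexical items in our sentiment lexicon. To evaluate lexical items that are not in the test sets, we randomly selected 100 items (50 single-character and 50 two-character lexical items, with an equal distribution of positive and negative sentiment). They were checked manually by the first author. In this hard evaluation setting, we obtain an accuracy of 53%.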
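One way to draw such a balanced sample is to take the same number of items from each combination of item length and predicted polarity. The sketch below assumes 25 items per cell (two lengths times two polarities gives 100 items) and a fixed random seed for reproducibility. The synthetic lexicon is a placeholder built from arbitrary characters, not real lexicon entries.

```python
import random


def stratified_sample(lexicon, excluded, per_cell=25, seed=0):
    """Sample per_cell items for each (length, polarity) cell, skipping excluded items."""
    rng = random.Random(seed)
    sample = []
    for length in (1, 2):
        for polarity in ("pos", "neg"):
            pool = sorted(
                item
                for item, pol in lexicon.items()
                if len(item) == length and pol == polarity and item not in excluded
            )
            sample.extend(rng.sample(pool, per_cell))
    return sample


# Synthetic placeholder lexicon: arbitrary characters with alternating polarity.
base = [chr(0x4E00 + i) for i in range(120)]
toy_lexicon = {}
for i, ch in enumerate(base):
    polarity = "pos" if i % 2 == 0 else "neg"
    toy_lexicon[ch] = polarity
    toy_lexicon[ch + base[(i + 1) % len(base)]] = polarity

# `excluded` stands in for items already covered by the test datasets.
sample = stratified_sample(toy_lexicon, excluded={base[0]})
print(len(sample))
```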

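4.2 Extrinsic Evaluation

We also conduct an extrinsic evaluation to assess whether our sentiment lexicon can be used to analyze the sentiment orientation of classical Chinese poems. We select 160 poems from the Tang Poetry Analysis Dictionary (Xiao, 1999), which contains around 1,000 Tang poems with professional commentary. Based on the commentators' analyses, we manually annotate each poem as positive or negative. This results in a dataset (SentiPoetry) containing 83 negative poems and 77 positive poems. For each poem, we predict its sentiment from the accumulated sentiment orientation of all lexical items (single-character and two-character) in the poem. Specifically, a poem is predicted as positive if its accumulated sentiment orientation is greater than a threshold t, and as negative otherwise. A subset of the dataset containing 30 positive and 30 negative poems is used to tune the threshold t, and the remaining 100 poems are held out as test data. Table 4 shows the accuracy on the test set of predicting poem sentiment with the contemporary Chinese sentiment lexicon described in Section 4.1 and with the two lexicons built by the baseline (PMI) and by our method (WPPR). With our lexicon, the sentiment of 71% of the poems is predicted correctly, which is 14% better than with the PMI lexicon.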
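The prediction rule can be sketched as follows. The sketch assumes that two-character items are all adjacent character pairs, that items missing from the lexicon contribute zero, and that t is tuned by picking the candidate value with the highest accuracy on the development subset. The lexicon scores and the short character strings are invented for illustration and are not real poems or real lexicon values.

```python
def poem_score(poem, lexicon):
    """Accumulate lexicon scores over single characters and adjacent character pairs."""
    unigrams = list(poem)
    bigrams = [poem[i:i + 2] for i in range(len(poem) - 1)]
    return sum(lexicon.get(item, 0.0) for item in unigrams + bigrams)


def predict(poem, lexicon, t):
    """Positive if the accumulated orientation exceeds the threshold t, else negative."""
    return "pos" if poem_score(poem, lexicon) > t else "neg"


def accuracy(data, lexicon, t):
    """Fraction of (poem, label) pairs predicted correctly at threshold t."""
    correct = sum(1 for poem, label in data if predict(poem, lexicon, t) == label)
    return correct / len(data)


def tune_threshold(dev, lexicon):
    """Choose t among midpoints between consecutive development scores."""
    scores = sorted({poem_score(poem, lexicon) for poem, _ in dev})
    candidates = [scores[0] - 1.0]
    candidates += [(a + b) / 2 for a, b in zip(scores, scores[1:])]
    candidates.append(scores[-1])
    return max(candidates, key=lambda t: accuracy(dev, lexicon, t))


# Hypothetical lexicon scores and toy strings, for illustration only.
toy_lexicon = {"花": 1.0, "喜": 0.8, "猿": -1.0, "愁": -0.9, "落叶": -0.5}
dev = [("花喜", "pos"), ("猿愁", "neg"), ("落叶愁", "neg"), ("春花", "pos")]
t = tune_threshold(dev, toy_lexicon)
print(accuracy(dev, toy_lexicon, t), predict("猿落叶", toy_lexicon, t))
```

Tuning t on a held-out subset rather than fixing it at zero lets the rule compensate for a lexicon whose scores are skewed toward one polarity.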
