COLLABORATIVE FILTERING RECOMMENDER
SYSTEMS IN MUSIC RECOMMENDATION

Urszula Kuzelewska[1], Rafal Ducki[2]

[1] Faculty of Computer Science, Bialystok University of Technology, Bialystok, Poland
[2] Student of the Faculty of Computer Science, Bialystok University of Technology, Bialystok, Poland

Abstract: Nowadays, the primary place of information exchange is the internet. Its features, such as availability, unlimited capacity and diversity of information, have made it unrivalled in popularity and a powerful platform for the storage, dissemination and retrieval of information. On the other hand, internet data are highly dynamic and unstructured. As a result, internet users face the problem of data overload. Recommender systems help users to find the products, services or information they are looking for.

The article presents a recommender system for music artist recommendation. It is composed of user-based as well as item-based procedures, which can be selected dynamically during a user's session, and it includes different similarity measures. The following measures are used to assess the recommendations and adapt the appropriate procedure: RMSE, MAE, Precision and Recall. Finally, the generated recommendations and the calculated similarities among artists are compared with the results from the LastFM service.

Keywords: collaborative filtering, music recommendations, recommender systems

Introduction

Recommender systems (RS) are methods that address the problem of information filtering. Their task is to register and analyse a user's preferences and generate a personalised list of items. In other words, the systems filter the information that may be presented to the user based on their interests. As input data, they register product ratings, views of web sites, purchases of items, as well as specific characteristics or descriptions of the products [11].

Recommendations concern, among others, news, music, video, the content of e-learning courses, books, and the subject of web sites or web site navigation.

Music is regarded as a particularly difficult domain for the application of recommender systems [3]. It combines the fields of music information retrieval (MIR) and recommendations [14]. There are several approaches that address this problem. The easiest solution is to gather ratings from users; however, this type of data is difficult to obtain and can contain outliers and noise, sometimes introduced intentionally. Another approach is to count the tracks played by users and process the counts to form ratings, e.g. LastFM (http://www.lastfm.com). Finally, the input data can be users' playlists composed of their favourite songs and artists. There are also methods that process music streams, extracting fundamental and complex features from the recordings, e.g. Mufin (http://www.mufin.com) and Pandora (http://www.pandora.com).

The article presents a recommender system for music artist recommendation. It uses track play counts as input data. Different RS approaches, including similarity measures, have been implemented, evaluated using efficiency coefficients and compared with the results of the LastFM service. The paper is organised as follows: the next section introduces the recommender system domain: classification, problems, similarity measures and evaluation. The following part presents selected music recommendation solutions. The last two sections concern the experiments, the analysis of the results and the final conclusions.

Introduction to recommender systems

Recommender systems help customers to find interesting and valuable resources in internet services. Their priority is to create and examine users' individual profiles, which contain their preferences, and then to update the service content in order to increase the user's satisfaction. This section introduces recommender systems: their classification and main problems. It presents selected similarity measures and lists the most common approaches to the evaluation of recommendations.

2.1 Classification and problems in recommender systems

Depending on the type of input data and the methods used, recommender systems are divided into content-based, collaborative filtering (CF), knowledge-based and hybrid [9].

Content-based recommendations (also called content-based filtering) are based on attribute (characteristic) vectors of items, created from text connected with the items, e.g. their description, genre, etc. [11]. As an example, in the case of books, the item characteristics include genre, topic or author. Content-based algorithms recommend items that are similar to other items rated highly by the user in the past. For instance, if a user liked (rated or bought) movie X, the recommender system searches for other movies that are similar to X with regard to genre, title, director's name or description of the story. The main advantages of content-based systems are relatively simple implementation and independence of users. The disadvantages are the 'cold start' problem for users and the requirement to analyse the items' features.

The knowledge-based approach is better suited to stores with one-time customers, e.g. those selling cameras (people do not buy cameras often) [1]. The approach is based on technical attributes of the items and on user preferences, possibly weighted, related to those attributes. Knowledge acquisition is often realised through interaction with users. In this approach the 'cold start' problem does not appear and users' data do not need to be stored for a long time; however, such systems have to use specific techniques to gather the knowledge.

Collaborative filtering techniques search similarities



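Music recommendation

Music is a part of people's everyday life. We can listen to it on the radio or on the internet, or buy albums in stores. However, only promoted or the most popular music is easy to find. Recommender systems are a good tool to address this problem.

The most popular approaches to music recommendation are currently: collaborative filtering, content-based information retrieval, emotion-based models and context-based models [14].

Collaborative filtering music recommenders are based on the history of track plays or on direct music ratings. An interesting solution is automatic playlist generation [3], in which collocated artists are identified on the basis of their co-occurrence in playlists.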
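The idea of identifying related artists from playlist co-occurrence can be illustrated with a minimal sketch. It is not the method of [3]; the function name `artist_cooccurrence` and the sample playlists are hypothetical, and the sketch simply counts how many playlists contain each pair of artists.

```python
from collections import Counter
from itertools import combinations


def artist_cooccurrence(playlists):
    """Count, for every pair of artists, the number of playlists containing both.

    `playlists` is an iterable of artist-name lists (a hypothetical input format).
    """
    counts = Counter()
    for playlist in playlists:
        # Each artist counts once per playlist; pairs are stored in sorted order.
        for a, b in combinations(sorted(set(playlist)), 2):
            counts[(a, b)] += 1
    return counts


# Hypothetical playlists, for illustration only.
playlists = [
    ["Artist A", "Artist B", "Artist C"],
    ["Artist A", "Artist B"],
    ["Artist B", "Artist C", "Artist C"],
]
pairs = artist_cooccurrence(playlists)
```

Pairs with high counts can then be treated as candidates for similar artists; in practice the raw counts would usually be normalised by artist popularity.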

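Content-based procedures analyse the descriptions, features or acoustic characteristics of songs [5]. Data mining algorithms, such as clustering or kNN classification, are then applied to the extracted features.

Emotion-based models are similar to the content-based approach, but prefer perceptual features such as energy, rhythm, temporal, spectral and harmony features [2].

Context-based methods use public opinion to discover and recommend music [10]. Popular social networking sites provide rich human knowledge, such as comments, music reviews, tags and friendship relations. Context-based techniques collect this information to identify artist similarity, genre classification, emotion detection or semantic space.

Music databases are an example of a huge data source. There are numerous music artists, and even more music fans. Although there are several popular music discovery sites, such as LastFM, Allmusic (http://www.Allmusic.com) or Pandora (http://www.Pandora.com), many new approaches appear in scientific articles. The most frequently proposed are hybrid recommender systems, which are a good solution for coping with the size of the data [6].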

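Experiments

This section presents experimental results for a hybrid music recommender system. The system was created as part of a master's thesis at Bialystok University of Technology [7]. It is a web application that uses the Apache Mahout library [16].

The training data were extracted from the LastFM music service in the form of a text file, as shown in Figure 1. The set contains 500 users, who listened to 13680 tracks by 4436 artists. On average, one user listened to 27.36 tracks, and one artist was played 3.08 times. Each line of the file contains a user id, an artist name and the name of a played track.

Figure 1. Data extracted from the LastFM service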
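The dataset statistics quoted above can be recomputed from a file of this shape with a short sketch. The tab separator, the function name `dataset_stats` and the sample lines are assumptions made for illustration; the text does not state the exact file format beyond the three fields per line.

```python
from collections import defaultdict


def dataset_stats(lines, sep="\t"):
    """Summarise 'user id<sep>artist<sep>track' lines (separator is an assumption)."""
    plays_per_user = defaultdict(int)
    plays_per_artist = defaultdict(int)
    total = 0
    for line in lines:
        parts = line.rstrip("\n").split(sep, 2)
        if len(parts) != 3:
            continue  # skip blank or malformed lines
        user_id, artist, _track = parts
        plays_per_user[user_id] += 1
        plays_per_artist[artist] += 1
        total += 1
    n_users, n_artists = len(plays_per_user), len(plays_per_artist)
    return {
        "plays": total,
        "users": n_users,
        "artists": n_artists,
        "plays_per_user": total / n_users if n_users else 0.0,
        "plays_per_artist": total / n_artists if n_artists else 0.0,
    }


# Hypothetical sample lines, not taken from the LastFM data.
sample = [
    "u1\tArtist A\tSong 1",
    "u1\tArtist B\tSong 2",
    "u2\tArtist A\tSong 1",
    "u2\tArtist A\tSong 3",
]
stats = dataset_stats(sample)
```

With the figures reported in the text, the same averages follow directly: 13680 / 500 = 27.36 plays per user and 13680 / 4436 ≈ 3.08 plays per artist.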

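The purpose of the practical part of the article was to build and evaluate a recommender system in a real environment (see Figure 2). This concerns the source of the data, the deployment of the application on a server, and the method of evaluation, which includes a comparison with LastFM's recommendation lists. To make its recommendations effective, the system has to adapt the recommendation procedure to the needs of the active user.

One of the main problems was preprocessing the data in order to assign rating values that relate users to tracks. It was performed using the tracks' play counts. The first step was to normalise the play counts (see Equation 10) and map the results to integers from the range [1, 2, 3, 4, 5]. The symbols used in the equation are as follows: r(ui, tj) denotes the rating value, |ui(tj)| is the number of times user ui played track tj, |ui| is the total number of plays by user ui, and the remaining term denotes the maximum number of plays of a particular track by a single user in the input data.

Figure 2. Architecture of the created recommender system

The normalisation does not affect the relationships in the data; the most frequently played artists and their popularity are presented in Figure 3.

Figure 3. Popularity of the 50 most frequently played artists (right: processed data)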

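The density of the user-rating matrix, calculated using Equation 11 (p is the number of ratings, m the number of users and n the number of artists), is 0.62%. This is high enough to apply collaborative filtering procedures without negatively affecting the time needed to generate the recommendation lists.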
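The reported density can be checked with a small calculation. The sketch below assumes that Equation 11 has the usual form p / (m · n) and, as a further assumption, takes p = 13680 from the dataset description; neither is confirmed by the text shown here.

```python
def rating_density(num_ratings, num_users, num_items):
    """Fraction of filled cells in a users x items rating matrix: p / (m * n)."""
    if num_users <= 0 or num_items <= 0:
        raise ValueError("the matrix must have at least one user and one item")
    return num_ratings / (num_users * num_items)


# Assumed values: p = 13680 ratings, m = 500 users, n = 4436 artists.
density = rating_density(13680, 500, 4436)
print(f"{density:.2%}")  # prints 0.62%
```

Under these assumptions the result agrees with the 0.62% stated in the text.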

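In the processed data, ratings with value 1 are the most frequent (47.41%), followed by value 2 (25.58%), value 3 (11.24%), value 5 (9.9%) and value 4 (5.86%). It is worth mentioning that the choice of the rating range was not determined only by its popularity: experiments using the range [1, ..., 10] gave worse RMSE values.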
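For reference, the evaluation measures named in the abstract (RMSE, MAE, Precision and Recall) can be sketched as follows. These are the standard textbook definitions written as plain Python; the function names are illustrative and this is not the evaluation code used by the authors.

```python
import math


def rmse(predicted, actual):
    """Root mean squared error between two equally long rating sequences."""
    if len(predicted) != len(actual) or not actual:
        raise ValueError("sequences must be non-empty and of equal length")
    return math.sqrt(sum((p - a) ** 2 for p, a in zip(predicted, actual)) / len(actual))


def mae(predicted, actual):
    """Mean absolute error between two equally long rating sequences."""
    if len(predicted) != len(actual) or not actual:
        raise ValueError("sequences must be non-empty and of equal length")
    return sum(abs(p - a) for p, a in zip(predicted, actual)) / len(actual)


def precision_recall(recommended, relevant):
    """Precision and recall of a recommendation list against a set of relevant items."""
    recommended, relevant = set(recommended), set(relevant)
    hits = len(recommended & relevant)
    precision = hits / len(recommended) if recommended else 0.0
    recall = hits / len(relevant) if relevant else 0.0
    return precision, recall
```

RMSE penalises large errors more strongly than MAE, while Precision and Recall assess the recommendation list itself rather than the predicted rating values.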

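The algorithms used in the experiments were user-based and item-based. The similarity measures were: the cosine measure, the Pearson and Spearman correlation measures, the Tanimoto coefficient, and measures based on the Manhattan and Euclidean distances.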
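Several of the listed similarity measures can be sketched in plain Python as follows. Each user is represented as a hypothetical dictionary mapping an artist to a rating. Computing the measures over co-rated artists only, and converting a distance d into a similarity as 1 / (1 + d), are assumptions of this sketch and not necessarily the formulas used in the experiments; Spearman correlation (Pearson applied to ranks) is omitted for brevity.

```python
import math


def _common(a, b):
    """Artists rated by both users."""
    return sorted(set(a) & set(b))


def cosine(a, b):
    keys = _common(a, b)
    dot = sum(a[k] * b[k] for k in keys)
    norm_a = math.sqrt(sum(a[k] ** 2 for k in keys))
    norm_b = math.sqrt(sum(b[k] ** 2 for k in keys))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def pearson(a, b):
    keys = _common(a, b)
    n = len(keys)
    if n < 2:
        return 0.0
    mean_a = sum(a[k] for k in keys) / n
    mean_b = sum(b[k] for k in keys) / n
    cov = sum((a[k] - mean_a) * (b[k] - mean_b) for k in keys)
    var_a = sum((a[k] - mean_a) ** 2 for k in keys)
    var_b = sum((b[k] - mean_b) ** 2 for k in keys)
    return cov / math.sqrt(var_a * var_b) if var_a and var_b else 0.0


def tanimoto(a, b):
    """Set overlap |A and B| / |A or B|; rating values are ignored."""
    union = len(set(a) | set(b))
    return len(set(a) & set(b)) / union if union else 0.0


def euclidean_similarity(a, b):
    keys = _common(a, b)
    if not keys:
        return 0.0
    distance = math.sqrt(sum((a[k] - b[k]) ** 2 for k in keys))
    return 1.0 / (1.0 + distance)  # assumed distance-to-similarity mapping


def manhattan_similarity(a, b):
    keys = _common(a, b)
    if not keys:
        return 0.0
    distance = sum(abs(a[k] - b[k]) for k in keys)
    return 1.0 / (1.0 + distance)  # assumed distance-to-similarity mapping


# Hypothetical ratings, for illustration only.
u1 = {"Artist A": 5, "Artist B": 3, "Artist C": 1}
u2 = {"Artist A": 4, "Artist B": 2, "Artist D": 5}
```

The Tanimoto coefficient differs from the other measures in that it compares only which artists were rated, which makes it usable when rating values are unreliable.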

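In the user-based approach it is necessary to determine the neighbourhood of the active user. The most common method is to identify its k nearest neighbours (the kNN method). The number of neighbours is important and affects the accuracy of the recommendations. Table 1 contains RMSE results for different values of k.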
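A user-based kNN prediction of this kind can be sketched as follows. This is a simplified illustration, not the Apache Mahout implementation used in the experiments: the neighbours are chosen only among users who rated the target item, the prediction is a similarity-weighted average, and the names `predict_rating` and `euclidean_similarity` as well as the sample ratings are hypothetical.

```python
import math


def euclidean_similarity(a, b):
    """Similarity 1 / (1 + d) over co-rated items (an assumed mapping)."""
    keys = set(a) & set(b)
    if not keys:
        return 0.0
    return 1.0 / (1.0 + math.sqrt(sum((a[k] - b[k]) ** 2 for k in keys)))


def predict_rating(ratings, active_user, item, similarity=euclidean_similarity, k=2):
    """Predict a rating as the similarity-weighted mean over the k nearest neighbours."""
    candidates = []
    for other, other_ratings in ratings.items():
        if other == active_user or item not in other_ratings:
            continue
        sim = similarity(ratings[active_user], other_ratings)
        if sim > 0:
            candidates.append((sim, other_ratings[item]))
    # Keep the k most similar users who rated the item.
    neighbours = sorted(candidates, key=lambda c: c[0], reverse=True)[:k]
    total = sum(sim for sim, _ in neighbours)
    if total == 0:
        return None  # no usable neighbours
    return sum(sim * rating for sim, rating in neighbours) / total


# Hypothetical ratings on the 1..5 scale.
ratings = {
    "u1": {"A": 5, "B": 3},
    "u2": {"A": 5, "B": 3, "C": 4},
    "u3": {"A": 1, "B": 5, "C": 2},
    "u4": {"A": 4, "B": 3, "C": 5},
}
prediction = predict_rating(ratings, "u1", "C", k=2)
```

A larger k averages over more users, which tends to smooth the prediction but requires more similarity computations per request.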

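Table 1. RMSE of the user-based approach for different numbers of neighbours and different similarity measures

For most similarity measures, the RMSE values decrease as the size of the neighbourhood increases. Both correlation coefficients are the exception. A larger number of neighbours requires more time to identify and process them. Taking the above into account, the best results are obtained for neighbourhoods of size 100 and 250 and the Euclidean-based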
