Influence Factors For Online Dating Profit

by B. Sc. Mathias Riechert (Author), Xiaomin Su (Author), Han Chen Hsu (Author)

Project Report, 2010, 29 Pages

Computer Science - Commercial Information Technology

Excerpt

Table of Contents

Table of Figures

1 Introduction

2 Related Work

3 Conceptualization of Processes of Discovery

4 Pre-Processing and Post-Processing

5 Application
5.1 How do users interact?
5.2 Which users are likely to pay based on dynamic data?
5.3 Which users are likely to pay based on static data?
5.4 Combine the findings

6 Comparison to Traditional Tools

7 Conclusion

References

Appendix - Proc SQL

Table of Figures

Figure 1: Report outline

Figure 2: Process conceptualization

Figure 3: Data Preparation

Figure 4: Bucket transformation (behaviour variables) for Decision Tree

Figure 5: Regression Analysis with the variable “Stamps” as target

Figure 6: Compare the Activity between Male and Female

Figure 7: Compare the Kiss Received of Male and Female in different Social Status

Figure 8: Regression Analysis results

Figure 9: Dynamic variables decision tree

Figure 10: Regression Analysis for static variables

Figure 11: Nearest Neighbour Cluster Results for static variables

Figure 12: Decision Tree application

Figure 13: Decision Tree Rule Alternatives

Figure 15: Classification / Prediction Rules combined

Table 1: Customer details

Table 2: Method alternatives

Table 3: Frequency Activity of Cluster

Table 4: Section Result

Table 5: Influence of dynamic variables

Table 6: Behaviour variable rules for addressing users buying many stamps

Table 7: Rules Evaluation

Table 8: Rules for the customer group most likely to pay money

Table 9: Static Rules Evaluation

1 Introduction

‘. . . Knowledge Discovery is the most desirable end-product of computing. Finding new phenomena or enhancing our knowledge about them has a greater long-range value than optimizing production processes or inventories, and is second only to tasks that preserve our world and our environment. It is not surprising that it is also one of the most difficult computing challenges to do well . . .’ (Wiederhold, 1996).

The main objective of knowledge discovery in data mining is to find patterns in data. Knowledge about current customers can be used to predict which customers will be profitable, based on their personal information. This exploratory report analyses different data mining methods for predicting the profitable customers of a dating site. A second key aspect is matching individual customers based on their personal information.

The dataset analysed is derived from the customer database of Australia’s largest dating site, which has over 1.9 million members. It contains static and dynamic activity data. Static activity comprises all personal, demographic and interest information that customers enter at registration. Dynamic activity comprises the emails sent, channels communicated and kisses sent.

illustration not visible in this excerpt

Table 1: Customer details

Table 1 lists the customer details. A separate data table holds the information for users without stamps.

The given data raises several questions for analysis:

- How do users interact?
- Who is likely to pay money based on static behaviour?
- Who is likely to pay money based on dynamic behaviour?
- What makes a person purchase a stamp?

These questions lead to the following report outline:

illustration not visible in this excerpt

Figure 1: Report outline

The basis for the report is the user behaviour analysis. After a general analysis, the focus shifts to determining which users are likely to pay for the service, using both dynamic and static data. In the final step, the combined findings are used to propose an implementation strategy for the future development of the website.

2 Related Work

Online social networks and identity representation are active research areas with input from computer science, statistics, sociology and psychology. Studies on psychological aspects of social identity representation examine the social implications of displaying public identities (Donath, 2004). This paper aims to analyse the interaction between the users of an online dating website and how it influences their payment behaviour.

Toma (2008) addresses the self-presentation issue by observing the characteristics of users to establish the truthfulness of online dating profiles. Hu and Zeng (2007) also use a framework to predict users’ identity from their self-presentation history. While their proposed algorithm achieved high prediction accuracy, their method cannot clarify whether the predicted traits are real or fabricated.

There are some recent academic studies on online social interaction using popular networks. Caverlee and Webb (2008) studied the characteristics of MySpace profiles based on facets of this social network. Their paper is similar to our work; however, its focus was to identify elements of sociability and to explain differences in language use by gender. Work on other social networks such as Facebook also focuses on identity presentation and information sharing in student networks. Acquisti & Gross (2006) and Tufekci (2008) also examined the disclosure behaviour of MySpace and Facebook users in relation to privacy issues. The authors proposed a methodology for clustering and identifying similarity in users’ behaviour on YouTube data. Lerman and Jones (2006) used a small data sample from Flickr and found that the social network is used to locate new content on the site. Nowwell (2003) investigated co-authorship networks in physics to test how well different graph proximity metrics can predict future collaborations.

The paper at hand focuses on analysing the monetary aspect of an online dating website based on the user profiles. The company will benefit from the resulting prediction rules. As in the work of Caverlee and Webb, the basis is the analysis of user behaviour. This is extended by a more prediction-oriented analysis, which has not previously appeared in the scientific literature in the context of dating websites. Accordingly, the website can focus on its target customers and try to attract potential customers.

3 Conceptualization of Processes of Discovery

illustration not visible in this excerpt

Figure 2: Process conceptualization

Figure 2 depicts the process conceptualization of the report. Its structure is based on Figure 1. For the User Behaviour Analysis, a Regression Analysis provides the first overview by showing the influence of the different behaviour attributes. Regression is a powerful tool for analysing data with an interval target variable (“Stamps”). It requires the data to be cleaned beforehand: missing values have to be imputed and skewed variables have to be transformed to achieve a good result. Afterwards, a Cluster Analysis helps to categorize and analyse the data in more detail. The resulting behaviour is described in written form to provide the basis for the next steps of the analysis.
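The cleaning steps named above can be illustrated with a minimal sketch. It is not the tool chain used in this report. It assumes median imputation for missing values, a log(1 + x) transformation for skewed variables and a single-predictor least-squares fit. The function names, the variable names and the toy values are hypothetical and serve only as illustration.

```python
import math
from statistics import median


def impute_median(values):
    """Replace missing entries (None) with the median of the observed ones."""
    observed = [v for v in values if v is not None]
    fill = median(observed)
    return [fill if v is None else v for v in values]


def log_transform(values):
    """Compress a right-skewed, non-negative variable with log(1 + x)."""
    return [math.log1p(v) for v in values]


def fit_line(x, y):
    """Ordinary least squares for one predictor: y ~ intercept + slope * x."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return intercept, slope


# Hypothetical toy data: emails sent per user (one value missing)
# and the number of stamps each user bought (the interval target).
emails_sent = [0, 2, None, 5, 40, 3]
stamps = [0, 1, 1, 2, 6, 1]

emails_clean = log_transform(impute_median(emails_sent))
intercept, slope = fit_line(emails_clean, stamps)
```

A positive slope on such data would indicate that more active users tend to buy more stamps; a regression with several behaviour attributes follows the same idea with more predictors.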

In the second step, the focus shifts to the payment aspect. The aim is to analyse which customers will pay for the service based on their dynamic and static behaviour. The Neural Network is used to classify the data and to indicate which variables are relevant. The Decision Tree is used to compare the classification results.
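To make the Decision Tree idea concrete, the following sketch searches for a single split on one behaviour variable. It assumes Gini impurity as the split criterion, which is one common choice and not necessarily the criterion used in this report. A full tree repeats this search recursively on each branch. The variable names and values are hypothetical.

```python
def gini(labels):
    """Gini impurity of a list of 0/1 labels (1 = paying user)."""
    if not labels:
        return 0.0
    p = sum(labels) / len(labels)
    return 2 * p * (1 - p)


def best_split(values, labels):
    """Return (threshold, weighted_gini) for the best rule 'value <= threshold'.

    Returns (None, gini(labels)) if no split improves on the unsplit data.
    """
    n = len(values)
    best = (None, gini(labels))
    # The largest value is skipped because it would leave the right branch empty.
    for threshold in sorted(set(values))[:-1]:
        left = [l for v, l in zip(values, labels) if v <= threshold]
        right = [l for v, l in zip(values, labels) if v > threshold]
        score = (len(left) * gini(left) + len(right) * gini(right)) / n
        if score < best[1]:
            best = (threshold, score)
    return best


# Hypothetical toy data: kisses sent per user and whether the user paid.
kisses_sent = [0, 1, 2, 3, 10, 12, 15, 20]
paid = [0, 0, 0, 0, 1, 1, 1, 1]

threshold, impurity = best_split(kisses_sent, paid)
```

On this toy data the search returns the rule "kisses sent <= 3" with an impurity of zero, because that threshold separates paying from non-paying users completely. Real data would rarely split this cleanly.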

In the third step, the resulting rules are combined into a rule set that predicts how likely a customer with particular attributes is to spend money. In the last step, the rule set is converted into an implementation proposal.
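One possible way to combine rules into a rule set is sketched below. The actual rules are derived later in the report and are not part of this excerpt, so every rule, threshold and field name here is hypothetical. The combination policy (at least one dynamic rule and at least one static rule must apply) is likewise an assumption chosen for illustration.

```python
from typing import Callable, Dict, List, Tuple

# A rule is a readable name plus a predicate on a user record.
Rule = Tuple[str, Callable[[Dict], bool]]

# Hypothetical rules; the thresholds are placeholders, not findings.
DYNAMIC_RULES: List[Rule] = [
    ("many kisses sent", lambda u: u["kisses_sent"] >= 10),
    ("many emails sent", lambda u: u["emails_sent"] >= 5),
]
STATIC_RULES: List[Rule] = [
    ("age 30 to 45", lambda u: 30 <= u["age"] <= 45),
]


def matched_rules(user: Dict, rules: List[Rule]) -> List[str]:
    """Names of all rules whose predicate holds for this user."""
    return [name for name, applies in rules if applies(user)]


def likely_to_pay(user: Dict) -> bool:
    """Assumed policy: one dynamic and one static rule must both apply."""
    dynamic_hit = bool(matched_rules(user, DYNAMIC_RULES))
    static_hit = bool(matched_rules(user, STATIC_RULES))
    return dynamic_hit and static_hit


# Hypothetical user records.
active_user = {"kisses_sent": 12, "emails_sent": 1, "age": 34}
young_user = {"kisses_sent": 12, "emails_sent": 1, "age": 20}
```

Keeping the rule names alongside the predicates means the website can report which rules applied to a user, which supports the conversion into an implementation proposal.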

illustration not visible in this excerpt

Table 2: Method alternatives

[...]


[1] (Nayak, 2010)

[2] (Nayak, 2010)

Details

Pages: 29
Year: 2010
ISBN (eBook): 9783640913701
ISBN (Book): 9783640912445
File size: 3.7 MB
Language: English
Catalog Number: v171760
Institution / College: Queensland University of Technology
Grade: 1,0
Tags: Data Mining Online Dating Communities Influence Factors Profit Analysis Decision Tree Clustering Rule Export Regression
