Ask'n'Read
A Powerful Web Data Crawler
Ask'n'Read prioritizes web data retrieval. The retrieval criteria are either source type (news sites, corporate websites, blogs, etc.) or the importance of the source type.
Most of the two million websites that Ask'n'Read monitors are categorized. Around two hundred thousand of them publish new information every day. As a result, most of the one million new links retrieved every day are assigned to relevant categories.
Ask'n'Read monitors websites written in English, French, German, Spanish, Italian and Portuguese, as well as in Chinese, Arabic, Polish, Romanian and Russian.
Ask'n'Read can retrieve text data in different formats. The simplest is the news feed (RSS, Atom, etc.), which many websites provide. But as technologies change continuously, QWAM has developed alternatives to retrieve specific data.
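To make the news feed case concrete, here is a minimal sketch of reading titles and links from an RSS 2.0 document with Python's standard library. It is only an illustration: the sample feed, the example.com URLs and the function name parse_rss_items are assumptions, and nothing here describes how Ask'n'Read itself retrieves feeds.

```python
import xml.etree.ElementTree as ET

# Hypothetical RSS 2.0 feed used only for this illustration.
SAMPLE_RSS = """<?xml version="1.0"?>
<rss version="2.0">
  <channel>
    <title>Example feed</title>
    <item><title>First</title><link>https://example.com/1</link></item>
    <item><title>Second</title><link>https://example.com/2</link></item>
  </channel>
</rss>"""


def parse_rss_items(xml_text):
    """Return a list of (title, link) pairs, one per <item> element."""
    root = ET.fromstring(xml_text)
    return [
        (item.findtext("title", default=""), item.findtext("link", default=""))
        for item in root.iter("item")
    ]


if __name__ == "__main__":
    for title, link in parse_rss_items(SAMPLE_RSS):
        print(title, link)
```

Atom feeds use namespaced elements (entry, link with an href attribute), so a real reader would need a second code path for them.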
Ask'n'Read can monitor the content or the links of a site and detect changes to them. It is also possible to define how a web page should be analyzed to retrieve the most relevant information.
The web data collected by Ask'n'Read is stored in a single repository. Once stored, heterogeneous data is made homogeneous to meet categorization criteria, encoding, and so forth. Various applications, of which Ask'n'Read is but one, source data from the repository via APIs. The data is easy to extract and easy to export to different applications.
Every Ask'n'Read software and hardware choice has been made to meet our clients' evolving expectations.
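One common way to detect such changes, shown here as a hedged sketch rather than as Ask'n'Read's actual method, is to compare a fingerprint of the page text between two visits and to diff the sets of links found on the page. All function and class names below are hypothetical.

```python
import hashlib
from html.parser import HTMLParser


def fingerprint(page_text):
    """SHA-256 digest of the text after collapsing runs of whitespace."""
    normalized = " ".join(page_text.split())
    return hashlib.sha256(normalized.encode("utf-8")).hexdigest()


def has_changed(previous_text, current_text):
    """True if the two snapshots differ beyond whitespace."""
    return fingerprint(previous_text) != fingerprint(current_text)


class LinkCollector(HTMLParser):
    """Collects the href value of every <a> tag it sees."""

    def __init__(self):
        super().__init__()
        self.links = set()

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            for name, value in attrs:
                if name == "href" and value:
                    self.links.add(value)


def extract_links(html):
    collector = LinkCollector()
    collector.feed(html)
    return collector.links


def new_links(previous_html, current_html):
    """Links present in the current snapshot but not in the previous one."""
    return extract_links(current_html) - extract_links(previous_html)
```

For example, new_links('<a href="/a">a</a>', '<a href="/a">a</a><a href="/b">b</a>') yields the single link "/b".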
A Data Classifier
Ask'n'Read categorizes the data according to different criteria:
- Language, geography
- Type (corporate website, blog, dashboard)
- Sector, activities, etc.
Categories can be combined to fine-tune data qualification.
In addition to this, Ask'n'Read can manage corpuses of websites or news feeds. If you want to follow specific sources of information, or if you know precisely what the scope of your data should be, a corpus is a perfect solution. If needed, QWAM will help you define it.
Initially, QWAM manually categorized much of the data from referenced websites and news feeds. Categorization is continually improving, whether through dictionaries and catalogs or through machine learning scripts.
Ask'n'Read is available in SaaS mode (Software as a Service), which means every improvement to the database and categorizations benefits all our clients.
Still, if a client has specific needs, custom categories can be developed in the same way that corpuses can be personalized.
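As a rough illustration of the dictionary-based approach, the sketch below assigns a text to every sector whose term list shares at least one word with it. The SECTOR_TERMS dictionary and the categorize function are invented for this example and do not reflect QWAM's actual dictionaries or scripts.

```python
import re

# Hypothetical sector dictionary; real dictionaries would be far larger.
SECTOR_TERMS = {
    "energy": {"oil", "solar", "turbine"},
    "finance": {"bank", "loan", "bond"},
}


def categorize(text, dictionary=SECTOR_TERMS):
    """Return the sorted list of sectors with at least one term in the text."""
    words = set(re.findall(r"\w+", text.lower()))
    return sorted(sector for sector, terms in dictionary.items() if words & terms)


if __name__ == "__main__":
    print(categorize("The bank financed a solar farm"))
```

A text can fall into several categories at once, which is consistent with combining categories to refine qualification.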
An Efficient Full-Text Search Engine
The easiest way to filter Ask'n'Read data is to define a query based on keywords. Our search engine allows you to make simple boolean queries processed by morphology scripts (stemmers and lemmatizers). You can specify exact keywords to avoid "false friends" that could result from morphological simplification.
The Ask'n'Read web interface allows you to query in two ways. The first, simple way lets you paste keywords into three distinct fields:
- the keywords the results must contain,
- the keywords the results may contain (at least one),
- the keywords the results must not contain
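The logic behind these three fields can be sketched as a small boolean filter. This is a simplified illustration with hypothetical names (matches, must, may, must_not). It matches exact lowercase words only and skips the stemming and lemmatization that the real search engine applies.

```python
import re


def tokenize(text):
    """Lowercase the text and return its set of words."""
    return set(re.findall(r"\w+", text.lower()))


def matches(text, must=(), may=(), must_not=()):
    """Apply the three keyword fields to one text.

    must:     every keyword has to appear
    may:      at least one keyword has to appear (ignored if empty)
    must_not: no keyword may appear
    """
    words = tokenize(text)
    if any(k.lower() not in words for k in must):
        return False
    if may and not any(k.lower() in words for k in may):
        return False
    if any(k.lower() in words for k in must_not):
        return False
    return True
```

For instance, matches("Acme acquires a Paris startup", must=["acme"], may=["paris", "london"], must_not=["rumor"]) is true, while adding "startup" to must_not makes it false.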
The second way is an advanced interface that allows for extended query syntax.
The extended syntax gives you access to more options and lets you refine your queries further. Examples of extended syntax criteria are:
- proximity criteria (this keyword should not be separated from this other keyword by more than X words).
- quorum criteria (from this list of keywords, at least X should appear in the text).
- field restriction criteria (this part of the query must match only the titles of the results, not the content).
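The three kinds of criteria above can be sketched in a few lines. The sketch only shows what each criterion checks. It does not reproduce the Ask'n'Read query syntax, and the function names and the document layout (a dictionary with "title" and "content" keys) are assumptions.

```python
import re


def tokens(text):
    """Lowercase the text and return its words in order."""
    return re.findall(r"\w+", text.lower())


def within_proximity(text, first, second, max_gap):
    """True if first and second occur with at most max_gap words between them."""
    words = tokens(text)
    first_positions = [i for i, w in enumerate(words) if w == first.lower()]
    second_positions = [i for i, w in enumerate(words) if w == second.lower()]
    return any(
        abs(i - j) - 1 <= max_gap
        for i in first_positions
        for j in second_positions
        if i != j
    )


def meets_quorum(text, keywords, minimum):
    """True if at least `minimum` distinct keywords appear in the text."""
    words = set(tokens(text))
    found = {k.lower() for k in keywords} & words
    return len(found) >= minimum


def title_contains(document, keyword):
    """Field restriction: look for the keyword in the title only."""
    return keyword.lower() in tokens(document["title"])
```

In "the merger of Acme and Globex was announced", three words separate "merger" from "Globex", so within_proximity accepts the pair with max_gap=3 and rejects it with max_gap=2.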
The extended syntax is the perfect solution to avoid data noise in the results.
We know how tedious it can be to write full-text queries that retrieve relevant, noise-free results. QWAM teams can guide and support you in creating the most appropriate queries.