2015-07-21  Carlos Garnacho

	Release 1.5.1

	libtracker-miner: Avoid full table scans on recursive sparql buffer queries

	If MATCH_CHILDREN is specified for a TrackerTask, we use
	tracker:uri-is-descendant(); it's however smarter to use
	fn:starts-with, as that'll resort to sqlite tricks that avoid full
	table scans.

	libtracker-miner: Remove operations on children on deleted folders

	This is an optimization to reduce the number of queries that we
	perform across the deletion of large directory trees.

	libtracker-miner: Only set MATCH_CHILDREN on tasks for directory files

	It's a query that can be avoided for non-directories, so better
	avoid it.

	libtracker-miner: Add tracker_file_notifier_get_file_type()

	Just plug the hole from the internal TrackerFileSystem; this will be
	handy for fast file type checks at the TrackerMinerFS level.

	libtracker-miner: Be smarter at not triggering TrackerDecorator activity

	There are times when the tracker-extract GraphUpdated handler fires
	due to its own inserts. Doing so is harmless, but each time it
	triggers a query for the count of unhandled elements that
	effectively goes nowhere, as the decorator is already active.

	So handle_updates() is now smarter at not triggering activity unless
	a resource of the inspected classes is added, and graph_updated_cb()
	no longer triggers the count query every time.

	libtracker-data: delete elements from the Resource table

	On deletion, items with a specific row ID are removed from all
	tables but the Resource one, which holds the urn:uuid:... mappings.
	Leaving those rows behind led to confusion in the fts_view view and
	ultimately the FTS table, as both indirectly depend on the elements
	stored there, so the deleted rows still had an FTS representation,
	just filled with nulls.

	This looks like it was just forgotten; if it was there to cover
	constraint errors, it's better to just open Pandora's box and fix
	the bugs we receive. Anyhow, from testing the most common scenarios,
	it works alright.
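The fn:starts-with change above relies on SQLite answering a prefix match with an index range instead of a full table scan. The log does not say which "sqlite trick" is involved; this sketch assumes the classic rewrite of starts_with(col, P) into col >= P AND col < P', where P' is P with its last byte incremented (the idea behind SQLite's LIKE optimization). The helper names prefix_upper_bound() and in_prefix_range() are hypothetical, not Tracker API.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical helper: compute the exclusive upper bound P' for the
 * range query  col >= prefix AND col < P'.  Trailing 0xFF bytes cannot
 * be incremented, so they are dropped first.  Returns false when no
 * finite bound exists (empty prefix or all bytes 0xFF); then only the
 * lower bound applies. */
static bool
prefix_upper_bound (const char *prefix, char *out, size_t out_size)
{
	size_t len;

	assert (prefix != NULL && out != NULL);
	len = strlen (prefix);
	assert (out_size > len); /* room for the copy plus terminator */

	memcpy (out, prefix, len + 1);

	while (len > 0 && (unsigned char) out[len - 1] == 0xFF)
		len--;

	if (len == 0)
		return false;

	out[len - 1] = (char) ((unsigned char) out[len - 1] + 1);
	out[len] = '\0';
	return true;
}

/* Same answer as starts_with (value, lower), expressed as two ordered
 * comparisons, which is what an index on the column can serve.
 * strcmp() compares bytes as unsigned char, matching the bound above. */
static bool
in_prefix_range (const char *value, const char *lower, const char *upper)
{
	return strcmp (value, lower) >= 0 &&
	       (upper == NULL || strcmp (value, upper) < 0);
}
```

Keeping the trailing '/' in the prefix matters: with "file:///a/" as the lower bound, the upper bound becomes "file:///a0", so a sibling such as "file:///ab" falls outside the range.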
	parser: Optimize 0-length string parsing

	We were still creating the ICU parser and trying to feed it with
	data, which turned out surprisingly expensive on deletes: "deleting"
	on FTS just replaces the text with "nothing", so we were creating a
	parser for each of these.

	This further reduces the timing of the sparql delete in the previous
	commit down to:

	  real    1m7.029s
	  user    0m0.023s
	  sys     0m0.009s

	libtracker-data: Don't schedule all deletes only because of FTS

	The FTS limitations that made it sensible to perform the scheduled
	delete no longer apply since FTS4 and external content tables (or
	rather, we no longer need the previous values explicitly).

	The scheduled delete is a lot more (if not extremely) thorough,
	decomposing the properties and items to be deleted into individual
	queries. This has quite an effect on deletes involving a large
	number of elements. A query like:

	  delete { ?u a rdfs:Resource; }
	  where { ?u nie:url ?url .
	          FILTER (fn:starts-with (?url, ".../linux/")) }

	on a linux git checkout indexed through tracker-miner-fs used to
	involve 7M sqlite queries; with this fast path it's down to 1.6M
	(and infinitely fewer sqlite3_stmt cache misses).

	As a result, the timing improves substantially. time(1) for that
	query through the "tracker sparql" command went from:

	  real    2m33.377s
	  user    0m0.021s
	  sys     0m0.008s

	Down to:

	  real    1m23.625s
	  user    0m0.021s
	  sys     0m0.009s

	libtracker-data: Add function to delete an entire row from the FTS table

	This can be used as an optimization, instead of updating each column
	individually as we currently do.

2015-07-19  Carlos Garnacho

	tracker-extract-msoffice: Avoid frequent errors when feeding wrong files

	There are mimetypes whose detection is too weak (i.e. purely based
	on filename extension matches), so it makes sense to avoid the
	frequent errors we get when the module gets fed a random file.

	tracker-extract-gstreamer: Avoid frequent errors when feeding wrong files

	There are mimetypes whose detection is too weak (i.e. purely based
	on filename extension matches), so it makes sense to avoid the
	frequent errors we get when the module gets fed a random file.

	Merge branch 'wip/GrssPerson'

2015-07-18  Igor Gnatenko

	configure: bump required libgrss version to 0.7, as grss now has an unversioned .pc name

2015-07-18  Carlos Garnacho

	rss: use tracker_sparql_builder_object_blank_open()/close()

	Tested-by: Igor Gnatenko

2015-07-18  Igor Gnatenko

	rss: add extraction of additional attrs for persons

	GrssPerson is introduced in libgrss 0.7.

2015-07-18  Carlos Garnacho

	rss: Extract copyright/contributors/categories from feed messages

	These get extracted as nie:copyright, nco:contributor and
	nie:keyword respectively.

2015-07-18  Pedro Albuquerque

	Updated Portuguese translation

2015-07-17  Balázs Úr

	Updated Hungarian translation

2015-07-17  Carlos Garnacho

	rss: Store html as nmo:htmlMessageContent

	We get raw HTML content from the feed, and nie:plainTextContent
	should be just that, plain text. This change is twofold: we now
	store the HTML content as nmo:htmlMessageContent (as the ontology
	expects), and honor nie:plainTextContent (and FTS!) by storing the
	plain text stripped of all tags.

	rss: Maintain references to GrssFeedChannels

	We're currently leaking these, and recreating them all from scratch
	each time we query the mfo:FeedChannels. Just keep the references,
	and reuse GrssFeedChannels from previous additions.

	rss: Account for the same feed message coming from different channels

	Unfortunately the nmo:communicationChannel docs are very explicit
	about the property cardinality. So we just create the
	mfo:FeedMessage for the first channel, and make it bail out any
	subsequent time it would be added, from the same mfo:FeedChannel or
	another.

	https://bugzilla.gnome.org/show_bug.cgi?id=752484

	rss: replace string comparison by boolean check

	The cursor can return the correct type right away; no need to
	retrieve the boolean value as a string and compare it to anything.
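The "first channel wins" rule for feed messages can be sketched as a claim-once set keyed by message URL. The log does not say how the "already added" check is actually done (most likely through a query against the store); this sketch assumes a purely in-memory, fixed-capacity set, and SeenMessages, claim_feed_message() and seen_messages_clear() are hypothetical names used only for illustration.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>
#include <stdlib.h>
#include <string.h>

#define SEEN_CAPACITY 256 /* hypothetical size; must be a power of two */

/* Hypothetical record of feed message URLs already stored. */
typedef struct {
	char *urls[SEEN_CAPACITY];
	size_t count;
} SeenMessages;

/* 32-bit FNV-1a hash of a NUL-terminated string. */
static uint32_t
fnv1a (const char *s)
{
	uint32_t h = 2166136261u;

	for (; *s; s++) {
		h ^= (unsigned char) *s;
		h *= 16777619u;
	}
	return h;
}

/* Returns true the first time a message URL is seen (the caller would
 * create the mfo:FeedMessage for this channel), false on any later
 * occurrence, whichever channel it comes from. Open addressing with
 * linear probing. */
static bool
claim_feed_message (SeenMessages *seen, const char *url)
{
	size_t i = fnv1a (url) & (SEEN_CAPACITY - 1);
	size_t len;
	char *copy;

	while (seen->urls[i] != NULL) {
		if (strcmp (seen->urls[i], url) == 0)
			return false;
		i = (i + 1) & (SEEN_CAPACITY - 1);
	}

	/* Keep at least one free slot so the probe loop always terminates. */
	assert (seen->count < SEEN_CAPACITY - 1);

	len = strlen (url) + 1;
	copy = malloc (len);
	if (copy == NULL)
		abort ();
	memcpy (copy, url, len);

	seen->urls[i] = copy;
	seen->count++;
	return true;
}

/* Release every stored URL and reset the set. */
static void
seen_messages_clear (SeenMessages *seen)
{
	for (size_t i = 0; i < SEEN_CAPACITY; i++) {
		free (seen->urls[i]);
		seen->urls[i] = NULL;
	}
	seen->count = 0;
}
```

The channel itself is deliberately not part of the key: a message arriving again from the same mfo:FeedChannel or from another one is rejected the same way.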
	rss: Handle mfo:FeedChannel deletes

	If we receive a delete for one of those, we'll delete all feed
	messages associated with it.

2015-07-16  Daniel Mustieles

	Updated Spanish translation

2015-07-16  Carlos Garnacho

	rss: Make --title argument optional

	We can now fetch that from the feed channel, so leave --add-feed as
	the minimum required.

	rss: Lower severity of frequent message

	No need for a g_message() for something that's completely expected.

	rss: Retrieve information around mfo:FeedChannels

	We leave these mostly untouched, but we should have some info to
	fill in once the GrssFeedChannel has been populated. We currently
	retrieve the title, URL (although we should have gotten that in the
	first place), feed type, description, image link, and last message
	date.

	rss: Be more careful about updates

	We just check all feeds on any hint from GraphUpdated, which means
	we also update everything if we dare to modify the mfo:FeedChannel,
	resulting in circular updates.

	Actually, we should just be inspecting additions of mfo:FeedChannel
	elements (or selected property changes in these), so modifications
	on these don't trigger the extraction of all the feeds again.

	If we happen to update all feeds on GraphUpdated, we also do so when
	modifying FeedChannels.

2015-07-15  Carlos Garnacho

	rss: Unset timeout source id

	This timeout is meant to run once, but leaves the timeout_id behind,
	which warns when we g_source_remove() it.

	rss: Fix double free

	The variant is not for us to free.

	rss: fix typo in ontology

	nco:fullname is all lowercase, this one caught me too...

	rss: Fix compile error

	has_author was not defined. It's been renamed to "author" too; the
	former name makes more sense for booleans.
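The "Be more careful about updates" entry above narrows the trigger from "anything changed" to "an mfo:FeedChannel was added". A minimal sketch of that filter follows. GraphChange is a hypothetical, simplified stand-in: it uses strings for readability, whereas the real GraphUpdated payload is not modeled here, and the "selected property changes" case is left out.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical, simplified view of one change reported by GraphUpdated. */
typedef struct {
	bool is_insert;        /* true for an insert, false for a delete */
	const char *predicate; /* e.g. "rdf:type", "nie:title" */
	const char *object;    /* e.g. "mfo:FeedChannel" */
} GraphChange;

/* True only when a resource gains the mfo:FeedChannel type, i.e. a new
 * channel was added. */
static bool
is_new_feed_channel (const GraphChange *c)
{
	return c->is_insert &&
	       strcmp (c->predicate, "rdf:type") == 0 &&
	       strcmp (c->object, "mfo:FeedChannel") == 0;
}

/* Property edits on an existing channel (such as the ones the miner
 * writes back itself) are ignored, breaking the
 * update -> write -> update loop. */
static bool
should_refresh_feeds (const GraphChange *changes, size_t n_changes)
{
	assert (changes != NULL || n_changes == 0);

	for (size_t i = 0; i < n_changes; i++) {
		if (is_new_feed_channel (&changes[i]))
			return true;
	}
	return false;
}
```

Deletes are intentionally not a refresh trigger here, since mfo:FeedChannel deletes go through their own path per the "Handle mfo:FeedChannel deletes" entry.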
2015-07-15  Igor Gnatenko

	rss: author field should be nco:Contact, not string

	Reference: https://bugzilla.gnome.org/show_bug.cgi?id=752398

	rss: add author field

	Reference: https://bugzilla.gnome.org/show_bug.cgi?id=752398

2015-07-14  Igor Gnatenko

	bump libgrss to latest 0.6

	There's no API break since 0.5, but 0.5 doesn't work well and isn't
	shipped in distros.

	Reference: https://bugzilla.gnome.org/show_bug.cgi?id=752371
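Storing the author as an nco:Contact rather than a plain string means emitting a blank node with its own nco:fullname. The real code presumably goes through TrackerSparqlBuilder (a later entry switches to tracker_sparql_builder_object_blank_open()/close()); this plain-C sketch only shows the resulting SPARQL shape and the literal escaping it needs. Using nco:creator as the linking property is an assumption, and sparql_escape() and build_author_fragment() are hypothetical helpers.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdio.h>
#include <string.h>

/* Escape a string for use inside a double-quoted SPARQL literal.
 * Returns false if the escaped result does not fit in out. */
static bool
sparql_escape (const char *in, char *out, size_t out_size)
{
	size_t j = 0;

	assert (out_size > 0);

	for (; *in; in++) {
		char one[2] = { *in, '\0' };
		const char *rep;
		size_t len;

		switch (*in) {
		case '"':  rep = "\\\""; break;
		case '\\': rep = "\\\\"; break;
		case '\n': rep = "\\n";  break;
		case '\r': rep = "\\r";  break;
		case '\t': rep = "\\t";  break;
		default:   rep = one;    break;
		}

		len = strlen (rep);
		if (j + len >= out_size) /* keep room for the terminator */
			return false;
		memcpy (out + j, rep, len);
		j += len;
	}

	out[j] = '\0';
	return true;
}

/* Build the author as an nco:Contact blank node instead of a string.
 * The nco:creator predicate is an assumption for illustration. */
static bool
build_author_fragment (const char *full_name, char *out, size_t out_size)
{
	char escaped[256];
	int n;

	if (!sparql_escape (full_name, escaped, sizeof escaped))
		return false;

	n = snprintf (out, out_size,
	              "nco:creator [ a nco:Contact ; nco:fullname \"%s\" ]",
	              escaped);
	return n >= 0 && (size_t) n < out_size;
}
```

Note the all-lowercase nco:fullname, the same property name the "fix typo in ontology" entry corrects.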