Improve import of MM handling / sub-nodes
First of all, thank you for your great extension, it's so awesome and powerful! But yeah, this is a new issue, so there is "always something". :)
I need to import an RSS feed into tt_news. It works well so far, but I need to map more than one category, and this isn't possible. ...
<item>
    <title>...</title>
    <description>...</description>
    <enclosure length="337028" type="image/jpeg" url="http://www./151466P.jpg" />
    <enclosure length="804283" type="application/pdf" url="http://www.151724P.pdf" />
    <enclosure length="777400" type="application/pdf" url="http://www.151725P.pdf" />
    <categories>
        <category>Category 1</category>
        <category>Category 2</category>
    </categories>
    <pubDate>...</pubDate>
    <guid>123</guid>
</item>
$GLOBALS['TCA']['tt_news']['columns']['category']['external'] = array(
    0 => array(
        // 'field' => 'categories',
        // 'xpath' => '/rss/channel/item/categories/child::*',
        'xpath' => 'categories//category',
        // 'xmlValue' => 1,
        'MM' => array(
            'mapping' => array(
                'valueMap' => array(
                    'Category 1' => 1,
                    'Category 2' => 2,
                ),
                'table' => 'tt_news_cat_mm',
                'multiple' => 1,
                'reference_field' => 'tx_presseservice_externalid',
            ),
        ),
    ),
);
First I tried to use xpath, but only the first sub-node is taken. When I look into the class "tx_externalimport_importer", the method "selectNodeWithXpath" returns only the first node item. So it's nice to have xpath to select all sub-nodes, but in this case it's useless.
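To illustrate the difference, here is a minimal standalone sketch using plain PHP DOM classes. It does not reproduce the extension's code; the XML is a trimmed copy of the item above, and the variable names are only for illustration. The query itself matches both category nodes; taking only the first item of the result is what loses the second one.

```php
<?php
// Trimmed copy of the feed item above, for illustration only.
$xml = <<<XML
<item>
  <categories>
    <category>Category 1</category>
    <category>Category 2</category>
  </categories>
</item>
XML;

$dom = new DOMDocument();
$dom->loadXML($xml);
$xpath = new DOMXPath($dom);

// DOMXPath::query() returns a DOMNodeList containing every matching node,
// evaluated relative to the <item> element passed as context node.
$nodes = $xpath->query('categories//category', $dom->documentElement);

// Keeping only the first item reproduces the behaviour described above.
$firstOnly = $nodes->item(0)->nodeValue;

// Iterating over the whole list keeps all sub-nodes.
$all = array();
foreach ($nodes as $node) {
    $all[] = $node->nodeValue;
}
```

Here $firstOnly holds 'Category 1', while $all holds both category names.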
The second option is to use the "xmlValue" attribute. My sub-nodes are then returned "raw", so I have to use a userFunc to extract the information I want. But I don't have the uid of the news record at that point, so I have to write a hook (datamapPostProcess) to do the MM mapping for the categories on my own, don't I?
I want to ask you whether it is possible to extend the import process with an option to map multiple (sub-)nodes of the same name. This should also be an option for normal fields, not only for MM mappings. See my sample above, there are three nodes called "enclosure": I want to map all values (or rather the url attribute) to the tt_news image field (without FAL). Therefore I have written a userFunc that could handle all three nodes, but once again I get only the first one.
So, what do you think? Am I right or is there something I missed?
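For the "enclosure" case, this is roughly what I would like to end up with. It is only a sketch in plain PHP DOM: the function name collectEnclosureUrls is made up, the example.org URLs are placeholders, and joining the values with commas is my assumption about how the target field would be filled.

```php
<?php
/**
 * Hypothetical helper (not part of external_import): collects the "url"
 * attribute of every <enclosure> element below the given item element.
 */
function collectEnclosureUrls(DOMElement $item)
{
    $urls = array();
    foreach ($item->getElementsByTagName('enclosure') as $enclosure) {
        $urls[] = $enclosure->getAttribute('url');
    }
    return $urls;
}

// Placeholder data, shaped like the feed item above.
$xml = <<<XML
<item>
  <enclosure type="image/jpeg" url="http://example.org/a.jpg" />
  <enclosure type="application/pdf" url="http://example.org/b.pdf" />
  <enclosure type="application/pdf" url="http://example.org/c.pdf" />
</item>
XML;

$dom = new DOMDocument();
$dom->loadXML($xml);

$urls = collectEnclosureUrls($dom->documentElement);

// Assumption: the target field takes a comma-separated list of values.
$fieldValue = implode(',', $urls);
```

Downloading the files and turning the URLs into local file names would still be a separate step; the point is only that all three nodes are available, not just the first.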
Updated by Francois Suter over 5 years ago
Indeed the scenarios you mention are currently not possible, except with hooks and user functions as you have experimented. I'm not sure how complicated it could be to implement such features. It sure is quite a change of logic.
Your use case raises an interesting point: the user functions are called after the data is mapped. This means that it is not possible to manipulate data before it is mapped. Obviously it is interesting to do it the current way, but the other way would be needed too. So in essence we would need a double set of user functions, one called before the mapping and one after.
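As a rough sketch of what such a double set could look like (purely hypothetical, this is not how external_import is implemented, and the function and parameter names are invented for the example): one optional callback runs on the raw value before the value map is applied, and a second one runs on the mapped result.

```php
<?php
/**
 * Hypothetical illustration of a "before" and an "after" user function
 * around a value-map lookup. Not external_import's actual API.
 */
function importValue($raw, array $valueMap, $before = null, $after = null)
{
    // Pre-mapping user function: works on the raw source value.
    if (is_callable($before)) {
        $raw = call_user_func($before, $raw);
    }

    // Mapping step: look the value up, null if there is no match.
    $mapped = isset($valueMap[$raw]) ? $valueMap[$raw] : null;

    // Post-mapping user function: works on the mapped value.
    if (is_callable($after)) {
        $mapped = call_user_func($after, $mapped);
    }

    return $mapped;
}

$valueMap = array('Category 1' => 1, 'Category 2' => 2);

// Without the "before" step the untrimmed value does not match the map.
$withoutBefore = importValue('  Category 2 ', $valueMap);

// With 'trim' as the "before" step the lookup succeeds.
$withBefore = importValue('  Category 2 ', $valueMap, 'trim');
```

In this example $withoutBefore is null and $withBefore is 2, which is the kind of case a pre-mapping user function would cover.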
Updated by André Steiling over 5 years ago
Thank you for your fast response! I don't know the programming and logic of external_import in depth, so it's not really clear to me whether calling the user functions before mapping could do the job. This feature might be an option, but the source is the source and TYPO3 is the target. In general it's not good to change the source data; it should only be transformed into something TYPO3 can handle, e.g. transforming date formats or meeting RTE needs.
Maybe in case of the "enclosure" nodes it might be possible to extract all nodes before mapping, download the files to upload/pics, get the names and map them to tt_news.image (without FAL).
The MM mappings should really work without user functions or hooks, because of the nature of the MM relation itself! TYPO3 offers you all kinds of MM relations, not only 1-1 mappings, from the importer's viewpoint. MM relations are also hard stuff, so handling them with user functions before or after, plus hooks, could be a really horrible trip for many users.
All methods which handle the XML import work with a single object of type DOMElement. Do you think we could change them to handle either a single object or an array of objects? Maybe a new TCA attribute could distinguish between the two modes.
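Something along these lines is what I have in mind. It is only a sketch: the function name extractValues and the $multiple flag are invented for the example and do not exist in external_import.

```php
<?php
/**
 * Hypothetical helper: accepts either a single DOM node or a list of nodes
 * (array or DOMNodeList) and returns one value or all values, depending on
 * the $multiple flag.
 */
function extractValues($nodes, $multiple = false)
{
    // Wrap a single node so that both cases share the same loop.
    if ($nodes instanceof DOMNode) {
        $nodes = array($nodes);
    }

    $values = array();
    foreach ($nodes as $node) {
        $values[] = trim($node->nodeValue);
    }

    if ($multiple) {
        return $values;
    }
    // Single mode: keep today's behaviour and return the first value only.
    return count($values) > 0 ? $values[0] : null;
}

$dom = new DOMDocument();
$dom->loadXML(
    '<categories><category>Category 1</category><category>Category 2</category></categories>'
);
$list = $dom->getElementsByTagName('category');

$single = extractValues($list);        // first value only
$several = extractValues($list, true); // all values
```

With the flag switched off nothing changes for existing configurations; with it switched on, the mapping step would receive all values instead of just the first.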
Updated by Francois Suter over 5 years ago
This is definitely not a trivial question and I will be on holidays for 2 weeks starting tonight, so I hope that you are not in too much of a hurry. A few general things I can say:
- importing images is a specific case, because they need to be fetched from somewhere, referenced by FAL and then related to the imported data. It would be very useful IMO for external_import to be able to handle this, but it is probably a lot of work.
- this is related to having support for importing nested structures into IRRE elements, which is also not supported currently. Honestly I don't know how complicated this is, as I have no idea what kind of data structure must be sent to TCEmain to generate IRRE elements. I have never looked into this yet.
- there are many specific use cases when importing data, because there are so many special needs. I tried to build external_import with lots of entry points like hooks and user functions, so that it can be adapted to specific needs without turning the whole thing into an unmanageable mess. So my view is that manipulating the incoming data is perfectly ok.
Updated by Francois Suter over 2 years ago
You never know what life has in store! Funnily enough one month after my last comment, the opportunity arose to implement such a feature. Issue has been ported to: https://github.com/cobwebch/external_import/issues/68